summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-06-30batman-adv: refactor batadv_neigh_node_* functions to follow common styleMarek Lindner
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: remove unused vid local variable in tt seq printSimon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2016-06-30batman-adv: statically print gateway table headerAntonio Quartulli
To make it easier to search through the code it is better to print static strings directly instead of using format strings printing constants. This was addressed in a previous patch, but the Gateway table header was not updated accordingly. Signed-off-by: Antonio Quartulli <a@unstable.cc> Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-06-30batman-adv: Start new development cycleSimon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2016-06-12sched: remove NET_XMIT_POLICEDFlorian Westphal
sch_atm returns this when TC_ACT_SHOT classification occurs. But all other schedulers that use tc_classify (htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS in this case so just do that in atm. BATMAN uses it as an intermediate return value to signal forwarding vs. buffering, but it did not return POLICED to callers outside of BATMAN. Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-11ipv6: use TOS marks from sockets for routing decisionHannes Frederic Sowa
In IPv6 the ToS values are part of the flowlabel in flowi6 and get extracted during fib rule lookup, but we forgot to correctly initialize the flowlabel before the routing lookup. Reported-by: <liam.mcbirnie@boeing.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10net_sched: remove generic throttled managementEric Dumazet
__QDISC_STATE_THROTTLED bit manipulation is rather expensive for HTB and few others. I already removed it for sch_fq in commit f2600cf02b5b ("net: sched: avoid costly atomic operation in fq_dequeue()") and so far nobody complained. When one ore more packets are stuck in one or more throttled HTB class, a htb dequeue() performs two atomic operations to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc lock is held. Removing this pair of atomic operations bring me a 8 % performance increase on 200 TCP_RR tests, in presence of throttled classes. This patch has no side effect, since nothing actually uses disc_is_throttled() anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10net_sched: netem: remove qdisc_is_throttled() useEric Dumazet
Looks like it is only there as some optimization attempt. Since __QDISC_STATE_THROTTLED set/unset is way too expensive, and netem is the last user, just remove this check. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10net_sched: cbq: remove a flaky use of qdisc_is_throttled()Eric Dumazet
So far no qdisc ever unset the throttled bit at enqueue() time, so CBQ usage of qdisc_is_throttled() was flaky. Since __QDISC_STATE_THROTTLED set/unset is way too expensive considering that only CBQ was eventually caring for this status, it would make sense to implement a Qdisc ops ->is_throttled() if we find that this is needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10net_sched: sch_plug: use a private throttled statusEric Dumazet
We want to get rid of generic qdisc throttled management, so this qdisc has to use a private flag. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10sctp: sctp should change socket state when shutdown is receivedXin Long
Now sctp doesn't change socket state upon shutdown reception. It changes just the assoc state, even though it's a TCP-style socket. For some cases, if we really need to check sk->sk_state, it's necessary to fix this issue, at least when we use ss or netstat to dump, we can get a more exact information. As an improvement, we will change sk->sk_state when we change asoc->state to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent with sctp_close. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10Merge tag 'mac80211-next-for-davem-2016-06-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the next cycle, we have the following: * the biggest change is Michał's work on integrating FQ/codel with the mac80211 internal software queues * cfg80211 connect result gets clarified for the "no connection at all" case * advertisement of per-interface type capabilities, in case they differ (which makes a lot of sense for some capabilities) * most of the nl80211 & hwsim unprivileged namespace operation changes * human-readable VHT capabilities in debugfs * some other cleanups, like spelling ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10tcp: add NV congestion controlLawrence Brakmo
TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of NV was presented at 2010's LPC. It is a delayed based congestion avoidance for the data center. This version has been tested within a 10G rack where the HW RTTs are 20-50us and with 1 to 400 flows. A description of TCP-NV, including implementation details as well as experimental results, can be found at: http://www.brakmo.org/networking/tcp-nv/TCPNV.html Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10tcp: add in_flight to tcp_skb_cbLawrence Brakmo
Add in_flight (bytes in flight when packet was sent) field to tx component of tcp_skb_cb and make it available to congestion modules' pkts_acked() function through the ack_sample function argument. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10packet: use common code for virtio_net_hdr and skb GSO conversionMike Rapoport
Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_from_skb Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10RDS: IB: Remove deprecated create_workqueueBhaktipriya Shridhar
alloc_workqueue replaces deprecated create_workqueue(). Since the driver is infiniband which can be used as block device and the workqueue seems involved in regular operation of the device, so a dedicated workqueue has been used with WQ_MEM_RECLAIM set to guarantee forward progress under memory pressure. Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10rxrpc: Limit the listening backlogDavid Howells
Limit the socket incoming call backlog queue size so that a remote client can't pump in sufficient new calls that the server runs out of memory. Note that this is partially theoretical at the moment since whilst the number of calls is limited, the number of packets trying to set up new calls is not. This will be addressed in a later patch. If the caller of listen() specifies a backlog INT_MAX, then they get the current maximum; anything else greater than max_backlog or anything negative incurs EINVAL. The limit on the maximum queue size can be set by: echo N >/proc/sys/net/rxrpc/max_backlog where 4<=N<=32. Further, set the default backlog to 0, requiring listen() to be called before we start actually queueing new calls. Whilst this kind of is a change in the UAPI, the caller can't actually *accept* new calls anyway unless they've first called listen() to put the socket into the LISTENING state - thus the aforementioned new calls would otherwise just sit there, eating up kernel memory. (Note that sockets that don't have a non-zero service ID bound don't get incoming calls anyway.) Given that the default backlog is now 0, make the AFS filesystem call kernel_listen() to set the maximum backlog for itself. Possible improvements include: (1) Trimming a too-large backlog to max_backlog when listen is called. (2) Trimming the backlog value whenever the value is used so that changes to max_backlog are applied to an open socket automatically. Note that the AFS filesystem opens one socket and keeps it open for extended periods, so would miss out on changes to max_backlog. (3) Having a separate setting for the AFS filesystem. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10rxrpc: Trim line-terminal whitespaceDavid Howells
Trim line-terminal whitespace in net/rxrpc/ Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10net, cls: allow for deleting all filters for given parentDaniel Borkmann
Add a possibility where the user can just specify the parent and all filters under that parent are then being purged. Currently, for example for scripting, one needs to specify pref/prio to have a well-defined number for 'tc filter del' command for addressing the previously created instance or additionally filter handle in case of priorities being the same. Improve usage by allowing the option for tc to specify the parent and removing the whole chain for that given parent. Example usage after patch, no tc changes required: # tc qdisc replace dev foo clsact # tc filter add dev foo egress bpf da obj ./bpf.o # tc filter add dev foo egress bpf da obj ./bpf.o # tc filter show dev foo egress filter protocol all pref 49151 bpf filter protocol all pref 49151 bpf handle 0x1 bpf.o:[classifier] direct-action filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 bpf.o:[classifier] direct-action # tc filter del dev foo egress # tc filter show dev foo egress # Previously, RTM_DELTFILTER requests with invalid prio of 0 were rejected, so only netlink requests with RTM_NEWTFILTER and NLM_F_CREATE flag were allowed where the kernel would auto-generate a pref/prio. We can piggyback on that and use prio of 0 as a wildcard for requests of RTM_DELTFILTER. For notifying tc netlink monitoring users (e.g. libnl uses this for caching), there are two options, that is, sending individual tfilter_notify() notifications for each tcf_proto, or sending a single one indicating wildcard removal. I tried both and there are pros and cons for each, eventually I decided for sending individual tfilter_notify(), so that user space can support this seamlessly and there won't be a mess of changing each and every application to make sure expectations from the kernel won't break when they don't understand single notification. Since linear chains don't really scale, I expect only a handful of classifiers to be attached at max for a given parent anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10bpf: reject wrong sized filters earlierDaniel Borkmann
Add a bpf_check_basics_ok() and reject filters that are of invalid size much earlier, so we don't do any useless work such as invoking bpf_prog_alloc(). Currently, rejection happens in bpf_check_classic() only, but it's really unnecessarily late and they should be rejected at earliest point. While at it, also clean up one bpf_prog_size() to make it consistent with the remaining invocations. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10bpf: enforce recursion limit on redirectsDaniel Borkmann
Respect the stack's xmit_recursion limit for calls into dev_queue_xmit(). Currently, they are not handeled by the limiter when attached to clsact's egress parent, for example, and a buggy program redirecting it to the same device again could run into stack overflow eventually. It would be good if we could notify an admin to give him a chance to react. We reuse xmit_recursion instead of having one private to eBPF, so that the stack's current recursion depth will be taken into account as well. Follow-up to commit 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") and 27b29f63058d ("bpf: add bpf_redirect() helper"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10openvswitch: Add packet truncation support.William Tu
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to truncate packets. A 'max_len' is added for setting up the maximum packet size, and a 'cutlen' field is to record the number of bytes to trim the packet when the packet is outputting to a port, or when the packet is sent to userspace. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/sched/act_police.c net/sched/sch_drr.c net/sched/sch_hfsc.c net/sched/sch_prio.c net/sched/sch_red.c net/sched/sch_tbf.c In net-next the drop methods of the packet schedulers got removed, so the bug fixes to them in 'net' are irrelevant. A packet action unload crash fix conflicts with the addition of the new firstuse timestamp. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal. 2) Revert some msleep conversions in rtlwifi as these spots are in atomic context, from Larry Finger. 3) Validate that NFTA_SET_TABLE attribute is actually specified when we call nf_tables_getset(). From Phil Turnbull. 4) Don't do mdio_reset in stmmac driver with spinlock held as that can sleep, from Vincent Palatin. 5) sk_filter() does things other than run a BPF filter, so we should not elide it's call just because sk->sk_filter is NULL. Fix from Eric Dumazet. 6) Fix missing backlog updates in several packet schedulers, from Cong Wang. 7) bnx2x driver should allow VLAN add/remove while the interface is down, from Michal Schmidt. 8) Several RDS/TCP race fixes from Sowmini Varadhan. 9) fq_codel scheduler doesn't return correct queue length in dumps, from Eric Dumazet. 10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from Yuchung Cheng. 11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(), from Guillaume Nault. 12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian Westphal. 13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling, from Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) vmxnet3: segCnt can be 1 for LRO packets packet: compat support for sock_fprog stmmac: fix parameter to dwmac4_set_umac_addr() net/mlx5e: Fix blue flame quota logic net/mlx5e: Use ndo_stop explicitly at shutdown flow net/mlx5: E-Switch, always set mc_promisc for allmulti vports net/mlx5: E-Switch, Modify node guid on vf set MAC net/mlx5: E-Switch, Fix vport enable flow net/mlx5: E-Switch, Use the correct error check on returned pointers net/mlx5: E-Switch, Use the correct free() function net/mlx5: Fix E-Switch flow steering capabilities check net/mlx5: Fix flow steering NIC capabilities check net/mlx5: Fix root flow table update net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly net/mlx5: Fix masking of reserved bits in XRCD number net/mlx5: Fix the size of modify QP mailbox mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name() mlxsw: spectrum: Make split flow match firmware requirements wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel cfg80211: remove get/set antenna and tx power warnings ...
2016-06-09packet: compat support for sock_fprogWillem de Bruijn
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains a pointer into user memory. If userland is 32-bit and kernel is 64-bit the two disagree about the layout of struct sock_fprog. Add compat setsockopt support to convert a 32-bit compat_sock_fprog to a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF"). Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: vrf: Fix crash when IPv6 is disabled at boot timeDavid Ahern
Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is disabled at boot using the kernel option ipv6.disable=1. Using current net-next with the boot option: $ ip link add red type vrf table 1001 Generates: [12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748 [12210.921341] IP: [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a [12210.922537] PGD b79e3067 PUD bb32b067 PMD 0 [12210.923479] Oops: 0000 [#1] SMP [12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc [12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235 [12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000 [12210.929328] RIP: 0010:[<ffffffff814b30e3>] [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a [12210.930697] RSP: 0018:ffff8800bacaf888 EFLAGS: 00010202 [12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28 [12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286 [12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b [12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9 [12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000 [12210.937103] FS: 00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000 [12210.938321] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0 [12210.940278] Stack: [12210.940603] ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135 [12210.941818] ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800 [12210.943040] ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280 [12210.944288] Call Trace: [12210.944688] [<ffffffff814b3135>] fib6_new_table+0x24/0x8a [12210.945516] [<ffffffff81397c88>] vrf_dev_init+0xd4/0x162 [12210.946328] [<ffffffff814091e1>] register_netdevice+0x100/0x396 [12210.947209] [<ffffffff8139823d>] vrf_newlink+0x40/0xb3 [12210.948001] [<ffffffff814187f0>] rtnl_newlink+0x5d3/0x6d5 ... The problem above is due to the fact that the fib hash table is not allocated when IPv6 is disabled at boot. As for the VRF driver it should not do any IPv6 initializations if IPv6 is disabled, so it needs to know if IPv6 is disabled at boot. The disable parameter is private to the IPv6 module, so provide an accessor for modules to determine if IPv6 was disabled at boot time. Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09rxrpc: Simplify connect() implementation and simplify sendmsg() opDavid Howells
Simplify the RxRPC connect() implementation. It will just note the destination address it is given, and if a sendmsg() comes along with no address, this will be assigned as the address. No transport struct will be held internally, which will allow us to remove this later. Simplify sendmsg() also. Whilst a call is active, userspace refers to it by a private unique user ID specified in a control message. When sendmsg() sees a user ID that doesn't map to an extant call, it creates a new call for that user ID and attempts to add it. If, when we try to add it, the user ID is now registered, we now reject the message with -EEXIST. We should never see this situation unless two threads are racing, trying to create a call with the same ID - which would be an error. It also isn't required to provide sendmsg() with an address - provided the control message data holds a user ID that maps to a currently active call. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/netlink/af_netlink.h: Remove unused structure.Fabien Siron
Signed-off-by: Fabien Siron <fabien.siron@epita.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: add netdev_lockdep_set_classes() helperEric Dumazet
It is time to add netdev_lockdep_set_classes() helper so that lockdep annotations per device type are easier to manage. This removes a lot of copies and missing annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: sched: fix qdisc->running lockdep annotationsEric Dumazet
1) qdisc_run_begin() is really using the equivalent of a trylock. Instead of using write_seqcount_begin(), use a combination of raw_write_seqcount_begin() and correct lockdep annotation. 2) sch_direct_xmit() should use regular spin_lock(root_lock) Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09Merge tag 'mac80211-for-davem-2016-06-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two more fixes for now: * a fix for a long-standing iwpriv 32/64 compat issue * two fairly recently introduced (4.6) warning asking for symmetric operations are erroneous and I remove them ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09sit: remove unnecessary protocol check in ipip6_tunnel_xmit()Simon Horman
ipip6_tunnel_xmit() is called immediately after checking that skb->protocol is htons(ETH_P_IPV6) so there is no need to check it a second time. Found by inspection. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09mac80211: implement codel on fair queuing flowsMichal Kazior
There is no other limit other than a global packet count limit when using software queuing. This means a single flow queue can grow insanely long. This is particularly bad for TCP congestion algorithms which requires a little more sophisticated frame dropping scheme than a mere headdrop on limit overflow. Hence apply (a slighly modified, to fit the knobs) CoDel5 on flow queues. This improves TCP convergence and stability when combined with wireless driver which keeps its own tx queue/fifo at a minimum fill level for given link conditions. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09mac80211: add debug knobs for fair queuingMichal Kazior
This adds a debugfs entry to read and modify some fq parameters. This makes it easy to debug, test and experiment. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> [remove module parameter for now] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09mac80211: implement fair queueing per txqMichal Kazior
mac80211's software queues were designed to work very closely with device tx queues. They are required to make use of 802.11 packet aggregation easily and efficiently. Due to the way 802.11 aggregation is designed it only makes sense to keep fair queuing as close to hardware as possible to reduce induced latency and inertia and provide the best flow responsiveness. This change doesn't translate directly to immediate and significant gains. End result depends on driver's induced latency. Best results can be achieved if driver keeps its own tx queue/fifo fill level to a minimum. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09mac80211: skip netdev queue control with software queuingMichal Kazior
Qdiscs are designed with no regard to 802.11 aggregation requirements and hand out packet-by-packet with no guarantee they are destined to the same tid. This does more bad than good no matter how fairly a given qdisc may behave on an ethernet interface. Software queuing used per-AC netdev subqueue congestion control whenever a global AC limit was hit. This meant in practice a single station or tid queue could starve others rather easily. This could resonate with qdiscs in a bad way or could just end up with poor aggregation performance. Increasing the AC limit would increase induced latency which is also bad. Disabling qdiscs by default and performing taildrop instead of netdev subqueue congestion control on the other hand makes it possible for tid queues to fill up "in the meantime" while preventing stations starving each other. This increases aggregation opportunities and should allow software queuing based drivers achieve better performance by utilizing airtime more efficiently with big aggregates. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09nl80211: clarify nl80211_set_reg() success pathJohannes Berg
Setting rd to NULL to avoid freeing it, just to be able to return from the function in a single place, doesn't make much sense. Return the set_regdom() return value directly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09nl80211: Fix checkpatch warnings about blank linesKirtika Ruchandani
This patch fixes the following checkpatch.pl issues - - Please don't use multiple blank lines - Blank lines aren't necessary before a close brace - Missing a blank line after declarations Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: Kirtika Ruchandani <kirtika.ruchandani@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09nl80211: Fix spellingKirtika Ruchandani
Fix 'implementation' spelling, reported by checkpatch.pl Signed-off-by: Kirtika Ruchandani <kirtika.ruchandani@gmail.com> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09wext: Fix 32 bit iwpriv compatibility issue with 64 bit KernelPrasun Maiti
iwpriv app uses iw_point structure to send data to Kernel. The iw_point structure holds a pointer. For compatibility Kernel converts the pointer as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers may use iw_handler_def.private_args to populate iwpriv commands instead of iw_handler_def.private. For those case, the IOCTLs from SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl(). Accordingly when the filled up iw_point structure comes from 32 bit iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends it to driver. So, the driver may get the invalid data. The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory. This patch adds pointer conversion from 32 bit to 64 bit and vice versa, if the ioctl comes from 32 bit iwpriv to 64 bit Kernel. Cc: stable@vger.kernel.org Signed-off-by: Prasun Maiti <prasunmaiti87@gmail.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Tested-by: Dibyajyoti Ghosh <dibyajyotig@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-09cfg80211: remove get/set antenna and tx power warningsJohannes Berg
Since set_tx_power and set_antenna are frequently implemented without the matching get_tx_power/get_antenna, we shouldn't have added warnings for those. Remove them. The remaining ones are correct and need to be implemented symmetrically for correct operation. Cc: stable@vger.kernel.org Fixes: de3bb771f471 ("cfg80211: add more warnings for inconsistent ops") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-06-08sched: remove qdisc->dropFlorian Westphal
after removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no more callers of ->drop() outside of other ->drop functions, i.e. nothing calls them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08sched: remove qdisc_rehape_failFlorian Westphal
After the removal of TCA_CBQ_POLICE in cbq scheduler qdisc->reshape_fail is always NULL, i.e. qdisc_rehape_fail is now the same as qdisc_drop. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08cbq: remove TCA_CBQ_POLICE supportFlorian Westphal
iproute2 doesn't implement any cbq option that results in this attribute being sent to kernel. To make use of it, user would have to - patch iproute2 - add a class - attach a qdisc to the class (default pfifo doesn't work as q->handle is 0 and cbq_set_police() is a no-op in this case) - re-'add' the same class (tc class change ...) again - user must also specifiy a defmap (e.g. 'split 1:0 defmap 3f'), since this 'police' feature relies on its presence - the added qdisc must be one of bfifo, pfifo or netem If all of these conditions are met and _some_ leaf qdiscs, namely p/bfifo, netem, plug or tbf would drop a packet, kernel calls back into cbq, which will attempt to re-queue the skb into a different class as indicated by the parents' defmap entry for TC_PRIO_BESTEFFORT. [ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ]. This feature, which isn't documented or implemented in iproute2, and isn't implemented consistently (most qdiscs like sfq, codel, etc drop right away instead of attempting this reclassification) is the sole reason for the reshape_fail and __parent member in Qdisc struct. So remove TCA_CBQ_POLICE support from the kernel, reject it via EOPNOTSUPP so userspace knows we don't support it, and then remove no-longer needed infrastructure in followup commit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08cbq: remove TCA_CBQ_OVL_STRATEGY supportFlorian Westphal
since initial revision of cbq in 2004 iproute 2 has never implemented support for TCA_CBQ_OVL_STRATEGY, which is what needs to be set to activate the class->drop() call (TC_CBQ_OVL_DROP strategy must be set by userspace value must be set by userspace). David Miller says: It seems really safe to kill this thing off, flag an error if someone tries to set the attribute, and therefore kill off all of the non-default cbq_ovl_*() functions. A followup commit can then remove all .drop qdisc methods since this removed the only caller. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08qfq: don't leak skb if kzalloc failsFlorian Westphal
When we need to create a new aggregate to enqueue the skb we call kzalloc. If that fails we returned ENOBUFS without freeing the skb. Spotted during code review. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08ip6gre: Allow live link address changeShweta Choudaha
The ip6 GRE tap device should not be forced to down state to change the mac address and should allow live address change for tap device similar to ipv4 gre. Signed-off-by: Shweta Choudaha <schoudah@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08ip6gre: Allow live link address changeShweta Choudaha
The ip6 GRE tap device should not be forced to down state to change the mac address and should allow live address change for tap device similar to ipv4 gre. Signed-off-by: Shweta Choudaha <schoudah@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08net: cls_u32: be more strict about skip-sw flag for knodesJakub Kicinski
Return an error if user requested skip-sw and the underlaying hardware cannot handle tc offloads (or offloads are disabled). This patch fixes the knode handling. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-08net: cls_u32: catch all hardware offload errorsJakub Kicinski
Errors reported by u32_replace_hw_hnode() were not propagated. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>