summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-03-20sctp: do not update a_rwnd if we are not issuing a sackMarcelo Ricardo Leitner
The SACK can be lost pretty much elsewhere, but if its allocation fail, we know we are not sending it, so it is better to revert a_rwnd to its previous value as this may give it a chance to issue a window update later. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20net: remove a dubious unlikely() clauseEric Dumazet
TCP protocol is still used these days, and TCP uses clones in its transmit path. We can not optimize linux stack assuming it is mostly used in routers, or that TCP is dead. Fixes: 795bb1c00d ("net: bulk free infrastructure for NAPI context, use napi_consume_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18RDS: TCP: Remove unused constantSowmini Varadhan
RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14d24f ("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune"). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socketSowmini Varadhan
Add per-net sysctl tunables to set the size of sndbuf and rcvbuf on the kernel tcp socket. The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf. These values must be set before accept() or connect(), and there may be an arbitrary number of existing rds-tcp sockets when the tunable is modified. To make sure that all connections in the netns pick up the same value for the tunable, we reset existing rds-tcp connections in the netns, so that they can reconnect with the new parameters. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use itDaniel Borkmann
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX for this, which makes the code a bit more generic and allows to remove BPF_TUNLEN_MAX from eBPF code. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18bpf, dst: add and use dst_tclassid helperDaniel Borkmann
We can just add a small helper dst_tclassid() for retrieving the dst->tclassid value. It makes the code a bit better in that we can get rid of the ifdef from filter.c by moving this into the header. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18bpf: make skb->tc_classid also readableDaniel Borkmann
Currently, the tc_classid from eBPF skb context is write-only, but there's no good reason for tc programs to limit it to write-only. For example, it can be used to transfer its state via tail calls where the resulting tc_classid gets filled gradually. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18cls_bpf: reset class and reuse major in daDaniel Borkmann
There are two issues with the current code. First one is that we need to set res->class to 0 in case we use non-default classid matching. This is important for the case where cls_bpf was initially set up with an optional binding to a default class with tcf_bind_filter(), where the underlying qdisc implements bind_tcf() that fills res->class and tests for it later on when doing the classification. Convention for these cases is that after tc_classify() was called, such qdiscs (atm, drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup based on classid. Second, there's a bug with da mode, where res->classid is only assigned a 16 bit minor, but it needs to expand to the full 32 bit major/minor combination instead, therefore we need to expand with the bound major. This is fine as classes belonging to a classful qdisc must share the same major. Fixes: 045efa82ff56 ("cls_bpf: introduce integrated actions") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18openvswitch: allow output of MPLS packets on tunnel vportsSimon Horman
Currently output of MPLS packets on tunnel vports is not allowed by Open vSwitch. This is because historically encapsulation was done in such a way that the inner_protocol field of the skb needed to hold the inner protocol for both MPLS and tunnel encapsulation in order for GSO segmentation to be performed correctly. Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Open vSwitch makes use of lwt to output to tunnel netdevs which perform encapsulation. As no drivers expose support for MPLS offloads this means that GSO packets are segmented in software by validate_xmit_skb(), which is called from __dev_queue_xmit(), before tunnel encapsulation occurs. This means that the inner protocol of MPLS is no longer needed by the time encapsulation occurs and the contention on the inner_protocol field of the skb no longer occurs. Thus it is now safe to output MPLS to tunnel vports. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18ovs: internal_set_rx_headroom() can be staticWu Fengguang
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18net: dst_cache_per_cpu_dst_set() can be staticWu Fengguang
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Initial roundup of 4.6 merge window patches. This is the first of two pull requests. It is the smaller request, but touches for more different things (this is everything but what is in or going into staging). The pull request for the code in staging/rdma is on hold until after we decide what to do on the write/writev API issue and may be partially deferred until 4.7 as a result. Summary: - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits) iwpm: crash fix for large connections test iw_cxgb3: support for iWARP port mapping iw_cxgb4: remove port mapper related code iw_nes: remove port mapper related code iwcm: common code for port mapper net/9p: convert to new CQ API IB/mlx5: Add support for don't trap rules net/mlx5_core: Introduce forward to next priority action net/mlx5_core: Create anchor of last flow table iser: Accept arbitrary sg lists mapping if the device supports it mlx5: Add arbitrary sg list support IB/core: Add arbitrary sg_list support IB/mlx5: Expose correct max_fast_reg_page_list_len IB/mlx5: Make coding style more consistent IB/mlx5: Convert UMR CQ to new CQ API IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Delete unnecessary variable initialisations in 11 functions IB/core: Documentation fix in the MAD header file IB/core: trivial prink cleanup. ...
2016-03-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: drivers/rtc: broken link fix drm/i915 Fix typos in i915_gem_fence.c Docs: fix missing word in REPORTING-BUGS lib+mm: fix few spelling mistakes MAINTAINERS: add git URL for APM driver treewide: Fix typo in printk
2016-03-17tcp/dccp: remove obsolete WARN_ON() in icmp handlersEric Dumazet
Now SYN_RECV request sockets are installed in ehash table, an ICMP handler can find a request socket while another cpu handles an incoming packet transforming this SYN_RECV request socket into an ESTABLISHED socket. We need to remove the now obsolete WARN_ON(req->sk), since req->sk is set when a new child is created and added into listener accept queue. If this race happens, the ICMP will do nothing special. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ben Lazarus <blazarus@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17vlan: propagate gso_max_segsEric Dumazet
vlan drivers lack proper propagation of gso_max_segs from lower device. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-17mm: introduce page reference manipulation functionsJoonsoo Kim
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17Merge tag 'tty-4.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here's the big tty/serial driver pull request for 4.6-rc1. Lots of changes in here, Peter has been on a tear again, with lots of refactoring and bugs fixes, many thanks to the great work he has been doing. Lots of driver updates and fixes as well, full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (220 commits) serial: 8250: describe CONFIG_SERIAL_8250_RSA serial: samsung: optimize UART rx fifo access routine serial: pl011: add mark/space parity support serial: sa1100: make sa1100_register_uart_fns a function tty: serial: 8250: add MOXA Smartio MUE boards support serial: 8250: convert drivers to use up_to_u8250p() serial: 8250/mediatek: fix building with SERIAL_8250=m serial: 8250/ingenic: fix building with SERIAL_8250=m serial: 8250/uniphier: fix modular build Revert "drivers/tty/serial: make 8250/8250_ingenic.c explicitly non-modular" Revert "drivers/tty/serial: make 8250/8250_mtk.c explicitly non-modular" serial: mvebu-uart: initial support for Armada-3700 serial port serial: mctrl_gpio: Add missing module license serial: ifx6x60: avoid uninitialized variable use tty/serial: at91: fix bad offset for UART timeout register tty/serial: at91: restore dynamic driver binding serial: 8250: Add hardware dependency to RT288X option TTY, devpts: document pty count limiting tty: goldfish: support platform_device with id -1 drivers: tty: goldfish: Add device tree bindings ...
2016-03-17sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a raceNeilBrown
sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer set. In this case it aborts the queuing of the upcall. However it has already taken a new counted reference on "h" and doesn't "put" it, even though it frees the data structure holding the reference. So let's delay the "cache_get" until we know we need it. Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-17Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...
2016-03-16ethtool: Set cmd field in ETHTOOL_GLINKSETTINGS response to wrong nwordsBen Hutchings
When the ETHTOOL_GLINKSETTINGS implementation finds that userland is using the wrong number of words of link mode bitmaps (or is trying to find out the right numbers) it sets the cmd field to 0 in the response structure. This is inconsistent with the implementation of every other ethtool command, so let's remove that inconsistency before it gets into a stable release. Fixes: 3f1ac7a700d03 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16sctp: consolidate local_bh_disable/enable + spin_lock/unlock to _bh variantNicholas Mc Guire
local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for the unlock/enable case, so replace the calls by the appropriate wrappers. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-16Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust
NFS: NFSoRDMA Client Side Changes These patches include several bugfixes and cleanups for the NFSoRDMA client. This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes from the recent workqueue swicchover. These patches also switch xprtrdma to use the new CQ API Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits) xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
2016-03-15tags: Fix DEFINE_PER_CPU expansionsPeter Zijlstra
$ make tags GEN tags ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1" ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1" ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1" ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1" ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1" ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1" ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1" Which are all the result of the DEFINE_PER_CPU pattern: scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/' scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/' The below cures them. All except the workqueue one are within reasonable distance of the 80 char limit. TJ do you have any preference on how to fix the wq one, or shall we just not care its too long? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS/OVS updates for net-next The following patchset contains Netfilter/IPVS fixes and OVS NAT support, more specifically this batch is composed of: 1) Fix a crash in ipset when performing a parallel flush/dump with set:list type, from Jozsef Kadlecsik. 2) Make sure NFACCT_FILTER_* netlink attributes are in place before accessing them, from Phil Turnbull. 3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP helper, from Arnd Bergmann. 4) Add workaround to IPVS to reschedule existing connections to new destination server by dropping the packet and wait for retransmission of TCP syn packet, from Julian Anastasov. 5) Allow connection rescheduling in IPVS when in CLOSE state, also from Julian. 6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni. 7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef. 8) Check match/targetinfo netlink attribute size in nft_compat, patch from Florian Westphal. 9) Check for integer overflow on 32-bit systems in x_tables, from Florian Westphal. Several patches from Jarno Rajahalme to prepare the introduction of NAT support to OVS based on the Netfilter infrastructure: 10) Schedule IP_CT_NEW_REPLY definition for removal in nf_conntrack_common.h. 11) Simplify checksumming recalculation in nf_nat. 12) Add comments to the openvswitch conntrack code, from Jarno. 13) Update the CT state key only after successful nf_conntrack_in() invocation. 14) Find existing conntrack entry after upcall. 15) Handle NF_REPEAT case due to templates in nf_conntrack_in(). 16) Call the conntrack helper functions once the conntrack has been confirmed. 17) And finally, add the NAT interface to OVS. The batch closes with: 18) Cleanup to use spin_unlock_wait() instead of spin_lock()/spin_unlock(), from Nicholas Mc Guire. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-15netfilter: nf_conntrack: consolidate lock/unlock into unlock_waitNicholas Mc Guire
The spin_lock()/spin_unlock() is synchronizing on the nf_conntrack_locks_all_lock which is equivalent to spin_unlock_wait() but the later should be more efficient. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14net: diag: add a scheduling point in inet_diag_dump_icsk()Eric Dumazet
On loaded TCP servers, looking at millions of sockets can hold cpu for many seconds, if the lookup condition is very narrow. (eg : ss dst 1.2.3.4 ) Better add a cond_resched() to allow other processes to access the cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14openvswitch: Interface with NAT.Jarno Rajahalme
Extend OVS conntrack interface to cover NAT. New nested OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action. A bare OVS_CT_ATTR_NAT only mangles existing and expected connections. If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested attributes, new (non-committed/non-confirmed) connections are mangled according to the rest of the nested attributes. The corresponding OVS userspace patch series includes test cases (in tests/system-traffic.at) that also serve as example uses. This work extends on a branch by Thomas Graf at https://github.com/tgraf/ovs/tree/nat. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14openvswitch: Delay conntrack helper call for new connections.Jarno Rajahalme
There is no need to help connections that are not confirmed, so we can delay helping new connections to the time when they are confirmed. This change is needed for NAT support, and having this as a separate patch will make the following NAT patch a bit easier to review. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14openvswitch: Handle NF_REPEAT in conntrack action.Jarno Rajahalme
Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This avoids dropping a SYN packet re-opening an existing TCP connection. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14openvswitch: Find existing conntrack entry after upcall.Jarno Rajahalme
Add a new function ovs_ct_find_existing() to find an existing conntrack entry for which this packet was already applied to. This is only to be called when there is evidence that the packet was already tracked and committed, but we lost the ct reference due to an userspace upcall. ovs_ct_find_existing() is called from skb_nfct_cached(), which can now hide the fact that the ct reference may have been lost due to an upcall. This allows ovs_ct_commit() to be simplified. This patch is needed by later "openvswitch: Interface with NAT" patch, as we need to be able to pass the packet through NAT using the original ct reference also after the reference is lost after an upcall. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14openvswitch: Update the CT state key only after nf_conntrack_in().Jarno Rajahalme
Only a successful nf_conntrack_in() call can effect a connection state change, so it suffices to update the key only after the nf_conntrack_in() returns. This change is needed for the later NAT patches. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14openvswitch: Add commentary to conntrack.cJarno Rajahalme
This makes the code easier to understand and the following patches more focused. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14netfilter: Allow calling into nat helper without skb_dst.Jarno Rajahalme
NAT checksum recalculation code assumes existence of skb_dst, which becomes a problem for a later patch in the series ("openvswitch: Interface with NAT."). Simplify this by removing the check on skb_dst, as the checksum will be dealt with later in the stack. Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14netfilter: Remove IP_CT_NEW_REPLY definition.Jarno Rajahalme
Remove the definition of IP_CT_NEW_REPLY from the kernel as it does not make sense. This allows the definition of IP_CT_NUMBER to be simplified as well. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-14net: dsa: refine netdev event notifierVivien Didelot
Rework the netdev event handler, similar to what the Mellanox Spectrum driver does, to easily welcome more events later (for example NETDEV_PRECHANGEUPPER) and use netdev helpers (such as netif_is_bridge_master). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14net: dsa: make port_bridge_leave return voidVivien Didelot
netdev_upper_dev_unlink() which notifies NETDEV_CHANGEUPPER, returns void, as well as del_nbp(). So there's no advantage to catch an eventual error from the port_bridge_leave routine at the DSA level. Make this routine void for the DSA layer and its existing drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14net: dsa: rename port_*_bridge routinesVivien Didelot
Rename DSA port_join_bridge and port_leave_bridge routines to respectively port_bridge_join and port_bridge_leave in order to respect an implicit Port::Bridge namespace. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdictFlorian Westphal
Zefir Kurtisi reported kernel panic with an openwrt specific patch. However, it turns out that mainline has a similar bug waiting to happen. Once NF_HOOK() returns the skb is in undefined state and must not be used. Moreover, the okfn must consume the skb to support async processing (NF_QUEUE). Current okfn in this spot doesn't consume it and caller assumes that NF_HOOK return value tells us if skb was freed or not, but thats wrong. It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at LOCAL_IN that returns STOLEN or NF_QUEUE verdicts. Once we add NF_QUEUE support for nftables bridge this will break -- NF_QUEUE holds the skb for async processing, caller will erronoulsy return RX_HANDLER_PASS and on reinject netfilter will access free'd skb. Fix this by pushing skb up the stack in the okfn instead. NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING completely in this case but this is how its been forever so it seems preferable to not change this. Cc: Felix Fietkau <nbd@openwrt.org> Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-03-12 Here's the last bluetooth-next pull request for the 4.6 kernel. - New USB ID for AR3012 in btusb - New BCM2E55 ACPI ID - Buffer overflow fix for the Add Advertising command - Support for a new Bluetooth LE limited privacy mode - Fix for firmware activation in btmrvl_sdio - Cleanups to mac802154 & 6lowpan code Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14phy: fixed: Fix removal of phys.Andrew Lunn
The fixed phys delete function simply removed the fixed phy from the internal linked list and freed the memory. It however did not unregister the associated phy device. This meant it was still possible to find the phy device on the mdio bus. Make fixed_phy_del() an internal function and add a fixed_phy_unregister() to unregisters the phy device and then uses fixed_phy_del() to free resources. Modify DSA to use this new API function, so we don't leak phys. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14dsa: dsa: Fix freeing of fixed-phys from user ports.Andrew Lunn
All ports types can have a fixed PHY associated with it. Remove the check which limits removal to only CPU and DSA ports. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14dsa: Destroy fixed link phys after the phy has been disconnectedAndrew Lunn
The phy is disconnected from the slave in dsa_slave_destroy(). Don't destroy fixed link phys until after this, since there can be fixed linked phys connected to ports. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14dsa: slave: Don't reference NULL pointer during phy_disconnectAndrew Lunn
When the phy is disconnected, the parent pointer to the netdev it was attached to is set to NULL. The code then tries to suspend the phy, but dsa_slave_fixed_link_update needs the parent pointer to determine which switch the phy is connected to. So it dereferenced a NULL pointer. Check for this condition. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14xprtrdma: Use new CQ API for RPC-over-RDMA client send CQsChuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. Send completions were previously handled entirely in the completion upcall handler (ie, deferring to a process context is unneeded). Thus IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma send code path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14xprtrdma: Use an anonymous union in struct rpcrdma_mwChuck Lever
Clean up: Make code more readable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQsChuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. xprtrdma's reply processing is already handled in a work queue, but there is some initial order-dependent processing that is done in the soft IRQ context before a work item is scheduled. IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma receive code path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14xprtrdma: Serialize credit accounting againChuck Lever
Commit fe97b47cd623 ("xprtrdma: Use workqueue to process RPC/RDMA replies") replaced the reply tasklet with a workqueue that allows RPC replies to be processed in parallel. Thus the credit values in RPC-over-RDMA replies can be applied in a different order than in which the server sent them. To fix this, revert commit eba8ff660b2d ("xprtrdma: Move credit update to RPC reply handler"). Reverting is done by hand to accommodate code changes that have occurred since then. Fixes: fe97b47cd623 ("xprtrdma: Use workqueue to process . . .") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>