summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2016-08-10rps: Inspect PPTP encapsulated by GRE to get flow hashGao Feng
The PPTP is encapsulated by GRE header with that GRE_VERSION bits must contain one. But current GRE RPS needs the GRE_VERSION must be zero. So RPS does not work for PPTP traffic. In my test environment, there are four MIPS cores, and all traffic are passed through by PPTP. As a result, only one core is 100% busy while other three cores are very idle. After this patch, the usage of four cores are balanced well. Signed-off-by: Gao Feng <fgao@ikuai8.com> Reviewed-by: Philip Prindeville <philipp@redfish-solutions.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10net: sched: convert qdisc linked list to hashtableJiri Kosina
Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09net: Remove fib_local variableDavid Ahern
After commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") fib_local is set but not used. Remove it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06Merge tag 'mac80211-for-davem-2016-08-05' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== First set of fixes for the current cycle: * fix 80+80 bandwidth warning * fix powersave with mac80211 TXQ implementation * use correct way to free SKBs from multicast buffering * mesh: fix operation ordering to work with all drivers * mesh: end service period even when peer goes away * mesh: correct HT opmode validity checks * pass hw pointer from mac80211 to driver in TPT method, fixing a bug (in a bit the wrong way, but that's what we have right now) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06rxrpc: Fix races between skb free, ACK generation and replyingDavid Howells
Inside the kafs filesystem it is possible to occasionally have a call processed and terminated before we've had a chance to check whether we need to clean up the rx queue for that call because afs_send_simple_reply() ends the call when it is done, but this is done in a workqueue item that might happen to run to completion before afs_deliver_to_call() completes. Further, it is possible for rxrpc_kernel_send_data() to be called to send a reply before the last request-phase data skb is released. The rxrpc skb destructor is where the ACK processing is done and the call state is advanced upon release of the last skb. ACK generation is also deferred to a work item because it's possible that the skb destructor is not called in a context where kernel_sendmsg() can be invoked. To this end, the following changes are made: (1) kernel_rxrpc_data_consumed() is added. This should be called whenever an skb is emptied so as to crank the ACK and call states. This does not release the skb, however. kernel_rxrpc_free_skb() must now be called to achieve that. These together replace rxrpc_kernel_data_delivered(). (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed(). This makes afs_deliver_to_call() easier to work as the skb can simply be discarded unconditionally here without trying to work out what the return value of the ->deliver() function means. The ->deliver() functions can, via afs_data_complete(), afs_transfer_reply() and afs_extract_data() mark that an skb has been consumed (thereby cranking the state) without the need to conditionally free the skb to make sure the state is correct on an incoming call for when the call processor tries to send the reply. (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it has finished with a packet and MSG_PEEK isn't set. (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data(). Because of this, we no longer need to clear the destructor and put the call before we free the skb in cases where we don't want the ACK/call state to be cranked. (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather than 0 if they expect more data (afs_extract_data() returns -EAGAIN to the delivery function already), and the caller is now responsible for producing an abort if that was the last packet. (6) There are many bits of unmarshalling code where: ret = afs_extract_data(call, skb, last, ...); switch (ret) { case 0: break; case -EAGAIN: return 0; default: return ret; } is to be found. As -EAGAIN can now be passed back to the caller, we now just return if ret < 0: ret = afs_extract_data(call, skb, last, ...); if (ret < 0) return ret; (7) Checks for trailing data and empty final data packets has been consolidated as afs_data_complete(). So: if (skb->len > 0) return -EBADMSG; if (!last) return 0; becomes: ret = afs_data_complete(call, skb, last); if (ret < 0) return ret; (8) afs_transfer_reply() now checks the amount of data it has against the amount of data desired and the amount of data in the skb and returns an error to induce an abort if we don't get exactly what we want. Without these changes, the following oops can occasionally be observed, particularly if some printks are inserted into the delivery path: general protection fault: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: kafsd afs_async_workfn [kafs] task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000 RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1 RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002 RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710 RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0 Stack: 0000000000000006 000000000be04930 0000000000000000 ffff880400000000 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38 Call Trace: [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff814c928f>] skb_dequeue+0x18/0x61 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs] [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs] [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs] [<ffffffff81063a3a>] process_one_work+0x29d/0x57c [<ffffffff81064ac2>] worker_thread+0x24a/0x385 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0 [<ffffffff810696f5>] kthread+0xf3/0xfb [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-05mac80211: Add ieee80211_hw pointer to get_expected_throughputMaxim Altshul
The variable is added to allow the driver an easy access to it's own hw->priv when the op is invoked. This fixes a crash in wlcore because it was relying on a station pointer that wasn't initialized yet. It's the wrong way to fix the crash, but it solves the problem for now and it does make sense to have the hw pointer here. Signed-off-by: Maxim Altshul <maxim.altshul@ti.com> [rewrite commit message, fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix several cases of missing of_node_put() calls in various networking drivers. From Peter Chen. 2) Don't try to remove unconfigured VLANs in qed driver, from Yuval Mintz. 3) Unbalanced locking in TIPC error handling, from Wei Yongjun. 4) Fix lockups in CPDMA driver, from Grygorii Strashko. 5) More MACSEC refcount et al fixes, from Sabrina Dubroca. 6) Fix MAC address setting in r8169 during runtime suspend, from Chun-Hao Lin. 7) Various printf format specifier fixes, from Heinrich Schuchardt. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) qed: Fail driver load in 100g MSI mode. ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle ethernet: stmicro: stmmac: add missing of_node_put after calling of_parse_phandle ethernet: stmicro: stmmac: dwmac-socfpga: add missing of_node_put after calling of_parse_phandle ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle ethernet: marvell: pxa168_eth: add missing of_node_put after calling of_parse_phandle ethernet: marvell: mvpp2: add missing of_node_put after calling of_parse_phandle ethernet: marvell: mvneta: add missing of_node_put after calling of_parse_phandle ethernet: hisilicon: hns: hns_dsaf_main: add missing of_node_put after calling of_parse_phandle ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle ethernet: aurora: nb8800: add missing of_node_put after calling of_parse_phandle ethernet: arc: emac_main: add missing of_node_put after calling of_parse_phandle ethernet: apm: xgene: add missing of_node_put after calling of_parse_phandle ethernet: altera: add missing of_node_put 8139too: fix system hang when there is a tx timeout event. qed: Fix error return code in qed_resc_alloc() net: qlcnic: avoid superfluous assignement dsa: b53: remove redundant if ...
2016-08-02treewide: replace obsolete _refok by __refFabian Frederick
There was only one use of __initdata_refok and __exit_refok __init_refok was used 46 times against 82 for __ref. Those definitions are obsolete since commit 312b1485fb50 ("Introduce new section reference annotations tags: __ref, __refdata, __refconst") This patch removes the following compatibility definitions and replaces them treewide. /* compatibility defines */ #define __init_refok __ref #define __initdata_refok __refdata #define __exit_refok __ref I can also provide separate patches if necessary. (One patch per tree and check in 1 month or 2 to remove old definitions) [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-01sctp: change to use TCP_CLOSE_WAIT as SCTP_SS_CLOSINGXin Long
Prior to this patch, sctp defined TCP_CLOSING as SCTP_SS_CLOSING. TCP_CLOSING is such a special sk state in TCP that inet common codes even exclude it. For instance, inet_accept thinks the accept sk's state never be TCP_CLOSING, or it will give a WARN_ON. TCP works well with that while SCTP may trigger the call trace, as CLOSING state in SCTP has different meaning from TCP. This fix is to change to use TCP_CLOSE_WAIT as SCTP_SS_CLOSING, instead of TCP_CLOSING. Some side-effects could be expected, regardless of not being used before. inet_accept will accept it now. I did all the func_tests in lksctp-tools and ran sctp codnomicon fuzzer tests against this patch, no regression or failure found. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-29Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...
2016-07-25net_sched: get rid of struct tcf_commonWANG Cong
After the previous patch, struct tc_action should be enough to represent the generic tc action, tcf_common is not necessary any more. This patch gets rid of it to make tc action code more readable. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25net_sched: move tc_action into tcf_commonWANG Cong
struct tc_action is confusing, currently we use it for two purposes: 1) Pass in arguments and carry out results from helper functions 2) A generic representation for tc actions The first one is error-prone, since we need to make sure we don't miss anything. This patch aims to get rid of this use, by moving tc_action into tcf_common, so that they are allocated together in hashtable and can be cast'ed easily. And together with the following patch, we could really make tc_action a generic representation for all tc actions and each type of action can inherit from it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25mlxsw: spectrum: Fix compilation error when CLS_ACT isn't setIdo Schimmel
When CONFIG_NET_CLS_ACT isn't set 'struct tcf_exts' has no member named 'actions' and we therefore must not access it. Otherwise compilation fails. Fix this by introducing a new macro similar to tc_no_actions(), which always returns 'false' if CONFIG_NET_CLS_ACT isn't set. Fixes: 763b4b70afcd ("mlxsw: spectrum: Add support in matchall mirror TC offloading") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25gtp: #define #define _GTP_H_ and not #define _GTP_HColin Ian King
Fix clang build warning: ./include/net/gtp.h:1:9: warning: '_GTP_H_' is used as a header guard here, followed by #define of a different macro [-Wheader-guard] fix by defining _GTP_H_ and not _GTP_H Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24net/sched: act_mirred: Add helper inlines to access tcf_mirred info.Yotam Gigi
The helper function is_tcf_mirred_mirror helps finding whether an action struct is of type mirred and is configured to be of type mirror. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24net/sched: Add match-all classifier hw offloading.Yotam Gigi
Following the work that have been done on offloading classifiers like u32 and flower, now the match-all classifier hw offloading is possible. if the interface supports tc offloading. To control the offloading, two tc flags have been introduced: skip_sw and skip_hw. Typical usage: tc filter add dev eth25 parent ffff: \ matchall skip_sw \ action mirred egress mirror \ dev eth27 Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, they are: 1) Count pre-established connections as active in "least connection" schedulers such that pre-established connections to avoid overloading backend servers on peak demands, from Michal Kubecek via Simon Horman. 2) Address a race condition when resizing the conntrack table by caching the bucket size when fulling iterating over the hashtable in these three possible scenarios: 1) dump via /proc/net/nf_conntrack, 2) unlinking userspace helper and 3) unlinking custom conntrack timeout. From Liping Zhang. 3) Revisit early_drop() path to perform lockless traversal on conntrack eviction under stress, use del_timer() as synchronization point to avoid two CPUs evicting the same entry, from Florian Westphal. 4) Move NAT hlist_head to nf_conn object, this simplifies the existing NAT extension and it doesn't increase size since recent patches to align nf_conn, from Florian. 5) Use rhashtable for the by-source NAT hashtable, also from Florian. 6) Don't allow --physdev-is-out from OUTPUT chain, just like --physdev-out is not either, from Hangbin Liu. 7) Automagically set on nf_conntrack counters if the user tries to match ct bytes/packets from nftables, from Liping Zhang. 8) Remove possible_net_t fields in nf_tables set objects since we just simply pass the net pointer to the backend set type implementations. 9) Fix possible off-by-one in h323, from Toby DiPasquale. 10) early_drop() may be called from ctnetlink patch, so we must hold rcu read size lock from them too, this amends Florian's patch #3 coming in this batch, from Liping Zhang. 11) Use binary search to validate jump offset in x_tables, this addresses the O(n!) validation that was introduced recently resolve security issues with unpriviledge namespaces, from Florian. 12) Fix reference leak to connlabel in error path of nft_ct, from Zhang. 13) Three updates for nft_log: Fix log prefix leak in error path. Bail out on loglevel larger than debug in nft_log and set on the new NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang. 14) Allow to filter rule dumps in nf_tables based on table and chain names. 15) Simplify connlabel to always use 128 bits to store labels and get rid of unused function in xt_connlabel, from Florian. 16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack helper, by Gao Feng. 17) Put back x_tables module reference in nft_compat on error, from Liping Zhang. 18) Add a reference count to the x_tables extensions cache in nft_compat, so we can remove them when unused and avoid a crash if the extensions are rmmod, again from Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22netfilter: connlabels: move set helper to xt_connlabelFlorian Westphal
xt_connlabel is the only user so move it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-22netfilter: conntrack: support a fixed size of 128 distinct labelsFlorian Westphal
The conntrack label extension is currently variable-sized, e.g. if only 2 labels are used by iptables rules then the labels->bits[] array will only contain one element. We track size of each label storage area in the 'words' member. But in nftables and openvswitch we always have to ask for worst-case since we don't know what bit will be used at configuration time. As most arches are 64bit we need to allocate 24 bytes in this case: struct nf_conn_labels { u8 words; /* 0 1 */ /* XXX 7 bytes hole, try to pack */ long unsigned bits[2]; /* 8 24 */ Make bits a fixed size and drop the words member, it simplifies the code and only increases memory requirements on x86 when less than 64bit labels are required. We still only allocate the extension if its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-22gro_cells: gro_cells_receive now return error codePaolo Abeni
so that the caller can update stats accordingly, if needed Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20Merge tag 'nfc-next-4.8-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.8 pull request This is the first NFC pull request for 4.8. We have: - A fairly large NFC digital stack patchset: * RTOX fixes. * Proper DEP RWT support. * ACK and NACK PDUs handling fixes, in both initiator and target modes. * A few memory leak fixes. - A conversion of the nfcsim driver to use the digital stack. The driver supports the DEP protocol in both NFC-A and NFC-F. - Error injection through debugfs for the nfcsim driver. - Improvements to the port100 driver for the Sony USB chipset, in particular to the command abort and cancellation code paths. - A few minor fixes for the pn533, trf7970a and fdp drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-21netfilter: Add helper array register/unregister functionsGao Feng
Add nf_ct_helper_init(), nf_conntrack_helpers_register() and nf_conntrack_helpers_unregister() functions to avoid repetitive opencoded initialization in helpers. This patch keeps an id parameter for nf_ct_helper_init() not to break helper matching by name that has been inconsistently exposed to userspace through ports, eg. ftp-2121, and through an incremental id, eg. tftp-1. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-20Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-07-19 Here's likely the last bluetooth-next pull request for the 4.8 kernel: - Fix for L2CAP setsockopt - Fix for is_suspending flag handling in btmrvl driver - Addition of Bluetooth HW & FW info fields to debugfs - Fix to use int instead of char for callback status. The last one (from Geert Uytterhoeven) is actually not purely a Bluetooth (or 802.15.4) patch, but it was agreed with other maintainers that we take it through the bluetooth-next tree. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/ncsi: Package and channel managementGavin Shan
This manages NCSI packages and channels: * The available packages and channels are enumerated in the first time of calling ncsi_start_dev(). The channels' capabilities are probed in the meanwhile. The NCSI network topology won't change until the NCSI device is destroyed. * There in a queue in every NCSI device. The element in the queue, channel, is waiting for configuration (bringup) or suspending (teardown). The channel's state (inactive/active) indicates the futher action (configuration or suspending) will be applied on the channel. Another channel's state (invisible) means the requested action is being applied. * The hardware arbitration will be enabled if all available packages and channels support it. All available channels try to provide service when hardware arbitration is enabled. Otherwise, one channel is selected as the active one at once. * When channel is in active state, meaning it's providing service, a timer started to retrieve the channe's link status. If the channel's link status fails to be updated in the determined period, the channel is going to be reconfigured. It's the error handling implementation as defined in NCSI spec. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/ncsi: Resource managementGavin Shan
NCSI spec (DSP0222) defines several objects: package, channel, mode, filter, version and statistics etc. This introduces the data structs to represent those objects and implement functions to manage them. Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI stack. * The user (e.g. netdev driver) dereference NCSI device by "struct ncsi_dev", which is embedded to "struct ncsi_dev_priv". The later one is used by NCSI stack internally. * Every NCSI device can have multiple packages simultaneously, up to 8 packages. It's represented by "struct ncsi_package" and identified by 3-bits ID. * Every NCSI package can have multiple channels, up to 32. It's represented by "struct ncsi_channel" and identified by 5-bits ID. * Every NCSI channel has version, statistics, various modes and filters. They are represented by "struct ncsi_channel_version", "struct ncsi_channel_stats", "struct ncsi_channel_mode" and "struct ncsi_channel_filter" separately. * Apart from AEN (Asynchronous Event Notification), the NCSI stack works in terms of command and response. This introduces "struct ncsi_req" to represent a complete NCSI transaction made of NCSI request and response. link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net: dsa: support switchdev ageing time attrVivien Didelot
Add a new function for DSA drivers to handle the switchdev SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute. The ageing time is passed as milliseconds. Also because we can have multiple logical bridges on top of a physical switch and ageing time are switch-wide, call the driver function with the fastest ageing time in use on the chip instead of the requested one. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net: switchdev: change ageing_time type to clock_tVivien Didelot
The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is a clock_t and requires to use helpers such as clock_t_to_jiffies() to convert to milliseconds. Change ageing_time type from u32 to clock_t to make it explicit. Fixes: f55ac58ae64c ("switchdev: add bridge ageing_time attribute") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flagsShmulik Ladkani
This flag indicates whether fragmentation of segments is allowed. Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by either ip_forward or ipmr_forward). Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-18Bluetooth: Add debugfs fields for hardware and firmware infoMarcel Holtmann
Some Bluetooth controllers allow for reading hardware and firmware related vendor specific infos. If they are available, then they can be exposed via debugfs now. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-18Bluetooth: Move hci_recv_frame and hci_recv_diag prototypesMarcel Holtmann
The protoypes for hci_recv_frame and hci_recv_diag are in the wrong location in the header file. Move them close to all the other hci_dev related exported functions. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-14net/switchdev: Export the same parent ID service functionOr Gerlitz
This helper serves to know if two switchdev port netdevices belong to the same HW ASIC, e.g to figure out if forwarding offload is possible between them. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: avoid identifying address family many times for a chunkMarcelo Ricardo Leitner
Identifying address family operations during rx path is not something expensive but it's ugly to the eye to have it done multiple times, specially when we already validated it during initial rx processing. This patch takes advantage of the now shared sctp_input_cb and make the pointer to the operations readily available. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: allow GSO frags to access the chunk tooMarcelo Ricardo Leitner
SCTP will try to access original IP headers on sctp_recvmsg in order to copy the addresses used. There are also other places that do similar access to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO support") they aren't always there because they are only present in the header skb. SCTP handles the queueing of incoming data by cloning the incoming skb and limiting to only the relevant payload. This clone has its cb updated to something different and it's then queued on socket rx queue. Thus we need to fix this in two moments. For rx path, not related to socket queue yet, this patch uses a partially copied sctp_input_cb to such GSO frags. This restores the ability to access the headers for this part of the code. Regarding the socket rx queue, it removes iif member from sctp_event and also add a chunk pointer on it. With these changes we're always able to reach the headers again. The biggest change here is that now the sctp_chunk struct and the original skb are only freed after the application consumed the buffer. Note however that the original payload was already like this due to the skb cloning. For iif, SCTP's IPv4 code doesn't use it, so no change is necessary. IPv6 now can fetch it directly from original's IPv6 CB as the original skb is still accessible. In the future we probably can simplify sctp_v*_skb_iif() stuff, as sctp_v4_skb_iif() was called but it's return value not used, and now it's not even called, but such cleanup is out of scope for this change. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: reorder sctp_ulpevent and shrink msg_flagsMarcelo Ricardo Leitner
The next patch needs 8 bytes in there. sctp_ulpevent has a hole due to bad alignment; msg_flags is using 4 bytes while it actually uses only 2, so we shrink it, and iif member (4 bytes) which can be easily fetched from another place once the next patch is there, so we remove it and thus creating space for 8 bytes. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13sctp: allow others to use sctp_input_cbMarcelo Ricardo Leitner
We process input path in other files too and having access to it is nice, so move it to a header where it's shared. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-07-13 Here's our main bluetooth-next pull request for the 4.8 kernel: - Fixes and cleanups in 802.15.4 and 6LoWPAN code - Fix out of bounds issue in btmrvl driver - Fixes to Bluetooth socket recvmsg return values - Use crypto_cipher_encrypt_one() instead of crypto_skcipher - Cleanup of Bluetooth connection sysfs interface - New Authentication failure reson code for Disconnected mgmt event - New USB IDs for Atheros, Qualcomm and Intel Bluetooth controllers Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13dccp: limit sk_filter trim to payloadWillem de Bruijn
Dccp verifies packet integrity, including length, at initial rcv in dccp_invalid_packet, later pulls headers in dccp_enqueue_skb. A call to sk_filter in-between can cause __skb_pull to wrap skb->len. skb_copy_datagram_msg interprets this as a negative value, so (correctly) fails with EFAULT. The negative length is reported in ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close. Introduce an sk_receive_skb variant that caps how small a filter program can trim packets, and call this in dccp with the header length. Excessively trimmed packets are now processed normally and queued for reception as 0B payloads. Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13Bluetooth: Add Authentication Failed reason to Disconnected Mgmt eventSzymon Janc
If link is disconnected due to Authentication Failure (PIN or Key Missing status) userspace will be notified about this with proper error code. Many LE profiles define "PIN or Key Missing" status as indication of remote lost bond so this allows userspace to take action on this. @ Device Connected: 88:63:DF:88:0E:83 (1) flags 0x0000 02 01 1a 05 03 0a 18 0d 18 0b 09 48 65 61 72 74 ...........Heart 20 52 61 74 65 Rate > HCI Event: Command Status (0x0f) plen 4 LE Read Remote Used Features (0x08|0x0016) ncmd 1 Status: Success (0x00) > ACL Data RX: Handle 3585 flags 0x02 dlen 11 ATT: Read By Group Type Request (0x10) len 6 Handle range: 0x0001-0xffff Attribute group type: Primary Service (0x2800) > HCI Event: LE Meta Event (0x3e) plen 12 LE Read Remote Used Features (0x04) Status: Success (0x00) Handle: 3585 Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 LE Encryption < HCI Command: LE Start Encryption (0x08|0x0019) plen 28 Handle: 3585 Random number: 0x0000000000000000 Encrypted diversifier: 0x0000 Long term key: 26201cd479a0921b6f949f0b1fa8dc82 > HCI Event: Command Status (0x0f) plen 4 LE Start Encryption (0x08|0x0019) ncmd 1 Status: Success (0x00) > HCI Event: Encryption Change (0x08) plen 4 Status: PIN or Key Missing (0x06) Handle: 3585 Encryption: Disabled (0x00) < HCI Command: Disconnect (0x01|0x0006) plen 3 Handle: 3585 Reason: Authentication Failure (0x05) > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01|0x0006) ncmd 1 Status: Success (0x00) > HCI Event: Disconnect Complete (0x05) plen 4 Status: Success (0x00) Handle: 3585 Reason: Connection Terminated By Local Host (0x16) @ Device Disconnected: 88:63:DF:88:0E:83 (1) reason 4 @ Device Connected: C4:43:8F:A3:4D:83 (0) flags 0x0000 08 09 4e 65 78 75 73 20 35 ..Nexus 5 > HCI Event: Command Status (0x0f) plen 4 Authentication Requested (0x01|0x0011) ncmd 1 Status: Success (0x00) > HCI Event: Link Key Request (0x17) plen 6 Address: C4:43:8F:A3:4D:83 (LG Electronics) < HCI Command: Link Key Request Reply (0x01|0x000b) plen 22 Address: C4:43:8F:A3:4D:83 (LG Electronics) Link key: 080812e4aa97a863d11826f71f65a933 > HCI Event: Command Complete (0x0e) plen 10 Link Key Request Reply (0x01|0x000b) ncmd 1 Status: Success (0x00) Address: C4:43:8F:A3:4D:83 (LG Electronics) > HCI Event: Auth Complete (0x06) plen 3 Status: PIN or Key Missing (0x06) Handle: 75 @ Authentication Failed: C4:43:8F:A3:4D:83 (0) status 0x05 < HCI Command: Disconnect (0x01|0x0006) plen 3 Handle: 75 Reason: Remote User Terminated Connection (0x13) > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01|0x0006) ncmd 1 Status: Success (0x00) > HCI Event: Disconnect Complete (0x05) plen 4 Status: Success (0x00) Handle: 75 Reason: Connection Terminated By Local Host (0x16) @ Device Disconnected: C4:43:8F:A3:4D:83 (0) reason 4 Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree. they are: 1) Fix leak in the error path of nft_expr_init(), from Liping Zhang. 2) Tracing from nf_tables cannot be disabled, also from Zhang. 3) Fix an integer overflow on 32bit archs when setting the number of hashtable buckets, from Florian Westphal. 4) Fix configuration of ipvs sync in backup mode with IPv6 address, from Quentin Armitage via Simon Horman. 5) Fix incorrect timeout calculation in nft_ct NFT_CT_EXPIRATION, from Florian Westphal. 6) Skip clash resolution in conntrack insertion races if NAT is in place. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: implement prsctp PRIO policyXin Long
prsctp PRIO policy is a policy to abandon lower priority chunks when asoc doesn't have enough snd buffer, so that the current chunk with higher priority can be queued successfully. Similar to TTL/RTX policy, we will set the priority of the chunk to prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So if PRIO policy is enabled, msg->expire_at won't work. asoc->sent_cnt_removable will record how many chunks can be checked to remove. If priority policy is enabled, when the chunk is queued into the out_queue, we will increase sent_cnt_removable. When the chunk is moved to abandon_queue or dequeue and free, we will decrease sent_cnt_removable. In sctp_sendmsg, we will check if there is enough snd buffer for current msg and if sent_cnt_removable is not 0. Then try to abandon chunks in sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and free chunks from out_queue in right order until the abandon+free size > msg_len - sctp_wfree. For the abandon size, we have to wait until it sends FORWARD TSN, receives the sack and the chunks are really freed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: implement prsctp TTL policyXin Long
prsctp TTL policy is a policy to abandon chunks when they expire at the specific time in local stack. It's similar with expires_at in struct sctp_datamsg. This patch uses sinfo->sinfo_timetolive to set the specific time for TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at. So if prsctp_enable or TTL policy is not enabled, msg->expires_at still works as before. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_PR_ASSOC_STATUS on sctp sockoptXin Long
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used to dump the prsctp statistics info from the asoc. The prsctp statistics includes abandoned_sent/unsent from the asoc. abandoned_sent is the count of the packets we drop packets from retransmit/transmited queue, and abandoned_unsent is the count of the packets we drop from out_queue according to the policy. Note: another option for prsctp statistics dump described in rfc is SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics info from each stream. But by now, linux doesn't yet have per stream statistics info, it needs rfc6525 to be implemented. As the prsctp statistics for each stream has to be based on per stream statistics, we will delay it until rfc6525 is done in linux. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11sctp: add SCTP_PR_SUPPORTED on sctp sockoptXin Long
According to section 4.5 of rfc7496, prsctp_enable should be per asoc. We will add prsctp_enable to both asoc and ep, and replace the places where it used net.sctp->prsctp_enable with asoc->prsctp_enable. ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can also modify it's value through sockopt SCTP_PR_SUPPORTED. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11netfilter: nf_tables: get rid of possible_net_t from set and basechainPablo Neira Ayuso
We can pass the netns pointer as parameter to the functions that need to gain access to it. From basechains, I didn't find any client for this field anymore so let's remove this too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: constify arg to is_dying/confirmedFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: nat: convert nat bysrc hash to rhashtableFlorian Westphal
It did use a fixed-size bucket list plus single lock to protect add/del. Unlike the main conntrack table we only need to add and remove keys. Convert it to rhashtable to get table autosizing and per-bucket locking. The maximum number of entries is -- as before -- tied to the number of conntracks so we do not need another upperlimit. The change does not handle rhashtable_remove_fast error, only possible "error" is -ENOENT, and that is something that can happen legitimetely, e.g. because nat module was inserted at a later time and no src manip took place yet. Tested with http-client-benchmark + httpterm with DNAT and SNAT rules in place. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: move nat hlist_head to nf_connFlorian Westphal
The nat extension structure is 32bytes in size on x86_64: struct nf_conn_nat { struct hlist_node bysource; /* 0 16 */ struct nf_conn * ct; /* 16 8 */ union nf_conntrack_nat_help help; /* 24 4 */ int masq_index; /* 28 4 */ /* size: 32, cachelines: 1, members: 4 */ /* last cacheline: 32 bytes */ }; The hlist is needed to quickly check for possible tuple collisions when installing a new nat binding. Storing this in the extension area has two drawbacks: 1. We need ct backpointer to get the conntrack struct from the extension. 2. When reallocation of extension area occurs we need to fixup the bysource hash head via hlist_replace_rcu. We can avoid both by placing the hlist_head in nf_conn and place nf_conn in the bysource hash rather than the extenstion. We can also remove the ->move support; no other extension needs it. Moving the entire nat extension into nf_conn would be possible as well but then we have to add yet another callback for deletion from the bysource hash table rather than just using nat extension ->destroy hook for this. nf_conn size doesn't increase due to aligment, followup patch replaces hlist_node with single pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: conntrack: simplify early_dropFlorian Westphal
We don't need to acquire the bucket lock during early drop, we can use lockless traveral just like ____nf_conntrack_find. The timer deletion serves as synchronization point, if another cpu attempts to evict same entry, only one will succeed with timer deletion. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: conntrack: fix race between nf_conntrack proc read and hash resizeLiping Zhang
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack hash table via /sys/module/nf_conntrack/parameters/hashsize, race will happen, because reader can observe a newly allocated hash but the old size (or vice versa). So oops will happen like follows: BUG: unable to handle kernel NULL pointer dereference at 0000000000000017 IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack] Call Trace: [<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack] [<ffffffff81261a1c>] seq_read+0x2cc/0x390 [<ffffffff812a8d62>] proc_reg_read+0x42/0x70 [<ffffffff8123bee7>] __vfs_read+0x37/0x130 [<ffffffff81347980>] ? security_file_permission+0xa0/0xc0 [<ffffffff8123cf75>] vfs_read+0x95/0x140 [<ffffffff8123e475>] SyS_read+0x55/0xc0 [<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4 It is very easy to reproduce this kernel crash. 1. open one shell and input the following cmds: while : ; do echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize done 2. open more shells and input the following cmds: while : ; do cat /proc/net/nf_conntrack done 3. just wait a monent, oops will happen soon. The solution in this patch is based on Florian's Commit 5e3c61f98175 ("netfilter: conntrack: fix lookup race during hash resize"). And add a wrapper function nf_conntrack_get_ht to get hash and hsize suggested by Florian Westphal. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>