summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-05-16netlink: Reset portid after netlink_insert failureHerbert Xu
The commit c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate nl_sk_hash_lock") breaks the autobind retry mechanism because it doesn't reset portid after a failed netlink_insert. This means that should autobind fail the first time around, then the socket will be stuck in limbo as it can never be bound again since it already has a non-zero portid. Fixes: c5adde9468b0 ("netlink: eliminate nl_sk_hash_lock") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix a leak in IPVS, the sysctl table is not released accordingly when destroying a netns, patch from Tommi Rantala. 2) Fix a build error when TPROXY and socket are built-in but IPv6 defrag is compiled as module, from Florian Westphal. 3) Fix TCP tracket wrt. RFC5961 challenge ACK when in LAST_ACK state, patch from Jesper Dangaard Brouer. 4) Fix a bogus WARN_ON() in nf_tables when deleting a set element that stores a map, from Mirek Kratochvil. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-15netfilter: nf_tables: fix bogus warning in nft_data_uninit()Mirek Kratochvil
The values 0x00000000-0xfffffeff are reserved for userspace datatype. When, deleting set elements with maps, a bogus warning is triggered. WARNING: CPU: 0 PID: 11133 at net/netfilter/nf_tables_api.c:4481 nft_data_uninit+0x35/0x40 [nf_tables]() This fixes the check accordingly to enum definition in include/linux/netfilter/nf_tables.h Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=1013 Signed-off-by: Mirek Kratochvil <exa.exa@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-15conntrack: RFC5961 challenge ACK confuse conntrack LAST-ACK transitionJesper Dangaard Brouer
In compliance with RFC5961, the network stack send challenge ACK in response to spurious SYN packets, since commit 0c228e833c88 ("tcp: Restore RFC5961-compliant behavior for SYN packets"). This pose a problem for netfilter conntrack in state LAST_ACK, because this challenge ACK is (falsely) seen as ACKing last FIN, causing a false state transition (into TIME_WAIT). The challenge ACK is hard to distinguish from real last ACK. Thus, solution introduce a flag that tracks the potential for seeing a challenge ACK, in case a SYN packet is let through and current state is LAST_ACK. When conntrack transition LAST_ACK to TIME_WAIT happens, this flag is used for determining if we are expecting a challenge ACK. Scapy based reproducer script avail here: https://github.com/netoptimizer/network-testing/blob/master/scapy/tcp_hacks_3WHS_LAST_ACK.py Fixes: 0c228e833c88 ("tcp: Restore RFC5961-compliant behavior for SYN packets") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-15netfilter: avoid build error if TPROXY/SOCKET=y && NF_DEFRAG_IPV6=mFlorian Westphal
With TPROXY=y but DEFRAG_IPV6=m we get build failure: net/built-in.o: In function `tproxy_tg_init': net/netfilter/xt_TPROXY.c:588: undefined reference to `nf_defrag_ipv6_enable' If DEFRAG_IPV6 is modular, TPROXY must be too. (or both must be builtin). This enforces =m for both. Reported-and-tested-by: Liu Hua <liusdu@126.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-14rename RTNH_F_EXTERNAL to RTNH_F_OFFLOADRoopa Prabhu
RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output. This patch renames the flag to be consistent with what the user sees. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14ipv6: Fix udp checksums with raw socketsVlad Yasevich
It was reported that trancerout6 would cause a kernel to crash when trying to compute checksums on raw UDP packets. The cause was the check in __ip6_append_data that would attempt to use partial checksums on the packet. However, raw sockets do not initialize partial checksum fields so partial checksums can't be used. Solve this the same way IPv4 does it. raw sockets pass transhdrlen value of 0 to ip_append_data which causes the checksum to be computed in software. Use the same check in ip6_append_data (check transhdrlen). Reported-by: Wolfgang Walter <linux@stwm.de> CC: Wolfgang Walter <linux@stwm.de> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14netlink: move nl_table in read_mostly sectionEric Dumazet
netlink sockets creation and deletion heavily modify nl_table_users and nl_table_lock. If nl_table is sharing one cache line with one of them, netlink performance is really bad on SMP. ffffffff81ff5f00 B nl_table ffffffff81ff5f0c b nl_table_users Putting nl_table in read_mostly section increased performance of my open/delete netlink sockets test by about 80 % This came up while diagnosing a getaddrinfo() problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14vlan: Correctly propagate promisc|allmulti flags in notifier.Vlad Yasevich
Currently vlan notifier handler will try to update all vlans for a device when that device comes up. A problem occurs, however, when the vlan device was set to promiscuous, but not by the user (ex: a bridge). In that case, dev->gflags are not updated. What results is that the lower device ends up with an extra promiscuity count. Here are the backtraces that prove this: [62852.052179] [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0 [62852.052186] [<ffffffff8160bcbb>] ? _raw_spin_unlock_bh+0x1b/0x40 [62852.052188] [<ffffffff814fe4be>] ? dev_set_rx_mode+0x2e/0x40 [62852.052190] [<ffffffff814fe694>] dev_set_promiscuity+0x24/0x50 [62852.052194] [<ffffffffa0324795>] vlan_dev_open+0xd5/0x1f0 [8021q] [62852.052196] [<ffffffff814fe58f>] __dev_open+0xbf/0x140 [62852.052198] [<ffffffff814fe88d>] __dev_change_flags+0x9d/0x170 [62852.052200] [<ffffffff814fe989>] dev_change_flags+0x29/0x60 The above comes from the setting the vlan device to IFF_UP state. [62852.053569] [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0 [62852.053571] [<ffffffffa032459b>] ? vlan_dev_set_rx_mode+0x2b/0x30 [8021q] [62852.053573] [<ffffffff814fe8d5>] __dev_change_flags+0xe5/0x170 [62852.053645] [<ffffffff814fe989>] dev_change_flags+0x29/0x60 [62852.053647] [<ffffffffa032334a>] vlan_device_event+0x18a/0x690 [8021q] [62852.053649] [<ffffffff8161036c>] notifier_call_chain+0x4c/0x70 [62852.053651] [<ffffffff8109d456>] raw_notifier_call_chain+0x16/0x20 [62852.053653] [<ffffffff814f744d>] call_netdevice_notifiers+0x2d/0x60 [62852.053654] [<ffffffff814fe1a3>] __dev_notify_flags+0x33/0xa0 [62852.053656] [<ffffffff814fe9b2>] dev_change_flags+0x52/0x60 [62852.053657] [<ffffffff8150cd57>] do_setlink+0x397/0xa40 And this one comes from the notification code. What we end up with is a vlan with promiscuity count of 1 and and a physical device with a promiscuity count of 2. They should both have a count 1. To resolve this issue, vlan code can use dev_get_flags() api which correctly masks promiscuity and allmulti flags. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle max TX power properly wrt VIFs and the MAC in iwlwifi, from Avri Altman. 2) Use the correct FW API for scan completions in iwlwifi, from Avraham Stern. 3) FW monitor in iwlwifi accidently uses unmapped memory, fix from Liad Kaufman. 4) rhashtable conversion of mac80211 station table was buggy, the virtual interface was not taken into account. Fix from Johannes Berg. 5) Fix deadlock in rtlwifi by not using a zero timeout for usb_control_msg(), from Larry Finger. 6) Update reordering state before calculating loss detection, from Yuchung Cheng. 7) Fix off by one in bluetooth firmward parsing, from Dan Carpenter. 8) Fix extended frame handling in xiling_can driver, from Jeppe Ledet-Pedersen. 9) Fix CODEL packet scheduler behavior in the presence of TSO packets, from Eric Dumazet. 10) Fix NAPI budget testing in fm10k driver, from Alexander Duyck. 11) macvlan needs to propagate promisc settings down the the lower device, from Vlad Yasevich. 12) igb driver can oops when changing number of rings, from Toshiaki Makita. 13) Source specific default routes not handled properly in ipv6, from Markus Stenberg. 14) Use after free in tc_ctl_tfilter(), from WANG Cong. 15) Use softirq spinlocking in netxen driver, from Tony Camuso. 16) Two ARM bpf JIT fixes from Nicolas Schichan. 17) Handle MSG_DONTWAIT properly in ring based AF_PACKET sends, from Mathias Kretschmer. 18) Fix x86 bpf JIT implementation of FROM_{BE16,LE16,LE32}, from Alexei Starovoitov. 19) ll_temac driver DMA maps TX packet header with incorrect length, fix from Michal Simek. 20) We removed pm_qos bits from netdevice.h, but some indirect references remained. Kill them. From David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) net: Remove remaining remnants of pm_qos from netdevice.h e1000e: Add pm_qos header net: phy: micrel: Fix regression in kszphy_probe net: ll_temac: Fix DMA map size bug x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions netns: return RTM_NEWNSID instead of RTM_GETNSID on a get Update be2net maintainers' email addresses net_sched: gred: use correct backlog value in WRED mode pppoe: drop pppoe device in pppoe_unbind_sock_work net: qca_spi: Fix possible race during probe net: mdio-gpio: Allow for unspecified bus id af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT). bnx2x: limit fw delay in kdump to 5s after boot ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits. ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction. mpls: Change reserved label names to be consistent with netbsd usbnet: avoid integer overflow in start_xmit netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2) net: xgene_enet: Set hardware dependency net: amd-xgbe: Add hardware dependency ...
2015-05-12netns: return RTM_NEWNSID instead of RTM_GETNSID on a getNicolas Dichtel
Usually, RTM_NEWxxx is returned on a get (same as a dump). Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11net_sched: gred: use correct backlog value in WRED modeDavid Ward
In WRED mode, the backlog for a single virtual queue (VQ) should not be used to determine queue behavior; instead the backlog is summed across all VQs. This sum is currently used when calculating the average queue lengths. It also needs to be used when determining if the queue's hard limit has been reached, or when reporting each VQ's backlog via netlink. q->backlog will only be used if the queue switches out of WRED mode. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-10af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT).Kretschmer, Mathias
This patch fixes an issue where the send(MSG_DONTWAIT) call on a TX_RING is not fully non-blocking in cases where the device's sndBuf is full. We pass nonblock=true to sock_alloc_send_skb() and return any possibly occuring error code (most likely EGAIN) to the caller. As the fast-path stays as it is, we keep the unlikely() around skb == NULL. Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09mpls: Change reserved label names to be consistent with netbsdTom Herbert
Since these are now visible to userspace it is nice to be consistent with BSD (sys/netmpls/mpls.h in netBSD). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net_sched: fix a use-after-free in tc_ctl_tfilter()WANG Cong
When tcf_destroy() returns true, tp could be already destroyed, we should not use tp->next after that. For long term, we probably should move tp list to list_head. Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket.Sowmini Varadhan
When the peer of an RDS-TCP connection restarts, a reconnect attempt should only be made from the active side of the TCP connection, i.e. the side that has a transient TCP port number. Do not add the passive side of the TCP connection to the c_hash_node and thus avoid triggering rds_queue_reconnect() for passive rds connections. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection.Sowmini Varadhan
When running RDS over TCP, the active (client) side connects to the listening ("passive") side at the RDS_TCP_PORT. After the connection is established, if the client side reboots (potentially without even sending a FIN) the server still has a TCP socket in the esablished state. If the server-side now gets a new SYN comes from the client with a different client port, TCP will create a new socket-pair, but the RDS layer will incorrectly pull up the old rds_connection (which is still associated with the stale t_sock and RDS socket state). This patch corrects this behavior by having rds_tcp_accept_one() always create a new connection for an incoming TCP SYN. The rds and tcp state associated with the old socket-pair is cleaned up via the rds_tcp_state_change() callback which would typically be invoked in most cases when the client-TCP sends a FIN on TCP restart, triggering a transition to CLOSE_WAIT state. In the rarer event of client death without a FIN, TCP_KEEPALIVE probes on the socket will detect the stale socket, and the TCP transition to CLOSE state will trigger the RDS state cleanup. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09ipv6: Fixed source specific default route handling.Markus Stenberg
If there are only IPv6 source specific default routes present, the host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail calls ip6_route_output first, and given source address any, it fails, and ip6_route_get_saddr is never called. The change is to use the ip6_route_get_saddr, even if the initial ip6_route_output fails, and then doing ip6_route_output _again_ after we have appropriate source address available. Note that this is '99% fix' to the problem; a correct fix would be to do route lookups only within addrconf.c when picking a source address, and never call ip6_route_output before source address has been populated. Signed-off-by: Markus Stenberg <markus.stenberg@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== Here are a couple of important Bluetooth & mac802154 fixes for 4.1: - mac802154 fix for crypto algorithm allocation failure checking - mac802154 wpan phy leak fix for error code path - Fix for not calling Bluetooth shutdown() if interface is not up Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-08ipvs: fix memory leak in ip_vs_ctl.cTommi Rantala
Fix memory leak introduced in commit a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct."): unreferenced object 0xffff88005785b800 (size 2048): comm "(-localed)", pid 1434, jiffies 4294755650 (age 1421.089s) hex dump (first 32 bytes): bb 89 0b 83 ff ff ff ff b0 78 f0 4e 00 88 ff ff .........x.N.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8262ea8e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811fba74>] __kmalloc_track_caller+0x244/0x430 [<ffffffff811b88a0>] kmemdup+0x20/0x50 [<ffffffff823276b7>] ip_vs_control_net_init+0x1f7/0x510 [<ffffffff8231d630>] __ip_vs_init+0x100/0x250 [<ffffffff822363a1>] ops_init+0x41/0x190 [<ffffffff82236583>] setup_net+0x93/0x150 [<ffffffff82236cc2>] copy_net_ns+0x82/0x140 [<ffffffff810ab13d>] create_new_namespaces+0xfd/0x190 [<ffffffff810ab49a>] unshare_nsproxy_namespaces+0x5a/0xc0 [<ffffffff810833e3>] SyS_unshare+0x173/0x310 [<ffffffff8265cbd7>] system_call_fastpath+0x12/0x6f [<ffffffffffffffff>] 0xffffffffffffffff Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.") Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-05-05tcp_westwood: fix tcp_westwood_info()Eric Dumazet
I forgot to update tcp_westwood when changing get_info() behavior, this patch should fix this. Fixes: 64f40ff5bbdb ("tcp: prepare CC get_info() access from getsockopt()") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05mpls: Move reserved label definitionsTom Herbert
Move to include/uapi/linux/mpls.h to be externally visibile. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04Merge tag 'mac80211-for-davem-2015-05-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have only a few fixes right now: * a fix for an issue with hash collision handling in the rhashtable conversion * a merge issue - rhashtable removed default shrinking just before mac80211 was converted, so enable it now * remove an invalid WARN that can trigger with legitimate userspace behaviour * add a struct member missing from kernel-doc that caused a lot of warnings ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-05-04 Here's the first bluetooth-next pull request for 4.2: - Various fixes for at86rf230 driver - ieee802154: trace events support for rdev->ops - HCI UART driver refactoring - New Realtek IDs added to btusb driver - Off-by-one fix for rtl8723b in btusb driver - Refactoring of btbcm driver for both UART & USB use Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04net/rds: Fix new sparse warningDavid Ahern
c0adf54a109 introduced new sparse warnings: CHECK /home/dahern/kernels/linux.git/net/rds/ib_cm.c net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types) net/rds/ib_cm.c:191:34: expected unsigned long long [unsigned] [usertype] dp_ack_seq net/rds/ib_cm.c:191:34: got restricted __be64 <noident> net/rds/ib_cm.c:194:51: warning: cast to restricted __be64 The temporary variable for sequence number should have been declared as __be64 rather than u64. Make it so. Signed-off-by: David Ahern <david.ahern@oracle.com> Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04net: core: Correct an over-stringent device loop detection.Vlad Yasevich
The code in __netdev_upper_dev_link() has an over-stringent loop detection logic that actually prevents valid configurations from working correctly. In particular, the logic returns an error if an upper device is already in the list of all upper devices for a given dev. This particular check seems to be a overzealous as it disallows perfectly valid configurations. For example: # ip l a link eth0 name eth0.10 type vlan id 10 # ip l a dev br0 typ bridge # ip l s eth0.10 master br0 # ip l s eth0 master br0 <--- Will fail If you switch the last two commands (add eth0 first), then both will succeed. If after that, you remove eth0 and try to re-add it, it will fail! It appears to be enough to simply check adj_list to keeps things safe. I've tried stacking multiple devices multiple times in all different combinations, and either rx_handler registration prevented the stacking of the device linking cought the error. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failuresScott Mayhew
In an environment where the KDC is running Active Directory, the exported composite name field returned in the context could be large enough to span a page boundary. Attaching a scratch buffer to the decoding xdr_stream helps deal with those cases. The case where we saw this was actually due to behavior that's been fixed in newer gss-proxy versions, but we're fixing it here too. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-05-04Revert "net: kernel socket should be released in init_net namespace"Herbert Xu
This reverts commit c243d7e20996254f89c28d4838b5feca735c030d. That patch is solving a non-existant problem while creating a real problem. Just because a socket is allocated in the init name space doesn't mean that it gets hashed in the init name space. When we unhash it the name space must be the same as the one we had when we hashed it. So this patch is completely bogus and causes socket leaks. Reported-by: Andrey Wagin <avagin@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03net/rds: fix unaligned memory accessshamir rabinovitch
rdma_conn_param private data is copied using memcpy after headers such as cma_hdr (see cma_resolve_ib_udp as example). so the start of the private data is aligned to the end of the structure that come before. if this structure end with u32 the meaning is that the start of the private data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are naturally aligned but in case the structure start is not 8 bytes aligned, all u64 members of this structure will not be aligned. to solve this issue we must use special macros that allow unaligned access to those unaligned members. Addresses the following kernel log seen when attempting to use RDMA: Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma] Acked-by: Chien Yen <chien.yen@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> [Minor tweaks for top of tree by:] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03netlink: Remove max_size settingHerbert Xu
We currently limit the hash table size to 64K which is very bad as even 10 years ago it was relatively easy to generate millions of sockets. Since the hash table is naturally limited by memory allocation failure, we don't really need an explicit limit so this patch removes it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03codel: fix maxpacket/mtu confusionEric Dumazet
Under presence of TSO/GSO/GRO packets, codel at low rates can be quite useless. In following example, not a single packet was ever dropped, while average delay in codel queue is ~100 ms ! qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0) backlog 13626b 3p requeues 0 count 0 lastcount 0 ldelay 96.9ms drop_next 0us maxpacket 9084 ecn_mark 0 drop_overlimit 0 This comes from a confusion of what should be the minimal backlog. It is pretty clear it is not 64KB or whatever max GSO packet ever reached the qdisc. codel intent was to use MTU of the device. After the fix, we finally drop some packets, and rtt/cwnd of my single TCP flow are meeting our expectations. qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0) backlog 6056b 3p requeues 0 count 1 lastcount 1 ldelay 36.3ms drop_next 0us maxpacket 10598 ecn_mark 0 drop_overlimit 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kathleen Nichols <nichols@pollere.com> Cc: Dave Taht <dave.taht@gmail.com> Cc: Van Jacobson <vanj@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-01ipv4: Missing sk_nulls_node_init() in ping_unhash().David S. Miller
If we don't do that, then the poison value is left in the ->pprev backlink. This can cause crashes if we do a disconnect, followed by a connect(). Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Wen Xu <hotdog3645@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30ieee802154: trace: fix endian convertionAlexander Aring
This patch fix endian convertions for extended address and short address handling when TP_printk is called. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Guido Günther <agx@sigxcpu.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30cfg802154: pass name_assign_type to rdev_add_virtual_intf()Varka Bhadram
This code is based on commit 6bab2e19c5ffd ("cfg80211: pass name_assign_type to rdev_add_virtual_intf()") This will expose in sysfs whether the ifname of a IEEE-802.15.4 device is set by userspace or generated by the kernel. We are using two types of name_assign_types o NET_NAME_ENUM: Default interface name provided by kernel o NET_NAME_USER: Interface name provided by user. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30ieee802154: Add trace events for rdev->opsGuido Günther
Enabling tracing via echo 1 > /sys/kernel/debug/tracing/events/cfg802154/enable enables event tracing like iwpan dev wpan0 set pan_id 0xbeef cat /sys/kernel/debug/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 2/2 #P:1 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | iwpan-2663 [000] .... 170.369142: 802154_rdev_set_pan_id: phy0, wpan_dev(1), pan id: 0xbeef iwpan-2663 [000] .... 170.369177: 802154_rdev_return_int: phy0, returned: 0 Signed-off-by: Guido Günther <agx@sigxcpu.org> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30mac802154: llsec: fix return value check in llsec_key_alloc()Wei Yongjun
In case of error, the functions crypto_alloc_aead() and crypto_alloc_blkcipher() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30mac802154: fix ieee802154_register_hw error handlingAlexander Aring
Currently if ieee802154_if_add failed, we don't unregister the wpan phy which was registered before. This patch adds a correct error handling for unregister the wpan phy when ieee802154_if_add failed. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-04-30Bluetooth: Skip the shutdown routine if the interface is not upGabriele Mazzotta
Most likely, the shutdown routine requires the interface to be up. This is the case for BTUSB_INTEL: the routine tries to send a command to the interface, but since this one is down, it fails and exits once HCI_INIT_TIMEOUT has expired. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 4.0.x
2015-04-29tcp: update reordering first before detecting lossYuchung Cheng
tcp_mark_lost_retrans is not used when FACK is disabled. Since tcp_update_reordering may disable FACK, it should be called first before tcp_mark_lost_retrans. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29tcp: add TCP_CC_INFO socket optionEric Dumazet
Some Congestion Control modules can provide per flow information, but current way to get this information is to use netlink. Like TCP_INFO, let's add TCP_CC_INFO so that applications can issue a getsockopt() if they have a socket file descriptor, instead of playing complex netlink games. Sample usage would be : union tcp_cc_info info; socklen_t len = sizeof(info); if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29tcp: prepare CC get_info() access from getsockopt()Eric Dumazet
We would like that optional info provided by Congestion Control modules using netlink can also be read using getsockopt() This patch changes get_info() to put this information in a buffer, instead of skb, like tcp_get_info(), so that following patch can reuse this common infrastructure. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29tcp: add tcpi_bytes_received to tcp_infoEric Dumazet
This patch tracks total number of payload bytes received on a TCP socket. This is the sum of all changes done to tp->rcv_nxt RFC4898 named this : tcpEStatsAppHCThruOctetsReceived This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_received was placed near tp->rcv_nxt for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29tcp: add tcpi_bytes_acked to tcp_infoEric Dumazet
This patch tracks total number of bytes acked for a TCP socket. This is the sum of all changes done to tp->snd_una, and allows for precise tracking of delivered data. RFC4898 named this : tcpEStatsAppHCThruOctetsAcked This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_acked was placed near tp->snd_una for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29net: dsa: Fix scope of eeprom-length propertyGuenter Roeck
eeprom-length is a switch property, not a dsa property, and thus needs to be attached to the switch node, not to the dsa node. Reported-by: Andrew Lunn <andrew@lunn.ch> Fixes: 6793abb4e849 ("net: dsa: Add support for switch EEPROM access") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29tipc: fix problem with parallel link synchronization mechanismJon Paul Maloy
Currently, we try to accumulate arrived packets in the links's 'deferred' queue during the parallel link syncronization phase. This entails two problems: - With an unlucky combination of arriving packets the algorithm may go into a lockstep with the out-of-sequence handling function, where the synch mechanism is adding a packet to the deferred queue, while the out-of-sequence handling is retrieving it again, thus ending up in a loop inside the node_lock scope. - Even if this is avoided, the link will very often send out unnecessary protocol messages, in the worst case leading to redundant retransmissions. We fix this by just dropping arriving packets on the upcoming link during the synchronization phase, thus relying on the retransmission protocol to resolve the situation once the two links have arrived to a synchronized state. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29tipc: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api") Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api") Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api") CC: Richard Alpe <richard.alpe@ericsson.com> CC: Jon Maloy <jon.maloy@ericsson.com> CC: Ying Xue <ying.xue@windriver.com> CC: tipc-discussion@lists.sourceforge.net Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29bridge/nl: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: e5a55a898720 ("net: create generic bridge ops") Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf") CC: John Fastabend <john.r.fastabend@intel.com> CC: Sathya Perla <sathya.perla@emulex.com> CC: Subbu Seetharaman <subbu.seetharaman@emulex.com> CC: Ajit Khaparde <ajit.khaparde@emulex.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: intel-wired-lan@lists.osuosl.org CC: Jiri Pirko <jiri@resnulli.us> CC: Scott Feldman <sfeldma@gmail.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29bridge/mdb: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink") CC: Cong Wang <amwang@redhat.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29net: sched: act_connmark: don't zap skb->nfctFlorian Westphal
This action is meant to be passive, i.e. we should not alter skb->nfct: If nfct is present just leave it alone. Compile tested only. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29route: Use ipv4_mtu instead of raw rt_pmtuHerbert Xu
The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't increase PMTU with Datagram Too Big message") broke PMTU in cases where the rt_pmtu value has expired but is smaller than the new PMTU value. This obsolete rt_pmtu then prevents the new PMTU value from being installed. Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message") Reported-by: Gerd v. Egidy <gerd.von.egidy@intra2net.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>