summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-06-19net: sctp: check proc_dointvec result in proc_sctp_do_authDaniel Borkmann
When writing to the sysctl field net.sctp.auth_enable, it can well be that the user buffer we handed over to proc_dointvec() via proc_sctp_do_auth() handler contains something other than integers. In that case, we would set an uninitialized 4-byte value from the stack to net->sctp.auth_enable that can be leaked back when reading the sysctl variable, and it can unintentionally turn auth_enable on/off based on the stack content since auth_enable is interpreted as a boolean. Fix it up by making sure proc_dointvec() returned sucessfully. Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint") Reported-by: Florian Westphal <fwestpha@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19tg3: Clear NETIF_F_TSO6 flag before doing software GSOPrashant Sreedharan
Commit d3f6f3a1d818410c17445bce4f4caab52eb102f1 ("tg3: Prevent page allocation failure during TSO workaround") modified driver logic to use tg3_tso_bug() for any TSO fragment that hits hardware bug conditions thus the patch increased the scope of work for tg3_tso_bug() to cover devices that support NETIF_F_TSO6 as well. Prior to the patch, tg3_tso_bug() would only be used on devices supporting NETIF_F_TSO. A regression was introduced for IPv6 packets requiring the workaround. To properly perform GSO on SKBs with TCPV6 gso_type, we need to call skb_gso_segment() with NETIF_F_TSO6 feature flag cleared, or the function will return NULL and cause a kernel oops as tg3 is not handling a NULL return value. This patch fixes the problem. Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skbNeal Cardwell
If there is an MSS change (or misbehaving receiver) that causes a SACK to arrive that covers the end of an skb but is less than one MSS, then tcp_match_skb_to_sack() was rounding up pkt_len to the full length of the skb ("Round if necessary..."), then chopping all bytes off the skb and creating a zero-byte skb in the write queue. This was visible now because the recently simplified TLP logic in bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte skb at the end of the write queue, and now that we do not check that skb's length we could send it as a TLP probe. Consider the following example scenario: mss: 1000 skb: seq: 0 end_seq: 4000 len: 4000 SACK: start_seq: 3999 end_seq: 4000 The tcp_match_skb_to_sack() code will compute: in_sack = false pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000 new_len += mss = 4000 Previously we would find the new_len > skb->len check failing, so we would fall through and set pkt_len = new_len = 4000 and chop off pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment afterward in the write queue. With this new commit, we notice that the new new_len >= skb->len check succeeds, so that we return without trying to fragment. Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19Revert "net: return actual error on register_queue_kobjects"David S. Miller
This reverts commit d36a4f4b472334562b8e7252e35d3d770db83815. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18net: filter: fix upper BPF instruction limitKees Cook
The original checks (via sk_chk_filter) for instruction count uses ">", not ">=", so changing this in sk_convert_filter has the potential to break existing seccomp filters that used exactly BPF_MAXINSNS many instructions. Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org # v3.15+ Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18net: sctp: propagate sysctl errors from proc_do* properlyDaniel Borkmann
sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and proc_sctp_do_rto_max() do not properly reflect some error cases when writing values via sysctl from internal proc functions such as proc_dointvec() and proc_dostring(). In all these cases we pass the test for write != 0 and partially do additional work just to notice that additional sanity checks fail and we return with hard-coded -EINVAL while proc_do* functions might also return different errors. So fix this up by simply testing a successful return of proc_do* right after calling it. This also allows to propagate its return value onwards to the user. While touching this, also fix up some minor style issues. Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl") Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18net: return actual error on register_queue_kobjectsJie Liu
Return the actual error code if call kset_create_and_add() failed Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18bonding: Advertize vxlan offload features when supportedOr Gerlitz
When the underlying device supports TCP offloads for VXLAN/UDP encapulated traffic, we need to reflect that through the hw_enc_features field of the bonding net-device. This will cause the xmit path in the core networking stack to provide bonding with encapsulated GSO frames to offload into the HW etc. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18skge: Added FS A8NE-FM to the list of 32bit DMA boardsMirko Lindner
Added FUJITSU SIEMENS A8NE-FM to the list of 32bit DMA boards >From Tomi O.: After I added an entry to this MB into the skge.c driver in order to enable the mentioned 64bit dma disable quirk, the network data corruptions ended and everything is fine again. Signed-off-by: Mirko Lindner <mlindner@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains netfilter updates for your net tree, they are: 1) Fix refcount leak when dumping the dying/unconfirmed conntrack lists, from Florian Westphal. 2) Fix crash in NAT when removing a netnamespace, also from Florian. 3) Fix a crash in IPVS when trying to remove an estimator out of the sysctl scope, from Julian Anastasov. 4) Add zone attribute to the routing to calculate the message size in ctnetlink events, from Ken-ichirou MATSUZAWA. 5) Another fix for the dying/unconfirmed list which was preventing to dump more than one memory page of entries (~17 entries in x86_64). 6) Fix missing RCU-safe list insertion in the rule replacement code in nf_tables. 7) Since the new transaction infrastructure is in place, we have to upgrade the chain use counter from u16 to u32 to avoid overflow after more than 2^16 rules are added. 8) Fix refcount leak when replacing rule in nf_tables. This problem was also introduced in new transaction. 9) Call the ->destroy() callback when releasing nft-xt rules to fix module refcount leaks. 10) Set the family in the netlink messages that contain set elements in nf_tables to make it consistent with other object types. 11) Don't dump NAT port information if it is unset in nft_nat. 12) Update the MAINTAINERS file, I have merged the ebtables entry into netfilter. While at it, also removed the netfilter users mailing list, the development list should be enough. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18MAINTAINERS: merge ebtables into netfilter entryPablo Neira Ayuso
Moreover, remove reference to the netfilter users mailing list, so they don't receive patches. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-17net: fec: Don't clear IPV6 header checksum field when IP accelerator enableFugang Duan
The commit 96c50caa5148 (net: fec: Enable IP header hardware checksum) enable HW IP header checksum for IPV4 and IPV6, which causes IPV6 TCP/UDP cannot work. (The issue is reported by Russell King) For FEC IP header checksum function: Insert IP header checksum. This "IINS" bit is written by the user. If set, IP accelerator calculates the IP header checksum and overwrites the IINS corresponding header field with the calculated value. The checksum field must be cleared by user, otherwise the checksum always is 0xFFFF. So the previous patch clear IP header checksum field regardless of IP frame type. In fact, IP HW detect the packet as IPV6 type, even if the "IINS" bit is set, the IP accelerator is not triggered to calculates IPV6 header checksum because IPV6 frame format don't have checksum. So this results in the IPV6 frame being corrupted. The patch just add software detect the current packet type, if it is IPV6 frame, it don't clear IP header checksum field. Cc: Russell King <linux@arm.linux.org.uk> Reported-and-tested-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-17ptp: ptp_pch depends on x86_32Jean Delvare
The ptp_pch driver is for a companion chip to the Intel Atom E600 series processors. These are 32-bit x86 processors so the driver is only needed on X86_32. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Richard Cochran <richardcochran@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16hyperv: fix apparent cut-n-paste error in send path teardownDave Jones
c25aaf814a63: "hyperv: Enable sendbuf mechanism on the send path" added some teardown code that looks like it was copied from the recieve path above, but missed a variable name replacement. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16tcp: remove unnecessary tcp_sk assignment.Dave Jones
This variable is overwritten by the child socket assignment before it ever gets used. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16net: tile: fix unused variable warningChris Metcalf
'i' is unused in tile_net_dev_init() after commit d581ebf5a1f ("net: tile: Use helpers from linux/etherdevice.h to check/set MAC"). Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16ptp: In the testptp utility, use clock_adjtime from glibc when availableChristian Riesch
clock_adjtime was included in glibc 2.14. _GNU_SOURCE must be defined to make it available. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Cc: Richard Cochran <richardcochran@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16isdn: hisax: Drop duplicate Kconfig entryJean Delvare
There are 2 HISAX_AVM_A1_PCMCIA Kconfig entries. The kbuild system ignores the second one, and apparently nobody noticed the problem so far, so let's remove that second entry. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16isdn: hisax: Merge Kconfig ifsJean Delvare
The first half of the HiSax config options is presented if ISDN_DRV_HISAX!=n, while the second half of the options is presented if ISDN_DRV_HISAX. That's the same, so merge both conditionals. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16slcan: Port write_wakeup deadlock fix from slipTyler Hall
The commit "slip: Fix deadlock in write_wakeup" fixes a deadlock caused by a change made in both slcan and slip. This is a direct port of that fix. Signed-off-by: Tyler Hall <tylerwhall@gmail.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Andre Naujoks <nautsch2@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16slip: Fix deadlock in write_wakeupTyler Hall
Use schedule_work() to avoid potentially taking the spinlock in interrupt context. Commit cc9fa74e2a ("slip/slcan: added locking in wakeup function") added necessary locking to the wakeup function and 367525c8c2/ddcde142be ("can: slcan: Fix spinlock variant") converted it to spin_lock_bh() because the lock is also taken in timers. Disabling softirqs is not sufficient, however, as tty drivers may call write_wakeup from interrupt context. This driver calls tty->ops->write() with its spinlock held, which may immediately cause an interrupt on the same CPU and subsequent spin_bug(). Simply converting to spin_lock_irq/irqsave() prevents this deadlock, but causes lockdep to point out a possible circular locking dependency between these locks: (&(&sl->lock)->rlock){-.....}, at: slip_write_wakeup (&port_lock_key){-.....}, at: serial8250_handle_irq.part.13 The slip transmit is holding the slip spinlock when calling the tty write. This grabs the port lock. On an interrupt, the handler grabs the port lock and calls write_wakeup which grabs the slip lock. This could be a problem if a serial interrupt occurs on another CPU during the slip transmit. To deal with these issues, don't grab the lock in the wakeup function by deferring the writeout to a workqueue. Also hold the lock during close when de-assigning the tty pointer to safely disarm the worker and timers. This bug is easily reproducible on the first transmit when slip is used with the standard 8250 serial driver. [<c0410b7c>] (spin_bug+0x0/0x38) from [<c006109c>] (do_raw_spin_lock+0x60/0x1d0) r5:eab27000 r4:ec02754c [<c006103c>] (do_raw_spin_lock+0x0/0x1d0) from [<c04185c0>] (_raw_spin_lock+0x28/0x2c) r10:0000001f r9:eabb814c r8:eabb8140 r7:40070193 r6:ec02754c r5:eab27000 r4:ec02754c r3:00000000 [<c0418598>] (_raw_spin_lock+0x0/0x2c) from [<bf3a0220>] (slip_write_wakeup+0x50/0xe0 [slip]) r4:ec027540 r3:00000003 [<bf3a01d0>] (slip_write_wakeup+0x0/0xe0 [slip]) from [<c026e420>] (tty_wakeup+0x48/0x68) r6:00000000 r5:ea80c480 r4:eab27000 r3:bf3a01d0 [<c026e3d8>] (tty_wakeup+0x0/0x68) from [<c028a8ec>] (uart_write_wakeup+0x2c/0x30) r5:ed68ea90 r4:c06790d8 [<c028a8c0>] (uart_write_wakeup+0x0/0x30) from [<c028dc44>] (serial8250_tx_chars+0x114/0x170) [<c028db30>] (serial8250_tx_chars+0x0/0x170) from [<c028dffc>] (serial8250_handle_irq+0xa0/0xbc) r6:000000c2 r5:00000060 r4:c06790d8 r3:00000000 [<c028df5c>] (serial8250_handle_irq+0x0/0xbc) from [<c02933a4>] (dw8250_handle_irq+0x38/0x64) r7:00000000 r6:edd2f390 r5:000000c2 r4:c06790d8 [<c029336c>] (dw8250_handle_irq+0x0/0x64) from [<c028d2f4>] (serial8250_interrupt+0x44/0xc4) r6:00000000 r5:00000000 r4:c06791c4 r3:c029336c [<c028d2b0>] (serial8250_interrupt+0x0/0xc4) from [<c0067fe4>] (handle_irq_event_percpu+0xb4/0x2b0) r10:c06790d8 r9:eab27000 r8:00000000 r7:00000000 r6:0000001f r5:edd52980 r4:ec53b6c0 r3:c028d2b0 [<c0067f30>] (handle_irq_event_percpu+0x0/0x2b0) from [<c006822c>] (handle_irq_event+0x4c/0x6c) r10:c06790d8 r9:eab27000 r8:c0673ae0 r7:c05c2020 r6:ec53b6c0 r5:edd529d4 r4:edd52980 [<c00681e0>] (handle_irq_event+0x0/0x6c) from [<c006b140>] (handle_level_irq+0xe8/0x100) r6:00000000 r5:edd529d4 r4:edd52980 r3:00022000 [<c006b058>] (handle_level_irq+0x0/0x100) from [<c00676f8>] (generic_handle_irq+0x30/0x40) r5:0000001f r4:0000001f [<c00676c8>] (generic_handle_irq+0x0/0x40) from [<c000f57c>] (handle_IRQ+0xd0/0x13c) r4:ea997b18 r3:000000e0 [<c000f4ac>] (handle_IRQ+0x0/0x13c) from [<c00086c4>] (armada_370_xp_handle_irq+0x4c/0x118) r8:000003ff r7:ea997b18 r6:ffffffff r5:60070013 r4:c0674dc0 [<c0008678>] (armada_370_xp_handle_irq+0x0/0x118) from [<c0013840>] (__irq_svc+0x40/0x70) Exception stack(0xea997b18 to 0xea997b60) 7b00: 00000001 20070013 7b20: 00000000 0000000b 20070013 eab27000 20070013 00000000 ed10103e eab27000 7b40: c06790d8 ea997b74 ea997b60 ea997b60 c04186c0 c04186c8 60070013 ffffffff r9:eab27000 r8:ed10103e r7:ea997b4c r6:ffffffff r5:60070013 r4:c04186c8 [<c04186a4>] (_raw_spin_unlock_irqrestore+0x0/0x54) from [<c0288fc0>] (uart_start+0x40/0x44) r4:c06790d8 r3:c028ddd8 [<c0288f80>] (uart_start+0x0/0x44) from [<c028982c>] (uart_write+0xe4/0xf4) r6:0000003e r5:00000000 r4:ed68ea90 r3:0000003e [<c0289748>] (uart_write+0x0/0xf4) from [<bf3a0d20>] (sl_xmit+0x1c4/0x228 [slip]) r10:ed388e60 r9:0000003c r8:ffffffdd r7:0000003e r6:ec02754c r5:ea717eb8 r4:ec027000 [<bf3a0b5c>] (sl_xmit+0x0/0x228 [slip]) from [<c0368d74>] (dev_hard_start_xmit+0x39c/0x6d0) r8:eaf163c0 r7:ec027000 r6:ea717eb8 r5:00000000 r4:00000000 Signed-off-by: Tyler Hall <tylerwhall@gmail.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Andre Naujoks <nautsch2@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16vmxnet3: adjust ring sizes when interface is downNeil Horman
If ethtool is used to update ring sizes on a vmxnet3 interface that isn't running, the change isn't stored, meaning the ring update is effectively is ignored and lost without any indication to the user. Other network drivers store the ring size update so that ring allocation uses the new sizes next time the interface is brought up. This patch modifies vmxnet3 to behave this way as well Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Shreyas Bhatewara <sbhatewara@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16netfilter: nf_nat: fix oops on netns removalFlorian Westphal
Quoting Samu Kallio: Basically what's happening is, during netns cleanup, nf_nat_net_exit gets called before ipv4_net_exit. As I understand it, nf_nat_net_exit is supposed to kill any conntrack entries which have NAT context (through nf_ct_iterate_cleanup), but for some reason this doesn't happen (perhaps something else is still holding refs to those entries?). When ipv4_net_exit is called, conntrack entries (including those with NAT context) are cleaned up, but the nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The bug happens when attempting to free a conntrack entry whose NAT hash 'prev' field points to a slot in the freed hash table (head for that bin). We ignore conntracks with null nat bindings. But this is wrong, as these are in bysource hash table as well. Restore nat-cleaning for the netns-is-being-removed case. bug: https://bugzilla.kernel.org/show_bug.cgi?id=65191 Fixes: c2d421e1718 ('netfilter: nf_nat: fix race when unloading protocol modules') Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com> Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: ctnetlink: add zone size to lengthKen-ichirou MATSUZAWA
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16Merge branch 'ipvs'Pablo Neira Ayuso
Simon Horman says: ==================== Fix for panic due use of tot_stats estimator outside of CONFIG_SYSCTL It has been present since v3.6.39. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nft_nat: don't dump port information if unsetPablo Neira Ayuso
Don't include port information attributes if they are unset. Reported-by: Ana Rey <anarey@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nf_tables: indicate family when dumping set elementsPablo Neira Ayuso
Set the nfnetlink header that indicates the family of this element. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nft_compat: call {target, match}->destroy() to cleanup entryPablo Neira Ayuso
Otherwise, the reference to external objects (eg. modules) are not released when the rules are removed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nf_tables: fix wrong type in transaction when replacing rulesPablo Neira Ayuso
In b380e5c ("netfilter: nf_tables: add message type to transactions"), I used the wrong message type in the rule replacement case. The rule that is replaced needs to be handled as a deleted rule. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nf_tables: decrement chain use counter when replacing rulesPablo Neira Ayuso
Thus, the chain use counter remains with the same value after the rule replacement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nf_tables: use u32 for chain use counterPablo Neira Ayuso
Since 4fefee5 ("netfilter: nf_tables: allow to delete several objects from a batch"), every new rule bumps the chain use counter. However, this is limited to 16 bits, which means that it will overrun after 2^16 rules. Use a u32 chain counter and check for overflows (just like we do for table objects). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: nf_tables: use RCU-safe list insertion when replacing rulesPablo Neira Ayuso
The patch 5e94846 ("netfilter: nf_tables: add insert operation") did not include RCU-safe list insertion when replacing rules. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: ctnetlink: fix refcnt leak in dying/unconfirmed list dumperFlorian Westphal
'last' keeps track of the ct that had its refcnt bumped during previous dump cycle. Thus it must not be overwritten until end-of-function. Another (unrelated, theoretical) issue: Don't attempt to bump refcnt of a conntrack whose reference count is already 0. Such conntrack is being destroyed right now, its memory is freed once we release the percpu dying spinlock. Fixes: b7779d06 ('netfilter: conntrack: spinlock per cpu to protect special lists.') Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16netfilter: ctnetlink: fix dumping of dying/unconfirmed conntracksPablo Neira Ayuso
The dumping prematurely stops, it seems the callback argument that indicates that all entries have been dumped is set after iterating on the first cpu list. The dumping also may stop before the entire per-cpu list content is also dumped. With this patch, conntrack -L dying now shows the dying list content again. Fixes: b7779d06 ("netfilter: conntrack: spinlock per cpu to protect special lists.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-15Linux 3.16-rc1Linus Torvalds
2014-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix checksumming regressions, from Tom Herbert. 2) Undo unintentional permissions changes for SCTP rto_alpha and rto_beta sysfs knobs, from Denial Borkmann. 3) VXLAN, like other IP tunnels, should advertize it's encapsulation size using dev->needed_headroom instead of dev->hard_header_len. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: sctp: fix permissions for rto_alpha and rto_beta knobs vxlan: Checksum fixes net: add skb_pop_rcv_encapsulation udp: call __skb_checksum_complete when doing full checksum net: Fix save software checksum complete net: Fix GSO constants to match NETIF flags udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup vxlan: use dev->needed_headroom instead of dev->hard_header_len MAINTAINERS: update cxgb4 maintainer
2014-06-15Merge tag 'clk-for-linus-3.16-part2' of ↵Linus Torvalds
git://git.linaro.org/people/mike.turquette/linux Pull more clock framework updates from Mike Turquette: "This contains the second half the of the clk changes for 3.16. They are simply fixes and code refactoring for the OMAP clock drivers. The sunxi clock driver changes include splitting out the one mega-driver into several smaller pieces and adding support for the A31 SoC clocks" * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits) clk: sunxi: document PRCM clock compatible strings clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support clk: sun6i: Protect SDRAM gating bit clk: sun6i: Protect CPU clock clk: sunxi: Rework clock protection code clk: sunxi: Move the GMAC clock to a file of its own clk: sunxi: Move the 24M oscillator to a file of its own clk: sunxi: Remove calls to clk_put clk: sunxi: document new A31 USB clock compatible clk: sunxi: Implement A31 USB clock ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC) CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic) dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock CLK: TI: gate: add composite interface clock to OMAP2 only build ARM: OMAP2: clock: add DT boot support for cpufreq_ck CLK: TI: OMAP2: add clock init support ...
2014-06-15Merge git://git.infradead.org/users/willy/linux-nvmeLinus Torvalds
Pull NVMe update from Matthew Wilcox: "Mostly bugfixes again for the NVMe driver. I'd like to call out the exported tracepoint in the block layer; I believe Keith has cleared this with Jens. We've had a few reports from people who're really pounding on NVMe devices at scale, hence the timeout changes (and new module parameters), hotplug cpu deadlock, tracepoints, and minor performance tweaks" [ Jens hadn't seen that tracepoint thing, but is ok with it - it will end up going away when mq conversion happens ] * git://git.infradead.org/users/willy/linux-nvme: (22 commits) NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. NVMe: Use Log Page constants in SCSI emulation NVMe: Define Log Page constants NVMe: Fix hot cpu notification dead lock NVMe: Rename io_timeout to nvme_io_timeout NVMe: Use last bytes of f/w rev SCSI Inquiry NVMe: Adhere to request queue block accounting enable/disable NVMe: Fix nvme get/put queue semantics NVMe: Delete NVME_GET_FEAT_TEMP_THRESH NVMe: Make admin timeout a module parameter NVMe: Make iod bio timeout a parameter NVMe: Prevent possible NULL pointer dereference NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD) NVMe: Update data structures for NVMe 1.2 NVMe: Enable BUILD_BUG_ON checks NVMe: Update namespace and controller identify structures to the 1.1a spec NVMe: Flush with data support NVMe: Configure support for block flush NVMe: Add tracepoints NVMe: Protect against badly formatted CQEs ...
2014-06-15net: sctp: fix permissions for rto_alpha and rto_beta knobsDaniel Borkmann
Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15Merge branch 'csum_fixes'David S. Miller
Tom Herbert says: ==================== Fixes related to some recent checksum modifications. - Fix GSO constants to match NETIF flags - Fix logic in saving checksum complete in __skb_checksum_complete - Call __skb_checksum_complete from UDP if we are checksumming over whole packet in order to save checksum. - Fixes to VXLAN to work correctly with checksum complete ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15vxlan: Checksum fixesTom Herbert
Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet header to work properly with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15net: add skb_pop_rcv_encapsulationTom Herbert
This function is used by UDP encapsulation protocols in RX when crossing encapsulation boundary. If ip_summed is set to CHECKSUM_UNNECESSARY and encapsulation is not set, change to CHECKSUM_NONE since the checksum has not been validated within the encapsulation. Clears csum_valid by the same rationale. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15udp: call __skb_checksum_complete when doing full checksumTom Herbert
In __udp_lib_checksum_complete check if checksum is being done over all the data (len is equal to skb->len) and if it is call __skb_checksum_complete instead of __skb_checksum_complete_head. This allows checksum to be saved in checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15net: Fix save software checksum completeTom Herbert
Geert reported issues regarding checksum complete and UDP. The logic introduced in commit 7e3cead5172927732f51fde ("net: Save software checksum complete") is not correct. This patch: 1) Restores code in __skb_checksum_complete_header except for setting CHECKSUM_UNNECESSARY. This function may be calculating checksum on something less than skb->len. 2) Adds saving checksum to __skb_checksum_complete. The full packet checksum 0..skb->len is calculated without adding in pseudo header. This value is saved in skb->csum and then the pseudo header is added to that to derive the checksum for validation. 3) In both __skb_checksum_complete_header and __skb_checksum_complete, set skb->csum_valid to whether checksum of zero was computed. This allows skb_csum_unnecessary to return true without changing to CHECKSUM_UNNECESSARY which was done previously. 4) Copy new csum related bits in __copy_skb_header. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15net: Fix GSO constants to match NETIF flagsTom Herbert
Joseph Gasparakis reported that VXLAN GSO offload stopped working with i40e device after recent UDP changes. The problem is that the SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several GSO constants that were missing to avoid the problem in the future. Reported-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-14Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull more SCSI updates from James Bottomley: "This is just a couple of drivers (hpsa and lpfc) that got left out for further testing in linux-next. We also have one fix to a prior submission (qla2xxx sparse)" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits) qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch lpfc: Update lpfc version to driver version 10.2.8001.0 lpfc: Fix ExpressLane priority setup lpfc: mark old devices as obsolete lpfc: Fix for initializing RRQ bitmap lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries lpfc: Update lpfc version to driver version 10.2.8000.0 lpfc: Update Copyright on changed files from 8.3.45 patches lpfc: Update Copyright on changed files lpfc: Fixed locking for scsi task management commands lpfc: Convert runtime references to old xlane cfg param to fof cfg param lpfc: Fix FW dump using sysfs lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock lpfc: Fixed kernel panic in lpfc_abort_handler lpfc: Fix locking for postbufq when freeing lpfc: Fix locking for lpfc_hba_down_post lpfc: Fix dynamic transitions of FirstBurst from on to off hpsa: fix handling of hpsa_volume_offline return value hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id hpsa: remove messages about volume status VPD inquiry page not supported ...
2014-06-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This has a few fixes since our last pull and a new ioctl for doing btree searches from userland. It's very similar to the existing ioctl, but lets us return larger items back down to the app" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix error handling in create_pending_snapshot btrfs: fix use of uninit "ret" in end_extent_writepage() btrfs: free ulist in qgroup_shared_accounting() error path Btrfs: fix qgroups sanity test crash or hang btrfs: prevent RCU warning when dereferencing radix tree slot Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting btrfs: new ioctl TREE_SEARCH_V2 btrfs: tree_search, search_ioctl: direct copy to userspace btrfs: new function read_extent_buffer_to_user btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer btrfs: tree_search, search_ioctl: accept varying buffer btrfs: tree_search: eliminate redundant nr_items check
2014-06-14Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio fix and cleanups from Ben LaHaise: "This consists of a couple of code cleanups plus a minor bug fix" * git://git.kvack.org/~bcrl/aio-next: aio: cleanup: flatten kill_ioctx() aio: report error from io_destroy() when threads race in io_destroy() fs/aio.c: Remove ctx parameter in kiocb_cancel
2014-06-14fix __swap_writepage() compile failure on old gcc versionsAl Viro
Tetsuo Handa wrote: "Commit 62a8067a7f35 ("bio_vec-backed iov_iter") introduced an unnamed union inside a struct which gcc-4.4.7 cannot handle. Name the unnamed union as u in order to fix build failure" Let's do this instead: there is only one place in the entire tree that steps into this breakage. Anon structs and unions work in older gcc versions; as the matter of fact, we have those in the tree - see e.g. struct ieee80211_tx_info in include/net/mac80211.h What doesn't work is handling their initializers: struct { int a; union { int b; char c; }; } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}}; is the obvious syntax for initializer, perfectly fine for C11 and handled correctly by gcc-4.7 or later. Earlier versions, though, break on it - declaration is fine and so's access to fields (i.e. x[0].c = 'a'; would produce the right code), but members of the anon structs and unions are not inserted into the right namespace. Tellingly, those older versions will not barf on struct {int a; struct {int a;};}; - looks like they just have it hacked up somewhere around the handling of . and -> instead of doing the right thing. The easiest way to deal with that crap is to turn initialization of those fields (in the only place where we have such initializer of iov_iter) into plain assignment. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-14Merge tag 'hsi-for-3.16-fixes1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI build fixes from Sebastian Reichel: - tighten dependency between ssi-protocol and omap-ssi to fix build failures with randconfig. - use normal module refcounting in omap driver to fix build with disabled module support * tag 'hsi-for-3.16-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: hsi: omap_ssi_port: use normal module refcounting HSI: fix omap ssi driver dependency