summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-09-16net: sctp: rfc4443: do not report ICMP redirects to user spaceDaniel Borkmann
Adapt the same behaviour for SCTP as present in TCP for ICMP redirect messages. For IPv6, RFC4443, section 2.4. says: ... (e) An ICMPv6 error message MUST NOT be originated as a result of receiving the following: ... (e.2) An ICMPv6 redirect message [IPv6-DISC]. ... Therefore, do not report an error to user space, just invoke dst's redirect callback and leave, same for IPv4 as done in TCP as well. The implication w/o having this patch could be that the reception of such packets would generate a poll notification and in worst case it could even tear down the whole connection. Therefore, stop updating sk_err on redirects. Reported-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-16ip6_tunnels: raddr and laddr are inverted in nl msgDing Zhi
IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted. Introduced by c075b13098b3 (ip6tnl: advertise tunnel param via rtnl). Signed-off-by: Ding Zhi <zhi.ding@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-16netfilter: ipset: Fix serious failure in CIDR trackingOliver Smith
This fixes a serious bug affecting all hash types with a net element - specifically, if a CIDR value is deleted such that none of the same size exist any more, all larger (less-specific) values will then fail to match. Adding back any prefix with a CIDR equal to or more specific than the one deleted will fix it. Steps to reproduce: ipset -N test hash:net ipset -A test 1.1.0.0/16 ipset -A test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS in set ipset -D test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set This is due to the fact that the nets counter was unconditionally decremented prior to the iteration that shifts up the entries. Now, we first check if there is a proceeding entry and if not, decrement it and return. Otherwise, we proceed to iterate and then zero the last element, which, in most cases, will already be zero. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16netfilter: ipset: Validate the set family and not the set type family at ↵Jozsef Kadlecsik
swapping This closes netfilter bugzilla #843, reported by Quentin Armitage. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16netfilter: ipset: Consistent userspace testing with nomatch flagJozsef Kadlecsik
The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16netfilter: ipset: Skip really non-first fragments for IPv6 when getting ↵Jozsef Kadlecsik
port/protocol Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16Bluetooth: Fix ACL alive for long in case of non pariable devicesSyam Sidhardhan
For certain devices (ex: HID mouse), support for authentication, pairing and bonding is optional. For such devices, the ACL alive for too long after the L2CAP disconnection. To avoid the ACL alive for too long after L2CAP disconnection, reset the ACL disconnect timeout back to HCI_DISCONN_TIMEOUT during L2CAP connect. While merging the commit id:a9ea3ed9b71cc3271dd59e76f65748adcaa76422 this issue might have introduced. Hcidump info: sh-4.1# /opt/hcidump -Xt 2013-08-05 16:49:00.894129 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.894195 < HCI Command: Exit Sniff Mode (0x02|0x0004) plen 2 handle 12 2013-08-05 16:49:00.894269 < ACL data: handle 12 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.895645 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1 2013-08-05 16:49:00.934391 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:00.936592 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 12 packets 2 2013-08-05 16:49:00.951577 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004a scid 0x0041 2013-08-05 16:49:00.952820 > ACL data: handle 12 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x0049 scid 0x0040 2013-08-05 16:49:00.969165 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:49:48.175533 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:49:48.219045 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x02 interval 108 Mode: Sniff 2013-08-05 16:51:00.968209 < HCI Command: Disconnect (0x01|0x0006) plen 3 handle 12 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:51:00.969056 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01|0x0006) status 0x00 ncmd 1 2013-08-05 16:51:01.013495 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 12 mode 0x00 interval 0 Mode: Active 2013-08-05 16:51:01.073777 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 12 reason 0x16 Reason: Connection Terminated by Local Host ============================ After fix ================================ 2013-08-05 16:57:35.986648 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004c scid 0x0041 2013-08-05 16:57:35.986713 < HCI Command: Exit Sniff Mode (0x02|0x0004) plen 2 handle 11 2013-08-05 16:57:35.986785 < ACL data: handle 11 flags 0x00 dlen 12 L2CAP(s): Disconn req: dcid 0x004b scid 0x0040 2013-08-05 16:57:35.988110 > HCI Event: Command Status (0x0f) plen 4 Exit Sniff Mode (0x02|0x0004) status 0x00 ncmd 1 2013-08-05 16:57:36.030714 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:36.032950 > HCI Event: Number of Completed Packets (0x13) plen 5 handle 11 packets 2 2013-08-05 16:57:36.047926 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004c scid 0x0041 2013-08-05 16:57:36.049200 > ACL data: handle 11 flags 0x02 dlen 12 L2CAP(s): Disconn rsp: dcid 0x004b scid 0x0040 2013-08-05 16:57:36.065509 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x02 interval 50 Mode: Sniff 2013-08-05 16:57:40.052006 < HCI Command: Disconnect (0x01|0x0006) plen 3 handle 11 reason 0x13 Reason: Remote User Terminated Connection 2013-08-05 16:57:40.052869 > HCI Event: Command Status (0x0f) plen 4 Disconnect (0x01|0x0006) status 0x00 ncmd 1 2013-08-05 16:57:40.104731 > HCI Event: Mode Change (0x14) plen 6 status 0x00 handle 11 mode 0x00 interval 0 Mode: Active 2013-08-05 16:57:40.146935 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 11 reason 0x16 Reason: Connection Terminated by Local Host Signed-off-by: Sang-Ki Park <sangki79.park@samsung.com> Signed-off-by: Chan-yeol Park <chanyeol.park@samsung.com> Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-16Bluetooth: Fix encryption key size for peripheral roleAndre Guedes
This patch fixes the connection encryption key size information when the host is playing the peripheral role. We should set conn->enc_key_ size in hci_le_ltk_request_evt, otherwise it is left uninitialized. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-16Bluetooth: Fix security level for peripheral roleAndre Guedes
While playing the peripheral role, the host gets a LE Long Term Key Request Event from the controller when a connection is established with a bonded device. The host then informs the LTK which should be used for the connection. Once the link is encrypted, the host gets an Encryption Change Event. Therefore we should set conn->pending_sec_level instead of conn-> sec_level in hci_le_ltk_request_evt. This way, conn->sec_level is properly updated in hci_encrypt_change_evt. Moreover, since we have a LTK associated to the device, we have at least BT_SECURITY_MEDIUM security level. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-09-15bridge: fix NULL pointer deref of br_port_get_rcuHong Zhiguo
The NULL deref happens when br_handle_frame is called between these 2 lines of del_nbp: dev->priv_flags &= ~IFF_BRIDGE_PORT; /* --> br_handle_frame is called at this time */ netdev_rx_handler_unregister(dev); In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced without check but br_port_get_rcu(dev) returns NULL if: !(dev->priv_flags & IFF_BRIDGE_PORT) Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary here since we're in rcu_read_lock and we have synchronize_net() in netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT and by the previous patch, make sure br_port_get_rcu is called in bridging code. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-15bridge: use br_port_get_rtnl within rtnl lockHong Zhiguo
current br_port_get_rcu is problematic in bridging path (NULL deref). Change these calls in netlink path first. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-13Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio changes from Ben LaHaise: "First off, sorry for this pull request being late in the merge window. Al had raised a couple of concerns about 2 items in the series below. I addressed the first issue (the race introduced by Gu's use of mm_populate()), but he has not provided any further details on how he wants to rework the anon_inode.c changes (which were sent out months ago but have yet to be commented on). The bulk of the changes have been sitting in the -next tree for a few months, with all the issues raised being addressed" * git://git.kvack.org/~bcrl/aio-next: (22 commits) aio: rcu_read_lock protection for new rcu_dereference calls aio: fix race in ring buffer page lookup introduced by page migration support aio: fix rcu sparse warnings introduced by ioctx table lookup patch aio: remove unnecessary debugging from aio_free_ring() aio: table lookup: verify ctx pointer staging/lustre: kiocb->ki_left is removed aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" aio: be defensive to ensure request batching is non-zero instead of BUG_ON() aio: convert the ioctx list to table lookup v3 aio: double aio_max_nr in calculations aio: Kill ki_dtor aio: Kill ki_users aio: Kill unneeded kiocb members aio: Kill aio_rw_vect_retry() aio: Don't use ctx->tail unnecessarily aio: io_cancel() no longer returns the io_event aio: percpu ioctx refcount aio: percpu reqs_available aio: reqs_active -> reqs_available aio: fix build when migration is disabled ...
2013-09-13Remove GENERIC_HARDIRQ config optionMartin Schwidefsky
After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13netfilter: nf_nat_proto_icmpv6:: fix wrong comparison in icmpv6_manip_pktPhil Oester
In commit 58a317f1 (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt was added with an incorrect comparison of ICMP codes to types. This causes problems when using NAT rules with the --random option. Correct the comparison. This closes netfilter bugzilla #851, reported by Alexander Neumann. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-12bridge: Clamp forward_delay when enabling STPHerbert Xu
At some point limits were added to forward_delay. However, the limits are only enforced when STP is enabled. This created a scenario where you could have a value outside the allowed range while STP is disabled, which then stuck around even after STP is enabled. This patch fixes this by clamping the value when we enable STP. I had to move the locking around a bit to ensure that there is no window where someone could insert a value outside the range while we're in the middle of enabling STP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12resubmit bridge: fix message_age_timer calculationChris Healy
This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification. With the current implementation, when running with very large bridge diameters, convergance will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge more patches from Andrew Morton: "The rest of MM. Plus one misc cleanup" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits) mm/Kconfig: add MMU dependency for MIGRATION. kernel: replace strict_strto*() with kstrto*() mm, thp: count thp_fault_fallback anytime thp fault fails thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page() thp: do_huge_pmd_anonymous_page() cleanup thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() mm: cleanup add_to_page_cache_locked() thp: account anon transparent huge pages into NR_ANON_PAGES truncate: drop 'oldsize' truncate_pagecache() parameter mm: make lru_add_drain_all() selective memcg: document cgroup dirty/writeback memory statistics memcg: add per cgroup writeback pages accounting memcg: check for proper lock held in mem_cgroup_update_page_stat memcg: remove MEMCG_NR_FILE_MAPPED memcg: reduce function dereference memcg: avoid overflow caused by PAGE_ALIGN memcg: rename RESOURCE_MAX to RES_COUNTER_MAX memcg: correct RESOURCE_MAX to ULLONG_MAX mm: memcg: do not trap chargers with full callstack on OOM mm: memcg: rework and document OOM waiting and wakeup ...
2013-09-12memcg: rename RESOURCE_MAX to RES_COUNTER_MAXSha Zhengju
RESOURCE_MAX is far too general name, change it to RES_COUNTER_MAX. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 4 from Al Viro: "list_lru pile, mostly" This came out of Andrew's pile, Al ended up doing the merge work so that Andrew didn't have to. Additionally, a few fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) super: fix for destroy lrus list_lru: dynamically adjust node arrays shrinker: Kill old ->shrink API. shrinker: convert remaining shrinkers to count/scan API staging/lustre/libcfs: cleanup linux-mem.h staging/lustre/ptlrpc: convert to new shrinker API staging/lustre/obdclass: convert lu_object shrinker to count/scan API staging/lustre/ldlm: convert to shrinkers to count/scan API hugepage: convert huge zero page shrinker to new shrinker API i915: bail out earlier when shrinker cannot acquire mutex drivers: convert shrinkers to new count/scan API fs: convert fs shrinkers to new scan/count API xfs: fix dquot isolation hang xfs-convert-dquot-cache-lru-to-list_lru-fix xfs: convert dquot cache lru to list_lru xfs: rework buffer dispose list tracking xfs-convert-buftarg-lru-to-generic-code-fix xfs: convert buftarg LRU to generic code fs: convert inode and dentry shrinking to be node aware vmscan: per-node deferred work ...
2013-09-12net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmitDaniel Borkmann
Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport does not seem to have the desired effect: SCTP + IPv4: 22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116) 192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72 22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340) 192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1): SCTP + IPv6: 22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364) fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp 1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10] Moreover, Alan says: This problem was seen with both Racoon and Racoon2. Other people have seen this with OpenSwan. When IPsec is configured to encrypt all upper layer protocols the SCTP connection does not initialize. After using Wireshark to follow packets, this is because the SCTP packet leaves Box A unencrypted and Box B believes all upper layer protocols are to be encrypted so it drops this packet, causing the SCTP connection to fail to initialize. When IPsec is configured to encrypt just SCTP, the SCTP packets are observed unencrypted. In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext" string on the other end, results in cleartext on the wire where SCTP eventually does not report any errors, thus in the latter case that Alan reports, the non-paranoid user might think he's communicating over an encrypted transport on SCTP although he's not (tcpdump ... -X): ... 0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l.... 0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext... Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the receiver side. Initial follow-up analysis from Alan's bug report was done by Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this. SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit(). This has the implication that it probably never really got updated along with changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers. SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since a call to inet6_csk_xmit() would solve this problem, but result in unecessary route lookups, let us just use the cached flowi6 instead that we got through sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(), we do the route lookup / flow caching in sctp_transport_route(), hold it in tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst() instead to get the correct source routed dst entry, which we assign to the skb. Also source address routing example from 625034113 ("sctp: fix sctp to work with ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095 it is actually 'recommended' to not use that anyway due to traffic amplification [1]. So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if we overwrite the flow destination here, the lower IPv6 layer will be unable to put the correct destination address into IP header, as routing header is added in ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside, result of this patch is that we do not have any XfrmInTmplMismatch increase plus on the wire with this patch it now looks like: SCTP + IPv6: 08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba: AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72 08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a: AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296 This fixes Kernel Bugzilla 24412. This security issue seems to be present since 2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have its fun with that. lksctp-tools IPv6 regression test suite passes as well with this patch. [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf Reported-by: Alan Chester <alan.chester@tekelec.com> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12netpoll: Should handle ETH_P_ARP other than ETH_P_IP in netpoll_neigh_replySonic Zhang
The received ARP request type in the Ethernet packet head is ETH_P_ARP other than ETH_P_IP. [ Bug introduced by commit b7394d2429c198b1da3d46ac39192e891029ec0f ("netpoll: prepare for ipv6") ] Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12Merge tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes (part 2) from Trond Myklebust: "Bugfixes: - Fix a few credential reference leaks resulting from the SP4_MACH_CRED NFSv4.1 state protection code. - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the intended 64 byte structure. - Fix a long standing XDR issue with FREE_STATEID - Fix a potential WARN_ON spamming issue - Fix a missing dprintk() kuid conversion New features: - Enable the NFSv4.1 state protection support for the WRITE and COMMIT operations" * tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: No, I did not intend to create a 256KiB hashtable sunrpc: Add missing kuids conversion for printing NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE NFSv4.1: sp4_mach_cred: no need to ref count creds NFSv4.1: fix SECINFO* use of put_rpccred NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT NFSv4.1 fix decode_free_stateid
2013-09-12SUNRPC: No, I did not intend to create a 256KiB hashtableTrond Myklebust
Fix the declaration of the gss_auth_hash_table so that it creates a 16 bucket hashtable, as I had intended. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-12sunrpc: Add missing kuids conversion for printingGeert Uytterhoeven
m68k/allmodconfig: net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’: net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘kuid_t’ commit cdba321e291f0fbf5abda4d88340292b858e3d4d ("sunrpc: Convert kuids and kgids to uids and gids for printing") forgot to convert one instance. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - Some pidns/fork/exec tweaks - OCFS2 updates - Most of MM - there remain quite a few memcg parts which depend on pending core cgroups changes. Which might have been already merged - I'll check tomorrow... - Various misc stuff all over the place - A few block bits which I never got around to sending to Jens - relatively minor things. - MAINTAINERS maintenance - A small number of lib/ updates - checkpatch updates - epoll - firmware/dmi-scan - Some kprobes work for S390 - drivers/rtc updates - hfsplus feature work - vmcore feature work - rbtree upgrades - AOE updates - pktcdvd cleanups - PPS - memstick - w1 - New "inittmpfs" feature, which does the obvious - More IPC work from Davidlohr. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (303 commits) lz4: fix compression/decompression signedness mismatch ipc: drop ipc_lock_check ipc, shm: drop shm_lock_check ipc: drop ipc_lock_by_ptr ipc, shm: guard against non-existant vma in shmdt(2) ipc: document general ipc locking scheme ipc,msg: drop msg_unlock ipc: rename ids->rw_mutex ipc,shm: shorten critical region for shmat ipc,shm: cleanup do_shmat pasta ipc,shm: shorten critical region for shmctl ipc,shm: make shmctl_nolock lockless ipc,shm: introduce shmctl_nolock ipc: drop ipcctl_pre_down ipc,shm: shorten critical region in shmctl_down ipc,shm: introduce lockless functions to obtain the ipc object initmpfs: use initramfs if rootfstype= or root= specified initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled initmpfs: move rootfs code from fs/ramfs/ to init/ initmpfs: move bdi setup from init_rootfs to init_ramfs ...
2013-09-11kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user()Mathieu Desnoyers
I found the following pattern that leads in to interesting findings: grep -r "ret.*|=.*__put_user" * grep -r "ret.*|=.*__get_user" * grep -r "ret.*|=.*__copy" * The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat, since those appear in compat code, we could probably expect the kernel addresses not to be reachable in the lower 32-bit range, so I think they might not be exploitable. For the "__get_user" cases, I don't think those are exploitable: the worse that can happen is that the kernel will copy kernel memory into in-kernel buffers, and will fail immediately afterward. The alpha csum_partial_copy_from_user() seems to be missing the access_ok() check entirely. The fix is inspired from x86. This could lead to information leak on alpha. I also noticed that many architectures map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I wonder if the latter is performing the access checks on every architectures. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Brown paper bag fix in HTB scheduler, class options set incorrectly due to a typoe. Fix from Vimalkumar. 2) It's possible for the ipv6 FIB garbage collector to run before all the necessary datastructure are setup during init, defer the notifier registry to avoid this problem. Fix from Michal Kubecek. 3) New i40e ethernet driver from the Intel folks. 4) Add new qmi wwan device IDs, from Bjørn Mork. 5) Doorbell lock in bnx2x driver is not initialized properly in some configurations, fix from Ariel Elior. 6) Revert an ipv6 packet option padding change that broke standardized ipv6 implementation test suites. From Jiri Pirko. 7) Fix synchronization of ARP information in bonding layer, from Nikolay Aleksandrov. 8) Fix missing error return resulting in illegal memory accesses in openvswitch, from Daniel Borkmann. 9) SCTP doesn't signal poll events properly due to mistaken operator precedence, fix also from Daniel Borkmann. 10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which essentially disables caching of TX queue in sockets :-/ Fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) net_sched: htb: fix a typo in htb_change_class() net: qmi_wwan: add new Qualcomm devices ipv6: don't call fib6_run_gc() until routing is ready net: tilegx driver: avoid compiler warning fib6_rules: fix indentation irda: vlsi_ir: Remove casting the return value which is a void pointer irda: donauboe: Remove casting the return value which is a void pointer net: fix multiqueue selection net: sctp: fix smatch warning in sctp_send_asconf_del_ip net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE net: fib: fib6_add: fix potential NULL pointer dereference net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs bcm63xx_enet: remove deprecated IRQF_DISABLED net: korina: remove deprecated IRQF_DISABLED macvlan: Move skb_clone check closer to call qlcnic: Fix warning reported by kbuild test robot. bonding: fix bond_arp_rcv setting and arp validate desync state bonding: fix store_arp_validate race with mode change ipv6/exthdrs: accept tlv which includes only padding bnx2x: avoid atomic allocations during initialization ...
2013-09-11net_sched: htb: fix a typo in htb_change_class()Vimalkumar
Fix a typo added in commit 56b765b79 ("htb: improved accuracy at high rates") cbuffer should not be a copy of buffer. Signed-off-by: Vimalkumar <j.vimal@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11ipv6: don't call fib6_run_gc() until routing is readyMichal Kubeček
When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11fib6_rules: fix indentationStefan Tomanek
This change just removes two tabs from the source file. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11net: fix multiqueue selectionEric Dumazet
commit 416186fbf8c5b4e4465 ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx") added a bug that disables caching of queue index in the socket. This is the source of packet reorders for TCP flows, and again this is happening more often when using FQ pacing. Old code was doing if (queue_index != old_index) sk_tx_queue_set(sk, queue_index); Alexander renamed the variables but forgot to change sk_tx_queue_set() 2nd parameter. if (queue_index != new_index) sk_tx_queue_set(sk, queue_index); This means we store -1 over and over in sk->sk_tx_queue_mapping Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11net: sctp: fix smatch warning in sctp_send_asconf_del_ipDaniel Borkmann
This was originally reported in [1] and posted by Neil Horman [2], he said: Fix up a missed null pointer check in the asconf code. If we don't find a local address, but we pass in an address length of more than 1, we may dereference a NULL laddr pointer. Currently this can't happen, as the only users of the function pass in the value 1 as the addrcnt parameter, but its not hot path, and it doesn't hurt to check for NULL should that ever be the case. The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1 while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR so that we do *not* find a single address in the association's bind address list that is not in the packed array of addresses. If this happens when we have an established association with ASCONF-capable peers, then we could get a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1 and call later sctp_make_asconf_update_ip() with NULL laddr. BUT: this actually won't happen as sctp_bindx_rem() will catch such a case and return with an error earlier. As this is incredably unintuitive and error prone, add a check to catch at least future bugs here. As Neil says, its not hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the single-homed host"). [1] http://www.spinics.net/lists/linux-sctp/msg02132.html [2] http://www.spinics.net/lists/linux-sctp/msg02133.html Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Michio Honda <micchie@sfc.wide.ad.jp> Acked-By: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUEDaniel Borkmann
If we do not add braces around ... mask |= POLLERR | sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0; ... then this condition always evaluates to true as POLLERR is defined as 8 and binary or'd with whatever result comes out of sock_flag(). Hence instead of (X | Y) ? A : B, transform it into X | (Y ? A : B). Unfortunatelty, commit 8facd5fb73 ("net: fix smatch warnings inside datagram_poll") forgot about SCTP. :-( Introduced by 7d4c04fc170 ("net: add option to enable error queue packets waking select"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11net: fib: fib6_add: fix potential NULL pointer dereferenceDaniel Borkmann
When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return with an error in fn = fib6_add_1(), then error codes are encoded into the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we write the error code into err and jump to out, hence enter the if(err) condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for: if (pn != fn && pn->leaf == rt) ... if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO)) ... Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn evaluates to true and causes a NULL-pointer dereference on further checks on pn. Fix it, by setting both NULL in error case, so that pn != fn already evaluates to false and no further dereference takes place. This was first correctly implemented in 4a287eba2 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag"), but the bug got later on introduced by 188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Lin Ming <mlin@ss.pku.edu.cn> Cc: Matti Vaittinen <matti.vaittinen@nsn.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrsDaniel Borkmann
In function __parse_flow_nlattrs(), we check for condition (type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do not return from this function as in other checks. It seems this has been forgotten, as otherwise, we could access beyond the memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1]. Hence, a maliciously prepared nla_type from user space could access beyond this upper limit. Introduced by 03f0d916a ("openvswitch: Mega flow implementation"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Andy Zhou <azhou@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11ipv6/exthdrs: accept tlv which includes only paddingJiri Pirko
In rfc4942 and rfc2460 I cannot find anything which would implicate to drop packets which have only padding in tlv. Current behaviour breaks TAHI Test v6LC.1.2.6. Problem was intruduced in: 9b905fe6843 "ipv6/exthdrs: strict Pad1 and PadN check" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-11Merge tag 'for-linus-3.12-merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p updates from Eric Van Hensbergen: "Minor 9p fixes and tweaks for 3.12 merge window The first fixes namespace issues which causes a kernel NULL pointer dereference, the second fixes uevent handling to work better with udev, and the third switches some code to use srlcpy instead of strncpy in order to be safer. All changes have been baking in for-next for at least 2 weeks" * tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: avoid accessing utsname after namespace has been torn down 9p: send uevent after adding/removing mount_tag attribute fs: 9p: use strlcpy instead of strncpy
2013-09-10Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "This was a very quiet cycle! Just a few bugfixes and some cleanup" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: rpc: let xdr layer allocate gssproxy receieve pages rpc: fix huge kmalloc's in gss-proxy rpc: comment on linux_cred encoding, treat all as unsigned rpc: clean up decoding of gssproxy linux creds svcrpc: remove unused rq_resused nfsd4: nfsd4_create_clid_dir prints uninitialized data nfsd4: fix leak of inode reference on delegation failure Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR" sunrpc: prepare NFS for 2038 nfsd4: fix setlease error return nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
2013-09-10shrinker: convert remaining shrinkers to count/scan APIDave Chinner
Convert the remaining couple of random shrinkers in the tree to the new API. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-09libceph: add function to ensure notifies are completeJosh Durgin
Without a way to flush the osd client's notify workqueue, a watch event that is unregistered could continue receiving callbacks indefinitely. Unregistering the event simply means no new notifies are added to the queue, but there may still be events in the queue that will call the watch callback for the event. If the queue is flushed after the event is unregistered, the caller can be sure no more watch callbacks will occur for the canceled watch. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2013-09-09Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not. - Performance improvements when 2 processes are writing to the same file. - Flush data to disk when an RPCSEC_GSS session timeout is imminent. - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state. - Allow sharing of RPCSEC_GSS caches between different rpc clients. - Fix the broken NFSv4 security auto-negotiation between client and server. - Fix rmdir() to wait for outstanding sillyrename unlinks to complete - Add a tracepoint framework for debugging NFSv4 state recovery issues. - Add tracing to the generic NFS layer. - Add tracing for the SUNRPC socket connection state. - Clean up the rpc_pipefs mount/umount event management. - Merge more patches from Chuck in preparation for NFSv4 migration support" * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits) NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set NFSv4: Allow security autonegotiation for submounts NFSv4: Disallow security negotiation for lookups when 'sec=' is specified NFSv4: Fix security auto-negotiation NFS: Clean up nfs_parse_security_flavors() NFS: Clean up the auth flavour array mess NFSv4.1 Use MDS auth flavor for data server connection NFS: Don't check lock owner compatability unless file is locked (part 2) NFS: Don't check lock owner compatibility in writes unless file is locked nfs4: Map NFS4ERR_WRONG_CRED to EPERM nfs4.1: Add SP4_MACH_CRED write and commit support nfs4.1: Add SP4_MACH_CRED stateid support nfs4.1: Add SP4_MACH_CRED secinfo support nfs4.1: Add SP4_MACH_CRED cleanup support nfs4.1: Add state protection handler nfs4.1: Minimal SP4_MACH_CRED implementation SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid SUNRPC: Add an identifier for struct rpc_clnt SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ...
2013-09-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "This includes both the first pile of Ceph patches (which I sent to torvalds@vger, sigh) and a few new patches that add support for fscache for Ceph. That includes a few fscache core fixes that David Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski for putting this feature together) This first batch of patches (included here) had (has) several important RBD bug fixes, hole punch support, several different cleanups in the page cache interactions, improvements in the truncate code (new truncate mutex to avoid shenanigans with i_mutex), and a series of fixes in the synchronous striping read/write code. On top of that is a random collection of small fixes all across the tree (error code checks and error path cleanup, obsolete wq flags, etc)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits) ceph: use d_invalidate() to invalidate aliases ceph: remove ceph_lookup_inode() ceph: trivial buildbot warnings fix ceph: Do not do invalidate if the filesystem is mounted nofsc ceph: page still marked private_2 ceph: ceph_readpage_to_fscache didn't check if marked ceph: clean PgPrivate2 on returning from readpages ceph: use fscache as a local presisent cache fscache: Netfs function for cleanup post readpages FS-Cache: Fix heading in documentation CacheFiles: Implement interface to check cache consistency FS-Cache: Add interface to check consistency of a cached object rbd: fix null dereference in dout rbd: fix buffer size for writes to images with snapshots libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc rbd: fix I/O error propagation for reads ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem ceph: allow sync_read/write return partial successed size of read/write. ceph: fix bugs about handling short-read for sync read mode. ceph: remove useless variable revoked_rdcache ...
2013-09-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users
2013-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A quick set of fixes, some to deal with fallout from yesterday's net-next merge. 1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled, from Dmitry Kravkov. 2) Fix a bnx2x regression caused by one of Dave Jones's mistaken braces changes, from Eilon Greenstein. 3) Add some protective filtering in the netlink tap code, from Daniel Borkmann. 4) Fix TCP congestion window growth regression after timeouts, from Yuchung Cheng. 5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: tcp: properly increase rcv_ssthresh for ofo packets net: add documentation for BQL helpers mlx5: remove unused MLX5_DEBUG param in Kconfig bnx2x: Restore a call to config_init bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set tcp: fix no cwnd growth after timeout net: netlink: filter particular protocols from analyzers
2013-09-06tcp: properly increase rcv_ssthresh for ofo packetsEric Dumazet
TCP receive window handling is multi staged. A socket has a memory budget, static or dynamic, in sk_rcvbuf. Because we do not really know how this memory budget translates to a TCP window (payload), TCP announces a small initial window (about 20 MSS). When a packet is received, we increase TCP rcv_win depending on the payload/truesize ratio of this packet. Good citizen packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2 This heuristic takes place in tcp_grow_window() Problem is : We currently call tcp_grow_window() only for in-order packets. This means that reorders or packet losses stop proper grow of rcv_win, and senders are unable to benefit from fast recovery, or proper reordering level detection. Really, a packet being stored in OFO queue is not a bad citizen. It should be part of the game as in-order packets. In our traces, we very often see sender is limited by linux small receive windows, even if linux hosts use autotuning (DRS) and should allow rcv_win to grow to ~3MB. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-06tcp: fix no cwnd growth after timeoutYuchung Cheng
In commit 0f7cc9a3 "tcp: increase throughput when reordering is high", it only allows cwnd to increase in Open state. This mistakenly disables slow start after timeout (CA_Loss). Moreover cwnd won't grow if the state moves from Disorder to Open later in tcp_fastretrans_alert(). Therefore the correct logic should be to allow cwnd to grow as long as the data is received in order in Open, Loss, or even Disorder state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-06net: netlink: filter particular protocols from analyzersDaniel Borkmann
Fix finer-grained control and let only a whitelist of allowed netlink protocols pass, in our case related to networking. If later on, other subsystems decide they want to add their protocol as well to the list of allowed protocols they shall simply add it. While at it, we also need to tell what protocol is in use otherwise BPF_S_ANC_PROTOCOL can not pick it up (as it's not filled out). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-06Merge tag 'fscache-fixes-for-ceph' into wip-fscacheMilosz Tanski
Patches for Ceph FS-Cache support
2013-09-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...
2013-09-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: "Highlights: - conversion of HID subsystem to use devm-based resource management, from Benjamin Tissoires - i2c-hid support for DT bindings, from Benjamin Tissoires - much improved support for Win8-multitouch devices, from Benjamin Tissoires - cleanup of core code using common hidinput_input_event(), from David Herrmann - fix for bug in implement() access to the bit stream (causing oops) that has been present in the code for ages, but devices that are able to trigger it have started to appear only now, from Jiri Kosina - fixes for CVE-2013-2899, CVE-2013-2898, CVE-2013-2896, CVE-2013-2892, CVE-2013-2888 (all triggerable only by specially crafted malicious HW devices plugged into the system), from Kees Cook - hidraw oops fix, from Manoj Chourasia - various smaller fixes here and there, support for a bunch of new devices by various contributors" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (53 commits) HID: MAINTAINERS: add roccat drivers HID: hid-sensor-hub: change kmalloc + memcpy by kmemdup HID: hid-sensor-hub: move to devm_kzalloc HID: hid-sensor-hub: fix indentation accross the code HID: move HID_REPORT_TYPES closer to the report-definitions HID: check for NULL field when setting values HID: picolcd_core: validate output report details HID: sensor-hub: validate feature report details HID: ntrig: validate feature report details HID: pantherlord: validate output report details HID: hid-wiimote: print small buffers via %*phC HID: uhid: improve uhid example client HID: Correct the USB IDs for the new Macbook Air 6 HID: wiimote: add support for Guitar-Hero guitars HID: wiimote: add support for Guitar-Hero drums Input: introduce BTN/ABS bits for drums and guitars HID: battery: don't do DMA from stack HID: roccat: add support for KonePureOptical v2 HID: picolcd: Prevent NULL pointer dereference on _remove() HID: usbhid: quirk for N-Trig DuoSense Touch Screen ...