summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-11-06ipv6: Allow sending packets through tunnels with wildcard endpointsSteffen Klassert
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit packets through an ipv6 tunnel. This capability is set when the tunnel gets configured, based on the tunnel endpoint addresses. On tunnels with wildcard tunnel endpoints, we need to do the capabiltiy checking on a per packet basis like it is done in the receive path. This patch extends ip6_tnl_xmit_ctl() to take local and remote addresses as parameters to allow for per packet capabiltiy checking. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05fast_hash: avoid indirect function callsHannes Frederic Sowa
By default the arch_fast_hash hashing function pointers are initialized to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get updated to the CRC32 ones. This dispatching scheme incurs a function pointer lookup and indirect call for every hashing operation. rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing functions in its structure, too, causing two indirect branches per hashing operation. Using alternative_call we can get away with one of those indirect branches. Acked-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05ipv6: move INET6_MATCH() to include/net/inet6_hashtables.hWANG Cong
It is only used in net/ipv6/inet6_hashtables.c. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05net: Add and use skb_copy_datagram_msg() helper.David S. Miller
This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05gue: TX support for using remote checksum offload optionTom Herbert
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure remote checksum offload on an IP tunnel. Add logic in gue_build_header to insert remote checksum offload option. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05gue: Protocol constants for remote checksum offloadTom Herbert
Define a private flag for remote checksun offload as well as a length for the option. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05udp: Changes to udp_offload to support remote checksum offloadTom Herbert
Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote checksum offload being done (in this case inner checksum must not be offloaded to the NIC). Added logic in __skb_udp_tunnel_segment to handle remote checksum offload case. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05gue: Add infrastructure for flags and optionsTom Herbert
Add functions and basic definitions for processing standard flags, private flags, and control messages. This includes definitions to compute length of optional fields corresponding to a set of flags. Flag validation is in validate_gue_flags function. This checks for unknown flags, and that length of optional fields is <= length in guehdr hlen. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05net: Move fou_build_header into fou.c and refactorTom Herbert
Move fou_build_header out of ip_tunnel.c and into fou.c splitting it up into fou_build_header, gue_build_header, and fou_build_udp. This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap to call fou_build_header or gue_build_header based on the tunnel encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen functions which are called by ip_encap_hlen. New net/fou.h has prototypes and defines for this. Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels can use FOU/GUE and fou module is also selected. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04net: allow setting ecn via routing tableFlorian Westphal
This patch allows to set ECN on a per-route basis in case the sysctl tcp_ecn is not set to 1. In other words, when ECN is set for specific routes, it provides a tcp_ecn=1 behaviour for that route while the rest of the stack acts according to the global settings. One can use 'ip route change dev $dev $net features ecn' to toggle this. Having a more fine-grained per-route setting can be beneficial for various reasons, for example, 1) within data centers, or 2) local ISPs may deploy ECN support for their own video/streaming services [1], etc. There was a recent measurement study/paper [2] which scanned the Alexa's publicly available top million websites list from a vantage point in US, Europe and Asia: Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side only ECN") ;)); the break in connectivity on-path was found is about 1 in 10,000 cases. Timeouts rather than receiving back RSTs were much more common in the negotiation phase (and mostly seen in the Alexa middle band, ranks around 50k-150k): from 12-thousand hosts on which there _may_ be ECN-linked connection failures, only 79 failed with RST when _not_ failing with RST when ECN is not requested. It's unclear though, how much equipment in the wild actually marks CE when buffers start to fill up. We thought about a fallback to non-ECN for retransmitted SYNs as another global option (which could perhaps one day be made default), but as Eric points out, there's much more work needed to detect broken middleboxes. Two examples Eric mentioned are buggy firewalls that accept only a single SYN per flow, and middleboxes that successfully let an ECN flow establish, but later mark CE for all packets (so cwnd converges to 1). [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15 [2] http://ecn.ethz.ch/ Joint work with Daniel Borkmann. Reference: http://thread.gmane.org/gmane.linux.network/335797 Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04syncookies: split cookie_check_timestamp() into two functionsFlorian Westphal
The function cookie_check_timestamp(), both called from IPv4/6 context, is being used to decode the echoed timestamp from the SYN/ACK into TCP options used for follow-up communication with the peer. We can remove ECN handling from that function, split it into a separate one, and simply rename the original function into cookie_decode_options(). cookie_decode_options() just fills in tcp_option struct based on the echoed timestamp received from the peer. Anything that fails in this function will actually discard the request socket. While this is the natural place for decoding options such as ECN which commit 172d69e63c7f ("syncookies: add support for ECN") added, we argue that in particular for ECN handling, it can be checked at a later point in time as the request sock would actually not need to be dropped from this, but just ECN support turned off. Therefore, we split this functionality into cookie_ecn_ok(), which tells us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled. This prepares for per-route ECN support: just looking at the tcp_ecn sysctl won't be enough anymore at that point; if the timestamp indicates ECN and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric. This would mean adding a route lookup to cookie_check_timestamp(), which we definitely want to avoid. As we already do a route lookup at a later point in cookie_{v4,v6}_check(), we can simply make use of that as well for the new cookie_ecn_ok() function w/o any additional cost. Joint work with Daniel Borkmann. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03net: add rbnode to struct sk_buffEric Dumazet
Yaogong replaces TCP out of order receive queue by an RB tree. As netem already does a private skb->{next/prev/tstamp} union with a 'struct rb_node', lets do this in a cleaner way. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yaogong Wang <wygivan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03net/mlx4_core: Add retrieval of CONFIG_DEV parametersMatan Barak
Add code to issue CONFIG_DEV "get" firmware command. This command is used in order to obtain certain parameters used for supporting various RX checksumming options and vxlan UDP port. The GET operation is allowed for VFs too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03net: shrink struct softnet_dataEric Dumazet
flow_limit in struct softnet_data is only read from local cpu and can be moved to fill a hole, reducing softnet_data size by 64 bytes on x86_64 While we are at it, move output_queue, output_queue_tailp and completion_queue, so that rx / tx paths touch a single cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/marvell.c Simple overlapping changes in drivers/net/phy/marvell.c Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A bit has accumulated, but it's been a week or so since my last batch of post-merge-window fixes, so... 1) Missing module license in netfilter reject module, from Pablo. Lots of people ran into this. 2) Off by one in mac80211 baserate calculation, from Karl Beldan. 3) Fix incorrect return value from ax88179_178a driver's set_mac_addr op, which broke use of it with bonding. From Ian Morgan. 4) Checking of skb_gso_segment()'s return value was not all encompassing, it can return an SKB pointer, a pointer error, or NULL. Fix from Florian Westphal. This is crummy, and longer term will be fixed to just return error pointers or a real SKB. 6) Encapsulation offloads not being handled by skb_gso_transport_seglen(). From Florian Westphal. 7) Fix deadlock in TIPC stack, from Ying Xue. 8) Fix performance regression from using rhashtable for netlink sockets. The problem was the synchronize_net() invoked for every socket destroy. From Thomas Graf. 9) Fix bug in eBPF verifier, and remove the strong dependency of BPF on NET. From Alexei Starovoitov. 10) In qdisc_create(), use the correct interface to allocate ->cpu_bstats, otherwise the u64_stats_sync member isn't initialized properly. From Sabrina Dubroca. 11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter. 12) nf_tables_newchain() was erroneously expecting error pointers from netdev_alloc_pcpu_stats(). It only returna a valid pointer or NULL. From Sabrina Dubroca. 13) Fix use-after-free in _decode_session6(), from Li RongQing. 14) When we set the TX flow hash on a socket, we mistakenly do so before we've nailed down the final source port. Move the setting deeper to fix this. From Sathya Perla. 15) NAPI budget accounting in amd-xgbe driver was counting descriptors instead of full packets, fix from Thomas Lendacky. 16) Fix total_data_buflen calculation in hyperv driver, from Haiyang Zhang. 17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke Mehrtens. 18) Fix mis-use of per-cpu memory in TCP md5 code. The problem is that something that ends up being vmalloc memory can't be passed to the crypto hash routines via scatter-gather lists. From Eric Dumazet. 19) Fix regression in promiscuous mode enabling in cdc-ether, from Olivier Blin. 20) Bucket eviction and frag entry killing can race with eachother, causing an unlink of the object from the wrong list. Fix from Nikolay Aleksandrov. 21) Missing initialization of spinlock in cxgb4 driver, from Anish Bhatt. 22) Do not cache ipv4 routing failures, otherwise if the sysctl for forwarding is subsequently enabled this won't be seen. From Nicolas Cavallari" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits) drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode drivers: net: cpsw: Fix broken loop condition in switch mode net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 stmmac: pci: set default of the filter bins net: smc91x: Fix gpios for device tree based booting mpls: Allow mpls_gso to be built as module mpls: Fix mpls_gso handler. r8152: stop submitting intr for -EPROTO netfilter: nft_reject_bridge: restrict reject to prerouting and input netfilter: nft_reject_bridge: don't use IP stack to reject traffic netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing drivers/net: macvtap and tun depend on INET drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets drivers/net: Disable UFO through virtio net: skb_fclone_busy() needs to detect orphaned skb gre: Use inner mac length when computing tunnel length mlx4: Avoid leaking steering rules on flow creation error flow net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN ...
2014-10-31Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Various scheduler fixes all over the place: three SCHED_DL fixes, three sched/numa fixes, two generic race fixes and a comment fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/dl: Fix preemption checks sched: Update comments for CLONE_NEWNS sched: stop the unbound recursion in preempt_schedule_context() sched/fair: Fix division by zero sysctl_numa_balancing_scan_size sched/fair: Care divide error in update_task_scan_period() sched/numa: Fix unsafe get_task_struct() in task_numa_assign() sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer() sched/deadline: Don't replenish from a !SCHED_DEADLINE entity sched: Fix race between task_group and sched_task_group
2014-10-31Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus on the kernel side: - a revert for a newly introduced PMU driver which isn't complete yet and where we ran out of time with fixes (to be tried again in v3.19) - this makes up for a large chunk of the diffstat. - compilation warning fixes - a printk message fix - event_idx usage fixes/cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Trivial typo fix for --demangle perf tools: Fix report -F dso_from for data without branch info perf tools: Fix report -F dso_to for data without branch info perf tools: Fix report -F symbol_from for data without branch info perf tools: Fix report -F symbol_to for data without branch info perf tools: Fix report -F mispredict for data without branch info perf tools: Fix report -F in_tx for data without branch info perf tools: Fix report -F abort for data without branch info perf tools: Make CPUINFO_PROC an array to support different kernel versions perf callchain: Use global caching provided by libunwind perf/x86/intel: Revert incomplete and undocumented Broadwell client support perf/x86: Fix compile warnings for intel_uncore perf: Fix typos in sample code in the perf_event.h header perf: Fix and clean up initialization of pmu::event_idx perf: Fix bogus kernel printk perf diff: Add missing hists__init() call at tool start
2014-10-31Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "The tree contains two RCU fixes and a compiler quirk comment fix" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Make rcu_barrier() understand about missing rcuo kthreads compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles rcu: More on deadlock between CPU hotplug and expedited grace periods
2014-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== netfilter/ipvs fixes for net The following patchset contains fixes for netfilter/ipvs. This round of fixes is larger than usual at this stage, specifically because of the nf_tables bridge reject fixes that I would like to see in 3.18. The patches are: 1) Fix a null-pointer dereference that may occur when logging errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull out crosses_local_route_boundary logic") in v3.17-rc5. 2) Update hook mask in nft_reject_bridge so we can also filter out packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting"), which needs this chunk to work. 3) Two patches to refactor common code to forge the IPv4 and IPv6 reject packets from the bridge. These are required by the nf_tables reject bridge fix. 4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject packets from the bridge. The idea is to forge the reject packets and inject them to the original port via br_deliver() which is now exported for that purpose. 5) Restrict nft_reject_bridge to bridge prerouting and input hooks. the original skbuff may cloned after prerouting when the bridge stack needs to flood it to several bridge ports, it is too late to reject the traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functionsPablo Neira Ayuso
That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_ip6hdr_put(): to build the IPv6 header. * nf_reject_ip6_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functionsPablo Neira Ayuso
That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_iphdr_put(): to build the IPv4 header. * nf_reject_ip_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-30drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packetsBen Hutchings
UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30net: skb_fclone_busy() needs to detect orphaned skbEric Dumazet
Some drivers are unable to perform TX completions in a bound time. They instead call skb_orphan() Problem is skb_fclone_busy() has to detect this case, otherwise we block TCP retransmits and can freeze unlucky tcp sessions on mostly idle hosts. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30net: dsa: Add support for reading switch registers with ethtoolGuenter Roeck
Add support for reading switch registers with 'ethtool -d'. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30net: dsa: Add support for switch EEPROM accessGuenter Roeck
On some chips it is possible to access the switch eeprom. Add infrastructure support for it. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30net: dsa: Add support for reporting switch chip temperaturesGuenter Roeck
Some switches provide chip temperature data. Add support for reporting it through the hwmon subsystem. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30Merge branch 'urgent-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull two RCU fixes from Paul E. McKenney: " - Complete the work of commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and expedited grace periods), which was intended to allow synchronize_sched_expedited() to be safely used when holding locks acquired by CPU-hotplug notifiers. This commit makes the put_online_cpus() avoid the deadlock instead of just handling the get_online_cpus(). - Complete the work of commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs), which was intended to allow RCU to avoid allocating unneeded kthreads on systems where the firmware says that there are more CPUs than are really present. This commit makes rcu_barrier() aware of the mismatch, so that it doesn't hang waiting for non-existent CPUs. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29Merge branch 'akpm' (incoming from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm/balloon_compaction: fix deflation when compaction is disabled sh: fix sh770x SCIF memory regions zram: avoid NULL pointer access in concurrent situation mm/slab_common: don't check for duplicate cache names ocfs2: fix d_splice_alias() return code checking mm: rmap: split out page_remove_file_rmap() mm: memcontrol: fix missed end-writeback page accounting mm: page-writeback: inline account_page_dirtied() into single caller lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}() drivers/rtc/rtc-bq32k.c: fix register value memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock kernel/kmod: fix use-after-free of the sub_info structure drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc mm, thp: fix collapsing of hugepages on madvise drivers: of: add return value to of_reserved_mem_device_init() mm: free compound page with correct order gcov: add ARM64 to GCOV_PROFILE_ALL fsnotify: next_i is freed during fsnotify_unmount_inodes. mm/compaction.c: avoid premature range skip in isolate_migratepages_range ...
2014-10-29mm: memcontrol: fix missed end-writeback page accountingJohannes Weiner
Commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed page migration to uncharge the old page right away. The page is locked, unmapped, truncated, and off the LRU, but it could race with writeback ending, which then doesn't unaccount the page properly: test_clear_page_writeback() migration wait_on_page_writeback() TestClearPageWriteback() mem_cgroup_migrate() clear PCG_USED mem_cgroup_update_page_stat() if (PageCgroupUsed(pc)) decrease memcg pages under writeback release pc->mem_cgroup->move_lock The per-page statistics interface is heavily optimized to avoid a function call and a lookup_page_cgroup() in the file unmap fast path, which means it doesn't verify whether a page is still charged before clearing PageWriteback() and it has to do it in the stat update later. Rework it so that it looks up the page's memcg once at the beginning of the transaction and then uses it throughout. The charge will be verified before clearing PageWriteback() and migration can't uncharge the page as long as that is still set. The RCU lock will protect the memcg past uncharge. As far as losing the optimization goes, the following test results are from a microbenchmark that maps, faults, and unmaps a 4GB sparse file three times in a nested fashion, so that there are two negative passes that don't account but still go through the new transaction overhead. There is no actual difference: old: 33.195102545 seconds time elapsed ( +- 0.01% ) new: 33.199231369 seconds time elapsed ( +- 0.03% ) The time spent in page_remove_rmap()'s callees still adds up to the same, but the time spent in the function itself seems reduced: # Children Self Command Shared Object Symbol old: 0.12% 0.11% filemapstress [kernel.kallsyms] [k] page_remove_rmap new: 0.12% 0.08% filemapstress [kernel.kallsyms] [k] page_remove_rmap Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: <stable@vger.kernel.org> [3.17.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29mm: page-writeback: inline account_page_dirtied() into single callerJohannes Weiner
A follow-up patch would have changed the call signature. To save the trouble, just fold it instead. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: <stable@vger.kernel.org> [3.17.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29mm, thp: fix collapsing of hugepages on madviseDavid Rientjes
If an anonymous mapping is not allowed to fault thp memory and then madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never collapse this memory into thp memory. This occurs because the madvise(2) handler for thp, hugepage_madvise(), clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags until the final action of madvise_behavior(). This causes the khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when the vma had previously had VM_NOHUGEPAGE set. Fix this by passing the correct vma flags to the khugepaged mm slot handler. There's no chance khugepaged can run on this vma until after madvise_behavior() returns since we hold mm->mmap_sem. It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags in hugepage_advise(), but I didn't want to introduce special case behavior into madvise_behavior(). I think it's best to just let it always set vma->vm_flags itself. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Suleiman Souhlal <suleiman@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29drivers: of: add return value to of_reserved_mem_device_init()Marek Szyprowski
Driver calling of_reserved_mem_device_init() might be interested if the initialization has been successful or not, so add support for returning error code. This fixes a build warining caused by commit 7bfa5ab6fa1b ("drivers: dma-coherent: add initialization from device tree"), which has been merged without this change and without fixing function return value. Fixes: 7bfa5ab6fa1b1 ("drivers: dma-coherent: add initialization from device tree") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Josh Cartwright <joshc@codeaurora.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29neigh: optimize neigh_parms_release()Nicolas Dichtel
In neigh_parms_release() we loop over all entries to find the entry given in argument and being able to remove it from the list. By using a double linked list, we can avoid this loop. Here are some numbers with 30 000 dummy interfaces configured: Before the patch: $ time rmmod dummy real 2m0.118s user 0m0.000s sys 1m50.048s After the patch: $ time rmmod dummy real 1m9.970s user 0m0.000s sys 0m47.976s Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29net: introduce napi_schedule_irqoff()Eric Dumazet
napi_schedule() can be called from any context and has to mask hard irqs. Add a variant that can only be called from hard interrupts handlers or when irqs are already masked. Many NIC drivers can use it from their hard IRQ handler instead of generic variant. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29net: ipv6: Add a sysctl to make optimistic addresses useful candidatesErik Kline
Add a sysctl that causes an interface's optimistic addresses to be considered equivalent to other non-deprecated addresses for source address selection purposes. Preferred addresses will still take precedence over optimistic addresses, subject to other ranking in the source address selection algorithm. This is useful where different interfaces are connected to different networks from different ISPs (e.g., a cell network and a home wifi network). The current behaviour complies with RFC 3484/6724, and it makes sense if the host has only one interface, or has multiple interfaces on the same network (same or cooperating administrative domain(s), but not in the multiple distinct networks case. For example, if a mobile device has an IPv6 address on an LTE network and then connects to IPv6-enabled wifi, while the wifi IPv6 address is undergoing DAD, IPv6 connections will try use the wifi default route with the LTE IPv6 address, and will get stuck until they time out. Also, because optimistic nodes can receive frames, issue an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC flag appropriately set). A second RTM_NEWADDR is sent if DAD completes (the address flags have changed), otherwise an RTM_DELADDR is sent. Also: add an entry in ip-sysctl.txt for optimistic_dad. Signed-off-by: Erik Kline <ek@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29tcp: allow for bigger reordering levelEric Dumazet
While testing upcoming Yaogong patch (converting out of order queue into an RB tree), I hit the max reordering level of linux TCP stack. Reordering level was limited to 127 for no good reason, and some network setups [1] can easily reach this limit and get limited throughput. Allow a new max limit of 300, and add a sysctl to allow admins to even allow bigger (or lower) values if needed. [1] Aggregation of links, per packet load balancing, fabrics not doing deep packet inspections, alternative TCP congestion modules... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yaogong Wang <wygivan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A small collection of fixes for the current kernel. This contains: - Two error handling fixes from Jan Kara. One for null_blk on failure to add a device, and the other for the block/scsi_ioctl SCSI_IOCTL_SEND_COMMAND fixing up the error jump point. - A commit added in the merge window for the bio integrity bits unfortunately disabled merging for all requests if CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that integrity checking wont disallow merges when not enabled. - A fix from Ming Lei for merging and generating too many segments. This caused a BUG in virtio_blk. - Two error handling printk() fixups from Robert Elliott, improving the information given when we rate limit. - Error handling fixup on elevator_init() failure from Sudip Mukherjee. - A fix from Tony Battersby, fixing up a memory leak in the scatterlist handling with scsi-mq" * 'for-linus' of git://git.kernel.dk/linux-block: block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined lib/scatterlist: fix memory leak with scsi-mq block: fix wrong error return in elevator_init() scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND null_blk: Cleanup error recovery in null_add_dev() blk-merge: recaculate segment if it isn't less than max segments fs: clarify rate limit suppressed buffer I/O errors fs: merge I/O error prints into one line
2014-10-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - workarounds for a couple of misbehaving Elan Touchscreens, by Adel Gadllah - fix for TransducerSerialNumber field implementation, by Jason Gerecke - a couple of new HID usages (added by HUT), by Olivier Gay * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: input: Fix TransducerSerialNumber implementation HID: add keyboard input assist hid usages HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b
2014-10-28block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not definedMartin K. Petersen
Commit 4eaf99beadce switched to returning bool and as a result reversed the logic of the integrity merge checks. However, the empty stubs used when the block integrity code is compiled out were still returning 0. Make these stubs return "true". Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Michael L. Semon <mlsemon35@gmail.com> Tested-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-28usbnet: add a callback for set_rx_modeOlivier Blin
To delegate promiscuous mode and multicast filtering to the subdriver. Signed-off-by: Olivier Blin <olivier.blin@softathome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting supportSaeed Mahameed
Added 100M, 20G and 56G ethtool speed reporting support. Update mlx4_en_test_speed self test with the new speeds. Defined new link speeds in include/uapi/linux/ethtool.h: +#define SPEED_20000 20000 +#define SPEED_40000 40000 +#define SPEED_56000 56000 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28net/mlx4_core: Add ethernet backplane autoneg device capabilitySaeed Mahameed
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev capSaeed Mahameed
Adding ACCESS REG mlx4 command and use it to implement Query method for PTYS (Port Type and Speed Register). Query and store eth_prot_ctrl dev cap. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool supportSaeed Mahameed
Added support for get_module_info/get_module_eeprom ethtool support for cable info reading. Added new cable types enum in include/uapi/linux/ethtool.h for ethtool use. +#define ETH_MODULE_SFF_8636 0x3 +#define ETH_MODULE_SFF_8636_LEN 256 +#define ETH_MODULE_SFF_8436 0x4 +#define ETH_MODULE_SFF_8436_LEN 256 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28net/mlx4_core: Introduce mlx4_get_module_info for cable module info readingSaeed Mahameed
Added new MAD_IFC command to read cable module info with attribute id (0xFF60). Update include/linux/mlx4/device.h with function declaration (mlx4_get_module_info) and the needed defines/enums for future use. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28datapath: Rename last_action() as nla_is_last() and move to netlink.hSimon Horman
The original motivation for this change was to allow the helper to be used in files other than actions.c as part of work on an odp select group action. It was as pointed out by Thomas Graf that this helper would be best off living in netlink.h. Furthermore, I think that the generic nature of this helper means it is best off in netlink.h regardless of if it is used more than one .c file or not. Thus, I would like it considered independent of the work on an odp select group action. Cc: Thomas Graf <tgraf@suug.ch> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Andy Zhou <azhou@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28skbuff.h: fix kernel-doc warning for headers_endRandy Dunlap
Fix kernel-doc warning in <linux/skbuff.h> by making both headers_start and headers_end private fields. Warning(..//include/linux/skbuff.h:654): No description found for parameter 'headers_end[0]' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28net: pxa168_eth: Fix providing of phy_interface mode on platform_dataSebastian Hesselbarth
Do not add phy include to the board file but platform_data include instead. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28rcu: Make rcu_barrier() understand about missing rcuo kthreadsPaul E. McKenney
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit 35ce7f29a44a, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), It is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti <yaneti@declera.com> Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Eric B Munson <emunson@akamai.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Eric B Munson <emunson@akamai.com> Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Yanko Kaneti <yaneti@declera.com> Tested-by: Kevin Fenzi <kevin@scrye.com> Tested-by: Meelis Roos <mroos@linux.ee>