summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2016-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and co. at Google. https://lwn.net/Articles/701165/ 2) Do TCP Small Queues for retransmits, from Eric Dumazet. 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei Starovoitov. 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai. 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn. 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker. 7) Support ndo_poll_controller in mlx5, from Calvin Owens. 8) Move VRF processing to an output hook and allow l3mdev to be loopback, from David Ahern. 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern. 10) Congestion control in RXRPC, from David Howells. 11) Support geneve RX offload in ixgbe, from Emil Tantilov. 12) When hitting pressure for new incoming TCP data SKBs, perform a partial rathern than a full purge of the OFO queue (which could be huge). From Eric Dumazet. 13) Convert XFRM state and policy lookups to RCU, from Florian Westphal. 14) Support RX network flow classification to igb, from Gangfeng Huang. 15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski. 16) New skbmod packet action, from Jamal Hadi Salim. 17) Remove some inefficiencies in snmp proc output, from Jia He. 18) Add FIB notifications to properly propagate route changes to hardware which is doing forwarding offloading. From Jiri Pirko. 19) New dsa driver for qca8xxx chips, from John Crispin. 20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej Żenczykowski. 21) Add L3 mode to ipvlan, from Mahesh Bandewar. 22) Support 802.1ad in mlx4, from Moshe Shemesh. 23) Support hardware LRO in mediatek driver, from Nelson Chang. 24) Add TC offloading to mlx5, from Or Gerlitz. 25) Convert various drivers to ethtool ksettings interfaces, from Philippe Reynes. 26) TX max rate limiting for cxgb4, from Rahul Lakkireddy. 27) NAPI support for ath10k, from Rajkumar Manoharan. 28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed. 29) UDP replicast support in TIPC, from Richard Alpe. 30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru. 31) Support BQL in thunderx driver, from Sunil Goutham. 32) TSO support in alx driver, from Tobias Regnery. 33) Add stream parser engine and use it in kcm. 34) Support async DHCP replies in ipconfig module, from Uwe Kleine-König. 35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits) mlxsw: switchx2: Fix misuse of hard_header_len mlxsw: spectrum: Fix misuse of hard_header_len net/faraday: Stop NCSI device on shutdown net/ncsi: Introduce ncsi_stop_dev() net/ncsi: Rework the channel monitoring net/ncsi: Allow to extend NCSI request properties net/ncsi: Rework request index allocation net/ncsi: Don't probe on the reserved channel ID (0x1f) net/ncsi: Introduce NCSI_RESERVED_CHANNEL net/ncsi: Avoid unused-value build warning from ia64-linux-gcc net: Add netdev all_adj_list refcnt propagation to fix panic net: phy: Add Edge-rate driver for Microsemi PHYs. vmxnet3: Wake queue from reset work i40e: avoid NULL pointer dereference and recursive errors on early PCI error qed: Add RoCE ll2 & GSI support qed: Add support for memory registeration verbs qed: Add support for QP verbs qed: PD,PKEY and CQ verb support qed: Add support for RoCE hw init qede: Add qedr framework ...
2016-10-05mm: filemap: don't plant shadow entries without radix tree nodeJohannes Weiner
When the underflow checks were added to workingset_node_shadow_dec(), they triggered immediately: kernel BUG at ./include/linux/swap.h:276! invalid opcode: 0000 [#1] SMP Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1 Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016 task: ffff8faa93ecd940 task.stack: ffff8faa7f478000 RIP: page_cache_tree_insert+0xf1/0x100 Call Trace: __add_to_page_cache_locked+0x12e/0x270 add_to_page_cache_lru+0x4e/0xe0 mpage_readpages+0x112/0x1d0 blkdev_readpages+0x1d/0x20 __do_page_cache_readahead+0x1ad/0x290 force_page_cache_readahead+0xaa/0x100 page_cache_sync_readahead+0x3f/0x50 generic_file_read_iter+0x5af/0x740 blkdev_read_iter+0x35/0x40 __vfs_read+0xe1/0x130 vfs_read+0x96/0x130 SyS_read+0x55/0xc0 entry_SYSCALL_64_fastpath+0x13/0x8f Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00 RIP page_cache_tree_insert+0xf1/0x100 This is a long-standing bug in the way shadow entries are accounted in the radix tree nodes. The shrinker needs to know when radix tree nodes contain only shadow entries, no pages, so node->count is split in half to count shadows in the upper bits and pages in the lower bits. Unfortunately, the radix tree implementation doesn't know of this and assumes all entries are in node->count. When there is a shadow entry directly in root->rnode and the tree is later extended, the radix tree implementation will copy that entry into the new node and and bump its node->count, i.e. increases the page count bits. Once the shadow gets removed and we subtract from the upper counter, node->count underflows and triggers the warning. Afterwards, without node->count reaching 0 again, the radix tree node is leaked. Limit shadow entries to when we have actual radix tree nodes and can count them properly. That means we lose the ability to detect refaults from files that had only the first page faulted in at eviction time. Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The new features and main improvements in this merge for v4.9 - Support for the UBSAN sanitizer - Set HAVE_EFFICIENT_UNALIGNED_ACCESS, it improves the code in some places - Improvements for the in-kernel fpu code, in particular the overhead for multiple consecutive in kernel fpu users is recuded - Add a SIMD implementation for the RAID6 gen and xor operations - Add RAID6 recovery based on the XC instruction - The PCI DMA flush logic has been improved to increase the speed of the map / unmap operations - The time synchronization code has seen some updates And bug fixes all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390/con3270: fix insufficient space padding s390/con3270: fix use of uninitialised data MAINTAINERS: update DASD maintainer s390/cio: fix accidental interrupt enabling during resume s390/dasd: add missing \n to end of dev_err messages s390/config: Enable config options for Docker s390/dasd: make query host access interruptible s390/dasd: fix panic during offline processing s390/dasd: fix hanging offline processing s390/pci_dma: improve lazy flush for unmap s390/pci_dma: split dma_update_trans s390/pci_dma: improve map_sg s390/pci_dma: simplify dma address calculation s390/pci_dma: remove dma address range check iommu/s390: simplify registration of I/O address translation parameters s390: migrate exception table users off module.h and onto extable.h s390: export header for CLP ioctl s390/vmur: fix irq pointer dereference in int handler s390/dasd: add missing KOBJ_CHANGE event for unformatted devices s390: enable UBSAN ...
2016-10-03Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "Yet another batch of cpu hotplug core updates and conversions: - Provide core infrastructure for multi instance drivers so the drivers do not have to keep custom lists. - Convert custom lists to the new infrastructure. The block-mq custom list conversion comes through the block tree and makes the diffstat tip over to more lines removed than added. - Handle unbalanced hotplug enable/disable calls more gracefully. - Remove the obsolete CPU_STARTING/DYING notifier support. - Convert another batch of notifier users. The relayfs changes which conflicted with the conversion have been shipped to me by Andrew. The remaining lot is targeted for 4.10 so that we finally can remove the rest of the notifiers" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) cpufreq: Fix up conversion to hotplug state machine blk/mq: Reserve hotplug states for block multiqueue x86/apic/uv: Convert to hotplug state machine s390/mm/pfault: Convert to hotplug state machine mips/loongson/smp: Convert to hotplug state machine mips/octeon/smp: Convert to hotplug state machine fault-injection/cpu: Convert to hotplug state machine padata: Convert to hotplug state machine cpufreq: Convert to hotplug state machine ACPI/processor: Convert to hotplug state machine virtio scsi: Convert to hotplug state machine oprofile/timer: Convert to hotplug state machine block/softirq: Convert to hotplug state machine lib/irq_poll: Convert to hotplug state machine x86/microcode: Convert to hotplug state machine sh/SH-X3 SMP: Convert to hotplug state machine ia64/mca: Convert to hotplug state machine ARM/OMAP/wakeupgen: Convert to hotplug state machine ARM/shmobile: Convert to hotplug state machine arm64/FP/SIMD: Convert to hotplug state machine ...
2016-10-03Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull low-level x86 updates from Ingo Molnar: "In this cycle this topic tree has become one of those 'super topics' that accumulated a lot of changes: - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on x86 - preceded by an array of changes. v4.8 saw preparatory changes in this area already - this is the rest of the work. Includes the thread stack caching performance optimization. (Andy Lutomirski) - switch_to() cleanups and all around enhancements. (Brian Gerst) - A large number of dumpstack infrastructure enhancements and an unwinder abstraction. The secret long term plan is safe(r) live patching plus maybe another attempt at debuginfo based unwinding - but all these current bits are standalone enhancements in a frame pointer based debug environment as well. (Josh Poimboeuf) - More __ro_after_init and const annotations. (Kees Cook) - Enable KASLR for the vmemmap memory region. (Thomas Garnier)" [ The virtually mapped stack changes are pretty fundamental, and not x86-specific per se, even if they are only used on x86 right now. ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) x86/asm: Get rid of __read_cr4_safe() thread_info: Use unsigned long for flags x86/alternatives: Add stack frame dependency to alternative_call_2() x86/dumpstack: Fix show_stack() task pointer regression x86/dumpstack: Remove dump_trace() and related callbacks x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder oprofile/x86: Convert x86_backtrace() to use the new unwinder x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder perf/x86: Convert perf_callchain_kernel() to use the new unwinder x86/unwind: Add new unwind interface and implementations x86/dumpstack: Remove NULL task pointer convention fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK lib/syscall: Pin the task stack in collect_syscall() x86/process: Pin the target stack in get_wchan() x86/dumpstack: Pin the target stack when dumping it kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function sched/core: Add try_get_task_stack() and put_task_stack() x86/entry/64: Fix a minor comment rebase error iommu/amd: Don't put completion-wait semaphore on stack ...
2016-10-03Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "Main changes in this cycle were: - Refactor the EFI memory map code into architecture neutral files and allow drivers to permanently reserve EFI boot services regions on x86, as well as ARM/arm64. (Matt Fleming) - Add ARM support for the EFI ESRT driver. (Ard Biesheuvel) - Make the EFI runtime services and efivar API interruptible by swapping spinlocks for semaphores. (Sylvain Chouleur) - Provide the EFI identity mapping for kexec which allows kexec to work on SGI/UV platforms with requiring the "noefi" kernel command line parameter. (Alex Thorlton) - Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel) - Merge the EFI test driver being carried out of tree until now in the FWTS project. (Ivan Hu) - Expand the list of flags for classifying EFI regions as "RAM" on arm64 so we align with the UEFI spec. (Ard Biesheuvel) - Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32) or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot services function table for direct calls, alleviating us from having to maintain the custom function table. (Lukas Wunner) - Miscellaneous cleanups and fixes" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE x86/efi: Allow invocation of arbitrary boot services x86/efi: Optimize away setup_gop32/64 if unused x86/efi: Use kmalloc_array() in efi_call_phys_prolog() efi/arm64: Treat regions with WT/WC set but WB cleared as memory efi: Add efi_test driver for exporting UEFI runtime service interfaces x86/efi: Defer efi_esrt_init until after memblock_x86_fill efi/arm64: Add debugfs node to dump UEFI runtime page tables x86/efi: Remove unused find_bits() function fs/efivarfs: Fix double kfree() in error path x86/efi: Map in physical addresses in efi_map_region_fixed lib/ucs2_string: Speed up ucs2_utf8size() firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy" x86/efi: Initialize status to ensure garbage is not returned on small size efi: Replace runtime services spinlock with semaphore efi: Don't use spinlocks for efi vars efi: Use a file local lock for efivars efi/arm*: esrt: Add missing call to efi_esrt_init() efi/esrt: Use memremap not ioremap to access ESRT table in memory x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data ...
2016-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30Merge branch 'x86/urgent' into x86/asmThomas Gleixner
Get the cr4 fixes so we can apply the final cleanup
2016-09-27lib: clean up put_cpu_var usageShaohua Li
put_cpu_var takes the percpu data, not the data returned from get_cpu_var. This doesn't change the behavior. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25radix tree: fix sibling entry handling in radix_tree_descend()Linus Torvalds
The fixes to the radix tree test suite show that the multi-order case is broken. The basic reason is that the radix tree code uses tagged pointers with the "internal" bit in the low bits, and calculating the pointer indices was supposed to mask off those bits. But gcc will notice that we then use the index to re-create the pointer, and will avoid doing the arithmetic and use the tagged pointer directly. This cleans the code up, using the existing is_sibling_entry() helper to validate the sibling pointer range (instead of open-coding it), and using entry_to_node() to mask off the low tag bit from the pointer. And once you do that, you might as well just use the now cleaned-up pointer directly. [ Side note: the multi-order code isn't actually ever used in the kernel right now, and the only reason I didn't just delete all that code is that Kirill Shutemov piped up and said: "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries. It also converts shmem-with-huge-pages and hugetlb to them. I'm okay with converting it to other mechanism, but I need something. (I looked into Konstantin's RFC patchset[2]. It looks okay, but I don't feel myself qualified to review it as I don't know much about radix-tree internals.)" [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ] Reported-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Cedric Blancher <cedric.blancher@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-09-23locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help textVivien Didelot
Fix the indefinitiley -> indefinitely typo in Kconfig.debug. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-21lib/win_minmax: windowed min or max estimatorNeal Cardwell
This commit introduces a generic library to estimate either the min or max value of a time-varying variable over a recent time window. This is code originally from Kathleen Nichols. The current form of the code is from Van Jacobson. A single struct minmax_sample will track the estimated windowed-max value of the series if you call minmax_running_max() or the estimated windowed-min value of the series if you call minmax_running_min(). Nearly equivalent code is already in place for minimum RTT estimation in the TCP stack. This commit extracts that code and generalizes it to handle both min and max. Moving the code here reduces the footprint and complexity of the TCP code base and makes the filter generally available for other parts of the codebase, including an upcoming TCP congestion control module. This library works well for time series where the measurements are smoothly increasing or decreasing. Signed-off-by: Van Jacobson <vanj@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20Merge branch 'efi/urgent' into efi/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20ubsan: allow to disable the null sanitizerChristian Borntraeger
Some architectures use a hardware defined structure at address zero. Checking for a null pointer will result in many ubsan reports. Allow users to disable the null sanitizer. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-09-20rhashtable: Add rhlist interfaceHerbert Xu
The insecure_elasticity setting is an ugly wart brought out by users who need to insert duplicate objects (that is, distinct objects with identical keys) into the same table. In fact, those users have a much bigger problem. Once those duplicate objects are inserted, they don't have an interface to find them (unless you count the walker interface which walks over the entire table). Some users have resorted to doing a manual walk over the hash table which is of course broken because they don't handle the potential existence of multiple hash tables. The result is that they will break sporadically when they encounter a hash table resize/rehash. This patch provides a way out for those users, at the expense of an extra pointer per object. Essentially each object is now a list of objects carrying the same key. The hash table will only see the lists so nothing changes as far as rhashtable is concerned. To use this new interface, you need to insert a struct rhlist_head into your objects instead of struct rhash_head. While the hash table is unchanged, for type-safety you'll need to use struct rhltable instead of struct rhashtable. All the existing interfaces have been duplicated for rhlist, including the hash table walker. One missing feature is nulls marking because AFAIK the only potential user of it does not need duplicate objects. Should anyone need this it shouldn't be too hard to add. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20Merge branch 'linus' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-19fault-injection/cpu: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. This is just a temporary vehicle to keep the interface working for now, It'll be replaced by the sysfs interface which allows to step through the hotplug state machine step by step. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160906170457.32393-15-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-19lib/irq_poll: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160906170457.32393-8-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-17fix iov_iter_fault_in_readable()Al Viro
... by turning it into what used to be multipages counterpart Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-16lib/syscall: Pin the task stack in collect_syscall()Andy Lutomirski
This will avoid a potential read-after-free if collect_syscall() (e.g. /proc/PID/syscall) is called on an exiting task. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0bfd8e6d4729c97745d3781a29610a33d0a8091d.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15test_bpf: fix the dummy skb after dissector changesJakub Kicinski
Commit d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci") made flow dissector look at vlan_proto when vlan is present. Since test_bpf sets skb->vlan_tci to ~0 (including VLAN_TAG_PRESENT) we have to populate skb->vlan_proto. Fixes false negative on test #24: test_bpf: #24 LD_PAYLOAD_OFF jited:0 175 ret 0 != 42 FAIL (1 times) Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15Merge branch 'linus' into x86/asm, to pick up recent fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09lib/ucs2_string: Speed up ucs2_utf8size()Lukas Wunner
No need to calculate the string length on every loop iteration. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Peter Jones <pjones@redhat.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Most relevant updates are the removal of per-conntrack timers to use a workqueue/garbage collection approach instead from Florian Westphal, the hash and numgen expression for nf_tables from Laura Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag, removal of ip_conntrack sysctl and many other incremental updates on our Netfilter codebase. More specifically, they are: 1) Retrieve only 4 bytes to fetch ports in case of non-linear skb transport area in dccp, sctp, tcp, udp and udplite protocol conntrackers, from Gao Feng. 2) Missing whitespace on error message in physdev match, from Hangbin Liu. 3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang. 4) Add nf_ct_expires() helper function and use it, from Florian Westphal. 5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also from Florian. 6) Rename nf_tables set implementation to nft_set_{name}.c 7) Introduce the hash expression to allow arbitrary hashing of selector concatenations, from Laura Garcia Liebana. 8) Remove ip_conntrack sysctl backward compatibility code, this code has been around for long time already, and we have two interfaces to do this already: nf_conntrack sysctl and ctnetlink. 9) Use nf_conntrack_get_ht() helper function whenever possible, instead of opencoding fetch of hashtable pointer and size, patch from Liping Zhang. 10) Add quota expression for nf_tables. 11) Add number generator expression for nf_tables, this supports incremental and random generators that can be combined with maps, very useful for load balancing purpose, again from Laura Garcia Liebana. 12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King. 13) Introduce a nft_chain_parse_hook() helper function to parse chain hook configuration, this is used by a follow up patch to perform better chain update validation. 14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the nft_set_hash implementation to honor the NLM_F_EXCL flag. 15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(), patch from Florian Westphal. 16) Don't use the DYING bit to know if the conntrack event has been already delivered, instead a state variable to track event re-delivery states, also from Florian. 17) Remove the per-conntrack timer, use the workqueue approach that was discussed during the NFWS, from Florian Westphal. 18) Use the netlink conntrack table dump path to kill stale entries, again from Florian. 19) Add a garbage collector to get rid of stale conntracks, from Florian. 20) Reschedule garbage collector if eviction rate is high. 21) Get rid of the __nf_ct_kill_acct() helper. 22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger. 23) Make nf_log_set() interface assertive on unsupported families. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01lib/test_hash.c: fix warning in preprocessor symbol evaluationGeert Uytterhoeven
Some versions of gcc don't like tests for the value of an undefined preprocessor symbol, even in the #else branch of an #ifndef: lib/test_hash.c:224:7: warning: "HAVE_ARCH__HASH_32" is not defined [-Wundef] #elif HAVE_ARCH__HASH_32 != 1 ^ lib/test_hash.c:229:7: warning: "HAVE_ARCH_HASH_32" is not defined [-Wundef] #elif HAVE_ARCH_HASH_32 != 1 ^ lib/test_hash.c:234:7: warning: "HAVE_ARCH_HASH_64" is not defined [-Wundef] #elif HAVE_ARCH_HASH_64 != 1 ^ Seen with gcc 4.9, not seen with 4.1.2. Change the logic to only check the value inside an #ifdef to fix this. Fixes: 468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions") Link: http://lkml.kernel.org/r/20160829214952.1334674-4-arnd@arndb.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01lib/test_hash.c: fix warning in two-dimensional array initGeert Uytterhoeven
lib/test_hash.c: In function 'test_hash_init': lib/test_hash.c:146:2: warning: missing braces around initializer [-Wmissing-braces] Fixes: 468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions") Link: http://lkml.kernel.org/r/20160829214952.1334674-3-arnd@arndb.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01RAID/s390: provide raid6 recovery optimizationMartin Schwidefsky
The XC instruction can be used to improve the speed of the raid6 recovery. The loops now operate on blocks of 256 bytes. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-30mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKSJosh Poimboeuf
There are three usercopy warnings which are currently being silenced for gcc 4.6 and newer: 1) "copy_from_user() buffer size is too small" compile warning/error This is a static warning which happens when object size and copy size are both const, and copy size > object size. I didn't see any false positives for this one. So the function warning attribute seems to be working fine here. Note this scenario is always a bug and so I think it should be changed to *always* be an error, regardless of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS. 2) "copy_from_user() buffer size is not provably correct" compile warning This is another static warning which happens when I enable __compiletime_object_size() for new compilers (and CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size is const, but copy size is *not*. In this case there's no way to compare the two at build time, so it gives the warning. (Note the warning is a byproduct of the fact that gcc has no way of knowing whether the overflow function will be called, so the call isn't dead code and the warning attribute is activated.) So this warning seems to only indicate "this is an unusual pattern, maybe you should check it out" rather than "this is a bug". I get 102(!) of these warnings with allyesconfig and the __compiletime_object_size() gcc check removed. I don't know if there are any real bugs hiding in there, but from looking at a small sample, I didn't see any. According to Kees, it does sometimes find real bugs. But the false positive rate seems high. 3) "Buffer overflow detected" runtime warning This is a runtime warning where object size is const, and copy size > object size. All three warnings (both static and runtime) were completely disabled for gcc 4.6 with the following commit: 2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+") That commit mistakenly assumed that the false positives were caused by a gcc bug in __compiletime_object_size(). But in fact, __compiletime_object_size() seems to be working fine. The false positives were instead triggered by #2 above. (Though I don't have an explanation for why the warnings supposedly only started showing up in gcc 4.6.) So remove warning #2 to get rid of all the false positives, and re-enable warnings #1 and #3 by reverting the above commit. Furthermore, since #1 is a real bug which is detected at compile time, upgrade it to always be an error. Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer needed. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Nilay Vaish <nilayvaish@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29RAID/s390: add SIMD implementation for raid6 gen/xorMartin Schwidefsky
Using vector registers is slightly faster: raid6: vx128x8 gen() 19705 MB/s raid6: vx128x8 xor() 11886 MB/s raid6: using algorithm vx128x8 gen() 19705 MB/s raid6: .... xor() 11886 MB/s, rmw enabled vs the software algorithms: raid6: int64x1 gen() 3018 MB/s raid6: int64x1 xor() 1429 MB/s raid6: int64x2 gen() 4661 MB/s raid6: int64x2 xor() 3143 MB/s raid6: int64x4 gen() 5392 MB/s raid6: int64x4 xor() 3509 MB/s raid6: int64x8 gen() 4441 MB/s raid6: int64x8 xor() 3207 MB/s raid6: using algorithm int64x4 gen() 5392 MB/s raid6: .... xor() 3509 MB/s, rmw enabled Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-08-26rhashtable: fix a memory leak in alloc_bucket_locks()Eric Dumazet
If vmalloc() was successful, do not attempt a kmalloc_array() Fixes: 4cf0b354d92e ("rhashtable: avoid large lock-array allocations") Reported-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26rhashtable: add rhashtable_lookup_get_insert_key()Pablo Neira Ayuso
This patch modifies __rhashtable_insert_fast() so it returns the existing object that clashes with the one that you want to insert. In case the object is successfully inserted, NULL is returned. Otherwise, you get an error via ERR_PTR(). This patch adapts the existing callers of __rhashtable_insert_fast() so they handle this new logic, and it adds a new rhashtable_lookup_get_insert_key() interface to fetch this existing object. nf_tables needs this change to improve handling of EEXIST cases via honoring the NLM_F_EXCL flag and by checking if the data part of the mapping matches what we have. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-08-24dma-api: Teach the "DMA-from-stack" check about vmapped stacksAndy Lutomirski
If we're using CONFIG_VMAP_STACK=y and we manage to point an sg entry at the stack, then either the sg page will be in highmem or sg_virt() will return the direct-map alias. In neither case will the existing check_for_stack() implementation realize that it's a stack page. Fix it by explicitly checking for stack pages. This has no effect by itself. It's broken out for ease of review. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/448460622731312298bf19dcbacb1606e75de7a9.1470907718.git.luto@kernel.org [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-19rhashtable: Remove GFP flag from rhashtable_walk_initHerbert Xu
The commit 8f6fd83c6c5ec66a4a70c728535ddcdfef4f3697 ("rhashtable: accept GFP flags in rhashtable_walk_init") added a GFP flag argument to rhashtable_walk_init because some users wish to use the walker in an unsleepable context. In fact we don't need to allocate memory in rhashtable_walk_init at all. The walker is always paired with an iterator so we could just stash ourselves there. This patch does that by introducing a new enter function to replace the existing init function. This way we don't have to churn all the existing users again. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix Fietkau. 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme. 3) Fix some tg3 ethtool logic bugs, and one that would cause no interrupts to be generated when rx-coalescing is set to 0. From Satish Baddipadige and Siva Reddy Kallam. 4) QLCNIC mailbox corruption and napi budget handling fix from Manish Chopra. 5) Fix fib_trie logic when walking the trie during /proc/net/route output than can access a stale node pointer. From David Forster. 6) Several sctp_diag fixes from Phil Sutter. 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel. 8) Checksum fixup fixes in bpf from Daniel Borkmann. 9) Memork leaks in nfnetlink, from Liping Zhang. 10) Use after free in rxrpc, from David Howells. 11) Use after free in new skb_array code of macvtap driver, from Jason Wang. 12) Calipso resource leak, from Colin Ian King. 13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang. 14) Fix bpf non-linear packet write helpers, from Daniel Borkmann. 15) Fix lockdep splats in macsec, from Sabrina Dubroca. 16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF handling. 17) Various tc-action bug fixes, from CONG Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net_sched: allow flushing tc police actions net_sched: unify the init logic for act_police net_sched: convert tcf_exts from list to pointer array net_sched: move tc offload macros to pkt_cls.h net_sched: fix a typo in tc_for_each_action() net_sched: remove an unnecessary list_del() net_sched: remove the leftover cleanup_a() mlxsw: spectrum: Allow packets to be trapped from any PG mlxsw: spectrum: Unmap 802.1Q FID before destroying it mlxsw: spectrum: Add missing rollbacks in error path mlxsw: reg: Fix missing op field fill-up mlxsw: spectrum: Trap loop-backed packets mlxsw: spectrum: Add missing packet traps mlxsw: spectrum: Mark port as active before registering it mlxsw: spectrum: Create PVID vPort before registering netdevice mlxsw: spectrum: Remove redundant errors from the code mlxsw: spectrum: Don't return upon error in removal path i40e: check for and deal with non-contiguous TCs ixgbe: Re-enable ability to toggle VLAN filtering ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths ...
2016-08-15rhashtable: fix shift by 64 when shrinkingVegard Nossum
I got this: ================================================================================ UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Workqueue: events rht_deferred_worker 0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3 ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0 0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640 Call Trace: [<ffffffff82344f50>] dump_stack+0xac/0xfc [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4 [<ffffffff8242f5b8>] ubsan_epilogue+0xd/0x8a [<ffffffff82430c41>] __ubsan_handle_shift_out_of_bounds+0x255/0x29a [<ffffffff824309ec>] ? __ubsan_handle_out_of_bounds+0x180/0x180 [<ffffffff84003436>] ? nl80211_req_set_reg+0x256/0x2f0 [<ffffffff812112ba>] ? print_context_stack+0x8a/0x160 [<ffffffff81200031>] ? amd_pmu_reset+0x341/0x380 [<ffffffff823af808>] rht_deferred_worker+0x1618/0x1790 [<ffffffff823af808>] ? rht_deferred_worker+0x1618/0x1790 [<ffffffff823ae1f0>] ? rhashtable_jhash2+0x370/0x370 [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970 [<ffffffff8134c1cf>] process_one_work+0x79f/0x1970 [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970 [<ffffffff8134ba30>] ? try_to_grab_pending+0x4c0/0x4c0 [<ffffffff8134d564>] ? worker_thread+0x1c4/0x1340 [<ffffffff8134d8ff>] worker_thread+0x55f/0x1340 [<ffffffff845e904f>] ? __schedule+0x4df/0x1d40 [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970 [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970 [<ffffffff813642f7>] kthread+0x237/0x390 [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280 [<ffffffff845f8c93>] ? _raw_spin_unlock_irq+0x33/0x50 [<ffffffff845f95df>] ret_from_fork+0x1f/0x40 [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280 ================================================================================ roundup_pow_of_two() is undefined when called with an argument of 0, so let's avoid the call and just fall back to ht->p.min_size (which should never be smaller than HASH_MIN_SIZE). Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14rhashtable: avoid large lock-array allocationsFlorian Westphal
Sander reports following splat after netfilter nat bysrc table got converted to rhashtable: swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP) CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..] [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140 [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0 [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110 [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70 [<ffffffff811aec19>] __kmalloc+0x129/0x140 [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0 [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0 [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400 The failure happens when allocating the spinlock array. Even with GFP_KERNEL its unlikely for such a large allocation to succeed. Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition to adding NOWARN for atomic allocations this also makes the bucket-array sizing more conservative. In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"), Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'". IOW, consider size needed by a single spinlock when determining number of locks per cpu. So with 64 byte per cacheline and 4 byte per spinlock this gives 32 locks per cpu. Resulting size of the lock-array (sizeof(spinlock) == 4): cpus: 1 2 4 8 16 32 64 old: 1k 1k 4k 8k 16k 16k 16k new: 128 256 512 1k 2k 4k 8k 8k allocation should have decent chance of success even with GFP_ATOMIC, and should not fail with GFP_KERNEL. With 72-byte spinlock (LOCKDEP): cpus : 1 2 old: 9k 18k new: ~2k ~4k Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08unsafe_[get|put]_user: change interface to use a error target labelLinus Torvalds
When I initially added the unsafe_[get|put]_user() helpers in commit 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses"), I made the mistake of modeling the interface on our traditional __[get|put]_user() functions, which return zero on success, or -EFAULT on failure. That interface is fairly easy to use, but it's actually fairly nasty for good code generation, since it essentially forces the caller to check the error value for each access. In particular, since the error handling is already internally implemented with an exception handler, and we already use "asm goto" for various other things, we could fairly easily make the error cases just jump directly to an error label instead, and avoid the need for explicit checking after each operation. So switch the interface to pass in an error label, rather than checking the error value in the caller. Best do it now before we start growing more users (the signal handling code in particular would be a good place to use the new interface). So rather than if (unsafe_get_user(x, ptr)) ... handle error .. the interface is now unsafe_get_user(x, ptr, label); where an error during the user mode fetch will now just cause a jump to 'label' in the caller. Right now the actual _implementation_ of this all still ends up being a "if (err) goto label", and does not take advantage of any exception label tricks, but for "unsafe_put_user()" in particular it should be fairly straightforward to convert to using the exception table model. Note that "unsafe_get_user()" is much harder to convert to a clever exception table model, because current versions of gcc do not allow the use of "asm goto" (for the exception) with output values (for the actual value to be fetched). But that is hopefully not a limitation in the long term. [ Also note that it might be a good idea to switch unsafe_get_user() to actually _return_ the value it fetches from user space, but this commit only changes the error handling semantics ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08rhashtable-test: Fix max_size parameter descriptionPhil Sutter
Looks like a simple copy'n'paste error. Fixes: 1aa661f5c3df1 ("rhashtable-test: Measure time to insert, remove & traverse entries") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-04dynamic_debug: add jump label supportJason Baron
Although dynamic debug is often only used for debug builds, sometimes its enabled for production builds as well. Minimize its impact by using jump labels. This reduces the text section by 7000+ bytes in the kernel image below. It does increase data, but this should only be referenced when changing the direction of the branches, and hence usually not in cache. text data bss dec hex filename 8194852 4879776 925696 14000324 d5a0c4 vmlinux.pre 8187337 4960224 925696 14073257 d6bda9 vmlinux.post Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04dma-mapping: use unsigned long for dma_attrsKrzysztof Kozlowski
The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - the rest of ocfs2 - various hotfixes, mainly MM - quite a bit of misc stuff - drivers, fork, exec, signals, etc. - printk updates - firmware - checkpatch - nilfs2 - more kexec stuff than usual - rapidio updates - w1 things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) ipc: delete "nr_ipc_ns" kcov: allow more fine-grained coverage instrumentation init/Kconfig: add clarification for out-of-tree modules config: add android config fragments init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig relay: add global mode support for buffer-only channels init: allow blacklisting of module_init functions w1:omap_hdq: fix regression w1: add helper macro module_w1_family w1: remove need for ida and use PLATFORM_DEVID_AUTO rapidio/switches: add driver for IDT gen3 switches powerpc/fsl_rio: apply changes for RIO spec rev 3 rapidio: modify for rev.3 specification changes rapidio: change inbound window size type to u64 rapidio/idt_gen2: fix locking warning rapidio: fix error handling in mbox request/release functions rapidio/tsi721_dma: advance queue processing from transfer submit call rapidio/tsi721: add messaging mbox selector parameter rapidio/tsi721: add PCIe MRRS override parameter rapidio/tsi721_dma: add channel mask and queue size parameters ...
2016-08-02kcov: allow more fine-grained coverage instrumentationVegard Nossum
For more targeted fuzzing, it's better to disable kernel-wide instrumentation and instead enable it on a per-subsystem basis. This follows the pattern of UBSAN and allows you to compile in the kcov driver without instrumenting the whole kernel. To instrument a part of the kernel, you can use either # for a single file in the current directory KCOV_INSTRUMENT_filename.o := y or # for all the files in the current directory (excluding subdirectories) KCOV_INSTRUMENT := y or # (same as above) ccflags-y += $(CFLAGS_KCOV) or # for all the files in the current directory (including subdirectories) subdir-ccflags-y += $(CFLAGS_KCOV) Link: http://lkml.kernel.org/r/1464008380-11405-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02crc32: use ktime_get_ns() for measurementArnd Bergmann
The crc32 test function measures the elapsed time in nanoseconds, but uses 'struct timespec' for that. We want to remove timespec from the kernel for y2038 compatibility, and ktime_get_ns() also helps make the code simpler here. It is also slightly better to use monontonic time, as we are only interested in the time difference. Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S . Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02lib/iommu-helper: skip to next segmentSebastian Ott
When a large enough area in the iommu bitmap is found but would span a boundary we continue the search starting from the next bit position. For large allocations this can lead to several useless invocations of bitmap_find_next_zero_area() and iommu_is_span_boundary(). Continue the search from the start of the next segment (which is the next bit position such that we'll not cross the same segment boundary again). Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1606081910070.3211@schleppi Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02ratelimit: extend to print suppressed messages on releaseBorislav Petkov
Extend the ratelimiting facility to print the amount of suppressed lines when it is being released. This use case is aimed at short-termed, burst-like users for which we want to output the suppressed lines stats only once, after it has been disposed of. For an example, see /dev/kmsg usage in a follow-on patch. Also, change the printk() line we issue on release to not use "callbacks" as it is misleading: we're not suppressing callbacks but printk() calls. This has been separated from a previous patch by Linus. Link: http://lkml.kernel.org/r/20160716061745.15795-2-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dave Young <dyoung@redhat.com> Cc: Franck Bui <fbui@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02UBSAN: fix typo in format stringNicolas Iooss
handle_object_size_mismatch() used %pk to format a kernel pointer with pr_err(). This seemed to be a misspelling for %pK, but using this to format a kernel pointer does not make much sence here. Therefore use %p instead, like in handle_missaligned_access(). Link: http://lkml.kernel.org/r/20160730083010.11569-1-nicolas.iooss_linux@m4x.org Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02radix-tree: account nodes to memcg only if explicitly requestedVladimir Davydov
Radix trees may be used not only for storing page cache pages, so unconditionally accounting radix tree nodes to the current memory cgroup is bad: if a radix tree node is used for storing data shared among different cgroups we risk pinning dead memory cgroups forever. So let's only account radix tree nodes if it was explicitly requested by passing __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to account page cache entries, so mark mapping->page_tree so. Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup") Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>