summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2016-12-12lib: radix-tree: add entry deletion support to __radix_tree_replace()Johannes Weiner
Page cache shadow entry handling will be a lot simpler when it can use a single generic replacement function for pages, shadow entries, and emptying slots. Make __radix_tree_replace() properly account insertions and deletions in node->count and garbage collect nodes as they become empty. Then re-implement radix_tree_delete() on top of it. Link: http://lkml.kernel.org/r/20161117193058.GC23430@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12lib: radix-tree: check accounting of existing slot replacement usersJohannes Weiner
The bug in khugepaged fixed earlier in this series shows that radix tree slot replacement is fragile; and it will become more so when not only NULL<->!NULL transitions need to be caught but transitions from and to exceptional entries as well. We need checks. Re-implement radix_tree_replace_slot() on top of the sanity-checked __radix_tree_replace(). This requires existing callers to also pass the radix tree root, but it'll warn us when somebody replaces slots with contents that need proper accounting (transitions between NULL entries, real entries, exceptional entries) and where a replacement through the slot pointer would corrupt the radix tree node counts. Link: http://lkml.kernel.org/r/20161117193021.GB23430@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12lib: radix-tree: native accounting of exceptional entriesJohannes Weiner
The way the page cache is sneaking shadow entries of evicted pages into the radix tree past the node entry accounting and tracking them manually in the upper bits of node->count is fraught with problems. These shadow entries are marked in the tree as exceptional entries, which are a native concept to the radix tree. Maintain an explicit counter of exceptional entries in the radix tree node. Subsequent patches will switch shadow entry tracking over to that counter. DAX and shmem are the other users of exceptional entries. Since slot replacements that change the entry type from regular to exceptional must now be accounted, introduce a __radix_tree_replace() function that does replacement and accounting, and switch DAX and shmem over. The increase in radix tree node size is temporary. A followup patch switches the shadow tracking to this new scheme and we'll no longer need the upper bits in node->count and shrink that back to one byte. Link: http://lkml.kernel.org/r/20161117192945.GA23430@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...
2016-12-12Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this development cycle were: - Miscellaneous fixes, including a change to call_rcu()'s rcu_head alignment check. - Security-motivated list consistency checks, which are disabled by default behind DEBUG_LIST. - Torture-test updates. - Documentation updates, yet again just simple changes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: torture: Prevent jitter from delaying build-only runs torture: Remove obsolete files from rcutorture .gitignore rcu: Don't kick unless grace period or request rcu: Make expedited grace periods recheck dyntick idle state torture: Trace long read-side delays rcu: RCU_TRACE enables event tracing as well as debugfs rcu: Remove obsolete comment from __call_rcu() rcu: Remove obsolete rcu_check_callbacks() header comment rcu: Tighten up __call_rcu() rcu_head alignment check Documentation/RCU: Fix minor typo documentation: Present updated RCU guarantee bug: Avoid Kconfig warning for BUG_ON_DATA_CORRUPTION lib/Kconfig.debug: Fix typo in select statement lkdtm: Add tests for struct list corruption bug: Provide toggle for BUG on data corruption list: Split list_del() debug checking into separate function rculist: Consolidate DEBUG_LIST for list_add_rcu() list: Split list_add() debug checking into separate function
2016-12-11Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-09Revert "radix tree test suite: fix compilation"Linus Torvalds
This reverts commit 53855d10f4567a0577360b6448d52a863929775b. It shouldn't have come in yet - it depends on the changes in linux-next that will come in during the next merge window. As Matthew Wilcox says, the test suite is broken with the current state without the revert. Requested-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07radix tree test suite: fix compilationMatthew Wilcox
Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test suite as it adds a call to cpuhp_setup_state_nocalls() which is not currently emulated in the test suite. Add it, and delete the emulation of the old CPU hotplug mechanism. Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-07Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two rtmutex race fixes (which miraculously never triggered, that we know of), plus two lockdep printk formatting regression fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Fix report formatting locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() locking/rtmutex: Prevent dequeue vs. unlock race locking/selftest: Fix output since KERN_CONT changes
2016-12-06parser: add u64 number parserJames Smart
Will be used by the nvme-fabrics FC transport in parsing options Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-12-05[iov_iter] new primitives - copy_from_iter_full() and friendsAl Viro
copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advancing iterator only in case of successful full copy and returning whether it had been successful or not. Convert some obvious users. *NOTE* - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Merge branch 'locking/urgent' into locking/core, to pick up dependent fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-30kasan: support use-after-scope detectionDmitry Vyukov
Gcc revision 241896 implements use-after-scope detection. Will be available in gcc 7. Support it in KASAN. Gcc emits 2 new callbacks to poison/unpoison large stack objects when they go in/out of scope. Implement the callbacks and add a test. [dvyukov@google.com: v3] Link: http://lkml.kernel.org/r/1479998292-144502-1-git-send-email-dvyukov@google.com Link: http://lkml.kernel.org/r/1479226045-145148-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-30lib/debugobjects: export for use in modulesChris Wilson
Drivers, or other modules, that use a mixture of objects (especially objects embedded within other objects) would like to take advantage of the debugobjects facilities to help catch misuse. Currently, the debugobjects interface is only available to builtin drivers and requires a set of EXPORT_SYMBOL_GPL for use by modules. I am using the debugobjects in i915.ko to try and catch some invalid operations on embedded objects. The problem currently only presents itself across module unload so forcing i915 to be builtin is not an option. Link: http://lkml.kernel.org/r/20161122143039.6433-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Du, Changbin" <changbin.du@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25locking/selftest: Fix output since KERN_CONT changesMichael Ellerman
Since the KERN_CONT changes the locking-selftest output is messed up, eg: ---------------------------------------------------------------------------- | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------------------------------------------------- A-A deadlock: ok | ok | ok | ok | ok | ok | Use pr_cont() to get it looking normal again: ---------------------------------------------------------------------------- | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------------------------------------------------- A-A deadlock: ok | ok | ok | ok | ok | ok | Reported-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-25mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]Andrey Ryabinin
This fixes CVE-2016-8650. If mpi_powm() is given a zero exponent, it wants to immediately return either 1 or 0, depending on the modulus. However, if the result was initalised with zero limb space, no limbs space is allocated and a NULL-pointer exception ensues. Fix this by allocating a minimal amount of limb space for the result when the 0-exponent case when the result is 1 and not touching the limb space when the result is 0. This affects the use of RSA keys and X.509 certificates that carry them. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 PGD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8804011944c0 task.stack: ffff880401294000 RIP: 0010:[<ffffffff8138ce5d>] [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP: 0018:ffff880401297ad8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0 RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0 RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000 R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50 FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0 Stack: ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8 Call Trace: [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e [<ffffffff81001c2b>] do_syscall_64+0x80/0x191 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f RIP [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP <ffff880401297ad8> CR2: 0000000000000000 ---[ end trace d82015255d4a5d8d ]--- Basically, this is a backport of a libgcrypt patch: http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526 Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> cc: linux-ima-devel@lists.sourceforge.net cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-11-23Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates, yet again just simple changes. - Miscellaneous fixes, including a change to call_rcu()'s rcu_head alignment check. - Security-motivated list consistency checks, which are disabled by default behind DEBUG_LIST. - Torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All conflicts were simple overlapping changes except perhaps for the Thunder driver. That driver has a change_mtu method explicitly for sending a message to the hardware. If that fails it returns an error. Normally a driver doesn't need an ndo_change_mtu method becuase those are usually just range changes, which are now handled generically. But since this extra operation is needed in the Thunder driver, it has to stay. However, if the message send fails we have to restore the original MTU before the change because the entire call chain expects that if an error is thrown by ndo_change_mtu then the MTU did not change. Therefore code is added to nicvf_change_mtu to remember the original MTU, and to restore it upon nicvf_update_hw_max_frs() failue. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: 1) With modern networking cards we can run out of 32-bit DMA space, so support 64-bit DMA addressing when possible on sparc64. From Dave Tushar. 2) Some signal frame validation checks are inverted on sparc32, fix from Andreas Larsson. 3) Lockdep tables can get too large in some circumstances on sparc64, add a way to adjust the size a bit. From Babu Moger. 4) Fix NUMA node probing on some sun4v systems, from Thomas Tai. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: drop duplicate header scatterlist.h lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc sunbmac: Fix compiler warning sunqe: Fix compiler warnings sparc64: Enable 64-bit DMA sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Initialize iommu_map_table and iommu_pool sparc64: Add ATU (new IOMMU) support sparc64: Add FORCE_MAX_ZONEORDER and default to 13 sparc64: fix compile warning section mismatch in find_node() sparc32: Fix inverted invalid_frame_pointer checks on sigreturns sparc64: Fix find_node warning if numa node cannot be found
2016-11-19netlink: smaller nla_attr_minlen tableAlexey Dobriyan
Length of a netlink attribute may be u16 but lengths of basic attributes are much smaller, so small we can save 16 bytes of .rodata and pocket change inside .text. 16-bit is worse on x86-64 than 8-bit because of operand size override prefix. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-19 (-19) function old new delta validate_nla 418 417 -1 nla_policy_len 66 64 -2 nla_attr_minlen 32 16 -16 Total: Before=154865051, After=154865032, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18Merge tag 'v4.9-rc4' into soundJonathan Corbet
Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparcBabu Moger
This new config parameter limits the space used for "Lock debugging: prove locking correctness" by about 4MB. The current sparc systems have the limitation of 32MB size for kernel size including .text, .data and .bss sections. With PROVE_LOCKING feature, the kernel size could grow beyond this limit and causing system boot-up issues. With this option, kernel limits the size of the entries of lock_chains, stack_trace etc., so that kernel fits in required size limit. This is not visible to user and only used for sparc. Signed-off-by: Babu Moger <babu.moger@oracle.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17fix iov_iter_advance() for ITER_PIPEAbhi Das
iov_iter_advance() needs to decrement iter->count by the number of bytes we'd moved beyond. Normal flavours do that, but ITER_PIPE doesn't and ITER_PIPE generic_file_read_iter() for O_DIRECT files ends up with a bogus fallback to page cache read, resulting in incorrect values for file offset and bytes read. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-16locking/core: Remove cpu_relax_lowlatency() usersChristian Borntraeger
With the s390 special case of a yielding cpu_relax() implementation gone, we can now remove all users of cpu_relax_lowlatency() and replace them with cpu_relax(). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-14Merge branches 'doc.2016.11.14a', 'fixes.2016.11.14a', 'list.2016.10.31a' ↵Paul E. McKenney
and 'torture.2016.11.14a' into HEAD doc.2016.11.14a: Documentation changes fixes.2016.11.14aneous fixes list.2016.10.31a: List updates torture.2016.11.14a: Torture-test updates
2016-11-14rcu: RCU_TRACE enables event tracing as well as debugfsNikolay Borisov
The commit brings the RCU_TRACE Kconfig option's help text up to date by noting that it enables additional event tracing as well as debugfs. Signed-off-by: Nikolay Borisov <kernel@kyup.com> [ paulmck: Do some wordsmithing. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2016-11-11lib/stackdepot: export save/fetch stack for driversChris Wilson
Some drivers would like to record stacktraces in order to aide leak tracing. As stackdepot already provides a facility for only storing the unique traces, thereby reducing the memory required, export that functionality for use by drivers. The code was originally created for KASAN and moved under lib in commit cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") so that it could be shared with mm/. In turn, we want to share it now with drivers. Link: http://lkml.kernel.org/r/20161108133209.22704-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-10swiotlb: Minor fix-ups for DMA_ATTR_SKIP_CPU_SYNC supportAlexander Duyck
I am updating the paths so that instead of trying to pass "attr | DMA_ATTR_SKIP_CPU_SYNC" we instead just OR the value into attr and then pass it since attr will not be used after we make the unmap call. I realized there was one spot I had missed when I was applying the DMA attribute to the DMA mapping exception handling. This change corrects that. Finally it looks like there is a stray blank line at the end of the swiotlb_unmap_sg_attrs function that can be dropped. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-09lib/radix-tree: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161103145021.28528-6-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09lib/percpu_counter: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161103145021.28528-5-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07lib/raid6: Add AVX2 optimized xor_syndrome functionsGayatri Kammela
Implement the AVX2 optimization of RAID6 xor_syndrome functions which is simply based on sse2.c written by hpa. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Yuanhan Liu <yuanhan.liu@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-07swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNCAlexander Duyck
As a first step to making DMA_ATTR_SKIP_CPU_SYNC apply to architectures beyond just ARM I need to make it so that the swiotlb will respect the flag. In order to do that I also need to update the swiotlb-xen since it heavily makes use of the functionality. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-07swiotlb: Drop unused functions swiotlb_map_sg and swiotlb_unmap_sgAlexander Duyck
There are no users for swiotlb_map_sg or swiotlb_unmap_sg so we might as well just drop them. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-07swiotlb: Rate-limit printing when running out of SW-IOMMU spaceGeert Uytterhoeven
If the system runs out of SW-IOMMU space, changes are high successive requests will fail, too, flooding the kernel log. This is true especially for streaming DMA, which is typically used repeatedly outside the driver's initialization routine. Add rate-limiting to fix this. While at it, get rid of the open-coded dev_name() handling by using the appropriate dev_err_*() variant. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2016-11-01block,fs: untangle fs.h and blk_types.hChristoph Hellwig
Nothing in fs.h should require blk_types.h to be included. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-31bug: Avoid Kconfig warning for BUG_ON_DATA_CORRUPTIONArnd Bergmann
The CONFIG_DEBUG_LIST option is normally meant for kernel developers rather than production machines and is guarded by CONFIG_DEBUG_KERNEL. In contrast, the newly added CONFIG_BUG_ON_DATA_CORRUPTION is meant for security hardening and may be used on systems that intentionally do not enable CONFIG_DEBUG_KERNEL. In this configuration, we get a warning from Kconfig about the mismatched dependencies: warning: (BUG_ON_DATA_CORRUPTION) selects DEBUG_LIST which has unmet direct dependencies (DEBUG_KERNEL) This annotates the DEBUG_LIST option to be selectable by BUG_ON_DATA_CORRUPTION when DEBUG_KERNEL is disabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 40cd725cfc7f ("bug: Provide toggle for BUG on data corruption") Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-10-31lib/Kconfig.debug: Fix typo in select statementValentin Rothberg
Commit 484f29c7430b3 ("bug: Provide toggle for BUG on data corruption") added a Kconfig select statement on CONFIG_DEBUG_LIST, but the CONFIG_ prefix is only used in Make and C(PP) syntax. Remove the CONFIG_ prefix to correctly select the Kconfig option DEBUG_LIST. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Kees Cook <keescook@chromium.org>
2016-10-31bug: Provide toggle for BUG on data corruptionKees Cook
The kernel checks for cases of data structure corruption under some CONFIGs (e.g. CONFIG_DEBUG_LIST). When corruption is detected, some systems may want to BUG() immediately instead of letting the system run with known corruption. Usually these kinds of manipulation primitives can be used by security flaws to gain arbitrary memory write control. This provides a new config CONFIG_BUG_ON_DATA_CORRUPTION and a corresponding macro CHECK_DATA_CORRUPTION for handling these situations. Notably, even if not BUGing, the kernel should not continue processing the corrupted structure. This is inspired by similar hardening by Syed Rameez Mustafa in MSM kernels, and in PaX and Grsecurity, which is likely in response to earlier removal of the BUG calls in commit 924d9addb9b1 ("list debugging: use WARN() instead of BUG()"). Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31list: Split list_del() debug checking into separate functionKees Cook
Similar to the list_add() debug consolidation, this commit consolidates the debug checking performed during CONFIG_DEBUG_LIST into a new __list_del_entry_valid() function, and stops list updates when corruption is found. Refactored from same hardening in PaX and Grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31rculist: Consolidate DEBUG_LIST for list_add_rcu()Kees Cook
This commit consolidates the debug checking for list_add_rcu() into the new single __list_add_valid() debug function. Notably, this commit fixes the sanity check that was added in commit 17a801f4bfeb ("list_debug: WARN for adding something already in the list"), which wasn't checking RCU-protected lists. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com>
2016-10-31list: Split list_add() debug checking into separate functionKees Cook
Right now, __list_add() code is repeated either in list.h or in list_debug.c, but the only differences between the two versions are the debug checks. This commit therefore extracts these debug checks into a separate __list_add_valid() function and consolidates __list_add(). Additionally this new __list_add_valid() function will stop list manipulations if a corruption is detected, instead of allowing for further corruption that may lead to even worse conditions. This is slight refactoring of the same hardening done in PaX and Grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com>
2016-10-30Merge 4.9-rc3 into driver-core-nextGreg Kroah-Hartman
We want the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of fixes, mostly drivers as is usually the case. 1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey Khoroshilov. 2) Fix element timeouts in netfilter's nft_dynset, from Anders K. Pedersen. 3) Don't put aead_req crypto struct on the stack in mac80211, from Ard Biesheuvel. 4) Several uninitialized variable warning fixes from Arnd Bergmann. 5) Fix memory leak in cxgb4, from Colin Ian King. 6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann. 7) Several VRF semantic fixes from David Ahern. 8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper. 9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet. 10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev. 11) Fix stale link state during failover in NCSCI driver, from Gavin Shan. 12) Fix netdev lower adjacency list traversal, from Ido Schimmel. 13) Propvide proper handle when emitting notifications of filter deletes, from Jamal Hadi Salim. 14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen. 15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac. 16) Several routing offload fixes in mlxsw driver, from Jiri Pirko. 17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy. 18) Validate chunk len before using it in SCTP, from Marcelo Ricardo Leitner. 19) Revert a netns locking change that causes regressions, from Paul Moore. 20) Add recursion limit to GRO handling, from Sabrina Dubroca. 21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon. 22) Avoid accessing stale vxlan/geneve socket in data path, from Pravin Shelar" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits) geneve: avoid using stale geneve socket. vxlan: avoid using stale vxlan socket. qede: Fix out-of-bound fastpath memory access net: phy: dp83848: add dp83822 PHY support enic: fix rq disable tipc: fix broadcast link synchronization problem ibmvnic: Fix missing brackets in init_sub_crq_irqs ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context" arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold net/mlx4_en: Save slave ethtool stats command net/mlx4_en: Fix potential deadlock in port statistics flow net/mlx4: Fix firmware command timeout during interrupt test net/mlx4_core: Do not access comm channel if it has not yet been initialized net/mlx4_en: Fix panic during reboot net/mlx4_en: Process all completions in RX rings after port goes up net/mlx4_en: Resolve dividing by zero in 32-bit system net/mlx4_core: Change the default value of enable_qos net/mlx4_core: Avoid setting ports to auto when only one port type is supported net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec ...
2016-10-28kobject: improve function-level documentationJulia Lawall
In the first case, rename the second variable to correspond to the name found in the function parameter list. In the remaining cases, reorder the variables to correspond to their order in the parameter list. Issue detected using Coccinelle (http://coccinelle.lip6.fr/) Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-27lib/genalloc.c: start search from start of chunkDaniel Mentz
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MBDmitry Vyukov
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls. Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on second level). Size of each stack is (num_frames + 3) * sizeof(long). Which gives us ~84K stacks. This capacity was chosen empirically and it is enough to run kernel normally. However, when lots of configs are enabled and a fuzzer tries to maximize code coverage, it easily hits the limit within tens of minutes. I've tested for long a time with number of top level entries bumped 4x (4096). And I think I've seen overflow only once. But I don't have all configs enabled and code coverage has not reached maximum yet. So bump it 8x to 8192. Since we have two-level table, memory cost of this is very moderate -- currently the top-level table is 8KB, with this patch it is 64KB, which is negligible under KASAN. Here is some approx math. 128MB allows us to memorize ~670K stacks (assuming stack is ~200b). I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free| kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of alloc/free call sites are reachable with only one stack. But some utility functions can have large fanout. Assuming average fanout is 5x, total number of alloc/free stacks is ~300K. Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>