summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-23crypto: aesni - Merge avx precompute functionsDave Watson
The precompute functions differ only by the sub-macros they call, merge them to a single macro. Later diffs add more code to fill in the gcm_context_data structure, this allows changes in a single place. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Split AAD hash calculation to separate macroDave Watson
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Add GCM_COMPLETE macroDave Watson
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - support 256 byte keys in avx asmDave Watson
Add support for 192/256-bit keys using the avx gcm/aes routines. The sse routines were previously updated in e31ac32d3b (Add support for 192 & 256 bit keys to AESNI RFC4106). Instead of adding an additional loop in the hotpath as in e31ac32d3b, this diff instead generates separate versions of the code using macros, and the entry routines choose which version once. This results in a 5% performance improvement vs. adding a loop to the hot path. This is the same strategy chosen by the intel isa-l_crypto library. The key size checks are removed from the c code where appropriate. Note that this diff depends on using gcm_context_data - 256 bit keys require 16 HashKeys + 15 expanded keys, which is larger than struct crypto_aes_ctx, so they are stored in struct gcm_context_data. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Macro-ify func save/restoreDave Watson
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Introduce gcm_context_dataDave Watson
Add the gcm_context_data structure to the avx asm routines. This will be necessary to support both 256 bit keys and scatter/gather. The pre-computed HashKeys are now stored in the gcm_context_data struct, which is expanded to hold the greater number of hashkeys necessary for avx. Loads and stores to the new struct are always done unlaligned to avoid compiler issues, see e5b954e8 "Use unaligned loads from gcm_context_data" Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Merge GCM_ENC_DECDave Watson
The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they call separate sub-macros. Pass the macros as arguments, and merge them. This facilitates additional refactoring, by requiring changes in only one place. The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs, since it will be used by both AVX and AVX2. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-22can: af_can: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
protocol is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/can/af_can.c:115 can_get_proto() warn: potential spectre issue 'proto_tab' [w] Fix this by sanitizing protocol before using it to index proto_tab. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22packet: validate address length if non-zeroWillem de Bruijn
Validate packet socket address length if a length is given. Zero length is equivalent to not setting an address. Fixes: 99137b7888f4 ("packet: validate address length") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22nfc: af_nfc: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
proto is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/nfc/af_nfc.c:42 nfc_sock_create() warn: potential spectre issue 'proto_tab' [w] (local cap) Fix this by sanitizing proto before using it to index proto_tab. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22phonet: af_phonet: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
protocol is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/phonet/af_phonet.c:48 phonet_proto_get() warn: potential spectre issue 'proto_tab' [w] (local cap) Fix this by sanitizing protocol before using it to index proto_tab. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22net: core: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
flen is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/core/filter.c:1101 bpf_check_classic() warn: potential spectre issue 'filter' [w] Fix this by sanitizing flen before using it to index filter at line 1101: switch (filter[flen - 1].code) { and through pc at line 1040: const struct sock_filter *ftest = &filter[pc]; Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is two simple target fixes and one discard related I/O starvation problem in sd. The discard problem occurs because the discard page doesn't have a mempool backing so if the allocation fails due to memory pressure, we then lose the forward progress we require if the writeout is on the same device. The fix is to back it with a mempool" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: use mempool for discard special page scsi: target: iscsi: cxgbit: add missing spin_lock_init() scsi: target: iscsi: cxgbit: fix csk leak
2018-12-22Merge tag 'compiler-attributes-for-linus-v4.20' of ↵Linus Torvalds
https://github.com/ojeda/linux Pull compiler_types.h fix from Miguel Ojeda: "A cleanup for userspace in compiler_types.h: don't pollute userspace with macro definitions (Xiaozhou Liu) This is harmless for the kernel, but v4.19 was released with a few macros exposed to userspace as the patch explains; which this removes, so it *could* happen that we break something for someone (although leaving inline redefined is probably worse)" * tag 'compiler-attributes-for-linus-v4.20' of https://github.com/ojeda/linux: include/linux/compiler_types.h: don't pollute userspace with macro definitions
2018-12-22Merge tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linuxLinus Torvalds
Pull auxdisplay fix from Miguel Ojeda: "charlcd: fix x/y command parsing (Mans Rullgard)" * tag 'auxdisplay-for-linus-v4.20' of https://github.com/ojeda/linux: auxdisplay: charlcd: fix x/y command parsing
2018-12-22Revert "vfs: Allow userns root to call mknod on owned filesystems."Christian Brauner
This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd. commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.") enabled mknod() in user namespaces for userns root if CAP_MKNOD is available. However, these device nodes are useless since any filesystem mounted from a non-initial user namespace will set the SB_I_NODEV flag on the filesystem. Now, when a device node s created in a non-initial user namespace a call to open() on said device node will fail due to: bool may_open_dev(const struct path *path) { return !(path->mnt->mnt_flags & MNT_NODEV) && !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV); } The problem with this is that as of the aforementioned commit mknod() creates partially functional device nodes in non-initial user namespaces. In particular, it has the consequence that as of the aforementioned commit open() will be more privileged with respect to device nodes than mknod(). Before it was the other way around. Specifically, if mknod() succeeded then it was transparent for any userspace application that a fatal error must have occured when open() failed. All of this breaks multiple userspace workloads and a widespread assumption about how to handle mknod(). Basically, all container runtimes and systemd live by the slogan "ask for forgiveness not permission" when running user namespace workloads. For mknod() the assumption is that if the syscall succeeds the device nodes are useable irrespective of whether it succeeds in a non-initial user namespace or not. This logic was chosen explicitly to allow for the glorious day when mknod() will actually be able to create fully functional device nodes in user namespaces. A specific problem people are already running into when running 4.18 rc kernels are failing systemd services. For any distro that is run in a container systemd services started with the PrivateDevices= property set will fail to start since the device nodes in question cannot be opened (cf. the arguments in [1]). Full disclosure, Seth made the very sound argument that it is already possible to end up with partially functional device nodes. Any filesystem mounted with MS_NODEV set will allow mknod() to succeed but will not allow open() to succeed. The difference to the case here is that the MS_NODEV case is transparent to userspace since it is an explicitly set mount option while the SB_I_NODEV case is an implicit property enforced by the kernel and hence opaque to userspace. [1]: https://github.com/systemd/systemd/pull/9483 Signed-off-by: Christian Brauner <christian@brauner.io> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP ↵Sai Praneeth Prakhya
and EFI_MIXED_MODE The following commit: d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions from efi_pgd") forgets to take two EFI modes into consideration, namely EFI_OLD_MEMMAP and EFI_MIXED_MODE: - EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir using ioremap() and init_memory_mapping(). This feature can be enabled by passing "efi=old_map" as kernel command line argument. But, efi_unmap_pages() unmaps EFI boot services code/data regions *only* from efi_pgd and hence cannot be used for unmapping EFI boot services code/data regions from swapper_pg_dir. Introduce a temporary fix to not unmap EFI boot services code/data regions when EFI_OLD_MEMMAP is enabled while working on a real fix. - EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a 64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE, all RAM (i.e. namely EFI regions like EFI_CONVENTIONAL_MEMORY, EFI_LOADER_<CODE/DATA>, EFI_BOOT_SERVICES_<CODE/DATA> and EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to facilitate EFI runtime calls access it's arguments in 1:1 mode. Hence, don't unmap EFI boot services code/data regions when booted in mixed mode. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181222022234.7573-1-sai.praneeth.prakhya@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-22dma-mapping: fix flags in dma_alloc_wcChristoph Hellwig
We really need the writecombine flag in dma_alloc_wc, fix a stupid oversight. Fixes: 7ed1d91a9e ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22powerpc/zImage: Also check for stdout-pathOliver O'Halloran
The /chosen/linux,stdout-path is "deprecated" in favour of /chosen/stdout-path so we should be checking for both. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=yBenjamin Herrenschmidt
HMIs will crash the kernel due to BRANCH_LINK_TO_FAR(hmi_exception_realmode) Calling into the OPD instead of the actual code. Fixes: 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Use DOTSYM() rather than #ifdef] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22macintosh: Use of_node_name_{eq, prefix} for node name comparisonsRob Herring
Convert string compares of DT node names to use of_node_name_{eq,prefix} helpers instead. This removes direct access to the node name pointer. This changes a single case insensitive node name comparison to case sensitive for "ata4". This is the only instance of a case insensitive comparison for all the open coded node name comparisons on powerpc. Searching the commit history, there doesn't appear to be any reason for it to be case insensitive. A couple of open coded iterating thru the child node names are converted to use for_each_child_of_node() instead. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22ide: Use of_node_name_eq for node name comparisonsRob Herring
Convert string compares of DT node names to use of_node_name_eq helper instead. This removes direct access to the node name pointer. Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linux-ide@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22powerpc: Use of_node_name_eq for node name comparisonsRob Herring
Convert string compares of DT node names to use of_node_name_eq helper instead. This removes direct access to the node name pointer. A couple of open coded iterating thru the child node names are converted to use for_each_child_of_node() instead. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22powerpc/pseries/pmem: Convert to %pOFn instead of device_node.nameRob Herring
In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. pmem.c was recently added and missed the initial conversion. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22powerpc/mm: Remove very old comment in hash-4k.hMichael Ellerman
This comment talks about PTEs being 64-bits and PMD/PGD being 32-bits, but that hasn't been true since 2005 when David Gibson implemented 4-level page tables in the commit titled "Four level pagetables for ppc64". Remove it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-22powerpc/pseries: Fix node leak in update_lmb_associativity_index()Michael Ellerman
In update_lmb_associativity_index() we lookup dr_node using of_find_node_by_path() which takes a reference for us. In the non-error case we forget to drop the reference. Note that find_aa_index() does modify properties of the node, but doesn't need an extra reference held once it's returned. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21powerpc/configs/85xx: Enable CONFIG_DEBUG_KERNELScott Wood
This is required for CONFIG_DEBUG_INFO to work. Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21powerpc/dts/fsl: Fix dtc-flagged interrupt errorsScott Wood
mpc8641_hpcn was updated to 4-cell interrupt specifiers, but PCI interrupt-map was not updated. It was also missing #interrupt-cells on the outer PCI buses. p1020rdb-pc was updated to 4-cell interrupt specifiers, but the ethernet-phy nodes weren't updated. mpc832x_rdb had an invalid "interrupts = <0>" on the ethernet-phy nodes. Besides being the wrong number of cells, 0 is not a valid IPIC interrupt according to ipic.c. Presumably it was meant to indicate that these PHYs are not connected to an interrupt. Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21clk: qoriq: add more compatibles stringsYuantian Tang
Add more SoC compatible strings to support more chips. Signed-off-by: Yuantian Tang <andy.tang@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21powerpc/fsl: Use new clockgen bindingScott Wood
The driver retains compatibility with old device trees, but we don't want the old nodes lying around to be copied, or used as a reference (some of the mux options are incorrect), or even just being clutter. Signed-off-by: Scott Wood <oss@buserror.net> Signed-off-by: Tang Yuantian <andy.tang@nxp.com> [scottwood: removed sysclk node added by Andy] Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21powerpc/83xx: handle machine check caused by watchdog timerChristophe Leroy
When the watchdog timer is set in interrupt mode, it causes a machine check when it times out. The purpose of this mode is to ease debugging, not to crash the kernel and reboot the machine. This patch implements a special handling for that, in order to not crash the kernel if the watchdog times out while in interrupt or within the idle task. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: added missing #include] Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21powerpc/fsl-rio: fix spelling mistake "reserverd" -> "reserved"Alexandre Belloni
Fix a spelling mistake in a register description. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21powerpc/fsl_pci: simplify fsl_pci_dma_set_maskChristoph Hellwig
swiotlb will only bounce buffer when the effective dma address for the device is smaller than the actual DMA range. Instead of flipping between the swiotlb and nommu ops for FSL SOCs that have the second outbound window just don't set the bus dma_mask in this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21arch/powerpc/fsl_rmu: Use dma_zalloc_coherentSabyasachi Gupta
Replaced dma_alloc_coherent + memset with dma_zalloc_coherent Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-12-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "4 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, page_alloc: fix has_unmovable_pages for HugePages fork,memcg: fix crash in free_thread_stack on memcg charge fail mm: thp: fix flags for pmd migration when split mm, memory_hotplug: initialize struct pages for the full memory section
2018-12-21mm, page_alloc: fix has_unmovable_pages for HugePagesOscar Salvador
While playing with gigantic hugepages and memory_hotplug, I triggered the following #PF when "cat memoryX/removable": BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:has_unmovable_pages+0x154/0x210 Call Trace: is_mem_section_removable+0x7d/0x100 removable_show+0x90/0xb0 dev_attr_show+0x1c/0x50 sysfs_kf_seq_show+0xca/0x1b0 seq_read+0x133/0x380 __vfs_read+0x26/0x180 vfs_read+0x89/0x140 ksys_read+0x42/0x90 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is we do not pass the Head to page_hstate(), and so, the call to compound_order() in page_hstate() returns 0, so we end up checking all hstates's size to match PAGE_SIZE. Obviously, we do not find any hstate matching that size, and we return NULL. Then, we dereference that NULL pointer in hugepage_migration_supported() and we got the #PF from above. Fix that by getting the head page before calling page_hstate(). Also, since gigantic pages span several pageblocks, re-adjust the logic for skipping pages. While are it, we can also get rid of the round_up(). [osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal] Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21fork,memcg: fix crash in free_thread_stack on memcg charge failRik van Riel
Commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will result in fork failing if allocating a kernel stack for a task in dup_task_struct exceeds the kernel memory allowance for that cgroup. Unfortunately, it also results in a crash. This is due to the code jumping to free_stack and calling free_thread_stack when the memcg kernel stack charge fails, but without tsk->stack pointing at the freshly allocated stack. This in turn results in the vfree_atomic in free_thread_stack oopsing with a backtrace like this: #5 [ffffc900244efc88] die at ffffffff8101f0ab #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86 #7 [ffffc900244efce0] general_protection at ffffffff818ff082 [exception RIP: llist_add_batch+7] RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000 RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0 R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37 #10 [ffffc900244efe98] _do_fork at ffffffff810884e0 #11 [ffffc900244eff10] sys_vfork at ffffffff810887ff #12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43 RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948 RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421 RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00 R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040 R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018 ORIG_RAX: 000000000000003a CS: 0033 SS: 002b The simplest fix is to assign tsk->stack right where it is allocated. Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21mm: thp: fix flags for pmd migration when splitPeter Xu
When splitting a huge migrating PMD, we'll transfer all the existing PMD bits and apply them again onto the small PTEs. However we are fetching the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_yound() while actually they don't make sense at all when it's a migration entry. Fix them up. Since at it, drop the ifdef together as not needed. Note that if my understanding is correct about the problem then if without the patch there is chance to lose some of the dirty bits in the migrating pmd pages (on x86_64 we're fetching bit 11 which is part of swap offset instead of bit 2) and it could potentially corrupt the memory of an userspace program which depends on the dirty bit. Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21mm, memory_hotplug: initialize struct pages for the full memory sectionMikhail Zaslonko
If memory end is not aligned with the sparse memory section boundary, the mapping of such a section is only partly initialized. This may lead to VM_BUG_ON due to uninitialized struct page access from is_mem_section_removable() or test_pages_in_a_zone() function triggered by memory_hotplug sysfs handlers: Here are the the panic examples: CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_PGFLAGS=y kernel parameter mem=2050M -------------------------- page:000003d082008000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( test_pages_in_a_zone+0xde/0x160) show_valid_zones+0x5c/0x190 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: test_pages_in_a_zone+0xde/0x160 Kernel panic - not syncing: Fatal exception: panic_on_oops kernel parameter mem=3075M -------------------------- page:000003d08300c000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( is_mem_section_removable+0xb4/0x190) show_mem_removable+0x9a/0xd8 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: is_mem_section_removable+0xb4/0x190 Kernel panic - not syncing: Fatal exception: panic_on_oops Fix the problem by initializing the last memory section of each zone in memmap_init_zone() till the very end, even if it goes beyond the zone end. Michal said: : This has alwways been problem AFAIU. It just went unnoticed because we : have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop : zeroing memory during allocation in vmemmap") and so the above test : would simply skip these ranges as belonging to zone 0 or provided a : garbage. : : So I guess we do care for post f7f99100d8d9 kernels mostly and : therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during : allocation in vmemmap") Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: "Just some small fixes here and there, and a refcount leak in a serial driver, nothing serious" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: serial/sunsu: fix refcount leak sparc: Set "ARCH: sunxx" information on the same line sparc: vdso: Drop implicit common-page-size linker flag
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull more networking fixes from David Miller: "Some more bug fixes have trickled in, we have: 1) Local MAC entries properly in mscc driver, from Allan W. Nielsen. 2) Eric Dumazet found some more of the typical "pskb_may_pull() --> oops forgot to reload the header pointer" bugs in ipv6 tunnel handling. 3) Bad SKB socket pointer in ipv6 fragmentation handling, from Herbert Xu. 4) Overflow fix in sk_msg_clone(), from Vakul Garg. 5) Validate address lengths in AF_PACKET, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup qmi_wwan: Add support for Fibocom NL678 series tls: Do not call sk_memcopy_from_iter with zero length ipv6: tunnels: fix two use-after-free Prevent overflow of sk_msg in sk_msg_clone() packet: validate address length net: netxen: fix a missing check and an uninitialized use tcp: fix a race in inet_diag_dump_icsk() MAINTAINERS: update cxgb4 and cxgb3 maintainer ipv6: frags: Fix bogus skb->sk in reassembled packets mscc: Configured MAC entries should be locked.
2018-12-21auxdisplay: charlcd: fix x/y command parsingMans Rullgard
The x/y command parsing has been broken since commit 129957069e6a ("staging: panel: Fixed checkpatch warning about simple_strtoul()"). Commit b34050fadb86 ("auxdisplay: charlcd: Fix and clean up handling of x/y commands") fixed some problems by rewriting the parsing code, but also broke things further by removing the check for a complete command before attempting to parse it. As a result, parsing is terminated at the first x or y character. This reinstates the check for a final semicolon. Whereas the original code use strchr(), this is wasteful seeing as the semicolon is always at the end of the buffer. Thus check this character directly instead. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-12-21serial/sunsu: fix refcount leakYangtao Li
The function of_find_node_by_path() acquires a reference to the node returned by it and that reference needs to be dropped by its caller. su_get_type() doesn't do that. The match node are used as an identifier to compare against the current node, so we can directly drop the refcount after getting the node from the path as it is not used as pointer. Fix this by use a single variable and drop the refcount right after of_find_node_by_path(). Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21sparc: Set "ARCH: sunxx" information on the same lineCorentin Labbe
While checking boot log from SPARC qemu, I saw that the "ARCH: sunxx" information was split on two different line. This patchs merge both line together. In the meantime, thoses information need to be printed via pr_info since printk print them by default via the warning loglevel. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21sparc: vdso: Drop implicit common-page-size linker flagndesaulniers@google.com
GNU linker's -z common-page-size's default value is based on the target architecture. arch/sparc/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it. Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fix from Paolo Bonzini: "A simple patch for a pretty bad bug: Unbreak AMD nested virtualization." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nSVM: fix switch to guest mmu
2018-12-21qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixupDaniele Palmas
This patch fixes qmap header retrieval when modem is configured for dl data aggregation. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a division by zero crash in the posix-timers code" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Fix division by zero bug
2018-12-21qmi_wwan: Add support for Fibocom NL678 seriesJörgen Storvist
Added support for Fibocom NL678 series cellular module QMI interface. Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x40 series chipsets. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>