summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-21get_maintainer: remove uses of P: for maintainer nameJoe Perches
Commit 1ca84ed6425f ("MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile") changed the use of the "P:" tag from "Person" to "Profile (ie: special subsystem coding styles and characteristics)" Change how get_maintainer.pl parses the "P:" tag to match. Link: http://lkml.kernel.org/r/ca53823fc5d25c0be32ad937d0207a0589c08643.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Dan Williams <dan.j.william@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21selftests/vm: add missed tests in run_vmtestsSeongJae Park
The commits introducing 'mlock-random-test'[1], 'map_fiex_noreplace'[2], and 'thuge-gen'[3] have not added those in the 'run_vmtests' script and thus the 'run_tests' command of kselftests doesn't run those. This commit adds those in the script. 'gup_benchmark' and 'transhuge-stress' are also not included in the 'run_vmtests', but this commit does not add those because those are for performance measurement rather than pass/fail tests. [1] commit 26b4224d9961 ("selftests: expanding more mlock selftest") [2] commit 91cbacc34512 ("tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE") [3] commit fcc1f2d5dd34 ("selftests: add a test program for variable huge page sizes in mmap/shmget") Link: http://lkml.kernel.org/r/20200206085144.29126-1-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swapChristian Borntraeger
QEMU has a funny new build error message when I use the upstream kernel headers: CC block/file-posix.o In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4, from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29, from /home/cborntra/REPOS/qemu/include/block/accounting.h:28, from /home/cborntra/REPOS/qemu/include/block/block_int.h:27, from /home/cborntra/REPOS/qemu/block/file-posix.c:30: /usr/include/linux/swab.h: In function `__swab': /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef] 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^~~~~~ /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "(" 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^ cc1: all warnings being treated as errors make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1 rm tests/qemu-iotests/socket_scm_helper.o This was triggered by commit d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h"). That patch is doing #include <asm/bitsperlong.h> but it uses BITS_PER_LONG. The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG. Let us use the __ variant in swap.h Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com Fixes: d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Allison Randal <allison@lohutok.net> Cc: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"Ioanna Alifieraki
This reverts commit a97955844807e327df11aa33869009d14d6b7de0. Commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") removes a lock that is needed. This leads to a process looping infinitely in exit_sem() and can also lead to a crash. There is a reproducer available in [1] and with the commit reverted the issue does not reproduce anymore. Using the reproducer found in [1] is fairly easy to reach a point where one of the child processes is looping infinitely in exit_sem between for(;;) and if (semid == -1) block, while it's trying to free its last sem_undo structure which has already been freed by freeary(). Each sem_undo struct is on two lists: one per semaphore set (list_id) and one per process (list_proc). The list_id list tracks undos by semaphore set, and the list_proc by process. Undo structures are removed either by freeary() or by exit_sem(). The freeary function is invoked when the user invokes a syscall to remove a semaphore set. During this operation freeary() traverses the list_id associated with the semaphore set and removes the undo structures from both the list_id and list_proc lists. For this case, exit_sem() is called at process exit. Each process contains a struct sem_undo_list (referred to as "ulp") which contains the head for the list_proc list. When the process exits, exit_sem() traverses this list to remove each sem_undo struct. As in freeary(), whenever a sem_undo struct is removed from list_proc, it is also removed from the list_id list. Removing elements from list_id is safe for both exit_sem() and freeary() due to sem_lock(). Removing elements from list_proc is not safe; freeary() locks &un->ulp->lock when it performs list_del_rcu(&un->list_proc) but exit_sem() does not (locking was removed by commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"). This can result in the following situation while executing the reproducer [1] : Consider a child process in exit_sem() and the parent in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)). - The list_proc for the child contains the last two undo structs A and B (the rest have been removed either by exit_sem() or freeary()). - The semid for A is 1 and semid for B is 2. - exit_sem() removes A and at the same time freeary() removes B. - Since A and B have different semid sem_lock() will acquire different locks for each process and both can proceed. The bug is that they remove A and B from the same list_proc at the same time because only freeary() acquires the ulp lock. When exit_sem() removes A it makes ulp->list_proc.next to point at B and at the same time freeary() removes B setting B->semid=-1. At the next iteration of for(;;) loop exit_sem() will try to remove B. The only way to break from for(;;) is for (&un->list_proc == &ulp->list_proc) to be true which is not. Then exit_sem() will check if B->semid=-1 which is and will continue looping in for(;;) until the memory for B is reallocated and the value at B->semid is changed. At that point, exit_sem() will crash attempting to unlink B from the lists (this can be easily triggered by running the reproducer [1] a second time). To prove this scenario instrumentation was added to keep information about each sem_undo (un) struct that is removed per process and per semaphore set (sma). CPU0 CPU1 [caller holds sem_lock(sma for A)] ... freeary() exit_sem() ... ... ... sem_lock(sma for B) spin_lock(A->ulp->lock) ... list_del_rcu(un_A->list_proc) list_del_rcu(un_B->list_proc) Undo structures A and B have different semid and sem_lock() operations proceed. However they belong to the same list_proc list and they are removed at the same time. This results into ulp->list_proc.next pointing to the address of B which is already removed. After reverting commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") the issue was no longer reproducible. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779 Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com Fixes: a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Herton R. Krzesinski <herton@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <malat@debian.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: hide timeval/timespec/itimerval/itimerspec typesArnd Bergmann
There are no in-kernel users remaining, but there may still be users that include linux/time.h instead of sys/time.h from user space, so leave the types available to user space while hiding them from kernel space. Only the __kernel_old_* versions of these types remain now. Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove unused time32 interfacesArnd Bergmann
No users remain, so kill these off before we grow new ones. Link: http://lkml.kernel.org/r/20200110154232.4104492-3-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove ktime to/from timespec/timeval conversionArnd Bergmann
A couple of helpers are now obsolete and can be removed, so drivers can no longer start using them and instead use y2038-safe interfaces. Link: http://lkml.kernel.org/r/20200110154232.4104492-2-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()Rafael J. Wysocki
Commit fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") overlooked the fact that fixed events can wake up the system too and broke RTC wakeup from suspend-to-idle as a result. Fix this issue by checking the fixed events in acpi_s2idle_wake() in addition to checking wakeup GPEs and break out of the suspend-to-idle loop if the status bits of any enabled fixed events are set then. Fixes: fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21hwmon: (w83627ehf) Fix crash seen with W83627DHG-PGuenter Roeck
Loading the driver on a system with W83627DHG-P crashes as follows. w83627ehf: Found W83627DHG-P chip at 0x290 BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 604 Comm: sensors Not tainted 5.6.0-rc2-00055-gca7e1fd1026c #29 Hardware name: /D425KT, BIOS MWPNT10N.86A.0132.2013.0726.1534 07/26/2013 RIP: 0010:w83627ehf_read_string+0x27/0x70 [w83627ehf] Code: [... ] RSP: 0018:ffffb95980657df8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff96caaa7f5218 RCX: 0000000000000000 RDX: 0000000000000015 RSI: 0000000000000001 RDI: ffff96caa736ec08 RBP: 0000000000000000 R08: ffffb95980657e20 R09: 0000000000000001 R10: ffff96caaa635cc0 R11: 0000000000000000 R12: ffff96caa9f7cf00 R13: ffff96caa9ec3d00 R14: ffff96caa9ec3d28 R15: ffff96caa9ec3d40 FS: 00007fbc7c4e2740(0000) GS:ffff96caabc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000129d58000 CR4: 00000000000006f0 Call Trace: ? cp_new_stat+0x12d/0x160 hwmon_attr_show_string+0x37/0x70 [hwmon] dev_attr_show+0x14/0x50 sysfs_kf_seq_show+0xb5/0x1b0 seq_read+0xcf/0x460 vfs_read+0x9b/0x150 ksys_read+0x5f/0xe0 do_syscall_64+0x48/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... Temperature labels are not always present. Adjust sysfs attribute visibility accordingly. Reported-by: Meelis Roos <mroos@linux.ee> Suggested-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Meelis Roos <mroos@linux.ee> Cc: Dr. David Alan Gilbert <linux@treblig.org> Fixes: 266cd5835947 ("hwmon: (w83627ehf) convert to with_info interface") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-02-21Merge branch 'nvme-5.6-rc3' of git://git.infradead.org/nvme into block-5.6Jens Axboe
Pull NVMe fixes from Keith. * 'nvme-5.6-rc3' of git://git.infradead.org/nvme: nvme-multipath: Fix memory leak with ana_log_buf nvme: Fix uninitialized-variable warning nvme-pci: Use single IRQ vector for old Apple models nvme/pci: Add sleep quirk for Samsung and Toshiba drives
2020-02-21io_uring: prevent sq_thread from spinning when it should stopStefano Garzarella
This patch drops 'cur_mm' before calling cond_resched(), to prevent the sq_thread from spinning even when the user process is finished. Before this patch, if the user process ended without closing the io_uring fd, the sq_thread continues to spin until the 'sq_thread_idle' timeout ends. In the worst case where the 'sq_thread_idle' parameter is bigger than INT_MAX, the sq_thread will spin forever. Fixes: 6c271ce2f1d5 ("io_uring: add submission polling") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-21Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eofFilipe Manana
While logging the prealloc extents of an inode during a fast fsync we call btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while holding a read lock on a leaf of the inode's root (not the log root, the fs/subvol root), and then that function locks the file range in the inode's iotree. This can lead to a deadlock when: * the fsync is ranged * the file has prealloc extents beyond eof * writeback for a range different from the fsync range starts during the fsync * the size of the file is not sector size aligned Because when finishing an ordered extent we lock first a file range and then try to COW the fs/subvol tree to insert an extent item. The following diagram shows how the deadlock can happen. CPU 1 CPU 2 btrfs_sync_file() --> for range [0, 1MiB) --> inode has a size of 1MiB and has 1 prealloc extent beyond the i_size, starting at offset 4MiB flushes all delalloc for the range [0MiB, 1MiB) and waits for the respective ordered extents to complete --> before task at CPU 1 locks the inode, a write into file range [1MiB, 2MiB + 1KiB) is made --> i_size is updated to 2MiB + 1KiB --> writeback is started for that range, [1MiB, 2MiB + 4KiB) --> end offset rounded up to be sector size aligned btrfs_log_dentry_safe() btrfs_log_inode_parent() btrfs_log_inode() btrfs_log_changed_extents() btrfs_log_prealloc_extents() --> does a search on the inode's root --> holds a read lock on leaf X btrfs_finish_ordered_io() --> locks range [1MiB, 2MiB + 4KiB) --> end offset rounded up to be sector size aligned --> tries to cow leaf X, through insert_reserved_file_extent() --> already locked by the task at CPU 1 btrfs_truncate_inode_items() --> gets an i_size of 2MiB + 1KiB, which is not sector size aligned --> tries to lock file range [2MiB, (u64)-1) --> the start range is rounded down from 2MiB + 1K to 2MiB to be sector size aligned --> but the subrange [2MiB, 2MiB + 4KiB) is already locked by task at CPU 2 which is waiting to get a write lock on leaf X for which we are holding a read lock *** deadlock *** This results in a stack trace like the following, triggered by test case generic/561 from fstests: [ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds. [ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1 [ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000 [ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] [ 2779.990466] Call Trace: [ 2779.990491] ? __schedule+0x384/0xa30 [ 2779.990521] schedule+0x33/0xe0 [ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs] [ 2779.990632] ? remove_wait_queue+0x60/0x60 [ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs] [ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs] [ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs] [ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs] [ 2779.990987] ? mark_held_locks+0x6d/0xc0 [ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100 [ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs] [ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs] [ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs] [ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs] [ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs] [ 2779.991432] process_one_work+0x26d/0x6a0 [ 2779.991460] worker_thread+0x4f/0x3e0 [ 2779.991481] ? process_one_work+0x6a0/0x6a0 [ 2779.991489] kthread+0x103/0x140 [ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70 [ 2779.991515] ret_from_fork+0x3a/0x50 (...) [ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds. [ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1 [ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2780.030035] fsstress D 0 17375 17373 0x00004000 [ 2780.030038] Call Trace: [ 2780.030044] ? __schedule+0x384/0xa30 [ 2780.030052] schedule+0x33/0xe0 [ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs] [ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs] [ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0 [ 2780.030102] ? remove_wait_queue+0x60/0x60 [ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs] [ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs] [ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs] [ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs] [ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0 [ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs] [ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs] [ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0 [ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs] [ 2780.030282] ? dget_parent+0xa1/0x370 [ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs] [ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs] [ 2780.030339] do_fsync+0x38/0x60 [ 2780.030343] __x64_sys_fdatasync+0x13/0x20 [ 2780.030345] do_syscall_64+0x5c/0x280 [ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 2780.030356] RIP: 0033:0x7f2d80f6d5f0 [ 2780.030361] Code: Bad RIP value. [ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0 [ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003 [ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c [ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4 [ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90 So fix this by making btrfs_truncate_inode_items() not lock the range in the inode's iotree when the target root is a log root, since it's not needed to lock the range for log roots as the protection from the inode's lock and log_mutex are all that's needed. Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-21nvme-multipath: Fix memory leak with ana_log_bufLogan Gunthorpe
kmemleak reports a memory leak with the ana_log_buf allocated by nvme_mpath_init(): unreferenced object 0xffff888120e94000 (size 8208): comm "nvme", pid 6884, jiffies 4295020435 (age 78786.312s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000e2360188>] kmalloc_order+0x97/0xc0 [<0000000079b18dd4>] kmalloc_order_trace+0x24/0x100 [<00000000f50c0406>] __kmalloc+0x24c/0x2d0 [<00000000f31a10b9>] nvme_mpath_init+0x23c/0x2b0 [<000000005802589e>] nvme_init_identify+0x75f/0x1600 [<0000000058ef911b>] nvme_loop_configure_admin_queue+0x26d/0x280 [<00000000673774b9>] nvme_loop_create_ctrl+0x2a7/0x710 [<00000000f1c7a233>] nvmf_dev_write+0xc66/0x10b9 [<000000004199f8d0>] __vfs_write+0x50/0xa0 [<0000000065466fef>] vfs_write+0xf3/0x280 [<00000000b0db9a8b>] ksys_write+0xc6/0x160 [<0000000082156b91>] __x64_sys_write+0x43/0x50 [<00000000c34fbb6d>] do_syscall_64+0x77/0x2f0 [<00000000bbc574c9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe nvme_mpath_init() is called by nvme_init_identify() which is called in multiple places (nvme_reset_work(), nvme_passthru_end(), etc). This means nvme_mpath_init() may be called multiple times before nvme_mpath_uninit() (which is only called on nvme_free_ctrl()). When nvme_mpath_init() is called multiple times, it overwrites the ana_log_buf pointer with a new allocation, thus leaking the previous allocation. To fix this, free ana_log_buf before allocating a new one. Fixes: 0d0b660f214dc490 ("nvme: add ANA support") Cc: <stable@vger.kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-02-21genirq/irqdomain: Make sure all irq domain flags are distinctZenghui Yu
This was noticed when printing debugfs for MSIs on my ARM64 server. The new dstate IRQD_MSI_NOMASK_QUIRK came out surprisingly while it should only be the x86 stuff for the time being... The new MSI quirk flag uses the same bit as IRQ_DOMAIN_NAME_ALLOCATED which is oddly defined as bit 6 for no good reason. Switch it to the non used bit 1. Fixes: 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity race") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200221020725.2038-1-yuzenghui@huawei.com
2020-02-21zonefs: fix documentation typos etc.Randy Dunlap
Fix typos, spellos, etc. in zonefs.txt. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Damien Le Moal <Damien.LeMoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-02-21csky: Implement copy_thread_tlsGuo Ren
This is required for clone3 which passes the TLS value through a struct rather than a register. Cc: Amanieu d'Antras <amanieu@gmail.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Add PCI supportMaJun
Add the pci related code for csky arch to support basic pci virtual function, such as qemu virt-pci-9pfs. Signed-off-by: MaJun <majun258@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Minimize defconfig to support buildroot config.fragmentMa Jun
Some bsp (eg: buildroot) has defconfig.fragment design to add more configs into the defconfig in linux source code tree. For example, we could put different cpu configs into different defconfig.fragments, but they all use the same defconfig in Linux. Signed-off-by: Ma Jun <majun258@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Add setup_initrd check codeGuo Ren
We should give some necessary check for initrd just like other architectures and it seems that setup_initrd() could be a common code for all architectures. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Cleanup old Kconfig optionsKrzysztof Kozlowski
CONFIG_CLKSRC_OF is gone since commit bb0eb050a577 ("clocksource/drivers: Rename CLKSRC_OF to TIMER_OF"). The platform already selects TIMER_OF. CONFIG_HAVE_DMA_API_DEBUG is gone since commit 6e88628d03dd ("dma-debug: remove CONFIG_HAVE_DMA_API_DEBUG"). CONFIG_DEFAULT_DEADLINE is gone since commit f382fb0bcef4 ("block: remove legacy IO schedulers"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21arch/csky: fix some Kconfig typosRandy Dunlap
Fix wording in help text for the CPU_HAS_LDSTEX symbol. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Fixup compile warning for three unimplemented syscallsGuo Ren
Implement fstat64, fstatat64, clone3 syscalls to fixup checksyscalls.sh compile warnings. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Remove unused cache implementationGuo Ren
Only for coding convention, these codes are unnecessary for abiv2. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Fixup ftrace modify panicGuo Ren
During ftrace init, linux will replace all function prologues (call_mcout) with nops, but it need flush_dcache and invalidate_icache to make it work. So flush_cache functions couldn't be nested called by ftrace framework. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Add flush_icache_mm to defer flush icache allGuo Ren
Some CPUs don't support icache.va instruction to maintain the whole smp cores' icache. Using icache.all + IPI casue a lot on performace and using defer mechanism could reduce the number of calling icache _flush_all functions. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Optimize abiv2 copy_to_user_page with VM_EXECGuo Ren
Only when vma is for VM_EXEC, we need sync dcache & icache. eg: - gdb ptrace modify user space instruction code area. Add VM_EXEC condition to reduce unnecessary cache flush. The abiv1 cpus' cache are all VIPT, so we still need to deal with dcache aliasing problem. But there is optimized way to use cache color, just like what's done in arch/csky/abiv1/inc/abi/page.h. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860)Guo Ren
Instead of flushing cache per update_mmu_cache() called, we use flush_dcache_page to reduce the frequency of flashing the cache. As abiv2 cpus are all PIPT for icache & dcache, we needn't handle dcache aliasing problem. But their icache can't snoop dcache, so we still need sync_icache_dcache in update_mmu_cache(). Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Remove unnecessary flush_icache_* implementationGuo Ren
The abiv2 CPUs are all PIPT cache, so there is no need to implement flush_icache_page function. The function flush_icache_user_range hasn't been used, so just remove it. The function flush_cache_range is not necessary for PIPT cache when tlb mapping changed. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Support icache flush without specific instructionsGuo Ren
Some CPUs don't support icache specific instructions to flush icache lines in broadcast way. We use cpu control registers to flush local icache and use IPI to notify other cores. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky/Kconfig: Add Kconfig.platforms to support some driversGuo Ren
Such as snps,dw-apb-ictl Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky/smp: Fixup boot failed when CONFIG_SMPGuo Ren
If we use a non-ipi-support interrupt controller, it will cause panic here. We should let cpu up and work with CONFIG_SMP, when we use a non-ipi intc. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Set regs->usp to kernel sp, when the exception is from kernelGuo Ren
In the past, we didn't care about kernel sp when saving pt_reg. But in some cases, we still need pt_reg->usp to represent the kernel stack before enter exception. For cmpxhg in atomic.S, we need save and restore usp for above. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky/mm: Fixup export invalid_pte_table symbolGuo Ren
There is no present bit in csky pmd hardware, so we need to prepare invalid_pte_table for empty pmd entry and the functions (pmd_none & pmd_present) in pgtable.h need invalid_pte_talbe to get result. If a module use these functions, we need export the symbol for it. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: Mo Qihui <qihui.mo@verisilicon.com> Cc: Zhange Jian <zhang_jian5@dahuatech.com>
2020-02-21csky: Separate fixaddr_init from highmemGuo Ren
After fixaddr_init is separated from highmem, we could use tcm without highmem selected. (610 (abiv1) don't support highmem, but it could use tcm now.) Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Tightly-Coupled Memory or Sram supportGuo Ren
The implementation are not only used by TCM but also used by sram on SOC bus. It follow existed linux tcm software interface, so that old tcm application codes could be re-used directly. Software interface list in asm/tcm.h: - Variables/Const: __tcmdata, __tcmconst - Functions: __tcmfunc, __tcmlocalfunc - Malloc/Free: tcm_alloc, tcm_free In linux menuconfig: - Choose a TCM contain instrctions + data or separated in ITCM/DTCM. - Determine TCM_BASE (DTCM_BASE) in phyiscal address. - Determine size of TCM or ITCM(DTCM) in page counts. Here is hello tcm example from Documentation/arm/tcm.rst which could be directly used: /* Uninitialized data */ static u32 __tcmdata tcmvar; /* Initialized data */ static u32 __tcmdata tcmassigned = 0x2BADBABEU; /* Constant */ static const u32 __tcmconst tcmconst = 0xCAFEBABEU; static void __tcmlocalfunc tcm_to_tcm(void) { int i; for (i = 0; i < 100; i++) tcmvar ++; } static void __tcmfunc hello_tcm(void) { /* Some abstract code that runs in ITCM */ int i; for (i = 0; i < 100; i++) { tcmvar ++; } tcm_to_tcm(); } static void __init test_tcm(void) { u32 *tcmem; int i; hello_tcm(); printk("Hello TCM executed from ITCM RAM\n"); printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar); tcmvar = 0xDEADBEEFU; printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar); printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned); printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst); /* Allocate some TCM memory from the pool */ tcmem = tcm_alloc(20); if (tcmem) { printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem); tcmem[0] = 0xDEADBEEFU; tcmem[1] = 0x2BADBABEU; tcmem[2] = 0xCAFEBABEU; tcmem[3] = 0xDEADBEEFU; tcmem[4] = 0x2BADBABEU; for (i = 0; i < 5; i++) printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]); tcm_free(tcmem, 20); } } TODO: - Separate fixup mapping from highmem - Support abiv1 Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21csky: Initial stack protector supportMao Han
This is a basic -fstack-protector support without per-task canary switching. The protector will report something like when stack corruption is detected: It's tested with strcpy local array overflow in sys_kill and get: stack-protector: Kernel stack is corrupted in: sys_kill+0x23c/0x23c TODO: - Support task switch for different cannary Signed-off-by: Mao Han <han_mao@c-sky.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21MAINTAINERS: csky: Add mailing list for cskyGuo Ren
Add mailing list and it's convenient for maintain C-SKY subsystem. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-21ext4: fix potential race between s_group_info online resizing and accessSuraj Jitindar Singh
During an online resize an array of pointers to s_group_info gets replaced so it can get enlarged. If there is a concurrent access to the array in ext4_get_group_info() and this memory has been reused then this can lead to an invalid memory access. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.edu Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Balbir Singh <sblbir@amazon.com> Cc: stable@kernel.org
2020-02-21ext4: fix potential race between online resizing and write operationsTheodore Ts'o
During an online resize an array of pointers to buffer heads gets replaced so it can get enlarged. If there is a racing block allocation or deallocation which uses the old array, and the old array has gotten reused this can lead to a GPF or some other random kernel memory getting modified. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.edu Reported-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-02-21Merge tag 'drm-intel-fixes-2020-02-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.6-rc3: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87y2sxtsrd.fsf@intel.com
2020-02-21Merge tag 'drm-misc-fixes-2020-02-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.6-rc3: - Fix dt binding for sunxi. - Allow only 1 rotation argument, and allow 0 rotation in video cmdline. - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f5a6370d-9898-6c72-43e4-5bb56a99b6f2@linux.intel.com
2020-02-20Merge branch 'bnxt_en-shutdown-and-kexec-kdump-related-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: shutdown and kexec/kdump related fixes. 2 small patches to fix kexec shutdown and kdump kernel driver init issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.Vasundhara Volam
If crashed kernel does not shutdown the NIC properly, PCIe FLR is required in the kdump kernel in order to initialize all the functions properly. Fixes: d629522e1d66 ("bnxt_en: Reduce memory usage when running in kdump kernel.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20bnxt_en: Improve device shutdown method.Vasundhara Volam
Especially when bnxt_shutdown() is called during kexec, we need to disable MSIX and disable Bus Master to completely quiesce the device. Make these 2 calls unconditionally in the shutdown method. Fixes: c20dc142dd7b ("bnxt_en: Disable bus master during PCI shutdown and driver unload.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: netlink: cap max groups which will be considered in netlink_bind()Nikolay Aleksandrov
Since nl_groups is a u32 we can't bind more groups via ->bind (netlink_bind) call, but netlink has supported more groups via setsockopt() for a long time and thus nlk->ngroups could be over 32. Recently I added support for per-vlan notifications and increased the groups to 33 for NETLINK_ROUTE which exposed an old bug in the netlink_bind() code causing out-of-bounds access on archs where unsigned long is 32 bits via test_bit() on a local variable. Fix this by capping the maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively capping them at 32 which is the minimum of allocated groups and the maximum groups which can be bound via netlink_bind(). CC: Christophe Leroy <christophe.leroy@c-s.fr> CC: Richard Guy Briggs <rgb@redhat.com> Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: thunderx: workaround BGX TX Underflow issueTim Harvey
While it is not yet understood why a TX underflow can easily occur for SGMII interfaces resulting in a TX wedge. It has been found that disabling/re-enabling the LMAC resolves the issue. Signed-off-by: Tim Harvey <tharvey@gateworks.com> Reviewed-by: Robert Jones <rjones@gateworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20ionic: fix fw_status readShannon Nelson
The fw_status field is only 8 bits, so fix the read. Also, we only want to look at the one status bit, to allow for future use of the other bits, and watch for a bad PCI read. Fixes: 97ca486592c0 ("ionic: add heartbeat check") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA fixes from Mimi Zohar: "Two bug fixes and an associated change for each. The one that adds SM3 to the IMA list of supported hash algorithms is a simple change, but could be considered a new feature" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: add sm3 algorithm to hash algorithm configuration list crypto: rename sm3-256 to sm3 in hash_algo_name efi: Only print errors about failing to get certs if EFI vars are found x86/ima: use correct identifier for SetupMode variable
2020-02-20net: disable BRIDGE_NETFILTER by defaultRoman Kiryanov
The description says 'If unsure, say N.' but the module is built as M by default (once the dependencies are satisfied). When the module is selected (Y or M), it enables NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS which alter kernel internal structures. We (Android Studio Emulator) currently do not use this module and think this it is more consistent to have it disabled by default as opposite to disabling it explicitly to prevent enabling NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS. Signed-off-by: Roman Kiryanov <rkir@google.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: macb: Properly handle phylink on at91rm9200Alexandre Belloni
at91ether_init was handling the phy mode and speed but since the switch to phylink, the NCFGR register got overwritten by macb_mac_config(). The issue is that the RM9200_RMII bit and the MACB_CLK_DIV32 field are cleared but never restored as they conflict with the PAE, GBE and PCSSEL bits. Add new capability to differentiate between EMAC and the other versions of the IP and use it to set and avoid clearing the relevant bits. Also, this fixes a NULL pointer dereference in macb_mac_link_up as the EMAC doesn't use any rings/bufffers/queues. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>