summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2017-12-05x86,kvm: move qemu/guest FPU switching out to vcpu_runRik van Riel
Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done. This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl. This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization. This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again: [258270.527947] __warn+0xcb/0xf0 [258270.527948] warn_slowpath_null+0x1d/0x20 [258270.527951] kernel_fpu_disable+0x3f/0x50 [258270.527953] __kernel_fpu_begin+0x49/0x100 [258270.527955] kernel_fpu_begin+0xe/0x10 [258270.527958] crc32c_pcl_intel_update+0x84/0xb0 [258270.527961] crypto_shash_update+0x3f/0x110 [258270.527968] crc32c+0x63/0x8a [libcrc32c] [258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data] [258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data] [258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data] [258270.527988] submit_io+0x170/0x1b0 [dm_bufio] [258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio] [258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio] [258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio] [258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio] [258270.528002] shrink_slab.part.40+0x1f5/0x420 [258270.528004] shrink_node+0x22c/0x320 [258270.528006] do_try_to_free_pages+0xf5/0x330 [258270.528008] try_to_free_pages+0xe9/0x190 [258270.528009] __alloc_pages_slowpath+0x40f/0xba0 [258270.528011] __alloc_pages_nodemask+0x209/0x260 [258270.528014] alloc_pages_vma+0x1f1/0x250 [258270.528017] do_huge_pmd_anonymous_page+0x123/0x660 [258270.528021] handle_mm_fault+0xfd3/0x1330 [258270.528025] __get_user_pages+0x113/0x640 [258270.528027] get_user_pages+0x4f/0x60 [258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm] [258270.528108] try_async_pf+0x66/0x230 [kvm] [258270.528135] tdp_page_fault+0x130/0x280 [kvm] [258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm] [258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel] [258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel] No performance changes were detected in quick ping-pong tests on my 4 socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy. Cc: stable@vger.kernel.org Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu, which happened inside from KVM_CREATE_VCPU ioctl. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-01Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt: "This contains a handful of small cleanups that are a result of feedback that didn't make it into our original patch set, either because the feedback hadn't been given yet, I missed the original emails, or we weren't ready to submit the changes yet. I've been maintaining the various cleanup patch sets I have as their own branches, which I then merged together and signed. Each merge commit has a short summary of the changes, and each branch is based on your latest tag (4.15-rc1, in this case). If this isn't the right way to do this then feel free to suggest something else, but it seems sane to me. Here's a short summary of the changes, roughly in order of how interesting they are. - libgcc.h has been moved from include/lib, where it's the only member, to include/linux. This is meant to avoid tab completion conflicts. - VDSO entries for clock_get/gettimeofday/getcpu have been added. These are simple syscalls now, but we want to let glibc use them from the start so we can make them faster later. - A VDSO entry for instruction cache flushing has been added so userspace can flush the instruction cache. - The VDSO symbol versions for __vdso_cmpxchg{32,64} have been removed, as those VDSO entries don't actually exist. - __io_writes has been corrected to respect the given type. - A new READ_ONCE in arch_spin_is_locked(). - __test_and_op_bit_ord() is now actually ordered. - Various small fixes throughout the tree to enable allmodconfig to build cleanly. - Removal of some dead code in our atomic support headers. - Improvements to various comments in our atomic support headers" * tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits) RISC-V: __io_writes should respect the length argument move libgcc.h to include/linux RISC-V: Clean up an unused include RISC-V: Allow userspace to flush the instruction cache RISC-V: Flush I$ when making a dirty page executable RISC-V: Add missing include RISC-V: Use define for get_cycles like other architectures RISC-V: Provide stub of setup_profiling_timer() RISC-V: Export some expected symbols for modules RISC-V: move empty_zero_page definition to C and export it RISC-V: io.h: type fixes for warnings RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros RISC-V: use generic serial.h RISC-V: remove spin_unlock_wait() RISC-V: `sfence.vma` orderes the instruction cache RISC-V: Add READ_ONCE in arch_spin_is_locked() RISC-V: __test_and_op_bit_ord should be strongly ordered RISC-V: Remove smb_mb__{before,after}_spinlock() RISC-V: Remove __smp_bp__{before,after}_atomic RISC-V: Comment on why {,cmp}xchg is ordered how it is ...
2017-12-01move libgcc.h to include/linuxChristoph Hellwig
Introducing a new include/lib directory just for this file totally messes up tab completion for include/linux, which is highly annoying. Move it to include/linux where we have headers for all kinds of other lib/ code as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-11-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: - x86 bugfixes: APIC, nested virtualization, IOAPIC - PPC bugfix: HPT guests on a POWER9 radix host * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: Let KVM_SET_SIGNAL_MASK work as advertised KVM: VMX: Fix vmx->nested freeing when no SMI handler KVM: VMX: Fix rflags cache during vCPU reset KVM: X86: Fix softlockup when get the current kvmclock KVM: lapic: Fixup LDR on load in x2apic KVM: lapic: Split out x2apic ldr calculation KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP KVM: x86: Fix CPUID function for word 6 (80000001_ECX) KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2 KVM: x86: ioapic: Preserve read-only values in the redirection table KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq KVM: x86: ioapic: Don't fire level irq when Remote IRR set KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race KVM: x86: inject exceptions produced by x86_decode_insn KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs KVM: x86: fix em_fxstor() sleeping while in atomic KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry ...
2017-11-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Mergr misc fixes from Andrew Morton: "28 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits) fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate() mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored" autofs: revert "autofs: take more care to not update last_used on path walk" fs/fat/inode.c: fix sb_rdonly() change mm, memcg: fix mem_cgroup_swapout() for THPs mm: migrate: fix an incorrect call of prep_transhuge_page() kmemleak: add scheduling point to kmemleak_scan() scripts/bloat-o-meter: don't fail with division by 0 fs/mbcache.c: make count_objects() more robust Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical" mm/madvise.c: fix madvise() infinite loop under special circumstances exec: avoid RLIMIT_STACK races with prlimit() IB/core: disable memory registration of filesystem-dax vmas v4l2: disable filesystem-dax mapping support mm: fail get_vaddr_frames() for filesystem-dax mappings mm: introduce get_user_pages_longterm device-dax: implement ->split() to catch invalid munmap attempts mm, hugetlbfs: introduce ->split() to vm_operations_struct scripts/faddr2line: extend usage on generic arch ...
2017-11-29autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"Ian Kent
Commit 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but introduced a semantic change. In order to honor AT_NO_AUTOMOUNT a semantic change was made to the negative dentry case for stat family system calls in follow_automount(). This changed the unconditional triggering of an automount in this case to no longer be done and an error returned instead. This has caused more problems than I expected so reverting the change is needed. In a discussion with Neil Brown it was concluded that the automount(8) daemon can implement this change without kernel modifications. So that will be done instead and the autofs module documentation updated with a description of the problem and what needs to be done by module users for this specific case. Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored") Signed-off-by: Ian Kent <raven@themaw.net> Cc: Neil Brown <neilb@suse.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Colin Walters <walters@redhat.com> Cc: Ondrej Holy <oholy@redhat.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: migrate: fix an incorrect call of prep_transhuge_page()Zi Yan
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration. By inspecting the code, if called on a non-THP, prep_transhuge_page() will 1) change the value of the mapping of (page + 2), since it is used for THP deferred list; 2) change the lru value of (page + 1), since it is used for THP's dtor. Both can lead to data corruption of these two pages. Andrea said: "Pragmatically and from the point of view of the memory_hotplug subsys, the effect is a kernel crash when pages are being migrated during a memory hot remove offline and migration target pages are found in a bad state" This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP. Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration") Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu> Reported-by: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: <stable@vger.kernel.org> [4.14] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: introduce get_user_pages_longtermDan Williams
Patch series "introduce get_user_pages_longterm()", v2. Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control). In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for filesystem-dax case where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle. Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series. Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references. I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode. It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch. This patch (of 4): Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2). [akpm@linux-foundation.org: use kcalloc()] Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm, hugetlbfs: introduce ->split() to vm_operations_structDan Williams
Patch series "device-dax: fix unaligned munmap handling" When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and fail attempts to split vmas into unaligned ranges. It would be messy to teach the munmap path about device-dax alignment constraints in the same (hstate) way that hugetlbfs communicates this constraint. Instead, these patches introduce a new ->split() vm operation. This patch (of 2): The device-dax interface has similar constraints as hugetlbfs in that it requires the munmap path to unmap in huge page aligned units. Rather than add more custom vma handling code in __split_vma() introduce a new vm operation to perform this vma specific check. Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: fix device-dax pud write-faults triggered by get_user_pages()Dan Williams
Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted() and to date it has been unimplemented, just calls BUG_ON(). kernel BUG at ./include/linux/hugetlb.h:244! [..] RIP: 0010:follow_devmap_pud+0x482/0x490 [..] Call Trace: follow_page_mask+0x28c/0x6e0 __get_user_pages+0xe4/0x6c0 get_user_pages_unlocked+0x130/0x1b0 get_user_pages_fast+0x89/0xb0 iov_iter_get_pages_alloc+0x114/0x4a0 nfs_direct_read_schedule_iovec+0xd2/0x350 ? nfs_start_io_direct+0x63/0x70 nfs_file_direct_read+0x1e0/0x250 nfs_file_read+0x90/0xc0 For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s). [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "I screwed up my merge window pull request; I only sent half of what I meant to. There were no new features, just bugfixes of various importance and some very minor cleanup, so I think it's all still appropriate for -rc2. Highlights: - Fixes from Trond for some races in the NFSv4 state code. - Fix from Naofumi Honda for a typo in the blocked lock notificiation code - Fixes from Vasily Averin for some problems starting and stopping lockd especially in network namespaces" * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits) lockd: fix "list_add double add" caused by legacy signal interface nlm_shutdown_hosts_net() cleanup race of nfsd inetaddr notifiers vs nn->nfsd_serv change race of lockd inetaddr notifiers vs nlmsvc_rqst change SUNRPC: make cache_detail structures const NFSD: make cache_detail structures const sunrpc: make the function arg as const nfsd: check for use of the closed special stateid nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat lockd: lost rollback of set_grace_period() in lockd_down_net() lockd: added cleanup checks in exit_net hook grace: replace BUG_ON by WARN_ONCE in exit_net hook nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class lockd: remove net pointer from messages nfsd: remove net pointer from debug messages nfsd: Fix races with check_stateid_generation() nfsd: Ensure we check stateid validity in the seqid operation checks nfsd: Fix race in lock stateid creation nfsd4: move find_lock_stateid nfsd: Ensure we don't recognise lock stateids after freeing them ...
2017-11-29kallsyms: take advantage of the new '%px' formatLinus Torvalds
The conditional kallsym hex printing used a special fixed-width '%lx' output (KALLSYM_FMT) in preparation for the hashing of %p, but that series ended up adding a %px specifier to help with the conversions. Use it, and avoid the "print pointer as an unsigned long" code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27sunrpc: make the function arg as constBhumika Goyal
Make the struct cache_detail *tmpl argument of the function cache_create_net as const as it is only getting passed to kmemup having the argument as const void *. Add const to the prototype too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27KVM: Let KVM_SET_SIGNAL_MASK work as advertisedJan H. Schönherr
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that "any unblocked signal received [...] will cause KVM_RUN to return with -EINTR" and that "the signal will only be delivered if not blocked by the original signal mask". This, however, is only true, when the calling task has a signal handler registered for a signal. If not, signal evaluation is short-circuited for SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN returning or the whole process is terminated. Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar to that in do_sigtimedwait() to avoid short-circuiting of signals. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-26Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Glexiner: - unbreak the irq trigger type check for legacy platforms - a handful fixes for ARM GIC v3/4 interrupt controllers - a few trivial fixes all over the place * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/matrix: Make - vs ?: Precedence explicit irqchip/imgpdc: Use resource_size function on resource object irqchip/qcom: Fix u32 comparison with value less than zero irqchip/exiu: Fix return value check in exiu_init() irqchip/gic-v3-its: Remove artificial dependency on PCI irqchip/gic-v4: Add forward definition of struct irq_domain_ops irqchip/gic-v3: pr_err() strings should end with newlines irqchip/s3c24xx: pr_err() strings should end with newlines irqchip/gic-v3: Fix ppi-partitions lookup irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails genirq: Track whether the trigger type has been set
2017-11-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - topology enumeration fixes - KASAN fix - two entry fixes (not yet the big series related to KASLR) - remove obsolete code - instruction decoder fix - better /dev/mem sanity checks, hopefully working better this time - pkeys fixes - two ACPI fixes - 5-level paging related fixes - UMIP fixes that should make application visible faults more debuggable - boot fix for weird virtualization environment * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/decoder: Add new TEST instruction pattern x86/PCI: Remove unused HyperTransport interrupt support x86/umip: Fix insn_get_code_seg_params()'s return value x86/boot/KASLR: Remove unused variable x86/entry/64: Add missing irqflags tracing to native_load_gs_index() x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing x86/pkeys/selftests: Fix protection keys write() warning x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey' x86/mpx/selftests: Fix up weird arrays x86/pkeys: Update documentation about availability x86/umip: Print a warning into the syslog if UMIP-protected instructions are used x86/smpboot: Fix __max_logical_packages estimate x86/topology: Avoid wasting 128k for package id array perf/x86/intel/uncore: Cache logical pkg id in uncore driver x86/acpi: Reduce code duplication in mp_override_legacy_irq() x86/acpi: Handle SCI interrupts above legacy space gracefully x86/boot: Fix boot failure when SMP MP-table is based at 0 x86/mm: Limit mmap() of /dev/mem to valid physical addresses x86/selftests: Add test for mapping placement for 5-level paging ...
2017-11-26Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes: a documentation fix, a Sparse warning fix and a debugging fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix task state recording/printout sched/deadline: Don't use dubious signed bitfields sched/deadline: Fix the description of runtime accounting in the documentation
2017-11-26Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "A handful of objtool fixes, most of them related to making the UAPI header-syncing warnings easier to read and easier to act upon" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/headers: Sync objtool UAPI header objtool: Fix cross-build objtool: Move kernel headers/code sync check to a script objtool: Move synced files to their original relative locations objtool: Make unreachable annotation inline asms explicitly volatile objtool: Add a comment for the unreachable annotation macros
2017-11-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - The final conversion of timer wheel timers to timer_setup(). A few manual conversions and a large coccinelle assisted sweep and the removal of the old initialization mechanisms and the related code. - Remove the now unused VSYSCALL update code - Fix permissions of /proc/timer_list. I still need to get rid of that file completely - Rename a misnomed clocksource function and remove a stale declaration * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) m68k/macboing: Fix missed timer callback assignment treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts timer: Remove redundant __setup_timer*() macros timer: Pass function down to initialization routines timer: Remove unused data arguments from macros timer: Switch callback prototype to take struct timer_list * argument timer: Pass timer_list pointer to callbacks unconditionally Coccinelle: Remove setup_timer.cocci timer: Remove setup_*timer() interface timer: Remove init_timer() interface treewide: setup_timer() -> timer_setup() (2 field) treewide: setup_timer() -> timer_setup() treewide: init_timer() -> setup_timer() treewide: Switch DEFINE_TIMER callbacks to struct timer_list * s390: cmm: Convert timers to use timer_setup() lightnvm: Convert timers to use timer_setup() drivers/net: cris: Convert timers to use timer_setup() drm/vc4: Convert timers to use timer_setup() block/laptop_mode: Convert timers to use timer_setup() net/atm/mpc: Avoid open-coded assignment of timer callback function ...
2017-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho. 2) bpf offload bug fixes from Jakub Kicinski. 3) Fix bpf verifier to NOP out code which is dead at run time because due to branch pruning the verifier will not explore such instructions. From Alexei Starovoitov. 4) Fix crash when deleting secondary chains in packet scheduler classifier. From Roman Kapl. 5) Fix buffer management bugs in smc, from Ursula Braun. 6) Fix regression in anycast route handling, from David Ahern. 7) Fix link settings regression in r8169, from Tobias Jakobi. 8) Add back enough UFO support so that live migration still works, from Willem de Bruijn. 9) Linearize enough packet data for the full extent to which the ipvlan code will inspect the packet headers, from Gao Feng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) ipvlan: Fix insufficient skb linear check for ipv6 icmp ipvlan: Fix insufficient skb linear check for arp geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6 net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY net: accept UFO datagrams from tuntap and packet net: realtek: r8169: implement set_link_ksettings() net: ipv6: Fixup device for anycast routes during copy net/smc: Fix preinitialization of buf_desc in __smc_buf_create() net/smc: use sk_rcvbuf as start for rmb creation ipv6: Do not consider linkdown nexthops during multipath net: sched: fix crash when deleting secondary chains net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE bpf: fix branch pruning logic bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO bpf: remove explicit handling of 0 for arg2 in bpf_probe_read bpf: introduce ARG_PTR_TO_MEM_OR_NULL i40evf: Use smp_rmb rather than read_barrier_depends fm10k: Use smp_rmb rather than read_barrier_depends igb: Use smp_rmb rather than read_barrier_depends ...
2017-11-24Merge tag 'keys-next-20171123' of ↵James Morris
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next-keys Merge keys subsystem changes from David Howells, for v4.15.
2017-11-23x86/PCI: Remove unused HyperTransport interrupt supportBjorn Helgaas
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-11-23 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Several BPF offloading fixes, from Jakub. Among others: - Limit offload to cls_bpf and XDP program types only. - Move device validation into the driver and don't make any assumptions about the device in the classifier due to shared blocks semantics. - Don't pass offloaded XDP program into the driver when it should be run in native XDP instead. Offloaded ones are not JITed for the host in such cases. - Don't destroy device offload state when moved to another namespace. - Revert dumping offload info into user space for now, since ifindex alone is not sufficient. This will be redone properly for bpf-next tree. 2) Fix test_verifier to avoid using bpf_probe_write_user() helper in test cases, since it's dumping a warning into kernel log which may confuse users when only running tests. Switch to use bpf_trace_printk() instead, from Yonghong. 3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics before it becomes uabi, from Gianluca. More specifically: - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only by bpf_csum_diff(), where the argument is either a valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO then enforces a valid pointer in case of non-0 size or a valid pointer or NULL in case of size 0. Given that, the semantics for ARG_PTR_TO_MEM in combination with ARG_CONST_SIZE_OR_ZERO are now such that in case of size 0, the pointer must always be valid and cannot be NULL. This fix in semantics allows for bpf_probe_read() to drop the recently added size == 0 check in the helper that would become part of uabi otherwise once released. At the same time we can then fix bpf_probe_read_str() and bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO instead of ARG_CONST_SIZE in order to fix recently reported issues by Arnaldo et al, where LLVM optimizes two boundary checks into a single one for unknown variables where the verifier looses track of the variable bounds and thus rejects valid programs otherwise. 4) A fix for the verifier for the case when it detects comparison of two constants where the branch is guaranteed to not be taken at runtime. Verifier will rightfully prune the exploration of such paths, but we still pass the program to JITs, where they would complain about using reserved fields, etc. Track such dead instructions and sanitize them with mov r0,r0. Rejection is not possible since LLVM may generate them for valid C code and doesn't do as much data flow analysis as verifier. For bpf-next we might implement removal of such dead code and adjust branches instead. Fix from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24net: accept UFO datagrams from tuntap and packetWillem de Bruijn
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 939912216fa8 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-23Merge tag 'for-linus-timers-conversion-final-v4.15-rc1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into timers/urgent Pull the last batch of manual timer conversions from Kees Cook: - final batch of "non trivial" timer conversions (multi-tree dependencies, things Coccinelle couldn't handle, etc). - treewide conversions via Coccinelle, in 4 steps: - DEFINE_TIMER() functions converted to struct timer_list * argument - init_timer() -> setup_timer() - setup_timer() -> timer_setup() - setup_timer() -> timer_setup() (with a single embedded structure) - deprecated timer API removals (init_timer(), setup_*timer()) - finalization of new API (remove global casts)
2017-11-23bpf: fix branch pruning logicAlexei Starovoitov
when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-22Merge tag 'for-linus-20171120' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from Richard Weinberger: "General changes: - Unconfuse get_unmapped_area and point/unpoint driver methods - New partition parser: sharpslpart - Kill GENERIC_IO - Various fixes NAND changes: - Add a flag to mark NANDs that require 3 address cycles to encode a page address - Set a default ECC/free layout when NAND_ECC_NONE is requested - Fix a bug in panic_nand_write() - Another batch of cleanups for the denali driver - Fix PM support in the atmel driver - Remove support for platform data in the omap driver - Fix subpage write in the omap driver - Fix irq handling in the mtk driver - Change link order of mtk_ecc and mtk_nand drivers to speed up boot time - Change log level of ECC error messages in the mxc driver - Patch the pxa3xx driver to support Armada 8k platforms - Add BAM DMA support to the qcom driver - Convert gpio-nand to the GPIO desc API - Fix ECC handling in the mt29f driver SPI-NOR changes: - Introduce system power management support - New mechanism to select the proper .quad_enable() hook by JEDEC ID, when needed, instead of only by manufacturer ID - Add support to new memory parts from Gigadevice, Winbond, Macronix and Everspin - Maintainance for Cadence, Intel, Mediatek and STM32 drivers" * tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd: (85 commits) mtd: Avoid probe failures when mtd->dbg.dfs_dir is invalid mtd: sharpslpart: Add sharpslpart partition parser mtd: Add sanity checks in mtd_write/read_oob() mtd: remove the get_unmapped_area method mtd: implement mtd_get_unmapped_area() using the point method mtd: chips/map_rom.c: implement point and unpoint methods mtd: chips/map_ram.c: implement point and unpoint methods mtd: mtdram: properly handle the phys argument in the point method mtd: mtdswap: fix spelling mistake: 'TRESHOLD' -> 'THRESHOLD' mtd: slram: use memremap() instead of ioremap() kconfig: kill off GENERIC_IO option mtd: Fix C++ comment in include/linux/mtd/mtd.h mtd: constify mtd_partition mtd: plat-ram: Replace manual resource management by devm mtd: nand: Fix writing mtdoops to nand flash. mtd: intel-spi: Add Intel Lewisburg PCH SPI super SKU PCI ID mtd: nand: mtk: fix infinite ECC decode IRQ issue mtd: spi-nor: Add support for mr25h128 mtd: nand: mtk: change the compile sequence of mtk_nand.o and mtk_ecc.o mtd: spi-nor: enable 4B opcodes for mx66l51235l ...
2017-11-22bpf: introduce ARG_PTR_TO_MEM_OR_NULLGianluca Borello
With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO and the verifier can prove the value of this next argument is 0. However, most helpers are just interested in handling <!NULL, 0>, so forcing them to deal with <NULL, 0> makes the implementation of those helpers more complicated for no apparent benefits, requiring them to explicitly handle those corner cases with checks that bpf programs could start relying upon, preventing the possibility of removing them later. Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case. Currently, the only helper that needs this is bpf_csum_diff_proto(), so change arg1 and arg3 to this new type as well. Also add a new battery of tests that explicitly test the !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the various <NULL, 0> variations are focused on bpf_csum_diff, so cover also other helpers. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE castsKees Cook
With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove redundant __setup_timer*() macrosKees Cook
With __init_timer*() now matching __setup_timer*(), remove the redundant internal interface, clean up the resulting definitions and add more documentation. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Pass function down to initialization routinesKees Cook
In preparation for removing more macros, pass the function down to the initialization routines instead of doing it in macros. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove unused data arguments from macrosKees Cook
With the .data field removed, the ignored data arguments in timer macros can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Switch callback prototype to take struct timer_list * argumentKees Cook
Since all callbacks have been converted, we can switch the core prototype to "struct timer_list *" now too. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Pass timer_list pointer to callbacks unconditionallyKees Cook
Now that all timer callbacks are already taking their struct timer_list pointer as the callback argument, just do this unconditionally and remove the .data field. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove setup_*timer() interfaceKees Cook
With all callers converted to timer_setup(), the old setup_*timer() interface can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove init_timer() interfaceKees Cook
All users of init_timer() have been updated. Remove the ancient interface. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21block/laptop_mode: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Jens Axboe <axboe@kernel.dk> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: linux-block@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - print the warning about dropped messages on consoles on a separate line. It makes it more legible. - one typo fix and small code clean up. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: added new line symbol after warning about dropped messages printk: fix typo in printk_safe.c printk: simplify no_printk()
2017-11-21sched/deadline: Don't use dubious signed bitfieldsDan Carpenter
It doesn't cause a run-time bug, but these bitfields should be unsigned. When it's signed ->dl_throttled is set to either 0 or -1, instead of 0 and 1 as expected. The sched.h file is included into tons of places so Sparse generates a flood of warnings like this: ./include/linux/sched.h:477:54: error: dubious one-bit signed bitfield Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luca Abeni <luca.abeni@santannapisa.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Cc: luca abeni <luca.abeni@santannapisa.it> Link: http://lkml.kernel.org/r/20171013070121.dzcncojuj2f4utij@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-21bpf: make bpf_prog_offload_verifier_prep() static inlineJakub Kicinski
Header implementation of bpf_prog_offload_verifier_prep() which is used if CONFIG_NET=n should be a static inline. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: revert report offload info to user spaceJakub Kicinski
This reverts commit bd601b6ada11 ("bpf: report offload info to user space"). The ifindex by itself is not sufficient, we should provide information on which network namespace this ifindex belongs to. After considering some options we concluded that it's best to just remove this API for now, and rework it in -next. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: turn bpf_prog_get_type() into a wrapperJakub Kicinski
bpf_prog_get_type() is identical to bpf_prog_get_type_dev(), with false passed as attach_drv. Instead of keeping it as an exported symbol turn it into static inline wrapper. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-21bpf: offload: move offload device validation out to the driversJakub Kicinski
With TC shared block changes we can't depend on correct netdev pointer being available in cls_bpf. Move the device validation to the driver. Core will only make sure that offloaded programs are always attached in the driver (or in HW by the driver). We trust that drivers which implement offload callbacks will perform necessary checks. Moving the checks to the driver is generally a useful thing, in practice the check should be against a switchdev instance, not a netdev, given that most ASICs will probably allow using the same program on many ports. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-19Merge tag 'ntb-4.15' of git://github.com/jonmason/ntbLinus Torvalds
Pull ntb updates from Jon Mason: "Support for the switchtec ntb and related changes. Also, a couple of bug fixes" [ The timing isn't great. I had asked people to send me pull requests before my family vacation, and this code has not even been in linux-next as far as I can tell. But Logan Gunthorpe pleaded for its inclusion because the Switchtec driver has apparently been around for a while, just never in linux-next - Linus ] * tag 'ntb-4.15' of git://github.com/jonmason/ntb: ntb: intel: remove b2b memory window workaround for Skylake NTB NTB: make idt_89hpes_cfg const NTB: switchtec_ntb: Update switchtec documentation with notes for NTB NTB: switchtec_ntb: Add memory window support NTB: switchtec_ntb: Implement scratchpad registers NTB: switchtec_ntb: Implement doorbell registers NTB: switchtec_ntb: Add link management NTB: switchtec_ntb: Add skeleton NTB driver NTB: switchtec_ntb: Initialize hardware for doorbells and messages NTB: switchtec_ntb: Initialize hardware for memory windows NTB: switchtec_ntb: Introduce initial NTB driver NTB: Add check and comment for link up to mw_count() and mw_get_align() NTB: Ensure ntb_mw_get_align() is only called when the link is up NTB: switchtec: Add link event notifier callback NTB: switchtec: Add NTB hardware register definitions NTB: switchtec: Export class symbol for use in upper layer driver NTB: switchtec: Move structure definitions into a common header ntb: update maintainer list for Intel NTB driver
2017-11-18NTB: switchtec_ntb: Add skeleton NTB driverLogan Gunthorpe
Add a skeleton NTB driver which will be filled out in subsequent patches. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: switchtec_ntb: Introduce initial NTB driverLogan Gunthorpe
Seeing the Switchtec NTB hardware shares the same endpoint as the management endpoint we utilize the class_interface API to register an NTB driver for every Switchtec device in the system that has the NTB class code. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: Add check and comment for link up to mw_count() and mw_get_align()Logan Gunthorpe
Adds a comment and a check to ntb_mw_get_align() so that it always fails if the function is called before the link is up. Also adds a comment to ntb_mw_count() to note that it may return 0 if it is called before the link is up. This is to prevent accidental mis-use in clients that are testing on hardware that this doesn't matter for. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: switchtec: Add link event notifier callbackLogan Gunthorpe
In order for the Switchtec NTB code to handle link change events we create a notifier callback in the switchtec code which gets called whenever an appropriate event interrupt occurs. In order to preserve userspace's ability to follow these events, we compare the event count with a stored copy from last time we checked. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-18NTB: switchtec: Add NTB hardware register definitionsLogan Gunthorpe
There are two additional regions: ctrl and dbmsg. The first is for generic NTB control and memory windows. The second is for doorbells and message registers. This patch also adds a number of related constants for using these registers. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>