summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)Author
2020-01-17f2fs: support data compressionChao Yu
This patch tries to support compression in f2fs. - New term named cluster is defined as basic unit of compression, file can be divided into multiple clusters logically. One cluster includes 4 << n (n >= 0) logical pages, compression size is also cluster size, each of cluster can be compressed or not. - In cluster metadata layout, one special flag is used to indicate cluster is compressed one or normal one, for compressed cluster, following metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores data including compress header and compressed data. - In order to eliminate write amplification during overwrite, F2FS only support compression on write-once file, data can be compressed only when all logical blocks in file are valid and cluster compress ratio is lower than specified threshold. - To enable compression on regular inode, there are three ways: * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext Compress metadata layout: [Dnode Structure] +-----------------------------------------------+ | cluster 1 | cluster 2 | ......... | cluster N | +-----------------------------------------------+ . . . . . . . . . Compressed Cluster . . Normal Cluster . +----------+---------+---------+---------+ +---------+---------+---------+---------+ |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 | +----------+---------+---------+---------+ +---------+---------+---------+---------+ . . . . . . +-------------+-------------+----------+----------------------------+ | data length | data chksum | reserved | compressed data | +-------------+-------------+----------+----------------------------+ Changelog: 20190326: - fix error handling of read_end_io(). - remove unneeded comments in f2fs_encrypt_one_page(). 20190327: - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages(). - don't jump into loop directly to avoid uninitialized variables. - add TODO tag in error path of f2fs_write_cache_pages(). 20190328: - fix wrong merge condition in f2fs_read_multi_pages(). - check compressed file in f2fs_post_read_required(). 20190401 - allow overwrite on non-compressed cluster. - check cluster meta before writing compressed data. 20190402 - don't preallocate blocks for compressed file. - add lz4 compress algorithm - process multiple post read works in one workqueue Now f2fs supports processing post read work in multiple workqueue, it shows low performance due to schedule overhead of multiple workqueue executing orderly. 20190921 - compress: support buffered overwrite C: compress cluster flag V: valid block address N: NEW_ADDR One cluster contain 4 blocks before overwrite after overwrite - VVVV -> CVNN - CVNN -> VVVV - CVNN -> CVNN - CVNN -> CVVV - CVVV -> CVNN - CVVV -> CVVV 20191029 - add kconfig F2FS_FS_COMPRESSION to isolate compression related codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm. note that: will remove lzo backend if Jaegeuk agreed that too. - update codes according to Eric's comments. 20191101 - apply fixes from Jaegeuk 20191113 - apply fixes from Jaegeuk - split workqueue for fsverity 20191216 - apply fixes from Jaegeuk 20200117 - fix to avoid NULL pointer dereference [Jaegeuk Kim] - add tracepoint for f2fs_{,de}compress_pages() - fix many bugs and add some compression stats - fix overwrite/mmap bugs - address 32bit build error, reported by Geert. - bug fixes when handling errors and i_compressed_blocks Reported-by: <noreply@ellerman.id.au> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-07Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "This is a relatively quiet cycle for nfsd, mainly various bugfixes. Possibly most interesting is Trond's fixes for some callback races that were due to my incomplete understanding of rpc client shutdown. Unfortunately at the last minute I've started noticing a new intermittent failure to send callbacks. As the logic seems basically correct, I'm leaving Trond's patches in for now, and hope to find a fix in the next week so I don't have to revert those patches" * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits) nfsd: depend on CRYPTO_MD5 for legacy client tracking NFSD fixing possible null pointer derefering in copy offload nfsd: check for EBUSY from vfs_rmdir/vfs_unink. nfsd: Ensure CLONE persists data and metadata changes to the target file SUNRPC: Fix backchannel latency metrics nfsd: restore NFSv3 ACL support nfsd: v4 support requires CRYPTO_SHA256 nfsd: Fix cld_net->cn_tfm initialization lockd: remove __KERNEL__ ifdefs sunrpc: remove __KERNEL__ ifdefs race in exportfs_decode_fh() nfsd: Drop LIST_HEAD where the variable it declares is never used. nfsd: document callback_wq serialization of callback code nfsd: mark cb path down on unknown errors nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback() nfsd: minor 4.1 callback cleanup SUNRPC: Fix svcauth_gss_proxy_init() SUNRPC: Trace gssproxy upcall results sunrpc: fix crash when cache_head become valid before update nfsd: remove private bin2hex implementation ...
2019-12-07Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - NFSv4.2 now supports cross device offloaded copy (i.e. offloaded copy of a file from one source server to a different target server). - New RDMA tracepoints for debugging congestion control and Local Invalidate WRs. Bugfixes and cleanups - Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for layoutreturn - Handle bad/dead sessions correctly in nfs41_sequence_process() - Various bugfixes to the delegation return operation. - Various bugfixes pertaining to delegations that have been revoked. - Cleanups to the NFS timespec code to avoid unnecessary conversions between timespec and timespec64. - Fix unstable RDMA connections after a reconnect - Close race between waking an RDMA sender and posting a receive - Wake pending RDMA tasks if connection fails - Fix MR list corruption, and clean up MR usage - Fix another RPCSEC_GSS issue with MIC buffer space" * tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) SUNRPC: Capture completion of all RPC tasks SUNRPC: Fix another issue with MIC buffer space NFS4: Trace lock reclaims NFS4: Trace state recovery operation NFSv4.2 fix memory leak in nfs42_ssc_open NFSv4.2 fix kfree in __nfs42_copy_file_range NFS: remove duplicated include from nfs4file.c NFSv4: Make _nfs42_proc_copy_notify() static NFS: Fallocate should use the nfs4_fattr_bitmap NFS: Return -ETXTBSY when attempting to write to a swapfile fs: nfs: sysfs: Remove NULL check before kfree NFS: remove unneeded semicolon NFSv4: add declaration of current_stateid NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process() nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list SUNRPC: Avoid RPC delays when exiting suspend NFS: Add a tracepoint in nfs_fh_to_dentry() NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done() NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn ...
2019-12-04Merge tag 'trace-v5.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: "Two fixes and one patch that was missed: Fixes: - Missing __print_hex_dump undef for processing new function in trace events - Stop WARN_ON messages when lockdown disables tracing on boot up Enhancement: - Debug option to inject trace events from userspace (for rasdaemon)" The enhancement has its own config option and is non invasive. It's been discussed for sever months and should have been added to my original push, but I never pulled it into my queue. * tag 'trace-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do not create directories if lockdown is in affect tracing: Introduce trace event injection tracing: Fix __print_hex_dump scope
2019-12-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: "Incoming: - a small number of updates to scripts/, ocfs2 and fs/buffer.c - most of MM I still have quite a lot of material (mostly not MM) staged after linux-next due to -next dependencies. I'll send those across next week as the preprequisites get merged up" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits) mm/page_io.c: annotate refault stalls from swap_readpage mm/Kconfig: fix trivial help text punctuation mm/Kconfig: fix indentation mm/memory_hotplug.c: remove __online_page_set_limits() mm: fix typos in comments when calling __SetPageUptodate() mm: fix struct member name in function comments mm/shmem.c: cast the type of unmap_start to u64 mm: shmem: use proper gfp flags for shmem_writepage() mm/shmem.c: make array 'values' static const, makes object smaller userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register() userfaultfd: wrap the common dst_vma check into an inlined function userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb() userfaultfd: use vma_pagesize for all huge page size calculation mm/madvise.c: use PAGE_ALIGN[ED] for range checking mm/madvise.c: replace with page_size() in madvise_inject_error() mm/mmap.c: make vma_merge() comment more easy to understand mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops autonuma: reduce cache footprint when scanning page tables autonuma: fix watermark checking in migrate_balanced_pgdat() ...
2019-12-01Merge tag 'y2038-cleanups-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 cleanups from Arnd Bergmann: "y2038 syscall implementation cleanups This is a series of cleanups for the y2038 work, mostly intended for namespace cleaning: the kernel defines the traditional time_t, timeval and timespec types that often lead to y2038-unsafe code. Even though the unsafe usage is mostly gone from the kernel, having the types and associated functions around means that we can still grow new users, and that we may be missing conversions to safe types that actually matter. There are still a number of driver specific patches needed to get the last users of these types removed, those have been submitted to the respective maintainers" Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/ * tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits) y2038: alarm: fix half-second cut-off y2038: ipc: fix x32 ABI breakage y2038: fix typo in powerpc vdso "LOPART" y2038: allow disabling time32 system calls y2038: itimer: change implementation to timespec64 y2038: move itimer reset into itimer.c y2038: use compat_{get,set}_itimer on alpha y2038: itimer: compat handling to itimer.c y2038: time: avoid timespec usage in settimeofday() y2038: timerfd: Use timespec64 internally y2038: elfcore: Use __kernel_old_timeval for process times y2038: make ns_to_compat_timeval use __kernel_old_timeval y2038: socket: use __kernel_old_timespec instead of timespec y2038: socket: remove timespec reference in timestamping y2038: syscalls: change remaining timeval to __kernel_old_timeval y2038: rusage: use __kernel_old_timeval y2038: uapi: change __kernel_time_t to __kernel_old_time_t y2038: stat: avoid 'time_t' in 'struct stat' y2038: ipc: remove __kernel_time_t reference from headers y2038: vdso: powerpc: avoid timespec references ...
2019-12-01rss_stat: add support to detect RSS updates of external mmJoel Fernandes (Google)
When a process updates the RSS of a different process, the rss_stat tracepoint appears in the context of the process doing the update. This can confuse userspace that the RSS of process doing the update is updated, while in reality a different process's RSS was updated. This issue happens in reclaim paths such as with direct reclaim or background reclaim. This patch adds more information to the tracepoint about whether the mm being updated belongs to the current process's context (curr field). We also include a hash of the mm pointer so that the process who the mm belongs to can be uniquely identified (mm_id field). Also vsprintf.c is refactored a bit to allow reuse of hashing code. [akpm@linux-foundation.org: remove unused local `str'] [joelaf@google.com: inline call to ptr_to_hashval] Link: http://lore.kernel.org/r/20191113153816.14b95acd@gandalf.local.home Link: http://lkml.kernel.org/r/20191114164622.GC233237@google.com Link: http://lkml.kernel.org/r/20191106024452.81923-1-joel@joelfernandes.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Ioannis Ilkos <ilkos@google.com> Acked-by: Petr Mladek <pmladek@suse.com> [lib/vsprintf.c] Cc: Tim Murray <timmurray@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Carmen Jackson <carmenjackson@google.com> Cc: Mayank Gupta <mayankgupta@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Minchan Kim <minchan@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01mm: emit tracepoint when RSS changesJoel Fernandes (Google)
Useful to track how RSS is changing per TGID to detect spikes in RSS and memory hogs. Several Android teams have been using this patch in various kernel trees for half a year now. Many reported to me it is really useful so I'm posting it upstream. Initial patch developed by Tim Murray. Changes I made from original patch: o Prevent any additional space consumed by mm_struct. Regarding the fact that the RSS may change too often thus flooding the traces - note that, there is some "hysterisis" with this already. That is - We update the counter only if we receive 64 page faults due to SPLIT_RSS_ACCOUNTING. However, during zapping or copying of pte range, the RSS is updated immediately which can become noisy/flooding. In a previous discussion, we agreed that BPF or ftrace can be used to rate limit the signal if this becomes an issue. Also note that I added wrappers to trace_rss_stat to prevent compiler errors where linux/mm.h is included from tracing code, causing errors such as: CC kernel/trace/power-traces.o In file included from ./include/trace/define_trace.h:102, from ./include/trace/events/kmem.h:342, from ./include/linux/mm.h:31, from ./include/linux/ring_buffer.h:5, from ./include/linux/trace_events.h:6, from ./include/trace/events/power.h:12, from kernel/trace/power-traces.c:15: ./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type struct trace_entry ent; \ Link: http://lore.kernel.org/r/20190903200905.198642-1-joel@joelfernandes.org Link: http://lkml.kernel.org/r/20191001172817.234886-1-joel@joelfernandes.org Co-developed-by: Tim Murray <timmurray@google.com> Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Carmen Jackson <carmenjackson@google.com> Cc: Mayank Gupta <mayankgupta@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Minchan Kim <minchan@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge window saw the the following new featuers added to ext4: - Direct I/O via iomap (required the iomap-for-next branch from Darrick as a prereq). - Support for using dioread-nolock where the block size < page size. - Support for encryption for file systems where the block size < page size. - Rework of journal credits handling so a revoke-heavy workload will not cause the journal to run out of space. - Replace bit-spinlocks with spinlocks in jbd2 Also included were some bug fixes and cleanups, mostly to clean up corner cases from fuzzed file systems and error path handling" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits) ext4: work around deleting a file with i_nlink == 0 safely ext4: add more paranoia checking in ext4_expand_extra_isize handling jbd2: make jbd2_handle_buffer_credits() handle reserved handles ext4: fix a bug in ext4_wait_for_tail_page_commit ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails ext4: code cleanup for get_next_id ext4: fix leak of quota reservations ext4: remove unused variable warning in parse_options() ext4: Enable encryption for subpage-sized blocks fs/buffer.c: support fscrypt in block_read_full_page() ext4: Add error handling for io_end_vec struct allocation jbd2: Fine tune estimate of necessary descriptor blocks jbd2: Provide trace event for handle restarts ext4: Reserve revoke credits for freed blocks jbd2: Make credit checking more strict jbd2: Rename h_buffer_credits to h_total_credits jbd2: Reserve space for revoke descriptor blocks jbd2: Drop jbd2_space_needed() jbd2: Account descriptor blocks into t_outstanding_credits jbd2: Factor out common parts of stopping and restarting a handle ...
2019-11-28Merge tag 'for-5.5/io_uring-post-20191128' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more io_uring updates from Jens Axboe: "As mentioned in the first pull request, there was a later batch as well. This contains fixes to the stuff that already went in, cleanups, and a few later additions. In particular, this contains: - Cleanups/fixes/unification of the submission and completion path (Pavel,me) - Linked timeouts improvements (Pavel,me) - Error path fixes (me) - Fix lookup window where cancellations wouldn't work (me) - Improve DRAIN support (Pavel) - Fix backlog flushing -EBUSY on submit (me) - Add support for connect(2) (me) - Fix for non-iter based fixed IO (Pavel) - creds inheritance for async workers (me) - Disable cmsg/ancillary data for sendmsg/recvmsg (me) - Shrink io_kiocb to 3 cachelines (me) - NUMA fix for io-wq (Jann)" * tag 'for-5.5/io_uring-post-20191128' of git://git.kernel.dk/linux-block: (42 commits) io_uring: make poll->wait dynamically allocated io-wq: shrink io_wq_work a bit io-wq: fix handling of NUMA node IDs io_uring: use kzalloc instead of kcalloc for single-element allocations io_uring: cleanup io_import_fixed() io_uring: inline struct sqe_submit io_uring: store timeout's sqe->off in proper place net: disallow ancillary data for __sys_{send,recv}msg_file() net: separate out the msghdr copy from ___sys_{send,recv}msg() io_uring: remove superfluous check for sqe->off in io_accept() io_uring: async workers should inherit the user creds io-wq: have io_wq_create() take a 'data' argument io_uring: fix dead-hung for non-iter fixed rw io_uring: add support for IORING_OP_CONNECT net: add __sys_connect_file() helper io_uring: only return -EBUSY for submit on non-flushed backlog io_uring: only !null ptr to io_issue_sqe() io_uring: simplify io_req_link_next() io_uring: pass only !null to io_req_find_next() io_uring: remove io_free_req_find_next() ...
2019-11-27Merge tag 'trace-v5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New tracing features: - New PERMANENT flag to ftrace_ops when attaching a callback to a function. As /proc/sys/kernel/ftrace_enabled when set to zero will disable all attached callbacks in ftrace, this has a detrimental impact on live kernel tracing, as it disables all that it patched. If a ftrace_ops is registered to ftrace with the PERMANENT flag set, it will prevent ftrace_enabled from being disabled, and if ftrace_enabled is already disabled, it will prevent a ftrace_ops with PREMANENT flag set from being registered. - New register_ftrace_direct(). As eBPF would like to register its own trampolines to be called by the ftrace nop locations directly, without going through the ftrace trampoline, this function has been added. This allows for eBPF trampolines to live along side of ftrace, perf, kprobe and live patching. It also utilizes the ftrace enabled_functions file that keeps track of functions that have been modified in the kernel, to allow for security auditing. - Allow for kernel internal use of ftrace instances. Subsystems in the kernel can now create and destroy their own tracing instances which allows them to have their own tracing buffer, and be able to record events without worrying about other users from writing over their data. - New seq_buf_hex_dump() that lets users use the hex_dump() in their seq_buf usage. - Notifications now added to tracing_max_latency to allow user space to know when a new max latency is hit by one of the latency tracers. - Wider spread use of generic compare operations for use of bsearch and friends. - More synthetic event fields may be defined (32 up from 16) - Use of xarray for architectures with sparse system calls, for the system call trace events. This along with small clean ups and fixes" * tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits) tracing: Enable syscall optimization for MIPS tracing: Use xarray for syscall trace events tracing: Sample module to demonstrate kernel access to Ftrace instances. tracing: Adding new functions for kernel access to Ftrace instances tracing: Fix Kconfig indentation ring-buffer: Fix typos in function ring_buffer_producer ftrace: Use BIT() macro ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured ftrace: Rename ftrace_graph_stub to ftrace_stub_graph ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization ftrace: Add helper find_direct_entry() to consolidate code ftrace: Add another check for match in register_ftrace_direct() ftrace: Fix accounting bug with direct->count in register_ftrace_direct() ftrace/selftests: Fix spelling mistake "wakeing" -> "waking" tracing: Increase SYNTH_FIELDS_MAX for synthetic_events ftrace/samples: Add a sample module that implements modify_ftrace_direct() ftrace: Add modify_ftrace_direct() tracing: Add missing "inline" in stub function of latency_fsnotify() tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text tracing: Use seq_buf_hex_dump() to dump buffers ...
2019-11-27Merge tag 'char-misc-5.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of char/misc and other driver patches for 5.5-rc1 Loads of different things in here, this feels like the catch-all of driver subsystems these days. Full details are in the shortlog, but nothing major overall, just lots of driver updates and additions. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (198 commits) char: Fix Kconfig indentation, continued habanalabs: add more protection of device during reset habanalabs: flush EQ workers in hard reset habanalabs: make the reset code more consistent habanalabs: expose reset counters via existing INFO IOCTL habanalabs: make code more concise habanalabs: use defines for F/W files habanalabs: remove prints on successful device initialization habanalabs: remove unnecessary checks habanalabs: invalidate MMU cache only once habanalabs: skip VA block list update in reset flow habanalabs: optimize MMU unmap habanalabs: prevent read/write from/to the device during hard reset habanalabs: split MMU properties to PCI/DRAM habanalabs: re-factor MMU masks and documentation habanalabs: type specific MMU cache invalidation habanalabs: re-factor memory module code habanalabs: export uapi defines to user-space habanalabs: don't print error when queues are full habanalabs: increase max jobs number to 512 ...
2019-11-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Linux-kernel memory consistency model updates. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer() net/sched: Replace rcu_swap_protected() with rcu_replace_pointer() net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer() net/core: Replace rcu_swap_protected() with rcu_replace_pointer() bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer() fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer() drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer() drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer() x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer() rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() rcu: Suppress levelspread uninitialized messages rcu: Fix uninitialized variable in nocb_gp_wait() rcu: Update descriptions for rcu_future_grace_period tracepoint rcu: Update descriptions for rcu_nocb_wake tracepoint rcu: Remove obsolete descriptions for rcu_barrier tracepoint rcu: Ensure that ->rcu_urgent_qs is set before resched IPI workqueue: Convert for_each_wq to use built-in list check rcu: Several rcu_segcblist functions can be static rcu: Remove unused function hlist_bl_del_init_rcu() Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch() ...
2019-11-26tracing: Fix __print_hex_dump scopePiotr Maziarz
undef is needed for parsing __print_hex_dump in traceevent lib. Link: http://lkml.kernel.org/r/1574762791-14883-1-git-send-email-piotrx.maziarz@linux.intel.com Signed-off-by: Piotr Maziarz <piotrx.maziarz@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: "Another merge window, another pull full of stuff: 1) Support alternative names for network devices, from Jiri Pirko. 2) Introduce per-netns netdev notifiers, also from Jiri Pirko. 3) Support MSG_PEEK in vsock/virtio, from Matias Ezequiel Vara Larsen. 4) Allow compiling out the TLS TOE code, from Jakub Kicinski. 5) Add several new tracepoints to the kTLS code, also from Jakub. 6) Support set channels ethtool callback in ena driver, from Sameeh Jubran. 7) New SCTP events SCTP_ADDR_ADDED, SCTP_ADDR_REMOVED, SCTP_ADDR_MADE_PRIM, and SCTP_SEND_FAILED_EVENT. From Xin Long. 8) Add XDP support to mvneta driver, from Lorenzo Bianconi. 9) Lots of netfilter hw offload fixes, cleanups and enhancements, from Pablo Neira Ayuso. 10) PTP support for aquantia chips, from Egor Pomozov. 11) Add UDP segmentation offload support to igb, ixgbe, and i40e. From Josh Hunt. 12) Add smart nagle to tipc, from Jon Maloy. 13) Support L2 field rewrite by TC offloads in bnxt_en, from Venkat Duvvuru. 14) Add a flow mask cache to OVS, from Tonghao Zhang. 15) Add XDP support to ice driver, from Maciej Fijalkowski. 16) Add AF_XDP support to ice driver, from Krzysztof Kazimierczak. 17) Support UDP GSO offload in atlantic driver, from Igor Russkikh. 18) Support it in stmmac driver too, from Jose Abreu. 19) Support TIPC encryption and auth, from Tuong Lien. 20) Introduce BPF trampolines, from Alexei Starovoitov. 21) Make page_pool API more numa friendly, from Saeed Mahameed. 22) Introduce route hints to ipv4 and ipv6, from Paolo Abeni. 23) Add UDP segmentation offload to cxgb4, Rahul Lakkireddy" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1857 commits) libbpf: Fix usage of u32 in userspace code mm: Implement no-MMU variant of vmalloc_user_node_flags slip: Fix use-after-free Read in slip_open net: dsa: sja1105: fix sja1105_parse_rgmii_delays() macvlan: schedule bc_work even if error enetc: add support Credit Based Shaper(CBS) for hardware offload net: phy: add helpers phy_(un)lock_mdio_bus mdio_bus: don't use managed reset-controller ax88179_178a: add ethtool_op_get_ts_info() mlxsw: spectrum_router: Fix use of uninitialized adjacency index mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels bpf: Simplify __bpf_arch_text_poke poke type handling bpf: Introduce BPF_TRACE_x helper for the tracing tests bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT bpf, testing: Add various tail call test cases bpf, x86: Emit patchable direct jump as tail call bpf: Constant map key tracking for prog array pokes bpf: Add poke dependency tracking for prog array maps bpf: Add initial poke descriptor table for jit images bpf: Move owner type, jited info into array auxiliary data ...
2019-11-25Merge branch 'for-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "There are several notable changes here: - Single thread migrating itself has been optimized so that it doesn't need threadgroup rwsem anymore. - Freezer optimization to avoid unnecessary frozen state changes. - cgroup ID unification so that cgroup fs ino is the only unique ID used for the cgroup and can be used to directly look up live cgroups through filehandle interface on 64bit ino archs. On 32bit archs, cgroup fs ino is still the only ID in use but it is only unique when combined with gen. - selftest and other changes" * 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits) writeback: fix -Wformat compilation warnings docs: cgroup: mm: Fix spelling of "list" cgroup: fix incorrect WARN_ON_ONCE() in cgroup_setup_root() cgroup: use cgrp->kn->id as the cgroup ID kernfs: use 64bit inos if ino_t is 64bit kernfs: implement custom exportfs ops and fid type kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id() kernfs: convert kernfs_node->id from union kernfs_node_id to u64 kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes kernfs: use dumber locking for kernfs_find_and_get_node_by_ino() netprio: use css ID instead of cgroup ID writeback: use ino_t for inodes in tracepoints kernfs: fix ino wrap-around detection kselftests: cgroup: Avoid the reuse of fd after it is deallocated cgroup: freezer: don't change task and cgroups status unnecessarily cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency cgroup: remove cgroup_enable_task_cg_lists() optimization cgroup: pids: use atomic64_t for pids->limit selftests: cgroup: Run test_core under interfering stress selftests: cgroup: Add task migration tests ...
2019-11-25io_uring: improve trace_io_uring_defer() trace pointJens Axboe
We don't have shadow requests anymore, so get rid of the shadow argument. Add the user_data argument, as that's often useful to easily match up requests, instead of having to look at request pointers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25Merge tag 'for-5.5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "User visible changes: - new block group profiles: RAID1 with 3- and 4- copies - RAID1 in btrfs has always 2 copies, now add support for 3 and 4 - this is an incompat feature (named RAID1C34) - recommended use of RAID1C3 is replacement of RAID6 profile on metadata, this brings a more reliable resiliency against 2 device loss/damage - support for new checksums - per-filesystem, set at mkfs time - fast hash (crc32c successor): xxhash, 64bit digest - strong hashes (both 256bit): sha256 (slower, FIPS), blake2b (faster) - the blake2b module goes via the crypto tree, btrfs.ko has a soft dependency - speed up lseek, don't take inode locks unnecessarily, this can speed up parallel SEEK_CUR/SEEK_SET/SEEK_END by 80% - send: - allow clone operations within the same file - limit maximum number of sent clone references to avoid slow backref walking - error message improvements: device scan prints process name and PID Core changes: - cleanups - remove unique workqueue helpers, used to provide a way to avoid deadlocks in the workqueue code, now done in a simpler way - remove lots of indirect function calls in compression code - extent IO tree code moved out of extent_io.c - cleanup backup superblock handling at mount time - transaction life cycle documentation and cleanups - locking code cleanups, annotations and documentation - add more cold, const, pure function attributes - removal of unused or redundant struct members or variables - new tree-checker sanity tests - try to detect missing INODE_ITEM, cross-reference checks of DIR_ITEM, DIR_INDEX, INODE_REF, and XATTR_* items - remove own bio scheduling code (used to avoid checksum submissions being stuck behind other IO), replaced by cgroup controller-based code to allow better control and avoid priority inversions in cases where the custom and cgroup scheduling disagreed Fixes: - avoid getting stuck during cyclic writebacks - fix trimming of ranges crossing block group boundaries - fix rename exchange on subvolumes, all involved subvolumes need to be recorded in the transaction" * tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits) btrfs: drop bdev argument from submit_extent_page btrfs: remove extent_map::bdev btrfs: drop bio_set_dev where not needed btrfs: get bdev directly from fs_devices in submit_extent_page btrfs: record all roots for rename exchange on a subvol Btrfs: fix block group remaining RO forever after error during device replace btrfs: scrub: Don't check free space before marking a block group RO btrfs: change btrfs_fs_devices::rotating to bool btrfs: change btrfs_fs_devices::seeding to bool btrfs: rename btrfs_block_group_cache btrfs: block-group: Reuse the item key from caller of read_one_block_group() btrfs: block-group: Refactor btrfs_read_block_groups() btrfs: document extent buffer locking btrfs: access eb::blocking_writers according to ACCESS_ONCE policies btrfs: set blocking_writers directly, no increment or decrement btrfs: merge blocking_writers branches in btrfs_tree_read_lock btrfs: drop incompat bit for raid1c34 after last block group is gone btrfs: add incompat for raid1 with 3, 4 copies btrfs: add support for 4-copy replication (raid1c4) btrfs: add support for 3-copy replication (raid1c3) ...
2019-11-25Merge tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "Due to more granular branches, this one is small and will be followed with other core branches that add specific features. I meant to just have a core and drivers branch, but external dependencies we ended up adding a few more that are also core. The changes are: - Fixes and improvements for the zoned device support (Ajay, Damien) - sed-opal table writing and datastore UID (Revanth) - blk-cgroup (and bfq) blk-cgroup stat fixes (Tejun) - Improvements to the block stats tracking (Pavel) - Fix for overruning sysfs buffer for large number of CPUs (Ming) - Optimization for small IO (Ming, Christoph) - Fix typo in RWH lifetime hint (Eugene) - Dead code removal and documentation (Bart) - Reduction in memory usage for queue and tag set (Bart) - Kerneldoc header documentation (André) - Device/partition revalidation fixes (Jan) - Stats tracking for flush requests (Konstantin) - Various other little fixes here and there (et al)" * tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits) Revert "block: split bio if the only bvec's length is > SZ_4K" block: add iostat counters for flush requests block,bfq: Skip tracing hooks if possible block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64' blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1 block: Don't disable interrupts in trigger_softirq() sbitmap: Delete sbitmap_any_bit_clear() blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue() block: split bio if the only bvec's length is > SZ_4K block: still try to split bio if the bvec crosses pages blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT blk-cgroup: reimplement basic IO stats using cgroup rstat blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive() blk-throtl: stop using blkg->stat_bytes and ->stat_ios bfq-iosched: stop using blkg->stat_bytes and ->stat_ios bfq-iosched: relocate bfqg_*rwstat*() helpers block: add zone open, close and finish ioctl support block: add zone open, close and finish operations block: Simplify REQ_OP_ZONE_RESET_ALL handling block: Remove REQ_OP_ZONE_RESET plugging ...
2019-11-25Merge tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "A lot of stuff has been going on this cycle, with improving the support for networked IO (and hence unbounded request completion times) being one of the major themes. There's been a set of fixes done this week, I'll send those out as well once we're certain we're fully happy with them. This contains: - Unification of the "normal" submit path and the SQPOLL path (Pavel) - Support for sparse (and bigger) file sets, and updating of those file sets without needing to unregister/register again. - Independently sized CQ ring, instead of just making it always 2x the SQ ring size. This makes it more flexible for networked applications. - Support for overflowed CQ ring, never dropping events but providing backpressure on submits. - Add support for absolute timeouts, not just relative ones. - Support for generic cancellations. This divorces io_uring from workqueues as well, which additionally gets us one step closer to generic async system call support. - With cancellations, we can support grabbing the process file table as well, just like we do mm context. This allows support for system calls that create file descriptors, like accept4() support that's built on top of that. - Support for io_uring tracing (Dmitrii) - Support for linked timeouts. These abort an operation if it isn't completed by the time noted in the linke timeout. - Speedup tracking of poll requests - Various cleanups making the coder easier to follow (Jackie, Pavel, Bob, YueHaibing, me) - Update MAINTAINERS with new io_uring list" * tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block: (64 commits) io_uring: make POLL_ADD/POLL_REMOVE scale better io-wq: remove now redundant struct io_wq_nulls_list io_uring: Fix getting file for non-fd opcodes io_uring: introduce req_need_defer() io_uring: clean up io_uring_cancel_files() io-wq: ensure free/busy list browsing see all items io-wq: ensure we have a stable view of ->cur_work for cancellations io_wq: add get/put_work handlers to io_wq_create() io_uring: check for validity of ->rings in teardown io_uring: fix potential deadlock in io_poll_wake() io_uring: use correct "is IO worker" helper io_uring: fix -ENOENT issue with linked timer with short timeout io_uring: don't do flush cancel under inflight_lock io_uring: flag SQPOLL busy condition to userspace io_uring: make ASYNC_CANCEL work with poll and timeout io_uring: provide fallback request for OOM situations io_uring: convert accept4() -ERESTARTSYS into -EINTR io_uring: fix error clear of ->file_table in io_sqe_files_register() io_uring: separate the io_free_req and io_free_req_find_next interface io_uring: keep io_put_req only responsible for release and put req ...
2019-11-25writeback: fix -Wformat compilation warningsQian Cai
The commit f05499a06fb4 ("writeback: use ino_t for inodes in tracepoints") introduced a lot of GCC compilation warnings on s390, In file included from ./include/trace/define_trace.h:102, from ./include/trace/events/writeback.h:904, from fs/fs-writeback.c:82: ./include/trace/events/writeback.h: In function 'trace_raw_output_writeback_page_template': ./include/trace/events/writeback.h:76:12: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'ino_t' {aka 'unsigned int'} [-Wformat=] TP_printk("bdi %s: ino=%lu index=%lu", ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/trace/trace_events.h:360:22: note: in definition of macro 'DECLARE_EVENT_CLASS' trace_seq_printf(s, print); \ ^~~~~ ./include/trace/events/writeback.h:76:2: note: in expansion of macro 'TP_printk' TP_printk("bdi %s: ino=%lu index=%lu", ^~~~~~~~~ Fix them by adding necessary casts where ino_t could be either "unsigned int" or "unsigned long". Fixes: f05499a06fb4 ("writeback: use ino_t for inodes in tracepoints") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-11-22SUNRPC: Capture completion of all RPC tasksChuck Lever
RPC tasks on the backchannel never invoke xprt_complete_rqst(), so there is no way to report their tk_status at completion. Also, any RPC task that exits via rpc_exit_task() before it is replied to will also disappear without a trace. Introduce a trace point that is symmetrical with rpc_task_begin that captures the termination status of each RPC task. Sample trace output for callback requests initiated on the server: kworker/u8:12-448 [003] 127.025240: rpc_task_end: task:50@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task kworker/u8:12-448 [002] 127.567310: rpc_task_end: task:51@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task kworker/u8:12-448 [001] 130.506817: rpc_task_end: task:52@3 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task Odd, though, that I never see trace_rpc_task_complete, either in the forward or backchannel. Should it be removed? Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-20page_pool: Add API to update numa nodeSaeed Mahameed
Add page_pool_update_nid() to be called by page pool consumers when they detect numa node changes. It will update the page pool nid value to start allocating from the new effective numa node. This is to mitigate page pool allocating pages from a wrong numa node, where the pool was originally allocated, and holding on to pages that belong to a different numa node, which causes performance degradation. For pages that are already being consumed and could be returned to the pool by the consumer, in next patch we will add a check per page to avoid recycling them back to the pool and return them to the page allocator. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-18page_pool: extend tracepoint to also include the page PFNJesper Dangaard Brouer
The MM tracepoint for page free (called kmem:mm_page_free) doesn't provide the page pointer directly, instead it provides the PFN (Page Frame Number). This is annoying when writing a page_pool leak detector in BPF. This patch change page_pool tracepoints to also provide the PFN. The page pointer is still provided to allow other kinds of troubleshooting from BPF. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-18page_pool: add destroy attempts counter and rename tracepointJesper Dangaard Brouer
When Jonathan change the page_pool to become responsible to its own shutdown via deferred work queue, then the disconnect_cnt counter was removed from xdp memory model tracepoint. This patch change the page_pool_inflight tracepoint name to page_pool_release, because it reflects the new responsability better. And it reintroduces a counter that reflect the number of times page_pool_release have been tried. The counter is also used by the code, to only empty the alloc cache once. With a stuck work queue running every second and counter being 64-bit, it will overrun in approx 584 billion years. For comparison, Earth lifetime expectancy is 7.5 billion years, before the Sun will engulf, and destroy, the Earth. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-18btrfs: rename btrfs_block_group_cacheDavid Sterba
The type name is misleading, a single entry is named 'cache' while this normally means a collection of objects. Rename that everywhere. Also the identifier was quite long, making function prototypes harder to format. Suggested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: add dedicated members for start and length of a block groupDavid Sterba
The on-disk format of block group item makes use of the key that stores the offset and length. This is further used in the code, although this makes thing harder to understand. The key is also packed so the offset/length is not properly aligned as u64. Add start (key.objectid) and length (key.offset) members to block group and remove the embedded key. When the item is searched or written, a local variable for key is used. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: move block_group_item::used to block groupDavid Sterba
For unknown reasons, the member 'used' in the block group struct is stored in the b-tree item and accessed everywhere using the special accessor helper. Let's unify it and make it a regular member and only update the item before writing it to the tree. The item is still being used for flags and chunk_objectid, there's some duplication until the item is removed in following patches. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tracepoints: constify all pointersDavid Sterba
We don't modify the data passed to tracepoints, some of the declarations are already const, add it to the rest. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tracepoints: drop typecasts from printkDavid Sterba
Remove typecasts from trace printk, adjust types and move typecast to the assignment if necessary. When assigning, the types are more obvious compared to matching the variables to the format strings. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: get rid of pointless wtag variable in async-thread.cOmar Sandoval
Commit ac0c7cf8be00 ("btrfs: fix crash when tracepoint arguments are freed by wq callbacks") added a void pointer, wtag, which is passed into trace_btrfs_all_work_done() instead of the freed work item. This is silly for a few reasons: 1. The freed work item still has the same address. 2. work is still in scope after it's freed, so assigning wtag doesn't stop anyone from using it. 3. The tracepoint has always taken a void * argument, so assigning wtag doesn't actually make things any more type-safe. (Note that the original bug in commit bc074524e123 ("btrfs: prefix fsid to all trace events") was that the void * was implicitly casted when it was passed to btrfs_work_owner() in the trace point itself). Instead, let's add some clearer warnings as comments. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18Merge tag 'nfs-rdma-for-5.5-1' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA Client Updates for Linux 5.5 New Features: - New tracepoints for congestion control and Local Invalidate WRs Bugfixes and Cleanups: - Eliminate log noise in call_reserveresult - Fix unstable connections after a reconnect - Clean up some code duplication - Close race between waking a sender and posting a receive - Fix MR list corruption, and clean up MR usage - Remove unused rpcrdma_sendctx fields - Try to avoid DMA mapping pages if it is too costly - Wake pending tasks if connection fails - Replace some dprintk()s with tracepoints
2019-11-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of overlapping changes and parallel additions, stuff like that. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16page_pool: do not release pool until inflight == 0.Jonathan Lemon
The page pool keeps track of the number of pages in flight, and it isn't safe to remove the pool until all pages are returned. Disallow removing the pool until all pages are back, so the pool is always available for page producers. Make the page pool responsible for its own delayed destruction instead of relying on XDP, so the page pool can be used without the xdp memory model. When all pages are returned, free the pool and notify xdp if the pool is registered with the xdp memory system. Have the callback perform a table walk since some drivers (cpsw) may share the pool among multiple xdp_rxq_info. Note that the increment of pages_state_release_cnt may result in inflight == 0, resulting in the pool being released. Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15y2038: itimer: change implementation to timespec64Arnd Bergmann
There is no 64-bit version of getitimer/setitimer since that is not actually needed. However, the implementation is built around the deprecated 'struct timeval' type. Change the code to use timespec64 internally to reduce the dependencies on timeval and associated helper functions. Minor adjustments in the code are needed to make the native and compat version work the same way, and to keep the range check working after the conversion. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-14tracing: Use seq_buf_hex_dump() to dump buffersPiotr Maziarz
Without this, buffers can be printed with __print_array macro that has no formatting options and can be hard to read. The other way is to mimic formatting capability with multiple calls of trace event with one call per row which gives performance impact and different timestamp in each row. Link: http://lkml.kernel.org/r/1573130738-29390-2-git-send-email-piotrx.maziarz@linux.intel.com Signed-off-by: Piotr Maziarz <piotrx.maziarz@linux.intel.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-12cgroup: use cgrp->kn->id as the cgroup IDTejun Heo
cgroup ID is currently allocated using a dedicated per-hierarchy idr and used internally and exposed through tracepoints and bpf. This is confusing because there are tracepoints and other interfaces which use the cgroupfs ino as IDs. The preceding changes made kn->id exposed as ino as 64bit ino on supported archs or ino+gen (low 32bits as ino, high gen). There's no reason for cgroup to use different IDs. The kernfs IDs are unique and userland can easily discover them and map them back to paths using standard file operations. This patch replaces cgroup IDs with kernfs IDs. * cgroup_id() is added and all cgroup ID users are converted to use it. * kernfs_node creation is moved to earlier during cgroup init so that cgroup_id() is available during init. * While at it, s/cgroup/cgrp/ in psi helpers for consistency. * Fallback ID value is changed to 1 to be consistent with root cgroup ID. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Namhyung Kim <namhyung@kernel.org>
2019-11-12kernfs: convert kernfs_node->id from union kernfs_node_id to u64Tejun Heo
kernfs_node->id is currently a union kernfs_node_id which represents either a 32bit (ino, gen) pair or u64 value. I can't see much value in the usage of the union - all that's needed is a 64bit ID which the current code is already limited to. Using a union makes the code unnecessarily complicated and prevents using 64bit ino without adding practical benefits. This patch drops union kernfs_node_id and makes kernfs_node->id a u64. ino is stored in the lower 32bits and gen upper. Accessors - kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the ino and gen. This simplifies ID handling less cumbersome and will allow using 64bit inos on supported archs. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexei Starovoitov <ast@kernel.org>
2019-11-12writeback: use ino_t for inodes in tracepointsTejun Heo
Writeback TPs currently use mix of 32 and 64bits for inos. This isn't currently broken because only cgroup inos are using 32bits and they're limited to 32bits. cgroup inos will make use of 64bits. Let's uniformly use ino_t. While at it, switch the default cgroup ino value used when cgroup is disabled to 1 instead of -1U as root cgroup always uses ino 1. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Namhyung Kim <namhyung@kernel.org>
2019-11-09tcp: remove redundant new line from tcp_event_sk_skbTony Lu
This removes '\n' from trace event class tcp_event_sk_skb to avoid redundant new blank line and make output compact. Fixes: af4325ecc24f ("tcp: expose sk_state in tcp_retransmit_skb tracepoint") Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-08fsi: aspeed: Add trace pointsJoel Stanley
These trace points help with debugging the FSI master. They show the low level reads, writes and error states of the master. Signed-off-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/20191108051945.7109-11-joel@jms.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-08trace: fsi: Print transfer size unsignedAndrew Jeffery
Due to other bugs I observed a spurious -1 transfer size. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20191108051945.7109-5-joel@jms.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-07Merge branch 'for-linus' into for-5.5/blockJens Axboe
Pull on for-linus to resolve what otherwise would have been a conflict with the cgroups rstat patchset from Tejun. * for-linus: (942 commits) blkcg: make blkcg_print_stat() print stats only for online blkgs nvme: change nvme_passthru_cmd64 to explicitly mark rsvd nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths nvme-rdma: fix a segmentation fault during module unload iocost: don't nest spin_lock_irq in ioc_weight_write() io_uring: ensure we clear io_kiocb->result before each issue um-ubd: Entrust re-queue to the upper layers nvme-multipath: remove unused groups_only mode in ana log nvme-multipath: fix possible io hang after ctrl reconnect io_uring: don't touch ctx in setup after ring fd install io_uring: Fix leaked shadow_req Linux 5.4-rc5 riscv: cleanup do_trap_break nbd: verify socket is supported during setup ata: libahci_platform: Fix regulator_get_optional() misuse nbd: handle racing with error'ed out commands nbd: protect cmd->status with cmd->lock io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD io_uring: used cached copies of sq->dropped and cq->overflow ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157 ...
2019-11-05jbd2: Provide trace event for handle restartsJan Kara
Provide trace event for handle restarts to ease debugging. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Reserve revoke credits for freed blocksJan Kara
So far we have reserved only relatively high fixed amount of revoke credits for each transaction. We over-reserved by large amount for most cases but when freeing large directories or files with data journalling, the fixed amount is not enough. In fact the worst case estimate is inconveniently large (maximum extent size) for freeing of one extent. We fix this by doing proper estimate of the amount of blocks that need to be revoked when removing blocks from the inode due to truncate or hole punching and otherwise reserve just a small amount of revoke credits for each transaction to accommodate freeing of xattrs block or so. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-04io_uring: add completion trace eventJens Axboe
We currently don't have a completion event trace, add one of those. And to better be able to match up submissions and completions, add user_data to the submission trace as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01io_uring: remove io_uring_add_to_prev() trace eventJens Axboe
This internal logic was killed with the conversion to io-wq, so we no longer have a need for this particular trace. Kill it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-01net: bridge: fdb: br_fdb_update can take flags directlyNikolay Aleksandrov
If we modify br_fdb_update() to take flags directly we can get rid of one test and one atomic bitop in the learning path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-31Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Linux-kernel memory consistency model updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>