summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)Author
2018-04-12Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails" * tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits) NFS: advance nfs_entry cookie only after decoding completes successfully NFSv3/acl: forget acl cache after setattr NFSv4.1: Fix exclusive create NFSv4: Declare the size up to date after it was set. nfs: Use ida_simple API NFSv4: Fix the nfs_inode_set_delegation() arguments NFSv4: Clean up CB_GETATTR encoding NFSv4: Don't ask for attributes when ACCESS is protected by a delegation NFSv4: Add a helper to encode/decode struct timespec NFSv4: Clean up encode_attrs NFSv4; Clean up XDR encoding of type bitmap4 NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group SUNRPC: Add a helper for encoding opaque data inline SUNRPC: Add helpers for decoding opaque and string types NFSv4: Ignore change attribute invalidations if we hold a delegation NFS: More fine grained attribute tracking NFS: Don't force unnecessary cache invalidation in nfs_update_inode() NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() NFS: Don't force a revalidation of all attributes if change is missing NFS: Convert NFS_INO_INVALID flags to unsigned long ...
2018-04-12Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull AFS updates from Al Viro: "The AFS series posted by dhowells depended upon lookup_one_len() rework; now that prereq is in the mainline, that series had been rebased on top of it and got some exposure and testing..." * 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Do better accretion of small writes on newly created content afs: Add stats for data transfer operations afs: Trace protocol errors afs: Locally edit directory data for mkdir/create/unlink/... afs: Adjust the directory XDR structures afs: Split the directory content defs into a header afs: Fix directory handling afs: Split the dynroot stuff out and give it its own ops tables afs: Keep track of invalid-before version for dentry coherency afs: Rearrange status mapping afs: Make it possible to get the data version in readpage afs: Init inode before accessing cache afs: Introduce a statistics proc file afs: Dump bad status record afs: Implement @cell substitution handling afs: Implement @sys substitution handling afs: Prospectively look up extra files when doing a single lookup afs: Don't over-increment the cell usage count when pinning it afs: Fix checker warnings vfs: Remove the const from dir_context::actor
2018-04-11mm, vmscan, tracing: use pointer to reclaim_stat struct in trace eventSteven Rostedt
The trace event trace_mm_vmscan_lru_shrink_inactive() currently has 12 parameters! Seven of them are from the reclaim_stat structure. This structure is currently local to mm/vmscan.c. By moving it to the global vmstat.h header, we can also reference it from the vmscan tracepoints. In moving it, it brings down the overhead of passing so many arguments to the trace event. In the future, we may limit the number of arguments that a trace event may pass (ideally just 6, but more realistically it may be 8). Before this patch, the code to call the trace event is this: 0f 83 aa fe ff ff jae ffffffff811e6261 <shrink_inactive_list+0x1e1> 48 8b 45 a0 mov -0x60(%rbp),%rax 45 8b 64 24 20 mov 0x20(%r12),%r12d 44 8b 6d d4 mov -0x2c(%rbp),%r13d 8b 4d d0 mov -0x30(%rbp),%ecx 44 8b 75 cc mov -0x34(%rbp),%r14d 44 8b 7d c8 mov -0x38(%rbp),%r15d 48 89 45 90 mov %rax,-0x70(%rbp) 8b 83 b8 fe ff ff mov -0x148(%rbx),%eax 8b 55 c0 mov -0x40(%rbp),%edx 8b 7d c4 mov -0x3c(%rbp),%edi 8b 75 b8 mov -0x48(%rbp),%esi 89 45 80 mov %eax,-0x80(%rbp) 65 ff 05 e4 f7 e2 7e incl %gs:0x7ee2f7e4(%rip) # 15bd0 <__preempt_count> 48 8b 05 75 5b 13 01 mov 0x1135b75(%rip),%rax # ffffffff8231bf68 <__tracepoint_mm_vmscan_lru_shrink_inactive+0x28> 48 85 c0 test %rax,%rax 74 72 je ffffffff811e646a <shrink_inactive_list+0x3ea> 48 89 c3 mov %rax,%rbx 4c 8b 10 mov (%rax),%r10 89 f8 mov %edi,%eax 48 89 85 68 ff ff ff mov %rax,-0x98(%rbp) 89 f0 mov %esi,%eax 48 89 85 60 ff ff ff mov %rax,-0xa0(%rbp) 89 c8 mov %ecx,%eax 48 89 85 78 ff ff ff mov %rax,-0x88(%rbp) 89 d0 mov %edx,%eax 48 89 85 70 ff ff ff mov %rax,-0x90(%rbp) 8b 45 8c mov -0x74(%rbp),%eax 48 8b 7b 08 mov 0x8(%rbx),%rdi 48 83 c3 18 add $0x18,%rbx 50 push %rax 41 54 push %r12 41 55 push %r13 ff b5 78 ff ff ff pushq -0x88(%rbp) 41 56 push %r14 41 57 push %r15 ff b5 70 ff ff ff pushq -0x90(%rbp) 4c 8b 8d 68 ff ff ff mov -0x98(%rbp),%r9 4c 8b 85 60 ff ff ff mov -0xa0(%rbp),%r8 48 8b 4d 98 mov -0x68(%rbp),%rcx 48 8b 55 90 mov -0x70(%rbp),%rdx 8b 75 80 mov -0x80(%rbp),%esi 41 ff d2 callq *%r10 After the patch: 0f 83 a8 fe ff ff jae ffffffff811e626d <shrink_inactive_list+0x1cd> 8b 9b b8 fe ff ff mov -0x148(%rbx),%ebx 45 8b 64 24 20 mov 0x20(%r12),%r12d 4c 8b 6d a0 mov -0x60(%rbp),%r13 65 ff 05 f5 f7 e2 7e incl %gs:0x7ee2f7f5(%rip) # 15bd0 <__preempt_count> 4c 8b 35 86 5b 13 01 mov 0x1135b86(%rip),%r14 # ffffffff8231bf68 <__tracepoint_mm_vmscan_lru_shrink_inactive+0x28> 4d 85 f6 test %r14,%r14 74 2a je ffffffff811e6411 <shrink_inactive_list+0x371> 49 8b 06 mov (%r14),%rax 8b 4d 8c mov -0x74(%rbp),%ecx 49 8b 7e 08 mov 0x8(%r14),%rdi 49 83 c6 18 add $0x18,%r14 4c 89 ea mov %r13,%rdx 45 89 e1 mov %r12d,%r9d 4c 8d 45 b8 lea -0x48(%rbp),%r8 89 de mov %ebx,%esi 51 push %rcx 48 8b 4d 98 mov -0x68(%rbp),%rcx ff d0 callq *%rax Link: http://lkml.kernel.org/r/2559d7cb-ec60-1200-2362-04fa34fd02bb@fb.com Link: http://lkml.kernel.org/r/20180322121003.4177af15@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reported-by: Alexei Starovoitov <ast@fb.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-10sunrpc: Add static trace point to report result of RPC pingChuck Lever
This information can help track down local misconfiguration issues as well as network partitions and unresponsive servers. There are several ways to send a ping, and with transport multi- plexing, the exact rpc_xprt that is used is sometimes not known by the upper layer. The rpc_xprt pointer passed to the trace point call also has to be RCU-safe. I found a spot inside the client FSM where an rpc_xprt pointer is always available and safe to use. Suggested-by: Bill Baker <Bill.Baker@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10sunrpc: Add static trace point to report RPC latency statsChuck Lever
Introduce a low-overhead mechanism to report information about latencies of individual RPCs. The goal is to enable user space to filter the trace record for latency outliers, or build histograms, etc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10sunrpc: Simplify synopsis of some trace pointsChuck Lever
Clean up: struct rpc_task carries a pointer to a struct rpc_clnt, and in fact task->tk_client is always what is passed into trace points that are already passing @task. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10Merge tag 'trace-v4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features: - Tom Zanussi's extended histogram work. This adds the synthetic events to have histograms from multiple event data Adds triggers "onmatch" and "onmax" to call the synthetic events Several updates to the histogram code from this - Allow way to nest ring buffer calls in the same context - Allow absolute time stamps in ring buffer - Rewrite of filter code parsing based on Al Viro's suggestions - Setting of trace_clock to global if TSC is unstable (on boot) - Better OOM handling when allocating large ring buffers - Added initcall tracepoints (consolidated initcall_debug code with them) And other various fixes and clean ups" * tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) init: Have initcall_debug still work without CONFIG_TRACEPOINTS init, tracing: Have printk come through the trace events for initcall_debug init, tracing: instrument security and console initcall trace events init, tracing: Add initcall trace events tracing: Add rcu dereference annotation for test func that touches filter->prog tracing: Add rcu dereference annotation for filter->prog tracing: Fixup logic inversion on setting trace_global_clock defaults tracing: Hide global trace clock from lockdep ring-buffer: Add set/clear_current_oom_origin() during allocations ring-buffer: Check if memory is available before allocation lockdep: Add print_irqtrace_events() to __warn vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK) tracing: Uninitialized variable in create_tracing_map_fields() tracing: Make sure variable string fields are NULL-terminated tracing: Add action comparisons when testing matching hist triggers tracing: Don't add flag strings when displaying variable references tracing: Fix display of hist trigger expressions containing timestamps ftrace: Drop a VLA in module_exists() tracing: Mention trace_clock=global when warning about unstable clocks tracing: Default to using trace_global_clock if sched_clock is unstable ...
2018-04-10Merge tag 'rtc-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "This contains a few series that have been in preparation for a while and that will help systems with RTCs that will fail in 2038, 2069 or 2100. Subsystem: - Add tracepoints - Rework of the RTC/nvmem API to allow drivers to discard struct nvmem_config after registration - New range API, drivers can now expose the useful range of the RTC - New offset API the core is now able to add an offset to the RTC time, modifying the supported range. - Multiple rtc_time64_to_tm fixes - Handle time_t overflow on 32 bit platforms in the core instead of letting drivers do crazy things. - remove rtc_control API New driver: - Intersil ISL12026 Drivers: - Drivers exposing the RTC non volatile memory have been converted to use nvmem - Removed useless time and date validation - Removed an indirection pattern that was a cargo cult from ancient drivers - Removed VLA usage - Fixed a possible race condition in probe functions - AB8540 support is dropped from ab8500 - pcf85363 now has alarm support" * tag 'rtc-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (128 commits) rtc: snvs: Fix usage of snvs_rtc_enable rtc: mt7622: fix module autoloading for OF platform drivers rtc: isl12022: use true and false for boolean values rtc: ab8500: Drop AB8540 support rtc: remove a warning during scripts/kernel-doc step rtc: 88pm860x: remove artificial limitation rtc: 88pm80x: remove artificial limitation rtc: st-lpc: remove artificial limitation rtc: mrst: remove artificial limitation rtc: mv: remove artificial limitation rtc: hctosys: Ensure system time doesn't overflow time_t parisc: time: stop validating rtc_time in .read_time rtc: pcf85063: fix clearing bits in pcf85063_start_clock rtc: at91sam9: Set name of regmap_config rtc: s5m: Remove VLA usage rtc: s5m: Move enum from rtc.h to rtc-s5m.c rtc: remove VLA usage rtc: Add useful timestamp definitions rtc: Add one offset seconds to expand RTC range rtc: Factor out the RTC range validation into rtc_valid_range() ...
2018-04-09afs: Trace protocol errorsDavid Howells
Trace protocol errors detected in afs. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09afs: Locally edit directory data for mkdir/create/unlink/...David Howells
Locally edit the contents of an AFS directory upon a successful inode operation that modifies that directory (such as mkdir, create and unlink) so that we can avoid the current practice of re-downloading the directory after each change. This is viable provided that the directory version number we get back from the modifying RPC op is exactly incremented by 1 from what we had previously. The data in the directory contents is in a defined format that we have to parse locally to perform lookups and readdir, so modifying isn't a problem. If the edit fails, we just clear the VALID flag on the directory and it will be reloaded next time it is needed. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09afs: Prospectively look up extra files when doing a single lookupDavid Howells
When afs_lookup() is called, prospectively look up the next 50 uncached fids also from that same directory and cache the results, rather than just looking up the one file requested. This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency by fetching up to 50 file statuses at a time. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-07Merge tag 'fscache-next-20180406' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache updates from David Howells: "Three patches that fix some of AFS's usage of fscache: (1) Need to invalidate the cache if a foreign data change is detected on the server. (2) Move the vnode ID uniquifier (equivalent to i_generation) from the auxiliary data to the index key to prevent a race between file delete and a subsequent file create seeing the same index key. (3) Need to retire cookies that correspond to files that we think got deleted on the server. Four patches to fix some things in fscache and cachefiles: (4) Fix a couple of checker warnings. (5) Correctly indicate to the end-of-operation callback whether an operation completed or was cancelled. (6) Add a check for multiple cookie relinquishment. (7) Fix a path through the asynchronous write that doesn't wake up a waiter for a page if the cache decides not to write that page, but discards it instead. A couple of patches to add tracepoints to fscache and cachefiles: (8) Add tracepoints for cookie operators, object state machine execution, cachefiles object management and cachefiles VFS operations. (9) Add tracepoints for fscache operation management and page wrangling. And then three development patches: (10) Attach the index key and auxiliary data to the cookie, pass this information through various fscache-netfs API functions and get rid of the callbacks to the netfs to get it. This means that the cache can get at this information, even if the netfs goes away. It also means that the cache can be lazy in updating the coherency data. (11) Pass the object data size through various fscache-netfs API rather than calling back to the netfs for it, and store the value in the object. This makes it easier to correctly resize the object, as the size is updated on writes to the cache, rather than calling back out to the netfs. (12) Maintain a catalogue of allocated cookies. This makes it possible to catch cookie collision up front rather than down in the bowels of the cache being run from a service thread from the object state machine. This will also make it possible in the future to reconnect to a cookie that's not gone dead yet because it's waiting for finalisation of the storage and also make it possible to bring cookies online if the cache is added after the cookie has been obtained" * tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: Maintain a catalogue of allocated cookies fscache: Pass object size in rather than calling back for it fscache: Attach the index key and aux data to the cookie fscache: Add more tracepoints fscache: Add tracepoints fscache: Fix hanging wait on page discarded by writeback fscache: Detect multiple relinquishment of a cookie fscache: Pass the correct cancelled indications to fscache_op_complete() fscache, cachefiles: Fix checker warnings afs: Be more aggressive in retiring cached vnodes afs: Use the vnode ID uniquifier in the cache key not the aux data afs: Invalidate cache on server data change
2018-04-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - the v9fs maintainers have been missing for a long time. I've taken over v9fs patch slinging. - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits) mm,oom_reaper: check for MMF_OOM_SKIP before complaining mm/ksm: fix interaction with THP mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t headers: untangle kmemleak.h from mm.h include/linux/mmdebug.h: make VM_WARN* non-rvals mm/page_isolation.c: make start_isolate_page_range() fail if already isolated mm: change return type to vm_fault_t mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory kernel/fork.c: detect early free of a live mm mm: make counting of list_lru_one::nr_items lockless mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static block_invalidatepage(): only release page if the full page was invalidated mm: kernel-doc: add missing parameter descriptions mm/swap.c: remove @cold parameter description for release_pages() mm/nommu: remove description of alloc_vm_area zram: drop max_zpage_size and use zs_huge_class_size() zsmalloc: introduce zs_huge_class_size() mm: fix races between swapoff and flush dcache fs/direct-io.c: minor cleanups in do_blockdev_direct_IO ...
2018-04-06fscache: Maintain a catalogue of allocated cookiesDavid Howells
Maintain a catalogue of allocated cookies so that cookie collisions can be handled properly. For the moment, this just involves printing a warning and returning a NULL cookie to the caller of fscache_acquire_cookie(), but in future it might make sense to wait for the old cookie to finish being cleaned up. This requires the cookie key to be stored attached to the cookie so that we still have the key available if the netfs relinquishes the cookie. This is done by an earlier patch. The catalogue also renders redundant fscache_netfs_list (used for checking for duplicates), so that can be removed. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anna Schumaker <anna.schumaker@netapp.com> Tested-by: Steve Dickson <steved@redhat.com>
2018-04-06init, tracing: Add initcall trace eventsSteven Rostedt (VMware)
Being able to trace the start and stop of initcalls is useful to see where the timings are an issue. There is already an "initcall_debug" parameter, but that can cause a large overhead itself, as the printing of the information may take longer than the initcall functions. Adding in a start and finish trace event around the initcall functions, as well as a trace event that records the level of the initcalls, one can get a much finer measurement of the times and interactions of the initcalls themselves, as trace events are much lighter than printk()s. Suggested-by: Abderrahmane Benbachir <abderrahmane.benbachir@polymtl.ca> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-05mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memoryDavid Rientjes
Kswapd will not wakeup if per-zone watermarks are not failing or if too many previous attempts at background reclaim have failed. This can be true if there is a lot of free memory available. For high- order allocations, kswapd is responsible for waking up kcompactd for background compaction. If the zone is not below its watermarks or reclaim has recently failed (lots of free memory, nothing left to reclaim), kcompactd does not get woken up. When __GFP_DIRECT_RECLAIM is not allowed, allow kcompactd to still be woken up even if kswapd will not reclaim. This allows high-order allocations, such as thp, to still trigger background compaction even when the zone has an abundance of free memory. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803111659420.209721@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGEAnshuman Khandual
alloc_contig_range() initiates compaction and eventual migration for the purpose of either CMA or HugeTLB allocations. At present, the reason code remains the same MR_CMA for either of these cases. Let's make it MR_CONTIG_RANGE which will appropriately reflect the reason code in both these cases. Link: http://lkml.kernel.org/r/20180202091518.18798-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05Merge tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on server xdr decoding (with an eye towards eliminating a data copy in the RDMA case). I did some refactoring of the delegation code in preparation for eliminating some delegation self-conflicts and implementing write delegations" * tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits) nfsd: fix incorrect umasks sunrpc: remove incorrect HMAC request initialization NFSD: Clean up legacy NFS SYMLINK argument XDR decoders NFSD: Clean up legacy NFS WRITE argument XDR decoders nfsd: Trace NFSv4 COMPOUND execution nfsd: Add I/O trace points in the NFSv4 read proc nfsd: Add I/O trace points in the NFSv4 write path nfsd: Add "nfsd_" to trace point names nfsd: Record request byte count, not count of vectors nfsd: Fix NFSD trace points svc: Report xprt dequeue latency sunrpc: Report per-RPC execution stats sunrpc: Re-purpose trace_svc_process sunrpc: Save remote presentation address in svc_xprt for trace events sunrpc: Simplify trace_svc_recv sunrpc: Simplify do_enqueue tracing sunrpc: Move trace_svc_xprt_dequeue() sunrpc: Update show_svc_xprt_flags() to include recently added flags svc: Simplify ->xpo_secure_port sunrpc: Remove unneeded pointer dereference ...
2018-04-04Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Cleanups and bugfixes for ext4, including some fixes to make ext4 more robust against maliciously crafted file system images. (I still don't recommend that container folks hold any delusions that mounting arbitary images that can be crafted by malicious attackers should be considered sane thing to do, though!)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits) ext4: force revalidation of directory pointer after seekdir(2) ext4: add extra checks to ext4_xattr_block_get() ext4: add bounds checking to ext4_xattr_find_entry() ext4: move call to ext4_error() into ext4_xattr_check_block() ext4: don't show data=<mode> option if defaulted ext4: omit init_itable=n in procfs when disabled ext4: show more binary mount options in procfs ext4: simplify kobject usage ext4: remove unused parameters in sysfs code ext4: null out kobject* during sysfs cleanup ext4: don't allow r/w mounts if metadata blocks overlap the superblock ext4: always initialize the crc32c checksum driver ext4: fail ext4_iget for root directory if unallocated ext4: limit xattr size to INT_MAX ext4: add validity checks for bitmap block numbers ext4: fix comments in ext4_swap_extents() ext4: use generic_writepages instead of __writepage/write_cache_pages ext4: don't complain about incorrect features when probing ext4: remove EXT4_STATE_DIOREAD_LOCK flag ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin() ...
2018-04-04Merge tag 'for-4.17-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There are a several user visible changes, the rest is mostly invisible and continues to clean up the whole code base. User visible changes: - new mount option nossd_spread (pair for ssd_spread) - mount option subvolid will detect junk after the number and fail the mount - add message after cancelled device replace - direct module dependency on libcrc32, removed own crc wrappers - removed user space transaction ioctls - use lighter locking when reading /proc/self/mounts, RCU instead of mutex to avoid unnecessary contention Enhancements: - skip writeback of last page when truncating file to same size - send: do not issue unnecessary truncate operations - mount option token specifiers: use %u for unsigned values, more validation - selftests: more tree block validations qgroups: - preparatory work for splitting reservation types for data and metadata, this should allow for more accurate tracking and fix some issues with underflows or do further enhancements - split metadata reservations for started and joined transaction so they do not get mixed up and are accounted correctly at commit time - with the above, it's possible to revert patch that potentially deadlocks when trying to make more space by explicitly committing when the quota limit is hit - fix root item corruption when multiple same source snapshots are created with quota enabled RAID56: - make sure target is identical to source when raid56 rebuild fails after dev-replace - faster rebuild during scrub, batch by stripes and not block-by-block - make more use of cached data when rebuilding from a missing device Fixes: - null pointer deref when device replace target is missing - fix fsync after hole punching when using no-holes feature - fix lockdep splat when allocating percpu data with wrong GFP flags Cleanups, refactoring, core changes: - drop redunant parameters from various functions - kill and opencode trivial helpers - __cold/__exit function annotations - dead code removal - continued audit and documentation of memory barriers - error handling: handle removal from uuid tree - error handling: remove handling of impossible condtitons - more debugging or error messages - updated tracepoints - one VLA use removal (and one still left)" * tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits) btrfs: lift errors from add_extent_changeset to the callers Btrfs: print error messages when failing to read trees btrfs: user proper type for btrfs_mask_flags flags btrfs: split dev-replace locking helpers for read and write btrfs: remove stale comments about fs_mutex btrfs: use RCU in btrfs_show_devname for device list traversal btrfs: update barrier in should_cow_block btrfs: use lockdep_assert_held for mutexes btrfs: use lockdep_assert_held for spinlocks btrfs: Validate child tree block's level and first key btrfs: tests/qgroup: Fix wrong tree backref level Btrfs: fix copy_items() return value when logging an inode Btrfs: fix fsync after hole punching when using no-holes feature btrfs: use helper to set ulist aux from a qgroup Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" btrfs: qgroup: Update trace events for metadata reservation btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item btrfs: qgroup: Use separate meta reservation type for delalloc btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS ...
2018-04-04fscache: Add more tracepointsDavid Howells
Add more tracepoints to fscache, including: (*) fscache_page - Tracks netfs pages known to fscache. (*) fscache_check_page - Tracks the netfs querying whether a page is pending storage. (*) fscache_wake_cookie - Tracks cookies being woken up after a page completes/aborts storage in the cache. (*) fscache_op - Tracks operations being initialised. (*) fscache_wrote_page - Tracks return of the backend write_page op. (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write operation. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04fscache: Add tracepointsDavid Howells
Add some tracepoints to fscache: (*) fscache_cookie - Tracks a cookie's usage count. (*) fscache_netfs - Logs registration of a network filesystem, including the pointer to the cookie allocated. (*) fscache_acquire - Logs cookie acquisition. (*) fscache_relinquish - Logs cookie relinquishment. (*) fscache_enable - Logs enablement of a cookie. (*) fscache_disable - Logs disablement of a cookie. (*) fscache_osm - Tracks execution of states in the object state machine. and cachefiles: (*) cachefiles_ref - Tracks a cachefiles object's usage count. (*) cachefiles_lookup - Logs result of lookup_one_len(). (*) cachefiles_mkdir - Logs result of vfs_mkdir(). (*) cachefiles_create - Logs result of vfs_create(). (*) cachefiles_unlink - Logs calls to vfs_unlink(). (*) cachefiles_rename - Logs calls to vfs_rename(). (*) cachefiles_mark_active - Logs an object becoming active. (*) cachefiles_wait_active - Logs a wait for an old object to be destroyed. (*) cachefiles_mark_inactive - Logs an object becoming inactive. (*) cachefiles_mark_buried - Logs the burial of an object. Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari. 2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased. 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff. 4) Support zerocopy on RDS sockets, from Sowmini Varadhan. 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann. 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier. 7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel. 8) Add packet forwarding tests to selftests, from Ido Schimmel. 9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet. 10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern. 11) Add path MTU tests to selftests, from Stefano Brivio. 12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed. 13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree. 14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations. 15) In-kernel receive TLS support, from Dave Watson. 16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen. 17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al. 18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren. 19) Support XDP redirect in i40e driver, from Björn Töpel. 20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov. 21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan. 22) Use rhashtable for inet frag reassembly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...
2018-04-03svc: Report xprt dequeue latencyChuck Lever
Record the time between when a rqstp is enqueued on a transport and when it is dequeued. This includes how long the rqstp waits on the queue and how long it takes the kernel scheduler to wake a nfsd thread to service it. The svc_xprt_dequeue trace point is altered to include the number of microseconds between xprt_enqueue and xprt_dequeue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03sunrpc: Report per-RPC execution statsChuck Lever
Introduce a mechanism to report the server-side execution latency of each RPC. The goal is to enable user space to filter the trace record for latency outliers, build histograms, etc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03sunrpc: Re-purpose trace_svc_processChuck Lever
Currently, trace_svc_process has two call sites: 1. Just after a call to svc_send. svc_send already invokes trace_svc_send with the same arguments just before returning 2. Just before a call to svc_drop. svc_drop already invokes trace_svc_drop with the same arguments just after it is called Therefore trace_svc_process does not provide any additional information not already provided by these other trace points. However, it would be useful to record the incoming RPC procedure. So reuse trace_svc_process for this purpose. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03sunrpc: Save remote presentation address in svc_xprt for trace eventsChuck Lever
TP_printk defines a format string that is passed to user space for converting raw trace event records to something human-readable. My user space's printf (Oracle Linux 7), however, does not have a %pI format specifier. The result is that what is supposed to be an IP address in the output of "trace-cmd report" is just a string that says the field couldn't be displayed. To fix this, adopt the same approach as the client: maintain a pre- formated presentation address for occasions when %pI is not available. The location of the trace_svc_send trace point is adjusted so that rqst->rq_xprt is not NULL when the trace event is recorded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03sunrpc: Simplify trace_svc_recvChuck Lever
There doesn't seem to be a lot of value in calling trace_svc_recv in the failing case. 1. There are two very common cases: one is the transport is not ready, and the other is shutdown. Neither is terribly interesting. 2. The trace record for the failing case contains nothing but the status code. Therefore the trace point call site in the error exit is removed. Since the trace point is now recording a length instead of a status, rename the status field and remove the case that records a zero XID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03sunrpc: Move trace_svc_xprt_dequeue()Chuck Lever
Reduce the amount of noise generated by trace_svc_xprt_dequeue by moving it to the end of svc_get_next_xprt. This generates exactly one trace event when a ready xprt is found, rather than spurious events when there is no work to do. The empty events contain no information that can't be obtained simply by tracing function calls to svc_xprt_dequeue. A small additional benefit is simplification of the svc_xprt_event trace class, which no longer has to handle the case when the @xprt parameter is NULL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03sunrpc: Update show_svc_xprt_flags() to include recently added flagsChuck Lever
XPT_KILL_TEMP was added by commit 546125d16142 ("sunrpc: don't call sleeping functions from the notifier block callbacks"), and XPT_CONG_CTRL was added by commit 362142b25843 ("sunrpc: flag transports as having congestion control") . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-02Merge tag 'arch-removal' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pul removal of obsolete architecture ports from Arnd Bergmann: "This removes the entire architecture code for blackfin, cris, frv, m32r, metag, mn10300, score, and tile, including the associated device drivers. I have been working with the (former) maintainers for each one to ensure that my interpretation was right and the code is definitely unused in mainline kernels. Many had fond memories of working on the respective ports to start with and getting them included in upstream, but also saw no point in keeping the port alive without any users. In the end, it seems that while the eight architectures are extremely different, they all suffered the same fate: There was one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem, which was more costly than licensing newer off-the-shelf CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems that all the SoC product lines are still around, but have not used the custom CPU architectures for several years at this point. In contrast, CPU instruction sets that remain popular and have actively maintained kernel ports tend to all be used across multiple licensees. [ See the new nds32 port merged in the previous commit for the next generation of "one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem" - Linus ] The removal came out of a discussion that is now documented at https://lwn.net/Articles/748074/. Unlike the original plans, I'm not marking any ports as deprecated but remove them all at once after I made sure that they are all unused. Some architectures (notably tile, mn10300, and blackfin) are still being shipped in products with old kernels, but those products will never be updated to newer kernel releases. After this series, we still have a few architectures without mainline gcc support: - unicore32 and hexagon both have very outdated gcc releases, but the maintainers promised to work on providing something newer. At least in case of hexagon, this will only be llvm, not gcc. - openrisc, risc-v and nds32 are still in the process of finishing their support or getting it added to mainline gcc in the first place. They all have patched gcc-7.3 ports that work to some degree, but complete upstream support won't happen before gcc-8.1. Csky posted their first kernel patch set last week, their situation will be similar [ Palmer Dabbelt points out that RISC-V support is in mainline gcc since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]" This really says it all: 2498 files changed, 95 insertions(+), 467668 deletions(-) * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits) MAINTAINERS: UNICORE32: Change email account staging: iio: remove iio-trig-bfin-timer driver tty: hvc: remove tile driver tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers serial: remove tile uart driver serial: remove m32r_sio driver serial: remove blackfin drivers serial: remove cris/etrax uart drivers usb: Remove Blackfin references in USB support usb: isp1362: remove blackfin arch glue usb: musb: remove blackfin port usb: host: remove tilegx platform glue pwm: remove pwm-bfin driver i2c: remove bfin-twi driver spi: remove blackfin related host drivers watchdog: remove bfin_wdt driver can: remove bfin_can driver mmc: remove bfin_sdh driver input: misc: remove blackfin rotary driver input: keyboard: remove bf54x driver ...
2018-04-02Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU subsystem changes in this cycle were: - Miscellaneous fixes, perhaps most notably removing obsolete code whose only purpose in life was to gather information for the now-removed RCU debugfs facility. Other notable changes include removing NO_HZ_FULL_ALL in favor of the nohz_full kernel boot parameter, minor optimizations for expedited grace periods, some added tracing, creating an RCU-specific workqueue using Tejun's new WQ_MEM_RECLAIM flag, and several cleanups to code and comments. - SRCU cleanups and optimizations. - Torture-test updates, perhaps most notably the adding of ARMv8 support, but also including numerous cleanups and usability fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) rcu: Create RCU-specific workqueues with rescuers torture: Provide more sensible nreader/nwriter defaults for rcuperf torture: Grace periods do not piggyback off of themselves torture: Adjust rcuperf trace processing to allow for workqueues torture: Default jitter off when running rcuperf torture: Specify qemu memory size with --memory argument rcutorture: Add basic ARM64 support to run scripts rcutorture: Update kvm.sh header comment rcutorture: Record which grace-period primitives are tested rcutorture: Re-enable testing of dynamic expediting rcutorture: Avoid fake-writer use of undefined primitives rcutorture: Abstract function and module names rcutorture: Replace multi-instance kzalloc() with kcalloc() rcu: Remove SRCU throttling srcu: Remove dead code in srcu_gp_end() srcu: Reduce scans of srcu_data in counter wrap check srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap srcu: Abstract function name rcu: Make expedited RCU CPU selection avoid unnecessary stores rcu: Trace expedited GP delays due to transitioning CPUs ...
2018-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-31 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei. 2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey. 3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John. 4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong. 5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri. 6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin. 7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita. 8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant. 9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua. 10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31btrfs: qgroup: Update trace events for metadata reservationQu Wenruo
Now trace_qgroup_meta_reserve() will have extra type parameter. And introduce two new trace events: 1) trace_qgroup_meta_free_all_pertrans() For btrfs_qgroup_free_meta_all_pertrans() 2) trace_qgroup_meta_convert() For btrfs_qgroup_convert_reserved_meta() Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertransQu Wenruo
Btrfs uses 2 different methods to reseve metadata qgroup space. 1) Reserve at btrfs_start_transaction() time This is quite straightforward, caller will use the trans handler allocated to modify b-trees. In this case, reserved metadata should be kept until qgroup numbers are updated. 2) Reserve by using block_rsv first, and later btrfs_join_transaction() This is more complicated, caller will reserve space using block_rsv first, and then later call btrfs_join_transaction() to get a trans handle. In this case, before we modify trees, the reserved space can be modified on demand, and after btrfs_join_transaction(), such reserved space should also be kept until qgroup numbers are updated. Since these two types behave differently, split the original "META" reservation type into 2 sub-types: META_PERTRANS: For above case 1) META_PREALLOC: For reservations that happened before btrfs_join_transaction() of case 2) NOTE: This patch will only convert existing qgroup meta reservation callers according to its situation, not ensuring all callers are at correct timing. Such fix will be added in later patches. Signed-off-by: Qu Wenruo <wqu@suse.com> [ update comments ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31btrfs: qgroup: Update trace events to use new separate rsv typesQu Wenruo
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-30rxrpc: Add a tracepoint to track rxrpc_peer refcountingDavid Howells
Add a tracepoint to track reference counting on the rxrpc_peer struct. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-30rxrpc: Add a tracepoint to track rxrpc_local refcountingDavid Howells
Add a tracepoint to track reference counting on the rxrpc_local struct. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-28bpf: introduce BPF_RAW_TRACEPOINTAlexei Starovoitov
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access kernel internal arguments of the tracepoints in their raw form. >From bpf program point of view the access to the arguments look like: struct bpf_raw_tracepoint_args { __u64 args[0]; }; int bpf_prog(struct bpf_raw_tracepoint_args *ctx) { // program can read args[N] where N depends on tracepoint // and statically verified at program load+attach time } kprobe+bpf infrastructure allows programs access function arguments. This feature allows programs access raw tracepoint arguments. Similar to proposed 'dynamic ftrace events' there are no abi guarantees to what the tracepoints arguments are and what their meaning is. The program needs to type cast args properly and use bpf_probe_read() helper to access struct fields when argument is a pointer. For every tracepoint __bpf_trace_##call function is prepared. In assembler it looks like: (gdb) disassemble __bpf_trace_xdp_exception Dump of assembler code for function __bpf_trace_xdp_exception: 0xffffffff81132080 <+0>: mov %ecx,%ecx 0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3> where TRACE_EVENT(xdp_exception, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, u32 act), The above assembler snippet is casting 32-bit 'act' field into 'u64' to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is. All of ~500 of __bpf_trace_*() functions are only 5-10 byte long and in total this approach adds 7k bytes to .text. This approach gives the lowest possible overhead while calling trace_xdp_exception() from kernel C code and transitioning into bpf land. Since tracepoint+bpf are used at speeds of 1M+ events per second this is valuable optimization. The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced that returns anon_inode FD of 'bpf-raw-tracepoint' object. The user space looks like: // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type prog_fd = bpf_prog_load(...); // receive anon_inode fd for given bpf_raw_tracepoint with prog attached raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd); Ctrl-C of tracing daemon or cmdline tool that uses this feature will automatically detach bpf program, unload it and unregister tracepoint probe. On the kernel side the __bpf_raw_tp_map section of pointers to tracepoint definition and to __bpf_trace_*() probe function is used to find a tracepoint with "xdp_exception" name and corresponding __bpf_trace_xdp_exception() probe function which are passed to tracepoint_probe_register() to connect probe with tracepoint. Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf tracepoint mechanisms. perf_event_open() can be used in parallel on the same tracepoint. Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted. Each with its own bpf program. The kernel will execute all tracepoint probes and all attached bpf programs. In the future bpf_raw_tracepoints can be extended with query/introspection logic. __bpf_raw_tp_map section logic was contributed by Steven Rostedt Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28treewide: remove large struct-pass-by-value from tracepoint argumentsAlexei Starovoitov
- fix trace_hfi1_ctxt_info() to pass large struct by reference instead of by value - convert 'type array[]' tracepoint arguments into 'type *array', since compiler will warn that sizeof('type array[]') == sizeof('type *array') and later should be used instead The CAST_TO_U64 macro in the later patch will enforce that tracepoint arguments can only be integers, pointers, or less than 8 byte structures. Larger structures should be passed by reference. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-27rxrpc: Trace call completionDavid Howells
Add a tracepoint to track rxrpc calls moving into the completed state and to log the completion type and the recorded error value and abort code. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27rxrpc, afs: Use debug_ids rather than pointers in tracesDavid Howells
In rxrpc and afs, use the debug_ids that are monotonically allocated to various objects as they're allocated rather than pointers as kernel pointers are now hashed making them less useful. Further, the debug ids aren't reused anywhere nearly as quickly. In addition, allow kernel services that use rxrpc, such as afs, to take numbers from the rxrpc counter, assign them to their own call struct and pass them in to rxrpc for both client and service calls so that the trace lines for each will have the same ID tag. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27rxrpc: Trace resendDavid Howells
Add a tracepoint to trace packet resend events and to dump the Tx annotation buffer for added illumination. Signed-off-by: David Howells <dhowells@rdhat.com>
2018-03-26Btrfs: add tracepoint for em's EEXIST caseLiu Bo
This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the subtle bugs around merge_extent_mapping. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-15mmc: core: Fix tracepoint print of blk_addr and blkszAdrian Hunter
Swap the positions of blk_addr and blksz in the tracepoint print arguments so that they match the print format. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: d2f82254e4e8 ("mmc: core: Add members to mmc_request and mmc_data for CQE's") Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-03-11Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Miscellaneous fixes, perhaps most notably removing obsolete code whose only purpose in life was to gather information for the now-removed RCU debugfs facility. Other notable changes include removing NO_HZ_FULL_ALL in favor of the nohz_full kernel boot parameter, minor optimizations for expedited grace periods, some added tracing, creating an RCU-specific workqueue using Tejun's new WQ_MEM_RECLAIM flag, and several cleanups to code and comments. - SRCU cleanups and optimizations. - Torture-test updates, perhaps most notably the adding of ARMv8 support, but also including numerous cleanups and usability fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-23Drop a bunch of metag referencesJames Hogan
Now that arch/metag/ has been removed, drop a bunch of metag references in various codes across the whole tree: - VM_GROWSUP and __VM_ARCH_SPECIFIC_1. - MT_METAG_* ELF note types. - METAG Kconfig dependencies (FRAME_POINTER) and ranges (MAX_STACK_SIZE_MB). - metag cases in tools (checkstack.pl, recordmcount.c, perf). Signed-off-by: James Hogan <jhogan@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linux-mm@kvack.org Cc: linux-metag@vger.kernel.org
2018-02-20rcu: Trace expedited GP delays due to transitioning CPUsPaul E. McKenney
If a CPU is transitioning to or from offline state, an expedited grace period may undergo a timed wait. This timed wait can unduly delay grace periods, so this commit adds a trace statement to make it visible. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20rcu: Add more tracing of expedited grace periodsPaul E. McKenney
This commit adds more tracing of expedited grace periods to enable improved debugging of slowdowns. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-18ext4: add tracepoints for shutdown and file system errorsTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>