Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core
Pull KCSAN updates for v5.10 from Paul E. McKenney:
- Improve kernel messages.
- Be more permissive with bitops races under KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.
- Optimize debugfs stat counters.
- Introduce the instrument_*read_write() annotations, to provide a
finer description of certain ops - using KCSAN's compound instrumentation.
Use them for atomic RNW and bitops, where appropriate.
Doing this might find new races.
(Depends on the compiler having tsan-compound-read-before-write=1 support.)
- Support atomic built-ins, which will help certain architectures, such as s390.
- Misc enhancements and smaller fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Steve reported that lockdep_assert*irq*(), when nested inside lockdep
itself, will trigger a false-positive.
One example is the stack-trace code, as called from inside lockdep,
triggering tracing, which in turn calls RCU, which then uses
lockdep_assert_irqs_disabled().
Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Basically print_lock_class_header()'s for loop is out of sync with the
the size of of ->usage_traces[].
Also clean things up a bit while at it, to avoid such mishaps in the future.
Fixes: 23870f122768 ("locking/lockdep: Fix "USED" <- "IN-NMI" inversions")
Reported-by: Qian Cai <cai@redhat.com>
Debugged-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Qian Cai <cai@redhat.com>
Link: https://lkml.kernel.org/r/20200930094937.GE2651@hirez.programming.kicks-ass.net
|
|
The error handling introduced by commit:
2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.
Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
|
|
Pull networking fixes from Jakub Kicinski:
"One more set of fixes from the networking tree:
- add missing input validation in nl80211_del_key(), preventing
out-of-bounds access
- last minute fix / improvement of a MRP netlink (uAPI) interface
introduced in 5.9 (current) release
- fix "unresolved symbol" build error under CONFIG_NET w/o
CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock
BTF.
- fix 32 bit sub-register bounds tracking in the bpf verifier for OR
case
- tcp: fix receive window update in tcp_add_backlog()
- openvswitch: handle DNAT tuple collision in conntrack-related code
- r8169: wait for potential PHY reset to finish after applying a FW
file, avoiding unexpected PHY behaviour and failures later on
- mscc: fix tail dropping watermarks for Ocelot switches
- avoid use-after-free in macsec code after a call to the GRO layer
- avoid use-after-free in sctp error paths
- add a device id for Cellient MPL200 WWAN card
- rxrpc fixes:
- fix the xdr encoding of the contents read from an rxrpc key
- fix a BUG() for a unsupported encoding type.
- fix missing _bh lock annotations.
- fix acceptance handling for an incoming call where the incoming
call is encrypted.
- the server token keyring isn't network namespaced - it belongs
to the server, so there's no need. Namespacing it means that
request_key() fails to find it.
- fix a leak of the server keyring"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (21 commits)
net: usb: qmi_wwan: add Cellient MPL200 card
macsec: avoid use-after-free in macsec_handle_frame()
r8169: consider that PHY reset may still be in progress after applying firmware
openvswitch: handle DNAT tuple collision
sctp: fix sctp_auth_init_hmacs() error path
bridge: Netlink interface fix.
net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
bpf: Fix scalar32_min_max_or bounds tracking
tcp: fix receive window update in tcp_add_backlog()
net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
mptcp: more DATA FIN fixes
net: mscc: ocelot: warn when encoding an out-of-bounds watermark value
net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP
net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()
rxrpc: Fix server keyring leak
rxrpc: The server keyring isn't network-namespaced
rxrpc: Fix accept on a connection that need securing
rxrpc: Fix some missing _bh annotations on locking conn->state_lock
rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
rxrpc: Fix rxkad token xdr encoding
...
|
|
Currently, init_listener() tries to prevent adding a filter with
SECCOMP_FILTER_FLAG_NEW_LISTENER if one of the existing filters already
has a listener. However, this check happens without holding any lock that
would prevent another thread from concurrently installing a new filter
(potentially with a listener) on top of the ones we already have.
Theoretically, this is also a data race: The plain load from
current->seccomp.filter can race with concurrent writes to the same
location.
Fix it by moving the check into the region that holds the siglock to guard
against concurrent TSYNC.
(The "Fixes" tag points to the commit that introduced the theoretical
data race; concurrent installation of another filter with TSYNC only
became possible later, in commit 51891498f2da ("seccomp: allow TSYNC and
USER_NOTIF together").)
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201005014401.490175-1-jannh@google.com
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-10-08
The main changes are:
1) Fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due
to missing tcp_timewait_sock and inet_timewait_sock BTF, from Yonghong Song.
2) Fix 32 bit sub-register bounds tracking for OR case, from Daniel Borkmann.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simon reported an issue with the current scalar32_min_max_or() implementation.
That is, compared to the other 32 bit subreg tracking functions, the code in
scalar32_min_max_or() stands out that it's using the 64 bit registers instead
of 32 bit ones. This leads to bounds tracking issues, for example:
[...]
8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
8: (79) r1 = *(u64 *)(r0 +0)
R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
9: (b7) r0 = 1
10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
10: (18) r2 = 0x600000002
12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
12: (ad) if r1 < r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: (95) exit
14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
14: (25) if r1 > 0x0 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: (95) exit
16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
16: (47) r1 |= 0
17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
[...]
The bound tests on the map value force the upper unsigned bound to be 25769803777
in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By
using OR they are truncated and thus result in the range [1,1] for the 32 bit reg
tracker. This is incorrect given the only thing we know is that the value must be
positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the
subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes
sense, for example, for the case where we update dst_reg->s32_{min,max}_value in
the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as
we know that these are positive. Previously, in the else branch the 64 bit values
of umin_value=1 and umax_value=32212254719 were used and latter got truncated to
be 1 as upper bound there. After the fix the subreg range is now correct:
[...]
8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
8: (79) r1 = *(u64 *)(r0 +0)
R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
9: (b7) r0 = 1
10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
10: (18) r2 = 0x600000002
12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
12: (ad) if r1 < r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
13: (95) exit
14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
14: (25) if r1 > 0x0 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
15: (95) exit
16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
16: (47) r1 |= 0
17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
[...]
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Simon Scannell <scannell.smn@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
Drop a redundant local variable definition from sugov_fast_switch()
and rearrange the code in there to avoid the redundant logical
negation.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Use and entirely separate code path for the DMA_ATTR_NO_KERNEL_MAPPING
path. This avoids any confusion about the ret type, and avoids lots of
attr checks and helpers that can be significantly simplified now.
It also ensures that common handling is applied to architetures still
using the arch alloc/free hooks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This ensures dma_direct_alloc_pages will use the right gfp mask, as
well as keeping the code for that common between the two allocators.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Check for highmem pages from CMA, just like in the dma_direct_alloc path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Kernel threads intentionally do CLONE_FS in order to follow any changes
that 'init' does to set up the root directory (or cwd).
It is admittedly a bit odd, but it avoids the situation where 'init'
does some extensive setup to initialize the system environment, and then
we execute a usermode helper program, and it uses the original FS setup
from boot time that may be very limited and incomplete.
[ Both Al Viro and Eric Biederman point out that 'pivot_root()' will
follow the root regardless, since it fixes up other users of root (see
chroot_fs_refs() for details), but overmounting root and doing a
chroot() would not. ]
However, Vegard Nossum noticed that the CLONE_FS not only means that we
follow the root and current working directories, it also means we share
umask with whatever init changed it to. That wasn't intentional.
Just reset umask to the original default (0022) before actually starting
the usermode helper program.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
An interrupt that is disabled/masked but set for wakeup may still need to
be able to wake up the system from sleep states like "suspend to RAM".
To that effect, introduce the IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag.
If the irqchip have this flag set, the irq PM code will enable/unmask
the irqs that are marked for wakeup, but that are in a disabled state.
On resume, such irqs will be restored back to their disabled state.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
[maz: commit message fix-up]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/1601267524-20199-4-git-send-email-mkshah@codeaurora.org
|
|
Move more nitty gritty DMA implementation details into the common
internal header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Most of the dma_direct symbols should only be used by direct.c and
mapping.c, so move them to kernel/dma. In fact more of dma-direct.h
should eventually move, but that will require more coordination with
other subsystems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Most of dma-debug.h is not required by anything outside of kernel/dma.
Move the four declarations needed by dma-mappin.h or dma-ops providers
into dma-mapping.h and dma-map-ops.h, and move the remainder of the
file to kernel/dma/debug.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Just provide a weak default definition of dma_contiguous_early_fixup and
let arm override it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Merge dma-contiguous.h into dma-map-ops.h, after removing the comment
describing the contiguous allocator into kernel/dma/contigous.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
dma_contiguous_set_default contains a trivial assignment, and has a
single caller that is compiled if CONFIG_CMA_DMA is enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
dev_set_cma_area contains a trivial assignment. It has just three
callers that all have a non-NULL device and depend on CONFIG_DMA_CMA,
so remove the wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Split out all the bits that are purely for dma_map_ops implementations
and related code into a new <linux/dma-map-ops.h> header so that they
don't get pulled into all the drivers. That also means the architecture
specific <asm/dma-mapping.h> is not pulled in by <linux/dma-mapping.h>
any more, which leads to a missing includes that were pulled in by the
x86 or arm versions in a few not overly portable drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
On an embedded system with a tiny (1 MiB) CMA area for video memory, and
a simple enough video pipeline, we can decrease the CMA_ALIGNMENT by a
factor of 2 to avoid wasting memory, as all the allocations for video
buffers will be of the exact same size (dictated by the size of the
screen).
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Pull networking fixes from David Miller:
1) Make sure SKB control block is in the proper state during IPSEC
ESP-in-TCP encapsulation. From Sabrina Dubroca.
2) Various kinds of attributes were not being cloned properly when we
build new xfrm_state objects from existing ones. Fix from Antony
Antony.
3) Make sure to keep BTF sections, from Tony Ambardar.
4) TX DMA channels need proper locking in lantiq driver, from Hauke
Mehrtens.
5) Honour route MTU during forwarding, always. From Maciej
Żenczykowski.
6) Fix races in kTLS which can result in crashes, from Rohit
Maheshwari.
7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
Jha.
8) Use correct address family in xfrm state lookups, from Herbert Xu.
9) A bridge FDB flush should not clear out user managed fdb entries
with the ext_learn flag set, from Nikolay Aleksandrov.
10) Fix nested locking of netdev address lists, from Taehee Yoo.
11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.
12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.
13) Don't free command entries in mlx5 while comp handler could still be
running, from Eran Ben Elisha.
14) Error flow of request_irq() in mlx5 is busted, due to an off by one
we try to free and IRQ never allocated. From Maor Gottlieb.
15) Fix leak when dumping netlink policies, from Johannes Berg.
16) Sendpage cannot be performed when a page is a slab page, or the page
count is < 1. Some subsystems such as nvme were doing so. Create a
"sendpage_ok()" helper and use it as needed, from Coly Li.
17) Don't leak request socket when using syncookes with mptcp, from
Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net/core: check length before updating Ethertype in skb_mpls_{push,pop}
net: mvneta: fix double free of txq->buf
net_sched: check error pointer in tcf_dump_walker()
net: team: fix memory leak in __team_options_register
net: typhoon: Fix a typo Typoon --> Typhoon
net: hinic: fix DEVLINK build errors
net: stmmac: Modify configuration method of EEE timers
tcp: fix syn cookied MPTCP request socket leak
libceph: use sendpage_ok() in ceph_tcp_sendpage()
scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
tcp: use sendpage_ok() to detect misused .sendpage
nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
net: introduce helper sendpage_ok() in include/linux/net.h
net: usb: pegasus: Proper error handing when setting pegasus' MAC address
net: core: document two new elements of struct net_device
netlink: fix policy dump leak
net/mlx5e: Fix race condition on nhe->n pointer in neigh update
net/mlx5e: Fix VLAN create flow
...
|
|
get_gendisk grabs a reference on the disk and file operation, so this
code will leak both of them while having absolutely no use for the
gendisk itself.
This effectively reverts commit 2df83fa4bce421f ("PM / Hibernate: Use
get_gendisk to verify partition if resume_file is integer format")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
All remaining callers of bdget() outside of fs/block_dev.c want to get a
reference to the struct block_device for a given struct hd_struct. Add
a helper just for that and then mark bdget static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace /* FALL THRU */ comment with the new pseudo-keyword macro
fallthrough[1].
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201002224627.GA30475@embeddedor
|
|
The cpufreq core handles the updates to policy->cur and recording of
cpufreq trace events for all the governors except schedutil's fast
switch case.
Move that as well to cpufreq core for consistency and readability.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
To perform partial reads, callers of kernel_read_file*() must have a
non-NULL file_size argument and a preallocated buffer. The new "offset"
argument can then be used to seek to specific locations in the file to
fill the buffer to, at most, "buf_size" per call.
Where possible, the LSM hooks can report whether a full file has been
read or not so that the contents can be reasoned about.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201002173828.2099543-14-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now that there is an API for checking loaded contents for modules
loaded without a file, call into the LSM hooks.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KP Singh <kpsingh@google.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Link: https://lore.kernel.org/r/20201002173828.2099543-11-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are a few places in the kernel where LSMs would like to have
visibility into the contents of a kernel buffer that has been loaded or
read. While security_kernel_post_read_file() (which includes the
buffer) exists as a pairing for security_kernel_read_file(), no such
hook exists to pair with security_kernel_load_data().
Earlier proposals for just using security_kernel_post_read_file() with a
NULL file argument were rejected (i.e. "file" should always be valid for
the security_..._file hooks, but it appears at least one case was
left in the kernel during earlier refactoring. (This will be fixed in
a subsequent patch.)
Since not all cases of security_kernel_load_data() can have a single
contiguous buffer made available to the LSM hook (e.g. kexec image
segments are separately loaded), there needs to be a way for the LSM to
reason about its expectations of the hook coverage. In order to handle
this, add a "contents" argument to the "kernel_load_data" hook that
indicates if the newly added "kernel_post_load_data" hook will be called
with the full contents once loaded. That way, LSMs requiring full contents
can choose to unilaterally reject "kernel_load_data" with contents=false
(which is effectively the existing hook coverage), but when contents=true
they can allow it and later evaluate the "kernel_post_load_data" hook
once the buffer is loaded.
With this change, LSMs can gain coverage over non-file-backed data loads
(e.g. init_module(2) and firmware userspace helper), which will happen
in subsequent patches.
Additionally prepare IMA to start processing these cases.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In preparation for adding partial read support, add an optional output
argument to kernel_read_file*() that reports the file size so callers
can reason more easily about their reading progress.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-8-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In preparation for refactoring kernel_read_file*(), remove the redundant
"size" argument which is not needed: it can be included in the return
code, with callers adjusted. (VFS reads already cannot be larger than
INT_MAX.)
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-6-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
include file. That header gets pulled in just about everywhere
and doesn't really need functions not related to the general fs interface.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com
Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
that are interested in filtering between types of things. The "how"
should be an internal detail made uninteresting to the LSMs.
Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer")
Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201002173828.2099543-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add the support to route trace_marker buffer to other destination
via trace_export.
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20201005071319.78508-5-alexander.shishkin@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Only function traces can be exported to other destinations currently.
This patch exports event trace as well. Move trace export related
function to the beginning of file so other trace can call
trace_process_export() to export.
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20201005071319.78508-4-alexander.shishkin@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
More traces like event trace or trace marker will be supported.
Add flag for difference traces, so that they can be controlled
separately. Move current function trace to it's own flag
instead of global ftrace enable flag.
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20201005071319.78508-3-alexander.shishkin@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
stress-ng has a test (stress-ng --cyclic) that creates a set of threads
under SCHED_DEADLINE with the following parameters:
dl_runtime = 10000 (10 us)
dl_deadline = 100000 (100 us)
dl_period = 100000 (100 us)
These parameters are very aggressive. When using a system without HRTICK
set, these threads can easily execute longer than the dl_runtime because
the throttling happens with 1/HZ resolution.
During the main part of the test, the system works just fine because
the workload does not try to run over the 10 us. The problem happens at
the end of the test, on the exit() path. During exit(), the threads need
to do some cleanups that require real-time mutex locks, mainly those
related to memory management, resulting in this scenario:
Note: locks are rt_mutexes...
------------------------------------------------------------------------
TASK A: TASK B: TASK C:
activation
activation
activation
lock(a): OK! lock(b): OK!
<overrun runtime>
lock(a)
-> block (task A owns it)
-> self notice/set throttled
+--< -> arm replenished timer
| switch-out
| lock(b)
| -> <C prio > B prio>
| -> boost TASK B
| unlock(a) switch-out
| -> handle lock a to B
| -> wakeup(B)
| -> B is throttled:
| -> do not enqueue
| switch-out
|
|
+---------------------> replenishment timer
-> TASK B is boosted:
-> do not enqueue
------------------------------------------------------------------------
BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
crashes with hung task C.
This problem is avoided by removing the throttle state from the boosted
thread while boosting it (by TASK A in the example above), allowing it to
be queued and run boosted.
The next replenishment will take care of the runtime overrun, pushing
the deadline further away. See the "while (dl_se->runtime <= 0)" on
replenish_dl_entity() for more information.
Reported-by: Mark Simmons <msimmons@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Mark Simmons <msimmons@redhat.com>
Link: https://lkml.kernel.org/r/5076e003450835ec74e6fa5917d02c4fa41687e6.1600170294.git.bristot@redhat.com
|
|
rq->cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1598605249-72651-1-git-send-email-vincent.donnefort@arm.com
|
|
Currently, pick_next_entity(...) has the following structure
(simplified):
[...]
if (last_buddy_ok())
result = last_buddy;
if (next_buddy_ok())
result = next_buddy;
[...]
The intended behavior is to prefer next buddy over last buddy;
the current code somewhat obfuscates this, and also wastes
cycles checking the last buddy when eventually the next buddy is
picked up.
So this patch refactors two 'ifs' above into
[...]
if (next_buddy_ok())
result = next_buddy;
else if (last_buddy_ok())
result = last_buddy;
[...]
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guitttot@linaro.org>
Link: https://lkml.kernel.org/r/20200930173532.1069092-1-posk@google.com
|
|
Functions that are passed to early_initcall should be of type
initcall_t, which expects a return type of int. This is not currently an
error but a patch in the Clang LTO series could change that in the
future.
Fixes: 9183c3f9ed71 ("static_call: Add inline static call infrastructure")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/lkml/20200903203053.3411268-17-samitolvanen@google.com/
|
|
Naresh reported a bug that appears to be a side effect of the static
calls. It happens when going from more than one tracepoint callback to
a single one, and removing the first callback on the list. The list of
tracepoint callbacks holds data and a function to call with the
parameters of that tracepoint and a handler to the associated data.
old_list:
0: func = foo; data = NULL;
1: func = bar; data = &bar_struct;
new_list:
0: func = bar; data = &bar_struct;
CPU 0 CPU 1
----- -----
tp_funcs = old_list;
tp_static_caller = tp_interator
__DO_TRACE()
data = tp_funcs[0].data = NULL;
tp_funcs = new_list;
tracepoint_update_call()
tp_static_caller = tp_funcs[0] = bar;
tp_static_caller(data)
bar(data)
x = data->item = NULL->item
BOOM!
To solve this, add a tracepoint_synchronize_unregister() between
changing tp_funcs and updating the static tracepoint, that does both a
synchronize_rcu() and synchronize_srcu(). This will ensure that when
the static call is updated to the single callback that it will be
receiving the data that it registered with.
Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/linux-next/CA+G9fYvPXVRO0NV7yL=FxCmFEMYkCwdz7R=9W+_votpT824YJA@mail.gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two tracing fixes:
- Fix temp buffer accounting that caused a WARNING for
ftrace_dump_on_opps()
- Move the recursion check in one of the function callback helpers to
the beginning of the function, as if the rcu_is_watching() gets
traced, it will cause a recursive loop that will crash the kernel"
* tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Move RCU is watching check after recursion check
tracing: Fix trace_find_next_entry() accounting of temp buffer size
|
|
Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.
With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.
Cc: stable@vger.kernel.org # v5.5+
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Do not report failure on zero sized writes, and handle them as no-op.
There's issues for example in case of writev() when there's iovec
containing zero buffer as a first one. It's expected writev() on below
example to successfully perform the write to specified writable cgroup
file expecting integer value, and to return 2. For now it's returning
value -1, and skipping the write:
int writetest(int fd) {
const char *buf1 = "";
const char *buf2 = "1\n";
struct iovec iov[2] = {
{ .iov_base = (void*)buf1, .iov_len = 0 },
{ .iov_base = (void*)buf2, .iov_len = 2 }
};
return writev(fd, iov, 2);
}
This patch fixes the issue by checking if there's nothing to write,
and handling the write as no-op by just returning 0.
Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This step is already done in rebind_subsystems().
Not necessary to do it again.
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
@setup_text_buf only copies the original text messages (without any
prefix or extended text). It only needs to be LOG_LINE_MAX in size.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200930090134.8723-3-john.ogness@linutronix.de
|