Age | Commit message (Collapse) | Author |
|
Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.
Note that the get case is pretty weird in that it actually copies data
back to userspace from setsockopt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split ipv6_flowlabel_opt into a subfunction for each action and a small
wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot found that
touch /proc/testfile
causes NULL pointer dereference at tomoyo_get_local_path()
because inode of the dentry is NULL.
Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info
directly. Since proc_pid_ns() can only work with inode, using it in
the tomoyo_get_local_path() was wrong.
To avoid creating more functions for getting proc_ns, change the
argument type of the proc_pid_ns() function. Then, Tomoyo can use
the existing super_block to get pid_ns.
Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com
Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com
Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com
Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock")
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
|
fl_create() should call static_branch_deferred_inc() only in
case of success.
Also we should not call fl_free() in error path, as this could
cause a static key imbalance.
jump label: negative count!
WARNING: CPU: 0 PID: 15907 at kernel/jump_label.c:221 static_key_slow_try_dec kernel/jump_label.c:221 [inline]
WARNING: CPU: 0 PID: 15907 at kernel/jump_label.c:221 static_key_slow_try_dec+0x1ab/0x1d0 kernel/jump_label.c:206
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 15907 Comm: syz-executor.2 Not tainted 5.2.0-rc6+ #62
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x744 kernel/panic.c:219
__warn.cold+0x20/0x4d kernel/panic.c:576
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
fixup_bug arch/x86/kernel/traps.c:174 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
RIP: 0010:static_key_slow_try_dec kernel/jump_label.c:221 [inline]
RIP: 0010:static_key_slow_try_dec+0x1ab/0x1d0 kernel/jump_label.c:206
Code: c0 e8 e9 3e e5 ff 83 fb 01 0f 85 32 ff ff ff e8 5b 3d e5 ff 45 31 ff eb a0 e8 51 3d e5 ff 48 c7 c7 40 99 92 87 e8 13 75 b7 ff <0f> 0b eb 8b 4c 89 e7 e8 a9 c0 1e 00 e9 de fe ff ff e8 bf 6d b7 ff
RSP: 0018:ffff88805f9c7450 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000
RDX: 000000000000e3e1 RSI: ffffffff815adb06 RDI: ffffed100bf38e7c
RBP: ffff88805f9c74e0 R08: ffff88806acf0700 R09: ffffed1015d060a9
R10: ffffed1015d060a8 R11: ffff8880ae830547 R12: ffffffff89832ce0
R13: ffff88805f9c74b8 R14: 1ffff1100bf38e8b R15: 00000000ffffff01
__static_key_slow_dec_deferred+0x65/0x110 kernel/jump_label.c:272
fl_free+0xa9/0xe0 net/ipv6/ip6_flowlabel.c:121
fl_create+0x6af/0x9f0 net/ipv6/ip6_flowlabel.c:457
ipv6_flowlabel_opt+0x80e/0x2730 net/ipv6/ip6_flowlabel.c:624
do_ipv6_setsockopt.isra.0+0x2119/0x4100 net/ipv6/ipv6_sockglue.c:825
ipv6_setsockopt+0xf6/0x170 net/ipv6/ipv6_sockglue.c:944
tcp_setsockopt net/ipv4/tcp.c:3131 [inline]
tcp_setsockopt+0x8f/0xe0 net/ipv4/tcp.c:3125
sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3130
__sys_setsockopt+0x253/0x4b0 net/socket.c:2080
__do_sys_setsockopt net/socket.c:2096 [inline]
__se_sys_setsockopt net/socket.c:2093 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:2093
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4597c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2670556c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004597c9
RDX: 0000000000000020 RSI: 0000000000000029 RDI: 0000000000000003
RBP: 000000000075bfc8 R08: 000000000000fdf7 R09: 0000000000000000
R10: 0000000020000000 R11: 0000000000000246 R12: 00007f26705576d4
R13: 00000000004cec00 R14: 00000000004dd520 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..
Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
If not set, by default an autogenerated flowlabel is selected.
Explicit flowlabels require a control operation per label plus a
datapath check on every connection (every datagram if unconnected).
This is particularly expensive on unconnected sockets multiplexing
many flows, such as QUIC.
In the common case, where no lease is exclusive, the check can be
safely elided, as both lease request and check trivially succeed.
Indeed, autoflowlabel does the same even with exclusive leases.
Elide the check if no process has requested an exclusive lease.
fl6_sock_lookup previously returns either a reference to a lease or
NULL to denote failure. Modify to return a real error and update
all callers. On return NULL, they can use the label and will elide
the atomic_dec in fl6_sock_release.
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Changes RFC->v1:
- use static_key_false_deferred to rate limit jump label operations
- call static_key_deferred_flush to stop timers on exit
- move decrement out of RCU context
- defer optimization also if opt data is associated with a lease
- updated all fp6_sock_lookup callers, not just udp
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before taking a refcount, make sure the object is not already
scheduled for deletion.
Same fix is needed in ipv6_flowlabel_opt()
Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
syzbot was able to catch a use-after-free read in pid_nr_ns() [1]
ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid
but fl_free() releases fl->owner.pid before rcu grace period is started.
[1]
BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087
CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
__asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
pid_nr_ns+0x128/0x140 kernel/pid.c:407
ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
seq_read+0xad3/0x1130 fs/seq_file.c:268
proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
do_loop_readv_writev fs/read_write.c:701 [inline]
do_loop_readv_writev fs/read_write.c:688 [inline]
do_iter_read+0x4a9/0x660 fs/read_write.c:922
vfs_readv+0xf0/0x160 fs/read_write.c:984
kernel_readv fs/splice.c:358 [inline]
default_file_splice_read+0x475/0x890 fs/splice.c:413
do_splice_to+0x12a/0x190 fs/splice.c:876
splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
do_sendfile+0x597/0xd00 fs/read_write.c:1443
__do_sys_sendfile64 fs/read_write.c:1498 [inline]
__se_sys_sendfile64 fs/read_write.c:1490 [inline]
__x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff
Allocated by task 17543:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_kmalloc mm/kasan/common.c:497 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3393 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
alloc_pid+0x55/0x8f0 kernel/pid.c:168
copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
copy_process kernel/fork.c:1709 [inline]
_do_fork+0x257/0xfd0 kernel/fork.c:2226
__do_sys_clone kernel/fork.c:2333 [inline]
__se_sys_clone kernel/fork.c:2327 [inline]
__x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 7789:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
__cache_free mm/slab.c:3499 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3765
put_pid.part.0+0x111/0x150 kernel/pid.c:111
put_pid+0x20/0x30 kernel/pid.c:105
fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
expire_timers kernel/time/timer.c:1362 [inline]
__run_timers kernel/time/timer.c:1681 [inline]
__run_timers kernel/time/timer.c:1649 [inline]
run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
__do_softirq+0x266/0x95a kernel/softirq.c:293
The buggy address belongs to the object at ffff888094012a00
which belongs to the cache pid_2 of size 88
The buggy address is located 4 bytes inside of
88-byte region [ffff888094012a00, ffff888094012a58)
The buggy address belongs to the page:
page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
>ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
^
ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
Fixes: 4f82f45730c6 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A request for a flowlabel fails in process or user exclusive mode must
fail if the caller pid or uid does not match. Invert the test.
Previously, the test was unsafe wrt PID recycling, but indeed tested
for inequality: fl1->owner != fl->owner
Fixes: 4f82f45730c68 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipcm_cookie includes sockcm_cookie. Do the same for ipcm6_cookie.
This reduces the number of arguments that need to be passed around,
applies ipcm6_init to all cookie fields at once and reduces code
differentiation between ipv4 and ipv6.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release. All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The code should be using the pid namespace from the procfs mount
instead of trying to look it up during open.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prefer the direct use of octal for permissions.
Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.
Miscellanea:
o Whitespace neatening around these conversions.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations create and destroy /proc entries.
ip6_fl_purge() makes almost the same actions as timer
ip6_fl_gc_timer does, and as it can be executed in parallel
with ip6_fl_purge(), two parallel ip6_fl_purge() may be
executed. So, we can mark it async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:
- if (de->proc_fops)
- inode->i_fop = de->proc_fops;
+ if (de->proc_fops) {
+ if (S_ISREG(inode->i_mode))
+ inode->i_fop = &proc_reg_file_ops;
+ else
+ inode->i_fop = de->proc_fops;
+ }
VFS stopped pinning module at this point.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This changes all DEFINE_TIMER() callbacks to use a struct timer_list
pointer instead of unsigned long. Since the data argument has already been
removed, none of these callbacks are using their argument currently, so
this renames the argument to "unused".
Done using the following semantic patch:
@match_define_timer@
declarer name DEFINE_TIMER;
identifier _timer, _callback;
@@
DEFINE_TIMER(_timer, _callback);
@change_callback depends on match_define_timer@
identifier match_define_timer._callback;
type _origtype;
identifier _origarg;
@@
void
-_callback(_origtype _origarg)
+_callback(struct timer_list *unused)
{ ... }
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Yet another big pile of changes:
- More year 2038 work from Arnd slowly reaching the point where we
need to think about the syscalls themself.
- A new timer function which allows to conditionally (re)arm a timer
only when it's either not running or the new expiry time is sooner
than the armed expiry time. This allows to use a single timer for
multiple timeout requirements w/o caring about the first expiry
time at the call site.
- A new NMI safe accessor to clock real time for the printk timestamp
work. Can be used by tracing, perf as well if required.
- A large number of timer setup conversions from Kees which got
collected here because either maintainers requested so or they
simply got ignored. As Kees pointed out already there are a few
trivial merge conflicts and some redundant commits which was
unavoidable due to the size of this conversion effort.
- Avoid a redundant iteration in the timer wheel softirq processing.
- Provide a mechanism to treat RTC implementations depending on their
hardware properties, i.e. don't inflict the write at the 0.5
seconds boundary which originates from the PC CMOS RTC to all RTCs.
No functional change as drivers need to be updated separately.
- The usual small updates to core code clocksource drivers. Nothing
really exciting"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
timers: Add a function to start/reduce a timer
pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
timer: Prepare to change all DEFINE_TIMER() callbacks
netfilter: ipvs: Convert timers to use timer_setup()
scsi: qla2xxx: Convert timers to use timer_setup()
block/aoe: discover_timer: Convert timers to use timer_setup()
ide: Convert timers to use timer_setup()
drbd: Convert timers to use timer_setup()
mailbox: Convert timers to use timer_setup()
crypto: Convert timers to use timer_setup()
drivers/pcmcia: omap1: Fix error in automated timer conversion
ARM: footbridge: Fix typo in timer conversion
drivers/sgi-xp: Convert timers to use timer_setup()
drivers/pcmcia: Convert timers to use timer_setup()
drivers/memstick: Convert timers to use timer_setup()
drivers/macintosh: Convert timers to use timer_setup()
hwrng/xgene-rng: Convert timers to use timer_setup()
auxdisplay: Convert timers to use timer_setup()
sparc/led: Convert timers to use timer_setup()
mips: ip22/32: Convert timers to use timer_setup()
...
|
|
When syzkaller team brought us a C repro for the crash [1] that
had been reported many times in the past, I finally could find
the root cause.
If FlowLabel info is merged by fl6_merge_options(), we leave
part of the opt_space storage provided by udp/raw/l2tp with random value
in opt_space.tot_len, unless a control message was provided at sendmsg()
time.
Then ip6_setup_cork() would use this random value to perform a kzalloc()
call. Undefined behavior and crashes.
Fix is to properly set tot_len in fl6_merge_options()
At the same time, we can also avoid consuming memory and cpu cycles
to clear it, if every option is copied via a kmemdup(). This is the
change in ip6_setup_cork().
[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cb64a100 task.stack: ffff8801cc350000
RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
FS: 00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x358/0x5a0 net/socket.c:1750
SyS_sendto+0x40/0x50 net/socket.c:1718
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Drop the arguments from the macro and adjust all callers with the
following script:
perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \
$(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h)
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # for m68k parts
Acked-by: Guenter Roeck <linux@roeck-us.net> # for watchdog parts
Acked-by: David S. Miller <davem@davemloft.net> # for networking parts
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org> # for wireless parts
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-mips@linux-mips.org
Cc: Petr Mladek <pmladek@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux1394-devel@lists.sourceforge.net
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-s390@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Harish Patil <harish.patil@cavium.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Michael Reed <mdr@sgi.com>
Cc: Manish Chopra <manish.chopra@cavium.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-pm@vger.kernel.org
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Gross <mark.gross@intel.com>
Cc: linux-watchdog@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: netdev@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lkml.kernel.org/r/1507159627-127660-11-git-send-email-keescook@chromium.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
This is not a good practice and makes it hard to add new parameters.
This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
ipv4 and include the above mentioned variables. And we only pass the
pointer to this structure to corresponding functions. This makes it easier
to add new parameters in the future and makes the function cleaner.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Process socket-level control messages by invoking
__sock_cmsg_send in ip6_datagram_send_ctl for control messages on
the SOL_SOCKET layer.
This makes sure whenever ip6_datagram_send_ctl is called for
udp and raw, we also process socket-level control messages.
This is a bit uglier than IPv4, since IPv6 does not have
something like ipcm_cookie. Perhaps we can later create
a control message cookie for IPv6?
Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv6 control messages.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Silence lockdep false positive about rcu_dereference() being
used in the wrong context.
First one should use rcu_dereference_protected() as we own the spinlock.
Second one should be a normal assignation, as no barrier is needed.
Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.
Background:
IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.
Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.
This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.
Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x != NULL and sometimes as x. x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.
No changes detected by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.
No changes detected by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hold_net and release_net were an idea that turned out to be useless.
The code has been disabled since 2008. Kill the code it is long past due.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use spin_lock_bh in ip6_fl_purge() to prevent following potentially
deadlock scenario between ip6_fl_purge() and ip6_fl_gc() timer.
=================================
[ INFO: inconsistent lock state ]
3.19.0 #1 Not tainted
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(ip6_fl_lock){+.?...}, at: [<ffffffff8171155d>] ip6_fl_gc+0x2d/0x180
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810ee9a0>] __lock_acquire+0x4a0/0x10b0
[<ffffffff810efd54>] lock_acquire+0xc4/0x2b0
[<ffffffff81751d2d>] _raw_spin_lock+0x3d/0x80
[<ffffffff81711798>] ip6_flowlabel_net_exit+0x28/0x110
[<ffffffff815f9759>] ops_exit_list.isra.1+0x39/0x60
[<ffffffff815fa320>] cleanup_net+0x100/0x1e0
[<ffffffff810ad80a>] process_one_work+0x20a/0x830
[<ffffffff810adf4b>] worker_thread+0x11b/0x460
[<ffffffff810b42f4>] kthread+0x104/0x120
[<ffffffff81752bfc>] ret_from_fork+0x7c/0xb0
irq event stamp: 84640
hardirqs last enabled at (84640): [<ffffffff81752080>] _raw_spin_unlock_irq+0x30/0x50
hardirqs last disabled at (84639): [<ffffffff81751eff>] _raw_spin_lock_irq+0x1f/0x80
softirqs last enabled at (84628): [<ffffffff81091ad1>] _local_bh_enable+0x21/0x50
softirqs last disabled at (84629): [<ffffffff81093b7d>] irq_exit+0x12d/0x150
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(ip6_fl_lock);
<Interrupt>
lock(ip6_fl_lock);
*** DEADLOCK ***
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.
No changes to the resultant object files result as determined by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using a single fixed string is smaller code size than using
a format and many string arguments.
Reduces overall code size a little.
$ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
text data bss dec hex filename
34269 7012 14824 56105 db29 net/ipv4/igmp.o.new
34315 7012 14824 56151 db57 net/ipv4/igmp.o.old
30078 7869 13200 51147 c7cb net/ipv6/mcast.o.new
30105 7869 13200 51174 c7e6 net/ipv6/mcast.o.old
11434 3748 8580 23762 5cd2 net/ipv6/ip6_flowlabel.o.new
11491 3748 8580 23819 5d0b net/ipv6/ip6_flowlabel.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "else" block is on several lines and use bracket.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
This patch removes some blank lines between the end of a function
definition and the EXPORT_SYMBOL_GPL macro in order to prevent
checkpatch warning that EXPORT_SYMBOL must immediately follow
a function.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This replaces 6 identical code snippets with a call to a new
static inline function.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These include are here since kernel 2.2.7, but probably never used.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.
Changelog of V3:
* rename ip6_flowlabel_consistency to flowlabel_consistency
* use net_info_ratelimited()
* checkpatch cleanups
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With this option, the socket will reply with the flow label value read
on received packets.
The goal is to have a connection with the same flow label in both
direction of the communication.
Changelog of V4:
* Do not erase the flow label on the listening socket. Use pktopts to
store the received value
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.
Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes a suspicious rcu derference warning.
Cc: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Take ip6_fl_lock before to read and update
a label.
v2: protect only the relevant code
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the last RFC 6437 does not give any constraints
for lifetime of flow labels, the previous RFC 3697
spoke of a minimum of 120 seconds between
reattribution of a flow label.
The maximum linger is currently set to 60 seconds
and does not allow this configuration without
CAP_NET_ADMIN right.
This patch increase the maximum linger to 150
seconds, allowing more flexibility to standard
users.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is already possible to set/put/renew a label
with IPV6_FLOWLABEL_MGR and setsockopt. This patch
add the possibility to get information about this
label (current value, time before expiration, etc).
It helps application to take decision for a renew
or a release of the label.
v2:
* Add spin_lock to prevent race condition
* return -ENOENT if no result found
* check if flr_action is GET
v3:
* move the spin_lock to protect only the
relevant code
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code of flow label in Linux Kernel follows
the rules of RFC 1809 (an informational one) for
conditions on flow label sharing. There rules are
not in the last proposed standard for flow label
(RFC 6437), or in the previous one (RFC 3697).
Since this code does not follow any current or
old standard, we can remove it.
With this removal, the ipv6_opt_cmp function is
now a dead code and it can be removed too.
Changelog to v1:
* add justification for the change
* remove the condition on IPv6 options
[ Remove ipv6_hdr_cmp and it is now unused as well. -DaveM ]
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 18367681a10b (ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.)
omitted proper __rcu annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.
this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.
It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the following RCU warning:
[ 51.680236] ===============================
[ 51.681914] [ INFO: suspicious RCU usage. ]
[ 51.683610] 3.8.0-rc6-next-20130206-sasha-00028-g83214f7-dirty #276 Tainted: G W
[ 51.686703] -------------------------------
[ 51.688281] net/ipv6/ip6_flowlabel.c:671 suspicious rcu_dereference_check() usage!
we should use rcu_dereference_bh() when we hold rcu_read_lock_bh().
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
drivers/net/ethernet/intel/e1000e/ethtool.c
drivers/net/vmxnet3/vmxnet3_drv.c
drivers/net/wireless/iwlwifi/dvm/tx.c
net/ipv6/route.c
The ipv6 route.c conflict is simple, just ignore the 'net' side change
as we fixed the same problem in 'net-next' by eliminating cached
neighbours from ipv6 routes.
The e1000e conflict is an addition of a new statistic in the ethtool
code, trivial.
The vmxnet3 conflict is about one change in 'net' removing a guarding
conditional, whilst in 'net-next' we had a netdev_info() conversion.
The iwlwifi conflict is dealing with a WARN_ON() conversion in
'net-next' vs. a revert happening in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|