Age | Commit message (Collapse) | Author |
|
The parent image is read only up to the overlap point, the rest of
the buffer should be zeroed. This snuck in because as it turns out
the overlap test case has not been triggering this code path for
a while now.
Fixes: a9b67e69949d ("rbd: replace obj_req->tried_parent with obj_req->read_state")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull kernel thread signal handling fix from Eric Biederman:
"I overlooked the fact that kernel threads are created with all signals
set to SIG_IGN, and accidentally caused a regression in cifs and drbd
when replacing force_sig with send_sig.
This is my fix for that regression. I add a new function
allow_kernel_signal which allows kernel threads to receive signals
sent from the kernel, but continues to ignore all signals sent from
userspace. This ensures the user space interface for cifs and drbd
remain the same.
These kernel threads depend on blocking networking calls which block
until something is received or a signal is pending. Making receiving
of signals somewhat necessary for these kernel threads.
Perhaps someday we can cleanup those interfaces and remove
allow_kernel_signal. If not allow_kernel_signal is pretty trivial and
clearly documents what is going on so I don't think we will mind
carrying it"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Allow cifs and drbd to receive their terminating signals
|
|
My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd. I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN. So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.
Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig. As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL. Which after the first signal will allow userspace
to send signals to these kernel threads. At least for
wake_ack_receiver in drbd that does not appear actively wrong.
So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.
Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.
This change was partly inspired by similar cifs and drbd patches that
added allow_signal.
Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com>
Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Steve French <smfrench@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
In read_per_ring_refs(), after 'req' and related memory regions are
allocated, xen_blkif_map() is invoked to map the shared frame, irq, and
etc. However, if this mapping process fails, no cleanup is performed,
leading to memory leaks. To fix this issue, invoke the cleanup before
returning the error.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A deadlock with this stacktrace was observed.
The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio
shrinker and the shrinker depends on I/O completion in the dm-bufio
subsystem.
In order to fix the deadlock (and other similar ones), we set the flag
PF_MEMALLOC_NOIO at loop thread entry.
PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0"
#0 [ffff8813dedfb938] __schedule at ffffffff8173f405
#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1"
#0 [ffff88272f5af228] __schedule at ffffffff8173f405
#1 [ffff88272f5af280] schedule at ffffffff8173fa27
#2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
#3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
#4 [ffff88272f5af330] mutex_lock at ffffffff81742133
#5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
#6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
#7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
#8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
#9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
#10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
#11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
#12 [ffff88272f5af760] new_slab at ffffffff811f4523
#13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
#14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
#15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
#16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
#17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
#18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
#19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
#20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
#21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
#22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
#23 [ffff88272f5afec0] kthread at ffffffff810a8428
#24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since commit 3582dd291788 ("aoe: convert aoeblk to blk-mq"), aoedev_downdev
has had the possibility of sleeping and causing the following crash.
BUG: scheduling while atomic: rmmod/2242/0x00000003
Modules linked in: aoe
Preemption disabled at:
[<ffffffffc01d95e5>] flush+0x95/0x4a0 [aoe]
CPU: 7 PID: 2242 Comm: rmmod Tainted: G I 5.2.3 #1
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
Call Trace:
dump_stack+0x4f/0x6a
? flush+0x95/0x4a0 [aoe]
__schedule_bug.cold+0x44/0x54
__schedule+0x44f/0x680
schedule+0x44/0xd0
blk_mq_freeze_queue_wait+0x46/0xb0
? wait_woken+0x80/0x80
blk_mq_freeze_queue+0x1b/0x20
aoedev_downdev+0x111/0x160 [aoe]
flush+0xff/0x4a0 [aoe]
aoedev_exit+0x23/0x30 [aoe]
aoe_exit+0x35/0x948 [aoe]
__se_sys_delete_module+0x183/0x210
__x64_sys_delete_module+0x16/0x20
do_syscall_64+0x4d/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f24e0043b07
Code: 73 01 c3 48 8b 0d 89 73 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f
1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d 59 73 0b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe18f7f1e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f24e0043b07
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555c3ecf87c8
RBP: 00007ffe18f7f1f0 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f24e00b4ac0 R11: 0000000000000206 R12: 00007ffe18f7f238
R13: 00007ffe18f7f410 R14: 00007ffe18f80e73 R15: 0000555c3ecf8760
This patch, handling in the same way of pass two, unlocks the locks and
restart pass one after aoedev_downdev is done.
Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq")
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: linux-block@vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
Cc: nbd@other.debian.org
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:
for i in {a..z}{a..z}; do
dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
mkfs.ext2 $i.image
mkdir mnt$i
done
echo "Run"
for i in {a..z}{a..z}; do
mount -o loop -t ext2 $i.image mnt$i &
done
Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.
Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener")
Cc: stable@vger.kernel.org
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Mark switch cases where we are expecting to fall through.
This patch fixes the following warning (Building: m68k):
drivers/block/ataflop.c: In function ‘fd_locked_ioctl’:
drivers/block/ataflop.c:1728:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
set_capacity(floppy->disk, MAX_DISK_SIZE * 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/block/ataflop.c:1729:2: note: here
case FDFMTEND:
^~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block fixes from Jens Axboe:
- Several io_uring fixes/improvements:
- Blocking fix for O_DIRECT (me)
- Latter page slowness for registered buffers (me)
- Fix poll hang under certain conditions (me)
- Defer sequence check fix for wrapped rings (Zhengyuan)
- Mismatch in async inc/dec accounting (Zhengyuan)
- Memory ordering issue that could cause stall (Zhengyuan)
- Track sequential defer in bytes, not pages (Zhengyuan)
- NVMe pull request from Christoph
- Set of hang fixes for wbt (Josef)
- Redundant error message kill for libahci (Ding)
- Remove unused blk_mq_sched_started_request() and related ops (Marcos)
- drbd dynamic alloc shash descriptor to reduce stack use (Arnd)
- blkcg ->pd_stat() non-debug print (Tejun)
- bcache memory leak fix (Wei)
- Comment fix (Akinobu)
- BFQ perf regression fix (Paolo)
* tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits)
io_uring: ensure ->list is initialized for poll commands
Revert "nvme-pci: don't create a read hctx mapping without read queues"
nvme: fix multipath crash when ANA is deactivated
nvme: fix memory leak caused by incorrect subsystem free
nvme: ignore subnqn for ADATA SX6000LNP
drbd: dynamically allocate shash descriptor
block: blk-mq: Remove blk_mq_sched_started_request and started_request
bcache: fix possible memory leak in bch_cached_dev_run()
io_uring: track io length in async_list based on bytes
io_uring: don't use iov_iter_advance() for fixed buffers
block: properly handle IOCB_NOWAIT for async O_DIRECT IO
blk-mq: allow REQ_NOWAIT to return an error inline
io_uring: add a memory barrier before atomic_read
rq-qos: use a mb for got_token
rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
rq-qos: don't reset has_sleepers on spurious wakeups
rq-qos: fix missed wake-ups in rq_qos_throttle
wait: add wq_has_single_sleeper helper
block, bfq: check also in-flight I/O in dispatch plugging
block: fix sysfs module parameters directory path in comment
...
|
|
Building with clang and KASAN, we get a warning about an overly large
stack frame on 32-bit architectures:
drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect'
[-Werror,-Wframe-larger-than=]
We already allocate other data dynamically in this function, so
just do the same for the shash descriptor, which makes up most of
this memory.
Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull ceph updates from Ilya Dryomov:
"Lots of exciting things this time!
- support for rbd object-map and fast-diff features (myself). This
will speed up reads, discards and things like snap diffs on sparse
images.
- ceph.snap.btime vxattr to expose snapshot creation time (David
Disseldorp). This will be used to integrate with "Restore Previous
Versions" feature added in Windows 7 for folks who reexport ceph
through SMB.
- security xattrs for ceph (Zheng Yan). Only selinux is supported for
now due to the limitations of ->dentry_init_security().
- support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
Layton). This is actually a single feature bit which was missing
because of the filesystem pieces. With this in, the kernel client
will finally be reported as "luminous" by "ceph features" -- it is
still being reported as "jewel" even though all required Luminous
features were implemented in 4.13.
- stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
with xattrs is to not terminate and this was causing
inconsistencies with ceph-fuse.
- change filesystem time granularity from 1 us to 1 ns, again fixing
an inconsistency with ceph-fuse (Luis Henriques).
On top of this there are some additional dentry name handling and cap
flushing fixes from Zheng. Finally, Jeff is formally taking over for
Zheng as the filesystem maintainer"
* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
ceph: fix end offset in truncate_inode_pages_range call
ceph: use generic_delete_inode() for ->drop_inode
ceph: use ceph_evict_inode to cleanup inode's resource
ceph: initialize superblock s_time_gran to 1
MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
rbd: setallochint only if object doesn't exist
rbd: support for object-map and fast-diff
rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
libceph: export osd_req_op_data() macro
libceph: change ceph_osdc_call() to take page vector for response
libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
rbd: new exclusive lock wait/wake code
rbd: quiescing lock should wait for image requests
rbd: lock should be quiesced on reacquire
rbd: introduce copyup state machine
rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
rbd: move OSD request allocation into object request state machines
rbd: factor out __rbd_osd_setup_discard_ops()
rbd: factor out rbd_osd_setup_copyup()
rbd: introduce obj_req->osd_reqs list
...
|
|
Merge floppy ioctl verification fixes from Denis Efremov.
This also marks the floppy driver as orphaned - it turns out that Jiri
no longer has working hardware.
Actual working physical floppy hardware is getting hard to find, and
while Willy was able to test this, I think the driver can be considered
pretty much dead from an actual hardware standpoint. The hardware that
is still sold seems to be mainly USB-based, which doesn't use this
legacy driver at all.
The old floppy disk controller is still emulated in various VM
environments, so the driver isn't going away, but let's see if anybody
is interested to step up to maintain it.
The lack of hardware also likely means that the ioctl range verification
fixes are probably mostly relevant to anybody using floppies in a
virtual environment. Which is probably also going away in favor of USB
storage emulation, but who knows.
Will Decon reviewed the patches but I'm not rebasing them just for that,
so I'll add a
Reviewed-by: Will Deacon <will@kernel.org>
here instead.
* floppy:
MAINTAINERS: mark floppy.c orphaned
floppy: fix out-of-bounds read in copy_buffer
floppy: fix invalid pointer dereference in drive_name
floppy: fix out-of-bounds read in next_valid_format
floppy: fix div-by-zero in setup_format_params
|
|
This fixes a global out-of-bounds read access in the copy_buffer
function of the floppy driver.
The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect
and head fields (unsigned int) of the floppy_drive structure are used to
compute the max_sector (int) in the make_raw_rw_request function. It is
possible to overflow the max_sector. Next, max_sector is passed to the
copy_buffer function and used in one of the memcpy calls.
An unprivileged user could trigger the bug if the device is accessible,
but requires a floppy disk to be inserted.
The patch adds the check for the .sect * .head multiplication for not
overflowing in the set_geometry function.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This fixes the invalid pointer dereference in the drive_name function of
the floppy driver.
The native_format field of the struct floppy_drive_params is used as
floppy_type array index in the drive_name function. Thus, the field
should be checked the same way as the autodetect field.
To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should
be used to call the drive_name. A floppy disk is not required to be
inserted.
CAP_SYS_ADMIN is required to call FDSETDRVPRM.
The patch adds the check for a value of the native_format field to be in
the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array
indices.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This fixes a global out-of-bounds read access in the next_valid_format
function of the floppy driver.
The values from autodetect field of the struct floppy_drive_params are
used as indices for the floppy_type array in the next_valid_format
function 'floppy_type[DP->autodetect[probed_format]].sect'.
To trigger the bug, one could use a value out of range and set the drive
parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to
be inserted.
CAP_SYS_ADMIN is required to call FDSETDRVPRM.
The patch adds the check for values of the autodetect field to be in the
'0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This fixes a divide by zero error in the setup_format_params function of
the floppy driver.
Two consecutive ioctls can trigger the bug: The first one should set the
drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK
to become zero. Next, the floppy format operation should be called.
A floppy disk is not required to be inserted. An unprivileged user
could trigger the bug if the device is accessible.
The patch checks F_SECT_PER_TRACK for a non-zero value in the
set_geometry function. The proper check should involve a reasonable
upper limit for the .sect and .rate fields, but it could change the
UAPI.
The patch also checks F_SECT_PER_TRACK in the setup_format_params, and
cancels the formatting operation in case of zero.
The bug was found by syzkaller.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull rst conversion of docs from Mauro Carvalho Chehab:
"As agreed with Jon, I'm sending this big series directly to you, c/c
him, as this series required a special care, in order to avoid
conflicts with other trees"
* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
docs: kbuild: fix build with pdf and fix some minor issues
docs: block: fix pdf output
docs: arm: fix a breakage with pdf output
docs: don't use nested tables
docs: gpio: add sysfs interface to the admin-guide
docs: locking: add it to the main index
docs: add some directories to the main documentation index
docs: add SPDX tags to new index files
docs: add a memory-devices subdir to driver-api
docs: phy: place documentation under driver-api
docs: serial: move it to the driver-api
docs: driver-api: add remaining converted dirs to it
docs: driver-api: add xilinx driver API documentation
docs: driver-api: add a series of orphaned documents
docs: admin-guide: add a series of orphaned documents
docs: cgroup-v1: add it to the admin-guide book
docs: aoe: add it to the driver-api book
docs: add some documentation dirs to the driver-api book
docs: driver-model: move it to the driver-api book
docs: lp855x-driver.rst: add it to the driver-api book
...
|
|
Pull more block updates from Jens Axboe:
"A later pull request with some followup items. I had some vacation
coming up to the merge window, so certain things items were delayed a
bit. This pull request also contains fixes that came in within the
last few days of the merge window, which I didn't want to push right
before sending you a pull request.
This contains:
- NVMe pull request, mostly fixes, but also a few minor items on the
feature side that were timing constrained (Christoph et al)
- Report zones fixes (Damien)
- Removal of dead code (Damien)
- Turn on cgroup psi memstall (Josef)
- block cgroup MAINTAINERS entry (Konstantin)
- Flush init fix (Josef)
- blk-throttle low iops timing fix (Konstantin)
- nbd resize fixes (Mike)
- nbd 0 blocksize crash fix (Xiubo)
- block integrity error leak fix (Wenwen)
- blk-cgroup writeback and priority inheritance fixes (Tejun)"
* tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits)
MAINTAINERS: add entry for block io cgroup
null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED
block: Limit zone array allocation size
sd_zbc: Fix report zones buffer allocation
block: Kill gfp_t argument of blkdev_report_zones()
block: Allow mapping of vmalloc-ed buffers
block/bio-integrity: fix a memory leak bug
nvme: fix NULL deref for fabrics options
nbd: add netlink reconfigure resize support
nbd: fix crash when the blksize is zero
block: Disable write plugging for zoned block devices
block: Fix elevator name declaration
block: Remove unused definitions
nvme: fix regression upon hot device removal and insertion
blk-throttle: fix zero wait time for iops throttled group
block: Fix potential overflow in blk_report_zones()
blkcg: implement REQ_CGROUP_PUNT
blkcg, writeback: Implement wbc_blkcg_css()
blkcg, writeback: Add wbc->no_cgroup_owner
blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()
...
|
|
The blockdev book basically contains user-faced documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
Rename the blockdev documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.
The drbd sub-directory contains some graphs and data flows.
Add those too to the documentation.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
A previous commit changed the prototype, but didn't adjust the function
for when zoned device support is disabled. Fix it up.
Fixes: bd976e527259 ("block: Kill gfp_t argument of blkdev_report_zones()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If the device is setup with ioctl we can resize the device after the
initial setup, but if the device is setup with netlink we cannot use the
resize related ioctls and there is no netlink reconfigure size ATTR
handling code.
This patch adds netlink reconfigure resize support to match the ioctl
interface.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This will allow the blksize to be set zero and then use 1024 as
default.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
[fix to use goto out instead of return in genl_connect]
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
|
|
Pull block updates from Jens Axboe:
"This is the main block updates for 5.3. Nothing earth shattering or
major in here, just fixes, additions, and improvements all over the
map. This contains:
- Series of documentation fixes (Bart)
- Optimization of the blk-mq ctx get/put (Bart)
- null_blk removal race condition fix (Bob)
- req/bio_op() cleanups (Chaitanya)
- Series cleaning up the segment accounting, and request/bio mapping
(Christoph)
- Series cleaning up the page getting/putting for bios (Christoph)
- block cgroup cleanups and moving it to where it is used (Christoph)
- block cgroup fixes (Tejun)
- Series of fixes and improvements to bcache, most notably a write
deadlock fix (Coly)
- blk-iolatency STS_AGAIN and accounting fixes (Dennis)
- Series of improvements and fixes to BFQ (Douglas, Paolo)
- debugfs_create() return value check removal for drbd (Greg)
- Use struct_size(), where appropriate (Gustavo)
- Two lighnvm fixes (Heiner, Geert)
- MD fixes, including a read balance and corruption fix (Guoqing,
Marcos, Xiao, Yufen)
- block opal shadow mbr additions (Jonas, Revanth)
- sbitmap compare-and-exhange improvemnts (Pavel)
- Fix for potential bio->bi_size overflow (Ming)
- NVMe pull requests:
- improved PCIe suspent support (Keith Busch)
- error injection support for the admin queue (Akinobu Mita)
- Fibre Channel discovery improvements (James Smart)
- tracing improvements including nvmetc tracing support (Minwoo Im)
- misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
Kulkarni)"
- Various little fixes and improvements to drivers and core"
* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
blk-iolatency: fix STS_AGAIN handling
block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
blk-mq: simplify blk_mq_make_request()
blk-mq: remove blk_mq_put_ctx()
sbitmap: Replace cmpxchg with xchg
block: fix .bi_size overflow
block: sed-opal: check size of shadow mbr
block: sed-opal: ioctl for writing to shadow mbr
block: sed-opal: add ioctl for done-mark of shadow mbr
block: never take page references for ITER_BVEC
direct-io: use bio_release_pages in dio_bio_complete
block_dev: use bio_release_pages in bio_unmap_user
block_dev: use bio_release_pages in blkdev_bio_end_io
iomap: use bio_release_pages in iomap_dio_bio_end_io
block: use bio_release_pages in bio_map_user_iov
block: use bio_release_pages in bio_unmap_user
block: optionally mark pages dirty in bio_release_pages
block: move the BIO_NO_PAGE_REF check into bio_release_pages
block: skd_main.c: Remove call to memset after dma_alloc_coherent
block: mtip32xx: Remove call to memset after dma_alloc_coherent
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.
The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.
Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.
This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...
|
|
setallochint is really only useful on object creation. Continue
hinting unconditionally if object map cannot be used.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map.
Invalid object maps are not trusted, but still updated. Note that we
never iterate, resize or invalidate object maps. If object-map feature
is enabled but object map fails to load, we just fail the requester
(either "rbd map" or I/O, by way of post-acquire action).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Snapshot object map will be loaded in rbd_dev_image_probe(), so we need
to know snapshot's size (as opposed to HEAD's size) sooner.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
This will be used for loading object map. rbd_obj_read_sync() isn't
suitable because object map must be accessed through class methods.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
rbd_wait_state_locked() is built around rbd_dev->lock_waitq and blocks
rbd worker threads while waiting for the lock, potentially impacting
other rbd devices. There is no good way to pass an error code into
image request state machines when acquisition fails, hence the use of
RBD_DEV_FLAG_BLACKLISTED for everything and various other issues.
Introduce rbd_dev->acquiring_list and move acquisition into image
request state machine. Use rbd_img_schedule() for kicking and passing
error codes. No blocking occurs while waiting for the lock, but
rbd_dev->lock_rwsem is still held across lock, unlock and set_cookie
calls.
Always acquire the lock on "rbd map" to avoid associating the latency
of acquiring the lock with the first I/O request.
A slight regression is that lock_timeout is now respected only if lock
acquisition is triggered by "rbd map" and not by I/O. This is somewhat
compensated by the fact that we no longer block if the peer refuses to
release lock -- I/O is failed with EROFS right away.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Syncing OSD requests doesn't really work. A single image request may
be comprised of multiple object requests, each of which can go through
a series of OSD requests (original, copyups, etc). On top of that, the
OSD cliest may be shared with other rbd devices.
What we want is to ensure that all in-flight image requests complete.
Introduce rbd_dev->running_list and block in RBD_LOCK_STATE_RELEASING
until that happens. New OSD requests may be started during this time.
Note that __rbd_img_handle_request() acquires rbd_dev->lock_rwsem only
if need_exclusive_lock() returns true. This avoids a deadlock similar
to the one outlined in the previous commit between unlock and I/O that
doesn't require lock, such as a read with object-map feature disabled.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Quiesce exclusive lock at the top of rbd_reacquire_lock() instead
of only when ceph_cls_set_cookie() fails. This avoids a deadlock on
rbd_dev->lock_rwsem.
If rbd_dev->lock_rwsem is needed for I/O completion, set_cookie can
hang ceph-msgr worker thread if set_cookie reply ends up behind an I/O
reply, because, like lock and unlock requests, set_cookie is sent and
waited upon with rbd_dev->lock_rwsem held for write.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Both write and copyup paths will get more complex with object map.
Factor copyup code out into a separate state machine.
While at it, take advantage of obj_req->osd_reqs list and issue empty
and current snapc OSD requests together, one after another.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
These functions don't allocate and set up OSD requests anymore.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Following submission, move initial OSD request allocation into object
request state machines. Everything that has to do with OSD requests is
now handled inside the state machine, all __rbd_img_fill_request() has
left is initialization.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
With obj_req->xferred removed, obj_req->ex.oe_off and obj_req->ex.oe_len
can be updated if required for alignment. Previously the new offset and
length weren't stored anywhere beyond rbd_obj_setup_discard().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Since the dawn of time it had been assumed that a single object request
spawns a single OSD request. This is already impacting copyup: instead
of sending empty and current snapc copyups together, we wait for empty
snapc OSD request to complete in order to reassign obj_req->osd_req
with current snapc OSD request. Looking further, updating potentially
hundreds of snapshot object maps serially is a non-starter.
Replace obj_req->osd_req pointer with obj_req->osd_reqs list. Use
osd_req->r_private_item for linkage.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Make it possible to schedule image requests on a workqueue. This fixes
parent chain recursion added in the previous commit and lays the ground
for exclusive lock wait/wake improvements.
The "wait for pending subrequests and report first nonzero result" code
is generalized to be used by object request state machine.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Start eliminating asymmetry where the initial OSD request is allocated
and submitted from outside the state machine, making error handling and
restarts harder than they could be. This commit deals with submission,
a commit that deals with allocation will follow.
Note that this commit adds parent chain recursion on the submission
side:
rbd_img_request_submit
rbd_obj_handle_request
__rbd_obj_handle_request
rbd_obj_handle_read
rbd_obj_handle_write_guard
rbd_obj_read_from_parent
rbd_img_request_submit
This will be fixed in the next commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
In preparation for moving OSD request allocation and submission into
object request state machines, get rid of RBD_OBJ_WRITE_{FLAT,GUARD}.
We would need to start in a new state, whether the request is guarded
or not. Unify them into RBD_OBJ_WRITE_OBJECT and pass guard info
through obj_req->flags.
While at it, make our ENOENT handling a little more precise: only hide
ENOENT when it is actually expected, that is on delete.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Make rbd_obj_handle_read() look like a state machine and get rid of
the necessity to patch result in rbd_obj_handle_request(), completing
the removal of obj_req->xferred and img_req->xferred.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
obj_req->xferred and img_req->xferred don't bring any value. The
former is used for short reads and has to be set to obj_req->ex.oe_len
after that and elsewhere. The latter is just an aggregate.
Use result for short reads (>=0 - number of bytes read, <0 - error) and
pass it around explicitly. No need to store it in obj_req.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
fix. Otherwise we end up having conflicts with future patches between
for-5.3/block and master that touch this area. In particular, it makes
the bio_full() fix hard to backport to stable.
* tag 'v5.2-rc6': (482 commits)
Linux 5.2-rc6
Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
Bluetooth: Fix regression with minimum encryption key size alignment
tcp: refine memory limit test in tcp_fragment()
x86/vdso: Prevent segfaults due to hoisted vclock reads
SUNRPC: Fix a credential refcount leak
Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
net :sunrpc :clnt :Fix xps refcount imbalance on the error path
NFS4: Only set creation opendata if O_CREAT
ARM: 8867/1: vdso: pass --be8 to linker if necessary
KVM: nVMX: reorganize initial steps of vmx_set_nested_state
KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
habanalabs: use u64_to_user_ptr() for reading user pointers
nfsd: replace Jeff by Chuck as nfsd co-maintainer
inet: clear num_timeout reqsk_alloc()
PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
net: mvpp2: debugfs: Add pmap to fs dump
ipv6: Default fib6_type to RTN_UNICAST when not set
net: hns3: Fix inconsistent indenting
net/af_iucv: always register net_device notifier
...
|
|
If we pass pages through an iov_iter we always already have a reference
in the caller. Thus remove the ITER_BVEC_FLAG_NO_REF and don't take
reference to pages by default for bvec backed iov_iters.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In commit af7ddd8a627c
("Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping"),
dma_alloc_coherent has already zeroed the memory.
So memset is not needed.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In commit af7ddd8a627c
("Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping"),
dma_alloc_coherent has already zeroed the memory.
So memset is not needed.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|