summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2019-11-14block: remove (__)blkdev_reread_part as an exported APIChristoph Hellwig
In general drivers should never mess with partition tables directly. Unfortunately s390 and loop do for somewhat historic reasons, but they can use bdev_disk_changed directly instead when we export it as they satisfy the sanity checks we have in __blkdev_reread_part. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> [dasd] Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12block: rework zone reportingChristoph Hellwig
Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12null_blk: Add zone_nr_conv to featuresDamien Le Moal
For a null_blk device with zoned mode enabled, the number of conventional zones can be configured through configfs with the zone_nr_conv parameter. Add this missing parameter in the features string. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12null_blk: clean up report zonesChristoph Hellwig
Make the instance name match the method name and define the name to NULL instead of providing an inline stub, which is rather pointless for a method call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12null_blk: clean up the block device operationsChristoph Hellwig
Remove the pointless stub open and release methods, give the operations vector a slightly less confusing name, and use normal alignment for the assignment operators. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12Merge branch 'for-5.5/drivers' into for-5.5/zonedJens Axboe
* for-5.5/drivers: (38 commits) null_blk: add zone open, close, and finish support dm: add zone open, close and finish support nvme: Fix parsing of ANA log page nvmet: stop using bio_set_op_attrs nvmet: add plugging for read/write when ns is bdev nvmet: clean up command parsing a bit nvme-pci: Spelling s/resdicovered/rediscovered/ nvmet: fill discovery controller sn, fr and mn correctly nvmet: Open code nvmet_req_execute() nvmet: Remove the data_len field from the nvmet_req struct nvmet: Introduce nvmet_dsm_len() helper nvmet: Cleanup discovery execute handlers nvmet: Introduce common execute function for get_log_page and identify nvmet-tcp: Don't set the request's data_len nvmet-tcp: Don't check data_len in nvmet_tcp_map_data() nvme: Introduce nvme_lba_to_sect() nvme: Cleanup and rename nvme_block_nr() nvme: resync include/linux/nvme.h with nvmecli nvme: move common call to nvme_cleanup_cmd to core layer nvme: introduce "Command Aborted By host" status code ...
2019-11-08block: drbd: remove a stray unlock in __drbd_send_protocol()Dan Carpenter
There are two callers of this function and they both unlock the mutex so this ends up being a double unlock. Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07null_blk: add zone open, close, and finish supportAjay Joshi
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH support to allow explicit control of zone states. Contains contributions from Matias Bjorling, Hans Holmberg, Keith Busch and Damien Le Moal. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-01loop: fix no-unmap write-zeroes request behaviorDarrick J. Wong
Currently, if the loop device receives a WRITE_ZEROES request, it asks the underlying filesystem to punch out the range. This behavior is correct if unmapping is allowed. However, a NOUNMAP request means that the caller doesn't want us to free the storage backing the range, so punching out the range is incorrect behavior. To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to the fallocate documentation) required to ensure that the entire range is backed by real storage, which suffices for our purposes. Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25nbd: verify socket is supported during setupMike Christie
nbd requires socket families to support the shutdown method so the nbd recv workqueue can be woken up from its sock_recvmsg call. If the socket does not support the callout we will leave recv works running or get hangs later when the device or module is removed. This adds a check during socket connection/reconnection to make sure the socket being passed in supports the needed callout. Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25block: mtip32xx: Spelling s/configration/configuration/Geert Uytterhoeven
Fix misspelling of "configuration". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25nbd: handle racing with error'ed out commandsJosef Bacik
We hit the following warning in production print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700 ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60 Workqueue: knbd-recv recv_work [nbd] RIP: 0010:refcount_sub_and_test_checked+0x53/0x60 Call Trace: blk_mq_free_request+0xb7/0xf0 blk_mq_complete_request+0x62/0xf0 recv_work+0x29/0xa1 [nbd] process_one_work+0x1f5/0x3f0 worker_thread+0x2d/0x3d0 ? rescuer_thread+0x340/0x340 kthread+0x111/0x130 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x1f/0x30 ---[ end trace b079c3c67f98bb7c ]--- This was preceded by us timing out everything and shutting down the sockets for the device. The problem is we had a request in the queue at the same time, so we completed the request twice. This can actually happen in a lot of cases, we fail to get a ref on our config, we only have one connection and just error out the command, etc. Fix this by checking cmd->status in nbd_read_stat. We only change this under the cmd->lock, so we are safe to check this here and see if we've already error'ed this command out, which would indicate that we've completed it as well. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25nbd: protect cmd->status with cmd->lockJosef Bacik
We already do this for the most part, except in timeout and clear_req. For the timeout case we take the lock after we grab a ref on the config, but that isn't really necessary because we're safe to touch the cmd at this point, so just move the order around. For the clear_req cause this is initiated by the user, so again is safe. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-23pktcdvd: add compat_ioctl handlerArnd Bergmann
pkt_ioctl() implements the generic SCSI_IOCTL_SEND_COMMAND and some cdrom ioctls by forwarding to the underlying block device. For compat_ioctl handling, this always takes a roundtrip through fs/compat_ioctl.c that we should try to avoid, at least for the compatible commands. CDROM_SEND_PACKET is an exception here, it requires special translation in compat_blkdev_driver_ioctl(). CDROM_LAST_WRITTEN has no compat handling at the moment. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-19zram: fix race between backing_dev_show and backing_dev_storeChenwandun
CPU0: CPU1: backing_dev_show backing_dev_store ...... ...... file = zram->backing_dev; down_read(&zram->init_lock); down_read(&zram->init_init_lock) file_path(file, ...); zram->backing_dev = backing_dev; up_read(&zram->init_lock); up_read(&zram->init_lock); gets the value of zram->backing_dev too early in backing_dev_show, which resultin the value being NULL at the beginning, and not NULL later. backtrace: d_path+0xcc/0x174 file_path+0x10/0x18 backing_dev_show+0x40/0xb4 dev_attr_show+0x20/0x54 sysfs_kf_seq_show+0x9c/0x10c kernfs_seq_show+0x28/0x30 seq_read+0x184/0x488 kernfs_fop_read+0x5c/0x1a4 __vfs_read+0x44/0x128 vfs_read+0xa0/0x138 SyS_read+0x54/0xb4 Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com Signed-off-by: Chenwandun <chenwandun@huawei.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-18drbd: Use pr_warn instead of pr_warningKefeng Wang
As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of pr_warning"), removing pr_warning so all logging messages use a consistent <prefix>_warn style. Let's do it. Link: http://lkml.kernel.org/r/20191018031850.48498-9-wangkefeng.wang@huawei.com To: linux-kernel@vger.kernel.org Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: drbd-dev@lists.linbit.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-10-17null_blk: return fixed zoned reads > write pointerAjay Joshi
A zoned block device maintains a write pointer within a zone, and reads beyond the write pointer are undefined. Fill data buffer returned above the write pointer with 0xFF. Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-15rbd: cancel lock_dwork if the wait is interruptedDongsheng Yang
There is a warning message in my test with below steps: # rbd bench --io-type write --io-size 4K --io-threads 1 --io-pattern rand test & # sleep 5 # pkill -9 rbd # rbd map test & # sleep 5 # pkill rbd The reason is that the rbd_add_acquire_lock() is interruptable, that means, when we kill the waiting on ->acquire_wait, the lock_dwork could be still running. 1. do_rbd_add() 2. lock_dwork rbd_add_acquire_lock() - queue_delayed_work() lock_dwork queued - wait_for_completion_killable_timeout() <-- kill happen rbd_dev_image_unlock() <-- UNLOCKED now, nothing to do. rbd_dev_device_release() rbd_dev_image_release() - ... lock successed here - cancel_delayed_work_sync(&rbd_dev->lock_dwork) Then when we reach the rbd_dev_free(), WARN_ON is triggered because lock_state is not RBD_LOCK_STATE_UNLOCKED. To fix it, this commit make sure the lock_dwork was finished before calling rbd_dev_image_unlock(). On the other hand, this would not happend in do_rbd_remove(), because after rbd mapped, lock_dwork will only be queued for IO request, and request will continue unless lock_dwork finished. when we call rbd_dev_image_unlock() in do_rbd_remove(), all requests are done. That means, lock_state should not be locked again after rbd_dev_image_unlock(). [ Cancel lock_dwork in rbd_add_acquire_lock(), only if the wait is interrupted. ] Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-10-11Merge tag 'for-linus-20191010' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Fix wbt performance regression introduced with the blk-rq-qos refactoring (Harshad) - Fix io_uring fileset removal inadvertently killing the workqueue (me) - Fix io_uring typo in linked command nonblock submission (Pavel) - Remove spurious io_uring wakeups on request free (Pavel) - Fix null_blk zoned command error return (Keith) - Don't use freezable workqueues for backing_dev, also means we can revert a previous libata hack (Mika) - Fix nbd sysfs mutex dropped too soon at removal time (Xiubo) * tag 'for-linus-20191010' of git://git.kernel.dk/linux-block: nbd: fix possible sysfs duplicate warning null_blk: Fix zoned command return code io_uring: only flush workqueues on fileset removal io_uring: remove wait loop spurious wakeups blk-wbt: fix performance regression in wbt scale_up/scale_down Revert "libata, freezer: avoid block device removal while system is frozen" bdi: Do not use freezable workqueue io_uring: fix reversed nonblock flag for link submission
2019-10-10nbd: fix possible sysfs duplicate warningXiubo Li
1. nbd_put takes the mutex and drops nbd->ref to 0. It then does idr_remove and drops the mutex. 2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails to find an existing device, so it does nbd_dev_add. 3. just before the nbd_put could call nbd_dev_remove or not finished totally, but if nbd_dev_add try to add_disk, we can hit: debugfs: Directory 'nbd1' with parent 'block' already present! This patch will make sure all the disk add/remove stuff are done by holding the nbd_index_mutex lock. Reported-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-09null_blk: Fix zoned command return codeKeith Busch
The return code from null_handle_zoned() sets the cmd->error value. Returning OK status when an error occured overwrites the intended cmd->error. Return the appropriate error code instead of setting the error in the cmd. Fixes: fceb5d1b19cbe626 ("null_blk: create a helper for zoned devices") Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-07null_blk: Enable modifying 'submit_queues' after an instance has been configuredBart Van Assche
This patch makes it possible to test blk_mq_update_nr_hw_queues() from inside a VM. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-07null_blk: Improve nullb_device_##NAME##_store() readabilityBart Van Assche
Introduce a local variable to make the code easier to read. This patch does not change any functionality but makes the next patch in this series easier to read. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-04Merge tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Mandate timespec64 for the io_uring timeout ABI (Arnd) - Set of NVMe changes via Sagi: - controller removal race fix from Balbir - quirk additions from Gabriel and Jian-Hong - nvme-pci power state save fix from Mario - Add 64bit user commands (for 64bit registers) from Marta - nvme-rdma/nvme-tcp fixes from Max, Mark and Me - Minor cleanups and nits from James, Dan and John - Two s390 dasd fixes (Jan, Stefan) - Have loop change block size in DIO mode (Martijn) - paride pg header ifdef guard (Masahiro) - Two blk-mq queue scheduler tweaks, fixing an ordering issue on zoned devices and suboptimal performance on others (Ming) * tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block: (22 commits) block: sed-opal: fix sparse warning: convert __be64 data block: sed-opal: fix sparse warning: obsolete array init. block: pg: add header include guard Revert "s390/dasd: Add discard support for ESE volumes" s390/dasd: Fix error handling during online processing io_uring: use __kernel_timespec in timeout ABI loop: change queue block size to match when using DIO blk-mq: apply normal plugging for HDD blk-mq: honor IO scheduler for multiqueue devices nvme-rdma: fix possible use-after-free in connect timeout nvme: Move ctrl sqsize to generic space nvme: Add ctrl attributes for queue_count and sqsize nvme: allow 64-bit results in passthru commands nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T nvmet-tcp: remove superflous check on request sgl Added QUIRKs for ADATA XPG SX8200 Pro 512GB nvme-rdma: Fix max_hw_sectors calculation nvme: fix an error code in nvme_init_subsystem() nvme-pci: Save PCI state before putting drive into deepest state nvme-tcp: fix wrong stop condition in io_work ...
2019-10-01loop: change queue block size to match when using DIOMartijn Coenen
The loop driver assumes that if the passed in fd is opened with O_DIRECT, the caller wants to use direct I/O on the loop device. However, if the underlying block device has a different block size than the loop block queue, direct I/O can't be enabled. Instead of requiring userspace to manually change the blocksize and re-enable direct I/O, just change the queue block sizes to match, as well as the io_min size. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - almost all of the rest of -mm - various other subsystems Subsystems affected by this patch series: memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork, cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise, cleanups, pagemap * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits) arch/sparc/include/asm/pgtable_64.h: fix build mm: treewide: clarify pgtable_page_{ctor,dtor}() naming ntfs: remove (un)?likely() from IS_ERR() conditions IB/hfi1: remove unlikely() from IS_ERR*() condition xfs: remove unlikely() from WARN_ON() condition wimax/i2400m: remove unlikely() from WARN*() condition fs: remove unlikely() from WARN_ON() condition xen/events: remove unlikely() from WARN() condition checkpatch: check for nested (un)?likely() calls hexagon: drop empty and unused free_initrd_mem mm: factor out common parts between MADV_COLD and MADV_PAGEOUT mm: introduce MADV_PAGEOUT mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM mm: introduce MADV_COLD mm: untag user pointers in mmap/munmap/mremap/brk vfio/type1: untag user pointers in vaddr_get_pfn tee/shm: untag user pointers in tee_shm_register media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get drm/radeon: untag user pointers in radeon_gem_userptr_ioctl drm/amdgpu: untag user pointers ...
2019-09-25augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macroMichel Lespinasse
Add RB_DECLARE_CALLBACKS_MAX, which generates augmented rbtree callbacks for the case where the augmented value is a scalar whose definition follows a max(f(node)) pattern. This actually covers all present uses of RB_DECLARE_CALLBACKS, and saves some (source) code duplication in the various RBCOMPUTE function definitions. [walken@google.com: fix mm/vmalloc.c] Link: http://lkml.kernel.org/r/CANN689FXgK13wDYNh1zKxdipeTuALG4eKvKpsdZqKFJ-rvtGiQ@mail.gmail.com [walken@google.com: re-add check to check_augmented()] Link: http://lkml.kernel.org/r/20190727022027.GA86863@google.com Link: http://lkml.kernel.org/r/20190703040156.56953-3-walken@google.com Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25Merge tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The highlights are: - automatic recovery of a blacklisted filesystem session (Zheng Yan). This is disabled by default and can be enabled by mounting with the new "recover_session=clean" option. - serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is taken to avoid serializing O_DIRECT reads and writes with each other, this is based on the exclusion scheme from NFS. - handle large osdmaps better in the face of fragmented memory (myself) - don't limit what security.* xattrs can be get or set (Jeff Layton). We were overly restrictive here, unnecessarily preventing things like file capability sets stored in security.capability from working. - allow copy_file_range() within the same inode and across different filesystems within the same cluster (Luis Henriques)" * tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits) ceph: call ceph_mdsc_destroy from destroy_fs_client libceph: use ceph_kvmalloc() for osdmap arrays libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc() ceph: allow object copies across different filesystems in the same cluster ceph: include ceph_debug.h in cache.c ceph: move static keyword to the front of declarations rbd: pull rbd_img_request_create() dout out into the callers ceph: reconnect connection if session hang in opening state libceph: drop unused con parameter of calc_target() ceph: use release_pages() directly rbd: fix response length parameter for encoded strings ceph: allow arbitrary security.* xattrs ceph: only set CEPH_I_SEC_INITED if we got a MAC label ceph: turn ceph_security_invalidate_secctx into static inline ceph: add buffered/direct exclusionary locking for reads and writes libceph: handle OSD op ceph_pagelist_append() errors ceph: don't return a value from void function ceph: don't freeze during write page faults ceph: update the mtime when truncating up ceph: fix indentation in __get_snap_name() ...
2019-09-22pktcdvd: remove warning on attempting to register non-passthrough devJens Axboe
Anatoly reports that he gets the below warning when booting -git on a sparc64 box on debian unstable: ... [ 13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES implementation [ 13.428002] ------------[ cut here ]------------ [ 13.428081] WARNING: CPU: 21 PID: 586 at drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd] [ 13.428147] Attempt to register a non-SCSI queue [ 13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64 n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod crc32c_sparc64 [ 13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted 5.3.0-10169-g574cc4539762 #1234 [ 13.428507] Call Trace: [ 13.428542] [00000000004635c0] __warn+0xc0/0x100 [ 13.428582] [0000000000463634] warn_slowpath_fmt+0x34/0x60 [ 13.428626] [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd] [ 13.428674] [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd] [ 13.428724] [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0 [ 13.428764] [00000000006b96c8] ksys_ioctl+0x48/0x80 [ 13.428803] [00000000006b9714] sys_ioctl+0x14/0x40 [ 13.428847] [0000000000406294] linux_sparc_syscall+0x34/0x44 [ 13.428890] irq event stamp: 4181 [ 13.428924] hardirqs last enabled at (4189): [<00000000004e0a74>] console_unlock+0x634/0x6c0 [ 13.428984] hardirqs last disabled at (4196): [<00000000004e0540>] console_unlock+0x100/0x6c0 [ 13.429048] softirqs last enabled at (3978): [<0000000000b2e2d8>] __do_softirq+0x498/0x520 [ 13.429110] softirqs last disabled at (3967): [<000000000042cfb4>] do_softirq_own_stack+0x34/0x60 [ 13.429172] ---[ end trace 2220ca468f32967d ]--- [ 13.430018] pktcdvd: setup of pktcdvd device failed [ 13.455589] des_sparc64: Using sparc64 des opcodes optimized DES implementation [ 13.515334] camellia_sparc64: Using sparc64 camellia opcodes optimized CAMELLIA implementation [ 13.522856] pktcdvd: setup of pktcdvd device failed [ 13.529327] pktcdvd: setup of pktcdvd device failed [ 13.532932] pktcdvd: setup of pktcdvd device failed [ 13.536165] pktcdvd: setup of pktcdvd device failed [ 13.539372] pktcdvd: setup of pktcdvd device failed [ 13.542834] pktcdvd: setup of pktcdvd device failed [ 13.546536] pktcdvd: setup of pktcdvd device failed [ 15.431071] XFS (dm-0): Mounting V5 Filesystem ... Apparently debian auto-attaches any cdrom like device to pktcdvd, which can lead to the above warning. There's really no reason to warn for this situation, kill it. Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17nbd: fix possible page fault for nbd diskXiubo Li
When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and at the same time when the socket is closed due to the server daemon is restarted, just before the last DISCONNET is totally done if we start a new connection by using the old nbd_index, there will be crashing randomly, like: <3>[ 110.151949] block nbd1: Receive control failed (result -32) <1>[ 110.152024] BUG: unable to handle page fault for address: 0000058000000840 <1>[ 110.152063] #PF: supervisor read access in kernel mode <1>[ 110.152083] #PF: error_code(0x0000) - not-present page <6>[ 110.152094] PGD 0 P4D 0 <4>[ 110.152106] Oops: 0000 [#1] SMP PTI <4>[ 110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2 <4>[ 110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 <4>[ 110.152166] Workqueue: knbd-recv recv_work [nbd] <4>[ 110.152187] RIP: 0010:__dev_printk+0xd/0x67 <4>[ 110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...] <4>[ 110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206 <4>[ 110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000 <4>[ 110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f <4>[ 110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56 <4>[ 110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900 <4>[ 110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8 <4>[ 110.152329] FS: 0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000 <4>[ 110.152362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0 <4>[ 110.152401] Call Trace: <4>[ 110.152422] _dev_err+0x6c/0x83 <4>[ 110.152435] nbd_read_stat.cold+0xda/0x578 [nbd] <4>[ 110.152448] ? __switch_to_asm+0x34/0x70 <4>[ 110.152468] ? __switch_to_asm+0x40/0x70 <4>[ 110.152478] ? __switch_to_asm+0x34/0x70 <4>[ 110.152491] ? __switch_to_asm+0x40/0x70 <4>[ 110.152501] ? __switch_to_asm+0x34/0x70 <4>[ 110.152511] ? __switch_to_asm+0x40/0x70 <4>[ 110.152522] ? __switch_to_asm+0x34/0x70 <4>[ 110.152533] recv_work+0x35/0x9e [nbd] <4>[ 110.152547] process_one_work+0x19d/0x340 <4>[ 110.152558] worker_thread+0x50/0x3b0 <4>[ 110.152568] kthread+0xfb/0x130 <4>[ 110.152577] ? process_one_work+0x340/0x340 <4>[ 110.152609] ? kthread_park+0x80/0x80 <4>[ 110.152637] ret_from_fork+0x35/0x40 This is very easy to reproduce by running the nbd-runner. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17nbd: rename the runtime flags as NBD_RT_ prefixedXiubo Li
Preparing for the destory when disconnecting crash fixing. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-17Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - Two NVMe pull requests: - ana log parse fix from Anton - nvme quirks support for Apple devices from Ben - fix missing bio completion tracing for multipath stack devices from Hannes and Mikhail - IP TOS settings for nvme rdma and tcp transports from Israel - rq_dma_dir cleanups from Israel - tracing for Get LBA Status command from Minwoo - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself - Some consolidation between the fabrics transports for handling the CAP register - reset race with ns scanning fix for fabrics (move fabrics commands to a dedicated request queue with a different lifetime from the admin request queue)." - controller reset and namespace scan races fixes - nvme discovery log change uevent support - naming improvements from Keith - multiple discovery controllers reject fix from James - some regular cleanups from various people - Series fixing (and re-fixing) null_blk debug printing and nr_devices checks (André) - A few pull requests from Song, with fixes from Andy, Guoqing, Guilherme, Neil, Nigel, and Yufen. - REQ_OP_ZONE_RESET_ALL support (Chaitanya) - Bio merge handling unification (Christoph) - Pick default elevator correctly for devices with special needs (Damien) - Block stats fixes (Hou) - Timeout and support devices nbd fixes (Mike) - Series fixing races around elevator switching and device add/remove (Ming) - sed-opal cleanups (Revanth) - Per device weight support for BFQ (Fam) - Support for blk-iocost, a new model that can properly account cost of IO workloads. (Tejun) - blk-cgroup writeback fixes (Tejun) - paride queue init fixes (zhengbin) - blk_set_runtime_active() cleanup (Stanley) - Block segment mapping optimizations (Bart) - lightnvm fixes (Hans/Minwoo/YueHaibing) - Various little fixes and cleanups * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits) null_blk: format pr_* logs with pr_fmt null_blk: match the type of parameter nr_devices null_blk: do not fail the module load with zero devices block: also check RQF_STATS in blk_mq_need_time_stamp() block: make rq sector size accessible for block stats bfq: Fix bfq linkage error raid5: use bio_end_sector in r5_next_bio raid5: remove STRIPE_OPS_REQ_PENDING md: add feature flag MD_FEATURE_RAID0_LAYOUT md/raid0: avoid RAID0 data corruption due to layout confusion. raid5: don't set STRIPE_HANDLE to stripe which is in batch list raid5: don't increment read_errors on EILSEQ return nvmet: fix a wrong error status returned in error log page nvme: send discovery log page change events to userspace nvme: add uevent variables for controller devices nvme: enable aen regardless of the presence of I/O queues nvme-fabrics: allow discovery subsystems accept a kato nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery() nvme: Remove redundant assignment of cq vector nvme: Assign subsys instance from first ctrl ...
2019-09-16null_blk: format pr_* logs with pr_fmtAndré Almeida
Instead of writing "null_blk: " at the beginning of each pr_err/info/warn log message, format messages using pr_fmt() macro. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-16null_blk: match the type of parameter nr_devicesAndré Almeida
Since the variable nr_devices is an unsigned int, the module_param() should also use this type. Change the type so they can match. Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-16null_blk: do not fail the module load with zero devicesAndré Almeida
The module load should fail only if there is something wrong with the configuration or if an error prevents it to work properly. The module should be able to be loaded with (nr_device == 0), since it will not trigger errors or be in malfunction state. Preventing loading with zero devices also breaks applications that configures this module using configfs API. Remove the nr_device check to fix this. Fixes: f7c4ce890dd2 ("null_blk: validate the number of devices") Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-16rbd: pull rbd_img_request_create() dout out into the callersIlya Dryomov
Make it more informative: log op_type, offset and length for block layer requests and initiating obj_req for child requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-09-16rbd: fix response length parameter for encoded stringsDongsheng Yang
rbd_dev_image_id() allocates space for length but passes a smaller value to rbd_obj_method_sync(). rbd_dev_v2_object_prefix() doesn't allocate space for length. Fix both to be consistent. Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-09-11null_blk: validate the number of devicesAndré Almeida
A negative number of devices is nonsensical, so change the type to unsigned. If the number of devices is 0, it is impossible for userspace to interact with the module, so refuse loading the driver for that case. Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-11null_blk: fix module name at log messageAndré Almeida
The name of the module is "null_blk", not "null". Make `pr_info()` follow the pattern of `pr_err()` log messages. Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05block: Set ELEVATOR_F_ZBD_SEQ_WRITE for nullblk zoned disksDamien Le Moal
Using the helper blk_queue_required_elevator_features(), set the elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request queue of null_blk devices created with zoned mode enabled. This feature requirement can always be satisfied as the mq-deadline elevator is always selected for in-kernel compilation when CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-04paride/pcd: need to check if cd->disk is null in pcd_detectzhengbin
If alloc_disk fails in pcd_init_units, cd->disk & pi are empty, we need to check if cd->disk is null in pcd_detect. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-04paride/pcd: need to set queue to NULL before put_diskzhengbin
In pcd_init_units, if blk_mq_init_sq_queue fails, need to set queue to NULL before put_disk, otherwise null-ptr-deref Read will occur. put_disk kobject_put disk_release blk_put_queue(disk->queue) Fixes: f0d176255401 ("paride/pcd: Fix potential NULL pointer dereference and mem leak") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-04paride/pf: need to set queue to NULL before put_diskzhengbin
In pf_init_units, if blk_mq_init_sq_queue fails, need to set queue to NULL before put_disk, otherwise null-ptr-deref Read will occur. put_disk kobject_put disk_release blk_put_queue(disk->queue) Fixes: 77218ddf46d8 ("paride: convert pf to blk-mq") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28rbd: restore zeroing past the overlap when reading from parentIlya Dryomov
The parent image is read only up to the overlap point, the rest of the buffer should be zeroed. This snuck in because as it turns out the overlap test case has not been triggering this code path for a while now. Fixes: a9b67e69949d ("rbd: replace obj_req->tried_parent with obj_req->read_state") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-08-23null_blk: fix inline misuseJens Axboe
You can't magically mark a function inline and expect that to work. Fixes: fceb5d1b19cb ("null_blk: create a helper for zoned devices") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for req completionChaitanya Kulkarni
This patch creates a helper function for handling the request completion in the null_handle_cmd(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for zoned devicesChaitanya Kulkarni
This patch creates a helper function for handling zoned block device operations. This patch also restructured the code for null_blk_zoned.c and uses the pattern to return blk_status_t and catch the error in the function null_handle_cmd() into cmd->error variable instead of setting it up in the deeper layer just like the way it is done for flush, badblocks and memory backed case in the null_handle_cmd(). We also move null_handle_zoned() to the null_blk_zoned.c to keep the zoned code separate. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for mem-backed opsChaitanya Kulkarni
This patch creates a helper for handling requests when null_blk is memory backed in the null_handle_cmd(). Although the helper is very simple right now, it makes the code flow consistent with the rest of code in the null_handle_cmd() and provides a uniform code structure for future code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for badblocksChaitanya Kulkarni
This patch creates a helper for handling badblocks code in the null_handle_cmd(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-23null_blk: create a helper for throttlingChaitanya Kulkarni
This patch creates a helper for handling throttling code in the null_handle_cmd(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>