Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
|
|
There has one race case for ceph's rbd-nbd tool. When do mapping
it may fail with EBUSY from ioctl(nbd, NBD_DO_IT), but actually
the nbd device has already unmaped.
It dues to if just after the wake_up(), the recv_work() is scheduled
out and defers calling the nbd_config_put(), though the map process
has exited the "nbd->recv_task" is not cleared.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After send_msg_open is done, send_msg_close should be done
if any error occurs and it is necessary to recover
what has been done.
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The max_hw_secotrs is only limited by the transport, not remote device,
block layer on server side will split to the device limit if it's too
big.
The max_segments, similar, and rtrs server will submit single buffer, so
no need to cap.
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The argument is not needed since all callers pass 1 for it.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block driver updates from Jens Axboe:
"Here are the driver updates for 5.10.
A few SCSI updates in here too, in coordination with Martin as they
depend on core block changes for the shared tag bitmap.
This contains:
- NVMe pull requests via Christoph:
- fix keep alive timer modification (Amit Engel)
- order the PCI ID list more sensibly (Andy Shevchenko)
- cleanup the open by controller helper (Chaitanya Kulkarni)
- use an xarray for the CSE log lookup (Chaitanya Kulkarni)
- support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
- fix nvme_ns_report_zones (Christoph Hellwig)
- add a sanity check to nvmet-fc (James Smart)
- fix interrupt allocation when too many polled queues are
specified (Jeffle Xu)
- small nvmet-tcp optimization (Mark Wunderlich)
- fix a controller refcount leak on init failure (Chaitanya
Kulkarni)
- misc cleanups (Chaitanya Kulkarni)
- major refactoring of the scanning code (Christoph Hellwig)
- MD updates via Song:
- Bug fixes in bitmap code, from Zhao Heming
- Fix a work queue check, from Guoqing Jiang
- Fix raid5 oops with reshape, from Song Liu
- Clean up unused code, from Jason Yan
- Discard improvements, from Xiao Ni
- raid5/6 page offset support, from Yufen Yu
- Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
Hannes)
- null_blk open/active zone limit support (Niklas)
- Set of bcache updates (Coly, Dongsheng, Qinglang)"
* tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
md/raid5: fix oops during stripe resizing
md/bitmap: fix memory leak of temporary bitmap
md: fix the checking of wrong work queue
md/bitmap: md_bitmap_get_counter returns wrong blocks
md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
md/raid0: remove unused function is_io_in_chunk_boundary()
nvme-core: remove extra condition for vwc
nvme-core: remove extra variable
nvme: remove nvme_identify_ns_list
nvme: refactor nvme_validate_ns
nvme: move nvme_validate_ns
nvme: query namespace identifiers before adding the namespace
nvme: revalidate zone bitmaps in nvme_update_ns_info
nvme: remove nvme_update_formats
nvme: update the known admin effects
nvme: set the queue limits in nvme_update_ns_info
nvme: remove the 0 lba_shift check in nvme_update_ns_info
nvme: clean up the check for too large logic block sizes
nvme: freeze the queue over ->lba_shift updates
nvme: factor out a nvme_configure_metadata helper
...
|
|
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
|
|
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DRBD keeps a block device open just to get and set the capacity from
it. Switch to primarily using the disk capacity as intended by the
block layer, and sync it to the bdev using revalidate_disk_size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Bulk of the genetlink users can use smaller ops, move them.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace /* Fall through. */ comment with the new pseudo-keyword macro
fallthrough[1].
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In _drbd_send_page() a page is checked by following code before sending
it by kernel_sendpage(),
(page_count(page) < 1) || PageSlab(page)
If the check is true, this page won't be send by kernel_sendpage() and
handled by sock_no_sendpage().
This kind of check is exactly what macro sendpage_ok() does, which is
introduced into include/linux/net.h to solve a similar send page issue
in nvme-tcp code.
This patch uses macro sendpage_ok() to replace the open coded checks to
page type and refcount in _drbd_send_page(), as a code cleanup.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for user space to set a max open zone and a max active zone
limit via configfs. By default, the default values are 0 == no limit.
Call the block layer API functions used for exposing the configured
limits to sysfs.
Add accounting in null_blk_zoned so that these new limits are respected.
Performing an operation that would exceed these limits results in a
standard I/O error.
A max open zone limit exists in the ZBC standard.
While null_blk_zoned is used to test the Zoned Block Device model in
Linux, when it comes to differences between ZBC and ZNS, null_blk_zoned
mostly follows ZBC.
Therefore, implement the manage open zone resources function from ZBC,
but additionally add support for max active zones.
This enables user space not only to test against a device with an open
zone limit, but also to test against a device with an active zone limit.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The ->bd_contains field is set by __blkdev_get and drivers have no
business manipulating it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bd_disk is set on all block devices, including those for partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Support a shared tag bitmap, whereby request tags are unique over all
submission queues, and not just per submission queue.
As such, per device total queue depth is normally hw_queue_depth *
submit_queues, but hw_queue_depth when set. And a similar story for when
shared_tags is set, where that is the queue depth over all null blk
devices.
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* for-5.10/block: (140 commits)
bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag
bdi: invert BDI_CAP_NO_ACCT_WB
bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
mm: use SWP_SYNCHRONOUS_IO more intelligently
bdi: remove BDI_CAP_SYNCHRONOUS_IO
bdi: remove BDI_CAP_CGROUP_WRITEBACK
block: lift setting the readahead size into the block layer
md: update the optimal I/O size on reshape
bdi: initialize ->ra_pages and ->io_pages in bdi_init
aoe: set an optimal I/O size
bcache: inherit the optimal I/O size
drbd: remove dead code in device_to_statistics
fs: remove the unused SB_I_MULTIROOT flag
block: mark blkdev_get static
PM: mm: cleanup swsusp_swap_check
mm: split swap_type_of
PM: rewrite is_hibernate_resume_dev to not require an inode
mm: cleanup claim_swapfile
ocfs2: cleanup o2hb_region_dev_store
dasd: cleanup dasd_scan_partitions
...
|
|
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it. This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.
One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore. It is replaced with a queue attribute which
also is writable for easier testing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
decided if ->rw_page can be used on a block device. Just check up for
the method instead. The only complication is that zram needs a second
set of block_device_operations as it can switch between modes that
actually support ->rw_page and those who don't.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Drivers shouldn't really mess with the readahead size, as that is a VM
concept. Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk. Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors. To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
aoe forces a larger readahead size, but any reason to do larger I/O
is not limited to readahead. Also set the optimal I/O size, and
remove the local constants in favor of just using SZ_2G.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Ever since the switch to blk-mq, a lower device not used for VM
writeback will not be marked congested, so the check will never
trigger.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use blkdev_get_by_dev instead of bdgrab + blkdev_get.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace bdget + blkdev_get by blkdev_get_by_dev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove code which has been dead since the initial commit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We can only scan for partitions on the whole disk, so move the flag
from struct block_device to struct gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The pcd driver does not have a ->revalidate_disk method, so it can just
use bdev_check_media_change without any additional changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass a struct ace_device to ace_revalidate_disk, move the media changed
check into the one caller that needs it, and give the routine a better
name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch to use bdev_check_media_change instead of check_disk_change and
call ace_revalidate_disk manually. Given that ace_revalidate_disk only
deals with media change events, the extra call into ->revalidate_disk
from bdev_disk_changed is not required either, so stop wiring up the
method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch to use bdev_check_media_changed instead of check_disk_change and
call floppy_revalidate manually. Given that floppy_revalidate only
deals with media change events, the extra call into ->revalidate_disk
from bdev_disk_changed is not required either, so stop wiring up the
method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
floppy_revalidate mostly duplicates work already done in floppy_open
despite only beeing called from floppy_open. Remove the function and
just clear the ->ejected flag directly under the right condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch to use bdev_check_media_change instead of check_disk_change and
call floppy_revalidate manually. Given that floppy_revalidate only
deals with media change events, the extra call into ->revalidate_disk
from bdev_disk_changed is not required either, so stop wiring up the
method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch to use bdev_check_media_change instead of check_disk_change and
call floppy_revalidate manually. Given that floppy_revalidate only
deals with media change events, the extra call into ->revalidate_disk
from bdev_disk_changed is not required either, so stop wiring up the
method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Switch to use bdev_check_media_change instead of check_disk_change and
call floppy_revalidate manually. Given that floppy_revalidate only
deals with media change events, the extra call into ->revalidate_disk
from bdev_disk_changed is not required either, so stop wiring up the
method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The Amiga floppy driver does not have a ->revalidate_disk method, so it
can just use bdev_check_media_change without any additional changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It turns out that currently we rely only on sysfs attribute
permissions:
$ ll /sys/bus/rbd/{add*,remove*}
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add_single_major
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/remove
--w------- 1 root root 4096 Sep 3 20:38 /sys/bus/rbd/remove_single_major
This means that images can be mapped and unmapped (i.e. block devices
can be created and deleted) by a UID 0 process even after it drops all
privileges or by any process with CAP_DAC_OVERRIDE in its user namespace
as long as UID 0 is mapped into that user namespace.
Be consistent with other virtual block devices (loop, nbd, dm, md, etc)
and require CAP_SYS_ADMIN in the initial user namespace for mapping and
unmapping, and also for dumping the configuration string and refreshing
the image header.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
BLK_MQ_F_TAG_SHARED actually means that tags is shared among request
queues, all of which should belong to LUNs attached to same HBA.
So rename it to make the point explicitly.
[jpg: rebase a few times, add rnbd-clt.c change]
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
revalidate_disk is a relative awkward helper for driver use, as it first
calls an optional driver method and then updates the block device size,
while most callers either don't need the method call at all, or want to
keep state between the caller and the called method.
Add a revalidate_disk_size helper that just performs the update of the
block device size from the gendisk one, and switch all drivers that do
not implement ->revalidate_disk to use the new helper instead of
revalidate_disk()
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace bd_invalidate with a new BDEV_NEED_PART_SCAN flag in a bd_flags
variable to better describe the condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use kobj_to_dev() instead of container_of()
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Two different callers use two different mutexes for updating the
block device size, which obviously doesn't help to actually protect
against concurrent updates from the different callers. In addition
one of the locks, bd_mutex is rather prone to deadlocks with other
parts of the block stack that use it for high level synchronization.
Switch to using a new spinlock protecting just the size updates, as
that is all we need, and make sure everyone does the update through
the proper helper.
This fixes a bug reported with the nvme revalidating disks during a
hot removal operation, which can currently deadlock on bd_mutex.
Reported-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace bd_set_size with a version that takes the number of sectors
instead, as that fits most of the current and future callers much better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block fixes from Jens Axboe:
- nbd timeout fix (Hou)
- device size fix for loop LOOP_CONFIGURE (Martijn)
- MD pull from Song with raid5 stripe size fix (Yufen)
* tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block:
md/raid5: make sure stripe_size as power of two
loop: Set correct device size when using LOOP_CONFIGURE
nbd: restore default timeout when setting it to zero
|
|
The device size calculation was done before processing the loop
configuration, which meant that the we set the size on the underlying
block device incorrectly in case lo_offset/lo_sizelimit were set in the
configuration. Delay computing the size until we've setup the device
parameters correctly.
Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")
Reported-by: Lennart Poettering <mzxreary@0pointer.de>
Tested-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we configured io timeout of nbd0 to 100s. Later after we
finished using it, we configured nbd0 again and set the io
timeout to 0. We expect it would timeout after 30 seconds
and keep retry. But in fact we could not change the timeout
when we set it to 0. the timeout is still the original 100s.
So change the timeout to default 30s when we set it to zero.
It also behaves same as commit 2da22da57348 ("nbd: fix zero
cmd timeout handling v2").
It becomes more important if we were reconfigure a nbd device
and the io timeout it set to zero. Because it could take 30s
to detect the new socket and thus io could be completed more
quickly compared to 100s.
Signed-off-by: Hou Pu <houpu@bytedance.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Sagi:
- nvme completion rework from Christoph and Chao that mostly came
from a bit of divergence of how we classify errors related to
pathing/retry etc.
- nvmet passthru fixes from Chaitanya
- minor nvmet fixes from Amit and I
- mpath round-robin path selection fix from Martin
- ignore noiob for zoned devices from Keith
- minor nvme-fc fix from Tianjia"
- BFQ cgroup leak fix (Dmitry)
- block layer MAINTAINERS addition (Geert)
- fix null_blk FUA checking (Hou)
- get_max_io_size() size fix (Keith)
- fix block page_is_mergeable() for compound pages (Matthew)
- discard granularity fixes (Ming)
- IO scheduler ordering fix (Ming)
- misc fixes
* tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block: (31 commits)
null_blk: fix passing of REQ_FUA flag in null_handle_rq
nvmet: Disable keep-alive timer when kato is cleared to 0h
nvme: redirect commands on dying queue
nvme: just check the status code type in nvme_is_path_error
nvme: refactor command completion
nvme: rename and document nvme_end_request
nvme: skip noiob for zoned devices
nvme-pci: fix PRP pool size
nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth
nvme: Use spin_lock_irq() when taking the ctrl->lock
nvmet: call blk_mq_free_request() directly
nvmet: fix oops in pt cmd execution
nvmet: add ns tear down label for pt-cmd handling
nvme: multipath: round-robin: eliminate "fallback" variable
nvme: multipath: round-robin: fix single non-optimized path case
nvme-fc: Fix wrong return value in __nvme_fc_init_request()
nvmet-passthru: Reject commands with non-sgl flags set
nvmet: fix a memory leak
blkcg: fix memleak for iolatency
MAINTAINERS: Add missing header files to BLOCK LAYER section
...
|
|
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
REQ_FUA should be checked using rq->cmd_flags instead of req_op().
Fixes: deb78b419dfda ("nullb: emulate cache")
Signed-off-by: Hou Pu <houpu@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Clang warns:
drivers/block/rnbd/rnbd-srv.c:150:6: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (IS_ERR(bio)) {
^~~~~~~~~~~
drivers/block/rnbd/rnbd-srv.c:177:9: note: uninitialized use occurs here
return err;
^~~
drivers/block/rnbd/rnbd-srv.c:150:2: note: remove the 'if' if its
condition is always false
if (IS_ERR(bio)) {
^~~~~~~~~~~~~~~~~~
drivers/block/rnbd/rnbd-srv.c:126:9: note: initialize the variable 'err'
to silence this warning
int err;
^
= 0
1 warning generated.
err is indeed uninitialized when this statement is taken. Ensure that it
is assigned the error value of bio before jumping to the error handling
label.
Fixes: 735d77d4fd28 ("rnbd: remove rnbd_dev_submit_io")
Reported-by: Brooke Basile <brookebasile@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1134
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|