Age | Commit message (Collapse) | Author |
|
This patch adds helper functions which are used in the NVMeOF configfs
when the user is configuring the passthru subsystem. Here we ensure
that only one subsys is assigned to each nvme_ctrl by using an xarray
on the cntlid.
The subsystem's version number is overridden by the passed through
controller's version. However, if that version is less than 1.2.1,
then we bump the advertised version to that and print a warning
in dmesg.
Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add passthru command handling capability for the NVMeOF target and
export passthru APIs which are used to integrate passthru
code with nvmet-core.
The new file passthru.c handles passthru cmd parsing and execution.
In the passthru mode, we create a block layer request from the nvmet
request and map the data on to the block layer request.
Admin commands and features are on an allow list as there are a number
of each that don't make too much sense with passthrough. We use an
allow list such that new commands can be considered before being blindly
passed through. In both cases, vendor specific commands are always
allowed.
We also reject reservation IO commands as the underlying device cannot
differentiate between multiple hosts behind a fabric.
Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
nvme_find_get_ns() and nvme_put_ns() are required by the target passthru
code and are exported under the NVME_TARGET_PASSTHRU namespace.
Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it
gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0).
It makes use of filp_open() to open the file and uses the private
data to obtain a pointer to the struct nvme_ctrl. If the fops of the
file do not match, -EINVAL is returned.
The purpose of this function is to support NVMe-OF target passthru
and is exported under the NVME_TARGET_PASSTHRU namespace.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Introduce a new nvme_execute_passthru_rq() helper which calls
nvme_passthru_[start|end]() around blk_execute_rq(). This ensures
all passthru calls (including nvme_submit_io()) will be wrapped
appropriately.
nvme_execute_passthru_rq() will also be useful for the nvmet passthru
code and is exported in the NVME_TARGET_PASSTHRU namespace.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Separate the code to obtain command effects from the code
to start a passthru request and move the nvme_passthru_start() and
nvme_passthru_end() functions up above nvme_submit_user_cmd() in order
that they may be used in a new helper a subsequent patch.
The new helper function will be necessary for nvmet passthru
code to determine if we need to change out of interrupt context
to handle the effects. It is exported in the NVME_TARGET_PASSTHRU
namespace.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The host driver should decide whether to use SGLs or PRPs and they
currently assume the flags are cleared after the call to
nvme_setup_cmd(). However, passed-through commands may erroneously
set these bits; so clear them for all cases.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The transport has a del_work_active flag to avoid duplicate scheduling
of the del_work item. This is redundant with the checks that
schedule_work() makes.
Remove the del_work_active flag.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When searching for an association based on an association id, when there
is a match, the code takes a reference. However, it is not validating
that the reference taking was successful.
Check the status of the reference. If unsuccessful, the device is being
deleted and should be ignored.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently the FC transport is set max_hw_sectors based on the lldds
max sgl segment count. However, the block queue max segments is
set based on the controller's max_segments count, which the transport
does not set. As such, the lldd is receiving sgl lists that are
exceeding its max segment count.
Set the controller max segment count and derive max_hw_sectors from
the max segment count.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Stay consistent with the rest of the driver
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
A deadlock happens in the following scenario with multipath:
1) scan_work(nvme0) detects a new nsid while nvme0
is an optimized path to it, path nvme1 happens to be
inaccessible.
2) Before scan_work is complete nvme0 disconnect is initiated
nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING
3) scan_work(1) attempts to submit IO,
but nvme_path_is_optimized() observes nvme0 is not LIVE.
Since nvme1 is a possible path IO is requeued and scan_work hangs.
--
Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: io_schedule+0x16/0x40
kernel: do_read_cache_page+0x438/0x830
kernel: read_cache_page+0x12/0x20
kernel: read_dev_sector+0x27/0xc0
kernel: read_lba+0xc1/0x220
kernel: efi_partition+0x1e6/0x708
kernel: check_partition+0x154/0x244
kernel: rescan_partitions+0xae/0x280
kernel: __blkdev_get+0x40f/0x560
kernel: blkdev_get+0x3d/0x140
kernel: __device_add_disk+0x388/0x480
kernel: device_add_disk+0x13/0x20
kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core]
kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x249/0x400
kernel: kthread+0x104/0x140
--
4) Delete also hangs in flush_work(ctrl->scan_work)
from nvme_remove_namespaces().
Similiarly a deadlock with ana_work may happen: if ana_work has started
and calls nvme_mpath_set_live and device_add_disk, it will
trigger I/O. When we trigger disconnect I/O will block because
our accessible (optimized) path is disconnecting, but the alternate
path is inaccessible, so I/O blocks. Then disconnect tries to flush
the ana_work and hangs.
[ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
[ 605.552087] Call Trace:
[ 605.552683] __schedule+0x2b9/0x6c0
[ 605.553507] schedule+0x42/0xb0
[ 605.554201] io_schedule+0x16/0x40
[ 605.555012] do_read_cache_page+0x438/0x830
[ 605.556925] read_cache_page+0x12/0x20
[ 605.557757] read_dev_sector+0x27/0xc0
[ 605.558587] amiga_partition+0x4d/0x4c5
[ 605.561278] check_partition+0x154/0x244
[ 605.562138] rescan_partitions+0xae/0x280
[ 605.563076] __blkdev_get+0x40f/0x560
[ 605.563830] blkdev_get+0x3d/0x140
[ 605.564500] __device_add_disk+0x388/0x480
[ 605.565316] device_add_disk+0x13/0x20
[ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core]
[ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
[ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core]
[ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core]
[ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core]
[ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core]
[ 605.573330] process_one_work+0x1db/0x380
[ 605.574144] worker_thread+0x4d/0x400
[ 605.574896] kthread+0x104/0x140
[ 605.577205] ret_from_fork+0x35/0x40
[ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
[ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830
[ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.582320] nvme D 0 14044 14043 0x00000000
[ 605.583424] Call Trace:
[ 605.583935] __schedule+0x2b9/0x6c0
[ 605.584625] schedule+0x42/0xb0
[ 605.585290] schedule_timeout+0x203/0x2f0
[ 605.588493] wait_for_completion+0xb1/0x120
[ 605.590066] __flush_work+0x123/0x1d0
[ 605.591758] __cancel_work_timer+0x10e/0x190
[ 605.593542] cancel_work_sync+0x10/0x20
[ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core]
[ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core]
[ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
[ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core]
[ 605.598320] dev_attr_store+0x17/0x30
Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will
indicate the phase of controller deletion where I/O cannot be allowed
to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
be issued to the bottom device, and only after we flush the ana_work
and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work
from re-firing by aborting early if we are not LIVE, so we should be safe
here.
In addition, change the transport drivers to follow the updated state
machine.
Fixes: 0d0b660f214d ("nvme: add ANA support")
Reported-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We are starting to see some non-trivial states
so lets start documenting them.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This patch replaces the ctrl->namespaces tracking from linked list to
xarray and improves the performance when accessing one namespce :-
XArray vs Default:-
IOPS and BW (more the better) increase BW (~1.8%):-
---------------------------------------------------
XArray :-
read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec)
read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec)
read: IOPS=162k, BW=631MiB/s (662MB/s)(18.5GiB/30001msec)
Default:-
read: IOPS=156k, BW=609MiB/s (639MB/s)(17.8GiB/30001msec)
read: IOPS=157k, BW=613MiB/s (643MB/s)(17.0GiB/30001msec)
read: IOPS=160k, BW=626MiB/s (656MB/s)(18.3GiB/30001msec)
Submission latency (less the better) decrease (~8.3%):-
-------------------------------------------------------
XArray:-
slat (usec): min=7, max=8386, avg=11.19, stdev=5.96
slat (usec): min=7, max=441, avg=11.09, stdev=4.48
slat (usec): min=7, max=1088, avg=11.21, stdev=4.54
Default :-
slat (usec): min=8, max=2826.5k, avg=23.96, stdev=3911.50
slat (usec): min=8, max=503, avg=12.52, stdev=5.07
slat (usec): min=8, max=2384, avg=12.50, stdev=5.28
CPU Usage (less the better) decrease (~5.2%):-
----------------------------------------------
XArray:-
cpu : usr=1.84%, sys=18.61%, ctx=949471, majf=0, minf=250
cpu : usr=1.83%, sys=18.41%, ctx=950262, majf=0, minf=237
cpu : usr=1.82%, sys=18.82%, ctx=957224, majf=0, minf=234
Default:-
cpu : usr=1.70%, sys=19.21%, ctx=858196, majf=0, minf=251
cpu : usr=1.82%, sys=19.98%, ctx=929720, majf=0, minf=227
cpu : usr=1.83%, sys=20.33%, ctx=947208, majf=0, minf=235.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Has the driver use shared CQs providing ~10%-20% improvement when
multiple disks are used. Instead of opening a CQ for each QP per
controller, a CQ for each core will be provided by the RDMA core driver
that will be shared between the QPs on that core reducing interrupt
overhead.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Has the driver use shared CQs providing ~10%-20% improvement as seen in
the patch introducing shared CQs. Instead of opening a CQ for each QP
per controller connected, a CQ for each QP will be provided by the RDMA
core driver that will be shared between the QPs on that core reducing
interrupt overhead.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This patch implements a solution for a BIOS hack used on some currently
shipping Intel systems to change driver power management policy for PCIe
NVMe drives. Some newer Intel platforms, like some Comet Lake systems,
require that PCIe devices use D3 when doing suspend-to-idle in order to
allow the platform to realize maximum power savings. This is particularly
needed to support ATX power supply shutdown on desktop systems. In order to
ensure this happens for root ports with storage devices, Microsoft
apparently created this ACPI _DSD property as a way to influence their
driver policy. To my knowledge this property has not been discussed with
the NVME specification body.
Though the solution is not ideal, it addresses a problem that also affects
Linux since the NVMe driver's default policy of using NVMe APST during
suspend-to-idle prevents the PCI root port from going to D3 and leads to
higher power consumption for these platforms. The power consumption
difference may be negligible on laptop systems, but many watts on desktop
systems when the ATX power supply is blocked from powering down.
The patch creates a new nvme_acpi_storage_d3 function to check for the
StorageD3Enable property during probe and enables D3 as a quirk if set. It
also provides a 'noacpi' module parameter to allow skipping the quirk if
needed.
Tested with:
- PM961 NVMe SED Samsung 512GB
- INTEL SSDPEKKF512G8
Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
>From the initial implementation of NVMe SGL kernel support
commit a7a7cbe353a5 ("nvme-pci: add SGL support") with addition of the
commit 943e942e6266 ("nvme-pci: limit max IO size and segments to avoid
high order allocations") now there is only caller left for
nvme_pci_iod_alloc_size() which statically passes true for last
parameter that calculates allocation size based on SGL since we need
size of biggest command supported for mempool allocation.
This patch modifies the helper functions nvme_pci_iod_alloc_size() such
that it is now uses maximum of PRP and SGL size for iod allocation size
calculation.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Saving the nvme controller's page size was from a time when the driver
tried to use different sized pages, but this value is always set to
a constant, and has been this way for some time. Remove the 'page_size'
field and replace its usage with the constant value.
This also lets the compiler make some micro-optimizations in the io
path, and that's always a good thing.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We've already validated the 'kato' in nvme_start_keep_alive(), thus no
need to validate it again in nvme_start_ctrl(). Remove it.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
"v" is an unsigned int so it can't be more than UINT_MAX. Removing this
check makes it easier to preserve the error code as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add a new max_active zones definition in the sysfs documentation.
This definition will be common for all devices utilizing the zoned block
device support in the kernel.
Export max_active_zones according to this new definition for NVMe Zoned
Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
the kernel), and ZBC SCSI devices.
Add the new max_active_zones member to struct request_queue, rather
than as a queue limit, since this property cannot be split across stacking
drivers.
For SCSI devices, even though max active zones is not part of the ZBC/ZAC
spec, export max_active_zones as 0, signifying "no limit".
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new max_open_zones definition in the sysfs documentation.
This definition will be common for all devices utilizing the zoned block
device support in the kernel.
Export max open zones according to this new definition for NVMe Zoned
Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
the kernel), and ZBC SCSI devices.
Add the new max_open_zones member to struct request_queue, rather
than as a queue limit, since this property cannot be split across stacking
drivers.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
By the time a namespace is added to ctrl->namespaces list and thus
can be looked up ns->disk has been assigned, and it it never cleared.
Remove all the checks for ns->disk being NULL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
|
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The nvme_pci_iod_alloc_size() should return 'size_t' type to be
consistent with the sizeof return value.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add a blank line after declarations to make code more readable.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Fix comment typos and remove whitespaces before tabs to cleanup
checkpatch errors.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We've validated the segment counts before calling nvme_map_data(),
so there is no need to validate again in nvme_pci_use_sgls(, which is
only called from nvme_map_data().
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Documented model names of Intel SSDs requiring quirks.
Signed-off-by: David Fugate <david.fugate@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is useful information, and moreover its it's useful to
be able to alter these parameters per controller after it
has been established.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined
in NVM Express TP4053. Zoned namespaces are discovered based on their
Command Set Identifier reported in the namespaces Namespace
Identification Descriptor list. A successfully discovered Zoned
Namespace will be registered with the block layer as a host managed
zoned block device with Zone Append command support. A namespace that
does not support append is not supported by the driver.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <keith.busch@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The Commands Supported and Effects log page was extended with a CSI
field that enables the host to query the log page for each command set
supported. Retrieve this log page for each command set that an attached
namespace supports, and save a pointer to that log in the namespace head.
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <keith.busch@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Implements support for multiple I/O Command Sets. NVMe TP 4056
introduces a method to enumerate multiple command sets per namespace. If
the command set is exposed, this method for enumeration will be used
instead of the traditional method that uses the CC.CSS register command
set register for command set identification.
For namespaces where the Command Set Identifier is not supported or
recognized, the specific namespace will not be created.
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Use USEC_PER_SEC instead of magic numbers to make code more readable.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We can make it shorter and simpler without some redundant checks.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
If we have a long list of request to send, signal the network stack
that more is coming (MSG_MORE). If we have nothing else, signal MSG_EOR.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
blk-mq request plugging can improve the execution of our pipeline.
When we queue a request we actually trigger our I/O worker thread
yielding a context switch by definition. However if we know that
there are more requests in the pipe that are coming, we are better
off not trigger our I/O worker and only do that for the last request
in the batch (bd->last). By having it, we improve efficiency by
amortizing context switches over a batch of requests.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The queue processing will splice to a queue local list, this should
alleviate some contention on the send_list lock, but also prepares
us to the next patch where we look on these lists for network stack
flag optimization.
Remove queue lock as its not used anymore.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
[hch: simplified a loop]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The nvme host and target verify the wwnn and wwpn format via
nvme_fc_parse_traddr(). For instance, it is required that the length of
wwnn to be either 21 ("nn-0x") or 19 (nn-).
Add this verification to nvme-fcloop so that the input should always be in
hex and the length of input should always be 18.
Otherwise, the user may use e.g. 0x2 to create fcloop local port, while
0x0000000000000002 is required for nvme host and target. This makes the
requirement of format confusing.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In function nvmet_subsys_atte_version_show() which uses the NVME_XXX()
macros related to version (of type u64) get rid of the int type cast
when printing subsys version and use appropriate format specifier for
u64.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Replace has_keyed_sgls and metadata_support booleans with a flags member
that will be used for adding more features in the future.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Since the nvmet_tcp_ops is static, there is no need to initialize values
to zero.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This field is never used since the introduction of nvme loopback by
commit 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver").
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Just cleanup by removing the empty line.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Although use of for loop is preferred it is not a common practice to
have 80 char long for loop initialization and comparison section.
Use temp variables for calculating values and replace them in the
for loop with size of all variables to set to u64 since preferred
variable is declared as u64.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The NVMe PCIe declares module parameter io_queue_depth as int. Change
this to u16 as queue depth can never be negative. Now to reflect this
update module parameter getter function from param_get_int() ->
param_get_uint() and respective setter function with type of n changed
from int to u16 with param_set_int() to param_set_ushort(). Finally
update struct nvme_dev q_depth member to u16 and use u16 in min_t()
when calculating dev->q_depth in the nvme_pci_enable() (since q_depth is
now u16) and use unsigned int instead of int when calculating
dev->tagset.queue_depth as target variable tagset->queue_depth is of type
unsigned int in nvme_dev_add().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In nvme_init_identify() when calculating submission queue size use u16
instead of int type in the min_t() since target variable ctrl->sqsize is
of type u16.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In nvme_configure_directives() when calculating number of streams use
u16 instead of unsigned type in the min_t() since target variable
ctrl->nr_streams is of type u16.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Merge in 5.8-rc4 for-5.9/block to setup for-5.9/drivers, to provide
a clean base and making the life for the NVMe changes easier.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* tag 'v5.8-rc4': (732 commits)
Linux 5.8-rc4
x86/ldt: use "pr_info_once()" instead of open-coding it badly
MIPS: Do not use smp_processor_id() in preemptible code
MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
.gitignore: Do not track `defconfig` from `make savedefconfig`
io_uring: fix regression with always ignoring signals in io_cqring_wait()
x86/ldt: Disable 16-bit segments on Xen PV
x86/entry/32: Fix #MC and #DB wiring on x86_32
x86/entry/xen: Route #DB correctly on Xen PV
x86/entry, selftests: Further improve user entry sanity checks
x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
i2c: mlxcpld: check correct size of maximum RECV_LEN packet
i2c: add Kconfig help text for slave mode
i2c: slave-eeprom: update documentation
i2c: eg20t: Load module automatically if ID matches
i2c: designware: platdrv: Set class based on DMI
i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
mm/page_alloc: fix documentation error
vmalloc: fix the owner argument for the new __vmalloc_node_range callers
mm/cma.c: use exact_nid true to fix possible per-numa cma leak
...
|