summaryrefslogtreecommitdiff
path: root/drivers/scsi/cxlflash
AgeCommit message (Collapse)Author
2017-05-08scsi: cxlflash: Select IRQ_POLLGuenter Roeck
The driver now uses IRQ_POLL and needs to select it to avoid the following build error. ERROR: ".irq_poll_complete" [drivers/scsi/cxlflash/cxlflash.ko] undefined! ERROR: ".irq_poll_sched" [drivers/scsi/cxlflash/cxlflash.ko] undefined! ERROR: ".irq_poll_disable" [drivers/scsi/cxlflash/cxlflash.ko] undefined! ERROR: ".irq_poll_init" [drivers/scsi/cxlflash/cxlflash.ko] undefined! Fixes: cba06e6de403 ("scsi: cxlflash: Implement IRQ polling for RRQ processing") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Introduce hardware queue steeringMatthew R. Ochs
As an enhancement to distribute requests to multiple hardware queues, add the infrastructure to hash a SCSI command into a particular hardware queue. Support the following scenarios when deriving which queue to use: single queue, tagging when SCSI-MQ enabled, and simple hash via CPU ID when SCSI-MQ is disabled. Rather than altering the existing send API, the derived hardware queue is stored in the AFU command where it can be used for sending a command to the chosen hardware queue. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Add hardware queues attributeMatthew R. Ochs
As staging for supporting multiple hardware queues, add an attribute to show and set the current number of hardware queues for the host. Support specifying a hard limit or a CPU affinitized value. This will allow the number of hardware queues to be tuned by a system administrator. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Support multiple hardware queuesUma Krishnan
Introduce multiple hardware queues to improve legacy I/O path performance. Each hardware queue is comprised of a master context and associated I/O resources. The hardware queues are initially implemented as a static array embedded in the AFU. This will be transitioned to a dynamic allocation in a later series to improve the memory footprint of the driver. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Improve asynchronous interrupt processingMatthew R. Ochs
The method used to decode asynchronous interrupts involves unnecessary loops to match up bits that are set with corresponding entries in the asynchronous interrupt information table. This algorithm is wasteful and does not scale well as new status bits are supported. As an improvement, use the for_each_set_bit() service to iterate over the asynchronous status bits and refactor the information table such that it can be indexed by bit position. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Fix warnings/errorsMatthew R. Ochs
As a general cleanup, address all reasonable checkpatch warnings and errors. These include enforcement of comment styles and including named identifiers in function prototypes. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Fix power-of-two validationsMatthew R. Ochs
Validation statements to enforce assumptions about specific defines are not being evaluated by the compiler due to the fact that they reside in a routine that is not used. To activate them, call the routine as part of module initialization. As an additional, related cleanup, remove the now-defunct CXLFLASH_NUM_CMDS. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Remove unnecessary DMA mappingMatthew R. Ochs
Devices supported by the cxlflash driver are fully coherent and do not require a bus address mapping. Avoid unnecessary path length by using the virtual address and length already present in the scatter-gather entry. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Fence EEH during probeMatthew R. Ochs
An EEH during probe can lead to a crash as the recovery thread races with the probe thread. To avoid this issue, introduce new states to fence out EEH recovery until probe has completed. Also ensure the reset wait queue is flushed during device removal to avoid orphaned threads. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Support up to 4 portsMatthew R. Ochs
Update the driver to allow for future cards with 4 ports. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: SISlite updates to support 4 portsMatthew R. Ochs
Update the SISlite header to support 4 ports as outlined in the SISlite specification. Address fallout from structure renames and refreshed organization throughout the driver. Determine the number of ports supported by a card from the global port selection mask register reset value. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Hide FC internals behind common access routineMatthew R. Ochs
As staging to support FC-related updates to the SISlite specification, introduce helper routines to obtain references to FC resources that exist within the global map. This will allow changes to the underlying global map structure without impacting existing code paths. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Remove port configuration assumptionsMatthew R. Ochs
At present, the cxlflash driver only supports hardware with two FC ports. The code was initially designed with this assumption and is dependent on having two FC ports - adding more ports will break logic within the driver. To mitigate this issue, remove the existing port assumptions and transition the code to support more than two ports. As a side effect, clarify the interpretation of the DK_CXLFLASH_ALL_PORTS_ACTIVE flag. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Support dynamic number of FC portsMatthew R. Ochs
Transition from a static number of FC ports to a value that is derived during probe. For now, a static value is used but this will later be based on the type of card being configured. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Update sysfs helper routines to pass config structureMatthew R. Ochs
As staging for future function, pass the config pointer instead of the AFU pointer for port-related sysfs helper routines. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Implement IRQ polling for RRQ processingMatthew R. Ochs
Currently, RRQ processing takes place on hardware interrupt context. This can be a heavy burden in some environments due to the overhead encountered while completing RRQ entries. In an effort to improve system performance, use the IRQ polling API to schedule this processing on softirq context. This function will be disabled by default until starting values can be established for the hardware supported by this driver. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Serialize RRQ access and support offlevel processingMatthew R. Ochs
As further staging to support processing the HRRQ by other means, access to the HRRQ needs to be serialized by a disabled lock. This will allow safe access in other non-hardware interrupt contexts. In an effort to minimize the period where interrupts are disabled, support is added to queue up commands harvested from the RRQ such that they can be processed with hardware interrupts enabled. While this doesn't offer any improvement with processing on a hardware interrupt it will help when IRQ polling is supported and the command completions can execute on softirq context. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13scsi: cxlflash: Separate RRQ processing from the RRQ interrupt handlerMatthew R. Ochs
In order to support processing the HRRQ by other means (e.g. polling), the processing portion of the current RRQ interrupt handler needs to be broken out into a separate routine. This will allow RRQ processing from places other than the RRQ hardware interrupt handler. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-03Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "This is the set of stuff that didn't quite make the initial pull and a set of fixes for stuff which did. The new stuff is basically lpfc (nvme), qedi and aacraid. The fixes cover a lot of previously submitted stuff, the most important of which probably covers some of the failing irq vectors allocation and other fallout from having the SCSI command allocated as part of the block allocation functions" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (59 commits) scsi: qedi: Fix memory leak in tmf response processing. scsi: aacraid: remove redundant zero check on ret scsi: lpfc: use proper format string for dma_addr_t scsi: lpfc: use div_u64 for 64-bit division scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m scsi: cciss: correct check map error. scsi: qla2xxx: fix spelling mistake: "seperator" -> "separator" scsi: aacraid: Fixed expander hotplug for SMART family scsi: mpt3sas: switch to pci_alloc_irq_vectors scsi: qedf: fixup compilation warning about atomic_t usage scsi: remove scsi_execute_req_flags scsi: merge __scsi_execute into scsi_execute scsi: simplify scsi_execute_req_flags scsi: make the sense header argument to scsi_test_unit_ready mandatory scsi: sd: improve TUR handling in sd_check_events scsi: always zero sshdr in scsi_normalize_sense scsi: scsi_dh_emc: return success in clariion_std_inquiry() scsi: fix memory leak of sdpk on when gd fails to allocate scsi: sd: make sd_devt_release() static scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework. ...
2017-02-24mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmfDave Jiang
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-23scsi: merge __scsi_execute into scsi_executeChristoph Hellwig
All but one caller want the decoded sense header, so offer the existing __scsi_execute helper as the public scsi_execute API to simply the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22scsi: cxlflash: Enable PCI device ID for future IBM CXL Flash AFUMatthew R. Ochs
Add support for a future IBM Coherent Accelerator (CXL) flash AFU with an ID of 0x0624. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-11scsi: cxlflash: Cancel scheduled workers before stopping AFUUma Krishnan
When processing an AFU asynchronous interrupt, if the action results in an operation that requires off level processing (a link reset for example), the worker thread is scheduled. In the meantime a reset event (i.e.: EEH) could unmap the AFU to recover. This results in an Oops when the worker thread tries to access the AFU mapping. [c000000f17e03b90] d000000007cd5978 cxlflash_worker_thread+0x268/0x550 [c000000f17e03c40] c00000000011883c process_one_work+0x1dc/0x680 [c000000f17e03ce0] c000000000118e80 worker_thread+0x1a0/0x520 [c000000f17e03d80] c000000000126174 kthread+0xf4/0x100 [c000000f17e03e30] c00000000000a47c ret_from_kernel_thread+0x5c/0xe0 In an effort to avoid this, a mapcount was introduced in commit b45cdbaf9f7f ("cxlflash: Resolve oops in wait_port_offline") but due to the race condition described above, this solution is incomplete. In order to fully resolve this problem and to simplify things, this commit removes the mapcount solution. Instead, the scheduled worker thread is cancelled after interrupts have been disabled and prior to the mapping being freed. Fixes: b45cdbaf9f7f ("cxlflash: Resolve oops in wait_port_offline") Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-11scsi: cxlflash: Cleanup printsMatthew R. Ochs
The usage of prints within the cxlflash driver is inconsistent. This hinders debug and makes the driver source and log output appear sloppy. The following cleanups help unify the prints within cxlflash: - move all prints to dev-* where possible - transition all hex prints to lowercase - standardize variable prints in debug output - derive pointers in a consistent manner - change int to bool where appropriate - remove superfluous data from prints and print statements that do not make sense Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-11scsi: cxlflash: Support SQ Command ModeMatthew R. Ochs
The SISLite specification outlines a new queuing model to improve over the MMIO-based IOARRIN model that exists today. This new model uses a submission queue that exists in host memory and is shared with the device. Each entry in the queue is an IOARCB that describes a transfer request. When requests are submitted, IOARCBs ('current' position tracked in host software) are populated and the submission queue tail pointer is then updated via MMIO to make the device aware of the requests. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-11scsi: cxlflash: Refactor context reset to share reset logicMatthew R. Ochs
As staging for supporting hardware with different context reset registers but a similar reset procedure, refactor the existing context reset routine to move the reset logic to a common routine. This will allow hardware with a different reset register to leverage existing code. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Migrate scsi command pointer to AFU commandMatthew R. Ochs
Currently, when sending a SCSI command, the pointer is stored in a reserved field of the AFU command descriptor for retrieval once the SCSI command has completed. In order to support new descriptor formats that make use of the reserved field, the pointer is migrated to outside the descriptor where it can still be found during completion processing. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Migrate IOARRIN specific routines to function pointersMatthew R. Ochs
As staging for supporting hardware with a different queuing mechanism, move the send_cmd() and context_reset() routines to function pointers that are configured when the AFU is initialized. In addition, rename the existing routines to better reflect the queue model they support. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Cleanup queuecommand()Matthew R. Ochs
The queuecommand routine is disorganized where it populates the private command and also contains some logic/statements that are not needed given that cxlflash devices do not (and likely never will) support scatter-gather. Restructure the code to remove the unnecessary logic and create an organized flow: handle state -> DMA map -> populate command -> send command Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Cleanup send_tmf()Matthew R. Ochs
The send_tmf() routine includes some copy/paste cruft that can be removed as well as the setting of an AFU command-specific while holding the tmf_slock. While not a bug, it is out of place and should be shifted down alongside the other command initialization statements for clarity. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Remove AFU command lockMatthew R. Ochs
The original design of the cxlflash driver required AFU commands to convey state information across multiple threads. The IOASA "host use" byte was used to track if a command was done, errored, or timed out. A per-command spin lock was used to serialize access to this byte. As this is no longer required with the introduction of completions and various refactoring over time, the spin lock, state tracking, and associated code can be removed. To support the simplification, the wait_resp() routine is refactored to return a success or failure. Additionally, as the simplification to the AFU internal command routine, explicit assignments of AFU command fields to zero are removed as the memory is zeroed upon allocation. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Wait for active AFU commands to timeout upon tear downMatthew R. Ochs
With the removal of the static private command pool, the ability to 'complete' outstanding commands was lost. While not an issue for the commands originating outside the driver, internal AFU commands are synchronous and therefore have a timeout associated with them. To avoid a stale memory access, the tear down sequence needs to ensure that there are not any active commands before proceeding. As these internal AFU commands are rare events, the simplest way to accomplish this is detecting the activity and waiting for it to timeout. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Remove private command poolMatthew R. Ochs
Clean up and remove the remaining private command pool infrastructure that is no longer required. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Use cmd_size for private commandsMatthew R. Ochs
Instead of using a private pool of AFU commands, use cmd_size to prime the private pool of SCSI commands such that they are allocated with a size large enough to contain an aligned AFU command. Use scsi_cmd_priv() to derive the aligned/zeroed private command on queuecommand and TMF paths. Remove cmd_checkout() as it is no longer required. The remaining AFU private command infrastructure will be removed in a cleanup commit. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Allocate memory instead of using command pool for AFU syncMatthew R. Ochs
As staging for the removal of the AFU command pool, remove the reliance upon the pool for the internal AFU sync command. Instead of obtaining an AFU command from the pool, dynamically allocate memory with the appropriate alignment requirements. Since the AFU sync service is only executed from the process environment, blocking is acceptable. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Remove unused buffer from AFU commandMatthew R. Ochs
The cxlflash driver originally required a per-command 4K buffer that hosted data passed to the AFU. When the routines that initiate AFU and internal SCSI commands were refactored to use scsi_execute(), the need for this buffer became obsolete. As it is no longer necessary, the buffer is removed. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Avoid command room violationUma Krishnan
During test, a command room violation interrupt is occasionally seen for the master context when the CXL flash devices are stressed. After studying the code, there could be gaps in the way command room value is being cached in cxlflash. When the cached command room is zero the thread attempting to send becomes burdened with updating the cached value with the actual value from the AFU. Today, this is handled with an atomic set operation of the raw value read. Following the atomic update, the thread proceeds to send. This behavior is incorrect on two counts: - The update fails to take into account the current thread and its consumption of one of the hardware commands. - The update does not take into account other threads also atomically updating. Per design, a worker thread updates the cached value when a send thread times out. By not protecting the update with a lock, the cached value can be incorrectly clobbered. To correct these issues, the update of the cached command room has been simplified and also protected using a spin lock which is held until the MMIO is complete. This ensures the command room is properly consumed by the same thread. Update of cached value also takes into account the current thread consuming a hardware command. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Improve context_reset() logicUma Krishnan
Currently, the context reset routine waits for command room to be available before sending the reset request. Per review of the SISLite specification and clarifications from the CXL Flash AFU designers, this wait is unnecessary. The reset request can be sent anytime regardless of command room, so long as only a single reset request is active at any one point in time. This commit simplifies the reset routine by removing the wait for command room. Additionally it adds a debug trace to help pinpoint hardware errors when a context reset does not complete. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Fix crash in cxlflash_restore_luntable()Uma Krishnan
During test, the following crash was observed: [34538.981505] Faulting instruction address: 0xd000000007c9c870 cpu 0x9: Vector: 300 (Data Access) at [c0000007f1e8f590] pc: d000000007c9c870: cxlflash_restore_luntable+0x70/0x1d0 [cxlflash] lr: d000000007c9c84c: cxlflash_restore_luntable+0x4c/0x1d0 [cxlflash] sp: c0000007f1e8f810 msr: 9000000100009033 dar: c00000171d637438 dsisr: 40000000 current = 0xc0000007f1e43f90 paca = 0xc000000007b25100 softe: 0 irq_happened: 0x01 pid = 493, comm = eehd enter ? for help [c0000007f1e8f8a0] d000000007c940b0 init_afu+0xd60/0x1200 [cxlflash] [c0000007f1e8f9a0] d000000007c945a8 cxlflash_pci_slot_reset+0x58/0xe0 [cxlflash] [c0000007f1e8fa20] d00000000715f790 cxl_pci_slot_reset+0x230/0x340 [cxl] [c0000007f1e8fae0] c000000000040dd4 eeh_report_reset+0x144/0x180 [c0000007f1e8fb20] c00000000003f708 eeh_pe_dev_traverse+0x98/0x170 [c0000007f1e8fbb0] c000000000041618 eeh_handle_normal_event+0x328/0x410 [c0000007f1e8fc30] c000000000041db8 eeh_handle_event+0x178/0x330 [c0000007f1e8fce0] c000000000042118 eeh_event_handler+0x1a8/0x1b0 [c0000007f1e8fd80] c00000000011420c kthread+0xec/0x100 [c0000007f1e8fe30] c00000000000a47c ret_from_kernel_thread+0x5c/0xe0 When superpipe mode is disabled for a LUN, the references for the local lun are deleted but the LUN is still identified as being present in the LUN table. This mismatched state can result in the above crash when the LUN table is restored during an error recovery operation. To fix this issue, the local LUN information structure is updated to reflect the LUN is no longer in the LUN table once all references to the LUN are gone. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-30scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONEUma Krishnan
The following Oops is encountered when blk_mq is enabled with the cxlflash driver: [ 2960.817172] Oops: Kernel access of bad area, sig: 11 [#5] [ 2960.817309] NIP __blk_mq_run_hw_queue+0x278/0x4c0 [ 2960.817313] LR __blk_mq_run_hw_queue+0x2bc/0x4c0 [ 2960.817314] Call Trace: [ 2960.817320] __blk_mq_run_hw_queue+0x2bc/0x4c0 (unreliable) [ 2960.817324] blk_mq_run_hw_queue+0xd8/0x100 [ 2960.817329] blk_mq_insert_requests+0x14c/0x1f0 [ 2960.817333] blk_mq_flush_plug_list+0x150/0x190 [ 2960.817338] blk_flush_plug_list+0x11c/0x2b0 [ 2960.817344] blk_finish_plug+0x58/0x80 [ 2960.817348] __do_page_cache_readahead+0x1c0/0x2e0 [ 2960.817352] force_page_cache_readahead+0x68/0xd0 [ 2960.817356] generic_file_read_iter+0x43c/0x6a0 [ 2960.817359] blkdev_read_iter+0x68/0xa0 [ 2960.817361] __vfs_read+0x11c/0x180 [ 2960.817364] vfs_read+0xa4/0x1c0 [ 2960.817366] SyS_read+0x6c/0x110 [ 2960.817369] system_call+0x38/0xb4 The SCSI blk_mq stack assumes that sg_tablesize is always a non-zero value with scsi_mq_setup_tags() allocating tags using sg_tablesize. The cxlflash driver currently uses SG_NONE (0) for the sg_tablesize as the devices it supports are not capable of scatter gather. This mismatch of values results in the Oops above. To resolve this issue, sg_tablesize for cxlflash can simply be set to 1, a value which satisfies the constraints in cxlflash and the lack of support of SG_NONE in SCSI blk_mq. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-14scsi: cxlflash: Fix context reference tracking on detachMatthew R. Ochs
Commit 888baf069f49 ("scsi: cxlflash: Add kref to context") introduced a kref to the context. In particular, the detach routine was updated to use the kref services for managing the removal and destruction of a context. As part of this change, the tracking mechanism internal to the detach handler was refactored. This introduced a bug that can cause the tracking state to be lost. This can lead to a situation where exclusive access to a context is prematurely [and unknowingly] relinquished for the executing thread. To remedy, only update the tracking state when the kref operation indicates the context was removed. Fixes: 888baf069f49 ("scsi: cxlflash: Add kref to context") Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-14scsi: cxlflash: Refactor WWPN setupMatthew R. Ochs
Commit 964497b3bf3f ("cxlflash: Remove dual port online dependency") logically removed the ability for the WWPN setup routine afu_set_wwpn() to return a non-success value. This routine can safely be made a void to simplify the code as there is no longer a need to report a failure. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-14scsi: cxlflash: Improve EEH recovery timeMatthew R. Ochs
When an EEH occurs during device initialization, the port timeout logic can cause excessive delays as MMIO reads will fail. Depending on where they are experienced, these delays can lead to a prolonged reset, causing an unnecessary triggering of other timeout logic in the SCSI stack or user applications. To expedite recovery, the port timeout logic is updated to decay the timeout at a much faster rate when in the presence of a likely EEH frozen event. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-14scsi: cxlflash: Fix to avoid EEH and host reset collisionsMatthew R. Ochs
The EEH reset handler is ignorant to the current state of the driver when processing a frozen event and initiating a device reset. This can be an issue if an EEH event occurs while a user or stack initiated reset is executing. More specifically, if an EEH occurs while the SCSI host reset handler is active, the reset initiated by the EEH thread will likely collide with the host reset thread. This can leave the device in an inconsistent state, or worse, cause a system crash. As a remedy, the EEH handler is updated to evaluate the device state and take appropriate action (proceed, wait, or disconnect host). The host reset handler is also updated to handle situations where an EEH occurred during a host reset. In such situations, the host reset handler will delay reporting back a success to give the EEH reset an opportunity to complete. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-09scsi: cxlflash: Remove the device cleanly in the system shutdown pathUma Krishnan
Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash cards") was recently introduced to notify the AFU when a system is going down. Due to the position of the cxlflash driver in the device stack, cxlflash devices are _always_ removed during a reboot/shutdown. This can lead to a crash if the cxlflash shutdown hook is invoked _after_ the shutdown hook for the owning virtual PHB. Furthermore, the current implementation of shutdown/remove hooks for cxlflash are not tolerant to being invoked when the device is not enabled. This can also lead to a crash in situations where the remove hook is invoked after the device has been removed via the vPHBs shutdown hook. An example of this scenario would be an EEH reset failure while a reboot/shutdown is in progress. To solve both problems, the shutdown hook for cxlflash is updated to simply remove the device. This path already includes the AFU notification and thus this solution will continue to perform the original intent. At the same time, the remove hook is updated to protect against being called when the device is not enabled. Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash cards") Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-09scsi: cxlflash: Scan host only after the port is ready for I/OUma Krishnan
When a port link is established, the AFU sends a 'link up' interrupt. After the link is up, corresponding initialization steps are performed on the card. Following that, when the card is ready for I/O, the AFU sends 'login succeeded' interrupt. Today, cxlflash invokes scsi_scan_host() upon receipt of both interrupts. SCSI commands sent to the port prior to the 'login succeeded' interrupt will fail with 'port not available' error. This is not desirable. Moreover, when async_scan is active for the host, subsequent scan calls are terminated with error. Due to this, the scsi_scan_host() call performed after 'login succeeded' interrupt could portentially return error and the devices may not be scanned properly. To avoid this problem, scsi_scan_host() should be called only after the 'login succeeded' interrupt. Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-23scsi: cxlflash: Remove adapter file descriptor cacheMatthew R. Ochs
The adapter file descriptor was previously cached within the kernel for a given context in order to support performing a close on behalf of an application. This is no longer needed as applications are now required to perform a close on the adapter file descriptor. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-23scsi: cxlflash: Transition to application close modelMatthew R. Ochs
Caching the adapter file descriptor and performing a close on behalf of an application is a poor design. This is due to the fact that once a file descriptor in installed, it is free to be altered without the knowledge of the cxlflash driver. This can lead to inconsistencies between the application and kernel. Furthermore, the nature of the former design is more exploitable and thus should be abandoned. To support applications performing a close on the adapter file that is associated with a context, a new flag is introduced to the user API to indicate to applications that they are responsible for the close following the cleanup (detach) of a context. The documentation is also updated to reflect this change in behavior. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-18scsi: cxlflash: Add kref to contextMatthew R. Ochs
Currently, context user references are tracked via the list of LUNs that have attached to the context. While convenient, this is not intuitive without a deep study of the code and is inconsistent with the existing reference tracking patterns within the kernel. This design choice can lead to future bug injection. To improve code comprehension and better protect against future bugs, add explicit reference counting to contexts and migrate the context removal code to the kref release handler. Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-18scsi: cxlflash: Cache owning adapter within contextMatthew R. Ochs
The context removal routine requires access to the owning adapter structure to reset the context within the AFU as part of the tear down sequence. In order to support kref adoption, the owning adapter must be accessible from the release handler. As the kref framework only provides the kref reference as the sole parameter, another means is needed to derive the owning adapter. As a remedy, the owning adapter reference is saved off within the context during initialization. Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>