summaryrefslogtreecommitdiff
path: root/drivers/dax/bus.c
AgeCommit message (Collapse)Author
2020-04-02dax: Move mandatory ->zero_page_range() check in alloc_dax()Vivek Goyal
zero_page_range() dax operation is mandatory for dax devices. Right now that check happens in dax_zero_page_range() function. Dan thinks that's too late and its better to do the check earlier in alloc_dax(). I also modified alloc_dax() to return pointer with error code in it in case of failure. Right now it returns NULL and caller assumes failure happened due to -ENOMEM. But with this ->zero_page_range() check, I need to return -EINVAL instead. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-12-01Merge tag 'libnvdimm-for-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The highlight this cycle is continuing integration fixes for PowerPC and some resulting optimizations. Summary: - Updates to better support vmalloc space restrictions on PowerPC platforms. - Cleanups to move common sysfs attributes to core 'struct device_type' objects. - Export the 'target_node' attribute (the effective numa node if pmem is marked online) for regions and namespaces. - Miscellaneous fixups and optimizations" * tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) MAINTAINERS: Remove Keith from NVDIMM maintainers libnvdimm: Export the target_node attribute for regions and namespaces dax: Add numa_node to the default device-dax attributes libnvdimm: Simplify root read-only definition for the 'resource' attribute dax: Simplify root read-only definition for the 'resource' attribute dax: Create a dax device_type libnvdimm: Move nvdimm_bus_attribute_group to device_type libnvdimm: Move nvdimm_attribute_group to device_type libnvdimm: Move nd_mapping_attribute_group to device_type libnvdimm: Move nd_region_attribute_group to device_type libnvdimm: Move nd_numa_attribute_group to device_type libnvdimm: Move nd_device_attribute_group to device_type libnvdimm: Move region attribute group definition libnvdimm: Move attribute groups to device type libnvdimm: Remove prototypes for nonexistent functions libnvdimm/btt: fix variable 'rc' set but not used libnvdimm/pmem: Delete include of nd-core.h libnvdimm/namespace: Differentiate between probe mapping and runtime mapping libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe libnvdimm: Trivial comment fix ...
2019-11-19dax: Add numa_node to the default device-dax attributesDan Williams
It is confusing that device-dax instances publish a 'target_node' attribute, but not a 'numa_node'. The 'numa_node' information is available elsewhere in the sysfs device hierarchy, but it is not obvious and not reliable from one device-dax instance-type (e.g. child devices of nvdimm namespaces) to the next (e.g. 'hmem' devices defined by EFI Specific Purpose Memory and the ACPI HMAT). Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309906102.1582359.4262088001244476001.stgit@dwillia2-desk3.amr.corp.intel.com
2019-11-19dax: Simplify root read-only definition for the 'resource' attributeDan Williams
Rather than update the permission in ->is_visible() set the permission directly at declaration time. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309904959.1582359.7281180042781955506.stgit@dwillia2-desk3.amr.corp.intel.com
2019-11-19dax: Create a dax device_typeDan Williams
Move the open coded release method and attribute groups to a 'struct device_type' instance. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309904365.1582359.5451327195246651379.stgit@dwillia2-desk3.amr.corp.intel.com
2019-11-07dax: Fix alloc_dax_region() compile warningDan Williams
PFN flags are (unsigned long long), fix the alloc_dax_region() calling convention to fix warnings of the form: >> include/linux/pfn_t.h:18:17: warning: large integer implicitly truncated to unsigned type [-Woverflow] #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3)) Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-18Merge tag 'dax-for-5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax updates from Dan Williams: "The fruits of a bug hunt in the fsdax implementation with Willy and a small feature update for device-dax: - Fix a hang condition that started triggering after the Xarray conversion of fsdax in the v4.20 kernel. - Add a 'resource' (root-only physical base address) sysfs attribute to device-dax instances to correlate memory-blocks onlined via the kmem driver with a given device instance" * tag 'dax-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix missed wakeup with PMD faults device-dax: Add a 'resource' attribute
2019-07-05libnvdimm: add dax_dev sync flagPankaj Gupta
This patch adds 'DAXDEV_SYNC' flag which is set for nd_region doing synchronous flush. This later is used to disable MAP_SYNC functionality for ext4 & xfs filesystem for devices don't support synchronous flush. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-06-20device-dax: Add a 'resource' attributeVishal Verma
device-dax based devices were missing a 'resource' attribute to indicate the physical address range contributed by the device in question. This information is desirable to userspace tooling that may want to use the dax device as system-ram, and wants to selectively hotplug and online the memory blocks associated with a given device. Without this, the tooling would have to parse /proc/iomem for the memory ranges contributed by dax devices, which can be a workaround, but it is far easier to provide this information in the sysfs hierarchy. Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-27device-dax: Add a 'modalias' attribute to DAX 'bus' devicesVishal Verma
Add a 'modalias' attribute to devices under the DAX bus so that userspace is able to dynamically load modules as needed. Normally, udev can get the modalias from 'uevent', and that is correctly set up by the DAX bus. However other tooling such as 'libndctl' for interacting with drivers/nvdimm/, and 'libdaxctl' for drivers/dax/ can also use the modalias to dynamically load modules via libkmod lookups. The 'nd' bus set up by the libnvdimm subsystem exports a modalias attribute. Imitate this to export the same for the 'dax' bus. Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-20device-dax: Add a 'target_node' attributeDan Williams
The target-node attribute is the Linux numa-node that a device-dax instance may create when it is online. Prior to being online the device's 'numa_node' property reflects the closest online cpu node which is the typical expectation of a device 'numa_node'. Once it is online it becomes its own distinct numa node, i.e. 'target_node'. Export the 'target_node' property to give userspace tooling the ability to predict the effective numa-node from a device-dax instance configured to provide 'System RAM' capacity. Cc: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-24device-dax: Auto-bind device after successful new_idDan Williams
The typical 'new_id' attribute behavior is to immediately attach a device to its driver after a new device-id is added. Implement this behavior for the dax bus. Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reported-by: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06acpi/nfit, device-dax: Identify differentiated memory with a unique numa-nodeDan Williams
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind process *on* the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06device-dax: Add /sys/class/dax backwards compatibilityDan Williams
On the expectation that some environments may not upgrade libdaxctl (userspace component that depends on the /sys/class/dax hierarchy), provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat driver implements the original /sys/class/dax sysfs layout rather than /sys/bus/dax. When userspace is upgraded it can blacklist this module and switch to the dax_pmem driver going forward. CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according to the dax_pmem entry in Documentation/ABI/obsolete/. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06device-dax: Add support for a dax override driverDan Williams
Introduce the 'new_id' concept for enabling a custom device-driver attach policy for dax-bus drivers. The intended use is to have a mechanism for hot-plugging device-dax ranges into the page allocator on-demand. With this in place the default policy of using device-dax for performance differentiated memory can be overridden by user-space policy that can arrange for the memory range to be managed as 'System RAM' with user-defined NUMA and other performance attributes. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06device-dax: Move resource pinning+mapping into the common driverDan Williams
Move the responsibility of calling devm_request_resource() and devm_memremap_pages() into the common device-dax driver. This is another preparatory step to allowing an alternate personality driver for a device-dax range. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06device-dax: Introduce bus + driver modelDan Williams
In support of multiple device-dax instances per device-dax-region and allowing the 'kmem' driver to attach to dax-instances instead of the current device-node access, convert the dax sub-system from a class to a bus. Recall that the kmem driver takes reserved / special purpose memories and assigns them to be managed by the core-mm. Aside from the fact the device-dax instances are registered and probed on a bus, two other lifetime-management changes are made: 1/ Delay attaching a cdev until driver probe time 2/ A new run_dax() helper is introduced to allow restoring dax-operation after a kill_dax() event. So, at driver ->probe() time we run_dax() and at ->remove() time we kill_dax() and invalidate all mappings. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06device-dax: Start defining a dax bus modelDan Williams
Towards eliminating the dax_class, move the dax-device-attribute enabling to a new bus.c file in the core. The amount of code thrash of sub-sequent patches is reduced as no logic changes are made, just pure code movement. A temporary export of unregister_dex_dax() and dax_attribute_groups is needed to preserve compilation, but those symbols become static again in a follow-on patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com>