Age | Commit message (Collapse) | Author |
|
update files to latest version from F/W team.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As part of the securing GAUDI, the F/W will configure the PCI iATU
regions. If the driver identifies a secured PCI ID, it will know to
skip iATU configuration in a very early stage.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As F/ security indication must be available before driver approaches
PCI bus, F/W security should be derived from PCI id rather than be
fetched during boot handshake with F/W.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
DRAM scrubbing can take time hence it adds to latency during allocation.
To minimize latency during initialization, scrubbing is moved to release
call.
In case scrubbing fails it means the device is in a bad state,
hence HARD reset is initiated.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Clearing QM errors by the driver will prevent these H/W blocks from
stopping in case they are configured to stop on errors, so perform this
clearing only if this mode is not in use.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit.
On boot-up, the driver will therefore fail inserting the device.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Prefer the use of strscpy when copying the ASIC name into a char array,
to prevent accidentally exceeding the array's length.
In addition, strlcpy is frowned upon so replace it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
When trying to debug program, the user often needs to
dump large parts of the device's DRAM, which can reach to tens of GBs.
Because reading from the device's internal memory through the PCI BAR
is extremely slow, the debug can take hours.
Instead, we can provide the user to copy data through one of the DMA
engines. This will make the operation much faster.
Currently, only GAUDI is supported.
In GAUDI, we need to find a PCI DMA engine that is IDLE and set the
DMA as secured to be able to bypass our MMU as we currently don't
map the temporary buffer to the MMU.
Example bash one-line to dump entire HBM to file (~2 minutes):
for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \
printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \
echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \
sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Since we moved the SOB reset flow to workqueue and
not part of the fence release flow, we might reach a
scenario where new context is created while we in the middle
of resetting the SOB.
in such cases the reset may fail due to idle check.
This will mess up the streams sync since the SOB value is invalid.
so we protect this area with a mutex, to delay context creation.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is a need to allow to user to send command submissions with
custom timeout as some CS take longer than the max timeout that is
used by default.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The new approach is based on the notion that the relative
current power consumption is in relation of proportionality
to device's true utilization.
Utilization info ranges between [0,100]%
Currently, dc_power values are hard-coded.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to use minimum of hard coded values common to LKD and F/W
a dynamic method to work with PLLs is introduced in this patch.
Formerly asic specific PLL numbering is now common for all asics.
To be backward compatible a bit in dev status is defined, if the bit is
not set LKD will keep working with old PLL numbering.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to shorten the time cs lock is being held, we move any
possible work outside of the cs lock.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add a little sleep between page unmappings in case mapping of
large number of host pages failed, in order to
avoid soft lockup bug during the rollback.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Update with latest version from the Firmware team.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Unsecure relevant registers as TPC engine need access to
TPC status.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The device can get into deadlock in case it use indirect mode for MSI
interrupts (multi-msi) and have hard-reset during interrupt storm.
To prevent that, always use direct mode which means single-msi mode.
The F/W will prevent the host from writing to the indirect MSI
registers to prevent any malicious user from causing this scenario.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In case the BMC of the devices' box wants to initiate a reset of
a specific device, it must go through driver.
Once driver will receive the request it will initiate a hard reset
flow.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to have a better debuggability we allow debugfs access
to user mmu mapped host memory. Non-user host memory access will be
rejected.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
fixed the following coccicheck:
./drivers/misc/habanalabs/common/sysfs.c:347:60-61: WARNING opportunity
for kobj_to_dev()
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Update to the latest version of the file as supplied by the F/W.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
if reset is due to heartbeat, device CPU is no responsive in which
case no point sending PCI disable message to it.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As there are incorrect assumptions in which some of the
initialization and data path flows cannot sleep, most allocations
are being done using GFP_ATOMIC.
We modify the code to use GFP_ATOMIC only when realy needed, as
sleepable flow should use GFP_KERNEL.
In addition add a fallback to allocate memory using GFP_KERNEL,
once ATOMIC allocation fails.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Update to the latest definition of the firmware
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add driver implementation for reading the current power from the device
CPU F/W.
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Improve "vm" debugfs node to print also the virtual addresses which are
currently mapped to HW blocks in the device.
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
For simplicity, use a single bringup flag indicating which FW
binaries should loaded to device.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Timeout in wait for interrupt is in 32-bit variable so we need to use
the correct maximum value to compare.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to support command submissions from user space, the driver
need to add support for user interrupt completions. The driver will
allow multiple user threads to wait for an interrupt and perform
a comparison with a given user address once interrupt expires.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to support user interrupts, driver must enable all MSI-X
interrupts for any case user will trigger them. We differentiate
between a valid user interrupt and a non valid one.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
As the F/wW is the first to detect out of sync event, a new event is
added to notify the driver on such event. In which case the driver
performs hard reset.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Because our graph contains network operations, we need to account
for delay in the network.
5 seconds timeout per CS is not enough to account for that.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Notify to the user that although he closed the FD, the device is
still in use because there are live CS and/or memory mappings (mmaps).
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Move the field to correct location in structure and remove comment.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
After any reset (soft or hard) the device (the engines/QMANs) should
be idle. If they are not idle, fail the reset. If it is soft-reset,
the driver will try to do hard-reset automatically. If it is hard-reset,
the driver will make the device non-operational.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The device is actually released only after the refcnt of the hpriv
structure is 0, which means all its contexts were closed.
If we reset the device while a context is still open, there are
possibilities for unexpected behavior and crashes. For example, if the
process has a mapping of a register block that is now currently being
reset, and the process writes/reads to that block during the reset,
the device can get stuck.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In order to support command submissions that are done directly from
user space, the driver must perform soft reset once user closes its FD.
In case the soft reset fails or device is not idle, a hard reset should
be performed.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
currently we support only 2 asids in all asics.
asid 0 for driver, and asic 1 for user.
no need to setup 1024 asids configurations at init phase.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add Synopsys DesignWare xData IP driver. This driver enables/disables
the PCI traffic generator module pertain to the Synopsys DesignWare
prototype.
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/daa1efe23850e77d6807dc3f371728fc0b7548b8.1617016509.git.gustavo.pimentel@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
KMSAN complains that vmci_check_host_caps() left the payload part of
check_msg uninitialized.
=====================================================
BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G B 5.11.0-rc7+ #4
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
Call Trace:
dump_stack+0x21c/0x280
kmsan_report+0xfb/0x1e0
kmsan_internal_check_memory+0x202/0x520
kmsan_check_memory+0xd/0x10
iowrite8_rep+0x86/0x380
vmci_guest_probe_device+0xf0b/0x1e70
pci_device_probe+0xab3/0xe70
really_probe+0xd16/0x24d0
driver_probe_device+0x29d/0x3a0
device_driver_attach+0x25a/0x490
__driver_attach+0x78c/0x840
bus_for_each_dev+0x210/0x340
driver_attach+0x89/0xb0
bus_add_driver+0x677/0xc40
driver_register+0x485/0x8e0
__pci_register_driver+0x1ff/0x350
vmci_guest_init+0x3e/0x41
vmci_drv_init+0x1d6/0x43f
do_one_initcall+0x39c/0x9a0
do_initcall_level+0x1d7/0x259
do_initcalls+0x127/0x1cb
do_basic_setup+0x33/0x36
kernel_init_freeable+0x29a/0x3ed
kernel_init+0x1f/0x840
ret_from_fork+0x1f/0x30
Uninit was created at:
kmsan_internal_poison_shadow+0x5c/0xf0
kmsan_slab_alloc+0x8d/0xe0
kmem_cache_alloc+0x84f/0xe30
vmci_guest_probe_device+0xd11/0x1e70
pci_device_probe+0xab3/0xe70
really_probe+0xd16/0x24d0
driver_probe_device+0x29d/0x3a0
device_driver_attach+0x25a/0x490
__driver_attach+0x78c/0x840
bus_for_each_dev+0x210/0x340
driver_attach+0x89/0xb0
bus_add_driver+0x677/0xc40
driver_register+0x485/0x8e0
__pci_register_driver+0x1ff/0x350
vmci_guest_init+0x3e/0x41
vmci_drv_init+0x1d6/0x43f
do_one_initcall+0x39c/0x9a0
do_initcall_level+0x1d7/0x259
do_initcalls+0x127/0x1cb
do_basic_setup+0x33/0x36
kernel_init_freeable+0x29a/0x3ed
kernel_init+0x1f/0x840
ret_from_fork+0x1f/0x30
Bytes 28-31 of 36 are uninitialized
Memory access of size 36 starts at ffff8881675e5f00
=====================================================
Fixes: 1f166439917b69d3 ("VMCI: guest side driver implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20210402121742.3917-2-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
KMSAN complains that the vmci_use_ppn64() == false path in
vmci_dbell_register_notification_bitmap() left upper 32bits of
bitmap_set_msg.bitmap_ppn64 member uninitialized.
=====================================================
BUG: KMSAN: uninit-value in kmsan_check_memory+0xd/0x10
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc7+ #4
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
Call Trace:
dump_stack+0x21c/0x280
kmsan_report+0xfb/0x1e0
kmsan_internal_check_memory+0x484/0x520
kmsan_check_memory+0xd/0x10
iowrite8_rep+0x86/0x380
vmci_send_datagram+0x150/0x280
vmci_dbell_register_notification_bitmap+0x133/0x1e0
vmci_guest_probe_device+0xcab/0x1e70
pci_device_probe+0xab3/0xe70
really_probe+0xd16/0x24d0
driver_probe_device+0x29d/0x3a0
device_driver_attach+0x25a/0x490
__driver_attach+0x78c/0x840
bus_for_each_dev+0x210/0x340
driver_attach+0x89/0xb0
bus_add_driver+0x677/0xc40
driver_register+0x485/0x8e0
__pci_register_driver+0x1ff/0x350
vmci_guest_init+0x3e/0x41
vmci_drv_init+0x1d6/0x43f
do_one_initcall+0x39c/0x9a0
do_initcall_level+0x1d7/0x259
do_initcalls+0x127/0x1cb
do_basic_setup+0x33/0x36
kernel_init_freeable+0x29a/0x3ed
kernel_init+0x1f/0x840
ret_from_fork+0x1f/0x30
Local variable ----bitmap_set_msg@vmci_dbell_register_notification_bitmap created at:
vmci_dbell_register_notification_bitmap+0x50/0x1e0
vmci_dbell_register_notification_bitmap+0x50/0x1e0
Bytes 28-31 of 32 are uninitialized
Memory access of size 32 starts at ffff88810098f570
=====================================================
Fixes: 83e2ec765be03e8a ("VMCI: doorbell implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20210402121742.3917-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need the char/misc fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix sparse warnings:
drivers/misc/pvpanic/pvpanic.c:28:18: warning:
symbol 'pvpanic_list' was not declared. Should it be static?
drivers/misc/pvpanic/pvpanic.c:29:12: warning:
symbol 'pvpanic_lock' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20210331121706.15268-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In case of error, the function pci_iomap() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Link: https://lore.kernel.org/r/20210330013659.916-1-linqiheng@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for pvpanic PCI device added in qemu [1]. At probe time, obtain the
address where to read/write pvpanic events and pass it to the generic handling
code. Will follow the same logic as pvpanic MMIO device driver. At remove time,
unmap base address and disable PCI device.
[1] https://github.com/qemu/qemu/commit/9df52f58e76e904fb141b10318362d718f470db2
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/1616597356-20696-4-git-send-email-mihai.carabas@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Create the mecahism that allows multiple pvpanic instances to call
pvpanic_probe and receive panic events. A global list will retain all the
mapped addresses where to write panic events.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/1616597356-20696-3-git-send-email-mihai.carabas@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Split-up generic and platform dependent code in order to be able to re-use
generic event handling code in pvpanic PCI device driver in the next patches.
The code from pvpanic.c was split in two new files:
- pvpanic.c: generic code that handles pvpanic events
- pvpanic-mmio.c: platform/bus dependent code
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/1616597356-20696-2-git-send-email-mihai.carabas@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
delete unneeded variable initialization.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Link: https://lore.kernel.org/r/1616749747-3882-1-git-send-email-yekai13@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently kgdbts can get stuck waiting for do_sys_open() to be called
in some of the current tests. This is because C compilers often
automatically inline this function, which is a very thin wrapper around
do_sys_openat2(), into some of its callers. gcc-10 does this on (at least)
both x86 and arm64.
We can fix the test suite by placing the breakpoints on do_sys_openat2()
instead since that isn't (currently) inlined. However do_sys_openat2() is
a static function so we cannot simply use an addressof. Since we are
testing debug machinery it is acceptable to use kallsyms to lookup a
suitable address because this is more or less what kdb does in the same
circumstances. Re-implement lookup_addr() to be based on kallsyms rather
than function pointers.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210325094807.3546702-1-daniel.thompson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|