Age | Commit message (Collapse) | Author |
|
When moving from manual to automatic power management mode in GOYA, the
driver didn't correctly place the device in LOW power mode. As a result, if
an application was run immediately after the move, it would have run with
low frequencies.
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
If a GAUDI device is present in the system, display an error message that
it is not supported by the current kernel.
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Add print upon clock slow down due to power consumption or overheating.
In addition, add print when back to optimal clock.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Use specific values in enum of register map to be able to deprecate old
values.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Sparse reports a warning at goya_hw_queues_unlock()
warning: context imbalance in goya_hw_queues_unlock() - unexpected unlock
The root cause is a missing annotation at goya_hw_queues_unlock()
Add the missing __releases(&goya->hw_queues_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Sparse reports a warning at goya_hw_queues_lock()
warning: context imbalance in goya_hw_queues_lock() - wrong count at exit
The root cause is a missing annotation at goya_hw_queues_lock()
Add the missing __acquires(&goya->hw_queues_lock) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The "parse_cnt" variable is incremented while validating the CS chunks,
but it is actually not being used.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Add support for hwmon_in_highest, hwmon_temp_highest and hwmon_curr_highest
attributes. These attributes retrieve the historical maximum voltage,
temperature and current that were sampled, respectively.
Signed-off-by: Christine Gharzuzi <cgharzuzi@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The hl read and write routines implement the hwmon_ops read and write
interface routines respectively.
These routines are expected to return a completion status when called,
which was not the case until this commit.
This commit modifies these routines to return 0 upon success and a
negative error value upon failure.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
This commit adds support for offsetting the temperatures reading
by a specified value as defined in
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
using the standard sysfs defined for hwmon.
This is required by system administrators to inject errors to test
their monitoring applications in data centers.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The compute engines can perform millions of transactions per second. If
there is a bug in the S/W stack, we could get a lot of interrupts and spam
the kernel log. Therefore, ratelimit these prints
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Allow debug user to write/read 64-bit data through debugfs.
This will expedite the dump process of the (large) internal
memories of the device done during debug.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
DRAM_PHYS_BASE is already taken into account in MMU_PAGE_TABLES_ADDR.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
There is an extra ; after the end of a function, which needs to be removed
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
CS with no chunks for execute phase is invalid, so its
context_switch/restore phase should not be run.
Hence, move the check of the execute chunks number to the beginning of
hl_cs_ioctl().
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
As HL_MAX_JOBS_PER_CS is 512, it is possible that more than 255 CS jobs
will be submitted for a certain queue. Hence, modify the
"jobs_in_queue_cnt" parameter of the "hl_cs" structure to be u16 instead
of u8.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Host memory may be allocated with huge pages.
A different virtual range may be used for mapping in this case.
Add Huge PCI MMU (HPMMU) properties to support it.
This patch is a prerequisite for future ASICs support and has no effect on
Goya ASIC as currently a single virtual host range is used for all page
sizes.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
When no patched command buffer (CB) is created, use the user CB size as
the job size.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Optimize hl_mmu_map and hl_mmu_unmap by not calling flush(ctx)
within per-page loop.
Signed-off-by: Pawel Piskorski <ppiskorski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
During device memory memset, the driver allocates and use a CB (command
buffer). To reuse existing code, it keeps a pointer to the CB in two
variables, user_cb and patched_cb. Therefore, there is no need to "put"
both the user_cb and patched_cb, as it will cause an underflow of the
refcnt of the CB.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
During hard reset we must not write to the device.
Hence avoid halting CoreSight during user context close if it is done
during hard reset.
In addition, we must not re-enable clock gating afterwards as it was
deliberately disabled in the beginning of the hard reset flow.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The driver must halt the engines before doing hard-reset, otherwise the
device can go into undefined state. There is a place where the driver
didn't do that and this patch fixes it.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/misc/habanalabs/goya/goya.c: In function goya_pldm_init_cpu:
drivers/misc/habanalabs/goya/goya.c:2195:6: warning: variable val set but not used [-Wunused-but-set-variable]
drivers/misc/habanalabs/goya/goya.c: In function goya_hw_init:
drivers/misc/habanalabs/goya/goya.c:2505:6: warning: variable val set but not used [-Wunused-but-set-variable]
Fixes: 9494a8dd8d22 ("habanalabs: add h/w queues module")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
In case a user submits a CS, and the submission fails, and the user doesn't
check the return value and instead use the error return value as a valid
sequence number of a CS and ask to wait on it, the driver will print an
error and return an error code for that wait.
The real problem happens if now the user ignores the error of the wait, and
try to wait again and again. This can lead to a flood of error messages
from the driver and even soft lockup event.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
Prevent accesses to the device (register read/write) from debugfs entries
during reset as that can cause the device to get stuck.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
During hard-reset, there can be multiple events received from the H/W. For
each event, the driver opens a worker thread to handle it. For some of the
events, the driver will read/write registers in the code that handles the
event.
In case of hard-reset, we must prevent reads/writes to the registers during
the reset operation because the device might get stuck if that happens.
Therefore, flush the EQ workers before resetting the device (in hard-reset
only). Additional events won't arrive as we synced and disabled the
interrupts.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
In the hl_device_reset we ask about the hard_reset argument when we want to
differentiate between soft and hard reset, except for three places where
we use "from_hard_reset_thread". Replace one of those locations with the
hard_reset argument as it is guaranteed that if we reached to that
line in the code during hard_reset, it is from a kernel thread.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
Expose both soft and hard reset counts via INFO IOCTL.
This will allow system management applications to easily check
if the device has undergone reset.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Instead of doing if inside if, just write them with && operator.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
Make the code more concise and maintainable by using defines for the F/W
files.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
Successful device initialization is mentioned in kernel log with the
message "Successfully added device to habanalabs driver". There is no point
of spamming the log with additional messages about successful queue
testing, which are implied by the above mentioned message.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
Now that the VA block free list is not updated on context close in order
to optimize this flow, no need in the sanity checks of the list contents
as these will fail for sure.
In addition, remove the "context closing with VA in use" print during hard
reset as this situation is a side effect of the failure that caused the
hard reset.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Reduce context close time by performing MMU cache invalidation once at the
end of the unmap loop rather in each iteration, in order to avoid hard
reset with open contexts.
Reset with open contexts can potentially lead to a kernel crash as the
generic pool of the MMU hops is destroyed while it is not empty because
some unmap operations are not done.
The commit affect mainly when running on simulator.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Reduce context close time by skipping the VA block free list update in
order to avoid hard reset with open contexts.
Reset with open contexts can potentially lead to a kernel crash as the
generic pool of the MMU hops is destroyed while it is not empty because
some unmap operations are not done.
The commit affect mainly when running on simulator.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Reduce context close time by skipping hash table lookup if possible in
order to avoid hard reset with open contexts.
Reset with open contexts can potentially lead to a kernel crash as the
generic pool of the MMU hops is destroyed while it is not empty because
some unmap operations are not done.
This commit affect mainly when running on simulator.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
During hard reset we should not access the device except of necessary
reset operations because the device might be stuck or unresponsive.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Split the properties used for MMU mappings to DRAM and PCI (host) types.
This is a prerequisite for future ASICs support.
Note that in Goya ASIC, the PMMU and DMMU are the same (except of page
sizes) as only one MMU mechanism is used for both of the mapping types.
Hence this patch should not have any effect on current behavior.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Some cosmetics around the MMU code to make it more self-explanatory.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Add the ability to invalidate the necessary MMU cache only.
This ability is a prerequisite for future ASICs support.
Note that in Goya ASIC, a single cache is used for both host/DRAM
mappings and hence this patch should not have any effect on current
behavior.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Some of the functions in the memory module code were too long and/or
contained multiple operations that are not always done together. Re-factor
the code by dividing those functions to smaller functions which are more
readable and maintainable.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The two defines that control the maximum size of a command buffer and the
maximum number of JOBS per CS need to be exported to the user as they are
part of the API towards user-space.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
If the queues are full and we return -EAGAIN to the user, there is no need
to print an error, as that case isn't an error and the user is expected to
re-submit the work.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
In training, there is a need for a large amount of patching to the recipe.
This results in many command buffers contains a lot of DMA packets. The
number of command buffers per CS is larger than the current maximum of 64,
which is an arbitrary number that is enough for inference, but it has no
real affect on the code and/or resources of the host machine.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
ETR should always be non-secured as it is used by the users to record
profiling/trace data.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
We have a single ETR block in the SOC, so use explicit register
name defines for initializing this block. This makes it more readable and
maintainable.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
Move the read of the F/W boot versions before exiting on possible failures
of the F/W boot. This will help debug boot failures as we will be able to
know the F/W boot version.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
To enable userspace processes, e.g. management utilities, to display the
card name to the user, add the card name property to the HW_IP
structure that is copied to the user in the INFO IOCTL.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/misc/habanalabs/goya/goya.c: In function 'goya_init_mme_cmdq':
drivers/misc/habanalabs/goya/goya.c:1536:6: warning:
variable 'qman_base_addr' set but not used [-Wunused-but-set-variable]
It is never used, so can be removed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Add a new opcode to the INFO IOCTL to allow the user application to
retrieve the ASIC's current and maximum clock rate. The rate is
returned in MHz.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
Reduce latency to memory during TPC kernel execution.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|