Age | Commit message (Collapse) | Author |
|
Add the virtual function PCI device id.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Frank.Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix sparse warning:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:1846:6:
warning: symbol 'deallocate_hiq_sdma_mqd' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c: In function restore_process_worker:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:949:29: warning:
variable pdd set but not used [-Wunused-but-set-variable]
It is not used since
commit 5b87245faf57 ("drm/amdkfd: Simplify kfd2kgd interface")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The name field in node topology has not been used. We re-purpose it to
hold the asic name, which can be queried by user space applications
through sysfs.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The amdgpu_task_info will be used when printing VM page fault for KFD
processes.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanatha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Linux 5.3-rc3
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Following bitmap layout logic introduced by:
"drm/amdgpu: support get_cu_info for Arcturus".
v2: squash in fixup for gfx_v9_0.c (Alex)
v3: squash in debug print output fix
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VCC moved out of user SGPR allocation in gfx10. It's now stored
in SGPRs 106-107.
Also fixes incorrect SGPR read offsets.
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
These moved from SGPRs in gfx9 to HWREG in gfx10.
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Copy/paste error, first 4 VGPRs are separated by 64 dwords (256 bytes).
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Previously submitted code was taken from an incorrect branch and
was non-functional.
Cc: Oak Zeng <oak.zeng@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-By: Oak Zeng <oak.zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If the trap is entered due to MODE.DEBUG_EN=1 and SAVECTX is raised
concurrently the handler cannot identify the source of the exception.
This causes the debugger to lose single step exception notification
when a context save request arrives at the same time.
When MODE.DEBUG_EN=1 and STATUS.HALT=0 (exception not already handled)
jump to the second-level trap handler upon entering the trap. The
second-level trap will set STATUS.HALT=1 and return to the shader.
If SAVECTX was raised then control flow will return to the trap, which
will then handle the context save request.
Cc: Tony Tye <tony.tye@amd.com>
Cc: Laurent Morichetti <laurent.morichetti@amd.com>
Cc: Qingchuan Shi <qingchuan.shi@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When a wavefront raises TRAPSTS.XNACK_ERROR with STATUS.ALLOW_REPLAY=0
subsequent memory instructions have undefined behavior. In practice
SQC stores continue to work but TCP stores do not.
Context save is permitted to fail after XNACK error because the
wavefront will be halted and subsequently terminated. However the
debugger has an interest in retrieving the wavefront VGPR/LDS state.
Detect the out-of-spec case and use SQC stores during context save
in place of TCP stores.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, this patch silences
the following warning:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c: In function ‘mqd_manager_init_v10’:
./include/linux/dynamic_debug.h:122:52: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __dynamic_func_call(id, fmt, func, ...) do { \
^
./include/linux/dynamic_debug.h:143:2: note: in expansion of macro ‘__dynamic_func_call’
__dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~
./include/linux/dynamic_debug.h:153:2: note: in expansion of macro ‘_dynamic_func_call’
_dynamic_func_call(fmt, __dynamic_pr_debug, \
^~~~~~~~~~~~~~~~~~
./include/linux/printk.h:336:2: note: in expansion of macro ‘dynamic_pr_debug’
dynamic_pr_debug(fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c:432:3: note: in expansion of macro ‘pr_debug’
pr_debug("%s@%i\n", __func__, __LINE__);
^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c:433:2: note: here
case KFD_MQD_TYPE_COMPUTE:
^~~~
by removing the call to pr_debug() in KFD_MQD_TYPE_CP:
"The mqd init for CP and COMPUTE will have the same routine." [1]
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
[1] https://lore.kernel.org/lkml/c735a1cc-a545-50fb-44e7-c0ad93ee8ee7@amd.com/
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
Add missing break statement in order to prevent the code from falling
through to case CHIP_NAVI10.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: 14328aa58ce5 ("drm/amdkfd: Add navi10 support to amdkfd. (v3)")
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
Pull drm fixes from Daniel Vetter:
"Dave is back in shape, but now family got it so I'm doing the pull.
Two things worthy of note:
- nouveau feature pull was way too late, Dave&me decided to not take
that, so Ben spun up a pull with just the fixes.
- after some chatting with the arm display maintainers we decided to
change a bit how that's maintained, for more oversight/review and
cross vendor collab.
More details below:
nouveau:
- bugfixes
- TU116 enabling (minor iteration) :w
amdgpu:
- large pile of fixes for new hw support this release (navi, vega20)
- audio hotplug fix
- bunch of corner cases and small fixes all over for amdgpu/kfd
komeda:
- back out some new properties (from this merge window) that needs
more pondering.
bochs:
- fb pitch setup
core:
- a new panel quirk
- misc fixes"
* tag 'drm-next-2019-07-19' of git://anongit.freedesktop.org/drm/drm: (73 commits)
drm/nouveau/secboot/gp102-: remove WAR for SEC2 RTOS start bug
drm/nouveau/flcn/gp102-: improve implementation of bind_context() on SEC2/GSP
drm/nouveau: fix memory leak in nouveau_conn_reset()
drm/nouveau/dmem: missing mutex_lock in error path
drm/nouveau/hwmon: return EINVAL if the GPU is powered down for sensors reads
drm/nouveau: fix bogus GPL-2 license header
drm/nouveau: fix bogus GPL-2 license header
drm/nouveau/i2c: Enable i2c pads & busses during preinit
drm/nouveau/disp/tu102-: wire up scdc parameter setter
drm/nouveau/core: recognise TU116 chipset
drm/nouveau/kms: disallow dual-link harder if hdmi connection detected
drm/nouveau/disp/nv50-: fix center/aspect-corrected scaling
drm/nouveau/disp/nv50-: force scaler for any non-default LVDS/eDP modes
drm/nouveau/mcp89/mmu: Use mcp77_mmu_new instead of g84_mmu_new on MCP89.
drm/amd/display: init res_pool dccg_ref, dchub_ref with xtalin_freq
drm/amdgpu/pm: remove check for pp funcs in freq sysfs handlers
drm/amd/display: Force uclk to max for every state
drm/amdkfd: Remove GWS from process during uninit
drm/amd/amdgpu: Fix offset for vmid selection in debugfs interface
drm/amd/powerplay: update vega20 driver if to fit latest SMU firmware
...
|
|
GPU cache info (part of virtual CRAT) size depends on CU number.
For arcturus, CU number has been increased. So the required memory
for vcrat also increases.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
ACC VGPRs are a secondary VGPR set of same size as the primary VGPRs.
Save them as a block immediately following VGPRs.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add pci device ids.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
CWSR (compute wave save/restore) is used for
preempting compute queues.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
some sdma engines are optimized for xgmi on arcturus.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Handle interrupts for second mmhub.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In the original formula, when sdma queue number is 64,
the left shift overflows. Use an equivalence that won't
overflow.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Arcturus has 8 sdma engines
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add initial support for ARCTURUS to kfd.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Extend map_process and set_resources pm4 packet to support
bigger gds size for arcturus.
v2: Only make the change for v9
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Extend map_queue and unmap_queue PM4 packets to support 8
SDMA engines. The new format is backward compatible.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If we shut down a process without having destroyed its GWS-using
queues, it is possible that GWS BO will still be in the process
BO list during the gpuvm destruction. This list should be empty
at that time, so we should remove the GWS allocation at the
process uninit point if it is still around.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Apply the same setting to SH_MEM_CONFIG and VM_CONTEXT1_CNTL. This
makes the noretry param no longer KFD-specific. On GFX10 I'm not
changing SH_MEM_CONFIG in this commit because GFX10 has different
retry behaviour in the SQ and I don't have a way to test it at the
moment.
Suggested-by: Christian König <Christian.Koenig@amd.com>
CC: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by : Shaoyun.liu < Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pull drm updates from Dave Airlie:
"The biggest thing in this is the AMD Navi GPU support, this again
contains a bunch of header files that are large. These are the new AMD
RX5700 GPUs that just recently became available.
New drivers:
- ST-Ericsson MCDE driver
- Ingenic JZ47xx SoC
UAPI change:
- HDR source metadata property
Core:
- HDR inforframes and EDID parsing
- drm hdmi infoframe unpacking
- remove prime sg_table caching into dma-buf
- New gem vram helpers to reduce driver code
- Lots of drmP.h removal
- reservation fencing fix
- documentation updates
- drm_fb_helper_connector removed
- mode name command handler rewrite
fbcon:
- Remove the fbcon notifiers
ttm:
- forward progress fixes
dma-buf:
- make mmap call optional
- debugfs refcount fixes
- dma-fence free with pending signals fix
- each dma-buf gets an inode
Panels:
- Lots of additional panel bindings
amdgpu:
- initial navi10 support
- avoid hw reset
- HDR metadata support
- new thermal sensors for vega asics
- RAS fixes
- use HMM rather than MMU notifier
- xgmi topology via kfd
- SR-IOV fixes
- driver reload fixes
- DC use a core bpc attribute
- Aux fixes for DC
- Bandwidth calc updates for DC
- Clock handling refactor
- kfd VEGAM support
vmwgfx:
- Coherent memory support changes
i915:
- HDR Support
- HDMI i2c link
- Icelake multi-segmented gamma support
- GuC firmware update
- Mule Creek Canyon PCH support for EHL
- EHL platform updtes
- move i915.alpha_support to i915.force_probe
- runtime PM refactoring
- VBT parsing refactoring
- DSI fixes
- struct mutex dependency reduction
- GEM code reorg
mali-dp:
- Komeda driver features
msm:
- dsi vs EPROBE_DEFER fixes
- msm8998 snapdragon 835 support
- a540 gpu support
- mdp5 and dpu interconnect support
exynos:
- drmP.h removal
tegra:
- misc fixes
tda998x:
- audio support improvements
- pixel repeated mode support
- quantisation range handling corrections
- HDMI vendor info fix
armada:
- interlace support fix
- overlay/video plane register handling refactor
- add gamma support
rockchip:
- RX3328 support
panfrost:
- expose perf counters via hidden ioctls
vkms:
- enumerate CRC sources list
ast:
- rework BO handling
mgag200:
- rework BO handling
dw-hdmi:
- suspend/resume support
rcar-du:
- R8A774A1 Soc Support
- LVDS dual-link mode support
- Additional formats
- Misc fixes
omapdrm:
- DSI command mode display support
stm
- fb modifier support
- runtime PM support
sun4i:
- use vmap ops
vc4:
- binner bo binding rework
v3d:
- compute shader support
- resync/sync fixes
- job management refactoring
lima:
- NULL pointer in irq handler fix
- scheduler default timeout
virtio:
- fence seqno support
- trace events
bochs:
- misc fixes
tc458767:
- IRQ/HDP handling
sii902x:
- HDMI audio support
atmel-hlcdc:
- misc fixes
meson:
- zpos support"
* tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm: (1815 commits)
Revert "Merge branch 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux into drm-next"
Revert "mm: adjust apply_to_pfn_range interface for dropped token."
mm: adjust apply_to_pfn_range interface for dropped token.
drm/amdgpu/navi10: add uclk activity sensor
drm/amdgpu: properly guard the generic discovery code
drm/amdgpu: add missing documentation on new module parameters
drm/amdgpu: don't invalidate caches in RELEASE_MEM, only do the writeback
drm/amd/display: avoid 64-bit division
drm/amdgpu/psp11: simplify the ucode register logic
drm/amdgpu: properly guard DC support in navi code
drm/amd/powerplay: vega20: fix uninitialized variable use
drm/amd/display: dcn20: include linux/delay.h
amdgpu: make pmu support optional
drm/amd/powerplay: Zero initialize current_rpm in vega20_get_fan_speed_percent
drm/amd/powerplay: Zero initialize freq in smu_v11_0_get_current_clk_freq
drm/amd/powerplay: Use memset to initialize metrics structs
drm/amdgpu/mes10.1: Fix header guard
drm/amd/powerplay: add temperature sensor support for navi10
drm/amdgpu: fix scheduler timeout calc
drm/amdgpu: Prepare for hmm_range_register API change (v2)
...
|
|
The cp hang occurs in OCL conformance test only on supermicro
platform which has 40 cores and the test generates 40 threads.
The root cause is race condition in non-protected flags.
The fix is to add flags of is_evicted and is_active(init_mqd())
into protected area.
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This works around difficult-to-reproduce soft hangs on oversubscribed
runlists.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
New flag to disable an idle runlist optimization that is causing soft
hangs with some diffult-to-reproduce customer workloads. This will
serve as a workaround until the problem can be reproduced and the
root-cause determined.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Oversubscription of queues or processes results in poor performance
mostly because HWS blinbly schedules busy and idle queues, resulting
in poor occupancy if many queues are idle.
Let users know with a warning message when transitioning from a
non-oversubscribed to an oversubscribed runlist.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Just for cleanup.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Since amdgpu has always requested PCIE atomics, kfd don't
need duplicated PCIE atomics enablement. Referring to amdgpu
request result is enough.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In XGMI configuration, more than one asic can be reset at same time,
kfd is able to handle this and no need to trigger the warning
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Previous kfd doesn't use gws so this mask was set to 0.
Set it to 64 bit 1s because now kfd can use all 64 gws
resources.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This makes boot uniformly boottime and tai uniformly clocktai, to
address the remaining oversights.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com
|
|
KFD (kernel fusion driver) is the kernel driver
for the compute backend for usermode compute
stack.
v2: squash in updates (Alex)
v3: squash in rebase fixes (Alex)
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a folder structure to /sys/class/kfd/kfd/ called proc which contains
subfolders, each representing an active KFD process' PID, containing 1
file: pasid.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA queue allocation requires the dqm lock as it modify
the global dqm members. Enclose it in the dqm_lock.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The idea to break the circular lock dependency is to temporarily drop
dqm lock before calling allocate_mqd. See callstack #1 below.
[ 59.510149] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
[ 513.604034] ======================================================
[ 513.604205] WARNING: possible circular locking dependency detected
[ 513.604375] 4.18.0-kfd-root #2 Tainted: G W
[ 513.604530] ------------------------------------------------------
[ 513.604699] kswapd0/611 is trying to acquire lock:
[ 513.604840] 00000000d254022e (&dqm->lock_hidden){+.+.}, at: evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.605150]
but task is already holding lock:
[ 513.605307] 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
[ 513.605540]
which lock already depends on the new lock.
[ 513.605747]
the existing dependency chain (in reverse order) is:
[ 513.605944]
-> #4 (&anon_vma->rwsem){++++}:
[ 513.606106] __vma_adjust+0x147/0x7f0
[ 513.606231] __split_vma+0x179/0x190
[ 513.606353] mprotect_fixup+0x217/0x260
[ 513.606553] do_mprotect_pkey+0x211/0x380
[ 513.606752] __x64_sys_mprotect+0x1b/0x20
[ 513.606954] do_syscall_64+0x50/0x1a0
[ 513.607149] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 513.607380]
-> #3 (&mapping->i_mmap_rwsem){++++}:
[ 513.607678] rmap_walk_file+0x1f0/0x280
[ 513.607887] page_referenced+0xdd/0x180
[ 513.608081] shrink_page_list+0x853/0xcb0
[ 513.608279] shrink_inactive_list+0x33b/0x700
[ 513.608483] shrink_node_memcg+0x37a/0x7f0
[ 513.608682] shrink_node+0xd8/0x490
[ 513.608869] balance_pgdat+0x18b/0x3b0
[ 513.609062] kswapd+0x203/0x5c0
[ 513.609241] kthread+0x100/0x140
[ 513.609420] ret_from_fork+0x24/0x30
[ 513.609607]
-> #2 (fs_reclaim){+.+.}:
[ 513.609883] kmem_cache_alloc_trace+0x34/0x2e0
[ 513.610093] reservation_object_reserve_shared+0x139/0x300
[ 513.610326] ttm_bo_init_reserved+0x291/0x480 [ttm]
[ 513.610567] amdgpu_bo_do_create+0x1d2/0x650 [amdgpu]
[ 513.610811] amdgpu_bo_create+0x40/0x1f0 [amdgpu]
[ 513.611041] amdgpu_bo_create_reserved+0x249/0x2d0 [amdgpu]
[ 513.611290] amdgpu_bo_create_kernel+0x12/0x70 [amdgpu]
[ 513.611584] amdgpu_ttm_init+0x2cb/0x560 [amdgpu]
[ 513.611823] gmc_v9_0_sw_init+0x400/0x750 [amdgpu]
[ 513.612491] amdgpu_device_init+0x14eb/0x1990 [amdgpu]
[ 513.612730] amdgpu_driver_load_kms+0x78/0x290 [amdgpu]
[ 513.612958] drm_dev_register+0x111/0x1a0
[ 513.613171] amdgpu_pci_probe+0x11c/0x1e0 [amdgpu]
[ 513.613389] local_pci_probe+0x3f/0x90
[ 513.613581] pci_device_probe+0x102/0x1c0
[ 513.613779] driver_probe_device+0x2a7/0x480
[ 513.613984] __driver_attach+0x10a/0x110
[ 513.614179] bus_for_each_dev+0x67/0xc0
[ 513.614372] bus_add_driver+0x1eb/0x260
[ 513.614565] driver_register+0x5b/0xe0
[ 513.614756] do_one_initcall+0xac/0x357
[ 513.614952] do_init_module+0x5b/0x213
[ 513.615145] load_module+0x2542/0x2d30
[ 513.615337] __do_sys_finit_module+0xd2/0x100
[ 513.615541] do_syscall_64+0x50/0x1a0
[ 513.615731] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 513.615963]
-> #1 (reservation_ww_class_mutex){+.+.}:
[ 513.616293] amdgpu_amdkfd_alloc_gtt_mem+0xcf/0x2c0 [amdgpu]
[ 513.616554] init_mqd+0x223/0x260 [amdgpu]
[ 513.616779] create_queue_nocpsch+0x4d9/0x600 [amdgpu]
[ 513.617031] pqm_create_queue+0x37c/0x520 [amdgpu]
[ 513.617270] kfd_ioctl_create_queue+0x2f9/0x650 [amdgpu]
[ 513.617522] kfd_ioctl+0x202/0x350 [amdgpu]
[ 513.617724] do_vfs_ioctl+0x9f/0x6c0
[ 513.617914] ksys_ioctl+0x66/0x70
[ 513.618095] __x64_sys_ioctl+0x16/0x20
[ 513.618286] do_syscall_64+0x50/0x1a0
[ 513.618476] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 513.618695]
-> #0 (&dqm->lock_hidden){+.+.}:
[ 513.618984] __mutex_lock+0x98/0x970
[ 513.619197] evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.619459] kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
[ 513.619710] kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
[ 513.620103] amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
[ 513.620363] amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
[ 513.620614] __mmu_notifier_invalidate_range_start+0x70/0xb0
[ 513.620851] try_to_unmap_one+0x7fc/0x8f0
[ 513.621049] rmap_walk_anon+0x121/0x290
[ 513.621242] try_to_unmap+0x93/0xf0
[ 513.621428] shrink_page_list+0x606/0xcb0
[ 513.621625] shrink_inactive_list+0x33b/0x700
[ 513.621835] shrink_node_memcg+0x37a/0x7f0
[ 513.622034] shrink_node+0xd8/0x490
[ 513.622219] balance_pgdat+0x18b/0x3b0
[ 513.622410] kswapd+0x203/0x5c0
[ 513.622589] kthread+0x100/0x140
[ 513.622769] ret_from_fork+0x24/0x30
[ 513.622957]
other info that might help us debug this:
[ 513.623354] Chain exists of:
&dqm->lock_hidden --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem
[ 513.623900] Possible unsafe locking scenario:
[ 513.624189] CPU0 CPU1
[ 513.624397] ---- ----
[ 513.624594] lock(&anon_vma->rwsem);
[ 513.624771] lock(&mapping->i_mmap_rwsem);
[ 513.625020] lock(&anon_vma->rwsem);
[ 513.625253] lock(&dqm->lock_hidden);
[ 513.625433]
*** DEADLOCK ***
[ 513.625783] 3 locks held by kswapd0/611:
[ 513.625967] #0: 00000000f14edf84 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
[ 513.626309] #1: 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
[ 513.626671] #2: 0000000067b5cd12 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x5/0xb0
[ 513.627037]
stack backtrace:
[ 513.627292] CPU: 0 PID: 611 Comm: kswapd0 Tainted: G W 4.18.0-kfd-root #2
[ 513.627632] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 513.627990] Call Trace:
[ 513.628143] dump_stack+0x7c/0xbb
[ 513.628315] print_circular_bug.isra.37+0x21b/0x228
[ 513.628581] __lock_acquire+0xf7d/0x1470
[ 513.628782] ? unwind_next_frame+0x6c/0x4f0
[ 513.628974] ? lock_acquire+0xec/0x1e0
[ 513.629154] lock_acquire+0xec/0x1e0
[ 513.629357] ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.629587] __mutex_lock+0x98/0x970
[ 513.629790] ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.630047] ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.630309] ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.630562] evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
[ 513.630816] kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
[ 513.631057] kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
[ 513.631288] amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
[ 513.631536] amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
[ 513.632076] __mmu_notifier_invalidate_range_start+0x70/0xb0
[ 513.632299] try_to_unmap_one+0x7fc/0x8f0
[ 513.632487] ? page_lock_anon_vma_read+0x68/0x250
[ 513.632690] rmap_walk_anon+0x121/0x290
[ 513.632875] try_to_unmap+0x93/0xf0
[ 513.633050] ? page_remove_rmap+0x330/0x330
[ 513.633239] ? rcu_read_unlock+0x60/0x60
[ 513.633422] ? page_get_anon_vma+0x160/0x160
[ 513.633613] shrink_page_list+0x606/0xcb0
[ 513.633800] shrink_inactive_list+0x33b/0x700
[ 513.633997] shrink_node_memcg+0x37a/0x7f0
[ 513.634186] ? shrink_node+0xd8/0x490
[ 513.634363] shrink_node+0xd8/0x490
[ 513.634537] balance_pgdat+0x18b/0x3b0
[ 513.634718] kswapd+0x203/0x5c0
[ 513.634887] ? wait_woken+0xb0/0xb0
[ 513.635062] kthread+0x100/0x140
[ 513.635231] ? balance_pgdat+0x3b0/0x3b0
[ 513.635414] ? kthread_delayed_work_timer_fn+0x80/0x80
[ 513.635626] ret_from_fork+0x24/0x30
[ 513.636042] Evicting PASID 32768 queues
[ 513.936236] Restoring PASID 32768 queues
[ 524.708912] Evicting PASID 32768 queues
[ 524.999875] Restoring PASID 32768 queues
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit 06b89b38f3cc518a761164f9f958a9607bbb3587.
This fix is not proper. allocate_mqd can't be moved before
allocate_sdma_queue as it depends on q->properties->sdma_id
set in later.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit f77dac6cd62e5d4bcadd740620af1218bfb54cc6.
This fix is not proper. allocate_mqd can't be moved before
allocate_sdma_queue as it depends on q->properties->sdma_id
set in later.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We can't have devices that are not completely initialized in kfd topology.
Otherwise it is a race condition when user access not completely
initialized device. This also addresses a kfd_topology_add_device accessing
NULL dqm pointer issue.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move HSA_CAP_ATS_PRESENT initialization logic from kfd iommu codes to
kfd topology codes. This removes kfd_iommu_device_init's dependency
on kfd_topology_add_device. Also remove duplicate code setting the
same.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA queue allocation requires the dqm lock at it modify
the global dqm members. Move up the dqm_lock so sdma
queue allocation is enclosed in the critical section. Move
mqd allocation out of critical section to avoid circular
lock dependency.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|