drm/i915: clarify concurrent hang detect/gpu reset consistency

Damien Lespiau wondered how race the gpu reset/hang detection code is against concurrent gpu resets/hang detections or combinations thereof. Luckily the single work item is guranteed to never run concurrently, so reset handling is already single-threaded. Hence we only have to worry about concurrent hang detections, or a hang detection firing off while we're still processing an older gpu reset request. Due to the new mechanism of setting the reset in progress flag and the ordering guaranteed by the schedule_work function there's nothing to do but add a comment explaining why we're safe. The only thing I've noticed is that we still try to reset the gpu now, even when it is declared terminally wedged. Add a check for that to avoid continous warnings about failed resets, in case the hangcheck timer ever gets stuck. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
author: Daniel Vetter <daniel.vetter@ffwll.ch> 2012-12-06 16:23:37 +0100
committer: Daniel Vetter <daniel.vetter@ffwll.ch> 2013-01-21 20:14:59 +0100
commit: 7db0ba242b3a7ddcfc1450bc50eb3102c68f0244 (patch)
tree: ba81f75c7f1906dc9e3bae130a33cc4fb007a2cb /drivers
parent: f69061bedd6ea63f271fe97914364def2f33fc6b (diff)
1 files changed, 13 insertions, 2 deletions
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f833f2c155f8..943db1001cfd 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -875,9 +875,20 @@ static void i915_error_work_func(struct work_struct *work)
 
 	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
 
-	if (i915_reset_in_progress(error)) {
+	/*
+	 * Note that there's only one work item which does gpu resets, so we
+	 * need not worry about concurrent gpu resets potentially incrementing
+	 * error->reset_counter twice. We only need to take care of another
+	 * racing irq/hangcheck declaring the gpu dead for a second time. A
+	 * quick check for that is good enough: schedule_work ensures the
+	 * correct ordering between hang detection and this work item, and since
+	 * the reset in-progress bit is only ever set by code outside of this
+	 * work we don't need to worry about any other races.
+	 */
+	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
-		kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, reset_event);
+		kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE,
+				   reset_event);
 
 		ret = i915_reset(dev);
author	Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-06 16:23:37 +0100
committer	Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-21 20:14:59 +0100
commit	7db0ba242b3a7ddcfc1450bc50eb3102c68f0244 (patch)
tree	ba81f75c7f1906dc9e3bae130a33cc4fb007a2cb /drivers
parent	f69061bedd6ea63f271fe97914364def2f33fc6b (diff)