summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_vma.h
AgeCommit message (Collapse)Author
2017-08-24drm/i915: Ignore duplicate VMA stored within the per-object handle LUTChris Wilson
By using drm_gem_flink/drm_gem_open on an object using the same fd, it is possible for a client to create multiple handles pointing to the same object (tied to the same contexts and VMA), as exemplified by igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible since forever, we cannot assume that the handle:(fpriv, object) is unique and so must handle the multiple users of a single VMA. v2: Added commentary noise. Testcase: igt/gem_close Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355 Fixes: d1b48c1e7184 ("drm/i915: Replace execbuf vma ht with an idr") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk Tested-by: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: MichaƂ Winiarski <michal.winiarski@intel.com>
2017-08-18drm/i915: Replace execbuf vma ht with an idrChris Wilson
This was the competing idea long ago, but it was only with the rewrite of the idr as an radixtree and using the radixtree directly ourselves, along with the realisation that we can store the vma directly in the radixtree and only need a list for the reverse mapping, that made the patch performant enough to displace using a hashtable. Though the vma ht is fast and doesn't require any extra allocation (as we can embed the node inside the vma), it does require a thread for resizing and serialization and will have the occasional slow lookup. That is hairy enough to investigate alternatives and favour them if equivalent in peak performance. One advantage of allocating an indirection entry is that we can support a single shared bo between many clients, something that was done on a first-come first-serve basis for shared GGTT vma previously. To offset the extra allocations, we create yet another kmem_cache for them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
2017-08-18drm/i915: Convert execbuf to use struct-of-array packing for critical fieldsChris Wilson
When userspace is doing most of the work, avoiding relocs (using NO_RELOC) and opting out of implicit synchronisation (using ASYNC), we still spend a lot of time processing the arrays in execbuf, even though we now should have nothing to do most of the time. One issue that becomes readily apparent in profiling anv is that iterating over the large execobj[] is unfriendly to the loop prefetchers of the CPU and it much prefers iterating over a pair of arrays rather than one big array. v2: Clear vma[] on construction to handle errors during vma lookup Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-3-chris@chris-wilson.co.uk
2017-07-27drm/i915: Remove assertion from raw __i915_vma_unpin()Chris Wilson
After we detect a i915_vma pin overflow, we call __i915_vma_unpin to cleanup. However, on an overflow the pin_count bitfield will be zero, triggering an assertion, even though we the intention is to merely warn and report the error back to the user (as historically the culprit has be a leak in the display code). Fixes: 20dfbde463c8 ("drm/i915: Wrap vma->pin_count accessors with small inline helpers") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-06-16drm/i915: Stash a pointer to the obj's resv in the vmaChris Wilson
During execbuf, a mandatory step is that we add this request (this fence) to each object's reservation_object. Inside execbuf, we track the vma, and to add the fence to the reservation_object then means having to first chase the obj, incurring another cache miss. We can reduce the number of cache misses by stashing a pointer to the reservation_object in the vma itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
2017-06-16drm/i915: Eliminate lots of iterations over the execobjects arrayChris Wilson
The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we lookup up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this speed up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16drm/i915: Store a direct lookup from object handle to vmaChris Wilson
The advent of full-ppgtt lead to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hastable on demand in a background task and serialize it before iterating. In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients. v2: Prettier names, more magic. v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16drm/i915: Make i915_vma_destroy() staticChris Wilson
i915_vma_destroy() is now not used outside of i915_vma.c so we can remove the export and make the function static. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616123508.12673-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
2017-06-15drm/i915: Split vma exec_link/evict_linkChris Wilson
Currently the vma has one link member that is used for both holding its place in the execbuf reservation list, and in any eviction list. This dual property is quite tricky and error prone. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-3-chris@chris-wilson.co.uk
2017-02-13drm/i915: Exercise i915_vma_pin/i915_vma_insertChris Wilson
High-level testing of the struct drm_mm by verifying our handling of weird requests to i915_vma_pin. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-35-chris@chris-wilson.co.uk
2017-01-19drm/i915: Remove i915_gem_object_to_ggtt()Chris Wilson
With the last user of this convenience wrapper gone, we can kill the wrapper and in the process make the lookup function static. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-5-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-19drm/i915: Remove i915_vma_create from VMA APIChris Wilson
With the introduce of i915_vma_instance() for obtaining the VMA singleton for a (obj, vm, view) tuple, we can remove the i915_vma_create() in favour of a single entry point. We do incur a lookup onto an empty tree, but the i915_vma_create() were being called infrequently and during initialisation, so the small overhead is negligible. v2: Drop the i915_ prefix from the now static vma_create() function Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk
2017-01-19drm/i915: Rename some warts in the VMA APIChris Wilson
Whilst writing testcases to exercise the VMA API, some oddities came to light, such as i915_gem_obj_lookup_or_create(). Joonas suggested i915_vma_instance() as a neat replacement, so rename them, move them to i915_vma.c and add some kerneldoc as a sugary bonus. s/i915_gem_obj_to_vma/i915_vma_lookup/ s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/ Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
2017-01-14drm/i915: Convert i915_ggtt_view to use an anonymous unionChris Wilson
Reading the ggtt_views is much more pleasant without the extra characters from specifying the union (i.e. ggtt_view.partial rather than ggtt_view.params.partial). To make this work inside i915_vma_compare() with only a single memcmp requires us to ensure that there are no uninitialised bytes within each branch of the union (we make sure the structs are packed) and we need to store the size of each branch. v4: Rewrite changelog and add comments explaining the assert. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-5-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-01-14drm/i915: Compact memcmp in i915_vma_compare()Chris Wilson
In preparation for the next patch to convert to using an anonymous union and leaving the excess bytes in the union uninitialised, we first need to make sure we do not compare using those uninitialised bytes. We also want to preserve the compactness of the code, avoiding a second call to memcmp or introducing a switch, so we take advantage of using the type as an encoded size (as well as a unique identifier for each type of view). v2: Add the rationale for why we encode size into ggtt_view.type as a comment before the memcmp() v3: Use a switch to also assert that no two i915_ggtt_view_type have the same value. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-3-chris@chris-wilson.co.uk
2017-01-13drm/i915: Assert that we have allocated the drm_mm_node upon pinningChris Wilson
We currently check after the slow path that the vma is bound correctly, but we don't currently check after the fast path. This is important in case we accidentally take the fast path and leave the vma misplaced. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170111210937.29252-27-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10drm/i915: Store required fence size/alignment for GGTT vmaChris Wilson
The fence size/alignment is a combination of the vma size plus object tiling parameters. Those parameters are rarely changed, making the fence size/alignemnt roughly constant for the lifetime of the VMA. We can simplify subsequent calculations by precalculating the size/alignment required for GGTT vma taking fencing into account (with an update if we do change the tiling or stride). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk
2016-12-15drm/i915: Optimise VMA lookup slightlyTvrtko Ursulin
Cast VM pointers before substraction to save the compiler doing a smart one which includes multiplication. v2: Only keep the first optimisation and prettify it. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1481639847-9214-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-11-29drm/i915: Convert vm->dev backpointer to vm->i915Chris Wilson
99% of the time we access i915_address_space->dev we want the i915 device and not the drm device, so let's store the drm_i915_private backpointer instead. The only real complication here are the inlines in i915_vma.h where drm_i915_private is not yet defined and so we have to choose an alternate path for our asserts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-18drm/i915: Move frontbuffer CS write tracking from ggtt vma to objectChris Wilson
I tried to avoid having to track the write for every VMA by only tracking writes to the ggtt. However, for the purposes of frontbuffer tracking this is insufficient as we need to invalidate around writes not just to the the ggtt but all aliased ppgtt views of the framebuffer. By moving the critical section to the object and only doing so for framebuffer writes we can reduce the tracking even further by only watching framebuffers and not vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-14drm/i915: Fix test on inputs for vma_compare()Chris Wilson
When supplying a view to vma_compare() it is required that the supplied i915_address_space is the global GTT. I tested the VMA instead (which is the current position in the rbtree and maybe from any address space). (This reapplies commit a44342acde30 ("drm/i915: Fix test on inputs for vma_compare()") as it was lost in the vma split) Reported-by: Matthew Auld <matthew.auld@intel.com> Tested-by: Matthew Auld <matthew.auld@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98579 Fixes: db6c2b4151f2 ("drm/i915: Store the vma in an rbtree...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161103200852.23431-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Fixes: b42fe9ca0a1e ("drm/i915: Split out i915_vma.c") Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-11drm/i915: Split out i915_vma.cJoonas Lahtinen
As a side product, had to split two other files; - i915_gem_fence_reg.h - i915_gem_object.h (only parts that needed immediate untanglement) I tried to move code in as big chunks as possible, to make review easier. i915_vma_compare was moved to a header temporarily. v2: - Use i915_gem_fence_reg.{c,h} v3: - Rebased v4: - Fix building when DEBUG_GEM is enabled by reordering a bit. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com