Introduce chunked MegaBuffer allocation

After the introduction of workahead, a system that holds a single large megabuffer per submission was implemented. This worked fine for most cases, however when many submissions were in flight at the same time, memory usage would increase dramatically due to the number of megabuffers needed. Since only one megabuffer was allowed per execution, the buffer had to be fairly large in order to accommodate the upper bound, increasing memory usage even further. This commit fixes the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reused across executions. Allocations now go through a global allocator object, which chooses the chunk to allocate into on a per-allocation basis. If all chunks are in use by the GPU, another chunk is allocated, which can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4, and SMO's by even more.
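
The chunk selection described above can be illustrated with a minimal sketch. It is not the code from this commit: `Fence`, `Chunk`, `ChunkedAllocator` and all of their members are hypothetical names, the `signalled` flag stands in for polling a real GPU fence, and alignment, locking and the Vulkan buffers themselves are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for a GPU fence: set once the GPU has finished the submission that used a chunk
struct Fence {
    bool signalled{false};
};

// Hypothetical chunk: a fixed-size region that is bump-allocated and tied to one submission at a time
struct Chunk {
    std::size_t capacity;
    std::size_t used{0};
    std::shared_ptr<Fence> fence; // Submission that last allocated from this chunk, if any

    explicit Chunk(std::size_t capacity) : capacity{capacity} {}

    // Returns the offset of the new allocation, or nullopt if the chunk is busy or full
    std::optional<std::size_t> TryAllocate(const std::shared_ptr<Fence> &current, std::size_t size) {
        if (fence && fence != current) {
            if (!fence->signalled)
                return std::nullopt; // Still in use by an earlier submission
            used = 0; // The GPU is done with the chunk, so its space can be recycled
        }
        if (size > capacity - used)
            return std::nullopt; // Not enough space left in this chunk
        fence = current;
        std::size_t offset = used;
        used += size;
        return offset;
    }
};

// Hypothetical result type: which chunk was chosen and where in it the data lives
struct Allocation {
    std::size_t chunkIndex;
    std::size_t offset;
};

class ChunkedAllocator {
    std::vector<Chunk> chunks;
    std::size_t chunkSize;

  public:
    explicit ChunkedAllocator(std::size_t chunkSize) : chunkSize{chunkSize} {}

    // Picks the first chunk that can serve the request, growing the pool only when none can
    Allocation Allocate(const std::shared_ptr<Fence> &current, std::size_t size) {
        for (std::size_t index = 0; index < chunks.size(); index++)
            if (auto offset = chunks[index].TryAllocate(current, size))
                return {index, *offset};

        // Every chunk is busy or full: add one that is large enough for this request
        chunks.emplace_back(std::max(chunkSize, size));
        std::size_t offset = *chunks.back().TryAllocate(current, size);
        return {chunks.size() - 1, offset};
    }

    std::size_t ChunkCount() const { return chunks.size(); }
};
```

In this sketch, two allocations made for the same submission share a chunk, an allocation for a second submission made while the first is still in flight gets a new chunk, and once the first submission's fence is signalled its chunk is handed out again rather than the pool growing further.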
2025-07-17 08:46:39 +00:00 · 2022-08-07 02:59:33 +05:30
parent 99b5fc35c6
commit 5b7572a8b3
12 changed files with 218 additions and 183 deletions
--- a/app/src/main/cpp/skyline/gpu/buffer.h
+++ b/app/src/main/cpp/skyline/gpu/buffer.h
@@ -8,6 +8,7 @@
 #include <common/lockable_shared_ptr.h>
 #include <nce.h>
 #include <gpu/tag_allocator.h>
+#include "megabuffer.h"
 #include "memory_manager.h"

 namespace skyline::gpu {
@@ -60,7 +61,7 @@ namespace skyline::gpu {

            // These are not accounted for in hash nor operator== since they are not an inherent property of the view, but they are required nonetheless for megabuffering on a per-view basis
            mutable u64 lastAcquiredSequence{}; //!< The last sequence number for the attached buffer that the megabuffer copy of this view was acquired from, if this is equal to the current sequence of the attached buffer then the copy at `megabufferOffset` is still valid
-            mutable vk::DeviceSize megabufferOffset{}; //!< Offset of the current copy of the view in the megabuffer (if any), 0 if no copy exists and this is only valid if `lastAcquiredSequence` is equal to the current sequence of the attached buffer
+            mutable MegaBufferAllocator::Allocation megaBufferAllocation; //!< Allocation for the current copy of the view in the megabuffer (if any), 0 if no copy exists and this is only valid if `lastAcquiredSequence` is equal to the current sequence of the attached buffer

            BufferViewStorage(vk::DeviceSize offset, vk::DeviceSize size, vk::Format format);

@@ -421,10 +422,10 @@ namespace skyline::gpu {

        /**
         * @brief If megabuffering is beneficial for the current buffer, pushes its contents into the megabuffer and returns the offset of the pushed data
-         * @return The offset of the pushed buffer contents in the megabuffer, or 0 if megabuffering is not to be used
+         * @return The megabuffer allocation for the buffer, may be invalid if megabuffering is not beneficial
         * @note The view **must** be locked prior to calling this
         */
-        vk::DeviceSize AcquireMegaBuffer(MegaBuffer &megaBuffer) const;
+        MegaBufferAllocator::Allocation AcquireMegaBuffer(const std::shared_ptr<FenceCycle> &pCycle, MegaBufferAllocator &allocator) const;

        /**
         * @return A span of the backing buffer contents