mirror of
https://github.com/Takiiiiiiii/strato.git
synced 2025-07-17 08:46:39 +00:00
Introduce chunked MegaBuffer allocation
After the introduction of workahead, a system to hold a single large megabuffer per submission was implemented. This worked fine for most cases; however, when many submissions were in flight at the same time, memory usage would increase dramatically due to the number of megabuffers needed. Since only one megabuffer was allowed per execution, the buffer was forced to be fairly large in order to accommodate the upper bound, increasing memory usage even further. This commit fixes the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation basis; if all chunks are in use by the GPU, another chunk will be allocated, which can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4 and SMO by even more.
This commit is contained in:
@ -7,8 +7,6 @@
|
||||
#include "buffer.h"
|
||||
|
||||
namespace skyline::gpu {
|
||||
class MegaBuffer;
|
||||
|
||||
/**
|
||||
* @brief The Buffer Manager is responsible for maintaining a global view of buffers being mapped from the guest to the host, any lookups and creation of host buffer from equivalent guest buffer alongside reconciliation of any overlaps with existing textures
|
||||
*/
|
||||
@ -23,22 +21,6 @@ namespace skyline::gpu {
|
||||
static constexpr size_t L2EntryGranularity{19}; //!< The amount of AS (in bytes) a single L2 PTE covers (512 KiB == 1 << 19)
|
||||
SegmentTable<Buffer*, AddressSpaceSize, PageSizeBits, L2EntryGranularity> bufferTable; //!< A page table of all buffer mappings for O(1) lookups on full matches
|
||||
|
||||
std::mutex megaBufferMutex; //!< Synchronizes access to the allocated megabuffers
|
||||
|
||||
friend class MegaBuffer;
|
||||
|
||||
/**
 * @brief A wrapper around a buffer which can be utilized as backing storage for a megabuffer and can track its state to avoid concurrent usage
 * @note NOTE(review): `active` appears to guard a slot against being handed to more than one MegaBuffer at a time — confirm against BufferManager::AcquireMegaBuffer
 */
struct MegaBufferSlot {
    std::atomic_flag active{true}; //!< If the megabuffer is currently being utilized, we want to construct a buffer as active
    std::shared_ptr<FenceCycle> cycle; //!< The latest cycle on the fence, all waits must be performed through this

    memory::Buffer backing; //!< The GPU buffer as the backing storage for the megabuffer

    MegaBufferSlot(GPU &gpu); //!< NOTE(review): presumably allocates `backing` on the supplied GPU with an implementation-chosen size — confirm in the .cpp
};
|
||||
|
||||
/**
|
||||
* @brief A wrapper around a Buffer which locks it with the specified ContextTag
|
||||
*/
|
||||
@ -86,8 +68,6 @@ namespace skyline::gpu {
|
||||
static bool BufferLessThan(const std::shared_ptr<Buffer> &it, u8 *pointer);
|
||||
|
||||
public:
|
||||
std::list<MegaBufferSlot> megaBuffers; //!< A pool of all allocated megabuffers, these are dynamically utilized
|
||||
|
||||
BufferManager(GPU &gpu);
|
||||
|
||||
/**
|
||||
@ -114,56 +94,5 @@ namespace skyline::gpu {
|
||||
* @note The buffer manager **must** be locked prior to calling this
|
||||
*/
|
||||
BufferView FindOrCreate(GuestBuffer guestMapping, ContextTag tag = {}, const std::function<void(std::shared_ptr<Buffer>, ContextLock<Buffer> &&)> &attachBuffer = {});
|
||||
|
||||
/**
|
||||
* @return A dynamically allocated megabuffer which can be used to store buffer modifications allowing them to be replayed in-sequence on the GPU
|
||||
* @note This object **must** be destroyed to be reclaimed by the manager and prevent a memory leak
|
||||
* @note The buffer manager **doesn't** need to be locked prior to calling this
|
||||
*/
|
||||
MegaBuffer AcquireMegaBuffer(const std::shared_ptr<FenceCycle> &cycle);
|
||||
};
|
||||
|
||||
/**
 * @brief A simple linearly allocated GPU-side buffer used to temporarily store buffer modifications allowing them to be replayed in-sequence on the GPU
 * @note This class is **not** thread-safe and any calls must be externally synchronized
 */
class MegaBuffer {
  private:
    BufferManager::MegaBufferSlot *slot; //!< The slot providing the backing storage and fence state for this megabuffer
    span<u8> freeRegion; //!< The unallocated space in the megabuffer

  public:
    MegaBuffer(BufferManager::MegaBufferSlot &slot);

    ~MegaBuffer(); //!< NOTE(review): presumably returns the slot to the manager (clearing `slot->active`) so it can be reused — confirm in the implementation

    /**
     * @note NOTE(review): a move-assignment operator is declared without a matching move constructor — confirm this asymmetry is intentional
     */
    MegaBuffer &operator=(MegaBuffer &&other);

    /**
     * @return If any allocations into the megabuffer were done at the time of the call
     */
    bool WasUsed();

    /**
     * @brief Replaces the cycle associated with the underlying megabuffer with the supplied cycle
     * @note The megabuffer must **NOT** have any dependencies that aren't conveyed by the supplied cycle
     */
    void ReplaceCycle(const std::shared_ptr<FenceCycle> &cycle);

    /**
     * @brief Resets the free region of the megabuffer to its initial state, data is left intact but may be overwritten
     */
    void Reset();

    /**
     * @brief Returns the underlying Vulkan buffer for the megabuffer
     */
    vk::Buffer GetBacking() const;

    /**
     * @brief Pushes data to the megabuffer and returns the offset at which it was written
     * @param pageAlign Whether the pushed data should be page aligned in the megabuffer
     */
    vk::DeviceSize Push(span<u8> data, bool pageAlign = false);
};
|
||||
}
|
||||
|
Reference in New Issue
Block a user