Implement overhead-free sequenced buffer updates with megabuffers

Previously, constant buffer updates were handled on the CPU and only the end result was synced to the GPU before execute. This caused issues: if the constant buffer contents were changed between each draw in a renderpass (e.g. text rendering), every draw would only see the final resulting constant buffer.
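
As a rough illustration of the problem (a self-contained sketch, not Skyline code), consider draws recorded against a single CPU-side constant buffer that is only uploaded once before execute; every recorded draw ends up observing the final contents:

#include <array>
#include <cstdio>
#include <vector>

// Illustrative model of the old behaviour: guest writes update a CPU-side shadow buffer,
// draws are merely recorded, and a single upload to "GPU memory" happens right before
// execute, so every recorded draw reads the final contents.
int main() {
    std::array<float, 4> cpuShadow{};        // CPU-side constant buffer contents
    std::array<float, 4> gpuMemory{};        // what the GPU actually reads at execute time
    std::vector<std::size_t> recordedDraws;  // draw indices, recorded without any upload

    cpuShadow = {1.f, 0.f, 0.f, 1.f};        // guest writes red for glyph 0
    recordedDraws.push_back(0);

    cpuShadow = {0.f, 0.f, 1.f, 1.f};        // guest overwrites with blue for glyph 1
    recordedDraws.push_back(1);

    gpuMemory = cpuShadow;                   // single sync before execute: the red state is already lost

    for (auto draw : recordedDraws)          // both draws sample blue at execute time
        std::printf("draw %zu sees r=%.0f b=%.0f\n", draw, gpuMemory[0], gpuMemory[2]);
}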

We had earlier tried to fix this by using vkCmdUpdateBuffer, however this caused a significant performance loss due to an oversight in Adreno drivers. We could have worked around that by using vkCmdCopyBuffer instead, but there would still have been a performance loss since renderpasses would need to be split up to insert the copies in between.

To avoid this we introduce 'megabuffers', a new technique not found in any other Switch emulator. Rather than replaying the copies in sequence on the GPU, we take advantage of the fact that these buffers are generally small and replay the buffer contents themselves instead. Each write followed by a usage of a buffer pushes a copy of the buffer, with that write and all prior writes applied, into the megabuffer; by the start of execute the megabuffer therefore holds every used state of the buffer simultaneously. Draws then reference these individual states in sequence, allowing everything to work without any copies.

To support this, buffers have been moved to an immediate-sync model, with synchronisation performed at usage-time rather than at execute (to keep contents properly sequenced), and GPU-side writes now need to be explicitly marked (since they prevent megabuffering). A fallback path using vkCmdCopyBuffer exists for cases where buffers are too large or GPU-dirty.
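
The sketch below shows roughly how a draw-time buffer usage could choose between the megabuffer and the fallback path. It builds on the MegaBuffer::Push/GetBacking API added in this commit, but the AcquireBufferBinding helper, the gpuDirty flag, and the size cutoff are illustrative assumptions rather than the actual Buffer implementation:

#include <utility>
#include <vulkan/vulkan.hpp>
#include "buffer_manager.h" // assumed path; provides skyline::gpu::MegaBuffer and skyline::span as added by this commit

// Hedged sketch: pick a (backing, offset) pair for a draw's buffer binding
std::pair<vk::Buffer, vk::DeviceSize> AcquireBufferBinding(skyline::gpu::MegaBuffer &megaBuffer,
                                                           vk::Buffer backing,
                                                           skyline::span<skyline::u8> guestMirror,
                                                           bool gpuDirty) {
    constexpr size_t MaxMegaBufferedSize{0x10000}; // hypothetical cutoff for 'small enough' buffers

    if (!gpuDirty && guestMirror.size() <= MaxMegaBufferedSize) {
        // Push a snapshot of the buffer's current (sequenced) contents; every usage this
        // execute gets its own offset into the megabuffer, so no GPU-side copies are needed
        vk::DeviceSize offset{megaBuffer.Push(guestMirror, true /* pageAlign so the offset satisfies alignment requirements */)};
        return {megaBuffer.GetBacking(), offset};
    }

    // Fallback: GPU-dirty or oversized buffers keep using their real backing and the
    // vkCmdCopyBuffer path mentioned above
    return {backing, 0};
}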
Billy Laws
2022-04-23 18:10:39 +01:00
parent 0d9992cb8e
commit de796cd2cd
7 changed files with 363 additions and 59 deletions


@@ -6,7 +6,39 @@
#include "buffer_manager.h"
namespace skyline::gpu {
BufferManager::BufferManager(GPU &gpu) : gpu(gpu) {}
MegaBuffer::MegaBuffer(GPU &gpu) : backing(gpu.memory.AllocateBuffer(Size)), freeRegion(backing.subspan(PAGE_SIZE)) {}
void MegaBuffer::Reset() {
std::scoped_lock lock{mutex};
freeRegion = backing.subspan(PAGE_SIZE);
}
vk::Buffer MegaBuffer::GetBacking() const {
return backing.vkBuffer;
}
vk::DeviceSize MegaBuffer::Push(span<u8> data, bool pageAlign) {
std::scoped_lock lock{mutex};
if (data.size() > freeRegion.size())
throw exception("Ran out of megabuffer space! Alloc size: 0x{:X}", data.size());
if (pageAlign) {
// If page-aligned data was requested then align the free region to the next page boundary
auto alignedFreeBase{util::AlignUp(static_cast<size_t>(freeRegion.data() - backing.data()), PAGE_SIZE)};
freeRegion = backing.subspan(alignedFreeBase);
}
// Allocate space for data from the free region
auto resultSpan{freeRegion.subspan(0, data.size())};
resultSpan.copy_from(data);
// Move the free region along
freeRegion = freeRegion.subspan(data.size());
return static_cast<vk::DeviceSize>(resultSpan.data() - backing.data());
}
BufferManager::BufferManager(GPU &gpu) : gpu(gpu), megaBuffer(gpu) {}
bool BufferManager::BufferLessThan(const std::shared_ptr<Buffer> &it, u8 *pointer) {
return it->guest->begin().base() < pointer;
@@ -49,14 +81,10 @@ namespace skyline::gpu {
highestAddress = mapping.end().base();
}
auto newBuffer{std::make_shared<Buffer>(gpu, span<u8>(lowestAddress, highestAddress))};
auto newBuffer{std::make_shared<Buffer>(gpu, cycle, span<u8>(lowestAddress, highestAddress), overlaps)};
for (auto &overlap : overlaps) {
std::scoped_lock overlapLock{*overlap};
if (!overlap->cycle.owner_before(cycle))
overlap->WaitOnFence(); // We want to only wait on the fence cycle if it's not the current fence cycle
overlap->SynchronizeGuest(true, true); // Sync back the buffer before we destroy it
buffers.erase(std::find(buffers.begin(), buffers.end(), overlap));
// Transfer all views from the overlapping buffer to the new buffer, rebasing them onto the new buffer with an updated offset