Commit Graph

720 Commits

Author SHA1 Message Date
372ab8befa Allow external synchronization for buffers
In certain situations such as constant buffer updates, we desire to use the guest buffer as a shadow buffer forwarding all writes directly to it while we update the host using inline buffer updates so they happen in-sequence. This requires special behavior as we cannot let any synchronization operations take place as they would break the shadow buffer, as a result, an external synchronization flag has been added to prevent this from happening. 

It should be noted that this flag is not respected for buffer recreation which will lead to UB, this can and will break updates in certain cases and this change isn't complete without buffer manager support.
2022-04-16 18:44:53 +05:30
c0c4db68a8 Fix BufferView offset not being added in vkCmdUpdateBuffer
The offset of the view wasn't added to the `vkCmdUpdateBuffer`, this would cause the offset to be incorrect given the buffer was a view of a larger buffer that wasn't the start of it. This commit fixes that by adding the offset of the view to the buffer update.
2022-04-14 18:06:15 +05:30
a1c06e0401 Mark GPU resources as dirty before GPU usage
We didn't call `MarkGpuDirty` on textures/buffers prior to GPU usage, this would cause them to not be R/W protected when they should be and provide outdated copies if there were any read accesses from the CPU (which are not possible at the moment since we assume all accesses are writes at the moment). This has now been fixed by calling it after synchronizing the resource.
2022-04-14 17:20:05 +05:30
41a6afed01 Fix GraphicsContext code formatting for auto formatter 2022-04-14 15:27:22 +05:30
624df92616 Change AddNonGraphicsPass to AddOutsideRpCommand
The terminology "Non-Graphics pass" was deemed to be fairly inaccurate since it simply covered all Vulkan commands (not "passes") outside the render-pass scope, these may be graphical operations such as blits and therefore it is more accurate to use the new terminology of "Outside-RenderPass command" due to the lack of such an implication while being consistent with the Vulkan specification.
2022-04-14 15:20:22 +05:30
d79635772f Implement support for GPU-side constant buffer updating
Previously constant buffer updates would be handled on the CPU and only the end result would be synced to the GPU before execute. This caused issues as if the constant buffer contents was changed between each draw in a renderpass (e.g. text rendering) the draws themselves would only see the final resulting constant buffer. Fix this by updating cbufs on the GPU/CPU seperately, only ever syncing them back at the start or after a guest side CPU write, at the moment only a single word is updated at a time however this can be optimised in the future to batch all consecutive updates into one large one.
2022-04-14 14:14:52 +05:30
036faedabd Implement a way to run non-graphics passes with command executor
These commands will end the current renderpass and run on their own, this is useful for compute, blits etc.
2022-04-14 14:14:52 +05:30
feb179fcff Implement primitive restart support
Maxwell3D also supports using an arbitrary restart index value however no games are known to use this so leave it for now.
2022-04-14 14:14:52 +05:30
3f3acc31d8 Rework swizzle infrastructure to support arbritary format swizzles
This is required to support R4G4B4A4 which has no directly corresponding Vulkan format.

Co-authored-by: Lunar-Pixel <lunarn452@gmail.com>
2022-04-14 14:14:52 +05:30
6f85a66151 Implement host-only Buffers
We require certain buffers to only be on the host while being accessible through the same abstractions as a guest buffer as they must be interchangeable in usage.
2022-04-14 14:14:52 +05:30
2c697ec36a Determine depth/stencil texture aspect based off of image swizzle
Required since we can't have a non-rt image with both a depth/stencil aspect at the same time according to vk spec.
2022-04-14 14:14:52 +05:30
8eaca87de8 Use an empty host texture in place of invalid TIC entries on guest
Some games may pass empty TICs as inputs to shaders while not actually using them within the shader. Create an empty texture and pass this in instead when we hit this case, the nullDescriptor feature could be used but it's not supported by all devices so we chose to do it this way instead.
2022-04-14 14:14:52 +05:30
815f1f4067 Add support for sRGB TIC textures
Without this sRGB textures would be interpreted as RGB leading to colours being slighly off. The sRGB flag isn't stored as part of format word so we reuse the _pad_ field of it to store the flag for the switch case.
2022-04-14 14:14:52 +05:30
1ba4abf950 Add Astc{6x6,8x8} and R4G4B4A4 image formats 2022-04-14 14:14:52 +05:30
62ba180550 Use R5G6B5 as Vulkan swapchain format rather than B5G6R5
B5G6R5 isn't generally supported by the swapchain and the format is used for R5G6B5 with swapped R/B channels to avoid aliasing so we reverse that by using R5G6B5 as the underlying Vulkan format for the swapchain which should be automatically handled by the driver for any copies from B5G6R5 textures and the data representation should be the same as B5G6R5 with swapped R/B channels so not reporting the correct texture::Format should be fine.
2022-04-14 14:14:52 +05:30
3e4e8de1d2 Implement primitive Linear->Block Linear DMA engine copies
Slightly inaccurate and misses some features but good enough for most games, should be revisted later.
2022-04-14 14:14:52 +05:30
486a835d0a Use guest texture view type to determine the underlying image type
If we have a Nx1x1 image then determining the type from dimensions will result in a 1D image being created thus preventing us from creating a 2D view. By using the image view type we can avoid this for textures from TICs since we know in advance how they will be used
2022-04-14 14:14:52 +05:30
b1f10865a0 Attach depth RT to command executor before draws
This enforces that the depth RT outlives the draw, without this the depth RT could be freed while in active use by command executor leading to UAFs and crashes.
2022-04-14 14:14:52 +05:30
2e197cead5 Support D32S8_Float_Uint_Unorm_Unorm depth/stencil format 2022-04-14 14:14:52 +05:30
e5e20f39c9 Implement a simple constant buffer cache
In some games such as SMO thousands of constant buffers are bound per frame which was causing an unreasonable number of lookups in both vmm and the buffer manager. Work around this by introducing a simple hashmap based cache, eviction is currently unsupported but not really necessary yet due to the small size of the buffers in the cache.
2022-04-14 14:14:52 +05:30
01c027b9f6 Fix GetBlockLinearLayerSize to avoid incorrectly calculating a zero size 2022-04-14 14:14:52 +05:30
d033ff2478 Fix draws when no colour RTs and only depth is bound 2022-04-14 14:14:52 +05:30
d137051833 Add basic support for 3d/cubemap textures
These are mostly used in 3D games like SMO, support is still quite basic and synchronising block linear 3D texture will crash in most cases due to them being unimplemented.
2022-04-14 14:14:52 +05:30
bcc00216b7 Fix incorrect Bc2/3 block sizes 2022-04-14 14:14:52 +05:30
e294fa8c91 Add subpass limit quirk to fix Adreno driver bug
Older Adreno proprietary drivers (5xx and below) will segfault while destroying the renderpass and associated objects if more than 64 subpasses are within a renderpass due to internal driver implementation details. This commit introduces checks to automatically break up a renderpass when that limit is hit.
2022-04-14 14:14:52 +05:30
65d5a3bce5 Align all Buffers to page boundary
We have support for overlapping buffers which allows us to merge a lot of smaller buffers located on a single page into a single larger buffer which allows for better performance. It additionally ensures that all host buffers match the alignment guarantees of the guest and adequately fulfill host alignment requirements.
2022-04-14 14:14:52 +05:30
cb1ec9a7f4 Rework BufferManager, Buffer and BufferView
This commit encapsulates a complex sequence of cascading changes in the process of supporting overlaps for buffers:
* We determined that it is impossible to resolve overlaps with multiple intervals per buffer within the constraints of each overlap being a contiguous view, support for multiple intervals was therefore dropped. The older buffer manager code was entirely reworked to be simpler due to only handling one interval per buffer with code now being based off `IntervalMap` but tailored specifically for buffers.
* During overlap resolution, the problem of how existing views into the buffer being recreated would be updated, it had to be replaced with a larger buffer that could contain all overlaps and all existing views would need to be repointed to it. This was addressed by a buffer owning all views to itself, we could automatically recalculate the offset of all views and update the buffers with it.
* We still needed to update usage of existing views which was done by handling all access (such as inside a recorded draw) to buffer view properties via `BufferView::RegisterUsage` which dispatches a callback with the view and the corresponding backing buffer. This callback can be stored and called during overlap resolution with the new buffer.
* We had issues with lifetime of the buffer with the handle-like semantics of `BufferView` introduced in the last buffer-related commit, if we updated the view to be owned by a new buffer we'd need to extend the lifetime of the new buffer not the older one and the only way to do this was a proxy owner object `BufferDelegate` which holds a shared pointer to the real `Buffer` which in-turn holds a pointer to all `BufferDelegate` objects to update on repointing. A `BufferView` is effectively just a wrapper around `std::shared_ptr<BufferDelegate>` with more favorable semantics but generally just forwarding calls.
It should be additionally noted that to support usage of `RegisterUsage` the code around buffers in `GraphicsContext` was refactored to defer truly binding till the recording phase.
2022-04-14 14:14:52 +05:30
a6781b38f4 Clear syncBuffers after CommandExecutor execution
Due to an oversight, we weren't clearing the list of buffers that needed to be synced after every execution which led to them building up. Due to the relatively cheap synchronization of buffers and only doing so on faults this wasn't caught until now, it does depress the framerate significantly over time due to the size of the list growing to be in the range of 100k buffer views depending on the title.
2022-04-14 14:14:52 +05:30
594f061b21 Implement SSBOs
Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-04-14 14:14:52 +05:30
5c387f5c5a Fixup depth mode init value to allow ignoring redundant calls 2022-04-14 14:14:52 +05:30
7a5c771f44 Rework GPU BufferView to have handle-like semantics
We wanted views to extend the lifetime of the underlying buffers and at the same time preserve all views until the destruction of the buffer to prevent recreation which might be costly in the future when we need `VkBufferView`s of the buffer but also require a centralized list of all views for recreation of the buffer. It also removes the inconsistency between `BufferView*` being returned in `GetXView` in `GraphicsContext`.
2022-04-14 14:14:52 +05:30
fae5332f20 Disable descriptor aliasing on Adreno to workaround shader compiler bug
Alised descriptor sets are incorrectly interpreted by the shader compiler causing it to bugger up LLVM function argument types and crash

Co-authored-by: PixelyIon <pixelyion@protonmail.com>
2022-04-14 14:14:52 +05:30
fc2c123ae2 Implement GPU depthMode register
This controls the depth range used by the shader, hades already has support for the necessary patching so we only need to pass the current mode over to it and it'll do the necessary work.
2022-04-14 14:14:52 +05:30
7e088ca465 Fix constbuf updates to actually increment the write offset
Uses the register directly now as when we modify it we want the changes to be visible from macros too.
2022-04-14 14:14:52 +05:30
d2f3479610 Use eB5G6R5UnormPack16 VkFormat for B5G6R5Unorm and R5G6B5Unorm
Using `eB5G6R5UnormPack16` (with a swizzle for `R5G6B5Unorm`) removes the need for `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT` when those formats are aliased which happens in Sonic Mania among other titles.
2022-04-14 14:14:52 +05:30
24d7066d8b Add quirk to avoid VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT on Adreno GPUs
Adreno GPUs have significant performance penalties from usage of `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT` which require disabling UBWC and on Turnip, forces linear tiling. As a result, it's been made an optional quirk which doesn't supply the flag in `VkImageCreateInfo` and logs a warning if a view with a different Vulkan format from the original image is created.
2022-04-14 14:14:52 +05:30
731d06010d Set eMutableFormat in Texture Image Creation
We often need to alias the underlying data as multiple Vulkan formats which requires the `eMutableFormat` bit to be set in `VkImageCreateInfo`, without doing this there'll be validation layer errors and potentially GPU bugs.
2022-04-14 14:14:52 +05:30
dafcfa68ca Transition texture layout to eGeneral after creation
As we no longer set the layout to general inside the Texture constructor, yet, we need it to be set prior to the image being used as an attachment. We need to transition the layout to `eGeneral` after creation of the texture object.
2022-04-14 14:14:52 +05:30
730bf504f8 Correct Adreno texture binding quirk
We incorrectly determined an Adreno driver bug to require padding between binding slots but the real issue was not supporting consecutive binding writes for `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` and was fixed by the padding slot unintentionally requiring individual writes. The quirk has now been corrected to explicitly specify this as the bug and the solution is more apt.
2022-04-14 14:14:52 +05:30
77e2797219 Delete expired weak_ptrs for Texture/Buffer views
A large amount of Texture/Buffer views would expire before reuse could occur in `Texture::GetView`/`Buffer::GetView`. These can lead to a substantial memory allocation given enough time and they are now deleted during the lookup while iterating on all entries.

It should be noted that there are a lot of duplicate views that don't live long enough to be reused and the ultimate solution here is to make those views live long enough to be reused.
2022-04-14 14:14:52 +05:30
881bb969c4 Implement access-driven Buffer synchronization
Similar to constant redundant synchronization for textures, there is a lot of redundant synchronization of buffers. Albeit, buffer synchronization is far cheaper than texture synchronization it still has associated costs which have now been reduced by only synchronizing on access.
2022-04-14 14:14:52 +05:30
7532eaf050 Attach Texture to Cycle in Texture::TransitionLayout
Not doing so could result in the texture being destroyed before the completion of a transition and lead to undefined behavior.
2022-04-14 14:14:52 +05:30
3268b3779a Implement access-driven Texture synchronization
There was a lot of redundant synchronization of textures to and from host constantly as we were not aware of guest memory access, this has now been averted by tracking any memory accesses to the texture memory using the NCE Memory Trapping API and synchronizing only when required.
2022-04-14 14:14:52 +05:30
5c9e42e384 Use mirror mappings for Textures and Buffers
This is a prerequisite to memory trapping as we need to write to the mirror to avoid a race condition with external threads writing to a texture/buffer while we do so ourselves for the sync on a read/write, it also avoids an additional `mprotect` to `-WX`/`RWX` on a read access.

An additional advantage for textures especially is that we now support split-mapping textures due to laying them out in a contiguous mirror and they will not require costly algorithmic changes. Buffers should also benefit from not needing to iterate over every region when they are split into multiple mappings.
2022-04-14 14:14:52 +05:30
43879e2476 Round up when calculating size of compressed texture in bytes 2022-04-14 14:14:52 +05:30
d889550e84 Don't set COLOR_ATTACHMENT_BIT for compressed formats.
The better solution would be to only set this for formats that support
it on original HW but this will get rid of the validation errors for now.
2022-04-14 14:14:52 +05:30
82296ac5b8 Use buffer size instead of allocation size for Buffer constructor
Fixes a validation error.
2022-04-14 14:14:52 +05:30
752245c3c8 Enable provoking vertex feature 2022-04-14 14:14:52 +05:30
dd45d054e7 Enable shaderDrawParameters 2022-04-14 14:14:52 +05:30
011de98940 Rework formats to support passing through guest swizzle values
Almost every Maxwell format now directly corresponds to a Vulkan format. This allows formats to be passed through and the swizzle used directly from guest (with some extra swizzle handling for edge cases) thus saving the need to explicitly support each swizzle combination which is adds a lot of code bloat. The format header is additionally reordered with line breaks to separate formats by their bits-per-block.
2022-04-14 14:14:52 +05:30