strato

scl/strato

mirror of https://github.com/Takiiiiiiii/strato.git synced 2025-07-17 08:46:39 +00:00

Author	SHA1	Message	Date
PixelyIon	cb7c3602e7	Attach `TextureView` to `FenceCycle` The lifetime of `TextureView` objects wasn't correctly managed as they weren't being attached to the `FenceCycle` in `AttachTexture`; this led to them getting deleted and causing all sorts of UB.	2022-08-06 22:20:54 +05:30
PixelyIon	3ca56ef578	Fix NCE Trapping API Deadlock A deadlock was caused by holding `trapMutex` while waiting on the lock of a resource inside a callback while another thread holding the resource's mutex waits on `trapMutex`. This has been fixed by no longer allowing blocking locks inside the callbacks and introducing a separate callback for locking the resource which is done after unlocking the `trapMutex` which can then be locked by any contending threads.	2022-08-06 22:18:42 +05:30
PixelyIon	b0910e7b1a	Avoid locking `Texture`/`Buffer` in trap handler We generally don't need to lock the `Texture`/`Buffer` in the trap handler, this is particularly problematic now as we hold the lock for the duration of a submission of any workloads. This leads to a large amount of contention for the lock and stalling in the signal handler when the resource may be `Clean` and can simply be switched over to `CpuDirty` without locking and utilizing atomics which is what this commit addresses.	2022-08-06 22:18:42 +05:30
PixelyIon	217d484cba	Abstract `TextureView`/`BufferDelegate` locking into `LockableSharedPtr` An atomic transactional loop was performed on the backing `std::shared_ptr` inside `BufferView`/`TextureView`'s `lock`/`LockWithTag`/`try_lock` functions, these locks utilized `std::atomic_load` for atomically loading the value from the `shared_ptr` recursively till it was the same value pre/post-locking. This commit abstracts the locking functionality of `TextureView`/`BufferDelegate` into `LockableSharedPtr` to avoid code duplication and removes the usage of `std::atomic_load` in either case as it is not necessary due to the implicit memory barrier provided by locking a mutex.	2022-08-06 22:18:42 +05:30
PixelyIon	1239907ce8	Rework `Texture` & `Buffer` for `Context` and `FenceCycle` Chaining GPU resources have been designed with locking by fences in mind, fences were treated as implicit locks on a GPU, design paradigms such as `GraphicsContext` simply unlocking the texture mutex after attaching it which would set the fence cycle were considered fine prior but are suboptimal as it enforces that a `FenceCycle` effectively ensures exclusivity. This conflates the function of a mutex, which is mutual exclusion, with that of the fence, which is to track GPU-side completion, and it tied whether a GPU resource could be used to GPU completion rather than simply to whether it was not currently being used by the CPU, which is the function of the mutex. This rework fixes this with the groundwork that has been laid with previous commits, as `Context` semantics are utilized to move back to using mutexes for locking of resources and tracking the usage on the GPU in a cleaner way rather than arbitrary fence comparisons. This also leads to cleaning up a lot of methods that involved usage of fences that no longer require it and therefore can be entirely removed, further cleaning up the codebase. It also opens the door for future improvements such as the removal of `hostImmutableCycle` and replacing them with better solutions, the implementation of which is broken at the moment regardless. While moving to `Context`-based locking the question of multiple GPU workloads being in-flight while using overlapping resources came up, which brought a fundamental limitation of `FenceCycle` to light: only one resource could be concurrently attached to a cycle and it could not adequately represent multi-cycle dependencies. `FenceCycle` chaining was designed to fix this inadequacy and allows for several different GPU workloads to be in-flight concurrently while utilizing the same resources as long as they can ensure GPU-GPU synchronization.	2022-08-06 22:18:42 +05:30
PixelyIon	6b9269b88e	Introduce `Context` semantics to GPU resource locking Resources on the GPU can be fairly convoluted and involve overlaps which can lead to the same GPU resources being utilized with different views, we previously utilized fences to lock resources to prevent concurrent access but this was overly harsh as it would block usage of resources till GPU completion of the commands associated with a resource. Fences have now been replaced with locks but locks run into the issue of being per-view and therefore to add a common object for tracking usage the concept of "tags" was introduced to track a single context so locks can be skipped if they're from the same context. This is important to prevent a deadlock when locking a resource which has been already locked from the current context with a different view.	2022-08-06 22:18:42 +05:30
MCredstoner2004	2e356b8f0b	Use spans instead of ptr and size in kernel memory	2022-06-11 17:05:39 +05:30
Billy Laws	581a016991	Add GuestTexture::GetSize helper function This code was getting duplicated a bit so commonise into a helper function.	2022-06-03 19:30:54 +01:00
PixelyIon	2712b3276b	Fix incorrect `VkBufferImageCopy` offset calculations The `VkBufferImageCopy` offset calculations were wrong inside `CopyIntoStagingBuffer` as it multiplied the mip level's linear size by `levelCount` rather than `layerCount`. This led to substantial UB in games which called this function as it led to an overflow and resulted in writing to other areas of the buffer which caused major issues such as vertex/index buffer corruption and corresponding graphical glitches alongside likely being the cause of some crashes.	2022-06-02 22:14:22 +05:30
Billy Laws	c745e0e02b	Move image type logic to GuestTexture, allowing 2D array views for 3D RTs We can't render to a 3D texture through a 3D view, we instead have to create a 2D array view into it and render to that. The texture manager previously didn't support having a different view type/layer count between a guest texture view and the underlying storage texture, which is required to support this, so that was also implemented by reading the view layer count from the dimensions depth instead if the underlying texture is 3D (and the view type is 2D array). Additionally move away from our own view type enum to Vulkan, in line with other guest texture member types.	2022-05-31 22:09:53 +01:00
PixelyIon	80c8fb8791	Implement CPU BCn Texture Decoding Certain GPU vendors such as ARM's Mali do not have support for BCn textures whatsoever while other vendors such as AMD only have partial support (BC1-BC3). Most titles on the guest utilize BC textures and to address this on host GPUs without support for BCn, we need to decompress the texture on the CPU. This commit implements a CPU BCn texture decoder based off Swiftshader's BC decoder, it also adds the necessary infrastructure to have different formats for the `GuestTexture` and `Texture` objects.	2022-05-28 21:22:24 +05:30
PixelyIon	7d4e0a7844	Implement Mipmapped Texture Support Support for mipmapped textures was not implemented which is fairly crucial to proper rendering of games as the only level that would load is the first level (highest resolution), that might result in a lot more memory bandwidth being utilized. Mipmapping also has associated benefits regarding aliasing as it has a minor anti-aliasing effect on distant textures. This commit entirely implements mipmapping support but it does not extend to full support for views into specific mipmap levels due to the texture manager implementation being incomplete.	2022-05-28 21:22:24 +05:30
PixelyIon	de300bfdbe	Refactor Texture Swizzling The API for texture swizzling is now more concrete and abstracted out from `GuestTexture`, this allows for neater usage in certain areas such as MaxwellDMA while having a `GuestTexture` wrapper as well allowing for neater usage in those cases. The code itself has also been cleaned up slightly with all usage of `u32`s being upgraded to `size_t` as this is simply more efficient due to the compiler not needing to emulate wraparound behavior for integer types smaller than the processor word size.	2022-05-19 17:13:55 +05:30
PixelyIon	168e51e7ad	Always use `GetLayerStride` for layer stride in Texture The `GuestTexture::GetLayerStride` function was not always being utilized to retrieve the layer stride inside `Texture`, it would instead directly access the `guestTexture::layerStride` member. This is problematic as it may not be initialized and return `0` which would lead to a broken image copy.	2022-05-13 14:21:37 +05:30
PixelyIon	f2cc25ee9f	Implement Array Texture Swizzling Textures can have more than one layer which we currently don't handle, all layers past the initial one will be filled with random data or 0s, leading to incorrect rendering. This has now been implemented, which fixes any titles which utilize array textures, such as "Super Mario Odyssey" or "Hatsune Miku: Project DIVA MegaMix".	2022-05-12 18:23:45 +05:30
Billy Laws	1609fd2a32	Account for layerCount in SynchronizeGuestWithBuffer staging buffer size	2022-05-10 18:33:31 +01:00
Billy Laws	e1c13bbc08	Update hades	2022-05-08 19:37:10 +01:00
PixelyIon	62ea2a6da5	Avoid format aliasing warnings on Adreno Implements an algorithm to determine formats that can be aliased as views without needing `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT`, this avoids spamming warning logs on view creation when the aliased formats will function in practice.	2022-05-05 19:15:37 +05:30
PixelyIon	42573170c6	Implement Framebuffer Cache Implements a cache for storing `VkFramebuffer` objects with a special path on devices with `VK_KHR_imageless_framebuffer` to allow for more cache hits due to an abstract image rather than a specific one. Caching framebuffers is a fairly crucial optimization due to the cost of creating framebuffers on TBDRs since it involves calculating tiling memory allocations and in the case of Adreno's proprietary driver involves several kernel calls for mapping and allocating the corresponding framebuffer memory.	2022-05-01 18:27:27 +05:30
PixelyIon	af7f0c301e	Avoid redundant `VkImageView` recreation There are a lot of cases of `VkImageView` being recreated arbitrarily due to it being tied to the ephemeral object `TextureView` rather than `Texture`, this commit flips that by storing all `VkImageView`s inside `Texture` with `TextureView` simply holding a copy of the handle to them. Additionally, this change results in stable `VkImageView` handles and helps in paving the path for framebuffer caching when `VK_KHR_imageless_framebuffer` is unavailable.	2022-05-01 18:27:27 +05:30
Billy Laws	1dd230afde	Refactor all std::lock_guard usages to std::scoped_lock	2022-04-25 15:00:30 +01:00
PixelyIon	8ccef733ff	Fix UB with guest-less Texture/Buffers in `MarkGpuDirty` There was no check for the lack of a `GuestTexture`/`GuestBuffer`, which led to UB: when a texture/buffer that had no guest, such as the `zeroTexture` from `GraphicsContext`, was marked as dirty, it would cause a call to `NCE::RetrapRegions` with a `nullptr` handle that would be dereferenced and cause a segmentation fault.	2022-04-16 18:45:56 +05:30
Billy Laws	8eaca87de8	Use an empty host texture in place of invalid TIC entries on guest Some games may pass empty TICs as inputs to shaders while not actually using them within the shader. Create an empty texture and pass this in instead when we hit this case, the nullDescriptor feature could be used but it's not supported by all devices so we chose to do it this way instead.	2022-04-14 14:14:52 +05:30
Billy Laws	486a835d0a	Use guest texture view type to determine the underlying image type If we have a Nx1x1 image then determining the type from dimensions will result in a 1D image being created thus preventing us from creating a 2D view. By using the image view type we can avoid this for textures from TICs since we know in advance how they will be used	2022-04-14 14:14:52 +05:30
Billy Laws	d137051833	Add basic support for 3d/cubemap textures These are mostly used in 3D games like SMO, support is still quite basic and synchronising block linear 3D texture will crash in most cases due to them being unimplemented.	2022-04-14 14:14:52 +05:30
PixelyIon	e294fa8c91	Add subpass limit quirk to fix Adreno driver bug Older Adreno proprietary drivers (5xx and below) will segfault while destroying the renderpass and associated objects if more than 64 subpasses are within a renderpass due to internal driver implementation details. This commit introduces checks to automatically break up a renderpass when that limit is hit.	2022-04-14 14:14:52 +05:30
PixelyIon	24d7066d8b	Add quirk to avoid `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT` on Adreno GPUs Adreno GPUs have significant performance penalties from usage of `VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT` which require disabling UBWC and on Turnip, forces linear tiling. As a result, it's been made an optional quirk which doesn't supply the flag in `VkImageCreateInfo` and logs a warning if a view with a different Vulkan format from the original image is created.	2022-04-14 14:14:52 +05:30
PixelyIon	731d06010d	Set `eMutableFormat` in Texture Image Creation We often need to alias the underlying data as multiple Vulkan formats which requires the `eMutableFormat` bit to be set in `VkImageCreateInfo`, without doing this there'll be validation layer errors and potentially GPU bugs.	2022-04-14 14:14:52 +05:30
PixelyIon	dafcfa68ca	Transition texture layout to `eGeneral` after creation We no longer set the layout to general inside the Texture constructor, yet we need it to be set prior to the image being used as an attachment, so we transition the layout to `eGeneral` after creation of the texture object.	2022-04-14 14:14:52 +05:30
PixelyIon	77e2797219	Delete expired `weak_ptr`s for Texture/Buffer views A large amount of Texture/Buffer views would expire before reuse could occur in `Texture::GetView`/`Buffer::GetView`. These can lead to a substantial memory allocation given enough time and they are now deleted during the lookup while iterating on all entries. It should be noted that there are a lot of duplicate views that don't live long enough to be reused and the ultimate solution here is to make those views live long enough to be reused.	2022-04-14 14:14:52 +05:30
PixelyIon	7532eaf050	Attach Texture to Cycle in `Texture::TransitionLayout` Not doing so could result in the texture being destroyed before the completion of a transition and lead to undefined behavior.	2022-04-14 14:14:52 +05:30
PixelyIon	3268b3779a	Implement access-driven Texture synchronization There was a lot of redundant synchronization of textures to and from host constantly as we were not aware of guest memory access, this has now been averted by tracking any memory accesses to the texture memory using the NCE Memory Trapping API and synchronizing only when required.	2022-04-14 14:14:52 +05:30
PixelyIon	5c9e42e384	Use mirror mappings for Textures and Buffers This is a prerequisite to memory trapping as we need to write to the mirror to avoid a race condition with external threads writing to a texture/buffer while we do so ourselves for the sync on a read/write, it also avoids an additional `mprotect` to `-WX`/`RWX` on a read access. An additional advantage for textures especially is that we now support split-mapping textures due to laying them out in a contiguous mirror and they will not require costly algorithmic changes. Buffers should also benefit from not needing to iterate over every region when they are split into multiple mappings.	2022-04-14 14:14:52 +05:30
Robin Kertels	d889550e84	Don't set COLOR_ATTACHMENT_BIT for compressed formats. The better solution would be to only set this for formats that support it on original HW but this will get rid of the validation errors for now.	2022-04-14 14:14:52 +05:30
PixelyIon	723189a948	Calculate Blocklinear Texture Aligned Size Correctly The size of blocklinear textures did not consider alignment to Block/ROB boundaries before, it is aligned to them now. Incorrect sizes led to textures not being aliased correctly due to different size calculations for GraphicBufferProducer surfaces and Maxwell3D color RTs.	2022-04-14 14:14:52 +05:30
PixelyIon	9f5c3d8ecd	Force Textures to be Optimal on Host GPU We don't respect the host subresource layout in synchronizing linear textures from the guest to the host when mapped to memory directly, this leads to texture corruption and while the real fix would involve respecting the host subresource layout, this has been deferred for later as real world performance advantages/disadvantages associated with this change can be observed more carefully to determine if it's worth it.	2022-04-14 14:14:52 +05:30
PixelyIon	98b366c1f5	Fix Texture Synchronization Bug Fixes texture corruption due to incorrect synchronization, the barrier would not enforce waiting till the texture was entirely rendered, causing an incomplete texture to be downloaded which led to rendering bugs for certain GPUs including ARM's Mali GPUs.	2022-04-14 14:14:52 +05:30
PixelyIon	bb14af4f7a	Implement Maxwell3D Sampled Textures The descriptor sets should now contain a combined image and sampler handle for any sampled textures in the guest shader from the supplied offset into the texture constant buffer. Note: Games tend to rely on inline constant buffer updates for writing the texture constant buffer and due to it not being implemented, the value will be read as 0 which is incorrect.	2022-04-14 14:14:52 +05:30
PixelyIon	bd6cd0056c	Support Multi-Aspect Copy in `Texture::CopyIntoStagingBuffer` Only copying a single aspect was supported by `CopyIntoStagingBuffer` earlier due to not supplying a `VkBufferImageCopy` for each aspect separately, this has now been done with Color/Depth/Stencil aspects having their own `VkBufferImageCopy` for the `VkCmdCopyImageToBuffer` command.	2022-04-14 14:14:52 +05:30
PixelyIon	daff17c776	Order `TextureView` Definition Correctly The definition of the `TextureView` class was spread across `texture.cpp` and has now been moved to the top of the file above the other half of the definition.	2022-04-14 14:14:52 +05:30
PixelyIon	45c7a89fc3	Cleanup `BufferView`/`TextureView` Locking Code Renames the variable to be neater and less confusing alongside adding comments for `try_lock()` to make the goal of the function more apparent.	2022-04-14 14:14:52 +05:30
PixelyIon	aa32f6b017	Add Depth/Stencil Format Support to `Texture` Sets `VkImageUsageFlags` correctly rather than hardcoding it for color attachments and adds multiple `VkBufferImageCopy` to `VkCmdCopyBufferToImage` for Color/Depth/Stencil aspects of an image.	2022-04-14 14:14:52 +05:30
PixelyIon	6eda1777c5	Rework `TextureView` to be disconnected from `Texture` We want `TextureView`(s) to be disconnected from the backing on the host and instead represent a specific texture on the guest with a backing that can change depending on mapping of new textures which'd invalidate the backing but should now be automatically repointed to an appropriate new backing. This approach also requires locking of the backing to function as it is mutable till it has been locked or the backing has an attached `FenceCycle` that hasn't been signaled which will be added for `CommandExecutor` in a subsequent commit.	2022-04-14 14:14:52 +05:30
PixelyIon	a55aca76c6	Rename `TextureView::backing` to `TextureView::texture` It was determined that `backing` wasn't a very descriptive name and that it conflicted with the texture's own backing, the name was changed to `texture` to make it more apparent that it was specifically the `Texture` object backing the view.	2022-04-14 14:14:52 +05:30
PixelyIon	79ceb2cf23	Improve Vulkan `Texture` Synchronization The Vulkan Pipeline Barriers were previously suboptimal and incorrect to some degree as we purely synchronized images and not staging buffers. This has now been fixed and improved in general with more relevant synchronization.	2021-11-09 21:08:03 +05:30
PixelyIon	ea2626bcc6	Address CR Comments	2021-10-26 10:46:36 +05:30
PixelyIon	9b9bf8d300	Introduce `ThreadLocal` Class + Fix Several GPU Bugs * Fix `AddClearColorSubpass` bug where it would not generate a `VkCmdNextSubpass` when an attachment clear was utilized * Fix `AddSubpass` bug where the Depth Stencil texture would not be synced * Respect `VkCommandPool` external synchronization requirements by making it thread-local with a custom RAII wrapper * Fix linear RT width calculation as it's provided in terms of bytes rather than format units * Fix `AllocateStagingBuffer` bug where it would not supply `eTransferDst` as a usage flag * Fix `AllocateMappedImage` where `VkMemoryPropertyFlags` were not respected resulting in non-`eHostVisible` memory being utilized * Change feature requirement in `AndroidManifest.xml` to Vulkan 1.1 from OGL 3.1 as this was incorrect	2021-10-16 12:13:30 +01:00
PixelyIon	b762d1df23	Introduce Texture Always Sync + Wait on GPU Execution + More RT Formats Infrastructure for always syncing textures has been introduced now, they will be synced prior to and after every execution. This does considerably reduce the performance alongside waiting on GPU execution to finish but it will be partially recouped once conditional syncing is performed.	2021-10-16 12:13:30 +01:00
PixelyIon	95a08627e5	Subpass Support + More RT Formats + Fix `FenceCycle` Cyclic Dependencies Support for subpasses was added by reworking attachment reuse code to account for preserved attachments and subpass dependencies. A lot of RT formats were also added to allow SMO to boot up entirely, it should be noted that it doesn't render anything. `FenceCycle` had a cyclic dependency which broke clean exit, we now utilize `std::weak_ptr<FenceCycle>` inside the `Texture` object. A minor fix was also made for broken stack traces, which were caused by supplying a `nullptr` C-string to libfmt when a symbol was unresolved, leading to an `abort` due to the invocation of `strlen` with it.	2021-10-16 12:13:30 +01:00
PixelyIon	bee28aaf0d	Validation Layer Filter + Fix `Texture`, GPU & `PresentationEngine` bugs This commit implements a filter by type for any validation layer output, this allows filtering out any logs which may be unnecessary and additionally triggering a breakpoint as required. An issue concerning the `NDEBUG` flag never being set was fixed, it's now supplied as a release compiler flag. The issue can manifest itself by always relying on a validation layer even though it shouldn't on release, this is why the validation layer was mistakenly disabled entirely previously by using `#ifndef` rather than `#ifdef`. An issue with the initial layout of a texture being supplied as neither `VK_IMAGE_LAYOUT_UNDEFINED` nor `VK_IMAGE_LAYOUT_PREINITIALIZED` was fixed, these cases are now handled by transitioning to those layouts after creating the image rather than supplying it within `initialLayout`. Another issue was fixed where the transformation was not maintained after a surface had been destroyed and recreated; it manifested itself when the user would go out of the app and come back in, as they would see the surface having an identity transformation rather than the desired one.	2021-10-05 01:13:22 +05:30
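
Note on 2712b3276b (`VkBufferImageCopy` offset fix): the arithmetic described in that message can be sketched as below. This is a minimal illustration and not the code from the repository. It assumes the staging buffer stores mip levels back to back, with all layers of a level stored contiguously. `MipLevel`, `CopyRegion` and `BuildCopyRegions` are hypothetical names introduced for this sketch, and `CopyRegion` only stands in for the relevant fields of a `VkBufferImageCopy`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one mip level (not a type from the repository)
struct MipLevel {
    std::uint32_t width;
    std::uint32_t height;
    std::size_t linearSize; // Size in bytes of a single layer of this level
};

// Hypothetical stand-in for the fields of a VkBufferImageCopy that matter here
struct CopyRegion {
    std::size_t bufferOffset; // Byte offset of this level in the staging buffer
    std::uint32_t mipLevel;
    std::uint32_t layerCount;
};

// Builds one copy region per mip level, assuming a level-major staging layout
std::vector<CopyRegion> BuildCopyRegions(const std::vector<MipLevel> &levels, std::uint32_t layerCount) {
    std::vector<CopyRegion> regions;
    regions.reserve(levels.size());

    std::size_t offset{};
    for (std::uint32_t level{}; level < levels.size(); level++) {
        regions.push_back({offset, level, layerCount});
        // Each level occupies one linear-size block per layer, so the offset
        // advances by linearSize * layerCount. Multiplying by the number of
        // levels instead is the mistake the commit message describes.
        offset += levels[level].linearSize * layerCount;
    }
    return regions;
}
```

With three levels of 64, 16 and 4 bytes per layer and two layers, the regions start at byte offsets 0, 128 and 160. Using the level count (3) as the multiplier would place the second level at 192, past the 168 bytes the data actually occupies.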

56 Commits