strato

scl/strato

mirror of https://github.com/Takiiiiiiii/strato.git synced 2025-07-17 08:46:39 +00:00

Author	SHA1	Message	Date
PixelyIon	a60d6ec58f	Replace host immutability `FenceCycle` with GPU usage tracking. We utilized a `FenceCycle` to keep track of whether the buffer was mutable or not, and introduced another cycle to track GPU-side requirements; only once that cycle was fulfilled could the buffer be utilized on the host. Due to the recent change in behavior, this system ended up being suboptimal. This commit replaces the cycle with a boolean tracking whether there are any usages of the resource on the GPU within the current context that may prevent it from being mutated on the CPU. Based on this, the context's fence is simply attached to the buffer, which is allowed as the new behavior of buffer fences meets all the requirements for this.	2022-08-06 22:18:42 +05:30
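A minimal sketch of the idea behind this commit. The `Fence` and `Buffer` types and all method names are hypothetical stand-ins, not Skyline's real classes. The buffer keeps a boolean for "used by GPU work in the current context" and holds on to the context's fence. A host write is refused while the current context depends on the buffer, and otherwise waits on the attached fence first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fence cycle; a real Wait() would block on the GPU
struct Fence {
    bool signalled{false};

    void Wait() {
        signalled = true; // pretend the GPU work has completed
    }
};

// Hypothetical buffer tracking GPU usage with a boolean instead of a dedicated immutability cycle
class Buffer {
  public:
    explicit Buffer(size_t size) : backing(size) {}

    // GPU work recorded in the current context uses this buffer: remember that and
    // attach the context's fence so later host accesses can wait on it
    void MarkGpuUsage(std::shared_ptr<Fence> contextFence) {
        usedByContext = true;
        fence = std::move(contextFence);
    }

    // The current context was submitted, so its recorded work no longer forbids host writes
    void ResetGpuUsage() {
        usedByContext = false;
    }

    // Writes on the host if possible; returns false when the write must instead be
    // sequenced on the GPU because the current context still depends on the old contents
    bool TryHostWrite(size_t offset, uint8_t value) {
        if (usedByContext)
            return false;
        if (fence)
            fence->Wait(); // submitted work may still be reading the buffer
        backing.at(offset) = value;
        return true;
    }

    uint8_t Read(size_t offset) const {
        return backing.at(offset);
    }

  private:
    std::vector<uint8_t> backing;
    bool usedByContext{false};
    std::shared_ptr<Fence> fence; // fence of the last context that used this buffer
};
```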
PixelyIon	217d484cba	Abstract `TextureView`/`BufferDelegate` locking into `LockableSharedPtr`. An atomic transactional loop was performed on the backing `std::shared_ptr` inside the `lock`/`LockWithTag`/`try_lock` functions of `BufferView`/`TextureView`; these locks utilized `std::atomic_load` to atomically load the value from the `shared_ptr` repeatedly until it was the same value pre- and post-locking. This commit abstracts the locking functionality of `TextureView`/`BufferDelegate` into `LockableSharedPtr` to avoid code duplication, and removes the usage of `std::atomic_load` in both cases, as it is not necessary due to the implicit memory barrier provided by locking a mutex.	2022-08-06 22:18:42 +05:30
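A rough sketch of the locking loop this commit describes. `Resource` is a hypothetical lockable pointee. The sketch makes two assumptions: writers only replace the pointer while holding the current pointee's lock, and a replaced object stays alive through other owners. The sketch re-reads the pointer after locking and retries if the pointee changed in the meantime. Like the commit, it uses no `std::atomic_load` and relies on the mutex for ordering.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Sketch of a shared_ptr whose pointee may be swapped by other threads (assumed rule:
// writers only replace the pointer while holding the current pointee's lock)
template<typename Type>
class LockableSharedPtr : public std::shared_ptr<Type> {
  public:
    explicit LockableSharedPtr(std::shared_ptr<Type> ptr) : std::shared_ptr<Type>(std::move(ptr)) {}

    // Locks the current pointee, retrying if it was replaced while we waited for the lock
    void lock() const {
        while (true) {
            Type *object{this->get()};
            object->lock();
            if (object == this->get())
                return; // still the current pointee, so we hold the right lock
            object->unlock(); // pointee changed underneath us, retry with the new one
        }
    }

    // Non-blocking variant of lock(), returns false if the current pointee is already locked
    bool try_lock() const {
        while (true) {
            Type *object{this->get()};
            if (!object->try_lock())
                return false;
            if (object == this->get())
                return true;
            object->unlock();
        }
    }

    void unlock() const {
        this->get()->unlock();
    }
};

// Hypothetical lockable resource standing in for a view's backing object
struct Resource {
    std::mutex mutex;
    int value{};

    void lock() { mutex.lock(); }
    bool try_lock() { return mutex.try_lock(); }
    void unlock() { mutex.unlock(); }
};
```

Because `lock`/`unlock`/`try_lock` are provided, the wrapper also works with `std::scoped_lock` or `std::unique_lock`.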
PixelyIon	2d08886e4e	Utilize `TextureView` rather than `Texture` for presentation. `PresentationEngine` and `GraphicBufferProducer` methods that utilized textures for the surface used the `Texture` type rather than the `TextureView` type. This was never correct, but at the time this code was authored `TextureView` was not finalized and was in major flux, which is why `Texture` was utilized instead. Now that it is far more stable, it has been replaced with `TextureView`.	2022-08-06 22:18:42 +05:30
PixelyIon	d7399e33c1	Avoid waiting on mutex in `PresentationEngine::Present`. We want to block the host thread during presentation while the host surface isn't present, to implicitly pause the game. This can end up being fairly costly, as it involves locking the `PresentationEngine` mutex, which can lead to a lot of contention with the presentation thread. This fixes the issue by first polling whether there is a surface and only waiting if there isn't one. Waiting every time isn't mandatory, since the guest thread will eventually stall regardless.	2022-08-06 22:18:42 +05:30
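A sketch of the poll-then-wait pattern from this commit. `SurfaceWaiter` and its members are hypothetical and are not the real `PresentationEngine`. The surface state is mirrored in an atomic, so the common case (a surface is present) never touches the contended mutex. The condition variable is only used when there is no surface.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: the fast path only reads an atomic flag, the slow path waits on the mutex
class SurfaceWaiter {
  public:
    void SetSurface(bool present) {
        {
            // Updated under the mutex so a waiter can't miss the notification
            std::scoped_lock lock{mutex};
            surfacePresent.store(present, std::memory_order_release);
        }
        surfaceCondition.notify_all();
    }

    // Returns true if it actually had to block until a surface became available
    bool WaitForSurface() {
        if (surfacePresent.load(std::memory_order_acquire))
            return false; // fast path: no mutex, no contention with the presentation thread
        std::unique_lock lock{mutex};
        surfaceCondition.wait(lock, [this] { return surfacePresent.load(std::memory_order_acquire); });
        return true;
    }

  private:
    std::mutex mutex;
    std::condition_variable surfaceCondition;
    std::atomic<bool> surfacePresent{false};
};
```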
PixelyIon	0ac5f4ce27	Lock `TextureManager`/`BufferManager` during submission. Multiple threads concurrently accessing the `TextureManager`/`BufferManager` (referred to as "resource managers") could deadlock. One thread would lock a resource and then try to acquire the resource manager lock, while the thread owning the resource manager lock tried to acquire a lock on that same resource. This has been fixed by having locking of the resource managers handled externally, which ensures they can be locked prior to locking any resources. `CommandExecutor` provides accessors for retrieving the resource managers which automatically handle locking, alongside doing so on attachment of resources.	2022-08-06 22:18:42 +05:30
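The fix amounts to a fixed lock order: the resource manager lock is always taken before any resource lock. A minimal sketch with hypothetical `ResourceManager`/`Resource` types follows. With a single global order, no two threads can each hold the lock the other is waiting for.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical resource with its own lock
struct Resource {
    std::mutex mutex;
    int attachments{};
};

// Hypothetical manager tracking attached resources
struct ResourceManager {
    std::mutex mutex;
    std::vector<Resource *> resources;
};

// Always lock the manager first, then the resource, so the order is the same on every thread
void AttachResource(ResourceManager &manager, Resource &resource) {
    std::scoped_lock managerLock{manager.mutex};   // 1. resource manager
    std::scoped_lock resourceLock{resource.mutex}; // 2. individual resource
    manager.resources.push_back(&resource);
    resource.attachments++;
}
```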
PixelyIon	1239907ce8	Rework `Texture` & `Buffer` for `Context` and `FenceCycle` Chaining. GPU resources have been designed with locking by fences in mind, and fences were treated as implicit locks on the GPU. Design paradigms such as `GraphicsContext` simply unlocking the texture mutex after attaching it (which would set the fence cycle) were previously considered fine. They are suboptimal, however, as they enforce that a `FenceCycle` effectively ensures exclusivity. This conflates the function of a mutex, which is mutual exclusion, with that of a fence, which is to track GPU-side completion. As a result, whether it was acceptable to use a GPU resource was tied to GPU completion rather than simply to whether the resource was currently being used by the CPU, which is the function of the mutex. This rework fixes this using the groundwork laid in previous commits: `Context` semantics are utilized to move back to using mutexes for locking resources and to track usage on the GPU in a cleaner way, rather than through arbitrary fence comparisons. This also cleans up many methods that involved fences, which no longer require them and can therefore be entirely removed, further cleaning up the codebase. It also opens the door for future improvements such as removing `hostImmutableCycle` and replacing it with better solutions, the implementation of which is broken at the moment regardless. While moving to `Context`-based locking, the question of multiple GPU workloads being in flight while using overlapping resources came up. This brought a fundamental limitation of `FenceCycle` to light: only one resource could be concurrently attached to a cycle, and it could not adequately represent multi-cycle dependencies. `FenceCycle` chaining was designed to fix this inadequacy. It allows several different GPU workloads to be in flight concurrently while utilizing the same resources, as long as they can ensure GPU-GPU synchronization.	2022-08-06 22:18:42 +05:30
PixelyIon	07d45ee504	Introduce `FenceCycle` Chaining. If we want to allow submitting multiple pieces of work to the GPU at once while still requiring CPU synchronization, we'll need to track all past fence cycles associated with a resource alongside the current one. To solve this, the concept of chaining fences has been introduced: fences from past usages can be chained to the latest fence, which will then recursively forward operations to the chained fences. This change also mandates a move away from `FenceCycleDependency`, as it would prevent fences from concurrently locking the same resources. Chaining requires exactly that, since two fences being chained fundamentally means they're locking the same resources. `AtomicForwardList` is therefore used as the new container.	2022-08-06 22:18:42 +05:30
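A single-threaded sketch of fence chaining. The `FenceCycle` here is a hypothetical simplification: `Wait()` just marks it signalled instead of blocking on a GPU fence. It also uses a `std::vector` where the commit mentions a thread-safe `AtomicForwardList`. Waiting on the latest cycle recursively forwards the wait to every cycle chained to it.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, single-threaded stand-in for a fence cycle
class FenceCycle {
  public:
    // Chains the cycle of an earlier usage of the same resources onto this one
    void ChainCycle(std::shared_ptr<FenceCycle> cycle) {
        if (cycle.get() != this)
            chainedCycles.push_back(std::move(cycle));
    }

    // Waits on every chained cycle (recursively) before treating this cycle as signalled
    void Wait() {
        if (signalled)
            return;
        for (auto &cycle : chainedCycles)
            cycle->Wait();
        signalled = true; // a real implementation would block on its own GPU fence here
    }

    bool Signalled() const {
        return signalled;
    }

  private:
    bool signalled{false};
    std::vector<std::shared_ptr<FenceCycle>> chainedCycles;
};
```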
PixelyIon	6b9269b88e	Introduce `Context` semantics to GPU resource locking. Resources on the GPU can be fairly convoluted and involve overlaps, which can lead to the same GPU resources being utilized through different views. We previously utilized fences to lock resources and prevent concurrent access. This was overly harsh, as it would block usage of a resource until GPU completion of the commands associated with it. Fences have now been replaced with locks, but locks run into the issue of being per-view. To add a common object for tracking usage, the concept of "tags" was introduced to identify a single context, allowing locks to be skipped if they're from the same context. This is important to prevent a deadlock when locking a resource that has already been locked by the current context through a different view.	2022-08-06 22:18:42 +05:30
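A minimal sketch of tag-based locking. `ContextTag`, `Resource` and `LockWithTag` are hypothetical names here, and the real implementation may differ. A resource records the tag of the context holding it, so a second view from the same context skips the lock instead of deadlocking on it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical context tag: a unique ID per execution context, 0 meaning "no context"
using ContextTag = uint64_t;

// Hypothetical backing resource that may be reached through several views
class Resource {
  public:
    // Locks the resource unless the same context already holds it through another view.
    // Returns true if a new lock was taken, which the caller must later release.
    bool LockWithTag(ContextTag tag) {
        if (tag != 0 && owner.load() == tag)
            return false; // already locked by this context, locking again would self-deadlock
        mutex.lock();
        owner.store(tag);
        return true;
    }

    void unlock() {
        owner.store(0);
        mutex.unlock();
    }

  private:
    std::mutex mutex;
    std::atomic<ContextTag> owner{0};
};
```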
PixelyIon	3139889a09	Implement Asynchronous Presentation. We currently present all frames synchronously on the thread that calls into SurfaceFlinger functions. This is suboptimal, as it doesn't match guest behavior and can delay the guest from working on the next frame. This commit makes queuing up frames non-blocking, and handles all waiting and then presenting each frame on a dedicated thread.	2022-08-06 22:18:42 +05:30
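A sketch of asynchronous presentation as a producer/consumer queue. `AsyncPresenter` and `Frame` are hypothetical, and the `present` callback stands in for waiting on a frame's fence and presenting it. The producer only enqueues. A dedicated thread dequeues each frame and presents it outside the lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame handle; a real engine would queue a texture view plus timing metadata
using Frame = int;

class AsyncPresenter {
  public:
    explicit AsyncPresenter(std::function<void(Frame)> presentCallback)
        : present{std::move(presentCallback)}, thread{&AsyncPresenter::Run, this} {}

    // Drains any queued frames, then stops the presentation thread
    ~AsyncPresenter() {
        {
            std::scoped_lock lock{mutex};
            stopping = true;
        }
        condition.notify_all();
        thread.join();
    }

    // Called by the producer; never waits for presentation to happen
    void Queue(Frame frame) {
        {
            std::scoped_lock lock{mutex};
            queue.push_back(frame);
        }
        condition.notify_one();
    }

  private:
    void Run() {
        std::unique_lock lock{mutex};
        while (true) {
            condition.wait(lock, [this] { return stopping || !queue.empty(); });
            if (queue.empty())
                return; // stopping and nothing left to present
            Frame frame{queue.front()};
            queue.pop_front();
            lock.unlock();
            present(frame); // waiting and presenting happen off the producer thread
            lock.lock();
        }
    }

    std::function<void(Frame)> present;
    std::mutex mutex;
    std::condition_variable condition;
    std::deque<Frame> queue;
    bool stopping{false};
    std::thread thread; // declared last so every member above is initialized before Run() starts
};
```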
PixelyIon	6e09dc5204	Fix thread name setting. We utilize `pthread_setname_np` to set thread names but didn't check for any errors. As a result, the `Skyline-Choreographer` and `ChannelCmdFifo` threads didn't have proper names, as they exceeded the 16-character limit the pthread function imposes on thread names. This has now been fixed by changing the names and introducing error checking to invocations of this function.	2022-08-06 22:18:42 +05:30
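A sketch of error-checked thread naming. `SetThreadName` is a hypothetical helper, and it assumes the Linux/Android (glibc/Bionic) signature of `pthread_setname_np`. There, names are limited to 16 bytes including the terminating null byte, so anything longer than 15 characters is rejected with `ERANGE` rather than applied.

```cpp
#include <cassert>
#include <cstring>
#include <pthread.h>
#include <stdexcept>
#include <string>

// Sets the calling thread's name and reports failures instead of silently ignoring them
void SetThreadName(const std::string &name) {
    int result{pthread_setname_np(pthread_self(), name.c_str())};
    if (result != 0) // e.g. ERANGE when the name is longer than 15 characters
        throw std::runtime_error("pthread_setname_np(\"" + name + "\") failed: " + std::strerror(result));
}
```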
PixelyIon	662ea532d8	Skip waiting on host GPU after command buffer submission. We waited on the host GPU after `Execute`, but this isn't optimal as it causes a major stall on the CPU. That stall can lead to several adverse effects, such as downclocking by the governor and losing the opportunity to work in parallel with the GPU. This has now been fixed by splitting `Execute`'s functionality into two functions, `Submit` and `SubmitWithFlush`, which both execute all nodes and submit the resulting command buffer to the GPU. The flushing variant waits for the GPU to complete, while the non-flushing variant does not wait and works ahead of the GPU.	2022-08-06 22:18:42 +05:30
PixelyIon	5129d2ae78	Add move-assignment semantics to `ActiveCommandBuffer`/`MegaBuffer`. We need move-assignment semantics to viably utilize these objects as class members, as they cannot be replaced without move-assignment (or copy-assignment, but that is undesirable here). This commit fixes that by introducing a move-assignment operator to them while making `slot` a pointer, which has the necessary nullability semantics.	2022-08-06 22:18:42 +05:30
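A sketch of why a pointer `slot` enables move-assignment. `Slot` and `ActiveSlotHandle` are hypothetical types, not the real `ActiveCommandBuffer`/`MegaBuffer`. A reference member can neither be reseated nor left empty. A pointer can be moved out and set to `nullptr`, which gives the moved-from object a valid "owns nothing" state.

```cpp
#include <cassert>
#include <utility>

// Hypothetical pool slot that is handed out to at most one owner at a time
struct Slot {
    bool active{false};
};

// Move-only handle that marks its slot active for as long as it owns it
class ActiveSlotHandle {
  public:
    explicit ActiveSlotHandle(Slot &target) : slot{&target} {
        slot->active = true;
    }

    ActiveSlotHandle(ActiveSlotHandle &&other) noexcept : slot{std::exchange(other.slot, nullptr)} {}

    ActiveSlotHandle &operator=(ActiveSlotHandle &&other) noexcept {
        if (this != &other) {
            Release(); // give back the slot we currently own before taking the other one
            slot = std::exchange(other.slot, nullptr);
        }
        return *this;
    }

    ActiveSlotHandle(const ActiveSlotHandle &) = delete;
    ActiveSlotHandle &operator=(const ActiveSlotHandle &) = delete;

    ~ActiveSlotHandle() {
        Release();
    }

    bool Valid() const {
        return slot != nullptr;
    }

  private:
    void Release() {
        if (slot) {
            slot->active = false;
            slot = nullptr;
        }
    }

    Slot *slot; // nullptr after being moved from
};
```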
Billy Laws	7fd9d347e3	Use per-RT blend enable registers even when independent blend is disabled. The common blend enable register seems to be used for something else. This is required for blending to work correctly in OpenGL games.	2022-07-29 20:07:14 +01:00
Billy Laws	048c2fdd29	Fix Vulkan framebuffer dimensions calculations. The framebuffer needs to be large enough to contain both the render area extent and offset.	2022-07-29 20:07:14 +01:00
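A worked illustration of the sizing rule in this commit. `Offset2D`/`Extent2D` are hypothetical mirrors of `VkOffset2D`/`VkExtent2D`, so the sketch builds without Vulkan headers. The framebuffer must reach the far corner of the render area, so each dimension is the offset plus the extent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirrors of VkOffset2D / VkExtent2D
struct Offset2D {
    int32_t x, y;
};

struct Extent2D {
    uint32_t width, height;
};

// The framebuffer must cover the render area's far corner (offset + extent), not just its extent
Extent2D FramebufferExtent(Offset2D renderAreaOffset, Extent2D renderAreaExtent) {
    return {
        static_cast<uint32_t>(renderAreaOffset.x) + renderAreaExtent.width,
        static_cast<uint32_t>(renderAreaOffset.y) + renderAreaExtent.height,
    };
}
```

For example, a 1280x720 render area at offset (16, 8) needs a framebuffer of at least 1296x728.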
lynxnb	3905728447	Make every setting observable individually. A `Setting` delegate class has been introduced, holding the raw value of the setting and adding support for registering callbacks to that setting. Callbacks are then called when the value of that setting changes. As a result, raw setting values have been made accessible through pointer-dereference semantics.	2022-07-26 20:16:24 +05:30
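A minimal sketch of an observable setting in the spirit of this commit, written in C++ for consistency with the rest of this log; the template and method names are hypothetical. The delegate holds the raw value and exposes it through `*setting`. It runs registered callbacks whenever an assignment actually changes the value.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical observable setting: raw value plus change callbacks
template<typename T>
class Setting {
  public:
    explicit Setting(T initial = {}) : value{std::move(initial)} {}

    // Pointer-dereference semantics for reading the raw value
    const T &operator*() const { return value; }
    const T *operator->() const { return &value; }

    // Registers a callback invoked with the new value whenever it changes
    void AddCallback(std::function<void(const T &)> callback) {
        callbacks.push_back(std::move(callback));
    }

    Setting &operator=(T newValue) {
        if (newValue == value)
            return *this; // only notify on actual changes
        value = std::move(newValue);
        for (auto &callback : callbacks)
            callback(value);
        return *this;
    }

  private:
    T value;
    std::vector<std::function<void(const T &)>> callbacks;
};
```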
lynxnb	cbc896c8f8	Fix `waitForFences` crash on Mali drivers. Mali GPU drivers utilize the `ppoll()` syscall inside `waitForFences`, which isn't correctly restarted after a signal, and we can receive signals at any time on a guest thread. This commit fixes that by recursively calling the function on failure until it succeeds or returns an unexpected error. Co-authored-by: PixelyIon <pixelyion@protonmail.com> Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-07-14 20:34:16 +02:00
Billy Laws	e816256220	Add blend, scissor, viewport and vertex state to shader hash. Omitting these caused a ton of additional comparisons in Zelda: Link's Awakening, as many shaders would have the same hash.	2022-06-28 21:32:59 +01:00
lynxnb	e6cfdeb06a	Fix non-indexed quad draws. Certain non-indexed quad draws would mistakenly take the indexed quad path because of the assumption that they would not have a bound index buffer. This resulted in a crash in most games using quads due to a faulty `Indexed quad conversion is not supported` exception, when in fact they were not using indexed quads. Co-authored-by: PixelyIon <pixelyion@protonmail.com> Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-06-23 10:57:11 +02:00
lynxnb	8fc3bc75f4	Allow providing an index type to calculate quad conversion buffer size	2022-06-23 00:15:44 +02:00
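A sketch of quad-to-triangle conversion sizing, assuming the common scheme where each quad (4 vertices) becomes two triangles (6 indices). `IndexType` and the function names are hypothetical. The buffer size therefore depends on the index type's width, which is what allowing an index type to be passed in makes possible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical index type enum with the usual 8/16/32-bit widths
enum class IndexType { U8, U16, U32 };

constexpr size_t IndexSize(IndexType type) {
    switch (type) {
        case IndexType::U8: return 1;
        case IndexType::U16: return 2;
        case IndexType::U32: return 4;
    }
    return 0;
}

// Each quad (4 vertices) becomes two triangles (6 indices)
constexpr size_t QuadConversionIndexCount(size_t vertexCount) {
    return (vertexCount / 4) * 6;
}

constexpr size_t QuadConversionBufferSize(size_t vertexCount, IndexType type = IndexType::U32) {
    return QuadConversionIndexCount(vertexCount) * IndexSize(type);
}

// Triangle-list indices for non-indexed quads: (0, 1, 2) and (0, 2, 3) per quad, preserving winding
std::vector<uint32_t> GenerateQuadIndices(size_t vertexCount) {
    std::vector<uint32_t> indices;
    indices.reserve(QuadConversionIndexCount(vertexCount));
    for (uint32_t base{}; base + 4 <= vertexCount; base += 4)
        for (uint32_t offset : {0u, 1u, 2u, 0u, 2u, 3u})
            indices.push_back(base + offset);
    return indices;
}
```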
Billy Laws	7709dc8cf6	Rewrite buffer megabuffering to be per view and more efficient. This commit implements several key optimisations in megabuffering that are all inherently interlinked. - Megabuffering is moved from per-buffer to per-view copies; this makes megabuffering possible for small views into larger underlying buffers, which is often the case with even the simplest of games. - Megabuffering is no longer the default option; it is only enabled for buffer views that have had inline GPU writes applied to them in the past, as that is the only case where it is beneficial. In any other case the cost of copying, even with a 128KiB limit, can be significant. - With both of these changes, there is now a possibility of overlapping views where one uses megabuffering and one does not. In order to allow GPU inline writes to work consistently in such cases, a system of 'host immutability' has been implemented: when a buffer is marked as host immutable for a given cycle, all writes to the buffer from that point until the cycle is signalled will be performed on the GPU, ensuring that the backing contents are correctly sequenced.	2022-06-11 17:05:39 +05:30
MCredstoner2004	2e356b8f0b	Use spans instead of ptr and size in kernel memory	2022-06-11 17:05:39 +05:30
Billy Laws	22039df301	Transition to std::unordered_set for buffer view tracking. This has the same guarantees of pointer stability while also being significantly faster in cases where a buffer has thousands of views. This is the case in RE4, and this change leads to an almost 1000% performance improvement in that game.	2022-06-09 23:52:13 +01:00
PixelyIon	a5ca370c36	Implement thread-safe MegaBuffer pool. We currently have a global `MegaBuffer` instance that is shared across all channels. This is very problematic, as `MegaBuffer` fundamentally works like a state machine with allocations (especially resetting/freeing) and is thread-specific. Therefore, we now have a pool of several `MegaBuffer`s, allocated from by the `CommandExecutor` and therefore kept channel-specific. This also limits each `MegaBuffer`'s usage to a single thread, which allows any allocations to be reset or freed individually.	2022-06-05 13:04:40 +05:30
PixelyIon	3e08494146	Minor `CommandScheduler` refactor. There was a lot of redundant code in `CommandScheduler` when the same functionality could be achieved with much shorter and cleaner code, which this commit fixes. This includes no changes to the user-facing API and, as a result, does not require any changes on the user side.	2022-06-05 13:04:40 +05:30
Billy Laws	54999957a2	Remove RGB565 format workaround. It will soon be redundant with the new texture manager and is quite hacky, so drop it.	2022-06-04 17:49:13 +01:00
Billy Laws	deb7a0e22a	Implement 5x5 and 10x10 ASTC texture formats	2022-06-04 17:42:37 +01:00
Billy Laws	cc5a3f99c1	Reformat format description file	2022-06-04 17:42:13 +01:00
Billy Laws	a476bbaf4d	Add 11_11_10 vertex buffer format	2022-06-04 17:41:10 +01:00
Billy Laws	71c37dd6c4	Add D24X8Unorm depth RT format support	2022-06-04 17:40:49 +01:00
Billy Laws	d3af629b83	Support R32G32B32A32 int RT formats	2022-06-04 17:38:57 +01:00
Billy Laws	106ad597db	Support BGRA8888 surfaceflinger format. A swizzle is applied to R8G8B8A8 to transform it to BGRA, since BGRA isn't a commonly supported swapchain format on Android.	2022-06-04 16:49:26 +01:00
Billy Laws	84dec7561c	Don't cache rendertarget mappings. Some games remap rendertargets or map them late, which would lead to weird graphical bugs or crashes. Drop the caching, since VMM lookup is fairly cheap anyway.	2022-06-03 19:31:52 +01:00
Billy Laws	581a016991	Add GuestTexture::GetSize helper function. This code was getting duplicated a bit, so factor it out into a helper function.	2022-06-03 19:30:54 +01:00
PixelyIon	2712b3276b	Fix incorrect `VkBufferImageCopy` offset calculations. The `VkBufferImageCopy` offset calculations inside `CopyIntoStagingBuffer` were wrong, as they multiplied the mip level's linear size by `levelCount` rather than `layerCount`. This led to substantial UB in games which called this function: it caused an overflow that resulted in writes to other areas of the buffer. That in turn caused major issues such as vertex/index buffer corruption and corresponding graphical glitches, and was likely the cause of some crashes.	2022-06-02 22:14:22 +05:30
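A worked sketch of the corrected offset calculation. It assumes a hypothetical staging layout where each mip level stores all of its layers contiguously before the next level starts. `MipLevel` and `MipBufferOffsets` are illustrative names, not the real `CopyIntoStagingBuffer` internals. Under that layout, moving past a level requires multiplying its per-layer linear size by the layer count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one mip level: its linear (deswizzled) size for a single layer
struct MipLevel {
    size_t linearSize;
};

// Offset of each mip level in a staging buffer laid out level by level, all layers per level
std::vector<size_t> MipBufferOffsets(const std::vector<MipLevel> &levels, size_t layerCount) {
    std::vector<size_t> offsets;
    size_t offset{};
    for (const auto &level : levels) {
        offsets.push_back(offset);
        offset += level.linearSize * layerCount; // multiplying by the level count here was the bug
    }
    return offsets;
}
```

For a 6-layer texture with per-layer level sizes of 1024, 256 and 64 bytes, the levels start at offsets 0, 6144 and 7680.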
PixelyIon	06901ef22a	Fix BC7 output swizzling from BGRA to RGBA. BC7 CPU decoding had the red and blue channels swapped, as it outputted a BGRA image after decoding while we expected an RGBA image to be produced. This should fix the colors of certain textures in titles such as Cuphead or Sonic Forces.	2022-06-02 19:48:55 +05:30
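What the channel fix amounts to, shown as a standalone CPU pass over tightly packed 8-bit pixels. `BgraToRgba` is a hypothetical helper. The commit does not say whether the decoder itself was changed or a separate pass was added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Converts tightly packed 8-bit BGRA pixels to RGBA in place by swapping the red and blue channels
void BgraToRgba(std::vector<uint8_t> &pixels) {
    for (size_t i{}; i + 3 < pixels.size(); i += 4)
        std::swap(pixels[i], pixels[i + 2]); // byte 0 (B) <-> byte 2 (R); G and A stay put
}
```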
Billy Laws	c745e0e02b	Move image type logic to GuestTexture, allowing 2D array views for 3D RTs. We can't render to a 3D texture through a 3D view; we instead have to create a 2D array view into it and render to that. This requires the guest texture view and the underlying storage texture to differ in view type/layer count, which the texture manager previously didn't support. That support was also implemented: if the underlying texture is 3D (and the view type is 2D array), the view layer count is read from the dimensions' depth instead. Additionally, move away from our own view type enum to Vulkan's, in line with other guest texture member types.	2022-05-31 22:09:53 +01:00
Billy Laws	6cc925c2d3	Reset RT mappings on dimension and format changes	2022-05-31 17:49:16 +01:00
Billy Laws	8180bf852e	Lock textures before attaching in BlitContext	2022-05-31 16:54:13 +01:00
PixelyIon	e592b11039	Drop `samplerAnisotropy` as a required GPU feature. Sampler anisotropy was made a required feature in an earlier commit due to its widespread availability. This was determined to be incorrect, as certain Mali GPUs that can otherwise run 2D games in Skyline do not have this feature. While these GPUs are still not officially supported, this was the only roadblock to supporting them, so it has now been made an optional feature.	2022-05-31 01:37:40 +05:30
PixelyIon	80c8fb8791	Implement CPU BCn Texture Decoding. Certain GPU vendors such as ARM (Mali) do not support BCn textures whatsoever, while other vendors such as AMD only have partial support (BC1-BC3). Most titles on the guest utilize BCn textures, and to address this on host GPUs without BCn support, we need to decompress the texture on the CPU. This commit implements a CPU BCn texture decoder based on SwiftShader's BC decoder. It also adds the necessary infrastructure to have different formats for the `GuestTexture` and `Texture` objects.	2022-05-28 21:22:24 +05:30
PixelyIon	fe615b1e03	Clarify texture swizzling inner-loop iteration count. The iteration count of the inner loop for sector deswizzling was miscalculated as `SectorWidth * SectorHeight`. While the result was correct at `32`, it should be determined by the number of sector lines within a GOB, i.e. `(GobWidth / SectorWidth) * GobHeight`.	2022-05-28 21:22:24 +05:30
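A compile-time check of the arithmetic in this commit. It assumes the usual Tegra block-linear geometry: a GOB is 64 bytes wide and 8 lines tall, and is split into sectors of 16 bytes by 2 lines. Both formulas evaluate to 32, which is why the old expression still produced correct output, but only the new one counts sector lines.

```cpp
#include <cassert>
#include <cstddef>

// Assumed block-linear geometry (widths in bytes, heights in lines)
constexpr size_t GobWidth{64};
constexpr size_t GobHeight{8};
constexpr size_t SectorWidth{16};
constexpr size_t SectorHeight{2};

// Sector lines in one GOB: 4 sectors across each line, times 8 lines
constexpr size_t SectorLinesInGob{(GobWidth / SectorWidth) * GobHeight};

static_assert(SectorLinesInGob == 32);
static_assert(SectorWidth * SectorHeight == 32); // coincidentally equal, but not a line count
```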
PixelyIon	7d4e0a7844	Implement Mipmapped Texture Support. Support for mipmapped textures was not implemented, which is fairly crucial to proper rendering of games. Only the first level (highest resolution) would load, which might result in a lot more memory bandwidth being utilized. Mipmapping also has associated benefits regarding aliasing, as it has a minor anti-aliasing effect on distant textures. This commit entirely implements mipmapping support, but it does not extend to full support for views into specific mipmap levels due to the texture manager implementation being incomplete.	2022-05-28 21:22:24 +05:30
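A small sketch of how a mip chain's level dimensions are usually derived. Each level halves every dimension, clamped to a minimum of 1. `Dimensions` and `MipChain` are hypothetical helpers, and level counts are assumed to stay well below 32 so the shifts are defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Dimensions {
    uint32_t width, height;
};

// Each successive mip level halves every dimension, never going below 1
std::vector<Dimensions> MipChain(Dimensions base, uint32_t levelCount) {
    std::vector<Dimensions> levels;
    for (uint32_t level{}; level < levelCount; level++)
        levels.push_back({std::max(base.width >> level, uint32_t{1}), std::max(base.height >> level, uint32_t{1})});
    return levels;
}
```

A 256x64 texture has 9 levels, ending at 1x1. The height reaches 1 at level 6 and stays clamped there.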
PixelyIon	da7e6a7df7	Replace Maxwell DMA `GuestTexture` usage with new swizzling API. Maxwell DMA requires swizzled copies to/from textures, and previously it had to construct an arbitrary `GuestTexture` to do so. With the introduction of the cleaner API this has become redundant, so this commit replaces it with direct calls to the API using all the necessary values.	2022-05-28 21:22:24 +05:30
PixelyIon	de300bfdbe	Refactor Texture Swizzling. The API for texture swizzling is now more concrete and abstracted out from `GuestTexture`. This allows for neater usage in certain areas such as MaxwellDMA, while a `GuestTexture` wrapper still allows for neater usage in those cases. The code itself has also been cleaned up slightly, with all usages of `u32` being upgraded to `size_t`. This is simply more efficient, as the compiler doesn't need to emulate wraparound behavior for integer types smaller than the processor word size.	2022-05-19 17:13:55 +05:30
Robin Kertels	0a3cf25823	Implement the Fermi 2D blitting engine. The Fermi 2D engine implements both image blit and resolve operations, supporting subpixel sampling with both linear and point filtering. Resolve operations are performed by sampling from the center of each pixel in order to resolve the final image from the MSAA samples. MSAA images are stored in memory like regular images, but each pixel's dimensions are scaled, e.g. for 2x2 MSAA: ``` 112233 112233 445566 445566 ``` These would be sampled with both duDx and duDy as 2 (integer part), resolving to the following: ``` 123 456 ``` Blit operations are performed by sampling from the corner of each pixel, scaling the image as one would expect. This implementation isn't fully complete, as Vulkan blit doesn't support some combinations which Fermi does, most notably between colour and depth stencil. These will be implemented properly at a later date, likely after the texture manager rework. Out of Bounds Blit, used by some OpenGL games, is also missing since supporting it requires texture aliasing; this will also be supported after the texture manager rework. Co-authored-by: Billy Laws <blaws05@gmail.com>	2022-05-13 22:37:37 +01:00
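A toy reproduction of the 2x2 MSAA resolve example in the commit message above. Each character stands for one stored sample, and `ResolveByCenterSampling` is a hypothetical helper. With a scale of 2 in each axis (duDx = duDy = 2), each destination pixel point-samples the center of its 2x2 block of samples, turning the 6x4 sample grid into the 3x2 result shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Resolves an MSAA image stored as an enlarged regular image by point-sampling the
// center of each destination pixel's scaleX x scaleY block of samples
std::vector<std::string> ResolveByCenterSampling(const std::vector<std::string> &samples, size_t scaleX, size_t scaleY) {
    std::vector<std::string> resolved;
    if (samples.empty())
        return resolved;
    for (size_t y{}; y < samples.size() / scaleY; y++) {
        std::string row;
        for (size_t x{}; x < samples[0].size() / scaleX; x++)
            row += samples[y * scaleY + scaleY / 2][x * scaleX + scaleX / 2];
        resolved.push_back(row);
    }
    return resolved;
}
```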
Billy Laws	be2546138d	Move IOVA class to GMMU so it can be used for other engines	2022-05-13 22:37:37 +01:00
Billy Laws	3ad640fcbc	Fix accidental graphics context member/parameter duplication	2022-05-13 22:37:37 +01:00
PixelyIon	7a6f27a19a	Fix texture swizzling OOB writes. Certain writes during swizzling went out of bounds due to an incorrect `blockExtentY` calculation, and the previous commit to fix this ended up breaking it further. This commit returns to the original commit's calculations with the proper addition of a check for exact alignment with a GOB, which is the case that was broken earlier.	2022-05-13 14:52:41 +05:30
PixelyIon	168e51e7ad	Always use `GetLayerStride` for layer stride in Texture. The `GuestTexture::GetLayerStride` function was not always utilized to retrieve the layer stride inside `Texture`; it would instead directly access the `guestTexture::layerStride` member. This is problematic, as the member may not be initialized and may return `0`, which would lead to a broken image copy.	2022-05-13 14:21:37 +05:30
Billy Laws	d08ac63bbf	Use TIC maximum index over TSC when tscIndexLinked is set	2022-05-12 17:38:22 +01:00

720 Commits