Commit Graph

720 Commits

a6bb716123 Move packed pipeline state to a separate file 2022-11-02 17:46:07 +00:00
4dcbf5c3a0 Drop the caching aspect of shader manager entirely
Caching here was deemed unnecessary since it will be done implicitly by the pipeline cache anyway, and it created issues with the legacy attribute conversion pass. The shader manager now purely serves as a frontend for Hades.
2022-11-02 17:46:07 +00:00
e77e4891dc Tidy up Maxwell 3D regs a bit 2022-11-02 17:46:07 +00:00
405d26fc22 Introduce Maxwell 3D shader state
Simple state holder that hashes the stored shader and reads it into a buffer
2022-11-02 17:46:07 +00:00
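A minimal sketch of what such a state holder could look like, assuming xxHash for the digest; the names and update flow here are illustrative rather than the actual interconnect types:

```cpp
#include <cstdint>
#include <span>
#include <vector>
#include <xxhash.h>

// Hypothetical holder: copies the guest shader into a local buffer and hashes
// it so later lookups can detect whether the bound program has changed
struct ShaderBinaryState {
    std::vector<uint8_t> backing; // CPU-side copy of the guest shader
    uint64_t hash{};              // Digest of `backing`, usable as a cache key

    // Returns true when the newly read shader differs from the cached copy
    bool Update(std::span<const uint8_t> guestShader) {
        uint64_t newHash{XXH64(guestShader.data(), guestShader.size(), 0)};
        if (newHash == hash && guestShader.size() == backing.size())
            return false; // Same program, keep the existing buffer

        backing.assign(guestShader.begin(), guestShader.end());
        hash = newHash;
        return true;
    }
};
```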
6865f0bdaf Transition color blend state to packed pipeline state 2022-11-02 17:46:07 +00:00
7049a521d2 Use Vulkan types directly in PackedPipelineState where possible 2022-11-02 17:46:07 +00:00
effeb074b6 Move pipeline cache Key to cpp file and rename to PackedPipelineState
The new name is more indicative of what the struct contains and how it will be used as more than just a key.
2022-11-02 17:46:07 +00:00
e1512c91a0 Transition depth stencil state to pipeline cache key 2022-11-02 17:46:07 +00:00
1f844e2c18 Transition rasterization state to pipeline cache key 2022-11-02 17:46:07 +00:00
9a6efb091c Transition tessellation state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
ae5d419586 Transition input assembly state to pipeline cache key 2022-11-02 17:46:07 +00:00
3f9161fb74 Transition vertex input state to pipeline cache key
Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.
2022-11-02 17:46:07 +00:00
ffe24aa075 Introduce a base pipeline cache key starting with RTs
It was determined that a general-purpose Vulkan pipeline cache isn't viable given the significant performance requirements of Draw(); by using a Maxwell 3D specific key we can shrink state significantly more than if we used Vulkan structs.
2022-11-02 17:46:07 +00:00
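For illustration, a Maxwell-specific key can pack small enums into bitfields rather than carrying full Vulkan create-info structs; the sketch below is hypothetical and far smaller than the real PackedPipelineState:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical, heavily trimmed key: Maxwell 3D state is stored in just enough
// bits rather than as Vk*CreateInfo structs, keeping the key small enough to
// hash and compare cheaply on every draw (keys should be zero-initialised)
struct PackedPipelineState {
    uint8_t colorRenderTargetFormats[8]; // One packed format ID per RT
    uint8_t depthRenderTargetFormat;
    uint8_t topology : 5;
    uint8_t cullMode : 2;
    uint8_t frontFaceClockwise : 1;
    uint8_t depthWriteEnable : 1;
    uint8_t depthTestFunc : 3;
    uint8_t stencilTestEnable : 1;
    uint8_t padding : 3; // Explicit so every bit is covered by zero-initialisation

    bool operator==(const PackedPipelineState &other) const {
        return std::memcmp(this, &other, sizeof(*this)) == 0;
    }
};

// A byte-wise hash is valid since all state above is explicitly packed
struct PackedPipelineStateHash {
    size_t operator()(const PackedPipelineState &state) const {
        size_t hash{};
        for (size_t i{}; i < sizeof(state); i++)
            hash = hash * 31 + reinterpret_cast<const uint8_t *>(&state)[i];
        return hash;
    }
};
```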
1e8f7d7fcb Implement dynamic pipeline state 2022-11-02 17:46:07 +00:00
a94040ac7d Add default cases to enum conversions where necessary 2022-11-02 17:46:07 +00:00
6e55d4dcf4 Implement color blend pipeline state 2022-11-02 17:46:07 +00:00
cb11662ea5 Implement depth stencil pipeline state 2022-11-02 17:46:07 +00:00
2484a2d6b5 Add A1R5G5B5Unorm and S8Uint formats 2022-11-02 17:46:07 +00:00
690e96bce0 Implement rasterization pipeline state 2022-11-02 17:46:07 +00:00
90cd6adb91 Implement tessellation pipeline state 2022-11-02 17:46:07 +00:00
ef11900a39 Introduce reworked Maxwell 3D core interconnect
This mainly distributes operations down to activeState and pipelineState, aside from clears, which are implemented in-place. The exposed interface is much reduced compared to the previous GraphicsContext system thanks to the newly introduced dirty system; this should hopefully make the code more maintainable and keep actual rendering operations separate from state such as primitive restart. Currently draws are unimplemented and the only fully implemented features are clears and constant buffer operations.
2022-11-02 17:46:07 +00:00
37b821a4dc Introduce Maxwell 3D interconnect active state
Active state encapsulates all state that isn't part of a pipeline and can be set dynamically with Vulkan calls. This includes both dynamic state like stencil faces, and command buffer state like vertex buffer bindings.
Similarly to the last commit, the main goal of this is to reduce the amount of redundant work done per draw by employing dirty state as much as possible. Without using dirty state for this, every active state operation would need to be performed every draw, which gets very expensive when things like buffer lookups end up being required. Code has also been heavily cleaned up, as described in the previous commit.
2022-11-02 17:46:07 +00:00
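A minimal sketch of the dirty-tracking idea, with hypothetical state names: register writes mark state dirty and the per-draw path only flushes what was actually touched.

```cpp
#include <array>
#include <cstddef>

// Hypothetical dirty-state sketch: register writes set a flag and the
// per-draw update path only runs the flush for state that was touched
enum class ActiveStateIndex : size_t {
    VertexBuffers,
    IndexBuffer,
    StencilFaces,
    Count
};

class DirtyTracker {
    std::array<bool, static_cast<size_t>(ActiveStateIndex::Count)> dirty{};

  public:
    DirtyTracker() {
        dirty.fill(true); // Everything needs one initial flush
    }

    void MarkDirty(ActiveStateIndex index) {
        dirty[static_cast<size_t>(index)] = true;
    }

    // Runs `flush` only when the state changed since the last draw
    template <typename Flush>
    void UpdateIfDirty(ActiveStateIndex index, Flush &&flush) {
        bool &flag{dirty[static_cast<size_t>(index)]};
        if (flag) {
            flush();
            flag = false;
        }
    }
};
```

With this, an unchanged vertex buffer binding costs a single branch per draw rather than a full buffer lookup and rebind.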
5fdda78073 Introduce Maxwell 3D interconnect pipeline state
The main goal of this is to reduce the number of redundant lookups and the amount of work done per draw as much as possible; this is mainly achieved through heavy use of dirty tracking, though other optimisations like heavily using the linear allocator are also in play. In addition to the goal of performance, the code has been cleaned up and abstracted significantly from its state in graphics_context, hopefully making the GPU interconnect code much more maintainable in the future and reducing the boilerplate needed to add even simple functionality. This commit includes partial pipeline state: enough for implementing clears plus a slight bit extra.
2022-11-02 17:46:07 +00:00
21f5611231 Rewrite constant buffer interconnect code
Adapted from the previous code to use dirty state tracking. The cache has also been removed since, with the new buffer view and GMMU optimisations, it actually ended up slowing lookups down; another result of the buffer view optimisations is that raw pointers are no longer used for buffer views, since destruction is now much cheaper.
2022-11-02 17:46:07 +00:00
d1e7bbc1d8 Introduce common code for Maxwell 3D interconnect rewrite
This common code will be used across the entirety of the 3D rewrite; it also includes a stub for StateUpdateBuilder, which will be used by active state code to apply state updates.
2022-11-02 17:46:07 +00:00
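A sketch of what such a StateUpdateBuilder could look like, assuming vulkan.hpp and a simple std::function command list (the real implementation would more likely record linearly allocated command nodes):

```cpp
#include <cstdint>
#include <functional>
#include <vector>
#include <vulkan/vulkan.hpp>

// Hypothetical builder: active state code records dynamic-state updates here,
// which are then replayed onto the command buffer just before the draw
class StateUpdateBuilder {
    std::vector<std::function<void(vk::CommandBuffer)>> updates;

  public:
    void SetViewport(vk::Viewport viewport) {
        updates.emplace_back([viewport](vk::CommandBuffer commandBuffer) {
            commandBuffer.setViewport(0, viewport);
        });
    }

    void SetStencilReference(uint32_t reference) {
        updates.emplace_back([reference](vk::CommandBuffer commandBuffer) {
            commandBuffer.setStencilReference(vk::StencilFaceFlagBits::eFrontAndBack, reference);
        });
    }

    // Applies all recorded updates in order, then resets for the next draw
    void Flush(vk::CommandBuffer commandBuffer) {
        for (auto &update : updates)
            update(commandBuffer);
        updates.clear();
    }
};
```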
a6c49115f9 Rewrite all Maxwell 3D registers up to clears to match Nvidia docs
All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.
2022-11-02 17:46:07 +00:00
8471ab754d Introduce a spin lock for resources locked at a very high frequency
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows the locking code to be inlined, reducing the cost of locks under no contention to almost zero.
2022-11-02 17:46:07 +00:00
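A minimal sketch of such a spinlock, assuming C++20 std::atomic_flag initialisation; since it satisfies Lockable it can be dropped into std::scoped_lock wherever std::mutex was used before, and the uncontended path reduces to a single inlined atomic exchange:

```cpp
#include <atomic>
#include <thread>

// Hypothetical spinlock: lock()/unlock() are tiny enough to inline, so the
// uncontended cost is a single atomic exchange instead of a libc mutex call
class SpinLock {
    std::atomic_flag flag{}; // Clear means unlocked

  public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield(); // Contended: back off instead of burning the core
    }

    bool try_lock() {
        return !flag.test_and_set(std::memory_order_acquire);
    }

    void unlock() {
        flag.clear(std::memory_order_release);
    }
};
```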
e72fe02c15 Add inline fast-path for Buffer::FindOrCreate()
This can be inlined by the compiler much more easily, which helps performance a fair bit given the number of times buffers are looked up; it also avoids the small vector construction that was done in the previous fast-path.
2022-11-02 17:46:07 +00:00
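The general shape of such a fast-path is sketched below with hypothetical types: the hit case stays in the header so it can be inlined at every call site, and only misses call into the out-of-line slow path:

```cpp
#include <cstdint>
#include <map>
#include <memory>

struct Buffer {
    uint64_t address;
    uint64_t size;
};

// Hypothetical manager: the hit case lives in the header so the compiler can
// inline it at every call site; only misses pay for the slow path
class BufferManager {
    std::map<uint64_t, std::shared_ptr<Buffer>> buffers; // Keyed by base address

    // Slow path: in real code this would handle creation and overlap merging
    std::shared_ptr<Buffer> FindOrCreateSlow(uint64_t address, uint64_t size) {
        auto buffer{std::make_shared<Buffer>(Buffer{address, size})};
        buffers[address] = buffer;
        return buffer;
    }

  public:
    std::shared_ptr<Buffer> FindOrCreate(uint64_t address, uint64_t size) {
        auto it{buffers.find(address)};
        if (it != buffers.end() && it->second->size >= size)
            return it->second; // Fast path: an existing buffer fully covers the request
        return FindOrCreateSlow(address, size);
    }
};
```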
49478e178a Avoid redundantly syncing buffers before every Write in an execution
This isn't a guarantee provided by actual HW, so we don't need to provide it either; the sync can be skipped once the buffer has already been synced at least once within the execution.
2022-11-02 17:46:07 +00:00
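One way to express that is a per-execution sequence number, sketched below with hypothetical names: the buffer records the execution it last synced in and skips the work for later writes within the same execution.

```cpp
#include <cstdint>

// Hypothetical sketch: `currentExecution` increments once per execution; the
// buffer records when it last synced and skips the guest->host copy for any
// further writes within the same execution
struct BufferSyncState {
    uint64_t lastSyncedExecution{}; // 0 means never synced

    void Write(uint64_t currentExecution /*, data, offset, ... */) {
        if (lastSyncedExecution != currentExecution) {
            SynchronizeHost(); // Upload guest-dirty contents once per execution
            lastSyncedExecution = currentExecution;
        }
        // ... perform the actual write ...
    }

  private:
    void SynchronizeHost() {
        // Placeholder for the real guest->host synchronization
    }
};
```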
f7a726e452 Allow attempting to write to buffers without passing a GPU copy callback
Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only constructed when it's actually necessary, by calling `Write()` again if the initial call returned true.
2022-11-02 17:46:07 +00:00
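A sketch of the shape such an API could take, with hypothetical names: the common call site passes no callback, and only constructs one when `Write()` reports that it is required.

```cpp
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical buffer: Write() returns true when the write couldn't be applied
// CPU-side and the caller must retry with a GPU copy callback
struct WritableBuffer {
    std::vector<uint8_t> backing;
    bool gpuDirty{}; // True when the authoritative copy lives on the GPU

    bool Write(const uint8_t *data, size_t size, size_t offset,
               const std::function<void()> &gpuCopyCallback = {}) {
        if (!gpuDirty) {
            std::memcpy(backing.data() + offset, data, size); // Cheap CPU path
            return false;
        }

        if (!gpuCopyCallback)
            return true; // Signal the caller to retry with a callback

        gpuCopyCallback(); // Record the GPU-side copy for this write
        return false;
    }
};

// Call site: only pay for constructing the callback when it's actually needed
// if (buffer.Write(data, size, offset))
//     buffer.Write(data, size, offset, [&] { /* record GPU copy */ });
```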
5dca5cc10e Redesign buffer view infra to remarkably reduce creation overhead
Buffer view creation was a significant pain point, requiring several layers of caching to reduce the number of creations, which introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing them), views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achieved by allowing delegates to be chained: during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate, the view will be modified to point directly to the end delegate, with its offset updated accordingly, skipping the need to traverse the chain for future accesses.
2022-11-02 17:46:07 +00:00
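The core idea can be sketched roughly as follows (names hypothetical): a view is just a delegate pointer plus an offset and size, and recreated buffers chain their old delegates forward; views flatten the chain lazily on first access.

```cpp
#include <cstdint>

struct Buffer; // The backing buffer type, opaque for this sketch

// Hypothetical per-buffer delegate: when a buffer is recreated, `link` points
// at the new buffer's delegate and `offsetInLink` locates the old contents
struct BufferDelegate {
    Buffer *buffer{};
    BufferDelegate *link{};
    uint64_t offsetInLink{};
};

// A view is now trivially copyable and trivially destructible: no allocations
// and no per-view bookkeeping inside the buffer
struct BufferView {
    BufferDelegate *delegate{};
    uint64_t offset{};
    uint64_t size{};

    // Follow any recreation chain once, updating this view to point directly
    // at the live delegate so future accesses skip the traversal
    Buffer *GetBuffer() {
        while (delegate->link) {
            offset += delegate->offsetInLink;
            delegate = delegate->link;
        }
        return delegate->buffer;
    }
};
```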
6359852652 Introduce page size constants and replace all usages of PAGE_SIZE
Avoids using macros and results in code which looks slightly cleaner.
2022-11-02 17:46:07 +00:00
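For illustration, a minimal form of such constants (the actual names and namespace in the codebase may differ):

```cpp
#include <cstddef>
#include <cstdint>

namespace constant {
    // Hypothetical replacements for the old PAGE_SIZE macro usages
    constexpr size_t PageSizeBits{12}; // 4 KiB pages
    constexpr size_t PageSize{1 << PageSizeBits};
    constexpr size_t PageMask{PageSize - 1};
}

// Example usage: align an address down to the start of its page
constexpr uint64_t AlignDownToPage(uint64_t address) {
    return address & ~static_cast<uint64_t>(constant::PageMask);
}
```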
54172322fe Fix host synchronization for textures with a different guest format
Host synchronization of a guest texture with a different guest format represents a valid use case where the host doesn't support the guest format and conversion to a host-compatible format must be performed. The issue is most evident on Mali GPUs, as they don't support BCn texture formats and thus need manual decoding before submission. This was disabled by mistake in a previous commit; this commit re-enables it.
2022-09-15 15:22:52 +02:00
34bd16426c Fix quads index buffer conversion not accounting for first index
Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index.
Indexed quad draws also suffered from the same issue, but this was never encountered in games.
This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.
2022-09-04 12:42:33 +02:00
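A sketch of the non-indexed conversion with the fix applied, using a hypothetical helper: each quad (4 vertices) expands to two triangles (6 indices), and every generated index is offset by `first`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical generator for the quad->triangle-list index buffer; `first` is
// the first drawn vertex of the quad draw and must offset every emitted index
std::vector<uint32_t> GenerateQuadListIndices(uint32_t first, uint32_t vertexCount) {
    std::vector<uint32_t> indices;
    indices.reserve((vertexCount / 4) * 6);

    for (uint32_t quad{}; quad < vertexCount / 4; quad++) {
        uint32_t base{first + quad * 4};
        // Quad {v0, v1, v2, v3} -> triangles {v0, v1, v2} and {v0, v2, v3}
        for (uint32_t index : {base, base + 1, base + 2, base, base + 2, base + 3})
            indices.push_back(index);
    }

    return indices;
}
```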
82444f3b0a Don't set push descriptor flag for descriptor sets
This is redundant and against the spec since we no longer use push descriptors.
2022-08-31 21:26:14 +01:00
bf491f71f9 Simplify blit helper shader vertex order 2022-08-10 15:43:16 +01:00
c32bec071c Adjust blit src{X,Y} to account for centred sampling before calling into helper shader
Since the blit engine itself samples from pixel corners and the helper shader from pixel centres, the src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.
2022-08-10 15:39:37 +01:00
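Conceptually the adjustment looks something like the sketch below; this is a simplified, hypothetical version that ignores the engine's fixed-point coordinate formats:

```cpp
// Simplified model: the helper shader samples at (x + 0.5) * scale in source
// space while the blit engine addresses x * scale, so shift the source origin
// back by half a destination pixel's footprint to keep the final column inside
// the source rectangle
struct BlitRect {
    float x, y, width, height;
};

void AdjustSrcForCentreSampling(BlitRect &src, const BlitRect &dst) {
    float scaleX{src.width / dst.width};  // Source texels per destination pixel
    float scaleY{src.height / dst.height};
    src.x -= 0.5f * scaleX;
    src.y -= 0.5f * scaleY;
}
```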
08f36aac33 Enable hades vertex position input workaround for Adreno
Caused crashes in any games using geometry shaders as by default hades uses the position builtin directly.
2022-08-08 18:09:00 +01:00
04e7b684d2 Enable vertexPipelineStoresAndAtomics, fragmentStoresAndAtomics and shaderStorageImageWriteWithoutFormat Vulkan features
Used by Xenoblade Chronicles DE
2022-08-08 17:43:18 +01:00
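For reference, a sketch of enabling these core features only when the physical device reports them (the structure and feature names are the standard Vulkan ones):

```cpp
#include <vulkan/vulkan.h>

// Enable the features used by Xenoblade Chronicles DE, but only when the
// physical device actually reports them; enabling unsupported features is
// invalid usage per the Vulkan spec
VkPhysicalDeviceFeatures PickEnabledFeatures(VkPhysicalDevice physicalDevice) {
    VkPhysicalDeviceFeatures supported{};
    vkGetPhysicalDeviceFeatures(physicalDevice, &supported);

    VkPhysicalDeviceFeatures enabled{};
    enabled.vertexPipelineStoresAndAtomics = supported.vertexPipelineStoresAndAtomics;
    enabled.fragmentStoresAndAtomics = supported.fragmentStoresAndAtomics;
    enabled.shaderStorageImageWriteWithoutFormat = supported.shaderStorageImageWriteWithoutFormat;
    return enabled; // Passed via VkDeviceCreateInfo::pEnabledFeatures
}
```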
390558c802 Add partial support for legacy attribute conversion
We previously missed the hades pass for attribute conversion, leading to crashes when games would attempt to use such an attribute. The hades pass for this isn't a proper fix, however, as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it to allow games using these attributes to at least have a chance of working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.
2022-08-08 17:43:18 +01:00
540437b547 Fixup index buffer view caching
We forgot to set the view size, which would end up forcing a view to be recreated with every call
2022-08-08 17:43:18 +01:00
c966cd3b26 Prevent runtimeInfo vertex state from leaking into wrong shaders
This vertex state must only be present for the last pipeline stage that touches vertices, if it is present for other stages it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.
2022-08-08 17:43:13 +01:00
c52d3195cf Ensure shader stage enable state matches pipeline stage enable state
As the code was before, if a shader was disabled and then enabled again without being invalidated, the pipeline stage would stay disabled and break rendering.
2022-08-08 17:40:35 +01:00
b1c669ba14 Always keep the VertexB shader stage enabled
HW doesn't allow disabling the VertexB stage; enforce this in code.
2022-08-08 17:40:35 +01:00
d5174175d1 Implement indexed quads support
We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.
2022-08-08 17:40:35 +01:00
e6741642ba Split out megabuffer allocation from pushing data
The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.
2022-08-08 17:40:35 +01:00
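A sketch of the split with hypothetical types: `Allocate` hands back a descriptor for a region inside the current chunk, and the old push path becomes a thin wrapper that allocates and then copies.

```cpp
#include <cstdint>
#include <cstring>
#include <vulkan/vulkan.h>

// Hypothetical descriptor of a region inside a megabuffer chunk
struct MegaBufferAllocation {
    VkBuffer buffer{};     // The chunk's backing Vulkan buffer
    uint64_t offset{};     // Offset of the allocation within that buffer
    uint8_t *cpuMapping{}; // Host-visible pointer for writing in place
};

struct MegaBufferChunk {
    VkBuffer backing{};
    uint8_t *mapping{};
    uint64_t size{};
    uint64_t head{}; // Bump-allocator position

    // Reserve `allocationSize` bytes; callers write through `cpuMapping` directly
    MegaBufferAllocation Allocate(uint64_t allocationSize, uint64_t alignment = 4) {
        uint64_t aligned{(head + alignment - 1) & ~(alignment - 1)};
        head = aligned + allocationSize; // A real chunk would handle overflow here
        return {backing, aligned, mapping + aligned};
    }

    // The old pushing path, now a thin wrapper over Allocate()
    MegaBufferAllocation Push(const uint8_t *data, uint64_t dataSize) {
        auto allocation{Allocate(dataSize)};
        std::memcpy(allocation.cpuMapping, data, dataSize);
        return allocation;
    }
};
```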
cdc6a4628a Enable VK uint8 indices feature when supported 2022-08-08 17:40:35 +01:00
dccc86ea97 Implement transform feedback with VK_EXT_transform_feedback
Tested to work in Xenoblade Chronicles DE; the code handles both hades varying input and buffer setup.
2022-08-08 17:40:35 +01:00
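At the Vulkan level the buffer setup amounts to binding the transform feedback buffers and bracketing the captured draws; below is a minimal sketch using the VK_EXT_transform_feedback entry points (assumed to already be loaded, with counter buffers omitted):

```cpp
#include <vulkan/vulkan.h>

// Minimal sketch of recording transform feedback around a captured draw;
// assumes the VK_EXT_transform_feedback function pointers were loaded beforehand
void RecordTransformFeedbackDraw(VkCommandBuffer commandBuffer,
                                 VkBuffer xfbBuffer, VkDeviceSize offset, VkDeviceSize size,
                                 PFN_vkCmdBindTransformFeedbackBuffersEXT bindXfbBuffers,
                                 PFN_vkCmdBeginTransformFeedbackEXT beginXfb,
                                 PFN_vkCmdEndTransformFeedbackEXT endXfb) {
    // Bind the buffer that captured vertex outputs will be written into
    bindXfbBuffers(commandBuffer, 0, 1, &xfbBuffer, &offset, &size);

    // No counter buffers: writes start at the bound offset every time
    beginXfb(commandBuffer, 0, 0, nullptr, nullptr);
    vkCmdDraw(commandBuffer, 3, 1, 0, 0); // The draw whose outputs are captured
    endXfb(commandBuffer, 0, 0, nullptr, nullptr);
}
```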
06053d3caf Rewrite Fermi 2D engine to use the blit helper shader
Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly, with the 'context' naming being dropped due to potential conflicts with the 'context' from the context lock.
2022-08-08 17:40:35 +01:00
395f665a13 Implement a system for helper shaders together with a simple blit shader
It is desirable for us to use a shader for blits to allow easily emulating out-of-bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.
2022-08-08 17:40:35 +01:00