strato

scl/strato

mirror of https://github.com/Takiiiiiiii/strato.git synced 2025-07-17 08:46:39 +00:00

Author	SHA1	Message	Date
Billy Laws	a6bb716123	Move packed pipeline state to a separate file	2022-11-02 17:46:07 +00:00
Billy Laws	4dcbf5c3a0	Drop the caching aspect of shader manager entirely Caching here was deemed unnecessary since it will be done implicitly by the pipeline cache and creates issues with the legacy attribute conversion pass. It now purely serves as a frontend for Hades.	2022-11-02 17:46:07 +00:00
Billy Laws	e77e4891dc	Tidy up Maxwell 3D regs a bit	2022-11-02 17:46:07 +00:00
Billy Laws	405d26fc22	Introduce Maxwell 3D shader state Simple state holder that hashes the stored shader and reads it into a buffer	2022-11-02 17:46:07 +00:00
Billy Laws	6865f0bdaf	Transition color blend state to packed pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	7049a521d2	Use Vulkan types directly in PackedPipelineState where possible	2022-11-02 17:46:07 +00:00
Billy Laws	effeb074b6	Move pipeline cache Key to cpp file and rename to PackedPipelineState The new name is more indicative of what the struct contains and how it will be used as more than just a key.	2022-11-02 17:46:07 +00:00
Billy Laws	e1512c91a0	Transition depth stencil state to pipeline cache key	2022-11-02 17:46:07 +00:00
Billy Laws	1f844e2c18	Transition rasterization state to pipeline cache key	2022-11-02 17:46:07 +00:00
Billy Laws	9a6efb091c	Transition tessellation state to pipeline cache key Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.	2022-11-02 17:46:07 +00:00
Billy Laws	ae5d419586	Transition input assembly state to pipeline cache key	2022-11-02 17:46:07 +00:00
Billy Laws	3f9161fb74	Transition vertex input state to pipeline cache key Also adds dirty tracking and removes it from direct state while we're at it. Since we no longer use Vulkan structs directly there's no benefit to it.	2022-11-02 17:46:07 +00:00
Billy Laws	ffe24aa075	Introduce a base pipeline cache key starting with RTs It was determined that a general purpose Vulkan pipeline cache isn't viable for the significant performance reqs of Draw(), by using a Maxwell 3D specific key we can shrink state significantly more than if we used Vulkan structs.	2022-11-02 17:46:07 +00:00
Billy Laws	1e8f7d7fcb	Implement dynamic pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	a94040ac7d	Add default cases to enum conversions where necessary	2022-11-02 17:46:07 +00:00
Billy Laws	6e55d4dcf4	Implement color blend pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	cb11662ea5	Implement depth stencil pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	2484a2d6b5	Add A1R5G5B5Unorm and S8Uint formats	2022-11-02 17:46:07 +00:00
Billy Laws	690e96bce0	Implement rasterization pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	90cd6adb91	Implement tessellation pipeline state	2022-11-02 17:46:07 +00:00
Billy Laws	ef11900a39	Introduce reworked Maxwell 3D core interconnect This mainly distributes operations down to activeState and pipelineState, aside from clears which are implemented in-place. The exposed interface is much reduced compared to the previous GraphicsContext system due to the newly introduced dirty system, which should hopefully make the code more maintainable and keep actual rendering operations separate from primitive restart state or whatever. Currently draws are unimplemented and the only fully implemented things are clears and constant buffer operations.	2022-11-02 17:46:07 +00:00
Billy Laws	37b821a4dc	Introduce Maxwell 3D interconnect active state Active state encapsulates all state that isn't part of a pipeline and can be set dynamically with Vulkan calls. This includes both dynamic state like stencil faces, and command buffer state like vertex buffer bindings. Similarly to the last commit, the main goal of this is to reduce the amount of redundant work done per draw by employing dirty state as much as possible. Without using dirty state for this every active state operation would need to be performed every draw, which gets very expensive when things like buffer lookups end up being required. Code has also been heavily cleaned up as is described in the previous commit.	2022-11-02 17:46:07 +00:00
Billy Laws	5fdda78073	Introduce Maxwell 3D interconnect pipeline state The main goal of this is to reduce the number of redundant lookups and work done per draw as much as possible. This is mainly achieved through heavy use of dirty tracking, though other optimisations like heavily using the linear allocator are also in play. In addition to the goal of performance, the code has been cleaned up and abstracted significantly from its state in graphics_context, hopefully making the GPU interconnect code much more maintainable in the future and reducing the boilerplate needed to add even simple functionality. This commit includes partial pipeline state, enough for implementing clears + a slight bit extra.	2022-11-02 17:46:07 +00:00
Billy Laws	21f5611231	Rewrite constant buffer interconnect code Adapted from the previous code to use dirty state tracking. The cache has also been removed since with the new buffer view and GMMU optimisations it actually ended up slowing lookups down. Another result of the buffer view optimisations is that raw pointers are no longer used for buffer views since destruction is now much cheaper.	2022-11-02 17:46:07 +00:00
Billy Laws	d1e7bbc1d8	Introduce common code for Maxwell 3D interconnect rewrite This common code will be used across the entirety of the 3D rewrite, it also includes a stub for StateUpdateBuilder, which will be used by active state code to apply state updates.	2022-11-02 17:46:07 +00:00
Billy Laws	a6c49115f9	Rewrite all Maxwell 3D registers up to clears to match Nvidia docs All the names are directly translated from Nvidia docs, with minimal conversions to enums/structs when appropriate. Not all registers have been rewritten, only those that are needed to implement clears and dynamic state, the rest will be added as they are used in the GPU rework.	2022-11-02 17:46:07 +00:00
Billy Laws	8471ab754d	Introduce a spin lock for resources locked at a very high frequency Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining locking code reducing the cost of locks under no contention to almost 0.	2022-11-02 17:46:07 +00:00
Billy Laws	e72fe02c15	Add inline fast-path for `Buffer::FindOrCreate()` This can be inlined by the compiler much easier which helps perf a fair bit due to the number of times buffers are looked up, also avoids the need for small vector construction that was done in the previous fast-path.	2022-11-02 17:46:07 +00:00
Billy Laws	49478e178a	Avoid redundantly syncing buffers before every Write in an execution This isn't a guarantee provided by actual HW so we don't need to provide it either, the sync can be skipped once the buffer has already been synced at least once within the execution.	2022-11-02 17:46:07 +00:00
Billy Laws	f7a726e452	Allow attempting to write to buffers without passing a GPU copy callback Constructing the GPU copy callback in `ConstantBuffers::Load()` ended up taking a fair amount of time despite it almost never being used in practice. By making it optional it can be skipped most of the time and only done when it's actually necessary by calling `Write()` again if the initial call returned true.	2022-11-02 17:46:07 +00:00
Billy Laws	5dca5cc10e	Redesign buffer view infra to remarkably reduce creation overhead Buffer view creation was a significant pain point, requiring several layers of caching to reduce the number of creations, which introduced a lot of complexity. By reworking delegates to be per-buffer rather than per-view and then linearly allocating delegates (without ever freeing), views can be reduced to just {delegatePtr, offset, size}, avoiding the need for any allocations or set operations in GetView. The one difficulty with this is the need to support buffer recreation, which is achieved by allowing delegates to be chained - during recreation all source buffers have their delegates modified to point to the newly created buffer's delegate. Upon accessing a view with such a chained delegate the view will be modified to point directly to the end delegate with offset being updated accordingly, skipping the need to traverse the chain for future accesses.	2022-11-02 17:46:07 +00:00
Billy Laws	6359852652	Introduce page size constants and replace all usages of PAGE_SIZE Avoids using macros and results in code which looks slightly cleaner.	2022-11-02 17:46:07 +00:00
lynxnb	54172322fe	Fix host synchronization for texture with a different guest format Host synchronization of a guest texture with a different guest format represents a valid use case where the host doesn't support the guest format and conversion to a host-compatible format must be performed. The issue is most evident on Mali GPUs, as they don't support BCn texture formats thus needing manual decoding before submission. It was disabled by mistake in a previous commit, this commit re-enables it.	2022-09-15 15:22:52 +02:00
lynxnb	34bd16426c	Fix quads index buffer conversion not accounting for first index Unindexed quad draws were broken when multiple draw calls were done on the same vertex buffer, with a non-zero `first` index. Indexed quad draws also suffered from the same issue, but it was never encountered in games. This commit fixes both cases by accounting for the `first` drawn index when generating conversion index buffers.	2022-09-04 12:42:33 +02:00
Billy Laws	82444f3b0a	Don't set push descriptor flag for desc sets This is redundant and against the spec since we no longer use push descriptors.	2022-08-31 21:26:14 +01:00
Billy Laws	bf491f71f9	Simplify blit helper shader vertex order	2022-08-10 15:43:16 +01:00
Billy Laws	c32bec071c	Adjust blit src{X,Y} to account for centred sampling before calling into helper shader Since the blit engine itself samples from pixel corners and the helper shader from pixel centres the src coordinates need to be adjusted to avoid the helper shader wrapping round on the final column.	2022-08-10 15:39:37 +01:00
Billy Laws	08f36aac33	Enable hades vertex position input workaround for Adreno Caused crashes in any games using geometry shaders as by default hades uses the position builtin directly.	2022-08-08 18:09:00 +01:00
Billy Laws	04e7b684d2	Enable vertexPipelineStoresAndAtomics, fragmentStoresAndAtomics and shaderStorageImageWriteWithoutFormat Vulkan features Used by Xenoblade Chronicles DE	2022-08-08 17:43:18 +01:00
Billy Laws	390558c802	Add partial support for legacy attribute conversion We previously missed the hades pass for attribute conversion leading to crashes when games would attempt to use such an attribute. The hades pass for this isn't a proper fix however as it modifies the IR directly and will break if any of the previous stages in the pipeline change. Enable it to allow for games using them to at least have a chance at working. In the long term the pass will be reworked on the hades side to avoid modifying the IR in a way that can't be undone.	2022-08-08 17:43:18 +01:00
Billy Laws	540437b547	Fixup index buffer view caching We forgot to set the view size, which would end up forcing a view to be recreated with every call	2022-08-08 17:43:18 +01:00
Billy Laws	c966cd3b26	Prevent runtimeInfo vertex state from leaking into wrong shaders This vertex state must only be present for the last pipeline stage that touches vertices, if it is present for other stages it could result in incorrect behaviour like performing TFB in the fragment shader or flipping device coordinates twice.	2022-08-08 17:43:13 +01:00
Billy Laws	c52d3195cf	Ensure shader stage enable state matches pipeline stage enable state As the code was before, if we had a shader that was disabled and enabled again after without being invalidated the pipeline stage would stay disabled and break rendering.	2022-08-08 17:40:35 +01:00
Billy Laws	b1c669ba14	Always keep the VertexB shader stage enabled HW doesn't allow disabling the VertexB stage, enforce this in code.	2022-08-08 17:40:35 +01:00
lynxnb	d5174175d1	Implement indexed quads support We previously only supported non-indexed quads. Support for this is implemented by converting the index buffer at record time and pushing the result into the megabuffer, which is then used as the index buffer in the final draw command.	2022-08-08 17:40:35 +01:00
lynxnb	e6741642ba	Split out megabuffer allocation from pushing data The `Allocate` method allocates the given amount of space in a megabuffer chunk, returning a descriptor of the allocated region. This is useful for situations where you want to write directly to the megabuffer, avoiding the need for an intermediary buffer.	2022-08-08 17:40:35 +01:00
Billy Laws	cdc6a4628a	Enable VK uint8 indices feature when supported	2022-08-08 17:40:35 +01:00
Billy Laws	dccc86ea97	Implement transform feedback with VK_EXT_transform_feedback Tested to work in Xenoblade Chronicles DE, the code handles both hades varying input and buffer setup.	2022-08-08 17:40:35 +01:00
Billy Laws	06053d3caf	Rewrite Fermi 2D engine to use the blit helper shader Entirely rewrites the engine and interconnect code to take advantage of the subpixel and OOB blit support offered by the blit helper shader. The interconnect code is also cleaned up significantly with the 'context' naming being dropped due to potential conflicts with the 'context' from context lock	2022-08-08 17:40:35 +01:00
Billy Laws	395f665a13	Implement a system for helper shaders together with a simple blit shader It is desirable for us to use a shader for blits to allow easily emulating out of bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.	2022-08-08 17:40:35 +01:00

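Commit 8471ab754d motivates a spin lock for resources that are locked very frequently but rarely contended. A minimal sketch of that idea follows, assuming a plain test-and-test-and-set design. The class name `SpinLock` and its details are illustrative assumptions and do not reflect the project's actual implementation.

```cpp
#include <atomic>

// Hypothetical spin lock sketch (not the project's class). All methods are
// defined in-class so a compiler can inline the uncontended path, which is
// the benefit the commit message describes.
class SpinLock {
  private:
    std::atomic<bool> locked{false};

  public:
    void lock() {
        // Fast path: a single atomic exchange when the lock is free.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // Slow path: spin on a relaxed load until the lock looks free,
            // then retry the exchange.
            while (locked.load(std::memory_order_relaxed)) {}
        }
    }

    bool try_lock() {
        // Succeeds only if the previous value was false (lock was free).
        return !locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```

Because the sketch provides `lock()`, `try_lock()` and `unlock()`, it can be used with `std::lock_guard` or `std::scoped_lock` in the same way as `std::mutex`. A production version would likely back off or yield while spinning, which is omitted here.
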
720 Commits