Commit Graph

1740 Commits

Author SHA1 Message Date
eb4a9bab11 Mark freshly allocated descriptor slots as active 2022-11-02 17:46:07 +00:00
f3184cdff1 Reorder active descriptor set slots to end of list
Speeds up allocations by significantly reducing the number of needed atomic ops per alloc. Could be optimised further in the future if needed.
2022-11-02 17:46:07 +00:00
128b68d8b2 Avoid resetting command buffers manually as it's implicit
Somewhat costly on Adreno, and with the one time submit flag the reset is implicit for the next beginCommandBuffers call.
2022-11-02 17:46:07 +00:00
e1717ed811 Implement Maxwell samplers 2022-11-02 17:46:07 +00:00
f1600f5ad0 Support allocating into spans in the linear allocator 2022-11-02 17:46:07 +00:00
04cea9239f Implement descriptor set updating through StateUpdater
Will automatically resolve buffer views at record-time into descriptor write/copy structures then apply the write and bind the set.
2022-11-02 17:46:07 +00:00
d174ca950b Revert "Reset executor command buffers asynchronously"
This reverts commit fc7956df4ff56fdb2afc4b2bb0bbca82196179ca.
2022-11-02 17:46:07 +00:00
2bbe975ea7 Reset executor command buffers asynchronously
This took a little while to do on qcom drivers; moving it to the cycle waiter thread gives a tiny speedup.
2022-11-02 17:46:07 +00:00
054d32567d Allow mutation of input data by callback in CircularQueue::AppendTranform 2022-11-02 17:46:07 +00:00
7c9212743c Implement asynchronous command recording
Recording of command nodes into Vulkan command buffers is very easily parallelisable as it can effectively be treated as part of the GPU execution, which is inherently async. By moving it to a separate thread we can shave off about 20% of GPFIFO execution time. It should be noted that the command scheduler command buffer infra is no longer used, since we need to record texture updates on the GPFIFO thread (while another slot is being recorded on the record thread) and then use the same command buffer on the record thread later. This ends up requiring a pool per slot, which is reasonable considering we only have four slots by default.
2022-11-02 17:46:07 +00:00
a197dd2b28 Allow for creating signalled fence cycles 2022-11-02 17:46:07 +00:00
542651232b Add a mutex to allow preventing buffer recreation 2022-11-02 17:46:07 +00:00
379b4f163d Implement popping from CircularQueue 2022-11-02 17:46:07 +00:00
6d9dc9c6fb Implement some more of Draw
Now performs VK state updates on execution and takes advantage of quick descriptor binding.
2022-11-02 17:46:07 +00:00
3b26f4f48a Expose way to check inter-pipeline descriptor compatibility
Allows users to skip/use quick descriptor sync even after switching pipelines.
2022-11-02 17:46:07 +00:00
943a38e168 Implement StateUpdater for rapid recording of VK state updates
Using the command executor for each individual state update was found to be infeasible due to the sheer number of state updates per draw and its reliance on per-node heap allocations. Instead, this commit takes advantage of each state update being used only once to implement a system of linearly-allocated state update commands that are linked together. After setting up all draw state with StateUpdateBuilder, the built StateUpdater can then be used in the execution phase to record all of the draw state into the command buffer with almost zero overhead.
2022-11-02 17:46:07 +00:00
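The linked, linearly-allocated command scheme described in 943a38e168 can be sketched roughly as below. This is only an illustration under assumptions, not the code from the commit: `LinearArena`, `FakeCommandBuffer`, `CmdBase`, `SetValueCmd` and `RecordAll` are hypothetical names, and the "command buffer" is a plain log standing in for a Vulkan one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Vulkan command buffer: it only logs recorded values
struct FakeCommandBuffer {
    std::vector<int> log;
};

// Bump allocator over a single block, so building a command never hits the general heap
class LinearArena {
    std::unique_ptr<std::byte[]> storage;
    std::size_t capacity;
    std::size_t offset{0};

  public:
    explicit LinearArena(std::size_t size) : storage{new std::byte[size]}, capacity{size} {}

    template<typename T, typename... Args>
    T *Emplace(Args &&...args) {
        // Destructors are never run, the whole arena is discarded at once
        static_assert(std::is_trivially_destructible_v<T>);
        std::size_t aligned{(offset + alignof(T) - 1) & ~(alignof(T) - 1)};
        assert(aligned + sizeof(T) <= capacity);
        offset = aligned + sizeof(T);
        return new (storage.get() + aligned) T(std::forward<Args>(args)...);
    }
};

// Every state update is a record function pointer plus a link to the next command
struct CmdBase {
    using RecordFn = void (*)(const CmdBase &, FakeCommandBuffer &);
    RecordFn record;
    CmdBase *next{nullptr};

    explicit CmdBase(RecordFn fn) : record{fn} {}
};

// One example state update (hypothetical): writes a single value when recorded
struct SetValueCmd : CmdBase {
    int value;

    explicit SetValueCmd(int v) : CmdBase{&Record}, value{v} {}

    static void Record(const CmdBase &self, FakeCommandBuffer &cb) {
        cb.log.push_back(static_cast<const SetValueCmd &>(self).value);
    }
};

// Setup phase: appends commands to a singly linked list allocated from the arena
class StateUpdateBuilder {
    LinearArena &arena;
    CmdBase *head{nullptr};
    CmdBase *tail{nullptr};

  public:
    explicit StateUpdateBuilder(LinearArena &a) : arena{a} {}

    void SetValue(int v) {
        CmdBase *cmd = arena.Emplace<SetValueCmd>(v);
        if (tail)
            tail->next = cmd;
        else
            head = cmd;
        tail = cmd;
    }

    const CmdBase *Build() const { return head; }
};

// Execution phase: walk the list once and record each update in order
void RecordAll(const CmdBase *head, FakeCommandBuffer &cb) {
    for (const CmdBase *cmd = head; cmd != nullptr; cmd = cmd->next)
        cmd->record(*cmd, cb);
}
```

Because each command is used exactly once, the arena can simply be dropped after recording, which is what makes the trivially-destructible restriction acceptable in this sketch.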
7b4da52445 Add a fast binding sync path for when only one cbuf has changed
SMO implements instanced draws by repeating the same draw just with a different constant buffer bound. Reduce the cost of this significantly by detecting such cases and instead of processing every descriptor, copy the previous descriptor set and update only the ones affected by the bound constant buffer.

Credits to ripinperiperi for the initial idea and making me aware of how SMO does these draws
2022-11-02 17:46:07 +00:00
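A rough sketch of the fast path from 7b4da52445, assuming a simplified model in which every descriptor records the constant buffer slot it reads from. The `Descriptor` layout and the `FullSync`/`QuickSync` names are hypothetical and only illustrate "copy the previous set, patch what the changed buffer affects".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor: which constant buffer slot it reads and the address bound there
struct Descriptor {
    std::size_t cbufSlot;
    std::uint64_t address;
};

using DescriptorSet = std::vector<Descriptor>;

// Slow path: rebuild every descriptor from the currently bound constant buffers
DescriptorSet FullSync(const std::vector<std::size_t> &layout, const std::vector<std::uint64_t> &cbufs) {
    DescriptorSet set;
    set.reserve(layout.size());
    for (std::size_t slot : layout) {
        assert(slot < cbufs.size());
        set.push_back(Descriptor{slot, cbufs[slot]});
    }
    return set;
}

// Fast path: copy the previous set and patch only descriptors that read the changed slot
DescriptorSet QuickSync(const DescriptorSet &previous, std::size_t changedSlot, std::uint64_t newAddress) {
    DescriptorSet set = previous;
    for (Descriptor &desc : set)
        if (desc.cbufSlot == changedSlot)
            desc.address = newAddress;
    return set;
}
```

The fast path is only valid when exactly one constant buffer binding differs from the previous draw; any other change has to fall back to the full sync.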
89edd9b303 Reset megabuffer binding for disabled vertex buffers 2022-11-02 17:46:07 +00:00
6a1615a104 Expose color and depth attachments to Draw 2022-11-02 17:46:07 +00:00
aae957819e Simplify BufferView locking by requiring buffer manager be locked
Avoids the need to repeat the lookup after locking since recreations are impossible if buffer manager is locked.
2022-11-02 17:46:07 +00:00
9449b52f36 Reduce minimum megabuffer alignment to 128 bytes 2022-11-02 17:46:07 +00:00
b3cf9c40ba Update megabuffer execution/sequence numbers after updating an allocation 2022-11-02 17:46:07 +00:00
4b2b6fc6e9 Avoid calling SynchronizeGuest when attempting to megabuffer unless necessary 2022-11-02 17:46:07 +00:00
e5919e84a1 Pipeline state if statement cleanups 2022-11-02 17:46:07 +00:00
bf536aa168 Sync pipeline descriptors every draw 2022-11-02 17:46:07 +00:00
9223d7f524 Fix descriptor initialisation order
They need to be set up before the pipeline is created to avoid passing in garbage data.
2022-11-02 17:46:07 +00:00
4652cc5a0a Avoid parsing descriptors for disabled shader stages 2022-11-02 17:46:07 +00:00
3456fb39fa Fix pipeline to shader stage conversion when filling in shader infos
The two vertex pipeline stages both need to be treated as a single stage, and all subsequent stages need to be offset by -1.
2022-11-02 17:46:07 +00:00
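The index mapping described in 3456fb39fa could look roughly like this. The stage counts and ordering are an assumption made for illustration (six pipeline stages with the two vertex stages first, five shader stages), and `PipelineToShaderStage` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Assumed ordering: VertexA, VertexB, TessControl, TessEval, Geometry, Fragment
constexpr std::size_t PipelineStageCount{6};
// Assumed ordering: Vertex, TessControl, TessEval, Geometry, Fragment
constexpr std::size_t ShaderStageCount{5};

// Both vertex pipeline stages collapse into shader stage 0,
// every later pipeline stage is shifted down by one
inline std::size_t PipelineToShaderStage(std::size_t pipelineStage) {
    assert(pipelineStage < PipelineStageCount);
    std::size_t shaderStage{pipelineStage < 2 ? 0 : pipelineStage - 1};
    assert(shaderStage < ShaderStageCount);
    return shaderStage;
}
```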
a9213debc7 Implement constant buffer reading 2022-11-02 17:46:07 +00:00
afcfe8a7fa Don't update scissor state >0 unless multiview is supported 2022-11-02 17:46:07 +00:00
55d77b7eb0 Update user code for new megabuffering 2022-11-02 17:46:07 +00:00
cc776ae395 Keep track of an 'execution number' in CommandExecutor
Allows users to efficiently cache resources that are valid for only one execution without resorting to callbacks.
2022-11-02 17:46:07 +00:00
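One way an execution number can replace invalidation callbacks, as cc776ae395 describes, is sketched below. `Executor`, `Submit` and `PerExecutionCache` are hypothetical names; the only idea taken from the commit is that a cached value is tagged with the execution it was created in and is treated as stale once the number changes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical executor: the number is bumped once per submitted execution
struct Executor {
    std::uint64_t executionNumber{1}; // 0 is reserved to mean "never cached"

    void Submit() { executionNumber++; }
};

// Caches a value that is only valid for the execution it was created in
template<typename T>
class PerExecutionCache {
    std::uint64_t cachedAt{0};
    T value{};

  public:
    template<typename F>
    const T &Get(const Executor &executor, F &&create) {
        assert(executor.executionNumber != 0);
        if (cachedAt != executor.executionNumber) {
            // Stale or empty: rebuild and tag with the current execution
            value = std::forward<F>(create)();
            cachedAt = executor.executionNumber;
        }
        return value;
    }
};
```

The cache holder never needs to be notified when an execution ends; comparing two integers on the next lookup is enough.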
99a34df4cc Avoid trapping frequently synced buffers by using megabuffer copies
When a buffer is trapped nearly every frame, the cost of trapping and synchronising its contents starts to quickly add up. By always using the megabuffer when this is the case, since megabuffer copies are done directly from the guest, we skip the need to synchronise/trap the backing.
2022-11-02 17:46:07 +00:00
a24aec03a6 Rework per-view megabuffering to cache allocs in the buffer itself
The original intention was to cache on the user side, but especially with shader constant buffers that's difficult and costly. Instead we can cache on the buffer side, with a page-table like structure to hold variable sized allocations indexed by the aligned view base address. This avoids most redundant copies from repeated use of the same buffer without updates in between.
2022-11-02 17:46:07 +00:00
b810470601 Invalidate HLE macro state on macro updates 2022-11-02 17:46:07 +00:00
2360ca24da Implement constant buffer and storage buffer pipeline descriptor types 2022-11-02 17:46:07 +00:00
25255b01c7 Keep track of more pipeline descriptor information
This is needed in order to allow allocating per-draw descriptor arrays without recounting their lengths each time
2022-11-02 17:46:07 +00:00
ad0275dbef Expose active pipeline for access by Maxwell3D class 2022-11-02 17:46:07 +00:00
6e22373b59 Add array support to AllocateUntracked 2022-11-02 17:46:07 +00:00
388cff3353 Implement simple pipeline transition cache
Avoids the need to hash PipelineState when we can guess the pipeline that will be used next. This could very easily be optimised in the future with generational, usage-based caching if necessary.
2022-11-02 17:46:07 +00:00
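A minimal sketch of a transition cache in the spirit of 388cff3353, under the assumption that each pipeline remembers which pipelines have followed it before. `PipelineState` here is a two-field placeholder and `PipelineCache::Find`, `transitions` and `hashedLookups` are hypothetical names; the real state and cache are far larger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

// Placeholder state: the real one is large, which is why hashing it is worth avoiding
struct PipelineState {
    std::uint32_t topology;
    std::uint32_t blend;

    bool operator==(const PipelineState &other) const {
        return topology == other.topology && blend == other.blend;
    }
};

struct PipelineStateHash {
    std::size_t operator()(const PipelineState &state) const {
        return std::hash<std::uint64_t>{}((std::uint64_t{state.topology} << 32) | state.blend);
    }
};

struct Pipeline {
    PipelineState state{};
    std::vector<Pipeline *> transitions; // Pipelines previously used right after this one
};

class PipelineCache {
    std::unordered_map<PipelineState, std::unique_ptr<Pipeline>, PipelineStateHash> map;

  public:
    std::size_t hashedLookups{0}; // Counts how often the slow hashed path was taken

    Pipeline *Find(const PipelineState &state, Pipeline *current) {
        // Fast path: guess from the pipelines that followed the current one before
        if (current)
            for (Pipeline *candidate : current->transitions)
                if (candidate->state == state)
                    return candidate;

        // Slow path: hash the full state, creating the pipeline on a miss
        hashedLookups++;
        std::unique_ptr<Pipeline> &slot = map[state];
        if (!slot) {
            slot = std::make_unique<Pipeline>();
            slot->state = state;
        }
        assert(slot);
        if (current)
            current->transitions.push_back(slot.get());
        return slot.get();
    }
};
```

The transition lists grow without bound in this sketch; the generational, usage-based caching mentioned in the commit message would be one way to cap them.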
302b2fcc3f Force flush when dirty refresh returns true 2022-11-02 17:46:07 +00:00
ec4ea5c5d7 Supply dispatcher manually for shader creation 2022-11-02 17:46:07 +00:00
3404a3abdb Implement macro HLE for instanced draw macros
gm20b performs instanced draws by repeating draw methods for each instance. The code to detect this, together with the cost of interpreting macros, took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly, much of that cost can be avoided.
2022-11-02 17:46:07 +00:00
cf0752f937 Use NCE memory tracking for guest shaders
Prevents needing to hash them for every single pipeline state update; without this, just hashing shaders takes up a significant amount of time.
2022-11-02 17:46:07 +00:00
19a75c3f65 Bind all pipeline states to main pipeline dirty state 2022-11-02 17:46:07 +00:00
a04d8fb5cf Set up minimal viewport Vulkan pipeline state 2022-11-02 17:46:07 +00:00
fe51db366b Mark all dirty resources as dirty initially 2022-11-02 17:46:07 +00:00
abfa5929f1 Treat vertex buffers with base addr 0 as disabled 2022-11-02 17:46:07 +00:00
e71ca05f19 Avoid bitfields for signed enum types in PackedPipelineState 2022-11-02 17:46:07 +00:00
2f2b615780 Add dynamic state support to VK graphics pipeline cache 2022-11-02 17:46:07 +00:00