Commit Graph

720 Commits

Author SHA1 Message Date
b04d18eba5 Add support for split mappings to I2M uploads
Used by Super Mario Sunshine and other Vulkan games.
2022-11-02 17:46:07 +00:00
db5e208379 Clear images even when aspects mismatch 2022-11-02 17:46:07 +00:00
3c8df327f1 Fixup subpass barriers and flags 2022-11-02 17:46:07 +00:00
5ab80901c6 Drop some debug code 2022-11-02 17:46:07 +00:00
4de89c8839 GPU NEW MARGEBAC 2022-11-02 17:46:07 +00:00
7670c83405 Ensure textures are clean before paging them out 2022-11-02 17:46:07 +00:00
93d43e0115 Fully fill in swizzle component mappings
Avoids the remaining components being default-initialised to identity, which would break their intended effect.
2022-11-02 17:46:07 +00:00
37ff0ab814 Add buffer manager support for accelerated copies
These will be sequenced on the GPU or CPU depending on which is optimal, avoiding any serialisation.
2022-11-02 17:46:07 +00:00
cac287d9fd Implement accelerated uploads/copies through buffer manager
Previously, both I2M uploads and DMA copies would force GPU serialisation if they happened to hit a trap or were used to copy GPU dirty buffers. By using the buffer manager to implement them on the host GPU we can avoid such slowdowns entirely.
2022-11-02 17:46:07 +00:00
c5ec484d9a Avoid redundantly passing executor in ctors when it's already in ChannelCtx 2022-11-02 17:46:07 +00:00
463394ba72 Pass correct size for XFB buffers 2022-11-02 17:46:07 +00:00
bd976676f4 Fix SNorm vertex formats 2022-11-02 17:46:07 +00:00
b74098570f Zero-out unused XFB varyings before passing to hades 2022-11-02 17:46:07 +00:00
22f3ba6b93 Mark XFB buffers as GPU dirty 2022-11-02 17:46:07 +00:00
26aeeaecf5 Add constant buffer GPU write pipeline barrier 2022-11-02 17:46:07 +00:00
0b5d9308c4 Be more careful about potentially-unneeded GPU->CPU syncs
These can be especially expensive so should be avoided as much as possible.
2022-11-02 17:46:07 +00:00
e6530e2386 Delete graphics_context
F
2022-11-02 17:46:07 +00:00
b24a8465da Don't require depthClamp 2022-11-02 17:46:07 +00:00
0ebdbcf0ff Don't lock stateMutex when updating buffer cycle 2022-11-02 17:46:07 +00:00
dd360b8f75 Pass correct wait semaphore array size to queue submit 2022-11-02 17:46:07 +00:00
c78a4b9699 Fixup buffer recreation to avoid deadlock when waiting on srcs 2022-11-02 17:46:07 +00:00
95d849e1f6 Check FenceCycle signalled flag immediately before waiting
The lock is released while waiting for submission, which means another thread could signal the cycle in the meantime and the VK wait would still happen afterwards, once the lock has been reacquired.
2022-11-02 17:46:07 +00:00
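The race described in 95d849e1f6 can be illustrated with a minimal sketch. This is not the emulator's FenceCycle: `CycleSketch`, `deviceWaits` and the `waitForSubmission` callback are hypothetical names, and the Vulkan wait is replaced by a counter so that the control flow stays visible.

```cpp
#include <atomic>
#include <functional>
#include <mutex>

// Hypothetical stand-in for a fence cycle; all names are assumptions.
struct CycleSketch {
    std::mutex mutex;
    std::atomic<bool> signalled{false};
    int deviceWaits{0}; // counts simulated (expensive) device waits

    // waitForSubmission runs with the lock released, so another thread
    // may signal the cycle while it is in progress.
    void Wait(const std::function<void()> &waitForSubmission) {
        std::unique_lock<std::mutex> lock{mutex};
        if (signalled.load())
            return;

        lock.unlock();
        waitForSubmission();
        lock.lock();

        // Re-check immediately before the device wait: the flag may have
        // been set while the lock was not held.
        if (signalled.load())
            return;

        ++deviceWaits; // stands in for the real device wait
        signalled.store(true);
    }
};
```

The second check is the point of the sketch: without it, a cycle signalled during the unlocked window would still pay for a device wait.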
1a23b929a7 Avoid chaining cycles in buffer recreation
This had a chance of creating circular chains, which obviously caused issues; just do a wait instead for now.
2022-11-02 17:46:07 +00:00
6c0f084aae Introduce hack to ignore frequently read-back textures
Readback can be especially slow on mobile due to the varying load pattern it creates, which often prevents the CPU/GPU from clocking up. Since some games perform texture readback but don't actually use it for anything significant, implement a hack to skip it and significantly improve performance in such cases.
2022-11-02 17:46:07 +00:00
e45e7546c8 Redesign buffer megabuffering
Due to the frequency at which it is called, megabuffering performance is critical to the performance of the entire emulator, especially in high-drawcall-count scenarios. After the view redesign, megabuffering on a per-view level was no longer possible nor desirable, and thus megabuffering was modified to just copy for every usage of a view. This worked great at the time since there were other bottlenecks; however, gpu-new has since removed almost all of them and megabuffering is now a major sore point. Fix this by megabuffering small chunks and storing them in a page-table-like structure within the buffer. These chunks can be referenced by multiple views and will be smartly invalidated whenever the sequence number or execution number changes, to avoid any sequencing issues. In addition to this, to help the case where almost the whole buffer is read every single frame across a set of multiple views, an optimisation to skip the chunked tracking and use one large single megabuffer allocation and one single memcpy has been introduced. This reduces the overall amount of time spent in memcpy since large memcpys are quicker.
2022-11-02 17:46:07 +00:00
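The chunked scheme described in e45e7546c8 can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual implementation: the chunk size, the `ChunkedMegaBufferSketch` type, its members, and the use of a plain byte vector as the megabuffer are all hypothetical, and only the sequence number (not the execution number) is tracked.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kChunkSize = 256; // assumed chunk size

// Hypothetical sketch: one table entry per chunk of the backing buffer.
struct ChunkedMegaBufferSketch {
    struct Entry {
        bool valid{false};
        std::uint64_t sequence{0};
        std::size_t stagingOffset{0};
    };

    std::vector<Entry> table;          // page-table-like structure
    std::vector<std::uint8_t> staging; // stands in for the megabuffer
    int copies{0};                     // number of chunk copies performed

    explicit ChunkedMegaBufferSketch(std::size_t bufferSize)
        : table((bufferSize + kChunkSize - 1) / kChunkSize) {}

    // Returns the staging offset that mirrors `offset` in the backing
    // buffer, copying the containing chunk only if its entry is stale.
    std::size_t Acquire(const std::vector<std::uint8_t> &backing,
                        std::size_t offset, std::uint64_t sequence) {
        Entry &entry = table[offset / kChunkSize];
        if (!entry.valid || entry.sequence != sequence) {
            std::size_t base = (offset / kChunkSize) * kChunkSize;
            std::size_t length = std::min(kChunkSize, backing.size() - base);
            auto first = backing.begin() + static_cast<std::ptrdiff_t>(base);
            entry.stagingOffset = staging.size();
            staging.insert(staging.end(), first,
                           first + static_cast<std::ptrdiff_t>(length));
            entry.sequence = sequence;
            entry.valid = true;
            ++copies;
        }
        return entry.stagingOffset + offset % kChunkSize;
    }
};
```

Two views reading from the same chunk under the same sequence number share a single copy; a changed sequence number forces a fresh copy of that chunk only.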
7491178a9e Pass base array layer to texture views 2022-11-02 17:46:07 +00:00
ff57d2fbbf Enforce stronger format and weaker dimension texture compat checks
Rather than using just bpb for format compat, additionally check that the exact component bit layout matches, since many games end up reusing RTs for unrelated textures. The texture size requirements have also been weakened to only check the resulting layer size as opposed to width/height - this is somewhat hacky but it gets around the problem of blocklinear alignment.
2022-11-02 17:46:07 +00:00
14af383238 Only allow submitting swapchainImageCount images for host present at a time
Prevents situations where nothing would otherwise be waiting on the GPU and, since presentation no longer blocks, too many images would be submitted for presentation.
2022-11-02 17:46:07 +00:00
bcd96ac77d Fixup A8R8G8B8 TIC format mapping
8-bit formats are inverted in TICs compared to Vulkan
2022-11-02 17:46:07 +00:00
90466b8830 Implement depth clamp rasterisation state
Used in SMO for shadows.
2022-11-02 17:46:07 +00:00
1cfc4278f9 Disable preserve buffer/texture attachment opt for now
Causes several issues and crashes in Pokemon without an obvious cause.
2022-11-02 17:46:07 +00:00
e483cf9634 Use shader memory mirror when reading guest shaders
Avoids triggering any traps that may be present on the region
2022-11-02 17:46:07 +00:00
f6e4328b5a Ensure blit src/dst textures are attached as execution cycle dependencies
Since they're not in the TIC pool they would otherwise be freed
2022-11-02 17:46:07 +00:00
77a131df60 Support using in-app renderdoc API to capture individual executions 2022-11-02 17:46:07 +00:00
576bc6f37e Add CommandExecutor slot count setting 2022-11-02 17:46:07 +00:00
1a0819fb76 Use semaphores for presentation engine frame synchronisation
Avoids waits on the CPU which can be costly and confuse the scheduler, also reduces latency significantly.
2022-11-02 17:46:07 +00:00
0670e0e0dc Support using Vulkan semaphores with fence cycles
In some cases like presentation, it may be possible to avoid waiting on the CPU by using a semaphore to indicate GPU completion. Due to the binary nature of Vulkan semaphores this requires a fair bit of code, as we need to ensure semaphores are always unsignalled before they are waited on and signalled again. This is achieved with a special kind of chained cycle that can be added even after guest GPFIFO processing for a given cycle: the main cycle's semaphore can be waited on, the cycle for that wait is then attached to the main cycle, and it will be waited on before signalling.
2022-11-02 17:46:07 +00:00
ad3195e06f Split out guest texture layer size calcs into a separate func 2022-11-02 17:46:07 +00:00
8fa83fdf13 Fix deswizzling non-pow2 block size formats
We need to use DivideCeil to avoid rounding off part of the texture.
Fixes texture in Nier Automata: Game of the YoRHa edition.
2022-11-02 17:46:07 +00:00
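The rounding issue behind 8fa83fdf13 can be shown with a small sketch. The name `DivideCeil` comes from the commit message, but its signature here is an assumption, and `BlocksTruncated`/`BlocksCovering` are hypothetical helpers added only for the comparison.

```cpp
#include <cstddef>

// Assumed signature; rounds the quotient up instead of truncating.
constexpr std::size_t DivideCeil(std::size_t dividend, std::size_t divisor) {
    return (dividend + divisor - 1) / divisor;
}

// Plain integer division drops a partial trailing block.
constexpr std::size_t BlocksTruncated(std::size_t texels, std::size_t blockSize) {
    return texels / blockSize;
}

// Rounding up keeps the partial block, so the whole texture is covered.
constexpr std::size_t BlocksCovering(std::size_t texels, std::size_t blockSize) {
    return DivideCeil(texels, blockSize);
}

// A 16-texel row with a block width of 5 needs four blocks, not three.
static_assert(BlocksTruncated(16, 5) == 3, "truncation loses the last block");
static_assert(BlocksCovering(16, 5) == 4, "ceil division covers every texel");
```

With power-of-two block sizes and aligned dimensions both forms agree, which is why the truncation only shows up with non-pow2 block sizes.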
27de42f8df Use surfaceClip as a hint for the underlying rendertarget size
TIC sizes may not be aligned to block linear dimensions, whereas RT sizes are aligned and then limited by the surface clip. By using this to determine surface size we are more likely to get a match in texture manager for any future usages.
2022-11-02 17:46:07 +00:00
297597f697 Fix texture manager depth compat comparison 2022-11-02 17:46:07 +00:00
500f817a28 Synchronize all non-matching textures back to host before recreation 2022-11-02 17:46:07 +00:00
05581f2230 Remove now redundant buffer/texture/megabuffer manager locks
They have been superseded by the global channel lock.
2022-11-02 17:46:07 +00:00
b72720e8db Finish off transform feedback implementation 2022-11-02 17:46:07 +00:00
36fd885b49 Pack all draw state into a struct to avoid std::function allocations 2022-11-02 17:46:07 +00:00
b5d0060c3f Only use scissor for clear rect when enabled 2022-11-02 17:46:07 +00:00
f93df35e6c Only set line width when wideLines feature is supported 2022-11-02 17:46:07 +00:00
4cebdfc8d3 Pass texture and cbuf state into pipeline manager for hades callbacks 2022-11-02 17:46:07 +00:00
9ce848d4e0 Implement descriptor update batching and push descriptors
Batching helps to avoid the need to attach so many objects to the fence cycle, which ends up taking a fair bit of time due to the allocation required.
2022-11-02 17:46:07 +00:00
62a165b51e Reformat maxwell3d interconnect codebase 2022-11-02 17:46:07 +00:00