Commit Graph

220 Commits

Author SHA1 Message Date
d544ccf5ea Stub INotificationServicesForApplication 2023-01-20 21:08:12 +00:00
c53d99d393 Stub IDeliveryCacheFileService and IDeliveryCacheDirectoryService 2023-01-20 21:08:12 +00:00
8846a85d3a Stub some IPurchaseEventManager functions 2022-12-31 10:45:18 +00:00
e9bcdd06eb Introduce a pipeline cache manager for simple read/write cache accesses
All writes are done async into a staging file, which is then merged into the main pipeline cache file at the time of the next launch. Upon encountering file corruption the cache can be trimmed up to the last-known-good entry to avoid any excessive loss of data from just one error.
2022-12-22 18:05:45 +00:00
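The staging/merge/trim behaviour described in e9bcdd06eb can be sketched roughly as follows. This is a minimal illustration and not the actual cache manager: the entry layout (`[size][checksum][payload]`), the FNV-1a checksum and every name here are assumptions, and in-memory byte vectors stand in for the main and staging files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical entry layout: [u32 size][u32 checksum][payload bytes]
constexpr std::size_t HeaderSize{sizeof(std::uint32_t) * 2};

std::uint32_t Checksum(const std::uint8_t *data, std::size_t size) {
    std::uint32_t hash{2166136261u}; // FNV-1a, chosen only for brevity
    for (std::size_t i{}; i < size; i++)
        hash = (hash ^ data[i]) * 16777619u;
    return hash;
}

void AppendEntry(Bytes &file, const Bytes &payload) {
    assert(payload.size() <= UINT32_MAX);
    std::uint32_t header[2]{static_cast<std::uint32_t>(payload.size()),
                            Checksum(payload.data(), payload.size())};
    auto *raw{reinterpret_cast<const std::uint8_t *>(header)};
    file.insert(file.end(), raw, raw + HeaderSize);
    file.insert(file.end(), payload.begin(), payload.end());
}

// Drops everything after the last entry that validates and returns the number of entries kept
std::size_t TrimToLastGood(Bytes &file) {
    std::size_t offset{}, count{};
    while (file.size() - offset >= HeaderSize) {
        std::uint32_t header[2];
        std::memcpy(header, file.data() + offset, HeaderSize);
        std::size_t end{offset + HeaderSize + header[0]};
        if (end > file.size() || Checksum(file.data() + offset + HeaderSize, header[0]) != header[1])
            break; // Truncated or corrupt entry: stop at the last-known-good one
        offset = end;
        count++;
    }
    file.resize(offset);
    return count;
}

// Merge performed at launch: validate both files, then append the staging data to the main file
void MergeStaging(Bytes &mainFile, Bytes &staging) {
    TrimToLastGood(mainFile);
    TrimToLastGood(staging);
    mainFile.insert(mainFile.end(), staging.begin(), staging.end());
    staging.clear();
}
```

Because every entry carries its own size and checksum, a single damaged entry only costs the entries written after it rather than the whole cache.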
06bf1b38af Introduce a pipeline state accessor that reads from a bundle 2022-12-22 18:05:45 +00:00
f32ab1feff Include BS thread pool library 2022-12-22 18:05:45 +00:00
e849264028 Abstract out pipeline-compile-time GPU state accesses
Introduces the base abstractions that will be used for pipeline caching, with a 'PipelineStateBundle' that can be (de)serialised to/from disk and an abstract accessor class to allow switching between creating disk-cached pipelines and fresh ones.
2022-12-22 18:05:45 +00:00
68253fe995 Stub mii:e/mii:u
Needed for SSBU
2022-12-10 14:58:20 +00:00
e8ef2d80af CMake build file updates 2022-12-03 22:50:56 +00:00
720cfaafb6 Stub caps:su 2022-11-18 15:35:03 +00:00
74afca4aab Stub caps:u 2022-11-18 15:35:03 +00:00
27ff1ae19b Stub caps:c 2022-11-18 15:35:03 +00:00
ffb0546609 Stub caps:a 2022-11-18 15:35:03 +00:00
9afa8b881e Stub nsd:u/nsd:a and sfdnsres services 2022-11-15 16:24:33 +00:00
e571066409 Stub ldr:ro IRoInterface
Some games initialise this service on startup but don't actually use it. Add a simple stub to allow such games to boot.
2022-11-15 16:23:40 +00:00
1fc2641746 Stub the web applet 2022-11-13 11:37:18 +00:00
f4a8328cef Implement Symbol Hooking
Symbol hooking is required for future HLE implementations of certain features such as `nvdec`, and for more in-depth debugging of games, as we can inspect them at an SDK function level which allows us to debug issues far more easily.
2022-11-07 23:56:22 +05:30
b6e2fb894c service: bcat: Stub CreateDeliveryCacheStorageService 2022-11-06 20:39:41 +00:00
cac287d9fd Implement accelerated uploads/copies through buffer manager
Previously, both I2M uploads and DMA copies would force GPU serialisation if they happened to hit a trap or were used to copy GPU dirty buffers. By using the buffer manager to implement them on the host GPU we can avoid such slowdowns entirely.
2022-11-02 17:46:07 +00:00
77a131df60 Support using in-app renderdoc API to capture individual executions 2022-11-02 17:46:07 +00:00
5b72be88c3 Stub ldn:u service 2022-11-02 17:46:07 +00:00
ef0ae30667 Implement Maxwell3D texture pool management and view creation
On top of the TIC cache from the previous code, a simple index-based lookup has been added which vastly speeds things up by avoiding the need to hash the TIC structure every time.
2022-11-02 17:46:07 +00:00
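The index-based lookup from ef0ae30667 can be illustrated with a small sketch. Everything here is an assumption made for illustration: `Tic` is modelled as eight 32-bit words, views are plain `int` handles, and `hashLookups` exists only to show how often the slow path runs. The idea is that a per-index slot is compared directly first, and the hashed cache is only consulted when the slot misses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Tic = std::array<std::uint32_t, 8>; // Hypothetical stand-in for the TIC structure

struct TicHash {
    std::size_t operator()(const Tic &tic) const {
        std::size_t hash{1469598103934665603ull};
        for (auto word : tic)
            hash = (hash ^ word) * 1099511628211ull;
        return hash;
    }
};

class TexturePool {
    struct Slot {
        Tic tic{};
        int view{};
        bool valid{};
    };

    std::vector<Slot> slots;                       // Fast path, indexed by texture pool index
    std::unordered_map<Tic, int, TicHash> cache;   // Slow path, keyed by TIC contents
    int nextView{};

  public:
    std::size_t hashLookups{}; // Number of times the hashed cache had to be consulted

    int GetView(std::size_t index, const Tic &tic) {
        if (index >= slots.size())
            slots.resize(index + 1);

        auto &slot{slots[index]};
        if (slot.valid && slot.tic == tic)
            return slot.view; // Same TIC at the same index as last time: no hashing needed

        hashLookups++;
        auto [it, inserted]{cache.try_emplace(tic, nextView)};
        if (inserted)
            nextView++; // A real implementation would create the texture view here
        slot = Slot{tic, it->second, true};
        return it->second;
    }
};
```

Repeated draws that reference the same pool index with an unchanged TIC then cost one array access and one comparison.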
f1600f5ad0 Support allocating into spans in the linear allocator 2022-11-02 17:46:07 +00:00
3404a3abdb Implement macro HLE for instanced draw macros
gm20b performs instanced draws by repeating draw methods for each instance. The code to detect this, together with the cost of interpreting macros, took up around 6% of GPFIFO time in Metro Kingdom. By detecting these specific macros and performing an instanced draw directly, much of that cost can be avoided.
2022-11-02 17:46:07 +00:00
9b05c9c0c3 Introduce a pipeline manager and partial pipeline object
gpu-new will use a monolithic pipeline object for each pipeline to store state, keyed by the PackedPipelineState contents. This allows for a greater level of per-pipeline optimisations and a reduction in the overall number of lookups in a draw compared to the previous system.
2022-11-02 17:46:07 +00:00
a6bb716123 Move packed pipeline state to a separate file 2022-11-02 17:46:07 +00:00
21f5611231 Rewrite constant buffer interconnect code
Adapted from the previous code to use dirty state tracking. The cache has also been removed since, with the new buffer view and GMMU optimisations, it actually ended up slowing lookups down. Another result of the buffer view optimisations is that raw pointers are no longer used for buffer views since destruction is now much cheaper.
2022-11-02 17:46:07 +00:00
8471ab754d Introduce a spin lock for resources locked at a very high frequency
Constant buffer updates result in a barrage of std::mutex calls that take a lot of time even under no contention (around 5%). Using a custom spinlock in cases like these allows inlining the locking code, reducing the cost of locks under no contention to almost 0.
2022-11-02 17:46:07 +00:00
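A minimal sketch of the kind of spin lock 8471ab754d describes, assuming a plain `std::atomic_flag` busy-wait; the real implementation may well back off or fall back to sleeping under contention. `ConstantBuffer` is a hypothetical user included only to show that the lock works with the standard guard types.

```cpp
#include <atomic>
#include <mutex>

class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;

  public:
    // Uncontended case: a single atomic test-and-set that the compiler can inline
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire)) {
            // Busy-wait until the current holder calls unlock()
        }
    }

    bool try_lock() {
        return !locked.test_and_set(std::memory_order_acquire);
    }

    void unlock() {
        locked.clear(std::memory_order_release);
    }
};

// Hypothetical resource that is locked at a very high frequency
struct ConstantBuffer {
    SpinLock lock;
    int updates{};

    void Update() {
        std::lock_guard guard{lock}; // SpinLock provides lock()/unlock(), so standard guards work
        updates++;
    }
};
```

Since `lock`, `try_lock` and `unlock` follow the standard naming, the class can be swapped in wherever a `std::mutex` was used with `std::lock_guard` or `std::scoped_lock`.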
6b76c61cd1 Introduce a releasedebug build variant 2022-10-17 18:39:32 +02:00
70ad4498a2 Write HID LIFO entries at fixed intervals
Certain titles depend on HID LIFO entries being written out at a fixed frequency rather than on actual state change. Not doing this can lead to applications freezing until the LIFO is filled up to its maximum size; this behavior is seen in Super Mario Odyssey. In other cases, such as Metroid Dread, the game can run into race conditions that lead to crashes, which were previously worked around by mashing a button during loading.

This commit introduces a thread which sleeps and wakes up occasionally to write LIFO entries into HID shared memory at the desired frequencies. This alleviates any issues as it fills up the LIFO instantly and correctly emulates HID Shared Memory behavior expected by the guest.

Co-authored-by: Narr the Reg <juangerman-13@hotmail.com>
2022-08-31 22:49:36 +05:30
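The fixed-interval writes from 70ad4498a2 can be sketched without the thread itself. The types below are hypothetical (`ButtonState`, `Lifo` and `FixedRateWriter` do not mirror the real HID shared memory layout), and `Tick()` stands in for what the periodically waking thread would call: it writes the latest known state whether or not it changed.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

struct ButtonState {
    std::uint64_t buttons{};
};

// Fixed-capacity ring of entries, the newest entry overwrites the oldest
template <std::size_t Capacity>
class Lifo {
    std::array<ButtonState, Capacity> entries{};
    std::size_t tail{};

  public:
    std::uint64_t samplingNumber{}; // Incremented on every write
    std::size_t count{};            // Number of valid entries, saturates at Capacity

    void Write(const ButtonState &state) {
        tail = (tail + 1) % Capacity;
        entries[tail] = state;
        samplingNumber++;
        count = std::min(count + 1, Capacity);
    }

    const ButtonState &Latest() const {
        return entries[tail];
    }
};

template <std::size_t Capacity>
class FixedRateWriter {
    Lifo<Capacity> lifo;
    ButtonState current;

  public:
    // Called whenever the host input changes
    void OnInput(const ButtonState &state) {
        current = state;
    }

    // Called at a fixed interval, writes an entry even if nothing changed
    void Tick() {
        lifo.Write(current);
    }

    const Lifo<Capacity> &GetLifo() const {
        return lifo;
    }
};
```

With this split, a guest that waits for the LIFO to fill sees it fill after `Capacity` ticks regardless of user input.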
395f665a13 Implement a system for helper shaders together with a simple blit shader
It is desirable for us to use a shader for blits to allow easily emulating out-of-bounds blits and blits between different swizzled colour formats. The helper shader infrastructure is designed to be generic so it can be reused by any other helper shaders that we may need in the future.
2022-08-08 17:40:35 +01:00
1da1698f90 Disable unused Vulkan HPP setters and smart handles 2022-08-08 14:57:44 +01:00
5b7572a8b3 Introduce chunked MegaBuffer allocation
After the introduction of workahead, a system to hold a single large megabuffer per submission was implemented. This worked fine for most cases; however, when many submissions were in flight at the same time, memory usage would increase dramatically due to the number of megabuffers needed. Since only one megabuffer was allowed per execution, the buffer had to be fairly large in order to accommodate the upper bound, increasing memory usage even further.

This commit implements a system to fix the memory usage issue described above by allowing multiple megabuffers to be allocated per execution, as well as reuse across executions. Allocations now go through a global allocator object which chooses which chunk to allocate into on a per-allocation basis. If all chunks are in use by the GPU, another chunk will be allocated, which can then be reused for future allocations too. This reduces Hollow Knight megabuffer memory usage by a factor of 4 and SMO by even more.
2022-08-07 03:12:27 +05:30
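The chunked allocation scheme from 5b7572a8b3 might look roughly like the sketch below. It is a simplified model under stated assumptions: chunks are tracked as offsets only (no actual GPU memory), `ChunkSize` is an arbitrary small value, and `Submit`/`Retire` are hypothetical stand-ins for submitting work to the GPU and observing its completion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class MegaBufferAllocator {
  public:
    static constexpr std::size_t ChunkSize{1024}; // Hypothetical, tiny for illustration

    struct Allocation {
        std::size_t chunk;  // Index of the chunk that was allocated into
        std::size_t offset; // Offset of the allocation within that chunk
    };

    // Chooses a chunk per allocation: first one that is not in use by the GPU and has space
    Allocation Allocate(std::size_t size) {
        assert(size <= ChunkSize);
        for (std::size_t i{}; i < chunks.size(); i++) {
            auto &chunk{chunks[i]};
            if (!chunk.inFlight && chunk.used + size <= ChunkSize)
                return Bump(i, size);
        }

        chunks.emplace_back(); // Every chunk is busy or full, grow the pool
        return Bump(chunks.size() - 1, size);
    }

    // Marks every chunk written to so far as in use by the GPU
    void Submit() {
        for (auto &chunk : chunks)
            if (chunk.used)
                chunk.inFlight = true;
    }

    // The GPU is done with the chunk, so it can be reused from the start
    void Retire(std::size_t index) {
        chunks[index] = {};
    }

    std::size_t ChunkCount() const {
        return chunks.size();
    }

  private:
    struct Chunk {
        std::size_t used{};
        bool inFlight{};
    };

    std::vector<Chunk> chunks;

    Allocation Bump(std::size_t index, std::size_t size) {
        Allocation allocation{index, chunks[index].used};
        chunks[index].used += size;
        return allocation;
    }
};
```

The pool only grows when no idle chunk has space, so memory usage follows the peak amount of data in flight rather than a worst-case size multiplied by the number of submissions.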
1815199d2b Add utilities for reading and installing gpu driver packages 2022-08-06 22:00:19 +05:30
1df98ba57f Enable fwrapv for defined signed integer overflow behaviour
Nintendo enables this for HOS so we should do the same to avoid any cases where it's relied on.
2022-07-29 20:07:14 +01:00
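For reference, the behaviour that `-fwrapv` (a GCC/Clang flag) guarantees for signed arithmetic is two's complement wraparound. The helper below is a hypothetical illustration of that result written portably through unsigned arithmetic; it is not code from 1df98ba57f. The final conversion back to a signed type is modular since C++20 and implementation-defined (wrapping on mainstream compilers) before that.

```cpp
#include <cstdint>

// Two's complement wrapping add that does not rely on any compiler flag
constexpr std::int32_t WrappingAdd(std::int32_t a, std::int32_t b) {
    // Unsigned overflow is well-defined and wraps modulo 2^32
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}
```

Without the flag, evaluating `INT32_MAX + 1` directly on signed operands is undefined behaviour, so the compiler is free to assume it never happens.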
2840a126dd Introduce AndroidSettings class and use inheritance
The `Settings` class now has a pure virtual `Update` method, and uses inheritance over template specialization for platform-specific behavior override.
2022-07-26 20:16:24 +05:30
2d70be60d1 Remove PugiXML submodule
`PugiXML` was only used for parsing the SharedPreferences settings file, not needed anymore.
2022-07-26 20:16:24 +05:30
942e22f275 Write ApplicationErrorArg ErrorApplets to log
These applets are used by applications to display a custom error message to the user. Both the error message and the detailed error message are printed to the error log.


Co-authored-by: lynxnb <niccolo.betto@gmail.com>
2022-07-02 09:48:59 +05:30
f9a0394577 Implement Software Keyboard applet
This implements the non-inline version of the Software Keyboard (swkbd) applet, which games use to get text input from the user.
2022-07-01 15:19:53 -05:00
5d6902b3f8 Stub audin:u 2022-06-04 19:11:57 +01:00
22695c4feb Stub nim services used for eShop communication
We obviously don't need to implement these, so add a simple set of stubs to satisfy games using them (mainly demos such as DQXII).
2022-05-31 22:07:01 +01:00
80c8fb8791 Implement CPU BCn Texture Decoding
Certain GPU vendors such as ARM's Mali do not have support for BCn textures whatsoever, while other vendors such as AMD only have partial support (BC1-BC3). Most titles on the guest utilize BC textures, so on host GPUs without support for BCn we need to decompress the texture on the CPU. This commit implements a CPU BCn texture decoder based on SwiftShader's BC decoder; it also adds the necessary infrastructure to have different formats for the `GuestTexture` and `Texture` objects.
2022-05-28 21:22:24 +05:30
de300bfdbe Refactor Texture Swizzling
The API for texture swizzling is now more concrete and abstracted out from `GuestTexture`. This allows for neater usage in certain areas such as MaxwellDMA, while a `GuestTexture` wrapper is also provided for neater usage in the cases that do use one.

The code itself has also been cleaned up slightly with all usage of `u32`s being upgraded to `size_t` as this is simply more efficient due to the compiler not needing to emulate wraparound behavior for integer types smaller than the processor word size.
2022-05-19 17:13:55 +05:30
0a3cf25823 Implement the Fermi 2D blitting engine
The Fermi 2D engine implements both image blit and resolve operations, supporting subpixel sampling with both linear and point filtering.

Resolve operations are performed by sampling from the center of each pixel in order to resolve the final image from the MSAA samples.
MSAA images are stored in memory like regular images but each pixel's dimensions are scaled, e.g. for 2x2 MSAA:
```
112233
112233
445566
445566
```
These would be sampled with both duDx and duDy as 2 (integer part), resolving to the following:
```
123
456
```
Blit operations are performed by sampling from the corner of each pixel, scaling the image as one would expect.

This implementation isn't fully complete as Vulkan blit doesn't support some combinations which Fermi does, most notably between colour and depth stencil. These will be implemented properly at a later date, likely after the texture manager rework.
Out-of-bounds blit, used by some OpenGL games, is also missing since supporting it requires texture aliasing; this will also be supported after the texture manager rework.

Co-authored-by: Billy Laws <blaws05@gmail.com>
2022-05-13 22:37:37 +01:00
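The resolve example in 0a3cf25823 can be reproduced with a short point-sampled sketch. This is an illustration only: images are modelled as rows of characters to mirror the diagrams in the commit message, `Resolve` is a hypothetical name, and the real engine also handles linear filtering and subpixel offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Resolves an MSAA image stored with each pixel scaled by samplesX x samplesY
std::vector<std::string> Resolve(const std::vector<std::string> &src, std::size_t samplesX, std::size_t samplesY) {
    assert(!src.empty() && samplesX && samplesY);
    std::size_t dstWidth{src[0].size() / samplesX}, dstHeight{src.size() / samplesY};
    std::vector<std::string> dst(dstHeight, std::string(dstWidth, ' '));

    for (std::size_t y{}; y < dstHeight; y++) {
        for (std::size_t x{}; x < dstWidth; x++) {
            // Sample from the centre of the destination pixel, (x + 0.5, y + 0.5),
            // stepping through the source by the MSAA scale in each direction
            std::size_t srcX{(2 * x + 1) * samplesX / 2};
            std::size_t srcY{(2 * y + 1) * samplesY / 2};
            dst[y][x] = src[srcY][srcX];
        }
    }
    return dst;
}
```

Calling `Resolve` on the four-row 2x2 MSAA image from the commit message with both scales set to 2 yields the rows `123` and `456`.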
7d30ac0cd8 Add additional nifm stubs 2022-05-11 23:24:35 +01:00
a164635f32 Stub LibraryAppletPlayerSelect 2022-05-11 23:24:35 +01:00
f078a5d1ec Stub bt and btm:u
Stub BT services, which are required by titles such as Pokémon Let's GO Pikachu and Eevee (non-Demo versions).
2022-05-11 20:44:09 +05:30
1c8d994161 Basic bcat:u implementation
A basic `bcat:u` implementation to prevent titles such as "Kirby and the Forgotten Land" dependent on BCAT support from crashing due to the lack of an implementation.
2022-05-06 15:41:48 +05:30
42573170c6 Implement Framebuffer Cache
Implements a cache for storing `VkFramebuffer` objects, with a special path on devices with `VK_KHR_imageless_framebuffer` that allows for more cache hits since the key describes an abstract image rather than a specific one.

Caching framebuffers is a fairly crucial optimization due to the cost of creating framebuffers on TBDRs, since it involves calculating tiling memory allocations and, in the case of Adreno's proprietary driver, several kernel calls for mapping and allocating the corresponding framebuffer memory.
2022-05-01 18:27:27 +05:30
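Why the imageless path in 42573170c6 produces more cache hits can be shown with a simplified model that makes no Vulkan calls. All names are hypothetical, handles are plain integers, and the key is reduced to format and dimensions; a real imageless key would need to cover every attachment property the framebuffer is created with.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Attachment {
    std::uint64_t view;   // Stand-in for a specific image view handle
    std::uint32_t format; // Stand-in for the image format
    std::uint32_t width, height;
};

class FramebufferCache {
    using Key = std::vector<std::uint64_t>;

    bool imageless; // Whether the imageless framebuffer path is available
    std::map<Key, int> cache;
    int created{};

  public:
    explicit FramebufferCache(bool imageless) : imageless{imageless} {}

    // Returns a framebuffer handle, creating one only on a cache miss
    int Get(std::uint64_t renderPass, const std::vector<Attachment> &attachments) {
        Key key{renderPass};
        for (const auto &attachment : attachments) {
            if (imageless) {
                // Key on the abstract description, any matching image can be bound later
                key.push_back(attachment.format);
                key.push_back(attachment.width);
                key.push_back(attachment.height);
            } else {
                // Key on the specific image view
                key.push_back(attachment.view);
            }
        }

        auto [it, inserted]{cache.try_emplace(std::move(key), created)};
        if (inserted)
            created++; // A real implementation would create the framebuffer object here
        return it->second;
    }

    int CreatedCount() const {
        return created;
    }
};
```

Two draws that target different images with the same format and size share one framebuffer on the imageless path but need two on the regular path.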
da931cf07b Implement Render Pass Cache
Implements a cache for storing `VkRenderPass` objects, which are often reused. They are generally not extremely expensive to create, but this is a required step towards a framebuffer cache; framebuffers are generally extremely expensive to create on TBDRs since doing so involves calculating tiling memory allocations and, in the case of Adreno's proprietary driver, several kernel calls for mapping and allocating the corresponding memory.
2022-05-01 18:16:53 +05:30