TU106 GPU Block Diagram
This is the TU106 GPU, which powers the RTX 2070. No updated image has been supplied for the 2060, so we have to extrapolate a bit from what we know. The TU106 in the 2070 employs 36 SMs, where the 2060 is down to 30. Since the 2060 uses the same GPU, I would assume that each GPC loses two SMs. Some media have guessed that the furthest GPC would lose all 6 SMs, but I would assume Nvidia spreads the cut evenly so that each GPC has 10 SMs.
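To make the two guesses concrete, here is a minimal sketch of the arithmetic. The GPC count of three is an assumption inferred from the numbers above (30 SMs at 10 per GPC), and the variable names are purely illustrative.

```python
# Assumption: three GPCs, inferred from the text (30 SMs at 10 SMs per GPC).
GPC_COUNT = 3
FULL_SMS = 36   # TU106 as used in the RTX 2070
CUT_SMS = 30    # TU106 as used in the RTX 2060

sms_per_gpc_full = FULL_SMS // GPC_COUNT   # SMs per GPC on the full chip
disabled = FULL_SMS - CUT_SMS              # SMs turned off for the 2060

# Guess A (the one favoured in the text): spread the cut evenly across GPCs.
even_layout = [sms_per_gpc_full - disabled // GPC_COUNT] * GPC_COUNT

# Guess B (some media): take every disabled SM out of a single GPC.
lopsided_layout = [sms_per_gpc_full] * (GPC_COUNT - 1) + [sms_per_gpc_full - disabled]

print("even:", even_layout)
print("lopsided:", lopsided_layout)
```

Both layouts add up to the same 30 SMs; they differ only in how balanced the GPCs end up.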
New Streaming Multiprocessor (SM)
Here you can see the layout of the new SM for the TU102, TU104, and TU106. I am not sure whether this will carry over to any future spins or GPU launches, but as of this launch, this is the layout showing the full build of the SM.
Each SM is made up of:
- 64 CUDA Cores
- 8 Tensor Cores
- 256KB register file
- 4 texture units
- 96KB of L1/shared memory
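Since every SM carries the same resources, chip-level totals are just the per-SM figures multiplied by the SM count. The sketch below does that for the 36 SM and 30 SM configurations mentioned earlier; the class and function names are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMResources:
    """Per-SM resources as listed above."""
    cuda_cores: int = 64
    tensor_cores: int = 8
    register_file_kb: int = 256
    texture_units: int = 4
    l1_shared_kb: int = 96


def gpu_totals(sm_count: int, sm: SMResources = SMResources()) -> dict:
    """Scale the per-SM figures by the number of enabled SMs."""
    return {
        "cuda_cores": sm_count * sm.cuda_cores,
        "tensor_cores": sm_count * sm.tensor_cores,
        "register_file_kb": sm_count * sm.register_file_kb,
        "texture_units": sm_count * sm.texture_units,
        "l1_shared_kb": sm_count * sm.l1_shared_kb,
    }


rtx_2070 = gpu_totals(36)   # 36 SMs
rtx_2060 = gpu_totals(30)   # 30 SMs

for name, totals in (("RTX 2070", rtx_2070), ("RTX 2060", rtx_2060)):
    print(name, totals)
```

With these inputs the 36 SM part comes out at 2304 CUDA cores and the 30 SM part at 1920.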
Deep Learning Super Sampling
DLSS builds on Nvidia NGX, Nvidia's deep neural network framework, to accelerate graphics rendering by running deep-learning operations on the Turing Tensor Cores. The Tensor Cores run Nvidia's stored neural network information, including stored supersampling data, to deliver effects similar to high levels of AA, which would normally load the GPU heavily and reduce framerates. Instead, DLSS takes a lower-quality, downsampled image and uses the Tensor Cores to build a very high AA image from it with much lower overhead on the shaders.
Here is a list from Nvidia of the upcoming games which will support DLSS or be updated to support it.
- Ark: Survival Evolved from Studio Wildcard
- Atomic Heart from Mundfish
- Dauntless from Phoenix Labs
- Final Fantasy XV from Square Enix
- Fractured Lands from Unbroken Studios
- Hitman 2 from IO Interactive/Warner Bros.
- Islands of Nyne: Battle Royale from Define Human Studios
- Justice (Ni Shui Han) from NetEase
- JX3 from Kingsoft
- MechWarrior 5: Mercenaries from Piranha Games
- PlayerUnknown’s Battlegrounds from PUBG Corp.
- Remnant: From the Ashes from Gunfire Games/Perfect World Entertainment
- Serious Sam 4: Planet Badass from Croteam/Devolver Digital
- Shadow of the Tomb Raider from Square Enix/Eidos-Montréal/Crystal Dynamics/Nixxes
- The Forge Arena from Freezing Raccoon Studios
- We Happy Few from Compulsion Games / Gearbox
- Darksiders 3 by Gunfire Games/THQ Nordic
- Deliver Us The Moon: Fortuna by KeokeN Interactive
- Fear the Wolves by Vostok Games / Focus Home Interactive
- Hellblade: Senua’s Sacrifice by Ninja Theory
- KINETIK by Hero Machine Studios
- Outpost Zero by Symmetric Games / tinyBuild Games
- Overkill’s The Walking Dead by Overkill Software / Starbreeze Studios
- SCUM by Gamepires / Devolver Digital
- Stormdivers by Housemarque
Ray Tracing (RTX)
With Turing, as you saw above in the die map, the die is skirted by RT cores, which are deployed to enable a world first: real-time ray tracing, something that was hinted to still be 10 years away just a short time ago. The cores are only part of the package, as it also requires Nvidia’s RTX technology along with support for the new DirectX Raytracing (DXR), Nvidia OptiX, and Vulkan ray tracing, so that no matter the game engine there is likely to be a ray tracing path to give a more immersive gaming environment.
Here is a list of the confirmed upcoming games which will be RTX enabled.
- Assetto Corsa Competizione from Kunos Simulazioni/505 Games
- Atomic Heart from Mundfish
- Battlefield V from EA/DICE
- Control from Remedy Entertainment/505 Games
- Enlisted from Gaijin Entertainment/Darkflow Software
- MechWarrior 5: Mercenaries from Piranha Games
- Metro Exodus from 4A Games
- Shadow of the Tomb Raider from Square Enix/Eidos-Montréal/Crystal Dynamics/Nixxes
- Justice (Ni Shui Han) from NetEase
- JX3 from Kingsoft
- Project DH by Nexon
Hybrid Rendering (RTX-OPS)
With Turing, we have now seen the introduction of not just your normal SM but also RT cores and Tensor cores for AI. This enables a new hybrid rendering method where, as mentioned previously, RT cores are used for ray-traced lighting workloads and Tensor cores are used for AI calculations to accelerate rendering (along with other features I’m sure are to come), alongside traditional rasterization.
Obviously, not all of these will be in use all the time, so Nvidia ran some very deep mathematical calculations to show how RTX-OPS are calculated. Since I am quite sure most of you would not care about that, I’m not gonna dig too deep into it, but I will add it below for your reference, or for those like me who geek out on that kind of stuff.
The above is a visual representation of the hybrid rendering (RTX-OPS) calculation you will find below.
To compute RTX-OPS, the peak operations of each type are derated based on how often that type is used. In particular:
- Tensor operations are used 20% of the time
- CUDA cores are used 80% of the time
- RT cores are used 40% of the time (half of 80%)
- INT32 pipes are used 28% of the time (35% of 80%)
Put together: RTX-OPS = TENSOR * 20% + FP32 * 80% + RTOPS * 40% + INT32 * 28%
The above is an illustration of the peak operations of each type for the RTX 2080 Ti. Plugging in those peak operation counts results in a total RTX-OPS number of 78: 14 * 80% + 14 * 28% + 100 * 40% + 114 * 20% = 77.92, which rounds to 78.
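For those who want to plug in their own numbers, the weighting can be written as a small function. This is only a sketch of the formula quoted above; the function and parameter names are my own, and the inputs are the peak figures used in the 2080 Ti example.

```python
# Utilisation weights taken from the list above.
WEIGHTS = {
    "fp32": 0.80,    # CUDA cores
    "int32": 0.28,   # INT32 pipes (35% of 80%)
    "rt": 0.40,      # RT cores (half of 80%)
    "tensor": 0.20,  # Tensor cores
}


def rtx_ops(fp32: float, int32: float, rt: float, tensor: float) -> float:
    """Weighted sum of peak operation counts, per the RTX-OPS formula."""
    return (
        fp32 * WEIGHTS["fp32"]
        + int32 * WEIGHTS["int32"]
        + rt * WEIGHTS["rt"]
        + tensor * WEIGHTS["tensor"]
    )


# Peak figures from the 2080 Ti example in the text.
total = rtx_ops(fp32=14, int32=14, rt=100, tensor=114)
print(f"RTX-OPS: {total:.2f} (rounds to {round(total)})")
```

The RT and Tensor terms contribute most of the total here, which is worth keeping in mind when comparing RTX-OPS across cards.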
GPU Boost 4
GPU Boost 3
This is how a typical previous-gen GPU Boost 3 implementation would work. As you can see, the adjustment takes you straight across based on a power target/limit which you would set within the third-party app of your choice (Precision, Afterburner, etc.). However, the control was quite limited in terms of granularity. This is because the GPU Boost 3 implementation, while a good solution, was mostly hidden in the driver, away from users' ability to really adjust it beyond the target sliders.
GPU Boost 4
GPU Boost 4 is a completely different animal, as it allows you to set steps so that, instead of dropping straight down to base clock when thermal limits are hit, the card settles on a lower boost clock plateau. That gives the card a chance to cool itself off at a higher speed rather than dropping drastically down to base clock. This, in turn, means more consistent control of the performance, thermals, and acoustic characteristics of your GeForce card.
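The difference between the two behaviours can be sketched as a pair of clock-versus-temperature functions. This is a toy model of the description above, not Nvidia's actual algorithm: every clock speed and temperature threshold below is a made-up placeholder, and the function names are hypothetical.

```python
# All clocks and temperatures are illustrative placeholders,
# not values from Nvidia's specifications.
BASE_MHZ = 1400
PLATEAU_MHZ = 1550   # intermediate step, the behaviour described for Boost 4
BOOST_MHZ = 1700
FIRST_LIMIT_C = 80   # temperature at which the card first backs off
SECOND_LIMIT_C = 88  # temperature at which the stepped model falls to base


def boost3_clock(temp_c: float) -> int:
    """Single-step model: full boost until the limit, then straight to base."""
    return BOOST_MHZ if temp_c < FIRST_LIMIT_C else BASE_MHZ


def boost4_clock(temp_c: float) -> int:
    """Stepped model: hold a lower boost plateau before falling to base."""
    if temp_c < FIRST_LIMIT_C:
        return BOOST_MHZ
    if temp_c < SECOND_LIMIT_C:
        return PLATEAU_MHZ
    return BASE_MHZ


for temp in (70, 84, 90):
    print(f"{temp}C  single-step: {boost3_clock(temp)} MHz  stepped: {boost4_clock(temp)} MHz")
```

In this toy model the two only differ between the first and second limit, which is exactly the window where the stepped approach keeps the card above base clock.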