Fully-enabled AD102 GPU with 12 GPCs, 18,432 CUDA cores, and 98,304KB of L2 cache
Most of the AD102 details were already known, but, just for reference, the fully-enabled AD102 GPU features 12 Graphics Processing Clusters (GPCs), 72 Texture Processing Clusters (TPCs), 144 Streaming Multiprocessors (SMs), and a 384-bit memory interface with 12 32-bit memory controllers.
Each GPC features a dedicated Raster Engine, two Raster Operations (ROPs) partitions, each partition with eight individual ROP units, and six TPCs, each with one PolyMorph Engine and two SMs.
Each SM packs 128 CUDA cores, one RT core, four Tensor cores, four Texture Units, a 256KB Register File, and 128KB of L1/Shared Memory.
The fully-enabled AD102 GPU thus packs 192 ROPs (12 GPCs with 16 ROPs each), 576 Texture Units, as well as 18,432 CUDA cores, 144 RT cores, and 576 Tensor cores.
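The chip-wide totals follow directly from the per-GPC and per-SM layout described above. As a minimal sketch (the constant and function names are illustrative, not anything from Nvidia), the hierarchy can be multiplied out like this:

```python
# Per-unit counts as described for the fully-enabled AD102.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
ROP_PARTITIONS_PER_GPC = 2
ROPS_PER_PARTITION = 8

# Per-SM resources.
CUDA_PER_SM = 128
RT_PER_SM = 1
TENSOR_PER_SM = 4
TEXTURE_UNITS_PER_SM = 4


def ad102_totals():
    """Multiply the GPC/TPC/SM hierarchy out into chip-wide totals."""
    tpcs = GPCS * TPCS_PER_GPC
    sms = tpcs * SMS_PER_TPC
    return {
        "tpcs": tpcs,
        "sms": sms,
        "cuda_cores": sms * CUDA_PER_SM,
        "rt_cores": sms * RT_PER_SM,
        "tensor_cores": sms * TENSOR_PER_SM,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        # ROPs hang off the GPCs, not the SMs.
        "rops": GPCS * ROP_PARTITIONS_PER_GPC * ROPS_PER_PARTITION,
    }


if __name__ == "__main__":
    for name, value in ad102_totals().items():
        print(f"{name}: {value:,}")
```

Note that the ROP count scales with the number of GPCs, while everything else scales with the number of SMs.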
When it comes to cache, you are looking at a total of 18,432KB of L1 cache (compared to 10,752KB in GA102), as well as a rather impressive 98,304KB of L2 cache (16 times the 6,144KB on the GA102). Nvidia was also keen to note that it worked closely with Micron to bring 22.4Gbps GDDR6X memory.
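The cache figures can be checked with simple arithmetic: total L1 is the per-SM L1/Shared Memory size multiplied by the SM count, and the L2 comparison is a plain ratio. The sketch below uses only numbers quoted in this article; the helper names are made up for illustration.

```python
L1_PER_SM_KB = 128  # L1/Shared Memory per SM


def total_l1_kb(sm_count):
    """Total L1 across the chip, assuming every SM carries the same 128KB."""
    return sm_count * L1_PER_SM_KB


def kb_to_mb(kb):
    """Convert KB to MB using 1MB = 1,024KB."""
    return kb / 1024


AD102_SMS = 144
AD102_L2_KB = 98_304
GA102_L2_KB = 6_144

if __name__ == "__main__":
    l1 = total_l1_kb(AD102_SMS)
    print(f"AD102 L1: {l1:,}KB ({kb_to_mb(l1):.0f}MB)")
    print(f"AD102 L2: {AD102_L2_KB:,}KB ({kb_to_mb(AD102_L2_KB):.0f}MB)")
    print(f"GA102 L2: {GA102_L2_KB:,}KB ({kb_to_mb(GA102_L2_KB):.0f}MB)")
    print(f"L2 ratio: {AD102_L2_KB / GA102_L2_KB:.0f}x")
```

In round numbers, that is 96MB of L2 on the full AD102 against 6MB on the GA102.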
The cut-down AD102 on the RTX 4090
Of course, the GeForce RTX 4090 features a cut-down AD102 GPU, suggesting a possible RTX 4090 Ti to be released in the future.
The AD102 GPU on the RTX 4090 packs 11 GPCs, 64 TPCs, and 128 SMs, adding up to 16,384 CUDA cores, 128 3rd Gen RT cores, and 512 4th Gen Tensor cores. It features 176 ROPs, 512 Texture Units, and comes with 24GB of 21Gbps GDDR6X memory on a 384-bit memory interface.
As noted earlier, it is manufactured on TSMC's 4nm process customized for Nvidia and has a die size of 608.5mm² with 76.3 billion transistors. It packs 16,384KB of L1 and 73,728KB of L2 cache. The TGP is set at 450W.
The GeForce RTX 4090 will run at a 2,520MHz GPU Boost clock, which is pretty impressive. It also supports Nvidia's latest DLSS 3 and features two 8th Gen NVENC video engines.
The AD103 and AD104 GPUs
The GeForce RTX 4080 16GB will end up with the AD103 GPU, a 4nm chip with a 378.6mm² die size and 45.9 billion transistors. It packs 7 GPCs, 38 TPCs, and 76 SMs. This adds up to 9,728 CUDA cores, 76 RT cores, and 304 Tensor cores. It has 112 ROPs, 304 Texture Units, and packs 16GB of 22.4Gbps GDDR6X memory on a 256-bit memory interface, adding up to 716.8GB/s of memory bandwidth. It comes with 9,728KB of L1 and 65,536KB of L2 cache.
The GeForce RTX 4080 16GB will run at a 2,505MHz GPU Boost clock and has a TGP (Total Graphics Power) of 320W.
The GeForce RTX 4080 12GB, which was pretty much a big mystery, is based on the AD104 GPU, a 4nm chip with a 294.5mm² die size and 35.8 billion transistors.
The AD104 features 5 GPCs, 30 TPCs, and 60 SMs, for a total of 7,680 CUDA cores, 60 RT cores, and 240 Tensor cores. It has 80 ROPs and 240 Texture Units, and comes with 12GB of 21Gbps GDDR6X memory on a 192-bit memory interface, leaving it with a maximum memory bandwidth of 504GB/s.
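The bandwidth figures quoted for both RTX 4080 versions come from the usual peak-bandwidth arithmetic: per-pin data rate multiplied by the bus width, divided by eight bits per byte. A minimal sketch, with a hypothetical helper name:

```python
def memory_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate (Gbps) x bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8


# (data rate in Gbps, bus width in bits), as quoted in the article.
CARDS = {
    "RTX 4080 16GB": (22.4, 256),
    "RTX 4080 12GB": (21.0, 192),
}

if __name__ == "__main__":
    for name, (rate, width) in CARDS.items():
        print(f"{name}: {memory_bandwidth_gbs(rate, width):.1f}GB/s")
```

These are theoretical peaks; the numbers say nothing about sustained throughput in real workloads.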
The RTX 4080 12GB runs at a 2,610MHz GPU Boost clock and has a TGP of 285W. The L1 and L2 caches are cut down to 7,680KB and 49,152KB, respectively.
Both the AD103 and AD104 pack the same two 8th Gen NVENC video engines.
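Since every SM-level figure scales with the SM count, the per-card numbers quoted above can be sanity-checked against each other. The sketch below assumes the AD103 and AD104 use the same per-SM layout described for the AD102 (128 CUDA cores, one RT core, four Tensor cores, four Texture Units, 128KB of L1). The `Sku` class and helper names are purely illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Sku:
    name: str
    sms: int
    cuda_cores: int
    rt_cores: int
    tensor_cores: int
    texture_units: int
    l1_kb: int


# Figures as quoted in the article.
STATED = [
    Sku("RTX 4090", 128, 16_384, 128, 512, 512, 16_384),
    Sku("RTX 4080 16GB", 76, 9_728, 76, 304, 304, 9_728),
    Sku("RTX 4080 12GB", 60, 7_680, 60, 240, 240, 7_680),
]


def derive(name, sms):
    """Derive SM-level totals, assuming the AD102 per-SM layout applies."""
    return Sku(name, sms, sms * 128, sms * 1, sms * 4, sms * 4, sms * 128)


def mismatches(stated):
    """Return the names of fields where stated and derived values differ."""
    derived = derive(stated.name, stated.sms)
    return [
        f.name
        for f in fields(Sku)
        if getattr(stated, f.name) != getattr(derived, f.name)
    ]


if __name__ == "__main__":
    for sku in STATED:
        diff = mismatches(sku)
        print(f"{sku.name}: {'consistent' if not diff else diff}")
```

ROPs and L2 cache are left out on purpose, since they do not scale with the SM count.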
No Founders Edition for the RTX 4080 12GB
As noted earlier, Nvidia will have Founders Edition cards for the RTX 4090 and the RTX 4080 16GB, while the RTX 4080 12GB will only come in custom versions from Nvidia's AIC partners.
The flagship GeForce RTX 4090 will be available as of October 12th, starting at $1,599. The RTX 4080 versions will be available in November, with the price set at $1,199 for the 16GB and $899 for the 12GB version.