Meet KAVERI: STEAMROLLER CPU AND GCN GRAPHICS
As a company, AMD is always trying to get the industry to think differently. Back in the Athlon era, the company introduced a performance-based model numbering system as opposed to naming chips by clock speed. That worked out well, as it helped people understand that clock speed alone does not dictate overall performance. When the Fusion platform launched in 2011, AMD started calling its processors with integrated graphics APUs. Now AMD is once again trying to redefine what we think of as a processor by introducing the concept of “compute cores”. The compute core count combines the number of CPU and GPU cores. The A10-7850K, for example, comes with 12 compute cores, which break down into four CPU cores and eight GPU cores.
This may be a marketing gimmick to get people to think that the APU has more cores and thus offers better performance. Educated buyers will know that it is not just the number of cores but also factors such as architecture and clock speed that dictate performance. We understand AMD’s reason for doing this, as it is trying to push the concept of the total compute power of APUs with HSA and heterogeneous computing. Unfortunately, it can be a little misleading, because two APUs with the same number of compute cores can be made up of different numbers of CPU and GPU cores. Furthermore, the compute core count tells us nothing about the clock speed of the CPU and the GPU, so even two APUs with the same number of compute cores may not deliver similar performance. Counting compute cores alone is therefore not a clear indicator of a chip’s raw performance.
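To make the ambiguity concrete, here is a minimal sketch of how the compute core total hides the CPU/GPU split. The A10-7850K figures are the ones quoted above; the second chip is purely hypothetical and exists only to show two different configurations landing on the same headline number. The `Apu` class and its field names are our own illustration, not anything AMD publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Apu:
    """Illustrative record of an APU's core configuration."""
    name: str
    cpu_cores: int
    gpu_cores: int

    @property
    def compute_cores(self) -> int:
        # AMD's "compute cores" figure is simply the sum of both core types.
        return self.cpu_cores + self.gpu_cores


a10_7850k = Apu("A10-7850K", cpu_cores=4, gpu_cores=8)
# Hypothetical part, invented for illustration only.
hypothetical = Apu("hypothetical APU", cpu_cores=2, gpu_cores=10)

for apu in (a10_7850k, hypothetical):
    print(f"{apu.name}: {apu.compute_cores} compute cores "
          f"({apu.cpu_cores} CPU + {apu.gpu_cores} GPU)")
```

Both chips report the same compute core total even though one has half the CPU cores of the other, and neither figure says anything about clock speed.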
At least AMD is open about the split: on the box label, the number of CPU and GPU cores appears in parentheses next to the total. Additionally, AMD currently has a simple APU product lineup. The A10-7000 series has only two models, which differ in CPU clock speed and number of GPU cores, while the GPU is clocked at the same 720MHz on both. So it is not too difficult to distinguish between the two. Going forward, however, if the number of models increases, it could become very challenging to tell one from another.
Currently, it is still hard to quantify a CPU/APU without talking about pure CPU and pure GPU performance, as the selection of HSA-compatible software is very limited. HSA is still in its infancy; if and when the technology gains wider adoption, we may finally have to rethink how to quantify a processor’s total compute power in a way that takes into account all of the processor cores inside the chip and their clock speeds.
Steamroller CPU Core: 28nm
While Richland is built on a 32nm fabrication process, Kaveri is built on GlobalFoundries’ 28nm SHP technology. The die shrink allows AMD to pack even more transistors into Kaveri. Kaveri’s die size is comparable to Richland’s: it has a die area of 245 mm² compared to Richland’s 246 mm². However, Kaveri packs 2.41 billion transistors, almost double the 1.3 billion on Richland. The increase in transistor count is largely due to the GPU, as 47% of the Kaveri die is devoted to it. GPU design favors transistor density, which is why AMD decided to go with the GlobalFoundries 28nm SHP process. The trade-off is that CPU frequency at higher TDPs is lower, but AMD makes up for it with higher instructions per clock (IPC), which the company estimates at up to 20% better than the previous generation.
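To put the transistor figures in perspective, the short sketch below derives the average transistor density of each die from the numbers quoted above. It is only back-of-the-envelope arithmetic: the helper name is ours, and real density varies widely between the CPU, GPU, and cache regions of a die.

```python
# Figures quoted in the article: transistor count and die area.
chips = {
    "Kaveri (28nm)": {"transistors": 2.41e9, "area_mm2": 245},
    "Richland (32nm)": {"transistors": 1.3e9, "area_mm2": 246},
}


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


densities = {
    name: density_millions_per_mm2(**spec) for name, spec in chips.items()
}
ratio = densities["Kaveri (28nm)"] / densities["Richland (32nm)"]

for name, density in densities.items():
    print(f"{name}: {density:.2f} million transistors per mm^2")
print(f"Kaveri averages about {ratio:.2f}x Richland's density")
```

With the die area essentially unchanged, nearly all of the increase in transistor count shows up as higher average density.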
For AMD to compete against Intel, it needs to address its weaknesses. Thus, the focus with Kaveri is efficiency and power consumption. As a result, the maximum clock speed on Kaveri has been reduced. Our A10-7850K has a base clock of 3.7GHz and a turbo speed of 4.0GHz, compared to the A10-6800K’s 4.1GHz and 4.4GHz respectively. However, refinements to the CPU architecture allow the chip to perform as fast as its predecessor at the same clock speed, or even faster, while consuming much less power.
Kaveri’s CPU core is a refined Piledriver that is still based on Bulldozer’s module design, with two integer units paired with one floating-point unit. This third-generation core is codenamed “Steamroller”. The chip has up to 2MB of L2 cache per module, for up to 4MB in total, as Kaveri comes only in single-module (dual-core) or dual-module (quad-core) variants. There are no new ISA extensions on Kaveri; it still supports FMA4/FMA3, AVX, AES, and XOP.
With Steamroller, instruction fetch has been improved: AMD reduced i-cache misses by 30% and mispredicted branches by 20%. Scheduling efficiency is up by as much as 10%, the maximum dispatch width per thread has been increased by 25%, and store handling has been improved. Together, AMD estimates, these improvements deliver the gain of up to 20% in IPC over Richland.
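As a rough sanity check on the clock-versus-IPC trade-off, the sketch below uses a first-order model in which CPU performance scales with IPC times clock speed. That model is our simplifying assumption, not AMD’s methodology, and the 20% figure is AMD’s “up to” estimate, so the results are best-case numbers rather than measurements.

```python
def relative_performance(new_clock_ghz: float, old_clock_ghz: float,
                         ipc_gain: float) -> float:
    """First-order estimate: performance scales with IPC x clock speed."""
    return (new_clock_ghz / old_clock_ghz) * (1.0 + ipc_gain)


IPC_GAIN = 0.20  # AMD's "up to 20%" estimate, treated here as a best case

# A10-7850K versus A10-6800K, using the clocks quoted in the article.
base = relative_performance(3.7, 4.1, IPC_GAIN)
turbo = relative_performance(4.0, 4.4, IPC_GAIN)

# IPC gain needed just to match the A10-6800K at base clocks.
break_even = 4.1 / 3.7 - 1.0

print(f"Estimated relative performance at base clocks:  {base:.2f}x")
print(f"Estimated relative performance at turbo clocks: {turbo:.2f}x")
print(f"IPC gain needed to break even at base clocks:   {break_even:.1%}")
```

Under these assumptions the full 20% IPC gain would more than offset the lower clocks, but an IPC gain of roughly half that would only bring Kaveri level with its predecessor.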
At the same time, AMD also lowered the CPU’s power consumption. Desktop Kaveri will come in 45W, 65W, and 95W versions, the last a 5-watt reduction from the 100W TDP of the fastest Richland/Trinity parts. Kaveri also brings configurable TDP to the desktop. So it is now possible to purchase a faster APU, say the A10-7850K, and run it at 45W or 65W when it is constrained by power or cooling. Then, once the constraints are lifted, you can run the APU at its full potential.
Kaveri’s memory controller also gets a speed boost and now supports DDR3-2133. In the previous generation, only select models (namely the A10-6800K) supported DDR3-2133, while most of the 6000 series APUs only supported DDR3-1866.
While the CPU changes seem like a minor improvement, the GPU on Kaveri gets a major overhaul. Kaveri now uses the same design as “Hawaii”, the GPU found on AMD’s current-generation R9 290X discrete graphics card. Hence it supports 8xAA and 16xAF, DirectX 11.2, AMD Eyefinity, 4K Ultra HD, and DisplayPort 1.2.
Kaveri’s GPU features up to eight graphics compute units, each with 64 unified shaders, four texture mapping units, and one render output unit. The GPU also brings IEEE 754-2008 compliance along with improvements to the texture fetch units, registers, and precision. The GPU is clocked at 720MHz on all three models (A10 and A8) that AMD has launched. We are not sure whether this will hold for future models.
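The per-unit figures above multiply out as follows for a fully enabled, eight-unit GPU. The peak throughput line rests on an assumption not stated in this article: that each shader can retire one fused multiply-add (two floating-point operations) per clock, which is the usual convention for quoting theoretical peak figures. Treat it as a ceiling, not as measured performance.

```python
# Per-unit figures quoted in the article for a fully enabled Kaveri GPU.
COMPUTE_UNITS = 8
SHADERS_PER_CU = 64
TMUS_PER_CU = 4
ROPS_PER_CU = 1
GPU_CLOCK_HZ = 720e6

# Assumption (not from the article): one fused multiply-add per shader per
# clock, counted as two floating-point operations.
FLOPS_PER_SHADER_PER_CLOCK = 2

shaders = COMPUTE_UNITS * SHADERS_PER_CU
tmus = COMPUTE_UNITS * TMUS_PER_CU
rops = COMPUTE_UNITS * ROPS_PER_CU
peak_gflops = shaders * FLOPS_PER_SHADER_PER_CLOCK * GPU_CLOCK_HZ / 1e9

print(f"Unified shaders:       {shaders}")
print(f"Texture mapping units: {tmus}")
print(f"Render output units:   {rops}")
print(f"Theoretical peak (single precision): {peak_gflops:.1f} GFLOPS")
```

A model with fewer compute units enabled scales these totals down proportionally, since the 720MHz clock is shared across the launched models.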
As with previous APUs, Kaveri also supports AMD Dual Graphics. Kaveri can be paired with R7-based Radeon GPUs with DDR3 memory for additional performance.
Sharing its microarchitecture with the desktop GPUs, Kaveri inherits all of the features found on Radeon graphics cards. Kaveri features TrueAudio, Mantle, a dedicated Video Coding Engine (VCE) 2.0, and Unified Video Decoder (UVD) 4. VCE 2 adds support for H.264 YUV420 B-frames and H.264 YUV444 I-frames for 60Hz wireless display. UVD 4 adds improved error resiliency for H.264/AVCHD.
Mantle is a low-level programming API that lets game developers improve gaming performance by reducing overhead. Single-threaded performance on AMD CPUs has lagged behind Intel for quite a while now, so this could be one area where AMD benefits a lot if more workloads are pushed toward its GPU.
As you may recall, TrueAudio was on all of the AMD GCN 1.1 or newer GPUs (R9 and R7 GPUs). The idea is to offload audio processing to a dedicated DSP on the chip as opposed to the CPU. CPU resources are then not tied up by in-game audio processing and can be used for other workloads, or power can be saved through lower CPU usage. With TrueAudio taking over the audio workload, games can deliver more true-to-life audio with greater effect, provided they are written to use the dedicated DSP. And this is ultimately going to be the major task for AMD, not just on the APUs but also on its Radeon graphics cards with the TrueAudio DSP: getting software developers on board to use it.
Currently, there are fewer than a handful of games, namely Thief and Battlefield 4, that can take advantage of TrueAudio and Mantle. Given that these technologies are found not just on the APUs but also on Radeon GPUs, we think they will gain much wider support from game developers. But only time will tell.