If you read our AMD Trinity Preview, you’ve already seen the integrated GPU performance of AMD’s new A-Series APU. Today, we take a look at the CPU side of the APU. We also see how the new chip performs across various applications and workloads, including media transcoding, data encryption, and OpenCL.
Introduction
Let’s face it, Bulldozer was a bit of a disappointment. While the radical modular design, which pairs a single floating-point cluster with two integer clusters, has a lot of potential, it ultimately failed to move AMD forward: the processor delivered worse performance than AMD’s own several-year-old Phenom II X6. When we reviewed Bulldozer, we felt that it had potential and that, with a bit of fine-tuning, it would be possible for AMD to deliver a more competent processor to the market.
Fast-forward a year, and we are here today with the official launch of desktop Trinity: the successor to Llano, and AMD’s new A-Series APU for mainstream and budget systems. Trinity, first released for notebooks back in May 2012, has finally brought us the processor we hoped AMD would put forth when it launched Bulldozer. Trinity combines AMD’s Piledriver cores, which stem from Bulldozer, with a GPU from the Northern Islands family found on the last-generation HD 6000 graphics cards. Built on the same 32nm SOI process as Llano, Trinity features two to four Piledriver x86 cores and up to 384 VLIW4 Radeon cores. As a result, Trinity has 1.303 billion transistors (up from 1.178 billion on Llano) and a 246mm² die (vs. 228mm² on Llano).
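To put the transistor and die-size figures in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the helper names (`pct_increase`, `density_m_per_mm2`) are ours, chosen purely for illustration.

```python
# Figures quoted in the article: transistors in billions, die area in mm^2.
TRINITY = {"transistors_bn": 1.303, "area_mm2": 246.0}
LLANO = {"transistors_bn": 1.178, "area_mm2": 228.0}


def pct_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def density_m_per_mm2(chip):
    """Average transistor density in millions of transistors per mm^2."""
    return chip["transistors_bn"] * 1000.0 / chip["area_mm2"]


print(f"Transistors: +{pct_increase(TRINITY['transistors_bn'], LLANO['transistors_bn']):.1f}%")
print(f"Die area:    +{pct_increase(TRINITY['area_mm2'], LLANO['area_mm2']):.1f}%")
print(f"Density:     {density_m_per_mm2(LLANO):.2f} -> {density_m_per_mm2(TRINITY):.2f} M/mm^2")
```

On these numbers, Trinity carries roughly 11% more transistors in roughly 8% more area, so the average density on the shared 32nm process rises only slightly.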
Desktop Trinity APU: Piledriver + HD 7000
While “Piledriver” is still based on the same architecture and module design as Bulldozer, it improves on it significantly. The picture above shows the improvements and enhancements AMD has made in Piledriver. AMD has improved branch prediction, scheduling, and the hardware prefetcher. Together, these changes help Piledriver improve its instructions per clock (IPC). With this more streamlined design, Piledriver offers an approximately 10 to 15% improvement over Bulldozer.
With Piledriver, AMD also adds two new instruction set extensions: FMA3 and F16C. Bulldozer already supported FMA4, and with the addition of FMA3, Piledriver is the first CPU to support FMA3, as Intel won’t add it until Haswell.
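For readers unfamiliar with fused multiply-add: both FMA3 and FMA4 compute a*b + c with a single rounding step at the end. They differ in operand encoding (FMA3 overwrites one of its three source registers, while FMA4 has a separate fourth destination operand). The Python sketch below does not execute either instruction; it only emulates the single-rounding behavior with exact rational arithmetic to show why the result can differ from a separate multiply and add. The function names and input values are our own illustrative choices.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulated FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so the exact product (1 - 2**-60) is not representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))  # product rounds to 1.0 first
print("fused:  ", fused_multiply_add(a, b, c))    # keeps the tiny residual
```

With these inputs, the unfused version loses the small residual entirely because the intermediate product rounds to 1.0, while the fused version preserves it.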
Turbo Core in Trinity has been upgraded over Llano. Unlike Llano, where only the CPU could turbo up to a higher speed, the Trinity APU features Turbo Core for both the CPU and the GPU. Both sides can turbo up when there is thermal headroom available. If the CPU is under heavy load while the GPU is not, the CPU can run at a higher speed. The converse is also true: when the CPU is idle but the GPU is under heavy load, the GPU can run at a higher clockspeed. Trinity has two turbo modes, full turbo and half turbo. Take the A10-5800K, for example: its base clock is 3.8GHz, and it can turbo to 4.0GHz (half turbo) or 4.2GHz (full turbo). The turbo steps appear to be either 150MHz or 200MHz, depending on the model.
On the GPU front, the clock can turbo up to 800MHz. Unlike Intel’s processors, where the turbo speed depends on the number of active cores, both modules on the APU turbo up to the same clockspeed. We asked AMD whether turbo favors the GPU or the CPU when a workload is both CPU and GPU intensive. We were told that even in these scenarios, both the CPU and the GPU would still turbo up. So it appears that turbo is governed by the workload and the thermal envelope.
While Turbo Core (and Intel’s Turbo Boost) makes it hard to quantify absolute performance because of the dynamic clockspeed, the extra performance is nonetheless good for consumers.
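As a rough guide to what the turbo states are worth, the sketch below works out the clock uplift for the A10-5800K figures quoted above. It is simple arithmetic, not a model of how Turbo Core actually picks a state; the dictionary and function names are ours, and the percentages are only an upper bound for workloads that scale perfectly with clockspeed.

```python
# A10-5800K CPU clocks quoted in the article, in MHz.
A10_5800K_MHZ = {"base": 3800, "half turbo": 4000, "full turbo": 4200}


def uplift_pct(clock_mhz, base_mhz):
    """Clock increase over base, in percent."""
    return (clock_mhz - base_mhz) / base_mhz * 100.0


def turbo_steps_mhz(clocks):
    """Differences between consecutive clock states, lowest to highest."""
    ordered = sorted(clocks.values())
    return [high - low for low, high in zip(ordered, ordered[1:])]


base = A10_5800K_MHZ["base"]
for state, mhz in A10_5800K_MHZ.items():
    print(f"{state:>10}: {mhz} MHz (+{uplift_pct(mhz, base):.1f}% over base)")
print("steps (MHz):", turbo_steps_mhz(A10_5800K_MHZ))
```

For this chip, each step is 200MHz, which works out to roughly a 5% uplift at half turbo and a little over 10% at full turbo.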
Piledriver’s memory controller has been updated so that it now supports DDR3-1866 speeds (up from DDR3-1600 on Llano). On the desktop, the controller supports up to 64GB of DDR3.
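To see what the faster memory speed means on paper, here is a small sketch of theoretical peak bandwidth. It assumes a dual-channel configuration with 64-bit (8-byte) channels; the channel count is our assumption and is not stated above. The function name is hypothetical, and real-world throughput will be lower than these peak figures.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    channels=2 and bus_bytes=8 are assumptions: dual-channel, 64-bit DDR3.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


for speed in (1600, 1866):
    print(f"DDR3-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s peak (dual channel)")
```

Under these assumptions, moving from DDR3-1600 to DDR3-1866 raises the theoretical peak from 25.6GB/s to about 29.9GB/s. That matters for an APU because the integrated GPU shares system memory with the CPU.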
Trinity GPU
The Trinity GPU borrows the VLIW4 design of AMD’s Northern Islands family. The GPU features 6 SIMD engines, each with an array of 16 VLIW4 units, for up to 384 Radeon cores. There are 24 texture units and 8 ROPs. The A10 models come with all 384 cores enabled, while the A8 gets 256 and the A6 gets 192. The GPU is clocked at 800MHz on the A10 and 760MHz on the A8.
As expected, the GPU’s features are exactly the same as the Radeon HD 6000’s. We get DirectX 11, OpenGL 4.1, OpenCL 1.1, and Eyefinity support. In fact, this is the first processor to support 3+1 displays when using a DisplayPort 1.2 port. The AMD Universal Video Decoder (UVD3) is here as well to help with video decode. What is new is the hardware encode component, borrowed from Graphics Core Next’s Video Codec Engine (VCE). This should help with multimedia transcoding, where it will hopefully deliver performance comparable to QuickSync on Intel’s Sandy Bridge or Ivy Bridge.
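The core counts follow directly from the layout described above: SIMD engines × 16 VLIW4 units × 4 lanes per unit. The sketch below works through that arithmetic. The SIMD counts it derives for the A8 and A6 are our inference, assuming cores are disabled in whole SIMD engines; the function names are ours as well.

```python
VLIW4_LANES = 4            # each VLIW4 unit counts as 4 Radeon cores
VLIW4_UNITS_PER_SIMD = 16  # array size per SIMD engine, per the article
CORES_PER_SIMD = VLIW4_UNITS_PER_SIMD * VLIW4_LANES


def radeon_cores(simd_engines):
    """Radeon cores provided by a given number of enabled SIMD engines."""
    return simd_engines * CORES_PER_SIMD


def simd_engines_for(cores):
    """Enabled SIMD engines implied by a core count (assumes whole engines)."""
    if cores % CORES_PER_SIMD:
        raise ValueError(f"{cores} is not a multiple of {CORES_PER_SIMD}")
    return cores // CORES_PER_SIMD


print("Full GPU:", radeon_cores(6), "cores")
for model, cores in {"A10": 384, "A8": 256, "A6": 192}.items():
    print(f"{model}: {cores} cores -> {simd_engines_for(cores)} SIMD engines")
```

Read this way, the A10 uses all 6 SIMD engines, while the A8 and A6 core counts correspond to 4 and 3 engines respectively.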