AMD is today launching a direct assault on NVIDIA with its newest HD 4800 series of GPUs. By offering lots of performance at a low price, AMD hopes to hit NVIDIA where it hurts: in the performance segment. But what do you actually get with the HD 4850 and the HD 4870? Read on to learn more about these new GPUs.
AMD has been through some rough times over the last year. Not only have they lagged behind Intel with their processors, they have also been unable to compete at the enthusiast level with their ATI graphics cards. It was therefore quite interesting to attend their briefing in Malaga, Spain a few weeks ago, where they presented their new HD 4000 GPUs. For the first time in a year or so AMD held a briefing that was filled with confidence, and it left me, at least, with the feeling that AMD was very optimistic about what they were presenting to us journalists.
The product they were presenting was of course the new HD 4000 series of GPUs. The focus was on the HD 4850 and the HD 4870, but we also got some information on the R700, which I expect will be called something like HD 4870X2. These are the GPUs that AMD hopes will strike a big blow to NVIDIA’s market share.
AMD spent a lot of time telling us about the difference in strategy that we can now expect from them compared to NVIDIA. While NVIDIA has been concentrating on building bigger, fatter and faster chips, AMD has decided to take another route: aim to build the best possible GPU for the $200-$300 price point and then scale it to the other segments, either by cutting it down for the entry level or by combining more than one GPU for the enthusiast level. The main reason AMD has chosen this path is to cut down the time it takes to introduce new technology at most price points. The current GPUs from AMD and NVIDIA are perfect examples of the two ways to go:
NVIDIA has introduced the huge GTX260 and GTX280 at the $400-$650 price point. These are big chips at around 576 mm2. At the same time NVIDIA is using its older G92 GPU to fill the $200-$300 price point, as it is not easy to scale these chips down. It will probably be at least 6 months before we see the new technology trickle down to the $200-$300 price point.
At the same time AMD is releasing two smaller GPUs (~260 mm2) in the $200-$300 price range, and users can immediately combine them in CrossFireX to get enthusiast performance or wait until the HD4870X2 and HD4850X2 are released in a month or so. This gives AMD the opportunity to introduce new technology in all price segments at once.
Something I found interesting when comparing these strategies is how each company will treat the mobile market. As laptops gain more and more market share and people start to demand better GPUs in them, it will be interesting to see which of AMD and NVIDIA will be able to migrate its newest design to the mobile market first. NVIDIA has done well the last few years with the GeForce 7, 8 and 9, but will they be able to continue as easily with the GTX 2x0, or is this AMD’s chance to bring the HD 4x00 to the mobile market first?
THE RADEON HD 4800 SERIES – ARCHITECTURE AND FEATURES
So, what exactly is the HD 4800? Well, the first thing AMD likes to brag about is that it is the first TeraFLOP GPU. For most of us this really does not mean a lot, but it is no more than 12 years since ASCI Red, the world’s first TeraFLOP computer, was built. It used nearly 10,000 Pentium Pro processors running at 200 MHz and consumed 500 kW of power. On top of that, it took nearly another 500 kW just to cool the thing. This is the computing power that we now have in our hands.
Let’s take a look at the HD 4800 series and compare it to the previous generation.
The HD 4800 GPU is the second GPU using the 55 nm process. It is a bit bigger than the HD 3800 and uses approximately 40% more transistors. One impressive number here is the shader count, which has gone from 320 up to 800.
AMD has of course worked hard to optimize the layout of the chip, and they are roughly keeping the same transistor count/mm2 as with the HD 3800. To get the extra performance they have improved the graphics engine by creating a new SIMD core layout, optimizing the texture units, creating a new texture cache design, creating a new memory architecture, optimizing the render back-ends for faster anti-aliasing performance and enhancing geometry shader and tessellator performance.
What we find in the core is 10 new SIMD cores, each with 80 32-bit Stream Processing units, as well as 40 texturing units and a GDDR5 memory interface.
Improvements in the SIMD cores
- New design allows texture fetch capability to scale with shader power, maintaining 4:1 ALU:TEX ratio
Improvements in the Stream Processing Units
- 40% increase in performance per mm2
- More aggressive clock gating for improved Performance per Watt
- Fast double precision processing (240 GigaFLOPS, 2x that of the GTX280)
- Integer bit shift operations for all units (12.5x Improvement)
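As a rough sanity check on the TeraFLOP claim and the double-precision figure above, the arithmetic can be sketched in a few lines of Python. The core clocks (625 MHz for the HD 4850, 750 MHz for the HD 4870), the two floating-point operations per stream processor per clock (one multiply-add) and the one-fifth double-precision rate are my assumptions, not figures from AMD’s briefing as reported here.

```python
# Back-of-the-envelope peak throughput for the HD 4800 series.
STREAM_PROCESSORS = 10 * 80      # 10 SIMD cores x 80 stream processing units
FLOPS_PER_SP_PER_CLOCK = 2       # assumption: one multiply-add per clock
DP_DIVISOR = 5                   # assumption: double precision at 1/5 the rate

# Assumed reference core clocks in MHz (not stated in the article).
ASSUMED_CORE_CLOCK_MHZ = {"HD 4850": 625, "HD 4870": 750}


def peak_gflops(core_clock_mhz):
    """Peak single-precision GFLOPS at the given core clock."""
    return STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * core_clock_mhz / 1000


def peak_dp_gflops(core_clock_mhz):
    """Peak double-precision GFLOPS under the assumed 1/5 rate."""
    return peak_gflops(core_clock_mhz) / DP_DIVISOR


for card, mhz in ASSUMED_CORE_CLOCK_MHZ.items():
    print(f"{card}: {peak_gflops(mhz):.0f} GFLOPS single precision, "
          f"{peak_dp_gflops(mhz):.0f} GFLOPS double precision")
```

Under these assumptions the HD 4870 lands at 1.2 TeraFLOPS and the HD 4850 at 1 TeraFLOP, and the double-precision estimate for the HD 4870 comes out at the 240 GigaFLOPS quoted in the list.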
Improvements in the Texture Units
- Streamlined design – 70% increase in performance/mm2
- Double the texture cache bandwidth of the HD 3000 series
- 2.5x increase in 32-bit filter rate
- 1.25x increase in 64-bit filter rate
- Up to 160 fetches per clock
New cache design
- L2s aligned with memory channels
- L1s store unique data per SIMD: 2x increase in effective storage per L1, 5x increase overall
- Separate vertex cache
- Increased bandwidth. Up to 480 GB/sec of L1 texture fetch bandwidth. Up to 384 GB/sec between L1 & L2
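To put the cache bandwidth numbers in perspective, they can be converted into bytes per core clock. The sketch below assumes the figures refer to a 750 MHz core clock (my assumption for the HD 4870, not stated in the list above) and that the L1 bandwidth is spread evenly over the 10 SIMD cores.

```python
# Convert the quoted cache bandwidths into bytes moved per core clock.
ASSUMED_CORE_CLOCK_MHZ = 750   # assumption, not given in the article
SIMD_CORES = 10                # from the architecture description


def bytes_per_clock(bandwidth_gb_per_s, core_clock_mhz=ASSUMED_CORE_CLOCK_MHZ):
    """GB/s -> MB/s, then divide by MHz to get bytes per clock cycle."""
    return bandwidth_gb_per_s * 1000 / core_clock_mhz


l1_total = bytes_per_clock(480)      # L1 texture fetch bandwidth
l1_per_simd = l1_total / SIMD_CORES  # even split across SIMD cores (assumption)
l1_to_l2 = bytes_per_clock(384)      # bandwidth between L1 and L2

print(f"L1 fetch: {l1_total:.0f} bytes/clock total, {l1_per_simd:.0f} per SIMD core")
print(f"L1 <-> L2: {l1_to_l2:.0f} bytes/clock")
```

If the clock assumption holds, each SIMD core can fetch 64 bytes per clock from its L1 cache.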
Improvements in Render Back-ends
- Focus on improving AA performance per mm2
- Doubled peak rate for depth/stencil ops to 64 per clock
- Doubled AA fill rate for 32-bit & 64-bit color
- Doubled non-AA fill rate for 64-bit color
- Supports both fixed function (MSAA) and programmable (CFAA) modes
The new memory controller
In late 2005 ATI introduced the Ring Bus architecture in the X1000 GPUs. This architecture allowed them to access memory much more efficiently than previous architectures. It is a testament to how good the architecture is that it has been around for so long, but now it is time for the next evolution of the memory controller. The memory controller in the HD 4000 still has a bit of the ring bus architecture in it but now has a more distributed design, where the controllers are placed around the periphery of the chip, adjacent to the primary bandwidth consumers. Memory tiling and the 256-bit interface allow reduced latency, silicon area and power consumption. A hub handles the relatively low-bandwidth traffic (PCI Express, CrossFireX interconnect, UVD2, display controllers, intercommunication).
All these improvements mean that the HD 4800 memory controller can use GDDR5 memory and that AMD gets a major increase in bandwidth efficiency (95% for the HD 4800 compared to around 85% for the HD 3800).
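The efficiency figures are easier to appreciate as usable bandwidth. The following sketch assumes a GDDR5 data rate of 3.6 Gbit/s per pin on the 256-bit interface (my assumption for the HD 4870; the briefing numbers quoted here do not include memory clocks) and applies both efficiency percentages to that same peak purely for illustration.

```python
# Peak and usable memory bandwidth for a 256-bit interface.
BUS_WIDTH_BITS = 256                 # from the article
ASSUMED_DATA_RATE_GBIT_PER_PIN = 3.6 # assumption: GDDR5 on the HD 4870


def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbit_per_pin):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbit_per_pin


def effective_bandwidth_gb_s(peak_gb_s, efficiency):
    """Usable bandwidth at a given controller efficiency (0.0 to 1.0)."""
    return peak_gb_s * efficiency


peak = peak_bandwidth_gb_s(BUS_WIDTH_BITS, ASSUMED_DATA_RATE_GBIT_PER_PIN)
for label, efficiency in (("95% (HD 4800)", 0.95), ("85% (HD 3800)", 0.85)):
    usable = effective_bandwidth_gb_s(peak, efficiency)
    print(f"{label}: {usable:.1f} GB/s usable of {peak:.1f} GB/s peak")
```

At the same peak bandwidth, the ten-point efficiency gain is worth a bit over 11 GB/s in this example.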
THE RADEON HD 4800 SERIES – FEATURES CONTINUED
Saving energy – Dynamic Power Management
We have already seen dynamic power management on the HD 3000 GPUs. This feature keeps an eye on the load and powers the GPU down when full power is not needed.
On the HD 3800 this feature was actually responsible for some of the micro-stuttering problems some users have experienced in games, for example Crysis. AMD told us that the problem was that the games were CPU limited, and thus the GPU occasionally got “starved” while waiting for the CPU to catch up. During this time the power management system thought the GPU was no longer under heavy load and started to clock it down. As the CPU caught up, the GPU had to clock up again, and thus you got micro-stuttering. A driver hotfix from AMD solved this by making the power management system less aggressive, but in the HD 4800 AMD has improved the system even more.
On the HD 3800 the system only looked at a very short timeframe and thus could not detect that the GPU was actually still under heavy load. The dynamic power management in the HD 4800 now looks at both the big picture and the short-term load when deciding what to do with the GPU. The bigger picture is used when deciding to clock down: the system will wait a bit longer before clocking down if it notices that the GPU was under heavy load just before it got starved. The short-term picture is used when deciding to clock up: as soon as the system detects that the GPU is coming under load, it clocks the GPU up.
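The behaviour described above can be sketched as a toy clock governor that is quick to clock up and slow to clock down. This is only an illustration of the idea: the class name, the window lengths, the threshold and the simple two-state clock are all hypothetical and say nothing about how AMD’s hardware or driver actually implements it.

```python
from collections import deque


class TwoWindowGovernor:
    """Toy governor: short window decides clock-up, long window decides clock-down."""

    def __init__(self, short_window=2, long_window=8, busy_threshold=0.5):
        self.short_window = short_window
        self.busy_threshold = busy_threshold
        self.history = deque(maxlen=long_window)  # most recent load samples
        self.high_clock = False

    def update(self, load):
        """Feed one load sample (0.0 to 1.0); return True if the high clock is selected."""
        self.history.append(load)
        recent = list(self.history)[-self.short_window:]
        short_avg = sum(recent) / len(recent)
        long_avg = sum(self.history) / len(self.history)
        if short_avg >= self.busy_threshold:
            self.high_clock = True    # load is arriving: clock up at once
        elif long_avg < self.busy_threshold:
            self.high_clock = False   # idle for a sustained period: clock down
        return self.high_clock


def simulate(governor, loads):
    """Run a sequence of load samples through a governor and collect its decisions."""
    return [governor.update(load) for load in loads]


# Heavy load with a brief "starved" gap, as in a CPU-limited game.
bursty = [1.0] * 8 + [0.0] * 3 + [1.0] * 3
print(simulate(TwoWindowGovernor(), bursty))               # long memory
print(simulate(TwoWindowGovernor(long_window=2), bursty))  # short memory only
```

With the long window the governor rides out the short starvation gap at the high clock, while the variant that only remembers two samples drops the clock in the middle of the gap, which is the kind of behaviour that produced the stuttering on the HD 3800.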
ATI’s GPUs have been supported in Folding@Home since 2006. AMD has of course continued to work with the project, and the latest beta supports the newest GPUs. Right now the full performance of the HD 4800 is not utilized and it performs like an HD 3800, but that should soon change as newer versions are released.
AMD also wants to use the computing power of the HD 4800 for things other than Folding@Home. Video transcoding acceleration on the GPU is supported through plug-ins in programs like Adobe Premiere and CyberLink PowerDirector 7. A 1080p video clip takes about 10 hours to transcode using an Intel Core 2 Duo E8500, while the HD 4800 GPU can do the same in about 32 minutes. AMD is also currently in talks with the people behind the Havok physics engine to see where the GPU could be beneficial. This can be seen as a direct response to NVIDIA’s acquisition of Ageia and its move to put the PhysX engine on the GeForce GPUs.
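The transcoding times quoted by AMD translate into a speedup factor with simple arithmetic. The snippet below only restates the two approximate figures from the briefing; the function name is mine.

```python
def speedup(cpu_minutes, gpu_minutes):
    """How many times faster the GPU run is compared to the CPU run."""
    return cpu_minutes / gpu_minutes


cpu_minutes = 10 * 60  # about 10 hours on the Core 2 Duo E8500
gpu_minutes = 32       # about 32 minutes on the HD 4800

print(f"GPU transcoding speedup: {speedup(cpu_minutes, gpu_minutes):.2f}x")
```

That is a little under 19 times faster, although both inputs are rounded figures supplied by AMD.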
AMD of course could not leave UVD untouched when creating the new HD 4800 GPUs.
Acceleration of two video streams
One of the new features is the ability to accelerate two video streams at the same time. This is mainly beneficial for Blu-ray movies that use Picture-in-Picture.
7.1-sound over HDMI
On the HD 3800 we had to make do with 5.1 sound, but now AMD has improved the audio controller so it can handle 7.1 sound. Right now 7.1 support is included natively in the Catalyst drivers for Windows XP and is supported under Windows Vista through a Realtek driver. The WHQL-certified Realtek driver is available for download. AMD plans to include native Vista support via Catalyst in a future update.
Improved video quality
UVD 2 comes with two improvements that should improve the video quality.
- DVD-upscaling to 1080p has been improved.
- Dynamic Contrast – this can improve the colors and the contrast of the picture. Just like similar features on TVs, it can be turned off.
Yeah, I know: DirectX 10.1 is “not important” if you listen to NVIDIA. It is actually amazing that they still have no DirectX 10.1 GPU. AMD was again adamant that DirectX 10.1 is not something you can “skip over”. If NVIDIA wants to go to DirectX 11 (or whatever it will be called), they will eventually have to support DirectX 10.1 as well.
The most important “feature” of DirectX 10.1 is that you can do the same thing with fewer render passes. Assassin’s Creed, for example, benchmarked up to 25% better with the DirectX 10.1 patch than with the DirectX 10 version. Electronic Arts also showed their newest game, BattleForge, and currently the DirectX 10.1 version runs about 30% faster than the DirectX 10 version.
HD 4850, HD 4870 AND THE NOT YET RELEASED R700
As you read this, the following two cards using the RV770 GPU should already be available in the shops.
The HD 4850 costs about $199 (the lowest I could find at the time of writing was $192 through Pricewatch.com).
It is a single-slot card that only needs one 6-pin PCI-E power connector.
The HD 4870 should cost around $299.
It will be a dual-slot card and requires two PCI-E power connectors.
While not announcing it “officially” yet, AMD still told us a lot about the upcoming R700, which should be out after the summer.
- R700 delivers 2 TeraFLOPS per board for leading performance in the ultra enthusiast segment!
- 2x RV770 on one dual slot board
- GDDR5 memory
- 2nd generation PCI Express bridge design + next generation GPU interconnect for improved scaling
- CrossFireX support
- Form factor similar to ATI Radeon™ HD 3870 X2
AMD teased us with a slide showing the performance of an R700, which effectively is two HD 4870 GPUs in CrossFire on one board, in 3DMark Vantage. The card scored 12515 at the Extreme setting, and the GPU subscore was 12525.
PERFORMANCE – SETTINGS
Enough talk, time for some action! AMD provided us with two HD 4850 cards that we pitted against an arsenal of other cards from both AMD and NVIDIA. Unfortunately we did not have access to an HD 4870 at the time of writing, but we are working on getting one as soon as possible.
The cards were tested on the following system:
- Core 2 Quad Q6600 @ 2.7 GHz
- 2 GB DDR2 memory @ 800 MHz
- ASUS Maximus Formula (X38) motherboard
- Panasonic Blu-Ray drive
- Xbox 360 HD-DVD drive (USB 2.0)
- 320 GB Seagate SATAII
The cards tested were:
- HIS HD 3870
- HIS HD 3870X2
- Reference HD 4850
- Force3D HD 4850
- ASUS 8800GT
- ASUS 9800GTX
- ASUS 9800GTX (o/c to 800 MHz/2200 MHz)
- Gigabyte GTX280
The following drivers were used:
- Catalyst 8.6 for all AMD boards except the HD4850
- Special Catalyst 8.6 with support for HD 4850
- ForceWare 175.15 for all NVIDIA cards except GTX280
- ForceWare 177.35 for GTX280
After we had finished our testing, AMD released an updated driver (available as a hotfix from their site) that promises some performance gains, especially in CrossFire mode.
The cards were tested with:
- 3DMark Vantage
- World In Conflict
- Company of Heroes
We chose to test the cards with the Extreme setting in 3DMark Vantage. Note that we did not have the ForceWare driver that turns on PhysX emulation on the GPU. There is also some debate about whether this is actually permitted under Futuremark’s rules, as the driver affects the CPU score, something Futuremark prohibits, but we are sure Futuremark will tell us what is OK and not OK in the near future.
“5. Based on the specification and design of the CPU tests, GPU make, type or driver version may not have a significant effect on the results of either of the CPU tests as indicated in Section 7.3 of the 3DMark Vantage specification and whitepaper.”
The HD 4850 performs well and is only beaten by the GTX 280 and by itself in CrossFire mode.
Company of Heroes
The HD 4850 again performs similarly to the HD 3870 X2. The CoH game engine works well in CrossFire/SLI mode and also seems to like a higher core clock, as the overclocked 9800GTX gets a nice boost.
We used the built-in GPU-benchmark and set the game settings to High.
At higher resolutions the HD 4850 performs very well, almost tying the GeForce 9800GTX.
World In Conflict
We used the built-in benchmark and tested the cards at both the Very High preset and our own custom preset, where everything was set to its highest setting except water reflection size, which was set to 768. AA was set to 8x and AF to 16x.
At the Very High setting the HD 4850 again performs very well, especially at the higher resolutions.
Turning everything up to the max does not change the picture much. The HD 4850 still performs well compared to all the other (more expensive) cards.
Idle: At Windows Desktop for 1 hour
Load: The highest wattage measured when running 3DMark Vantage GPU-tests at Extreme-level.
Even though AMD has almost doubled the performance compared to the HD 3870, the HD 4850 still only draws about the same amount of power. In CrossFire it does however match the GTX 280 and quite frankly uses an awful lot of power.
NVIDIA is worried. If anything, their obviously desperate paper launch of the GeForce 9800 GTX+ should be proof of that. And they are right to be worried. Even the cheap HD 4850 is sniffing at the tail of the more expensive 9800GTX, and even if NVIDIA now drops the price while selling off the 9800GTX stock and then introduces the 9800GTX+ at $229, they will still have a fight on their hands, as I am sure AMD will not sit idle. The HD 4850 is available now, and a lot can happen to its price before the GeForce 9800GTX+ comes out.
Even more interesting is the HD 4870. While we could not get our hands on one, it should clearly be quite a bit faster and thus even be able to take on the GTX 260, and at a lower price.
Last but not least, AMD has the R700 waiting in the shadows, ready to take on the mighty GTX 280 after the summer.
I must admit I am starting to warm to AMD’s strategy for their GPUs. We have already seen that CrossFireX has matured a lot with the HD 3000 GPUs, and it will be interesting to see if AMD can get even more games to take full advantage of it with newer drivers, something that will be crucial for the strategy to succeed.
It also feels like AMD currently has the edge when it comes to features, as it offers DirectX 10.1 support, tessellation support and full 7.1 sound over the HDMI port. NVIDIA’s cards might be fast, but they have not been updated with features for some time now.
2008 might be the year when AMD finally gets a lucky break and claws back some market share from NVIDIA. With the HD 4800 series they should at least have a fighting chance of doing that.
The HD 4850 and HD 4870 should already be available from retailers today.