Nehalem, much less about gaming performance?


mullet
08-19-08, 10:47 AM
http://www.anandtech.com/weblog/showpost.aspx?i=480

As IDF has started, the first benchmarks of Nehalem will probably pop up. It is without a doubt an impressive architecture that gets a much better platform to run on, but this CPU is not about giving you better frames per second in your favorite game than the Penryn family. Let me make that clearer: even when the GPU is not the bottleneck, it is likely that most games will not be significantly faster than on Penryn. We, the people behind it.anandtech.com, will probably have the most fun with it, more than your favorite review crew at Anandtech.com :-). And no, I have not seen any tests before typing this. Nehalem is about improving HPC, database and virtualization performance, much less about gaming performance. Maybe this will change once games get some heavy physics threads, but not right away.

Why? Most games are about fast caches and super integer performance. After all, most of the floating-point action is already happening on the GPU. All Core 2 CPUs were a huge step forward in integer performance (not least because of memory disambiguation) compared to the CPUs of that time (P4 and K8). Nehalem is only a small step forward in integer performance. And the gains due to slightly increased integer performance are mostly negated by the new cache system. In a previous post I told you that most games really like the huge L2 of the Core family. With Nehalem they are getting a 32 KB L1 with a 4 cycle latency, next a very small (compared to the older Intel CPUs) 256 KB L2 cache with 12 cycle latency, and after that a pretty slow 40 cycle 8 MB L3. When running on Penryn, they used to get a 3 cycle L1 and a 14 cycle 6144 KB L2. That is an L2 24 times larger than Nehalem's!
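A rough way to put numbers on the cache argument above, sketched in Python. The cycle counts and L2 sizes come from the post; the hit rates and main-memory latencies are made-up placeholders (nobody in this thread has measured them), so treat this as an illustration of the trade-off, not a prediction.

```python
# L2 sizes (KB) quoted in the post above.
PENRYN_L2_KB = 6144
NEHALEM_L2_KB = 256


def l2_size_ratio():
    """How many times larger Penryn's L2 is than Nehalem's."""
    return PENRYN_L2_KB / NEHALEM_L2_KB


def average_latency(levels, memory_latency):
    """Expected load latency in cycles.

    levels: list of (local_hit_rate, latency_cycles), ordered from L1 outward.
    local_hit_rate is the fraction of accesses reaching a level that hit there.
    latency_cycles is the total load-to-use latency for a hit at that level.
    """
    expected = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit_rate, latency in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level goes to main memory.
    return expected + reach * memory_latency


# ASSUMED hit rates and memory latencies, purely for illustration.
# Latencies per level are the ones quoted in the post.
penryn = average_latency([(0.95, 3), (0.99, 14)], memory_latency=200)
nehalem = average_latency([(0.95, 4), (0.80, 12), (0.95, 40)], memory_latency=150)

print(f"L2 size ratio: {l2_size_ratio():.0f}x")
print(f"Penryn  average load latency: {penryn:.2f} cycles")
print(f"Nehalem average load latency: {nehalem:.2f} cycles")
```

Changing the assumed L2 hit rate for Nehalem is the interesting knob here: the whole argument hinges on how often a game's working set spills out of 256 KB.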

The percentage of L2 cache misses of most games running on a Penryn CPU is extremely low. Now that is going to change. The integrated memory controller of Nehalem can't help much, as the fact remains that the L3 is slow and the L2 is small.

But that doesn't mean Intel made a bad choice. Intel made a superbly good choice by improving the performance where Core (Merom/Penryn) was mediocre to good. Penryn was already a magnificent gaming CPU, but it could not beat the AMD competitor in HPC benchmarks. And AMD gave good resistance in the database performance benchmarks. That is all going to change.

Most database code cannot use the wide architecture of Penryn very well. The number of instructions per cycle drops below 0.5, and waiting on memory is the most probable cause. SMT or Hyperthreading can do wonders here: while one thread waits on a memory stall, the other thread continues working and vice versa.
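The SMT point above can be sketched with a very idealized toy model in Python. It assumes each thread alternates between a burst of compute and a memory stall, and that a second thread can use the core for free while the first one is stalled. Real cores share caches and execution resources between threads, so actual gains are smaller; the cycle counts below are invented.

```python
def core_utilization(compute_cycles, stall_cycles, threads):
    """Fraction of cycles the core does useful work (idealized model).

    Each thread computes for compute_cycles, then stalls for stall_cycles.
    Other threads may run during a stall; the core can never exceed 100%.
    """
    period = compute_cycles + stall_cycles
    busy = threads * compute_cycles
    return min(1.0, busy / period)


# ASSUMED workload: 10 cycles of work followed by a 30 cycle memory stall.
for n in (1, 2):
    print(f"{n} thread(s): {core_utilization(10, 30, n):.0%} busy")
```

In this model a memory-bound workload gets close to double the throughput from the second thread, while a workload that rarely stalls gains almost nothing, which fits the database-versus-games split described in the post.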

Secondly, quad (and eight) socket performance is going to improve a lot as four Nehalems only have to keep four L3 in sync, while a similar Tigerton system has to keep 8 L2 caches in sync. That is why the cache system is perfect for server performance, but a little less interesting for gaming performance.

The massive bandwidth that the integrated tri-channel memory controller delivers will do wonders for HPC code. And the new TLB architecture with EPT will make Nehalem shine compared to its older Core brothers.

No, Nehalem was made to please the IT and HPC people. Bring it to it.anandtech.com, it is not that interesting for you gamers ;-)

Imbroglio
08-19-08, 10:55 AM
if i were a betting man i would bet against Anand here.

slaWter
08-19-08, 11:04 AM
It'll help in sims like FSX or iRacing. Games like that aren't GPU limited.

walterman
08-19-08, 11:09 AM
When will they release the 8-core Nehalem?

brady
08-19-08, 01:40 PM
Truthfully, with physics moving to the GPU there just isn't a need for "game" optimized CPUs right now.

Ninja Prime
08-19-08, 01:55 PM
When will they release the 8-core Nehalem?

Never, for gaming. The 8 cores are server-only products.

Intel17
08-19-08, 02:56 PM
I'll take what Anand says with a grain of salt...

wolfgar
08-19-08, 05:48 PM
Never, for gaming. The 8 cores are server-only products.

I will agree/disagree with a position of: "Not for a while, probably 2 years or more before gaming engines can effectively use more than 2 cores"

The GHz wars are over. We hit the wall. Now it's cores. Intel expects 4, 8, 16, ... and more cores in the future.
Last year Intel told developers to start planning multi-core support. Few listened. Even MS is way behind.

I did some multi-threaded programming years ago in OS/2. It's tough stuff to do right, even with only 2 threads. More threads = even harder.
It's also not something you can just patch into an existing engine, like a better CPU instruction set. A few areas can usually be identified and split off into small separate threads, but that's only for small pieces, not the heart.

It sounds easy, until you try. Thread locks, wait conditions, race conditions, concurrency issues (worse with the virtual memory mapping in XP/Vista)
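To make the race condition point concrete, here is a minimal Python sketch (not from any game engine). Several threads bump a shared counter; the `total += 1` step is a read-modify-write, so it is wrapped in a lock to keep two threads from interleaving in the middle of it. `count_with_lock` is a name made up for this example.

```python
import threading


def count_with_lock(num_threads, increments):
    """Increment a shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may read, add and write back.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(count_with_lock(4, 1000))
```

The lock makes the result predictable, but every acquire is also a point where threads wait on each other, which is exactly the overhead being complained about here.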

Microsoft hasn't really endorsed it previously, but now that the development tools are there in the latest .NET 3.5 (and 4.0 at year end) and Visual Studio 2008 SP1, it is a lot easier. But then, what game engine will want to require the .NET overhead?

New game engines will need to be developed from a low level to use multi-core effectively.

The real problem?

To maximize return on investment, every game manufacturer is thinking multi-platform now. Even Carmack uttered the blasphemy that 'the console is the primary future target system because of the piracy problem and ease of "known" hardware'.

Writing an application (let alone a game engine) that is both multi-platform and runs on a variable number of threads just gives me a headache even thinking about it.

FYI: if an application is multi-threaded, and there are not enough cores to handle the load, you actually get less performance.
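One common way to sidestep the "more threads than cores" problem is to size the worker pool from the machine at startup instead of hard-coding it. A minimal Python sketch, assuming a hypothetical helper called `pick_worker_count`; it is not taken from any engine mentioned in this thread.

```python
import os


def pick_worker_count(requested, available=None):
    """Clamp the requested number of worker threads to the cores available.

    available defaults to what the OS reports; at least one worker is
    always returned so a single-core machine still runs.
    """
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


# An engine that would like 8 workers, on whatever machine this runs on.
print(pick_worker_count(8))
```

This only caps the thread count. It does not solve the harder part, which is having work that can actually be split that many ways.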

SO, we end back at: lowest common denominator as a requirement. 1 thread... <sigh>

Until multi-core becomes the norm.

Monolyth
08-19-08, 06:07 PM
I can't track w/Anand on this. Clock for clock, Nehalem is already a higher-performing part than Penryn; whether it is the architecture or IMC/Quickpath technology at work, who knows. We will see an increase in performance regardless. The real question is: what happens after IMC & Quickpath?

what he said

I hear you on the multithread programming. I've actually done quite a bit of it in .NET 2.0. Worker threads are amazingly handy (when used in the right spots) but can increase the complexity of your code. You really have to analyze the gains of a particular instance because no matter what you will lose some cycles to additional checks that must be performed because you decided to make a call asynch. These checks are what degrade performance in an already taxed system.

walterman
08-19-08, 06:40 PM
Never, for gaming. The 8 cores are server-only products.

Sad :(

About the Anandtech comments on the cache size & latency issues: well, it will depend on the cache miss rate of the application. Usually games have a huge miss rate, but I guess that Nehalem will need fewer clock cycles for some instructions, and it will feature more execution units that will help compute-intensive applications (gaming included). So, overall, it will have a good balance. Also, HT will help to increase the performance of each core because, when a thread is blocked waiting on a memory request, the core can switch threads and keep executing instructions from a thread that is ready.

rhink
08-19-08, 06:41 PM
not to mention there's a good bit of overhead in the OS for heavyweight threads.

Though Anand may be right: the CPU probably is more optimized around servers than games. Servers are a bigger market than gamers, and it's the area AMD is most competitive in (because they aren't focusing much on gamers, either!). Fact is, most recent CPUs are already fast enough for most games. It may end up being faster for at least some games, but if that's what they were focusing on, it'd be faster still.

rage10
08-20-08, 01:24 AM
FYI: if an application is multi-threaded, and there are not enough cores to handle the load, you actually get less performance.

SO, we end back at: lowest common denominator as a requirement. 1 thread... <sigh>

Until multi-core becomes the norm.
The only currently manufactured console with one core is the Wii; the PS3 has 8 or 9 and the Xbox 360 has 3. I'd say it's the norm.

Viral
08-20-08, 04:25 AM
I can't track w/Anand on this. Clock for clock, Nehalem is already a higher-performing part than Penryn; whether it is the architecture or IMC/Quickpath technology at work, who knows. We will see an increase in performance regardless. The real question is: what happens after IMC & Quickpath?

Only in heavily multithreaded apps,

http://images.anandtech.com/graphs/nehalempreview_060508030043/17023.png

Nehalem is very similar to Barcelona. The IMC is a big step up from Penryn, but in many cases, the cache setup is a step down.

slaWter
08-20-08, 06:14 AM
Here is an official IDF game test, Lost Planet, Yorkfield vs Nehalem both at 3.2GHz with a 9800GX2: Link (German) (http://www.computerbase.de/news/hardware/prozessoren/intel/2008/august/idf_benchmarks_intels_nehalem/)

LP Test 1:
QX9770 - 56,1 FPS
Nehalem - 79,5 FPS

LP Test 2:
QX9770 - 92,2 FPS
Nehalem - 126,3 FPS

Not bad :D
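For anyone who wants the percentages, a quick Python sketch that takes the FPS figures exactly as posted above (with German decimal commas) and works out the relative gain. Only the numbers come from the post; the helper names are made up for this example.

```python
# FPS figures as posted above, kept as strings with decimal commas.
RESULTS = {
    "LP Test 1": {"QX9770": "56,1", "Nehalem": "79,5"},
    "LP Test 2": {"QX9770": "92,2", "Nehalem": "126,3"},
}


def parse_fps(text):
    """Convert a decimal-comma number such as '56,1' to a float."""
    return float(text.replace(",", "."))


def percent_gain(baseline, new):
    """Relative improvement of new over baseline, in percent."""
    return (new - baseline) / baseline * 100.0


for test, fps in RESULTS.items():
    gain = percent_gain(parse_fps(fps["QX9770"]), parse_fps(fps["Nehalem"]))
    print(f"{test}: Nehalem is {gain:.1f}% faster")
```

That works out to roughly 42% and 37% at the same 3.2GHz clock.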

Heinz68
08-20-08, 06:17 AM
In my case the only choice is to get Nehalem. At this stage, upgrading to an LGA 775 socket is too late, at least for me.
Nehalem is definitely a better CPU than Penryn, so it should show some improvement in gaming compared to Penryn, even in games that are not yet multi-core optimized.
Intel claims new chip trebles read/write speed
One of the most significant changes was already known. Intel now plans to build a part called an integrated memory controller - which moves information between the microprocessor and the computer's memory - directly into the processor itself.

That's a key change because processors are asked to do more and more, and any lag in communication can seriously hurt performance. AMD has already been incorporating integrated memory controllers into its processors.

Because of that and other tweaks, Intel said its new design, which is code-named Nehalem, will triple the speed at which data can be written to memory or read back, compared to previous generations. Intel says Nehalem also will have nearly double the 3-D animation capabilities as past chips, and better utilize the multiple "cores," or processing engines, on each chip.
SOURCE (http://www.nzherald.co.nz/category/story.cfm?c_id=55&objectid=10528002)
Nehalem and a Penryn side by side demo of Lost Planet: Colonies. I guess the game must be multi-core optimized.
Intel did show a demo of Lost Planet: Colonies running side-by-side on both a Nehalem and a Penryn processor, each clocked at 3.2GHz. In terms of performance in the demo showed, the Nehalem-based system was around 50-to-80 percent faster, but the question is whether or not this is a final clock speed for Intel’s Core i7 ‘Extreme Edition’ processor. That's unclear at the moment – I guess only time will tell, but we’ll be keeping our ear to the ground ahead of any official announcement from Intel.
SOURCE (http://www.bit-tech.net/news/2008/08/19/nehalem-derivatives-detailed/1)

Nutty
08-20-08, 09:33 AM
I'd take any demo staged by Intel with a pinch of salt. Who knows what memory they had in there, or what the driver settings were set to.

Of course they're gonna set it up to make Nehalem look good.

nekrosoft13
08-20-08, 09:47 AM
Here is an official IDF game test, Lost Planet, Yorkfield vs Nehalem both at 3.2GHz with a 9800GX2: Link (German) (http://www.computerbase.de/news/hardware/prozessoren/intel/2008/august/idf_benchmarks_intels_nehalem/)

LP Test 1:
QX9770 - 56,1 FPS
Nehalem - 79,5 FPS

LP Test 2:
QX9770 - 92,2 FPS
Nehalem - 126,3 FPS

Not bad :D

that is impressive

Ninja Prime
08-20-08, 11:06 PM
Here is an official IDF game test, Lost Planet, Yorkfield vs Nehalem both at 3.2GHz with a 9800GX2: Link (German) (http://www.computerbase.de/news/hardware/prozessoren/intel/2008/august/idf_benchmarks_intels_nehalem/)

LP Test 1:
QX9770 - 56,1 FPS
Nehalem - 79,5 FPS

LP Test 2:
QX9770 - 92,2 FPS
Nehalem - 126,3 FPS

Not bad :D


Ehh, that has nothing to do with single-thread performance though. It's one of the 3-4 games that can take advantage of 4 cores.

Intel17
08-21-08, 03:13 AM
Games that cannot utilize multithreaded goodness probably don't need Nehalem's power anyway.

walterman
08-21-08, 09:57 AM
I bet that with the proper coding, the chip will shine :)

josiahsuarez
08-21-08, 07:40 PM
in any case Intel hasn't totally forgotten about gamers. remember there's still Larrabee :)

walterman
08-22-08, 10:28 AM
For users interested in the architecture changes, I recommend this read:
http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3382

There are a couple of changes that i like:

1) Inclusive L3 cache -> Helps to reduce cache snoop traffic when you have a lot of cores. This means higher multicore efficiency.

2) Unaligned cache accesses without a reduction in performance -> Memory accesses that are not aligned to a multiple of 16 bytes (movups) used to take a performance hit. Now they perform at full speed (like with movaps). This especially helps multimedia apps, where you cannot guarantee that your data will always be aligned.
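What "mod 16 aligned" means in practice, as a small Python sketch on plain integer addresses. The 16 byte boundary comes from the post (it is the size of an SSE register); the 64 byte cache line size is an assumption added for this example, and both helper names are made up.

```python
def is_aligned(address, boundary=16):
    """True if the address is a multiple of the boundary (movaps-safe at 16)."""
    return address % boundary == 0


def crosses_cache_line(address, size=16, line=64):
    """True if an access of `size` bytes starting at `address` spans two lines.

    line=64 is an ASSUMED cache line size for illustration.
    """
    first_line = address // line
    last_line = (address + size - 1) // line
    return first_line != last_line


for addr in (0x1000, 0x1004, 0x103C):
    print(hex(addr), "aligned" if is_aligned(addr) else "unaligned",
          "splits a line" if crosses_cache_line(addr) else "stays in one line")
```

The second check is there because an unaligned 16 byte load can land entirely inside one cache line or straddle two, and those are not the same case for the hardware.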

Heinz68
08-22-08, 11:33 AM
The 2.93GHz Nehalem benchmarks by Hexus.net (http://www.hexus.net/content/item.php?item=15015&page=1)

walterman
08-22-08, 04:13 PM
Video encoding results are as expected, but gaming results are odd. We'd better wait for proper tests.

Heinz68
08-22-08, 04:39 PM
Video encoding results are as expected, but gaming results are odd. We'd better wait for proper tests.
Sure, we'd better wait. Here is what Hexus (http://www.hexus.net/content/item.php?item=15015&page=8) said about the gaming test:
One look in Device Manager showed that not all the correct drivers had been installed, which did little to hinder 2D performance, but played a part in sub-optimal 3D results.

A table has been included to highlight the results we observed, but it is abundantly clear that something was awry in the test box.

Firstly, the Quake Wars: ET 1,680x1,050 result is significantly lower than expected, because the test becomes practically GPU-limited at that setting, even on a Radeon HD 4870 512MB card: we see that from the similarity of the results between other CPUs.

Secondly, the 3DMark Vantage default test score is around 1,000 marks too low, again resulting from, we believe, an unoptimised setup. But take a look at the CPU-only score and Nehalem's power rears its head.