SPEs and PPEs in cell

Are they some sort of cores, or are they some sort of co-processors?

Thanx

PPE = General purpose core.
SPE = Glorified SIMD cores.

There's one PPE and 8 SPE's.

Reply to Action_Man

Cell is a 9-core CPU, but the cores are not all the same and have different functions. Its designers believe that asymmetric multicore architecture is the future and that other companies like Intel and AMD will follow their example.
The PPE (Power Processing Element) is one core that is very similar to the PowerMac G4 processor. Its role is to accept tasks and share them out among the other 8 cores, the Synergistic Processing Elements (SPEs).
The PPE shares the multimedia jobs among the SPEs, taking into account how available each one is. It is easy to divide and process multimedia in parallel, but the PPE's task is not that easy because there is a lot of data flowing through it.
The 8 SPE cores are connected by extremely fast links with a bandwidth of 300GB/s per link, have 256kB of cache each, and operate at 4-5GHz, therefore doing 192-256 GFLOP/s. That is...OMG, my overclocked Athlon is doing less than 6 GFLOP/s

Reply to gOJDO

Quote :

The 8 SPE cores are connected by extremely fast links with a bandwidth of 300GB/s per link, have 256kB of cache each, and operate at 4-5GHz, therefore doing 192-256 GFLOP/s. That is...OMG, my overclocked Athlon is doing less than 6 GFLOP/s



Sorry for hijacking this thread, but how would you find out how many GFLOP/s a CPU does? Never heard of this crap before. Satisfy my curiosity please. lol

Reply to Trance183

IEEE Spectrum, January 2006
MICROPROCESSORS By Samuel K. Moore
http://img150.imageshack.us/img150/1163/toshibaibmsony8fr.jpg
"WINNER MULTIMEDIA MONSTER: Cell's nine processors make it a supercomputer on a chip"

GOAL: Make a new microprocessor architecture that beats all others at handling graphics and broadband multimedia.
WHY IT'S A WINNER: It met that goal and is being designed into high-volume mass-market items like game consoles and televisions.
ORGANIZATIONS: IBM, Sony and Toshiba.
CENTER OF ACTIVITY: Austin, Texas.
NUMBER OF PEOPLE ON THE PROJECT: 400 at its peak.
BUDGET: US $400 million.



WE'RE FLYING AT ABOUT MACH 1.5 around Mount Saint Helens, in Washington state. IBM Corp. senior programmer Barry L. Minor is at the controls, rocketing us over the crater and then down to the lake at its base to skim over the tree trunks that have been floating there since the volcano exploded over 25 years ago. The flight is exhilarating, even though it's just a simulation projected on a widescreen monitor in a cluttered testing lab.

Then, at the flick of a switch, Minor turns the simulation over from his new Cell processor to a dual-processor Apple Power Mac G5, and the scenery freezes. The G5 almost audibly groans under the burden, though it's no slouch. In fact, it's currently the top of the line for PCs. But Cell is something different entirely. It's a bet on what consumers will do with data and how best to suit microprocessors to the task, and it's really, really fast. Cell, which is shorthand for Cell Broadband Engine Architecture, is a US $400 million joint effort of IBM, Sony, and Toshiba. It was originally conceived as the microprocessor to power Sony's third-generation game console, PlayStation 3, to be released this spring, but it is expected to find a home in lots of other broadband-connected consumer items and in servers too.
Executives at Sony Corp., in Tokyo, wanted more than just an incremental improvement over PlayStation 2's processor, the Emotion Engine. What they got was a 36-fold acceleration, to a whopping 192 billion floating-point operations per second (192 gigaflops). Because Cell is a combination of general-purpose and multimedia processors, it defies an exact comparison with other upcoming chips, but it's thought to be more powerful than the chips driving competing game systems.
Cell can calculate at such blazing speed, in part, because it's made up of nine processors on a single chip of silicon, optimized for the kind of real-time calculations needed in today's broadband, media-rich environment. A specially designed 300-gigabit-per-second bus knits the processors into a single machine, and interface technology from Rambus Inc., Los Altos, Calif., gives it fast access to memory and other off-chip systems.
So far, microprocessor watchers have been impressed with what they've seen of Cell. "To bring huge parallel processing onto a single chip in a clean and efficient way is a real accomplishment," says Ruby B. Lee, a professor of electrical engineering at Princeton University and an IEEE Fellow.
A graphics-heavy item such as PlayStation 3 isn't just a showcase for an unusual chip. For IBM it's a philosophical statement. "Gaming is the next interface driving computing," says James A. Kahle, Cell's chief architect with the IBM Technology Group, in Austin, Texas [see photo, "Multicellular"]. Just as moving from punch cards to electronic displays changed what people expected of computers, the highly collaborative, real-time realism of today's games will set the standard for what people want from computers in the future.

http://img227.imageshack.us/img227/971/cell2hc.jpg

But even now, the sheer desire for power in the gaming market guarantees that Cell will be made in volumes that more than make up for the loss last year of IBM's highest profile customer, Apple Computer Inc. Market research firm iSuppli Corp., in El Segundo, Calif., predicts that 37 million game consoles will be sold this year alone worldwide. By 2007, when all three game console makers will have released their next-generation products, the market will have grown to 44 million. And though Cell is exclusive to the PlayStation 3, IBM has a lock on the rest of the console market. Its microprocessors will power both of Sony's competitors, Microsoft's Xbox and Nintendo's GameCube.
The Cell powered PlayStation 3 can expect to pick up a little less than half of what could become a market worth up to $9.5 billion in 2007, according to iSuppli senior analyst Chris Crotty. And, of course, there are other high-volume plans for Cell.
Toshiba Corp., in Tokyo, for one, plans to build television sets around it. The company has already shown that a single Cell processor can decode and display 48 compressed video streams at once, potentially allowing a television viewer to choose a channel based on dozens of thumbnail videos displayed simultaneously on the screen. And in a smaller market, Cell has already found its first outside customer in medical and military-systems maker Mercury Computer Systems Inc., in Chelmsford, Mass., which is developing a two-Cell blade server due out by April.
With two such massive consumer electronics makers as Toshiba and Sony behind it, Cell is an obvious attempt to control the "digital living room," as technology executives have dubbed their dream of a home where all the media players are intelligent and networked together. "[Sony's] goal is to make a computer fun...to make it an entertainment platform," says Sony's Cell director Masakazu Suzuoki. "But even if we make the Cell system an entertainment platform, there's nothing if there's no content."
Indeed, experts say Cell's success hinges on whether programmers outside IBM, Sony, and Toshiba will be able to exploit the gigaflops that Cell has to offer. Tony Massimini, chief of technology at the consulting firm Semico Research Corp., in Phoenix, puts it bluntly: "Cell has strong potential, assuming that the game developers satisfy their customers' needs. But if the games suck, who wants to buy it?"
That Cell has more than one processor core on a single chip is more a sign of the times than a revolution. All the microprocessor stalwarts are moving to multicore design. The principal reason is that the old way of doing things (increasing the number of calculations per second by shrinking the processors into a tighter knot of tinier transistors and then dialing up the clock speed) has essentially crashed headlong into the brick wall of heat generation.
Because transistors using today's technology are so small, even when they are supposed to be in the "off" state, infinitesimal currents still leak through them. That leakage warms them constantly, and with the extra heat generated when transistors switch "on" or "off," it produces a microfurnace on a chip. If chip makers had continued on their old path, by the year 2015, microprocessors would be throwing off more watts per square millimeter than the surface of the sun.
As a result, the industry has shifted from maximizing performance to maximizing performance per watt, mainly by putting more than one microprocessor on a single chip and running them all well below their top speed. Because the transistors are switching less frequently, the processors generate less heat. And because there are at least two hot spots on each chip, the heat is spread more evenly over it, so it's less damaging to the circuitry and easier to get rid of with fans and heat sinks.
Multicore processors on the market today are generally symmetrical; that is, they have two copies of essentially the same core on one chip. Cell, on the other hand, has an asymmetric architecture that contains two different kinds of cores [see photo, "Cell City Map"]. One, the Power processing element, is similar to the CPU in a Mac; it runs the Linux operating system and divides up work for the other eight processors to do. Those eight, called Synergistic processing elements, are designed specifically to juggle multimedia applications: video compression and decompression, encryption and decryption of copyrighted content, and, especially, rendering and modifying graphics.



The Synergistic elements were built from the ground up to do what are called single-precision floating-point calculations, the kind of operations needed for dazzling three-dimensional graphics and a host of other multimedia tasks. The design traded flexibility (a Synergistic element is not versatile enough to run the Linux operating system on its own) for eye-popping speed. When pushed to its 5.6-gigahertz limits, a single unit can do 44.8 billion single-precision floating-point calculations per second. Not wanting to cut Cell off from a role in scientific computing, its designers included circuitry in each Synergistic element that can do the more exacting calculations, called double-precision, that scientists demand, but its performance is only about one-tenth that of the single-precision unit.
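The 44.8 billion figure is consistent with simple arithmetic: peak rate equals clock speed times floating-point operations per cycle, and 44.8 / 5.6 is exactly 8. The Python sketch below works that out. It assumes (the article does not say this) that the 8 operations per cycle come from a four-wide multiply-add, i.e. 4 lanes times 2 operations; the function name is made up for illustration.

```python
def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: every unit issues its maximum every cycle."""
    return units * clock_ghz * flops_per_cycle

# Assumption: 4 SIMD lanes x 2 operations (multiply + add) per cycle.
FLOPS_PER_CYCLE = 4 * 2

one_element = peak_gflops(1, 5.6, FLOPS_PER_CYCLE)
print(f"{one_element:.1f}")  # 44.8
```

The same formula gives 192 for eight elements at 3.0 GHz, which happens to match the headline number quoted earlier, although the article does not state which clock speed or unit count that figure assumes.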
In fact, the Synergistic elements are so fast that a single one could easily consume the entire bandwidth on the interconnects to the off-chip memory, leaving its siblings starved for data and stalled out. IBM and its partners had to design a special chunk of circuitry into Cell just to prevent that problem.
Apart from its raw power, Cell has content-protection tricks that should make it attractive to multimedia applications makers. For instance, the Synergistic element's architecture prevents any application or external device from accessing the element's local memory, so that, for instance, a program cannot steal a music file that is being decrypted by the processor. "Once you bring your code in and decrypt it, it can execute in a virtually trusted environment," says IBM's Cell architect Charles R. Johns. "All the data it calculates on, sends out, and brings in is fully protected."
The isolation function can be used in several ways, says Kahle. "We knew we couldn't anticipate all the different security needs in the future, but we wanted to know we had the right hardware to support a very robust security system."
Barry Minor's Mount Saint Helens simulator is a good example of how Cell's different processors work together. His program takes a satellite photo of the volcano, lines it up with an elevation map, and then turns it into a detailed 3D terrain on the fly. The Mount Saint Helens data has a resolution of 2.4 meters. The city of Austin, where the Cell design center is, once gave Minor access to its 15.4-centimeter-resolution satellite map. "You could land in Michael Dell's backyard and check out his view," Minor says with a grin.
What's happening inside the processor is a finely choreographed dance. The Power processing element starts by figuring out where the joystick is pointing the simulator in the stored 2D maps. Then it divides that scene into 32 portions, four for each Synergistic element. Though perfectly capable of it, the Power processing element does no calculations on the actual data. Instead, it plays to its strength as a controller, figuring out which chunk of work should go to each of the other cores according to how complex the scene is and which cores have more or less time on their hands.
The Synergistic elements then go to work. They pull their portion of the data into their local memories, which they can access at great speed. Then each runs a rendering algorithm on the data and stores it off the chip in the system memory. When the processors are done, they signal the Power element, which instructs one of the synergistic units to run a video compression algorithm. That processor compresses its sister units' finished products and then pushes them out to be displayed on the screen or streamed to a PDA or some other device.
Because the compression takes less time than rendering the graphics, the compressing processor automatically switches gears when it's finished and runs the rendering algorithm on a portion of data until it's needed for compression again. With each frame, the process starts over.
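The division of labor described above, where the controller hands out chunks according to how complex they are and which cores have time on their hands, can be sketched as a greedy scheduler. This is an illustrative Python sketch, not IBM's code: the cost estimates, the name `assign_tiles`, and the largest-first heuristic are all assumptions.

```python
import heapq

def assign_tiles(tile_costs, num_workers=8):
    """Give each tile to whichever worker currently has the least work queued."""
    heap = [(0.0, w) for w in range(num_workers)]  # (queued load, worker id)
    heapq.heapify(heap)
    plan = {w: [] for w in range(num_workers)}
    # Hand out the most expensive tiles first so they spread across workers.
    for tile in sorted(range(len(tile_costs)), key=lambda t: -tile_costs[t]):
        load, worker = heapq.heappop(heap)
        plan[worker].append(tile)
        heapq.heappush(heap, (load + tile_costs[tile], worker))
    return plan

# 32 scene portions with made-up complexity estimates.
costs = [1.0 + (i % 5) for i in range(32)]
for worker, tiles in assign_tiles(costs).items():
    print(worker, tiles, sum(costs[t] for t in tiles))
```

When every portion costs the same, this reduces to the article's "four for each Synergistic element"; when costs differ, busier workers simply receive fewer portions.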
This dance works so well for two reasons. The first has to do with the way Cell handles memory. Rather than waste several clock cycles waiting for the right data to arrive from memory, a Synergistic element works only on data stored in its own 256 kilobytes of memory, to which it has a high-bandwidth connection. More important, Cell's memory-handling engines can be programmed to keep data streaming through the processor. "We can get over 128 memory transactions going in flight at once," boasts Michael N. Day, a distinguished engineer at IBM.
The memory-access engine takes in new data and sends out the old just in time for the synergistic unit to perform the necessary calculations. When Cell runs Minor's volcano simulator, it waits for data to arrive from memory for only 1 percent of the time; the G5, in contrast, stands idle for about 40 percent of the time.
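Why streaming data in while the processor works cuts idle time so sharply can be seen with a toy timing model. The Python sketch below compares fetching and computing one chunk at a time against overlapping the next fetch with the current computation (double buffering). The time units and chunk counts are arbitrary illustrations, not measurements of Cell or the G5, and the function names are invented.

```python
def serial_time(n_chunks, transfer, compute):
    """Fetch a chunk, then compute on it, with no overlap."""
    return n_chunks * (transfer + compute)

def double_buffered_time(n_chunks, transfer, compute):
    """The first chunk must arrive before work starts; after that the next
    transfer runs while the current chunk is being computed."""
    return transfer + (n_chunks - 1) * max(transfer, compute) + compute

def idle_fraction(total, n_chunks, compute):
    """Share of the total time the processor spends waiting for data."""
    return (total - n_chunks * compute) / total

n, transfer, compute = 100, 1, 4  # arbitrary units
for label, total in [("serial", serial_time(n, transfer, compute)),
                     ("double-buffered", double_buffered_time(n, transfer, compute))]:
    print(label, total, f"{idle_fraction(total, n, compute):.1%}")
```

In this model the overlap only hides the transfer completely when computing a chunk takes at least as long as fetching the next one; otherwise the processor still waits on memory.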
Cell's other key to speed has to do with breaking problems into parts that can be done in parallel. In Minor's simulation, it probably seems obvious that an image can be divided up into eight strips and these worked on independently. What wasn't so obvious was that the 3-D rendering could be done four pieces of data at a time within each synergistic processor. Such four-way parallel computing is called single instruction multiple data, or SIMD, and it is particularly well suited to the manipulation of graphics and other multimedia.
In these problems, you typically want to perform the same operation on each of the elements in a large chunk of data. For example, to increase the brightness of an image, you'd want to add the same number to every pixel in it. Since around the mid-1990s, general-purpose processors such as the Intel x86 architectures have been doing SIMD computing using a set of multimedia-specific instructions, explains Princeton's Lee, a multimedia instructions pioneer.
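The brightness example can be sketched in Python by simulating a four-lane SIMD unit: one "instruction" applies the same add to four pixels at once. This is only a model of the idea, since plain Python runs the lanes one after another; the lane width of four comes from the article, while the clamp at 255 and the function names are assumptions added for illustration.

```python
LANES = 4  # four values per instruction, as described for the Synergistic elements

def simd_add(vec, scalar):
    """Stand-in for one SIMD instruction: the same add applied to every lane.
    Results are clamped to 255 (an assumption: 8-bit pixel values)."""
    return [min(255, v + scalar) for v in vec]

def brighten(pixels, amount):
    out = []
    instructions = 0
    for i in range(0, len(pixels), LANES):
        out.extend(simd_add(pixels[i:i + LANES], amount))
        instructions += 1
    return out, instructions

pixels = [10, 20, 30, 40, 250, 60, 70, 80]
result, count = brighten(pixels, 10)
print(result)  # [20, 30, 40, 50, 255, 70, 80, 90]
print(count)   # 2
```

Eight pixels take two simulated instructions instead of eight scalar adds, which is where the four-way speedup comes from.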
But SIMD instructions run far faster on Cell's Synergistic processors, because the Cell processors were designed from the start to handle them. And don't forget: there are eight such processors on each chip. Cell programmers spend most of their time turning complex algorithms into efficient SIMD algorithms, says Minor. "Once you've done that, you're 80 percent done."
The chip's commercial success will depend on whether programmers can learn to exploit its full potential. To that end, the developers have from the beginning put a high priority on crafting the appropriate software tools.
One of the key deadlines the Cell development team had to meet was having its software ready and tested in time for the arrival of the first chips, in spring 2004. The software team was running programs on a Cell simulator two full years before it got the first chip—and when the chip finally arrived, both the operating system and the applications worked on the first try. "Had we waited to do software development until the chip came back, it would have been a disaster," says Theodore R. Maeurer, software manager at IBM.
With such a head start on the software, the group could focus on how to familiarize new programmers with Cell. "A programmer has to do a really nice job of laying out the data transfers and so forth," says Day. But soon that job will be turned over to the compiler and the programming tools. IBM software engineers are also developing tools that will make it easier for programmers to divide tasks between the Power element and the Synergistic cores, and they're making others to automatically find solutions to problems that fit well with the Synergistic units' SIMD strengths. The company has already released more than 700 pages of documents to applications developers and will begin releasing tools and compilers, as well.
Cell's asymmetric architecture signals the beginning of a big shift in how computers are programmed, says Craig Steffen, a senior research scientist at the National Center for Supercomputing Applications, Urbana-Champaign, Ill., who gained some fame lashing together 70 PlayStation 2 consoles to form a $50,000 supercomputer.
"How do you program with eight engines running full speed without them constantly stopping and waiting for data?" Steffen asks. Cell will force mainstream programmers to wrestle with that question. But ultimately, parallel programming will become fairly routine, he predicts. "Over the next several years, we won't think of an asymmetric processor as anything different."
Indeed, some think Cell is an indication of what's to come in other microprocessors. "In the future, we'll see convergence of general-purpose multiprocessors and game- and media-oriented processors," says Princeton's Lee. "Media processors will become more general purpose, and general purpose, more multimedia." And with any luck, that will make your living room a more entertaining place.

Reply to gOJDO

Quote :

PPE = General purpose core.
SPE = Glorified SIMD cores.

There's one PPE and 8 SPE's.



You forgot to mention it's difficult to code for and it additionally sucks; max peak performance is only attainable on "hand picked code".

Reply to spud

There should be appropriate compilers soon; maybe one is available already. So programmers won't have to worry about the difficulty of coding, and that's what the PPE is for: to share the jobs between the SPEs, where most of the jobs are multimedia and easy to split for parallel processing.

Reply to gOJDO

We're talking peak theoretical performance; real numbers won't be nearly as high.

Quote :

Microsoft's Xbox 360 & Sony's PlayStation 3 - Examples of Poor CPU Performance

In our last article we had a fairly open-ended discussion about many of the challenges facing both of the recently announced next-generation game consoles. We discussed misconceptions about the Cell processor and its ability to accelerate physics calculations, as well as touched on the GPUs of both platforms. In the end, both the Xbox 360 and the PlayStation 3 are much closer competitors than you would think based on first impressions.

The Xbox 360’s Xenon CPU features more general purpose cores than the PlayStation 3 (3 vs. 1), however game developers will most likely only be using one of those cores for the majority of their calculations, leveling the playing field considerably.

The Cell processor derives much of its power from its array of 7 SPEs (Synergistic Processing Elements), however as we discovered in our last article, their purpose is far more specialized than we had thought. Speaking with Epic Games’ head developer, Tim Sweeney, he provided a much more balanced view of what sorts of tasks could take advantage of the Cell’s SPE array.

The GPUs of the next-generation platforms also proved to be quite interesting. In Part I we speculated as to the true nature of NVIDIA’s RSX in the PS3, concluding that it’s quite likely little more than a higher clocked G70 GPU. We will expand on that discussion a bit more in this article. We also looked at Xenos, the Xbox 360’s GPU and characterized it as equivalent to a very flexible 24-pipe R420. Despite the inclusion of the 10MB of embedded DRAM, Xenos and RSX ended up being quite similar in our expectations for performance; and that pretty much summarized all of our findings - the two consoles, although implementing very different architectures, ended up being so very similar.

So we’ve concluded that the two platforms will probably end up performing very similarly, but there was one very important element excluded from the first article: a comparison to present-day PC architectures. The reason a comparison to PC architectures is important is because it provides an evaluation point to gauge the expected performance of these next-generation consoles. We’ve heard countless times that these new consoles would offer better gaming performance than anything we’ve had on the PC, or anything we would have for a matter of years. Now it’s time to actually put those claims to the test, and that’s exactly what we did.

Speaking under conditions of anonymity with real world game developers who have had first hand experience writing code for both the Xbox 360 and PlayStation 3 hardware (and dev kits where applicable), we asked them for nothing more than their brutal honesty. What did they think of these new consoles? Are they really outfitted with the PC-eclipsing performance we’ve been led to believe they have? The answer is actually quite frequently found in history; as with anything, you get what you pay for.

Learning from Generation X

The original Xbox console marked a very important step in the evolution of gaming consoles - it was the first console that was little more than a Windows PC.
It featured a 733MHz Pentium III processor with a 128KB L2 cache, paired up with a modified version of NVIDIA's nForce chipset (modified to support Intel's Pentium III bus instead of the Athlon XP it was designed for). The nForce chipset featured an integrated GPU, codenamed the NV2A, offering performance very similar to that of a GeForce3. The system had a 5X PC DVD drive and an 8GB IDE hard drive, and all of the controllers interfaced to the console using USB cables with a proprietary connector.

For the most part, game developers were quite pleased with the original Xbox. It offered them a much more powerful CPU, GPU and overall platform than anything had before. But as time went on, there were definitely limitations that developers ran into with the first Xbox.

One of the biggest limitations ended up being the meager 64MB of memory that the system shipped with. Developers had asked for 128MB and the motherboard even had positions silk screened for an additional 64MB, but in an attempt to control costs the final console only shipped with 64MB of memory.
The next problem is that the NV2A GPU ended up not having the fill rate and memory bandwidth necessary to drive high resolutions, which kept the Xbox from being used as a HD console.

Although Intel outfitted the original Xbox with a Pentium III/Celeron hybrid in order to improve performance yet maintain its low cost, at 733MHz that quickly became a performance bottleneck for more complex games after the console's introduction.

The combination of GPU and CPU limitations made 30 fps a frame rate target for many games, while simpler titles were able to run at 60 fps. Split screen play on Halo would even stutter below 30 fps depending on what was happening on screen, and that was just a first-generation title. More experience with the Xbox brought creative solutions to the limitations of the console, but clearly most game developers had a wish list of things they would have liked to have seen in the Xbox successor. Similar complaints were levied against the PlayStation 2, but in some cases they were more extreme (e.g. its 4MB frame buffer).

Given that consoles are generally evolutionary, taking lessons learned in previous generations and delivering what the game developers want in order to create the next-generation of titles, it isn't a surprise to see that a number of these problems are fixed in the Xbox 360 and PlayStation 3.

One of the most important changes with the new consoles is that system memory has been bumped from 64MB on the original Xbox to a whopping 512MB on both the Xbox 360 and the PlayStation 3. For the Xbox, that's a factor of 8 increase, and over 12x the total memory present on the PlayStation 2.

The other important improvement with the next-generation of consoles is that the GPUs have been improved tremendously. With 6 - 12 month product cycles, it's no surprise that in the past 4 years GPUs have become much more powerful. By far the biggest upgrade these new consoles will offer, from a graphics standpoint, is the ability to support HD resolutions.

There are obviously other, less-performance oriented improvements such as wireless controllers and more ubiquitous multi-channel sound support. And with Sony's PlayStation 3, disc capacity goes up thanks to their embracing the Blu-ray standard.
But then we come to the issue of the CPUs in these next-generation consoles, and the level of improvement they offer. Both the Xbox 360 and the PlayStation 3 offer multi-core CPUs to supposedly usher in a new era of improved game physics and reality. Unfortunately, as we have found out, the desire to bring multi-core CPUs to these consoles was made a reality at the expense of performance in a very big way.

-------------------------------------------------------------

Problems with the Architecture

At the heart of both the Xenon and Cell processors is IBM’s custom PowerPC based core. We’ve discussed this core in our previous articles, but it is best characterized as being quite simple. The core itself is a very narrow 2-issue in-order execution core, featuring a 64KB L1 cache (32K instruction/32K data) and either a 1MB or 512KB L2 cache (for Xenon or Cell, respectively). Supporting SMT, the core can execute two threads simultaneously similar to a Hyper Threading enabled Pentium 4. The Xenon CPU is made up of three of these cores, while Cell features just one.

Each individual core is extremely small, making the 3-core Xenon CPU in the Xbox 360 smaller than a single core 90nm Pentium 4. While we don’t have exact die sizes, we’ve heard that the number is around 1/2 the size of the 90nm Prescott die.

IBM’s pitch to Microsoft was based on the peak theoretical floating point performance-per-dollar that the Xenon CPU would offer, and given Microsoft’s focus on cost savings with the Xbox 360, they took the bait.

While Microsoft and Sony have been childishly playing this flops-war, comparing the 1 TFLOPs processing power of the Xenon CPU to the 2 TFLOPs processing power of the Cell, the real-world performance war has already been lost.

Right now, from what we’ve heard, the real-world performance of the Xenon CPU is about twice that of the 733MHz processor in the first Xbox. Considering that this CPU is supposed to power the Xbox 360 for the next 4 - 5 years, it’s nothing short of disappointing. To put it in perspective, floating point multiplies are apparently 1/3 as fast on Xenon as on a Pentium 4.

The reason for the poor performance? The very narrow 2-issue in-order core also happens to be very deeply pipelined, apparently with a branch predictor that’s not the best in the business. In the end, you get what you pay for, and with such a small core, it’s no surprise that performance isn’t anywhere near the Athlon 64 or Pentium 4 class.

The Cell processor doesn’t get off the hook just because it only uses a single one of these horribly slow cores; the SPE array ends up being fairly useless in the majority of situations, making it little more than a waste of die space.

We mentioned before that collision detection is able to be accelerated on the SPEs of Cell, despite being fairly branch heavy. The lack of a branch predictor in the SPEs apparently isn’t that big of a deal, since most collision detection branches are basically random and can’t be predicted even with the best branch predictor. So not having a branch predictor doesn’t hurt, what does hurt however is the very small amount of local memory available to each SPE. In order to access main memory, the SPE places a DMA request on the bus (or the PPE can initiate the DMA request) and waits for it to be fulfilled. From those that have had experience with the PS3 development kits, this access takes far too long to be used in many real world scenarios. It is the small amount of local memory that each SPE has access to that limits the SPEs from being able to work on more than a handful of tasks. While physics acceleration is an important one, there are many more tasks that can’t be accelerated by the SPEs because of the memory limitation.
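The local-memory constraint can be made concrete with a little arithmetic. The Python sketch below estimates how many transfers it takes to stream a data set through one SPE. The 256 KB size is the per-element figure quoted from the IEEE Spectrum piece earlier in this thread; the code size, the two-buffer layout, and the function name are hypothetical, and real DMA details such as alignment or per-transfer size limits are ignored.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-element local memory quoted earlier in the thread

def plan_transfers(total_bytes, code_bytes, buffers=2):
    """Return (chunk_size, number_of_transfers) for streaming a data set.
    The program itself also lives in local memory, so it is subtracted first."""
    usable = LOCAL_STORE_BYTES - code_bytes
    if usable <= 0:
        raise ValueError("program does not fit in local memory")
    chunk = usable // buffers
    transfers = -(-total_bytes // chunk)  # ceiling division
    return chunk, transfers

# Hypothetical workload: 64 MiB of data, 64 KiB of code, two buffers.
chunk, transfers = plan_transfers(64 * 1024 * 1024, 64 * 1024)
print(chunk, transfers)  # 98304 683
```

Any task whose working set cannot be cut into pieces of roughly this size has to keep going back to main memory, which is the situation the developers quoted here complain about.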

The other point that has been made is that even if you can offload some of the physics calculations to the SPE array, the Cell’s PPE ends up being a pretty big bottleneck thanks to its overall lackluster performance. It’s akin to having an extremely fast GPU but without a fast CPU to pair it up with.

-------------------------------------------------

What About Multithreading?

We of course asked the obvious question: would game developers rather have 3 slow general purpose cores, or one of those cores paired with an array of specialized SPEs? The response was unanimous, everyone we have spoken to would rather take the general purpose core approach.

Citing everything from ease of programming to the limitations of the SPEs we mentioned previously, the Xbox 360 appears to be the more developer-friendly of the two platforms according to the cross-platform developers we've spoken to. Despite being more developer-friendly, the Xenon CPU is still not what developers wanted.

The most ironic bit of it all is that according to developers, if either manufacturer had decided to use an Athlon 64 or a Pentium D in their next-gen console, they would be significantly ahead of the competition in terms of CPU performance.

While the developers we've spoken to agree that heavily multithreaded game engines are the future, that future won't really take form for another 3 - 5 years. Even Microsoft admitted to us that all developers are focusing on having, at most, one or two threads of execution for the game engine itself - not the four or six threads that the Xbox 360 was designed for.

Even when games become more aggressive with their multithreading, targeting 2 - 4 threads, most of the work will still be done in a single thread. It won't be until the next step in multithreaded architectures where that single thread gets broken down even further, and by that time we'll be talking about Xbox 720 and PlayStation 4. In the end, the more multithreaded nature of these new console CPUs doesn't help paint much of a brighter performance picture - multithreaded or not, game developers are not pleased with the performance of these CPUs.

What about all those Flops?

The one statement that we heard over and over again was that Microsoft was sold on the peak theoretical performance of the Xenon CPU. Ever since the announcement of the Xbox 360 and PS3 hardware, people have been set on comparing Microsoft's figure of 1 trillion floating point operations per second to Sony's figure of 2 trillion floating point operations per second (TFLOPs). Any AnandTech reader should know for a fact that these numbers are meaningless, but just in case you need some reasoning for why, let's look at the facts.

First and foremost, a floating point operation can be anything; it can be adding two floating point numbers together, or it can be performing a dot product on two floating point numbers, it can even be just calculating the complement of a fp number. Anything that is executed on a FPU is fair game to be called a floating point operation.

Secondly, both floating point power numbers refer to the whole system, CPU and GPU. Obviously a GPU's floating point processing power doesn't mean anything if you're trying to run general purpose code on it and vice versa. As we've seen from the graphics market, characterizing GPU performance in terms of generic floating point operations per second is far from the full performance story.

Third, when a manufacturer is talking about peak floating point performance there are a few things that they aren't taking into account. Being able to process billions of operations per second depends on actually being able to have that many floating point operations to work on. That means that you have to have enough bandwidth to keep the FPUs fed, no mispredicted branches, no cache misses and the right structure of code to make sure that all of the FPUs can be fed at all times so they can execute at their peak rates. We already know that's not the case as game developers have already told us that the Xenon CPU isn't even in the same realm of performance as the Pentium 4 or Athlon 64. Not to mention that the requirements for hitting peak theoretical performance are always ridiculous; caches are only so big and thus there will come a time where a request to main memory is needed, and you can expect that request to be fulfilled in a few hundred clock cycles, where no floating point operations will be happening at all.
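
To get a feel for how much a trip to main memory hurts, here is a deliberately crude model. It assumes the FPUs run at peak between misses and sit completely idle during each miss, with no overlap at all. The 100 GFLOPs peak, 3.2 GHz clock, and miss interval are hypothetical inputs, not measured figures for Xenon or Cell.

```python
def sustained_gflops(peak_gflops, clock_ghz, flops_between_misses, miss_penalty_cycles):
    """Sustained rate if the FPUs idle for `miss_penalty_cycles` after every
    `flops_between_misses` floating point operations (no overlap assumed)."""
    flops_per_cycle = peak_gflops / clock_ghz
    busy_cycles = flops_between_misses / flops_per_cycle
    total_cycles = busy_cycles + miss_penalty_cycles
    return flops_between_misses / total_cycles * clock_ghz


# Hypothetical chip: 100 GFLOPs peak at 3.2 GHz, one miss per 10,000 operations.
for penalty in (0, 100, 500):
    rate = sustained_gflops(100.0, 3.2, 10_000, penalty)
    print(f"{penalty:3d}-cycle miss penalty -> {rate:.1f} GFLOPs")
```

Real hardware hides some of this latency with prefetching and multiple threads, so treat this as a worst-case illustration of the argument rather than a prediction.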

So while there may be some extreme cases where the Xenon CPU can hit its peak performance, it sure isn't happening in any real world code.

The Cell processor is no different; given that its PPE is identical to one of the PowerPC cores in Xenon, it must derive its floating point performance superiority from its array of SPEs. So what's the issue with the 218 GFLOPs number (2 TFLOPs for the whole system)? Well, from what we've heard, game developers are finding that they can't use the SPEs for a lot of tasks. So in the end, it doesn't matter what the peak theoretical performance of Cell's SPE array is if those SPEs aren't being used all the time.

Another way to look at this comparison of flops is to look at integer add latencies on the Pentium 4 vs. the Athlon 64. The Pentium 4 has two double pumped ALUs, each capable of performing two add operations per clock, that's a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium 4 can perform 15.2 billion operations per second. The Athlon 64 has three ALUs each capable of executing an add every clock; so a 2.8GHz Athlon 64 can perform 8.4 billion operations per second. By this silly console marketing logic, the Pentium 4 would be almost twice as fast as the Athlon 64, and a multi-core Pentium 4 would be faster than a multi-core Athlon 64. Any AnandTech reader should know that's hardly the case. No code is composed entirely of add instructions, and even if it were, eventually the Pentium 4 and Athlon 64 will have to go out to main memory for data, and when they do, the Athlon 64 has a much lower latency access to memory than the P4. In the end, despite what these horribly concocted numbers may lead you to believe, they say absolutely nothing about performance. The exact same situation exists with the CPUs of the next-generation consoles; don't fall for it.
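
The arithmetic in that comparison is simple enough to reproduce. The ALU counts and clock speeds are the ones given in the article; the function name is just something made up for this sketch.

```python
def peak_adds_per_second(alus, adds_per_alu_per_clock, clock_ghz):
    """Peak integer adds per second, in billions (ALUs x adds per clock x GHz)."""
    return alus * adds_per_alu_per_clock * clock_ghz


pentium4 = peak_adds_per_second(alus=2, adds_per_alu_per_clock=2, clock_ghz=3.8)
athlon64 = peak_adds_per_second(alus=3, adds_per_alu_per_clock=1, clock_ghz=2.8)

print(f"Pentium 4 3.8GHz: {pentium4:.1f} billion adds/s")
print(f"Athlon 64 2.8GHz: {athlon64:.1f} billion adds/s")
print(f"'Marketing' ratio: {pentium4 / athlon64:.2f}x")
```

The ratio comes out at roughly 1.8x in favour of the Pentium 4, which is exactly the kind of number that looks impressive and says nothing about real code.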

------------------------------------------------------

Why did Sony/MS do it?

For Sony, it doesn't take much to see that the Cell processor is eerily similar to the Emotion Engine in the PlayStation 2, at least conceptually. Sony clearly has an idea of what direction they would like to go in, and it doesn't happen to be one that's aligned with much of the rest of the industry. Sony's past successes have really come, not because of the hardware, but because of the developers and their PSX/PS2 exclusive titles. A single hot title can ship millions of consoles, and by our count, Sony has had many more of those than Microsoft had with the first Xbox.

Sony shipped around 4 times as many PlayStation 2 consoles as Microsoft did Xboxes. Regardless of the hardware platform, a game developer won't turn down working with the PS2 - the install base is just that attractive. So for Sony, the Cell processor may be strange and even undesirable for game developers, but the developers will come regardless.

The real surprise was Microsoft; with the first Xbox, Microsoft listened very closely to the wants and desires of game developers. This time around, despite what has been said publicly, the Xbox 360's CPU architecture wasn't what game developers had asked for.

They wanted a multi-core CPU, but not such a significant step back in single threaded performance. When AMD and Intel moved to multi-core designs, they did so at the expense of a few hundred MHz in clock speed, not by taking a step back in architecture.

We suspect that a big part of Microsoft's decision to go with the Xenon core was because of its extremely small size. A smaller die means lower system costs, and if Microsoft indeed launches the Xbox 360 at $299 the Xenon CPU will be a big reason why that was made possible.

Another contributing factor may be the fact that Microsoft wanted to own the IP of the silicon that went into the Xbox 360. We seriously doubt that either AMD or Intel would be willing to grant them the right to make Pentium 4 or Athlon 64 CPUs, so it may have been that IBM was the only partner willing to work with Microsoft's terms and only with this one specific core.

Regardless of the reasoning, not a single developer we've spoken to thinks that it was the right decision.

---------------------------------------------------------

-Anandtech.com

Reply to Heyyou27

A compiler can only do so much.

Reply to Action_Man
- 0 +

It'd be good for BOINC/Folding =D

Reply to borandi
- 0 +

Well, it looks like Cell will not be as good as I expected.
I trust the IEEE, but Anand explains it well: the theory is one thing, but the reality is different.

Reply to gOJDO

Yeah, unfortunately it looks like Cell is having many problems on many levels.

Aside from being difficult to program, it appears that due to the complexity and dispersed layout of functional units on the die surface, actual production yields have been dismal.

Rumor has it that Sony can't even get a single unit to show at E3 this year.

Cell should work well for many workloads, but I remember the initial press blitz when IBM announced the existence of this chip. As cool as it would be, it certainly will never live up to that initial hype.

Reply to iterations
- 0 +

Not to mention 1 of the SPEs will be disabled for yield's sake. The OS will ALWAYS have full control of at least 1 SPE, so that's 2 down right there. The OS also has the right to take away another SPE as it sees fit for media, communication, and other features as well. Sony seems bent on creating a machine which is hard to port from, just like the PS2. To be honest, the 360 impresses me more than anything right now because MS seems to honestly want to give us a great gaming platform like Nintendo and Sega strive to do.

Reply to K8MAN
- 0 +

Maybe the Octopiler will help things a bit.

http://news.zdnet.com/2100-9593_22-6042132.html

Reply to mjp1618

Pffft, there's nothing new about it and like so many devs have said, they have no faith in it.

Reply to Action_Man
- 0 +

Quote :

Pffft, there's nothing new about it and like so many devs have said, they have no faith in it.


Cell will be great for physics, but that gimmick will get old soon. I heard a quote from a developer saying that you could always tell which room had PS3 development going on because you could hear the programmers yelling !@#$ and !@#!@$ and Mother!@#!@#$!@#$ sony!!! :lol:

Reply to K8MAN
- 0 +

Quote :

... it's difficult to code for and it additionally sucks, max peak performance is attainable on "hand picked code".



Same problem the Itanium has.

Luckily in games it's fairly common to write vectorized code (i.e. you can do a 3D dot product in one cycle or a matrix multiplication in 4-6 cycles). The Cell couldn't be any harder to program for than the PS2's VU units...
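
For anyone who hasn't written this kind of code, here is a rough sketch of what "vectorized" means here. It emulates 4-lane SIMD registers with plain Python lists, so the helper names (`simd_mul`, `simd_madd`, `splat`) are made up for illustration and are not real VU or SPE intrinsics. Actual cycle counts depend on the hardware.

```python
def simd_mul(a, b):
    """Lane-wise multiply of two 4-wide vectors (one vector instruction)."""
    return [x * y for x, y in zip(a, b)]


def simd_madd(a, b, acc):
    """Lane-wise multiply-add: a * b + acc (one vector instruction)."""
    return [x * y + z for x, y, z in zip(a, b, acc)]


def splat(scalar):
    """Broadcast one scalar into all four lanes."""
    return [scalar] * 4


def dot3(a, b):
    """3D dot product: one lane-wise multiply, then a sum of the first three lanes."""
    prod = simd_mul(a, b)
    return prod[0] + prod[1] + prod[2]


def mat4_mul_vec4(columns, v):
    """4x4 matrix (stored as four columns) times a vector, in four vector operations."""
    acc = simd_mul(columns[0], splat(v[0]))
    for i in range(1, 4):
        acc = simd_madd(columns[i], splat(v[i]), acc)
    return acc


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(dot3([1, 2, 3, 0], [4, 5, 6, 0]))
print(mat4_mul_vec4(identity, [7, 8, 9, 1]))
```

The matrix-vector product takes one multiply plus three multiply-adds, which is where figures like "4-6 cycles" come from when each of those maps to a single vector instruction.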

Past vectorizing compilers have failed to optimize code properly (see Itanium compilers and products like VectorC) and probably won't until compiler research makes bigger breakthroughs.

Games, rendering, and high-end scientific apps are all about these "vector" computations. Still, the Cell is cheap and it does double-precision operations fast (single precision floats aren't accurate enough) - it's going to make it in the non-games sphere.

Reply to voxel
- 0 +

Quote :

We're talking peak theoretical performance; real numbers won't be nearly as high.

- Anandtech.com Article snipped -



A pretty accurate article. My beef with Xbox 1 development was the dodgy GPU (terrible fillrate and little system bandwidth) and a crappy Intel CPU that had varying performance characteristics. The CPU didn't seem to care if memory was aligned or not (performance for either was the same) and in the end I think it was FASTER to use SSE1/2 to do floating point operations than the standard FP ops. That's why Xbox games ran at 30fps, not the usual 60fps as expected. While the PS2 was incredibly difficult to program for, it was more predictable.

It seems console makers love to continue to cut corners. Neither CPU sounds easy to program for due to the complexity of writing threaded 3D engines (which are essentially sequential in design - read input, simulate, render).

Reply to voxel
- 0 +

Good for physics? How can a completely in-order processor be good for physics? I don't care how many cores there are; physics is one of the most sporadic workloads in games. You cannot turn it off when the player turns around, and you cannot control the direction, angle, or speed at which the player chooses to use it, and in-order is simply not suited to that kind of processing environment. The Xbox 360 is ahead of the PS3. I couldn't care less what CGI trailer Sony puts out; they have one real in-game demo, and that is I-8, or Resistance: Fall of Man, which fails to reach HL2 graphics... it falls way short of the bar. Since the Xbox 360 has the R600 architecture, one may assume it is ready for DX10, or if not, can be updated with an automatic firmware patch via Xbox Live. DX10 is much faster and more optimized. To date no game on the 360 uses all three cores. There is a really good reason why many Xbox games ran at 30 fps... they were simply much more graphically advanced than the PS2. I'd really like to see a Cell platform in a Windows/Linux environment and see the Cell get destroyed by our dual cores of today.

Reply to Mike995
- 0 +

Quote :

There is a really good reason why many Xbox games ran at 30 fps... they were simply much more graphically advanced than the PS2.



That's only partially true and a naive answer. Yes, Xboxes were more graphically advanced - the GPU ate vertices for breakfast, whereas the PS2's VUs could transform less than half of what the Xbox could. In the end, it was all for naught. As soon as you drew a bunch of large, screen-sized textured particles/quads/etc., the GPU would STALL and drop the framerate to 20fps. It didn't matter if you had 1,000 polygons or 500,000 polygons. The weak fillrate is akin to having a 64k L2 cache on a modern Intel CPU: overall it's hard to predict WHEN the bandwidth/fillrate will be maxed out, so the only solution was to run most games at 30fps.
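
A rough way to see why polygon count doesn't matter once you are fillrate-bound: the cost is pixels touched per frame, not triangles. The 1 gigapixel/s budget and the 640x480 resolution are placeholder numbers for the sketch, not the real Xbox GPU figures.

```python
def max_fps(fill_rate_pixels_per_s, width, height, overdraw):
    """Upper bound on framerate when every pixel on screen is drawn `overdraw` times."""
    pixels_per_frame = width * height * overdraw
    return fill_rate_pixels_per_s / pixels_per_frame


FILL_RATE = 1e9  # hypothetical fill budget, in pixels per second

for layers in (3, 20, 60):
    print(f"{layers:2d} full-screen layers -> at most {max_fps(FILL_RATE, 640, 480, layers):.0f} fps")
```

Notice that the triangle count appears nowhere in the formula. A handful of screen-sized particle quads stacked on top of each other can eat the whole budget.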

BTW Physics is a sequential in-order operation as is input reading and rendering. Modern game engines are struggling to use multiple threads and multiple cores and multiple CPUs because of the sequential nature of data flow:

http://www.gamasutra.com/features/ [...] b_01.shtml

I've thought about this problem for a while and the solutions I've come up with aren't pretty or easy to implement.

P.S I doubt DX10 will make Xbox 360 any faster. Most API wrappers on consoles are very THIN. What made the Xbox (the original) go faster was better optimized compilers. Custom written vectorized assembly on a console can spank the fastest desktop PC. Consoles have the bandwidth that PCs don't. However, PC video cards can be more powerful.

Reply to voxel
- 0 +

It could make a graphical improvement: maybe not a higher framerate, but the games would look better while maintaining a relatively high framerate. Computers are and always will be better than consoles, for gaming and everything else, no matter what Sony, MS, or Nintendo says. We are getting ATI's 2nd generation of the R600, which has a rumored 64 half pipes; it's going to be a monster. Yeah, there are a lot of Xbox games where the framerate would drop off randomly, which was very frustrating, and I didn't know why; good to find an answer. The thing about consoles having higher memory bandwidth is that they need it more than PCs do, since they have a much lower capacity of 512 MB for the whole console (PS3 and 360). High-end GPUs have 512 MB of memory for just the GPU, which is now in the range of 50 GB/s, plus most high-end systems have 2 GB or more of RAM. With AM2 coming out, and most Intel platforms already having DDR2 memory, most of the memory in a high-end PC will be DDR2 and get around 10 GB/s or more of memory bandwidth while having 4x-8x the capacity. The new consoles do have a lot of cores, but they are all in-order, creating an extreme bottleneck.

Reply to Mike995
- 0 +

Quote :

no whining code writer will stop it from coming into being.


I will disagree with you. Very simple: hardware is of no use if it is not supported by software. It is like having a Ferrari but no fuel.
Remember what kind of stuff programmers did on the C64, Amiga, and Atari?
What kind of hardware were they using for that?
So, compared to what we have today with today's hardware, I think that software lags far behind hardware.

Reply to gOJDO
- 0 +

There is a difference between nature and man. Man makes mistakes; nature is perfect. In computer terms, the NetBurst architecture, for example, was a mistake. The same may be true of the Cell's asymmetric architecture.

Reply to gOJDO
- 0 +

Quote :

Computers are and always will be better than consoles, for gaming and everything else, no matter what Sony, MS, or Nintendo says.



That's a naive answer.

Consoles have had (and still have) a huge bandwidth advantage. While you think this is because of the limited memory, that's not entirely accurate. PCs have the CPU + GPU power, but have traditionally suffered in high-framerate 2D or racing games that require constant streaming of data.

Also Windows is not such a great real-time OS (IRIX is) and pre-emptive multithreading wreaks havoc (creates input latency, frame drops). So really, the modern PC architecture is fine - but Windows is a so-so operating system for games.

Consoles are also standardized and "easier" to optimize for. Console CPUs typically only accept CACHE ALIGNED data. They halt execution if your data isn't on aligned boundaries. They might be slower per clock than PC CPUs, but the code is always written "correctly."
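
For those who haven't had to deal with alignment: "aligned" just means the address is a multiple of some boundary. A small sketch follows, using 16 bytes as an example boundary. That figure is an assumption for illustration; the real requirement depends on the CPU and the instruction.

```python
def is_aligned(address, boundary):
    """True if `address` is a multiple of `boundary` (boundary must be positive)."""
    return address % boundary == 0


def align_up(address, boundary):
    """Round `address` up to the next multiple of `boundary`."""
    return (address + boundary - 1) // boundary * boundary


BOUNDARY = 16  # hypothetical requirement, e.g. one 128-bit vector

for addr in (0, 13, 16, 100):
    print(addr, is_aligned(addr, BOUNDARY), align_up(addr, BOUNDARY))
```

On hardware that faults on unaligned access, allocators and data layouts have to guarantee this up front, which is the "written correctly" part.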

Reply to voxel
- 0 +

Quote :

Consoles have had (and still have) a huge bandwidth advantage. While you think this is because of the limited memory, that's not entirely accurate. PCs have the CPU + GPU power, but have traditionally suffered in high-framerate 2D or racing games that require constant streaming of data.


Consoles were never better than PCs. They have less bandwidth, and they have lower framerates for 2D and for 3D. See the hardware specifications first, then see the benchmarks.

Quote :

Consoles are also standardized and "easier" to optimize for. Console CPUs typically only accept CACHE ALIGNED data. They halt execution if your data isn't on aligned boundaries. They might be slower per clock than PC CPUs, but the code is always written "correctly."


No, the consoles are not standard, and are much harder to optimize for. PCs are designed for multiple purposes, while consoles are designed for gaming only. There are many times more application developers, application tools, programming languages and compilers, utilities, etc. for the PC. There is no code written "wrong"; all code must be written correctly in order to be compiled and linked. There are inefficient methods and algorithms, depending on who is coding.

Reply to gOJDO
- 0 +

Quote :

Consoles were never better than PCs. They have less bandwidth, they have lower framerate for 2D and for 3D.



And until recently, it was impossible to edit/composite film resolution images in real-time on the PC due to bus bandwidth limitations - whereas SGI was doing this back in 1998 (albeit you had to pay a million bucks for this right). SGI used Cray's insanely fast "crossbar" architecture after(?) they bought that company in the mid-90s. AMD's using a variation of it nowadays.

Most console games have a lower framerate because they are locked to the TV's vsync (60Hz or 50Hz), so the framerate has to be the refresh rate divided by a whole number (60, 30, 20...). You get tearing if you display graphics at a different framerate.
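
The vsync lock is easy to sketch: a frame that misses one refresh has to wait for the next one, so the displayed framerate snaps down to the refresh rate divided by a whole number. The function name and the render times below are made up for the example.

```python
import math


def vsynced_fps(render_ms, refresh_hz=60):
    """Displayed framerate when every frame must wait for the next vertical sync."""
    refresh_ms = 1000.0 / refresh_hz
    # A frame occupies a whole number of refresh intervals, and at least one.
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return refresh_hz / intervals


for render_ms in (10, 17, 34):
    print(f"{render_ms} ms per frame -> {vsynced_fps(render_ms):.0f} fps on a 60Hz TV")
```

A frame that takes 17 ms misses the 16.7 ms deadline by a hair and is shown at 30fps, which is why so many console games simply target 30 rather than bouncing between 60 and 30.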

I recall the PS2 had approx an average 9 GB/s(?) of bandwidth. It was insane compared to the shoddy PCI bandwidth(~110MB/s) of yesteryear. Xbox 1 had HyperTransport to the I/O system and something else to the GPU.

PCIe x16 is about 4 GB/s in each direction + FSB memory 3.2-6.4GB/s - all of which combined is mediocre compared to the SGIs.

From wiki:

Xbox: Theoretical Memory Bandwidth: 6.4 GB/s.
PS2: Memory Bus Bandwidth: 3.2 GB/s + DRAM Bus bandwidth: 47.0GB/s.

Fast low-latency RAM, fast caches, etc. all help.

As for modern consoles, I dug out a few numbers:

http://xbox360.ign.com/articles/617/617951p3.html

The PS3 has 22.4 GB/s of GDDR3 bandwidth and 25.6 GB/s of RDRAM bandwidth for a total system bandwidth of 48 GB/s.

The Xbox 360 has 22.4 GB/s of GDDR3 bandwidth and a 256 GB/s of EDRAM bandwidth for a total of 278.4 GB/s total system bandwidth.


The GameCube had super-fast video RAM and lots of bandwidth. It suffered due to the tiny amount of video RAM tho'.

Quote :

No, the consoles are not standard, and are much harder to optimize for. PCs are designed for multiple purposes, while consoles are designed for gaming only.



One CPU. One GPU. One type of optical drive. It's MUCH easier to optimize and tune for a stable, predictable platform than the mishmash of PC components - most PC games aim for a low common denominator.

Reply to voxel
- 0 +

Quote :

And until recently, it was impossible to edit/composite film resolution images in real-time on the PC due to bus bandwidth limitations - whereas SGI was doing this back in 1998 (albeit you had to pay a million bucks for this right). SGI used Cray's insanely fast "crossbar" architecture after(?) they bought that company in the mid-90s. AMD's using a variation of it nowadays.

Most console games have a lower framerate because they are locked to the TV's vsync (60Hz or 50Hz), so the framerate has to be the refresh rate divided by a whole number (60, 30, 20...). You get tearing if you display graphics at a different framerate.

I recall the PS2 had 9.x?GB/s of bandwidth. It was insane compared to the shoddy PCI bandwidth(~110MB/s) of yesteryear. Xbox 1 had HyperTransport to the I/O system and something else to the GPU.

PCIe x16 is about 4 GB/s in each direction + FSB memory 3.2-6.4GB/s - all of which combined is mediocre compared to the SGIs.

As for modern consoles, I dug out a few numbers:

http://xbox360.ign.com/articles/617/617951p3.html

The PS3 has 22.4 GB/s of GDDR3 bandwidth and 25.6 GB/s of RDRAM bandwidth for a total system bandwidth of 48 GB/s.

The Xbox 360 has 22.4 GB/s of GDDR3 bandwidth and a 256 GB/s of EDRAM bandwidth for a total of 278.4 GB/s total system bandwidth.


The GameCube had super-fast video RAM and lots of bandwidth. It suffered due to the tiny amount of video RAM tho'.

One CPU. One GPU. One type of optical drive. It's MUCH easier to optimize and tune for a stable, predictable platform than the mishmash of PC components - most PC games aim for a low common denominator.


SGI is not a gaming console. It is a PC!
At TV resolution every graphics card can be insane.
Those numbers mean nothing; you are comparing the PS2's CPU and GPU total bandwidth to PCI bus bandwidth. And all the other numbers mentioned are only theoretical and are not a big deal in terms of performance. You can't count GPU to video RAM + CPU to RAM + the interconnects of each CPU as a total bandwidth.
For example, take an 8P Opteron server. Each CPU has 3 HTT links, each providing 8GB/s, + 6.4GB/s of RAM bandwidth. That is 30.4GB/s for each CPU; multiplied by 8, that is 243GB/s. Now let's add the internal bandwidth of two Radeon X1900 XTs in CrossFire, each providing 54GB/s of memory interface bandwidth, + the CrossFire connection. Now let's add the mainboard resources. I can't count it all, but I guess with your math such a PC would have over 2TB/s.
But as I said, those numbers mean nothing. Try real-life performance benchmarks.
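
To make the "adding up" problem concrete, here is a small sketch using only the figures already quoted in this thread. The helper names are made up. The second function shows the alternative view: data moving along a path is limited by the slowest link on it, not by the sum.

```python
def naive_total(*links_gb_s):
    """The 'marketing' total: just add every link's bandwidth together."""
    return sum(links_gb_s)


def path_bottleneck(*links_gb_s):
    """Data crossing several links in series moves no faster than the slowest one."""
    return min(links_gb_s)


ps3 = naive_total(22.4, 25.6)               # GDDR3 + RDRAM
xbox360 = naive_total(22.4, 256.0)          # GDDR3 + eDRAM
opteron_cpu = naive_total(8.0, 8.0, 8.0, 6.4)  # 3 HTT links + RAM
opteron_8p = 8 * opteron_cpu

print(f"PS3 'total': {ps3:.1f} GB/s")
print(f"Xbox 360 'total': {xbox360:.1f} GB/s")
print(f"8P Opteron 'total': {opteron_8p:.1f} GB/s")
print(f"Xbox 360, GDDR3 then eDRAM in series: {path_bottleneck(22.4, 256.0):.1f} GB/s")
```

The sums reproduce the 48 and 278.4 GB/s figures quoted earlier, and the same method "proves" that an Opteron server has about 243 GB/s, which shows how little the totals mean.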

Reply to gOJDO
- 0 +

Quote :

(...)So, compared to what we have today with today's hardware, I think that software lags far behind hardware.



Maybe true, but I have to disagree as regards past references and near-future probabilistic trends: Of the few ground-breaking non-x86 ISA uArchs (Itanium, MIPS, Sparc, Transmeta, POWER, PPC,...), only DEC Alpha vanished (not into oblivion, though: It's still a reference!). Not because of a lack of (formidable) compilers, not because of a lack of comparable performance, not because of software support (there was even x86 emulation, "FX!32", if I recall correctly...); rather, because it couldn't compete - at the time - with the x86 ISA (well, one of the reasons, anyway).
MIPS adapted & became a hardware-wise open source processing platform; Itanium survives and it's not a soon-to-be paper release; it's a [material] fact.
Cell may be hard to programme; well, x86 is not easy, at all! Although inherently hard, it stands out in complexity by comparison with x86 & PPC, for example.
As for its workings, it's also advisable to check the sources: http://www-128.ibm.com/developerwo [...] larnutter/

And, it has already left a "material" footprint (aside from the PS3):
http://www.research.ibm.com/cell/w [...] _cloth.pdf;
http://techon.nikkeibp.co.jp/engli [...] 25/105050/

Certainly it might vanish, leaving room for more realistic (read actual) approaches (I don't pretend to sound like Cell's attorney!); however, most probably, highly complex compilers will have to be developed, for the near-future uArchs; and Cell might well be one of many new computing paths to explore.
This is my opinion & my conviction.


Cheers!

Reply to joset

Quote :

The Xbox 360 has 22.4 GB/s of GDDR3 bandwidth and a 256 GB/s of EDRAM bandwidth for a total of 278.4 GB/s total system bandwidth.



I can't believe you quoted that.

It's 32 GB/s between the parent and daughter die, and 256 GB/s between the logic and eDRAM.

Reply to Action_Man
- 0 +

Quote :

(...)So, compared to what we have today with today's hardware, I think that software lags far behind hardware.



Quote :

I have to disagree as regards past references and near-future probabilistic trends.



Quote :

however, most probably, highly complex compilers will have to be developed, for the near-future uArchs; and Cell might well be one of many new computing paths to explore.



So, do you agree that software is lagging?

Reply to gOJDO

Quote :

Computers are and always will be better than consoles, for gaming and everything else, no matter what Sony, MS, or Nintendo says.



That's a naive answer.

Consoles have had (and still have) a huge bandwidth advantage. While you think this is because of the limited memory, that's not entirely accurate. PCs have the CPU + GPU power, but have traditionally suffered in high-framerate 2D or racing games that require constant streaming of data.

Also Windows is not such a great real-time OS (IRIX is) and pre-emptive multithreading wreaks havoc (creates input latency, frame drops). So really, the modern PC architecture is fine - but Windows is a so-so operating system for games.

Consoles are also standardized and "easier" to optimize for. Console CPUs typically only accept CACHE ALIGNED data. They halt execution if your data isn't on aligned boundaries. They might be slower per clock than PC CPUs, but the code is always written "correctly."


Ahahaha. . . so what do you game in? Solaris?

Reply to jkflipflop98
- 0 +

Quote :

(...)So, compared to what we have today with today's hardware, I think that software lags far behind hardware.



Quote :

I have to disagree as regards past references and near-future probabilistic trends.



Quote :

however, most probably, highly complex compilers will have to be developed, for the near-future uArchs; and Cell might well be one of many new computing paths to explore.



So, do you agree that software is lagging?

Absolutely.
My point is: because it's lagging (and lacking!), and because I tend to believe corporations don't just throw away thousands of millions (in money) and all sorts of resources just for the fun of it, the resulting challenge is big enough for developers to take on the "do the impossible" task.
Moreover, when we're talking about Microsoft, Toshiba, IBM, Sony & the like, I'd venture to say that they know what they're doing... and, market-wise, both the Xbox & PS3 consoles may be a big plus for Microsoft & Sony; all the other participants await success in not-so-specific niches.

I must also state that, not being an expert in any hard/soft areas, I really do appreciate your inputs.


Cheers!

Reply to joset
- 0 +

Quote :

Ahahaha. . . so what do you game in? Solaris?



Xbox/Xbox360 uses a cut-down real-time version of Windows. PS2/PS3 uses a real-time Linux. Maybe somebody should create a NEW OS (when EFI becomes standard) for games...

Reply to voxel
- 0 +

Quote :

Those numbers mean nothing; you are comparing the PS2's CPU and GPU total bandwidth to PCI bus bandwidth. And all the other numbers mentioned are only theoretical and are not a big deal in terms of performance. You can't count GPU to video RAM + CPU to RAM + the interconnects of each CPU as a total bandwidth.



I think I was comparing PS2 CPU+GPU bandwidth (well over 6.4GB/s) to PCI + GPU bandwidth (total bandwidth under 6.4GB/s), circa 2002. The PS2 owned back then because it had to - there was so little memory in the subprocessors (GS, VUs) that the machine was constantly sending gigabytes of data around. Even the lowly Xbox had 6.4GB/s in its unified memory architecture, though in practice I think I only got around 4-5 GB/s.

The PS3 and Xbox360 raise the bar even higher. The bandwidth needed to render 60fps 720p 4xFSAA with a million triangles and textures in 512MB of shared memory is insane. PCs tend to "cache" vertex and texture data on the GPU, reducing the need for the crazy CPU->GPU (PCI) bandwidth that consoles have, but that means preloading data, whereas many console games have opted for a dynamic streaming model (severely limited by the DVD read speed).
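
A back-of-the-envelope sketch of just the framebuffer traffic for that 720p 4xFSAA case. The 8 bytes per sample (4 colour + 4 depth/stencil) and the number of full-screen passes are assumptions for illustration; texture and vertex traffic are not counted at all.

```python
def framebuffer_bytes_per_second(width, height, samples, bytes_per_sample, passes, fps):
    """Bytes written per second if every sample of every pixel is touched `passes` times per frame."""
    return width * height * samples * bytes_per_sample * passes * fps


# 1280x720, 4 samples per pixel, 8 bytes per sample (assumed), 60 frames per second.
one_pass = framebuffer_bytes_per_second(1280, 720, 4, 8, passes=1, fps=60)
eight_passes = framebuffer_bytes_per_second(1280, 720, 4, 8, passes=8, fps=60)

print(f"1 full-screen pass:   {one_pass / 1e9:.2f} GB/s")
print(f"8 full-screen passes: {eight_passes / 1e9:.2f} GB/s")
```

Even this simplified count reaches double-digit GB/s once overdraw and blending stack up, before a single texture has been fetched.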

Quote :

For example, 8P opteron server...



Still mediocre compared to a 5 year old SGI server. Opterons are sweet and the fact they are affordable is great, but consoles use the same architecture as the SGIs did - Unified Memory Architecture (no need for separate video and CPU memory), Crossbar/HyperTransport and fast, fast GPUs + rasterizers.

The whole idea that PCs have always had greater bandwidth is false. I wanted to show that PCs in the past had shoddy bandwidth and that the new PCs (Opterons w/ multiple HyperTransport links, and SLI/CrossFire) have finally caught up to consoles + SGIs.

I hope to see the next-gen of PCs go Unified Memory Architecture - I never liked the idea of separate memory for audio, video, CPU. I also want Opteron-like HyperTransport (I own a dual Opteron rig) bus between all the components: CPU, GPU, Physics processor, audio processor, etc..

Reply to voxel

AGEIA claims that the PhysX card has nearly 2TB/s of internal bandwidth.

Reply to Heyyou27