Intel looks for thousand core chips

Intel apparently can’t get enough cores.

Intel scientists have been contemplating their navels to work out how they can get past a thousand cores.

Chipzilla boffin Timothy Mattson has released a paper which says that the architecture of Intel's 48-core Single Chip Cloud Computer (SCC) processor is “arbitrarily scalable.”

Apparently the problem is that once you hit about 1,000 cores, the diameter of the on-chip mesh (the longest hop distance between any two cores) grows so much that it stuffs up performance.
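To put numbers on that, here is a minimal Python sketch of mesh diameter, assuming a plain 2D mesh with no wraparound links and two cores per tile as on the SCC. The helper names and the 1,024-core layout are illustrative assumptions, not figures from Mattson's paper.

```python
import math


def mesh_diameter(cols: int, rows: int) -> int:
    """Worst-case hop count between two routers in a cols x rows 2D mesh."""
    # The farthest pair sits in opposite corners: (cols - 1) hops across plus (rows - 1) down.
    return (cols - 1) + (rows - 1)


def most_square_mesh(tiles: int) -> tuple[int, int]:
    """Pick the most nearly square (rows, cols) layout holding exactly `tiles` tiles."""
    rows = math.isqrt(tiles)
    # Walk down from the square root until we hit a divisor of the tile count.
    while tiles % rows != 0:
        rows -= 1
    return rows, tiles // rows


# The SCC: 24 tiles (48 cores, two per tile) laid out as a 6 x 4 mesh.
print(mesh_diameter(6, 4))  # 8

# A hypothetical 1,024-core part with two cores per tile needs 512 tiles.
rows, cols = most_square_mesh(1024 // 2)
print(rows, cols, mesh_diameter(cols, rows))  # 16 32 46
```

In this toy layout the worst-case trip across the chip grows from 8 hops to 46, and every extra hop adds latency to any message crossing the chip.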

However, he is still convinced that future progress in microprocessors will depend on packing ever more cores onto a chip.

Current multicore chip architectures rely on cache coherency, a set of protocols that make sure every core has the same view of the system’s memory.

But as more cores are added, this all breaks down because “the protocol overhead per core grows with the number of cores, leading to a ‘coherency wall’ beyond which the overhead exceeds the value of adding cores.”
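The “coherency wall” can be illustrated with a deliberately crude model. This Python sketch assumes (our assumption, not the paper's) that each core loses a fraction c × n of its time to protocol traffic, so per-core overhead grows linearly with the core count n. Total useful work is then T(n) = n(1 − cn), which peaks at n = 1/(2c); past that point each added core costs more than it contributes.

```python
def net_throughput(cores: int, overhead_per_peer: float) -> float:
    """Useful work for `cores` cores under a toy coherency model.

    Assumption: each core spends overhead_per_peer * cores of its time on
    protocol traffic. Illustrative only, not Intel's measurements.
    """
    useful_fraction = max(0.0, 1.0 - overhead_per_peer * cores)
    return cores * useful_fraction


def coherency_wall(overhead_per_peer: float, max_cores: int = 100_000) -> int:
    """Core count beyond which adding another core lowers total useful work."""
    return max(range(1, max_cores + 1),
               key=lambda n: net_throughput(n, overhead_per_peer))


print(coherency_wall(0.0005))  # 1000
print(coherency_wall(0.001))   # 500
```

The overhead constants are made up; the point is only that any per-core overhead that rises with the core count eventually produces a peak, after which more cores mean less work done.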

Mattson thinks that it would be better to kill off cache coherency and allow cores to pass messages among one another.

His team has been developing message-passing techniques for the chip that would scale as more cores are added.
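Message passing means cores share no coherent memory and move data only by explicit sends and receives. In this hedged Python sketch, threads stand in for cores and a `queue.Queue` per core stands in for its inbound mailbox; the ring-passing pattern and all names are illustrative, not Intel's code.

```python
import queue
import threading

NUM_CORES = 8  # hypothetical; the SCC has 48

# One inbound mailbox per "core"; all data between cores moves as messages.
mailboxes = [queue.Queue() for _ in range(NUM_CORES)]
results = {}  # only used by the main thread to collect the final answer


def core(core_id: int) -> None:
    right = (core_id + 1) % NUM_CORES
    if core_id == 0:
        # Core 0 starts a running total and sends it around the ring.
        mailboxes[right].put(core_id)
        # When the total comes back, every core has added its ID exactly once.
        results["total"] = mailboxes[core_id].get()
    else:
        partial = mailboxes[core_id].get()       # blocking receive from the left neighbour
        mailboxes[right].put(partial + core_id)  # add our ID and forward to the right


threads = [threading.Thread(target=core, args=(i,)) for i in range(NUM_CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["total"])  # 28 == 0 + 1 + ... + 7
```

No core ever reads another core's data directly, so there is nothing for a coherency protocol to keep in sync.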

So far they have come up with an experimental chip, which has not yet made it onto Intel’s product roadmap.

Apparently it was first fabricated with a 45 nanometre process at Intel facilities about a year ago.

It is a six-by-four array of tiles, each containing two cores, for 48 cores in all. It has more than 1.3 billion transistors and consumes between 25 and 125 watts.
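The tile layout can be sketched in Python as a mapping from core ID to mesh position. This assumes tiles are numbered row by row, that the two cores on a tile get consecutive IDs, and that packets travel along X first and then Y (dimension-ordered routing); these conventions are illustrative assumptions, not details given in the article.

```python
MESH_COLS, MESH_ROWS, CORES_PER_TILE = 6, 4, 2
TOTAL_CORES = MESH_COLS * MESH_ROWS * CORES_PER_TILE  # 48


def core_location(core_id: int) -> tuple[int, int, int]:
    """Map a core ID to (x, y, slot), assuming row-major tile numbering."""
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"no such core: {core_id}")
    tile, slot = divmod(core_id, CORES_PER_TILE)
    y, x = divmod(tile, MESH_COLS)
    return x, y, slot


def mesh_hops(a: int, b: int) -> int:
    """Tile-to-tile hops between two cores under X-then-Y routing."""
    ax, ay, _ = core_location(a)
    bx, by, _ = core_location(b)
    # Dimension-ordered routing makes the path length the Manhattan distance.
    return abs(ax - bx) + abs(ay - by)


print(TOTAL_CORES)       # 48
print(core_location(47)) # (5, 3, 1)
print(mesh_hops(0, 1))   # 0, both cores share a tile
print(mesh_hops(0, 47))  # 8, opposite corners of the mesh
```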

To keep it simple, the team dusted off an existing 1994-era Pentium processor design for the cores themselves. The cores use the standard x86 instruction set, and Mattson admits that their performance is pants.

But each core has a “mesh interface component” that packages data into packets and connects to an on-chip router. Each tile also has a “message-passing buffer” of 16 kilobytes of random access memory.
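Here is a rough Python picture of what packaging data means: a large message is staged through the buffer one buffer-full at a time, and each stage is cut into packets. The 32-byte packet payload is an assumption, and letting one sender use the whole 16 KB buffer is a simplification; neither detail comes from the article.

```python
MPB_BYTES = 16 * 1024  # per-tile message-passing buffer size quoted in the article
PACKET_BYTES = 32      # assumption: payload bytes carried by one packet


def chunks(message: bytes, size: int) -> list[bytes]:
    """Split a message into consecutive pieces of at most `size` bytes."""
    return [message[i:i + size] for i in range(0, len(message), size)]


def packetize(message: bytes) -> list[list[bytes]]:
    """Stage a message through the buffer; each stage becomes a list of packets."""
    return [chunks(stage, PACKET_BYTES) for stage in chunks(message, MPB_BYTES)]


msg = bytes(40_000)  # too big to fit in one buffer
stages = packetize(msg)
print(len(stages))               # 3 stages: 16384 + 16384 + 7232 bytes
print([len(s) for s in stages])  # [512, 512, 226] packets
print(b"".join(p for s in stages for p in s) == msg)  # True
```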

By running the TCP/IP protocol over the chip’s data link layer, the team was able to run a separate Linux-based operating system on each core and network them together.

Mattson said a 48-node Linux cluster on the chip was possible but boring.

The team also developed a small library for message passing among the cores, called RCCE.

Tests showed that message passing among the cores could be just as speedy using RCCE as with the TCP/IP-based Linux cluster.

So far, it has all been promising: the team says it has shown that the SCC processor and its native message-passing API provide an effective software development platform.

“The expected difficulties due to the lack of asynchronous message passing have so far not materialised,” the paper says.
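Without asynchronous message passing, a send has to wait for its receiver. This Python sketch shows that synchronous behaviour with one shared buffer slot and two flags, where threads stand in for cores. It does not reproduce RCCE's actual API; the `OneSlotChannel` class and its methods are hypothetical.

```python
import threading


class OneSlotChannel:
    """Hypothetical synchronous channel through a single shared buffer slot.

    send() does not return until the receiver has copied the data out,
    so there is no asynchronous send. One sender and one receiver only.
    """

    def __init__(self) -> None:
        self._buffer = None
        self._full = threading.Event()   # flag: data is waiting in the buffer
        self._taken = threading.Event()  # flag: receiver has copied the data

    def send(self, data: bytes) -> None:
        self._buffer = data
        self._taken.clear()  # clear before signalling, so the ack can't be missed
        self._full.set()     # tell the receiver data is ready
        self._taken.wait()   # block until the receiver acknowledges

    def recv(self) -> bytes:
        self._full.wait()
        self._full.clear()
        data = self._buffer
        self._buffer = None
        self._taken.set()    # release the blocked sender
        return data


ch = OneSlotChannel()
received = []


def receiver() -> None:
    for _ in range(3):
        received.append(ch.recv())


t = threading.Thread(target=receiver)
t.start()
for message in (b"a", b"b", b"c"):
    ch.send(message)  # returns only after the receiver has the message
t.join()
print(received)  # [b'a', b'b', b'c']
```

Because sender and receiver move in lockstep, the programmer has to order sends and receives carefully to avoid two cores waiting on each other.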