Multicore chips need to be mini-internets

Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, said that in the future massively multicore chips will need to resemble little Internets.

Peh told the International Symposium on Computer Architecture that each core will need an associated router, and that data will travel between cores in packets of fixed size.

This week Peh’s group unveiled a 36-core chip that features just such a “network-on-chip” to make her point.

This chip fixes the cache coherence problems that have stuffed up previous attempts to design networks-on-chip. Until now, keeping cores’ locally stored copies of globally accessible data up to date has been a problem.

On most chips the cores are connected by a bus, and when two cores need to communicate, they’re granted exclusive access to it.

But that approach won’t work as the core count mounts, because cores will spend all their time waiting for the bus to free up.

Bhavya Daya, an MIT graduate student in electrical engineering and computer science and first author on the new paper, said that in a network-on-chip, each core is connected only to those immediately adjacent to it.

This means that it is possible to reach neighbouring cores really quickly and to have multiple paths to each destination. So if you’re going way across, rather than having one congested path, you could have multiple ones.
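To put a rough number on "multiple paths", here is a minimal sketch that counts the shortest routes between two cores. It assumes the 36 cores sit in a 6x6 grid with links only between adjacent cores; the grid shape and the `minimal_paths` helper are illustrative assumptions, not details from Peh's group.

```python
from math import comb


def minimal_paths(src, dst):
    """Count the shortest hop-by-hop routes between two cores on a 2D grid.

    Cores are (column, row) pairs. A shortest route only ever moves toward
    the destination, so the count is the number of ways to interleave the
    horizontal and vertical hops.
    """
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return comb(dx + dy, dx)


# Assumed layout: 36 cores as a 6x6 grid, corners at (0, 0) and (5, 5).
corner_to_corner = minimal_paths((0, 0), (5, 5))
print(corner_to_corner)  # 252
```

Under that assumption a packet crossing the whole chip has 252 equally short routes to choose from, where a bus offers exactly one.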

But the bus system makes it easier to maintain cache coherence. Every core on a chip has its own cache, a local, high-speed memory bank in which it stores frequently used data. As it performs computations, it updates the data in its cache, and every so often, it undertakes the relatively time-consuming chore of shipping the data back to main memory.

To handle the case where another core needs the data before it’s been shipped back to main memory, chips use a protocol called “snoopy,” because it involves snooping on other cores’ communications.

But in a network-on-chip, data is flying everywhere, and packets will frequently arrive at different cores in different sequences. The implicit ordering that the snoopy protocol relies on breaks down.
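A toy illustration of why arrival order matters, assuming a simplified rule in which the last write request a core sees decides who holds the newest copy. The core numbers and the `apply_in_order` helper are hypothetical.

```python
def apply_in_order(requests):
    """Return which core a listener believes holds the newest copy."""
    owner = None
    for core_id in requests:
        owner = core_id  # simplified rule: the last writer seen wins
    return owner


# The same two write requests, from hypothetical cores 3 and 7,
# reach two listeners in opposite order.
seen_by_core_a = [3, 7]
seen_by_core_b = [7, 3]

view_a = apply_in_order(seen_by_core_a)
view_b = apply_in_order(seen_by_core_b)
print(view_a, view_b)  # 7 3
```

On a bus both listeners would have seen the requests in the same order and agreed. Here they end up with conflicting views of the same data.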

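Daya, Peh, and their colleagues solve this problem by equipping their chips with a second network that shadows the first and carries nothing but declarations that cores have issued requests for data.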

Groups of declarations reach the routers associated with the cores at discrete intervals, each corresponding to the time it takes to pass from one end of the shadow network to the other. Each router can thus tabulate exactly how many requests were issued during which interval, and by which other cores. The requests themselves may still take a while to arrive, but their recipients know that they’ve been issued.
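The bookkeeping described above can be sketched roughly as follows. This is not the group's design: the `OrderingRouter` class, the interval tag carried by each request, and the rule of releasing requests in ascending core number are all assumptions made to keep the example small.

```python
from collections import defaultdict


class OrderingRouter:
    """Toy router that releases requests in the same order everywhere."""

    def __init__(self):
        self.expected = {}  # interval -> {core_id: number of requests declared}
        self.arrived = defaultdict(lambda: defaultdict(list))
        self.next_interval = 0

    def notify(self, interval, counts):
        """Record the declarations delivered for one interval."""
        self.expected[interval] = dict(counts)

    def receive(self, interval, core_id, payload):
        """Buffer a request that arrived over the main network."""
        self.arrived[interval][core_id].append(payload)

    def drain(self):
        """Release buffered requests, stopping at the first incomplete interval."""
        released = []
        while self.next_interval in self.expected:
            want = self.expected[self.next_interval]
            got = self.arrived[self.next_interval]
            if any(len(got[core]) < count for core, count in want.items()):
                break  # a declared request is still in flight
            for core_id in sorted(want):  # assumed tie-break: lowest core first
                released.extend(got[core_id][: want[core_id]])
            self.next_interval += 1
        return released


a, b = OrderingRouter(), OrderingRouter()
for router in (a, b):
    router.notify(0, {3: 1, 7: 1})  # both learn: cores 3 and 7 sent one request each

a.receive(0, 3, "write from 3")
a.receive(0, 7, "write from 7")
b.receive(0, 7, "write from 7")   # router b sees the packets in the other order
early = b.drain()                 # nothing released: core 3's request is still missing
b.receive(0, 3, "write from 3")

order_a = a.drain()
order_b = b.drain()
print(early, order_a == order_b)  # [] True
```

Because router b already knows that core 3 issued a request, it holds back core 7's request until the missing one turns up, and both routers end up processing the pair in the same sequence.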

After testing the prototype chips to ensure that they’re operational, Daya intends to load them with a version of the Linux operating system, modified to run on 36 cores, and evaluate the performance of real applications, to determine the accuracy of the group’s theoretical projections.

At that point, she plans to release the blueprints for the chip, written in the hardware description language Verilog, as open-source code.