Intel spills the beans on Skylake’s graphics

Intel has been releasing a few details about what is under the bonnet of its Skylake integrated graphics system.

A white paper on Intel’s website details the compute architecture of Intel’s “Gen9” graphics. It’s an update to the Gen8 white paper, which can be found in all its glorious whiteness here.

As with previous Core processors, Skylake uses a System-on-Chip (SoC) architecture.

“Intel 6th generation Core processors are complex SoCs integrating multiple CPU cores, Intel processor graphics, and potentially other fixed functions all on a single shared silicon die,” the white paper says.

“The architecture implements multiple unique clock domains, which have been partitioned as a per-CPU core clock domain, a processor graphics clock domain, and a ring interconnect clock domain. The SoC architecture is designed to be extensible for a range of products, and yet still enable efficient wire routing between components within the SoC.”

The ring interconnect is an on-die bus between CPU cores, caches, and graphics. It is bi-directional and 32 bytes wide, with separate lines for different tasks. All off-chip system memory transactions to and from the CPU cores and the graphics portion are facilitated by the ring interconnect.

What is new in Skylake is that coherent shared virtual memory (SVM) write performance is significantly improved via new LLC cache management policies. L3 cache capacity has been increased to 768 Kbytes per slice, of which 512 Kbytes is available for application data.

The sizes of both L3 and LLC request queues have been increased and EDRAM now acts as a memory-side cache between LLC and DRAM. The EDRAM memory controller has moved into the system agent, adjacent to the display controller, to support power efficient and low latency display refresh.

Texture samplers now natively support an NV12 YUV format for improved surface sharing between compute APIs and media fixed function units.

Gen9 adds native support for the 32-bit float atomic operations min, max, and compare/exchange. The performance of all 32-bit atomics is also improved for kernels that issue multiple atomics back to back.

The chip’s 16-bit floating point capability is improved with native support for denormals and gradual underflow.
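To make "denormals and gradual underflow" concrete, here is a minimal sketch in Python using the standard library's half-precision packing. It is only a software model of IEEE 754 16-bit floats, not Intel's hardware or any Intel API. The flush-to-zero function is a hypothetical stand-in for hardware that lacks denormal support.

```python
import struct

SMALLEST_NORMAL = 2.0 ** -14     # smallest positive normal fp16 value
SMALLEST_SUBNORMAL = 2.0 ** -24  # smallest positive denormal fp16 value


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_half_flush_to_zero(x: float) -> float:
    """Hypothetical model without denormal support: tiny values become 0."""
    h = to_half(x)
    return 0.0 if abs(h) < SMALLEST_NORMAL else h


tiny = 2.0 ** -20  # below the normal range, but above the smallest denormal
gradual = to_half(tiny)                # kept as a denormal
flushed = to_half_flush_to_zero(tiny)  # lost entirely

print(f"with gradual underflow: {gradual!r}")
print(f"with flush-to-zero:     {flushed!r}")
```

With gradual underflow the value survives at reduced precision, whereas the flush-to-zero model collapses it to zero.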

Gen9 adds new power gating and clock domains for more efficient dynamic power management. This can particularly improve low power media playback modes.

Skylake’s graphics execution unit (EU) is similar to the Gen8 design. Each Gen9 EU has seven threads to work with, each of which features 128 general purpose registers. Each of those registers can store 32 bytes accessible as a SIMD 8-element vector of 32-bit data elements.
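Those figures imply a per-EU register file size that is easy to work out. The short Python sketch below just does the arithmetic from the numbers quoted above. The variable names are made up for illustration and do not come from the white paper.

```python
# Figures quoted in the article for a single Gen9 EU.
THREADS_PER_EU = 7
REGISTERS_PER_THREAD = 128
BYTES_PER_REGISTER = 32
BITS_PER_ELEMENT = 32

# How many 32-bit elements fit in one 32-byte register.
simd_lanes = BYTES_PER_REGISTER * 8 // BITS_PER_ELEMENT

# Register file size per thread and per EU, in bytes.
bytes_per_thread = REGISTERS_PER_THREAD * BYTES_PER_REGISTER
bytes_per_eu = THREADS_PER_EU * bytes_per_thread

print(f"SIMD lanes per register: {simd_lanes}")
print(f"Register file per thread: {bytes_per_thread // 1024} Kbytes")
print(f"Register file per EU: {bytes_per_eu // 1024} Kbytes")
```

That works out to 4 Kbytes of general purpose registers per thread and 28 Kbytes per EU.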

It will be Intel’s fastest integrated HD graphics solution to date and has a chance of competing with lower-end graphics cards.