With the new Skylake lineup, Intel continues its Tick-Tock release strategy, but this particular release is rather more important than most recent ones. Some of you may wonder why that is, since the Broadwell family already brought us the 14nm FinFET manufacturing process, so what does Skylake really have to offer? To be honest, you can’t expect big performance bumps, but Intel sure as hell did a fine job implementing numerous new features, aimed mainly at power efficiency and microarchitecture optimization, that weren’t present in Broadwell processors.
Furthermore, Intel boasts a huge Skylake lineup ranging from 4.5W TDP chips to 91W desktop processors, which, to be honest, is a big deal. It’s nice to know that you can go to your local store and pick up a notebook, convertible or desktop PC that implements the latest features Skylake has to offer. In addition, almost all CPUs integrate new iGPUs with various performance boosts, and some even get GT4e graphics with 128MB of eDRAM, but we will talk about that later. Now, let’s take a deep dive into Skylake’s new architecture.
Before we begin, you can take a look at where the CPUs (Core i7-6500U and Core i7-6700HQ), which are included in the review, are placed in our ranking charts:
The Skylake architecture is built on the same 14nm FinFET process we saw in Broadwell and carries over many of its features, but with significant upgrades to further reduce power consumption and improve performance.
Intel has a lot of interesting things in store for us with Skylake, so let’s begin with the DMI. Broadwell chips used DMI 2.0, which allows data transfers at 5.0 GT/s (2GB/sec), while Skylake chips take advantage of the improved DMI 3.0 protocol with 8.0 GT/s (3.93GB/sec), with one small catch: the CPU has to be within 7 inches of the chipset to maintain the required signal speed. In comparison, DMI 2.0 allowed a maximum distance of 8 inches. Overall, this nearly doubles the bandwidth between the CPU and the chipset.
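To see where the bandwidth figures above come from, here is a minimal Python sketch. It assumes both DMI versions are x4 links and that, like PCIe 2.0 and 3.0, DMI 2.0 uses 8b/10b line encoding while DMI 3.0 uses 128b/130b; the function name `link_bandwidth_gbytes` is our own.

```python
def link_bandwidth_gbytes(gt_per_s: float, lanes: int,
                          payload_bits: int, total_bits: int) -> float:
    """Usable one-direction bandwidth of a serial link in GB/s.

    gt_per_s:   raw transfer rate per lane, in gigatransfers per second
    lanes:      link width
    payload_bits / total_bits: line-code efficiency (8b/10b -> 8/10, 128b/130b -> 128/130)
    """
    usable_bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / total_bits
    return usable_bits_per_s / 8 / 1e9  # bits -> bytes -> gigabytes


# Assumption: both DMI generations are x4 links with PCIe-style line codes.
dmi2 = link_bandwidth_gbytes(5.0, lanes=4, payload_bits=8, total_bits=10)
dmi3 = link_bandwidth_gbytes(8.0, lanes=4, payload_bits=128, total_bits=130)
print(f"DMI 2.0: {dmi2:.2f} GB/s, DMI 3.0: {dmi3:.2f} GB/s")
```

This prints 2.00 GB/s for DMI 2.0 and 3.94 GB/s for DMI 3.0 (3.938 before rounding, matching the 3.93GB/sec quoted above). The switch from 8b/10b to 128b/130b encoding is why a 60% higher transfer rate yields almost twice the usable bandwidth.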
Speaking of upgrades, the new Skylake processors, like the Broadwell chips, offer 16 lanes of PCIe 3.0 directly from the CPU for attached devices. Depending on the motherboard design, the lanes can be split as x8/x8, x8/x4/x4 or a single x16 link.
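As a small illustration, here is how a configuration tool might check a requested lane split against the layouts listed above. The names `SUPPORTED_SPLITS` and `is_valid_split` are hypothetical, and ignoring the order of the slots is a simplification; real boards wire specific slots to specific lanes.

```python
# Layouts named in the text; which ones a given board exposes depends on its design.
SUPPORTED_SPLITS = {(16,), (8, 8), (8, 4, 4)}


def is_valid_split(split: tuple[int, ...]) -> bool:
    """Return True if the requested per-slot lane widths form a supported layout."""
    # Simplification: ignore slot order by comparing widths from widest to narrowest.
    normalized = tuple(sorted(split, reverse=True))
    return normalized in SUPPORTED_SPLITS


for request in [(16,), (8, 8), (4, 8, 4), (4, 4, 4, 4)]:
    print(request, is_valid_split(request))
```

The first three requests are accepted and the x4/x4/x4/x4 request is rejected, since it is not one of the listed layouts.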
Now let’s take a look at the FIVR (Fully Integrated Voltage Regulator), which has been moved off the CPU die. It’s an interesting design decision, but quite understandable. As you can see in the image below, the Haswell and Broadwell chips include this module on the die, but the Skylake architecture moves the FIVR out for one notable reason: temperature. With the Haswell and Broadwell generations, Intel integrated the voltage regulator into the die in order to reduce motherboard costs and power consumption. While this worked for mobile SoCs, the same cannot be said about desktop configurations. The FIVR produces a lot of heat at higher frequencies, which limits the overclocking capabilities of the CPU. Reverting this change for the desktop PC market might slightly raise motherboard prices, but it should prove effective. Nonetheless, the FIVR remains unchanged on mobile motherboards.
As eDRAM is an important aspect for some of you, Intel has made some changes here as well. On the graphic above you can see the typical layout of the eDRAM-based cache on Haswell and Broadwell processors. It is accessed through a store of L4 tags located inside the Last Level Cache (LLC), so the eDRAM acts as a pseudo-L4 cache that relies mostly on the L3. In practice, this does improve performance, but the eDRAM isn’t used as efficiently as it could be.
The image below represents the design of the new eDRAM cache implemented in Skylake CPUs. The Last Level Cache now acts as a buffer, allowing any software that needs DRAM access to get it freely. Graphics workloads in the system agent will still need to take the long way around, drawing more power in the process, but at least the GPU drivers won’t need to check the size of the eDRAM, because it’s now fully visible, and display engine-related tasks can bypass the L3 cache and go straight to the LLC, unlike in the conventional eDRAM scenario. Still, Intel allows some control over how the eDRAM is accessed, since some applications need a larger pool of data than the L3 offers. In that case, the application’s data resides directly in the eDRAM, which prevents it from overwriting the data in the L3 and being re-cached at the next level.
Furthermore, Intel will now release CPUs with either 64MB or 128MB of eDRAM, rather than only the 128MB offered on Haswell and Broadwell CPUs. The 64MB eDRAM cache will be available in GT3e configurations with 48 EUs (Execution Units), while the 128MB variants will be found in GT4e configurations. We can expect Skylake-U CPUs to integrate GT3e configurations and Skylake-H chips to come with GT4e configurations.
Skylake-Y (Core M) with configurable TDP (cTDP) of 3.5-7W
The lineups are the same as last year’s, with small changes to the names. Starting with the Core M family (codenamed Skylake-Y), Intel is keeping things as they are: 4.5W TDP SoCs, low base frequencies and Turbo Boost frequencies as high as 2.0 GHz. This time around, however, Intel is tightening the reins on how Core M is implemented in products, since we saw quite a few problems with OEMs not following specific guidelines to ensure maximum performance. Intel will now issue specific guidelines and design limitations that manufacturers must follow to make sure their devices deliver the expected performance of the CPU. We would also like to note that the Core M SoCs will support only LPDDR3/DDR3L memory and will not support DDR4L. Perhaps there were design limitations or power consumption problems, but as the technology matures, Intel will likely bring DDR4L support to Core M chips in the future.
Skylake-U (Ultra-low voltage CPUs with 15W and 28W TDP)
Skylake-U chips will be roughly the same as last generation’s: 15W and 28W variants in 2+2 and 2+3e (eDRAM) configurations. Moreover, as mentioned before, Skylake-U chips with eDRAM will come in two flavors: 64MB and 128MB. More information about performance can be found later in the review, thanks to the Core i7-6500U engineering sample that we’ve acquired.
Skylake-H (High-performance CPUs with 45W TDP)
As with the previous generation, Skylake-H CPUs are BGA-only (soldered directly to the motherboard), and the chipset is again external to the CPU package. The TDP drops to 45W from the 47W of the Haswell generation, and the chips come in the well-known 4+2 and 4+4e configurations, with the latter available only with 128MB of eDRAM. Luckily, we have one chip representing the Skylake-H lineup (Core i7-6700HQ), again in the form of an engineering sample, which should still give you relevant information about the CPU’s performance.
The same category includes the user-overclockable Core i7-6820HK. This CPU is designed specifically to be tweaked and will surely find its way into some 17-inch gaming beasts.
Skylake Intel Xeon mobile processors (45W)
Before moving on to the desktop realm: for the first time, Intel is launching mobile Xeon CPUs, which are already implemented in some configurations and will hit the market really soon. This means that users who need the extra power, ECC support and all the other benefits Xeon has to offer can take one home in a mobile flavor with a 45W TDP.
Skylake-S (Desktop CPUs with 35-91W TDP)
Finally, Skylake-S covers the desktop CPUs, including the K series (overclock-enabled processors). Contrary to what many users expected, the move to the 14nm manufacturing process didn’t bring hexa-core configurations; Intel is keeping the 2+2 and 4+2 variants. The list of expected Skylake-S processors includes the regular 65W CPUs like the i7-6700 and i5-6600, the overclockable Skylake-K chips, and low-voltage CPUs like the i7-6700T, i5-6600T and i3-6100T. As you may have already picked up, the desktop CPUs will integrate GT2 graphics with different clock speeds.
Unfortunately, we cannot obtain the transistor count for each lineup, since Intel has stopped publishing detailed information of this sort, mainly because it considers such information irrelevant to the end consumer and valuable to its competitors.
Technology and features
The most interesting feature of the Skylake generation is arguably Intel Speed Shift technology. It allows the CPU and the OS to control the processor’s voltage according to its current state, load and frequency, which reminds us of the technology AMD implemented in its new Carrizo chips to control voltage in real time. You can see the standard implementation of P-state control in the image below. P-state changes can reportedly take up to 30 ms when the OS manages them, but when the CPU manages the P-states, this time drops to around 1 ms, so a task gets the power it needs quickly and efficiently, without waste. In the standard implementation, the P-states, from the P0 turbo modes down to the lowest Pn state, are managed by the hardware and the OS overrides control of the cores.
Things get a bit different with the new Speed Shift technology, however. Intel is handing P-state control down to the hardware level, spanning from the lowest clocks (usually 100 MHz) up to the maximum allowed frequency. This allows a much faster response and lets the frequency follow the situation closely. Core M processors, with their wide span of burst clocks, should benefit from this implementation the most.
However, for Speed Shift to work as intended, the OS must be able to retake control of the cores when needed. The whole point is to adjust P-states faster when needed (CPU control), or to deliver the level of performance an application requires (OS control). Unfortunately, for this feature to work seamlessly, the OS has to be Speed Shift-enabled, and currently no OS supports it. Intel is now working closely with Microsoft to implement the feature in Windows 10.
As shown in the graph below, not everything comes down to performance and power states. Sometimes decreased performance can result in increased energy consumption, and vice versa. Everything has to be balanced, but to do so, the system has to gather enough information about the minimum power needed to keep running. Once this information is gathered, the system’s goal is to remain in the lowest possible power state for as long as possible. This leads us to the next notable feature coming with Skylake.
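Speed Shift is Intel’s marketing name for hardware-controlled P-states (HWP). As we understand Intel’s Software Developer’s Manual, the OS keeps its say by writing bounds and hints into the IA32_HWP_REQUEST MSR (0x774), and the hardware picks the actual operating point within them. The sketch below only packs those fields into a 64-bit value; actually writing the MSR requires kernel privileges and is not shown, the performance levels are abstract units reported by the hardware rather than MHz, and the field layout should be checked against the SDM before relying on it.

```python
from dataclasses import dataclass


@dataclass
class HwpRequest:
    """Fields of IA32_HWP_REQUEST (MSR 0x774), per our reading of the Intel SDM."""
    minimum_performance: int            # bits 7:0, lowest performance level the OS allows
    maximum_performance: int            # bits 15:8, highest performance level the OS allows
    desired_performance: int            # bits 23:16, 0 = let the hardware choose
    energy_performance_preference: int  # bits 31:24, 0 = favor performance, 255 = favor energy
    activity_window: int = 0            # bits 41:32, 0 = hardware picks the window
    package_control: bool = False       # bit 42, defer to the package-level request

    def encode(self) -> int:
        """Pack the fields into the 64-bit value the OS would write to the MSR."""
        for name in ("minimum_performance", "maximum_performance",
                     "desired_performance", "energy_performance_preference"):
            value = getattr(self, name)
            if not 0 <= value <= 0xFF:
                raise ValueError(f"{name} must fit in 8 bits, got {value}")
        if not 0 <= self.activity_window <= 0x3FF:
            raise ValueError("activity_window must fit in 10 bits")
        return (self.minimum_performance
                | self.maximum_performance << 8
                | self.desired_performance << 16
                | self.energy_performance_preference << 24
                | self.activity_window << 32
                | int(self.package_control) << 42)


# Hypothetical request: allow levels 1..40, let hardware choose, balanced preference.
request = HwpRequest(minimum_performance=1, maximum_performance=40,
                     desired_performance=0, energy_performance_preference=128)
print(hex(request.encode()))
```

This prints `0x80002801`. The split of responsibilities is the point: the OS sets the bounds and the energy/performance preference once, and the hardware can then move between P-states without waiting for the OS.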
Psys (power of System)
Again, the PCU (Power Control Unit) is needed for this feature to work properly, or at all. The PCU is an important part of the CPU: it keeps track of power requests and the energy consumption of certain silicon areas, and it processes all of this information, making it essential to the CPU’s power gating, duty cycling, and frequency and voltage adjustment.
Now that we’ve cleared that up, let’s take a look at the autonomous algorithms Intel implemented to adjust the power consumption and overall performance of the CPU. Intel tackles this with two different algorithms: one for the high range and one for the low range.
It’s really hard for a CPU to know whether the user needs work done faster or more efficiently, and this is where the autonomous algorithm for the high range comes into play. Usually, getting things done faster takes more raw power, which comes at the expense of higher energy consumption, so the algorithm can be preprogrammed by the OS or the OEM with a trade-off: a given percentage of performance gain in exchange for a given percentage loss in efficiency. Some situations call for burst performance, others for fixed-function operation; the algorithm analyzes this as well, and the OS can adjust it.
All in all, Speed Shift’s goal isn’t overall performance but efficiency. We want to stress this because synthetic benchmarks like Cinebench, Novabench etc. will not record an increase in raw performance, due to their steady-state nature. Nonetheless, in applications like office work, browsing, conference calls and video calls, where the voltage and frequency need to be adjusted repeatedly, the change will be noticeable.
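Intel hasn’t published the algorithm itself, so the following is only an illustration of the kind of trade-off rule described above: step up the frequency ladder only while each step buys enough performance for the efficiency it costs. The `OperatingPoint` values and the `min_perf_gain_per_eff_loss` threshold are hypothetical.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    freq_ghz: float
    perf: float     # relative throughput at this frequency
    power_w: float  # package power at this frequency


def pick_point(points: list[OperatingPoint],
               min_perf_gain_per_eff_loss: float) -> OperatingPoint:
    """Climb the frequency ladder while the performance/efficiency trade-off is acceptable.

    Hypothetical policy: a step up is taken only if the % performance gained is at
    least `min_perf_gain_per_eff_loss` times the % efficiency (perf per watt) lost.
    """
    ladder = sorted(points, key=lambda p: p.freq_ghz)
    chosen = ladder[0]
    for nxt in ladder[1:]:
        perf_gain_pct = (nxt.perf - chosen.perf) / chosen.perf * 100
        eff_now = chosen.perf / chosen.power_w
        eff_next = nxt.perf / nxt.power_w
        eff_loss_pct = (eff_now - eff_next) / eff_now * 100
        if eff_loss_pct <= 0 or perf_gain_pct / eff_loss_pct >= min_perf_gain_per_eff_loss:
            chosen = nxt
        else:
            break  # the next step costs too much efficiency; stop climbing
    return chosen


POINTS = [OperatingPoint(1.0, 1.0, 3.0),
          OperatingPoint(2.0, 2.0, 7.0),
          OperatingPoint(3.0, 3.0, 15.0)]
print(pick_point(POINTS, 2.0).freq_ghz, pick_point(POINTS, 1.0).freq_ghz)
```

With these made-up numbers, a threshold of 2.0 settles on 2.0 GHz (the step to 3.0 GHz buys 50% more performance for 30% worse efficiency), while a more performance-hungry threshold of 1.0 lets it climb to 3.0 GHz.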
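A toy model makes the point about steady-state benchmarks concrete. It assumes the core sits at a low clock, only reaches full clock once the P-state change takes effect, and that performance scales linearly with frequency; the 30 ms and 1 ms latencies come from the text, while the 0.8 GHz and 3.0 GHz clocks are made-up values.

```python
def burst_completion_ms(work_ms_at_max: float, ramp_latency_ms: float,
                        f_low_ghz: float, f_max_ghz: float) -> float:
    """Time to finish work that would take `work_ms_at_max` at full clock.

    Simplified model (assumption): the core runs at f_low until the P-state
    change takes effect after `ramp_latency_ms`, then at f_max.
    """
    work = work_ms_at_max * f_max_ghz           # work measured in GHz * ms
    done_during_ramp = ramp_latency_ms * f_low_ghz
    if done_during_ramp >= work:
        return work / f_low_ghz                 # finished before the clock ever rose
    return ramp_latency_ms + (work - done_during_ramp) / f_max_ghz


for work_ms in (10.0, 10_000.0):
    slow = burst_completion_ms(work_ms, 30.0, 0.8, 3.0)
    fast = burst_completion_ms(work_ms, 1.0, 0.8, 3.0)
    print(f"{work_ms:>8.0f} ms of work: {slow:.1f} ms (30 ms ramp) vs {fast:.1f} ms (1 ms ramp)")
```

In this model, a 10 ms burst takes about 32 ms with the slow ramp but about 10.7 ms with the fast one, while a 10-second steady load differs by only about 0.2%, which is why steady-state benchmarks barely register the change.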
Power control balancing
Power control balancing is essential to overall performance and efficiency. During a prolonged workload, power should go where it’s needed most; otherwise, performance can drop. For example, the CPU cores don’t need extra power when the iGPU is requesting power for a graphics-related workload. OEMs integrating Skylake-YUH CPUs now have access to the following system characteristics: rolling average temperature limit (RATL), SOC/enclosure temperature (PL1), regulator sustained current (PL2) and battery current protection (PL3). The end result is efficient distribution of the required power between the cores, graphics and the rest of the system, while keeping the total power within the PL1 limit.
Duty Cycling is a feature that was introduced in the previous Broadwell generation, but only for iGPUs. Now Intel is enabling the same feature for the CPU cores as well. It’s all about racing to idle and back: the main purpose of this feature is to get the job done as fast as the CPU can and then go back to idle for increased power efficiency.
As simple as it sounds, the implementation of this feature runs into an obstacle. Transistors need a certain minimum voltage to switch. Once the voltage is down at this threshold, the processor is in the “efficient region” seen on the graph below, and as it turns out, there’s no point in decreasing the frequency further to reduce power consumption: since the voltage can’t drop any lower, it’s simply not effective.
Intel tackles the issue in a rather intriguing way, and this is where Duty Cycling shines. The technology allows the iGPU as well as the CPU to rapidly cycle between on and off states, bypassing the threshold problem. A small amount of overhead is spent waking the CPU cores (or GPU) up and putting them back to sleep. Intel claims that the DCC (Duty Cycle Control) feature delivers energy savings close to those of an “only off state”.
While this is a great way to reduce power consumption and reach levels that the idle state alone cannot achieve, there’s another problem that Intel tackled with ease. The OS cannot afford to have the CPU turned off for long periods of time, so Intel has found a balancing point where the processor’s cores can stay off without disrupting the OS. The on/off cycle can repeat as often as every 800 microseconds. The image below illustrates the process, which can be adjusted to suit different needs.
This might be the most effective power saving feature in the new Skylake processors. With DCC, Intel further optimizes the times at which the CPU and iGPU cores are turned on and off, without affecting the OS.
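Intel doesn’t detail how the PCU splits the budget, so this is only a sketch of the constraint described above, using simple proportional scaling; the domain names, wattages and the `allocate_power` helper are hypothetical.

```python
def allocate_power(requests_w: dict[str, float], pl1_w: float) -> dict[str, float]:
    """Split a sustained power budget between domains (e.g. cores, graphics, system).

    Assumption: requests are granted as-is while their sum fits within PL1;
    otherwise every domain is scaled down by the same factor so the total equals PL1.
    """
    total = sum(requests_w.values())
    if total <= pl1_w:
        return dict(requests_w)
    scale = pl1_w / total
    return {domain: watts * scale for domain, watts in requests_w.items()}


# A graphics-heavy moment: the iGPU asks for the largest share.
grants = allocate_power({"cores": 6.0, "graphics": 10.0, "system": 2.0}, pl1_w=15.0)
print({domain: round(watts, 2) for domain, watts in grants.items()})
```

Here the 18W of requests are scaled to fit a 15W PL1 (cores 5.0W, graphics 8.33W, system 1.67W), so the domain that asked for the most still gets the most. A real controller would also weigh priorities and the short-term PL2/PL3 limits, which this sketch ignores.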
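The “efficient region” can be illustrated with a simple power model: dynamic power grows with C·V²·f, leakage adds a fixed cost while the core is on, and the voltage cannot drop below a minimum. All constants below are made-up values chosen for illustration, not Skylake figures.

```python
V_MIN = 0.7          # volts: lowest voltage at which transistors still switch (assumed)
VOLTS_PER_GHZ = 0.35  # voltage needed per GHz above the floor (assumed)
P_LEAK_W = 1.0       # leakage power while the core is on (assumed)
C_EFF = 1.0          # effective switched capacitance, W / (V^2 * GHz) (assumed)


def voltage(f_ghz: float) -> float:
    # Voltage scales with frequency until it hits the V_MIN floor.
    return max(V_MIN, VOLTS_PER_GHZ * f_ghz)


def power_w(f_ghz: float) -> float:
    return P_LEAK_W + C_EFF * voltage(f_ghz) ** 2 * f_ghz


def energy_per_task_j(f_ghz: float, gigacycles: float = 1.0) -> float:
    # Energy = power * time, and time (s) = gigacycles / frequency (GHz).
    return power_w(f_ghz) * gigacycles / f_ghz


freqs = [round(0.5 + 0.25 * i, 2) for i in range(15)]  # 0.5 .. 4.0 GHz
best = min(freqs, key=energy_per_task_j)
print(f"most efficient frequency in this model: {best} GHz")
```

In this model, energy per task bottoms out at 2.0 GHz, exactly where the voltage hits its floor. Below that point, the task simply takes longer while leakage keeps drawing power, so lowering the clock further wastes energy instead of saving it.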
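Here is a minimal sketch of how an average clock below the efficient frequency could be produced by duty cycling within the 800-microsecond period mentioned above. It assumes the core runs at the efficient frequency while on and does no work while off, and it ignores the wake-up overhead; the function name is hypothetical.

```python
PERIOD_US = 800.0  # on/off cycle period mentioned in the text


def duty_cycle_schedule(target_avg_ghz: float, efficient_ghz: float,
                        period_us: float = PERIOD_US) -> tuple[float, float]:
    """Return (on_us, off_us) per period so the average clock equals target_avg_ghz.

    Assumption: while on, the core runs at the efficient frequency; while off,
    it is power-gated and does no work. Wake-up/sleep overhead is ignored.
    """
    if not 0 < target_avg_ghz <= efficient_ghz:
        raise ValueError("duty cycling only applies below the efficient frequency")
    duty = target_avg_ghz / efficient_ghz
    on_us = duty * period_us
    return on_us, period_us - on_us


print(duty_cycle_schedule(0.5, 2.0))
```

For an average of 0.5 GHz with a 2.0 GHz efficient point, the core would be on for 200 µs and off for 600 µs of every 800 µs period, instead of running continuously at a clock where energy per task is worse.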
Integrated GPU – Gen9 graphics
Things in the GPU department are a bit different. Gen9 adds another configuration on top of GT3: GT4e. The new configuration adds another slice of EUs and 128MB of eDRAM, leaving GT3e for Iris and GT4e for Iris Pro iGPUs. You can also look at the image below, which explains which features come with each SKU.
The power saving features explained in the previous section apply to the iGPUs as well. Take your time and look at the images below for more information about the additional power saving methods and other features.
Multi-plane overlays (MPO)
Arguably, this is the most exciting feature announced with the Gen9 graphics. Often, when images or animations on the screen are distorted, skewed or require some other editing, the image is loaded into memory, sent to the GPU, written back to DRAM and finally passed to the display controller. This round trip costs energy, which is why MPO detects up to 3 planes and places each in its own buffer without going through the GPU. However, there are still some things Intel needs to consider: what if one app hides another, but the hidden one still needs some work done and isn’t discarded properly? In full-screen modes that are not OS-related, MPO will recognize the screen as one single plane, and the same DRAM -> GPU -> DRAM -> DWM cycle will repeat.
With all these implementations, Intel claims 17% lower power consumption when watching videos on a QHD screen.
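To make the two data paths explicit, here is a toy decision function. The three-plane limit and the full-screen fallback come from the text; falling back to GPU composition when there are more than three planes is our assumption, and `frame_path` is hypothetical, not a driver API.

```python
MAX_OVERLAY_PLANES = 3  # MPO handles up to three planes, per the text


def frame_path(num_planes: int, fullscreen_non_os: bool = False) -> list[str]:
    """Return the (simplified) stops a frame makes on its way to the screen."""
    if fullscreen_non_os or num_planes > MAX_OVERLAY_PLANES:
        # Fallback: the GPU composes the frame, costing an extra trip through memory.
        return ["DRAM", "GPU", "DRAM", "DWM", "display controller"]
    # MPO: each plane stays in its own buffer and the display controller combines them.
    return ["DRAM", "display controller"]


for planes, fullscreen in [(2, False), (3, False), (4, False), (1, True)]:
    print(planes, fullscreen, " -> ".join(frame_path(planes, fullscreen)))
```

The energy saving comes from the shorter path: with MPO, the GPU and the second trip through DRAM drop out of the loop entirely.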
Performance and benchmarking
You can see the results from the two engineering samples we have, the Intel Core i7-6700HQ and Core i7-6500U, and their GPUs, Intel HD Graphics 530 and Intel HD Graphics 520 respectively, both using the GT2 configuration.
Results are from the Cinebench 15 CPU test (the higher the score, the better)
Results are from the Fritz chess benchmark (the higher the score, the better)
Results are from our Photoshop benchmark test (the lower the score, the better)
Results are from the 3DMark: Fire Strike (Graphics) benchmark (the higher the score, the better)
Results are from the Unigine Heaven 3.0 benchmark (the higher the score, the better)
Results are from the Unigine Heaven 4.0 benchmark (the higher the score, the better)
Results are from the Unigine Superposition benchmark (the higher the score, the better)
If you haven’t already, take a look at where the CPUs included in the review (Core i7-6500U and Core i7-6700HQ) are placed in our ranking charts:
Intel also claims that the new Core M processors will deliver 40% better graphics performance with significantly lower power consumption. We haven’t yet mentioned that the Core M family members will get new names, Core m3, m5 and m7, mirroring the Core i3, i5 and i7 lineup to reduce confusion. New features include support for Intel’s RealSense technology, which we covered in an earlier review.
As for Skylake-YUH, Intel claims a 60% gain in efficiency over the Haswell-YUH processors, thanks to the new features discussed in the previous section.
Moreover, the GT4e configuration with 128MB of eDRAM and 72 EUs (Execution Units) is said to match 80% of the discrete GPUs currently on the market, which is rather impressive, if true.
All the new implementations look fine on paper, but it’s up for debate whether these features will have a big impact on the actual user experience. Further testing of final units will confirm or disprove Intel’s claims, but we will need to wait a few more months for devices to start shipping and for Windows 10 to support the new features.