Intel Alder Lake: features and specifications of the twelfth generation of CPUs

Intel has finally unveiled the key features of the new Alder Lake SoC architecture. In particular, the new high-efficiency and high-performance cores were presented, together with the Thread Director that will manage their use according to the user's needs. Intel also discussed the key traits of the Alder Lake SoCs themselves: their overall structure, support for the latest DDR5 memory, and compatibility with the modern fifth-generation PCIe interface. All of this has been designed to deliver a clear leap over the past, helped by the transition to the 10-nanometer Enhanced SuperFin process (now branded Intel 7).

Let's take a detailed look at all the technical specifications and features of the Intel Alder Lake architecture behind the new SoCs arriving in 2021.

What is Alder Lake?

Intel Alder Lake and its key features

The arrival of Alder Lake marks a significant step forward for Intel's x86 multi-core architecture. The US manufacturer has finally abandoned its 14-nanometer process to move to the 10-nanometer Enhanced SuperFin node. It goes without saying that this transition brings numerous benefits, opening a clear gap over previous generations of SoCs. The new Alder Lake architecture will be the basis of the next Intel CPUs for the desktop and mobile markets that will arrive later this year.

In the design phase, Intel set three important goals for Alder Lake to achieve this step forward. The first is a highly scalable SoC architecture, suitable for desktop, mobile and ultra-mobile devices alike. Intel then designed two new cores, one focused on efficiency and the other on performance. To make the two types of core work together, the Thread Director has been implemented directly in hardware. Finally, as the third major innovation of Alder Lake, Intel wanted to bring support for the latest memory and I/O technologies, such as DDR5 and fifth-generation PCIe.
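Software can already tell which of the two core types it is currently running on: on hybrid parts, CPUID leaf 0x1A reports the core type in the upper byte of EAX (0x20 for the Atom-derived Efficient-core, 0x40 for the Core-derived Performance-core). Below is a minimal sketch, assuming a GCC or Clang toolchain on x86; the helper name detect_core_type is ours, not Intel's.

```cpp
#include <cpuid.h>   // GCC/Clang wrapper around the CPUID instruction
#include <cstdio>

// Query CPUID leaf 0x1A ("Native Model ID Enumeration").
// Bits 31:24 of EAX hold the core type on hybrid CPUs:
// 0x20 = Atom-class (E-core), 0x40 = Core-class (P-core).
static const char* detect_core_type() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx) || eax == 0)
        return "not a hybrid CPU (or leaf 0x1A not supported)";
    switch (eax >> 24) {
        case 0x20: return "Efficient-core (E-core)";
        case 0x40: return "Performance-core (P-core)";
        default:   return "unknown core type";
    }
}

int main() {
    // Without pinning, the OS may migrate this thread between core types,
    // so the answer reflects only the core it happened to run on just now.
    std::printf("This thread is currently running on: %s\n", detect_core_type());
    return 0;
}
```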

Efficient-core

The main features of Alder Lake's new Efficient-core

Intel's new Efficient-core (E-core) is undoubtedly a key element of the new Alder Lake architecture. Intel set out to design the most efficient x86 core ever, with scalable multi-threaded performance for modern multitasking in mind. The Efficient-core runs at low voltage to reduce overall power consumption, while creating the power headroom to operate at higher frequencies. Intel worked on a variety of technical innovations to prioritize workloads without wasting compute power, and to boost performance with features that improve instructions per cycle. Notable innovations include:

- A 5,000-entry branch target cache for more accurate branch prediction
- A 64-kilobyte instruction cache that keeps useful instructions close without spending power in the memory subsystem
- Intel's first on-demand instruction length decoder, which generates pre-decode information
- Intel's clustered out-of-order decoder, which can decode up to six instructions per cycle while maintaining power efficiency
- A wide back end with 5-wide allocation and 8-wide retirement, a 256-entry out-of-order window and 17 execution ports
- Robust security features supporting Intel Control-flow Enforcement Technology and Intel Virtualization Technology Redirect Protection
- An implementation of the AVX ISA, along with new extensions to support integer artificial intelligence (AI) operations

All of this translates into an overall performance improvement over the Skylake CPU core: 40% more performance at the same power, or the same performance while consuming less than 40% of the energy. In terms of throughput, four Efficient-cores deliver 80% more performance while consuming less power than two Skylake cores running four threads, or the same throughput while consuming 80% less power.

Performance-core

The main features of the new Alder Lake Performance-core

With an Efficient-core of this level, an equally innovative Performance-core (P-core) could not be missing. Intel wanted to create a core designed specifically for speed, further reducing latency while increasing the performance of single-threaded applications. Workloads are driven by increasingly complex code that demands more execution capability, and datasets keep growing along with data bandwidth requirements. Intel's new Performance-core microarchitecture offers a significant general-purpose performance boost and better support for applications with large code footprints. In particular, Intel concentrated its efforts on an enlarged front end, which fetches instructions and decodes them into micro-operations placed in an even larger queue. The out-of-order engine (also enlarged to maximize performance) then takes the micro-operations from the queue and dispatches them to the execution units, which have been revised with new, optimized integer and vector execution units. Enlarging these structures inevitably led to bigger L1 and L2 caches and improved performance across the entire memory subsystem. The Performance-core features a wider, deeper and smarter architecture:

- Wider: six decoders (two more than before); an 8-wide μop cache (up from six); 6-wide allocation (up from five); 12 execution ports (up from 10)
- Deeper: larger buffers; larger physical register files; a deeper reorder buffer with 512 entries
- Smarter: improved branch-prediction accuracy; reduced effective L1 latency; full-write predictive bandwidth optimizations in L2

Overall, Intel claims a performance uplift of approximately 19% over its 11th-generation core at ISO frequency for general-purpose workloads. The P-core also includes Intel Advanced Matrix Extensions (AMX), a next-generation built-in AI accelerator designed to speed up deep-learning workloads.
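Because AMX is a brand-new ISA extension, software should check for it before using it. The sketch below is a hedged illustration, again assuming a GCC or Clang toolchain on x86; the relevant feature bits live in CPUID leaf 7, sub-leaf 0, EDX (bit 22 AMX-BF16, bit 24 AMX-TILE, bit 25 AMX-INT8).

```cpp
#include <cpuid.h>
#include <cstdio>

// CPUID.(EAX=07H, ECX=0):EDX feature bits for Advanced Matrix Extensions.
constexpr unsigned AMX_BF16 = 1u << 22;
constexpr unsigned AMX_TILE = 1u << 24;
constexpr unsigned AMX_INT8 = 1u << 25;

int main() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        std::printf("CPUID leaf 7 not supported\n");
        return 1;
    }
    std::printf("AMX-TILE: %s  AMX-INT8: %s  AMX-BF16: %s\n",
                (edx & AMX_TILE) ? "yes" : "no",
                (edx & AMX_INT8) ? "yes" : "no",
                (edx & AMX_BF16) ? "yes" : "no");
    // Note: on Linux a process must also request permission for the AMX tile
    // state (arch_prctl with ARCH_REQ_XCOMP_PERM) before running tile code.
    return 0;
}
```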

Thread Director

The goals of Intel Thread Director

Once the two new cores were designed, Intel had to find an efficient way to make them work together. This is why Intel Thread Director was created: it divides work between the two types of core to maximize CPU performance and efficiency. The goal set during the design phase was a dynamic solution capable of adapting in real time to the software the user is running. Thread Director also works completely autonomously, without requiring developers to change their software. Being a hardware solution integrated directly into the cores, it can monitor the programs running in every single thread in real time. In addition, Thread Director sends continuous feedback to the operating system so it can make optimal decisions about distributing processes across the high-efficiency and high-performance cores, something that previously could not be done with this level of transparency between hardware and software. At the same time, Thread Director must also consider individual core temperatures and the available power budget in order to guarantee optimal operation in any situation. Traditionally, the operating system made scheduling decisions based on the limited statistics available to it, such as foreground and background activity. Thread Director adds a new dimension that enables further optimization:

- Using hardware telemetry to route higher-performance threads to the right Performance-core at that moment
- Monitoring the instruction mix, core state and other fine-grained microarchitectural telemetry, which helps the operating system make smarter scheduling decisions
- Optimizing Thread Director for best performance on Windows 11 in partnership with Microsoft
- Extending the PowerThrottling API, which lets developers explicitly specify quality-of-service attributes for their threads (see the sketch after this list)
- Adding a new EcoQoS classification that tells the scheduler when a thread prefers energy efficiency (such threads are scheduled on Efficient-cores)
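To make the last two points concrete, here is a minimal sketch of how a developer might opt a thread into EcoQoS through the PowerThrottling API, assuming a recent Windows SDK; the helper name PreferEfficiency is ours.

```cpp
#include <windows.h>
#include <cstdio>

// Ask the scheduler to treat the calling thread as "prefer efficiency" (EcoQoS).
// On Alder Lake under Windows 11 such threads become candidates for E-cores.
static bool PreferEfficiency(bool enable) {
    THREAD_POWER_THROTTLING_STATE state = {};
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    // Bit set in StateMask: request throttled ("eco") execution.
    // Bit cleared while present in ControlMask: explicitly opt out again.
    state.StateMask   = enable ? THREAD_POWER_THROTTLING_EXECUTION_SPEED : 0;
    return SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                                &state, sizeof(state)) != 0;
}

int main() {
    if (PreferEfficiency(true))
        std::printf("This thread now prefers energy efficiency (EcoQoS).\n");
    // ... run low-priority background work here (e.g., mail sync, indexing) ...
    PreferEfficiency(false);  // restore the default scheduling policy
    return 0;
}
```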

How Intel Thread Director works

Let's try to understand how Intel Thread Director behaves with an example. When the user starts particularly demanding software, such as a game or a video-editing program, that process is assigned to a high-performance core (P-core). Other programs, such as a mail client that keeps synchronizing email in the background, are instead assigned to the high-efficiency cores (E-cores). Now suppose all the P-cores are busy with demanding processes, but the user wants to start another heavy one, such as an AI application. In this case Thread Director can move a less demanding process from the P-cores to the E-cores, making room for the new incoming process. All of this works dynamically: if after a few seconds the AI application no longer needs high computing power, it is immediately moved to an E-core. The same happens, for example, with a paused video game, where the need for computing power drops, so it can move to an E-core and return to a P-core as soon as we start playing again.

Technical specifications

The structure of the Alder Lake SoC

Finally, let's move on to the technical specifications of Alder Lake, starting with the structure of the SoCs. Intel provides up to 16 cores per chip, with 8 P-cores and 8 E-cores, each featuring a high dynamic frequency range. The maximum number of threads reaches 24, with 2 threads per P-core and 1 thread per E-core (8 × 2 + 8 × 1 = 24). The Alder Lake SoC will also have up to 30 MB of last-level cache (LLC). Naturally, the number of P-cores and E-cores will depend on the type of device the new Alder Lake SoCs are built into: desktop parts will have 8 P-cores, while ultra-mobile parts will have only 2. Leveraging a single SoC architecture, Intel plans to create three different chips:

- A two-chip desktop socket package with high performance, energy efficiency, memory and I/O
- A high-performance mobile BGA package that adds imaging, larger Xe graphics and Thunderbolt 4 connectivity
- A thin, low-power, high-density package with optimized I/O and power delivery
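To see how the 8 + 8 core, 24-thread layout described above appears to software, here is a hedged sketch that enumerates physical cores on Windows through GetLogicalProcessorInformationEx and groups them by EfficiencyClass. Treating any non-zero class as a Performance-core is our simplification: on non-hybrid CPUs every core reports class 0.

```cpp
#include <windows.h>
#include <cstdio>
#include <vector>

int main() {
    // First call obtains the required buffer size.
    DWORD len = 0;
    GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &len);
    std::vector<BYTE> buf(len);
    auto* base = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data());
    if (!GetLogicalProcessorInformationEx(RelationProcessorCore, base, &len)) {
        std::printf("query failed: %lu\n", GetLastError());
        return 1;
    }
    int pCores = 0, eCores = 0, threads = 0;
    for (DWORD off = 0; off < len;) {
        auto* info = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data() + off);
        // A higher EfficiencyClass means a more performant core class on hybrid parts.
        if (info->Processor.EfficiencyClass > 0) ++pCores; else ++eCores;
        // LTP_PC_SMT marks cores exposing two logical processors (Hyper-Threading).
        threads += (info->Processor.Flags & LTP_PC_SMT) ? 2 : 1;
        off += info->Size;
    }
    std::printf("P-cores: %d  E-cores: %d  hardware threads: %d\n", pCores, eCores, threads);
    return 0;
}
```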

Alder Lake PCIe interface compatibility

With Alder Lake, Intel ensures compatibility with the latest memory technologies. In particular, Alder Lake supports DDR5-4800, DDR4-3200, LPDDR5-5200 and LPDDR4X-4266, using dynamic voltage-frequency scaling and enabling advanced overclocking. Intel therefore wants to embrace the transition to DDR5 memory while offering near-total compatibility with existing technologies, giving maximum freedom to the end user.
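As a rough illustration of what those transfer rates mean, peak theoretical bandwidth follows directly from the transfer rate, the bus width and the channel count. The sketch below assumes a conventional dual-channel desktop configuration with 64 data bits per channel; real sustained bandwidth will be lower.

```cpp
#include <cstdio>

// Peak theoretical DRAM bandwidth:
//   megatransfers/s * (bus width in bytes) * channels
static double peak_gbps(double mtps, int bus_bits, int channels) {
    return mtps * 1e6 * (bus_bits / 8.0) * channels / 1e9;
}

int main() {
    std::printf("DDR5-4800, dual channel: %.1f GB/s\n", peak_gbps(4800, 64, 2)); // ~76.8
    std::printf("DDR4-3200, dual channel: %.1f GB/s\n", peak_gbps(3200, 64, 2)); // ~51.2
    return 0;
}
```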

Even in terms of PCIe connectivity, Alder Lake is absolutely state of the art. The new Intel architecture supports the new fifth-generation PCIe standard, which offers up to twice the bandwidth of Gen4 and a transfer speed of up to 64 GB/s with an x16 connector. Alder Lake's PCIe support spans x16 PCIe Gen5, x4 PCIe Gen4, x12 PCIe Gen4 and x16 PCIe Gen3 lanes, and there is also support for Thunderbolt 4 and Wi-Fi 6E. The challenge in building such a highly scalable architecture is meeting the enormous bandwidth demands of the compute and I/O agents without compromising on power. To make all the cores, memory and PCIe interfaces work at maximum speed while minimizing latency, Intel designed three independent fabrics:

- The compute fabric supports up to 1,000 gigabytes per second (GB/s), that is 100 GB/s per core or per cluster, and connects the cores and graphics to memory through the last-level cache. It features a high dynamic frequency range, dynamically selects the data path to trade latency against bandwidth based on the actual fabric load, and dynamically adjusts the last-level-cache policy (inclusive or non-inclusive) based on usage
- The I/O fabric supports up to 64 GB/s, connecting the various types of I/O and internal devices, and can change speed seamlessly, without interfering with the normal operation of a device, by selecting a fabric speed that matches the required amount of data transfer
- The memory fabric can deliver up to 204 GB/s of data and dynamically scales its bus width and speed to support multiple operating points for high bandwidth, low latency or low power
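The PCIe and fabric figures quoted above can be sanity-checked with the same kind of back-of-the-envelope arithmetic. The sketch below uses the standard per-lane transfer rates (16 GT/s for Gen4, 32 GT/s for Gen5) and ignores the small 128b/130b encoding overhead, which is why the Gen5 x16 result lands right at the 64 GB/s figure mentioned earlier.

```cpp
#include <cstdio>

// Raw PCIe link bandwidth per direction: GT/s per lane * lanes / 8 bits per byte.
static double raw_gbps(double gtps_per_lane, int lanes) {
    return gtps_per_lane * lanes / 8.0;
}

int main() {
    std::printf("PCIe Gen4 x16: %.0f GB/s raw per direction\n", raw_gbps(16.0, 16)); // 32
    std::printf("PCIe Gen5 x16: %.0f GB/s raw per direction\n", raw_gbps(32.0, 16)); // 64
    return 0;
}
```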
