What makes Lunar Lake special is that it is manufactured by TSMC: the compute tile is made on TSMC's N3B node, while the SoC tile, which Intel now calls the "Platform Controller", is made on TSMC's N6 node. Intel uses its Foveros packaging technology to bring it all together, including up to 32GB of LPDDR5X memory sitting on the same package. This also means that the memory is not user-upgradable, which we wrote about here. Intel claims that putting the memory on the package reduces PHY power by 40 percent and saves 250 mm² of area.
New Lion Cove P-cores and Skymont E-cores
The Lunar Lake CPU has a total of eight cores: four Lion Cove performance cores (P-cores) and four Skymont efficiency cores (E-cores). Unlike in previous Intel processors, the two core types do not share an L3 cache or a ring bus; they simply sit on the same die.
The P-cores, although meant for performance, are also optimized for efficiency and do not feature Hyper-Threading Technology (HTT). Intel claims that dropping it increased performance per power by 5 percent, performance per area by 15 percent, and performance per power per area by 15 percent. The general IPC increase is 14 percent compared to the Redwood Cove P-cores in Meteor Lake, with significant power efficiency gains. The P-cores also got a new power management system with an AI-driven self-tuning controller that adjusts clocks in 16.67 MHz steps, which is much more precise than the 100 MHz steps used before.
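To illustrate why the finer clock step matters, here is a minimal sketch that snaps a requested frequency down to the nearest available step. It assumes that 16.67 MHz is exactly 100/6 MHz, and the 3,090 MHz target and the function names are hypothetical; this is not Intel's controller logic, only the arithmetic behind the granularity claim.

```python
from fractions import Fraction

# Step sizes quoted in the article. Assumption: 16.67 MHz is exactly 100/6 MHz.
FINE_STEP_MHZ = Fraction(100, 6)
COARSE_STEP_MHZ = Fraction(100)


def snap_down(target_mhz, step_mhz):
    """Return the highest multiple of step_mhz that does not exceed target_mhz."""
    steps = Fraction(target_mhz) // step_mhz  # whole number of steps that fit
    return steps * step_mhz


def lost_headroom(target_mhz, step_mhz):
    """Return how many MHz are given up by rounding down to the step grid."""
    return Fraction(target_mhz) - snap_down(target_mhz, step_mhz)


# Hypothetical frequency that the power budget would allow.
target = 3090

coarse = snap_down(target, COARSE_STEP_MHZ)  # 3000 MHz, 90 MHz left unused
fine = snap_down(target, FINE_STEP_MHZ)      # 9250/3 MHz (about 3083.33), under 7 MHz left unused
```

With a 100 MHz grid the core can leave up to almost 100 MHz on the table; with the finer grid the worst case is under 17 MHz.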
On the architecture side, Intel has completely redesigned the front end with an up to 8x larger branch prediction block, wider fetch, and other improvements, and has improved integer and vector execution, the memory subsystem, and more. In terms of numbers, each Lion Cove core has 192KB of L1 cache and 2.5MB of L2 cache, and the four cores share 12MB of L3 cache.
While the P-cores got a significant overhaul and a decent IPC improvement, the star of the show is the Skymont E-core, which brings an impressive 68 percent IPC increase in floating point performance compared to the Meteor Lake E-cores. Integer IPC is up by 38 percent as well. According to Intel, this translates to a performance-per-power increase of as much as 300 percent in some cases. The Skymont E-cores were also overhauled with a wider machine for prediction and decode, a new out-of-order engine, deeper queuing, more dispatch ports, higher vector performance, and memory subsystem enhancements. In numbers, the four E-cores share 4MB of L2 cache, and Intel managed to double the L2 bandwidth and speed up L1-to-L1 transfers between cores.
Intel Thread Director is the key to getting it all right
Intel has also made improvements to its Thread Director, which is a must for picking the right CPU cores for each workload and migrating threads correctly. Intel has improved OS and OEM integration, given the algorithm finer granularity, and introduced OS containment zones and a new dynamic scheduling policy.
Arc Xe2 Battlemage GPU increases performance by 50 percent
Intel's Lunar Lake also brings the new Xe2 Battlemage architecture, promising a 50 percent improvement in gaming performance compared to the Xe-LPG GPU in Meteor Lake. Each GPU core has eight 512-bit vector engines, eight 2048-bit XMX matrix engines, and a larger 192KB L1 cache/SLM. With 8 GPU cores, the iGPU gets 1,024 unified shaders, and those same Xe Matrix Extension (XMX) engines put up 67 TOPS of AI performance. It also packs 8 next-generation ray tracing units, which puts it in line with DirectX 12 Ultimate requirements.
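The 1,024 shader figure follows from the per-core numbers above. The short sketch below works through that arithmetic; it assumes that one "unified shader" corresponds to one 32-bit lane of a vector engine, which is our reading and not something Intel states in these terms.

```python
# Figures taken from the article.
GPU_CORES = 8
VECTOR_ENGINES_PER_CORE = 8
VECTOR_ENGINE_WIDTH_BITS = 512

# Assumption: one unified shader equals one 32-bit (FP32) lane.
LANE_WIDTH_BITS = 32

lanes_per_engine = VECTOR_ENGINE_WIDTH_BITS // LANE_WIDTH_BITS  # 16 lanes per engine
shaders_per_core = VECTOR_ENGINES_PER_CORE * lanes_per_engine   # 128 per GPU core
total_shaders = GPU_CORES * shaders_per_core                    # 1024 for the whole iGPU
```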
The GPU also gets a new media engine with hardware-accelerated AV1 encoding and decoding and VVC hardware decoding, among other features. The display engine now supports up to three displays and the new eDisplayPort 1.5 standard, in addition to DisplayPort 2.1 and HDMI 2.1.
The Intel NPU 4 gets Intel onto the Microsoft Copilot+ train
Although this is Intel's second-generation NPU, the company calls it the NPU 4. The name aside, it brings plenty of improvements, including up to four times the AI inferencing performance of the Meteor Lake NPU, going from 12 TOPS to 48 TOPS. This puts Intel safely above the 40 TOPS requirement for Microsoft Copilot+.
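As a quick sanity check on those figures, the sketch below computes the generational speedup and the margin over the Copilot+ threshold, using only the numbers quoted above; the variable names are ours.

```python
# Figures taken from the article.
METEOR_LAKE_NPU_TOPS = 12
LUNAR_LAKE_NPU_TOPS = 48
COPILOT_PLUS_MIN_TOPS = 40

speedup = LUNAR_LAKE_NPU_TOPS / METEOR_LAKE_NPU_TOPS      # 4.0x over Meteor Lake
margin_tops = LUNAR_LAKE_NPU_TOPS - COPILOT_PLUS_MIN_TOPS  # 8 TOPS above the threshold
margin_pct = 100 * margin_tops / COPILOT_PLUS_MIN_TOPS     # 20.0 percent of headroom
meets_copilot_plus = LUNAR_LAKE_NPU_TOPS >= COPILOT_PLUS_MIN_TOPS
```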
Connectivity side gets Thunderbolt 4 and more
The Platform Controller, or the I/O tile, which is built on TSMC's 6nm N6 process, includes an integrated Thunderbolt 4 controller with support for up to three 40Gbps ports. It also gives Lunar Lake 8 PCIe lanes, mostly used for Gen 5 SSDs and other platform connectivity. The I/O also gets WiFi 7, Bluetooth 5.4, and other connectivity options.
Intel managed to surprise pretty much everyone with its Lunar Lake architecture, and it will certainly put a lot of pressure on the likes of Apple and Qualcomm with their M-series and Snapdragon X Elite chips. It targets the same ultra-portable and thin-and-light devices, and while both rivals have a decent head start, Intel could have an edge with its new E- and P-cores, new NPU, and especially the new Xe2 iGPU.
Unfortunately for Intel, Lunar Lake is currently on track to launch in Q3 2024. Of course, that timetable still puts it in time for the holiday buying season. Intel claims it will power more than 80 different AI PC designs from 20 original equipment manufacturers (OEMs), and expects to ship more than 40 million Core Ultra processors this year.