I have been tuning my Linux kernel for Intel Core 2 Quad (Yorkfield) processors, and I noticed the following messages from dmesg:
[ 0.019526] cpuidle: using governor menu
[ 0.531691] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.550918] intel_idle: does not run on family 6 model 23
[ 0.554415] tsc: Marking TSC unstable due to TSC halts in idle
PowerTop shows only states C1, C2 and C3 in use for the package and the individual cores:
Package | CPU 0
POLL 0.0% | POLL 0.0% 0.1 ms
C1 0.0% | C1 0.0% 0.0 ms
C2 8.2% | C2 9.9% 0.4 ms
C3 84.9% | C3 82.5% 0.9 ms
| CPU 1
| POLL 0.1% 1.6 ms
| C1 0.0% 1.5 ms
| C2 9.6% 0.4 ms
| C3 82.7% 1.0 ms
| CPU 2
| POLL 0.0% 0.1 ms
| C1 0.0% 0.0 ms
| C2 7.2% 0.3 ms
| C3 86.5% 1.0 ms
| CPU 3
| POLL 0.0% 0.1 ms
| C1 0.0% 0.0 ms
| C2 5.9% 0.3 ms
| C3 87.7% 1.0 ms
Curious, I queried sysfs and found that the legacy acpi_idle driver is in use (I would expect to see the intel_idle driver):
cat /sys/devices/system/cpu/cpuidle/current_driver
acpi_idle
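Looking through the kernel source, the current intel_idle driver contains a debug message specifically noting that some models of Intel family 6 are not supported by the driver: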
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && boot_cpu_data.x86 == 6)
pr_debug("does not run on family %d model %d\n", boot_cpu_data.x86, boot_cpu_data.x86_model);
An earlier fork of intel_idle.c (November 22, 2010) shows anticipated support for Core 2 processors (model 23 actually covers both Core 2 Duo and Quad):
#ifdef FUTURE_USE
case 0x17: /* 23 - Core 2 Duo */
lapic_timer_reliable_states = (1 << 2) | (1 << 1); /* C2, C1 */
#endif
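The above code was deleted in a December 2010 commit.
Unfortunately, there is almost no documentation in the source code, so there is no explanation for the lack of support for the idle function in these CPUs.
My current kernel configuration is as follows: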
CONFIG_SMP=y
CONFIG_MCORE2=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
CONFIG_INTEL_IDLE=y
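My questions are as follows:
- Is there a specific hardware reason that Core 2 processors are not supported by intel_idle?
- Is there a more appropriate way to configure a kernel for optimal CPU idle support for this family of processors (aside from disabling support for intel_idle)?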
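While researching Core 2 CPU power states ("C-states"), I actually managed to implement support for most of the legacy Intel Core/Core 2 processors. The complete implementation (Linux patch), with all of the background information, is documented here.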
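As I accumulated more information about these processors, it became apparent that the C-states supported in the Core 2 models are far more complex than those in both earlier and later processors. These are known as Enhanced C-states (or "CxE"), which involve the package, the individual cores and other components on the chipset (e.g., memory). At the time the intel_idle driver was released, the code was not particularly mature, and several Core 2 processors had been released with conflicting C-state support.
Some compelling information on Core 2 Solo/Duo C-state support was found in this article from 2006. It relates to support on Windows, but it does indicate robust hardware C-state support on these processors. The information regarding Kentsfield conflicts with the actual model number, so I believe they are actually referring to a Yorkfield below: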
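This article from 2008 outlines support for per-core C-states on multi-core Intel processors, including Core 2 Duo and Core 2 Quad (additional helpful background reading was found in this white paper from Dell):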
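I found a 2010 presentation from Intel that provides some additional background about the intel_idle driver, but unfortunately it does not explain the lack of support for Core 2:
The above presentation does indicate that the intel_idle driver is an implementation of the "menu" CPU governor, which has an impact on Linux kernel configuration (i.e., CONFIG_CPU_IDLE_GOV_LADDER vs. CONFIG_CPU_IDLE_GOV_MENU). The differences between the ladder and menu governors are succinctly described in this answer.
Dell has a helpful article that lists C-state C0 to C6 compatibility: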
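From this table (which I later found to be incorrect in some instances), it appears that C-state support varied considerably among the Core 2 processors (note that nearly all Core 2 processors are Socket LGA775, except for the Core 2 Solo SU3500, which is Socket BGA956, and the Merom/Penryn processors. "Intel Core" Solo/Duo processors are either Socket PBGA479 or PPGA478).
An additional exception to the table was found in this article: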
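Interestingly, the QX9650 is a Yorkfield processor (Intel family 6, model 23, stepping 6). For reference, my Q9550S is Intel family 6, model 23 (0x17), stepping 10, which purportedly supports C-state C4 (confirmed through experimentation). Additionally, the Core 2 Solo SU3500 has the same CPUID (family, model, stepping) as the Q9550S but is available in a non-LGA775 socket, which confounds interpretation of the above table.
Clearly, the CPUID must be used at least down to the stepping in order to identify C-state support for this model of processor, and in some cases that may be insufficient (undetermined at this time).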
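To make the family/model/stepping discussion concrete, here is a minimal sketch of how those three fields are packed into the EAX value returned by CPUID leaf 1. The struct and function names are hypothetical (they are not kernel identifiers), and the sample value 0x1067A mentioned after the block was chosen only because it decodes to family 6, model 23, stepping 10, the signature reported above for the Q9550S.

```c
#include <stdint.h>

/* Family, model and stepping as encoded in CPUID leaf 1, register EAX. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode a raw CPUID.1:EAX value.
 * decode_cpu_signature() is an illustrative helper, not a kernel function. */
struct cpu_signature decode_cpu_signature(uint32_t eax)
{
    struct cpu_signature sig;
    unsigned base_family = (eax >> 8) & 0xF;   /* bits 11:8  */
    unsigned base_model  = (eax >> 4) & 0xF;   /* bits 7:4   */
    unsigned ext_model   = (eax >> 16) & 0xF;  /* bits 19:16 */
    unsigned ext_family  = (eax >> 20) & 0xFF; /* bits 27:20 */

    sig.stepping = eax & 0xF;                  /* bits 3:0   */

    /* The extended family is only added when the base family is 0xF. */
    sig.family = base_family;
    if (base_family == 0xF)
        sig.family += ext_family;

    /* The extended model supplies the high nibble for families 6 and 0xF. */
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;

    return sig;
}
```

Calling decode_cpu_signature(0x1067A) yields family 6, model 23 and stepping 10. On a running system the same three numbers appear as "cpu family", "model" and "stepping" in /proc/cpuinfo.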
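The method signature for assigning CPU idle information is:
Where model is enumerated in asm/intel-family.h. Examining this header file, I see that Intel CPUs are assigned 8-bit identifiers that appear to match the Intel family 6 model numbers:
From the above, we have Intel family 6, model 23 (0x17) defined as INTEL_FAM6_CORE2_PENRYN. This should be sufficient for defining idle states for most of the model 23 processors, but could potentially cause issues with the QX9650, as noted above.
So, minimally, each group of processors that has a distinct C-state set would need to be defined in this list.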
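One way to picture the "one entry per group of processors" idea is a lookup keyed on the model plus a stepping range. This is only a sketch under stated assumptions: the struct, the table and the profile names are all hypothetical, and the stepping split for model 0x17 is invented purely for illustration (it is not measured data). The text above describes the driver's own list as keyed on the model identifier only.

```c
#include <stddef.h>

/* Hypothetical description of one group of processors that share a
 * C-state set. None of these names exist in the kernel. */
struct idle_profile {
    unsigned model;         /* family 6 model number, e.g. 0x17 */
    unsigned min_stepping;  /* inclusive lower bound */
    unsigned max_stepping;  /* inclusive upper bound */
    const char *name;       /* label of the C-state table to use */
};

/* Example entries only: the split at stepping 6 is an assumption. */
static const struct idle_profile idle_profiles[] = {
    { 0x0F, 0x0, 0xF, "core2_merom" },
    { 0x17, 0x0, 0x6, "core2_penryn_early" },
    { 0x17, 0x7, 0xF, "core2_penryn" },
};

/* Return the first profile matching model and stepping, or NULL when the
 * CPU is not listed (the case in which another driver would be used). */
const struct idle_profile *find_idle_profile(unsigned model, unsigned stepping)
{
    size_t count = sizeof(idle_profiles) / sizeof(idle_profiles[0]);

    for (size_t i = 0; i < count; i++) {
        const struct idle_profile *p = &idle_profiles[i];

        if (p->model == model &&
            stepping >= p->min_stepping &&
            stepping <= p->max_stepping)
            return p;
    }
    return NULL;
}
```

With these example entries, find_idle_profile(0x17, 10) and find_idle_profile(0x17, 6) return different profiles even though both CPUs are model 23, which is the distinction a model-only match cannot express.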
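Zagacki and Ponnala, Intel Technology Journal 12 (3):219-227, 2008 indicate that Yorkfield processors do indeed support C2 and C4. They also seem to indicate that the ACPI 3.0a specification supports transitions only between C-states C0, C1, C2 and C3, which I presume may also limit the Linux acpi_idle driver to transitions between that limited set of C-states. However, this article indicates that may not always be the case:
Also noteworthy: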
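The chipset I am using is indeed an Intel Q45 Express chipset.
The Intel documentation on MWAIT states is terse but confirms the BIOS-specific ACPI behavior:
My interpretation of the above table (combined with a table from Wikipedia, asm/intel-family.h and the above articles) is:
Model 9 0x09 (Pentium M and Celeron M):
Model 13 0x0D (Pentium M and Celeron M):
Model 14 0x0E INTEL_FAM6_CORE_YONAH (Enhanced Pentium M, Enhanced Celeron M or Intel Core):
Model 15 0x0F INTEL_FAM6_CORE2_MEROM (some Core 2 and Pentium Dual-Core):
Model 23 0x17 INTEL_FAM6_CORE2_PENRYN (Core 2):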
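From the amount of diversity in C-state support within just the Core 2 line of processors, it appears that a lack of consistent support for C-states may have been the reason for not attempting to fully support them via the intel_idle driver. I would like to fully complete the above list for the entire Core 2 line.
This is not really a satisfying answer, because it makes me wonder how much unnecessary power is used and how much excess heat has been (and still is) generated by not fully utilizing the robust power-saving MWAIT C-states on these processors.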
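Chattopadhyay et al. 2018, Energy Efficient High Performance Processors: Recent Approaches for Designing Green High Performance Computing is worth noting for the specific behavior I am looking for in the Q45 Express chipset: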
As a test, I inserted the following in linux/drivers/idle/intel_idle.c at line 127:
At intel_idle.c line 983:
At intel_idle.c line 1073:
After a quick compile and reboot of my PXE nodes, dmesg now shows:
And now PowerTOP is showing:
I've finally accessed the Enhanced Core 2 C-states, and it looks like there is a measurable drop in power consumption - my meter on 8 nodes appears to be averaging at least 5% lower (with one node still running the old kernel), but I'll try swapping the kernels out again as a test.
An interesting note regarding C4E support: my Yorkfield Q9550S processor appears to support it (or some other sub-state of C4), as evidenced above! This confuses me, because the Intel datasheet on the Core 2 Q9000 processor (section 6.2) only mentions C-states Normal (C0), HALT (C1 = 0x00), Extended HALT (C1E = 0x01), Stop Grant (C2 = 0x10), Extended Stop Grant (C2E = 0x11), Sleep/Deep Sleep (C3 = 0x20) and Deeper Sleep (C4 = 0x30). What is this additional 0x31 state? If I enable state C2, then C4E is used instead of C4. If I disable state C2 (force state C2E) then C4 is used instead of C4E. I suspect this may have something to do with the MWAIT flags, but I haven't yet found documentation for this behavior.
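For reference, the hint values listed above follow the usual MWAIT hint layout, in which the upper four bits select the C-state (with 0 meaning C1) and the lower four bits select a sub-state. A minimal sketch of that decoding, using hypothetical function names, is shown below. It says nothing about what the hardware actually does for hint 0x31, only how the number splits.

```c
/* Split an MWAIT hint byte into its two 4-bit fields.
 * Both function names are illustrative, not kernel identifiers. */

/* Upper nibble: target C-state, stored as "C-state minus one",
 * so a field value of 0 corresponds to C1. */
unsigned mwait_hint_cstate(unsigned hint)
{
    return ((hint >> 4) & 0xF) + 1;
}

/* Lower nibble: sub-state within that C-state. */
unsigned mwait_hint_substate(unsigned hint)
{
    return hint & 0xF;
}
```

Read this way, 0x31 is C4 with sub-state 1, sitting next to 0x30 (C4, sub-state 0) in the same way that 0x01 (C1E) sits next to 0x00 (C1) and 0x11 (C2E) next to 0x10 (C2).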
I'm not certain what to make of this: The C1E state appears to be used in lieu of C1, C2 is used in lieu of C2E and C4E is used in lieu of C4. I'm uncertain if C1/C1E, C2/C2E and C4/C4E can be used together with intel_idle or if they are redundant. I found a note in this 2010 presentation by Intel Labs Pittsburgh that indicates the transitions are C0 - C1 - C0 - C1E - C0, and further states:
I believe that is to be interpreted as the C1E state is entered on other components (e.g. memory) only when all cores are in the C1E state. I also take this to apply equivalently to the C2/C2E and C4/C4E states (although C4E is referred to as "C4E/C5", so I'm uncertain if C4E is a sub-state of C4 or if C5 is a sub-state of C4E. Testing seems to indicate C4/C4E is correct). I can force C2E to be used by commenting out the C2 state; however, this causes the C4 state to be used instead of C4E (more work may be required here). Hopefully there aren't any model 15 or model 23 processors that lack state C2E, because those processors would be limited to C1/C1E with the above code.
Also, the flags, latency and residency values could probably stand to be fine-tuned, but just taking educated guesses based on the Nehalem idle values seems to work fine. More reading will be required to make any improvements.
I tested this on a Core 2 Duo E2220 (Allendale), a Dual Core Pentium E5300 (Wolfdale), Core 2 Duo E7400, Core 2 Duo E8400 (Wolfdale), Core 2 Quad Q9550S (Yorkfield) and Core 2 Extreme QX9650, and I have found no issues beyond the afore-mentioned preference for state C2/C2E and C4/C4E.
Not covered by this driver modification:
The only issues that I can think of are:
intel_idle driver appears to choose the appropriate C1/C1E based on hardware support of the sub-states.
I managed to find a slide from a 2009 Intel presentation on the transitions between C-states (i.e., Deep Power Down):
In conclusion, it turns out that there was no real reason for the lack of Core 2 support in the intel_idle driver. It is clear now that the original stub code for "Core 2 Duo" only handled C-states C1 and C2, which would have been far less efficient than the acpi_idle function which also handles C-state C3. Once I knew where to look, implementing support was easy. The helpful comments and other answers were much appreciated, and if Amazon is listening, you know where to send the check.
This update has been committed to GitHub. I will e-mail a patch to the LKML soon.
Update: I also managed to dig up a Socket T/LGA775 Allendale (Conroe) Core 2 Duo E2220, which is family 6, model 15, so I added support for that as well. This model lacks support for C-state C4, but supports C1/C1E and C2/C2E. This should also work for other Conroe-based chips (E4xxx/E6xxx) and possibly all Kentsfield and Merom (non Merom-L) processors.
Update: I finally found some MWAIT tuning resources. This Power vs. Performance writeup and this Deeper C states and increased latency blog post both contain some useful information on identifying CPU idle latencies. Unfortunately, this only reports those exit latencies that were coded into the kernel (but, interestingly, only those hardware states supported by the processor):
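The per-state values that the kernel exposes can also be read directly from sysfs. The sketch below assumes the standard cpuidle layout (stateN/name and stateN/latency under a directory such as /sys/devices/system/cpu/cpu0/cpuidle, with latency given in microseconds). The helper names are hypothetical, and this is not the command used in the posts linked above.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a text file into buf, dropping the trailing
 * newline. Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the name and exit latency of each cpuidle state found under dir,
 * stopping at the first missing stateN entry.
 * Returns the number of states printed. */
int print_idle_states(const char *dir)
{
    char path[512], name[64], latency[32];
    int n = 0;

    for (;; n++) {
        snprintf(path, sizeof(path), "%s/state%d/name", dir, n);
        if (read_first_line(path, name, sizeof(name)) != 0)
            break;
        snprintf(path, sizeof(path), "%s/state%d/latency", dir, n);
        if (read_first_line(path, latency, sizeof(latency)) != 0)
            break;
        printf("%-8s %s us\n", name, latency);
    }
    return n;
}
```

A caller would typically pass "/sys/devices/system/cpu/cpu0/cpuidle" to print_idle_states(). The same read_first_line() helper also works on /sys/devices/system/cpu/cpuidle/current_driver to confirm which driver is active.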
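Update: An Intel employee recently published an article on intel_idle detailing MWAIT states.
You have ACPI enabled, and you have checked that acpi_idle is in use. I sincerely doubt that you have missed any useful kernel configuration option. You can always check powertop for possible suggestions, but you probably already knew that.
This is not an answer, but I wanted to format it :-(.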
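No, it doesn't :-).
That if statement does not exclude family 6. Instead, the if statement provides a message, when debugging is enabled, that this specific modern Intel CPU is not supported by intel_idle. In fact, my current i5-5300U CPU is family 6, and it uses intel_idle.
What excludes your CPU is that there is no match in the intel_idle_ids table.
I noticed this commit, which implemented the table. The code it removes had a switch statement instead. That makes it easy to see that the earliest model intel_idle has been implemented/successfully tested/whatever on is 0x1A = 26. https://github.com/torvalds/linux/commit/b66b8b9a4a79087dde1b358a016e5c8739ccf186
I suspect this might just be a matter of opportunity and cost. When intel_idle was added, it seems that Core 2 Duo support was planned but never fully implemented; perhaps by the time the Intel engineers got around to it, it was no longer worth it. The equation is relatively complex: intel_idle needs to provide sufficient benefit over acpi_idle to make it worth supporting here, on CPUs that will see the "improved" kernel in sufficient numbers...
As sourcejedi's answer says, the driver does not exclude all of family 6.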
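The intel_idle initialisation checks for CPUs in a list of CPU models, covering basically all micro-architectures from Nehalem to Kaby Lake. Yorkfield is older than that (and significantly different: Nehalem is very different from the architectures that came before it). The family 6 test only affects whether the error message is printed; its effect is that the message is displayed only on Intel CPUs, not on AMD CPUs (Intel family 6 includes all non-NetBurst Intel CPUs since the Pentium Pro).
To answer your configuration question, you could completely disable intel_idle, but leaving it in is fine too (as long as you don't mind the warning).
Another year, fewer and fewer of these old machines, but still no kernel support for idle states. I customised a kernel in a similar way to above and got useful drops in core temperature and power use. Previously the core temperatures were around 60C at idle, and around 45C with the custom intel_idle.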
I used a slightly different configuration in intel_idle.c. I set the disable C1E promotion flag: C1E is a state that is normally reached automatically (if configured in the BIOS) whenever a processor is put into C1. intel_idle disables this in all cases and treats C1E as a separate C-state, mainly to avoid the processor dropping unexpectedly into a state with a latency that could cause QoS issues (note that this means max_cstate is one higher than you might expect because 2 is C1E). The C2E state is similar, but not disabled or handled separately in intel_idle, so I left it out of the configuration completely. The BIOS usually has an option to enable or disable automatic promotion to this state, so you could disable it in the BIOS, enable it in intel_idle, and possibly get better results but I haven't tried it.
This was on an E2180, so no states higher than C2/C2E, but the main gains seem to come with the first idle state. I have an even older Pentium 4 with Linux support only for C1 and that knocks 10-15C off the core temperature. Experiments with a Core i7 show only marginal extra temperature drops with enabling C3 or C6, although you might think that battery savings are still worth it. I haven't found any noticeable performance issues with the latencies configured as shown above, but it isn't exactly a hardcore gaming machine anyway.