coreboot: kgpe-d16: do not enable hw monitor until kernel boots
This commit adds a coreboot patch that makes kgpe-d16 skip PNP
enumeration and resource assignment for the "hardware monitor" block
(basically a temperature sensor and fan-speed controller) on the
w83667hg-a Super I/O chip. This patch solves the last remaining boot
reliability problem I was having with my unattended kgpe-d16 machines.

The commit message for this patch is below:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The hardware monitor is one of the blocks within the w83667hg-a chip
(there are many others). It is basically a bunch of ADCs (analog to
digital converters) hooked up to voltage, current, and temperature
sensors in various locations on the motherboard.

This block has the ability to generate several different interrupts
(SMI#, OVT#, etc) in response to thermal conditions. It appears to
sometimes (about 10% of boot-ups, depending on temperature) spew
erroneous alarm interrupts the instant you enable it, when doing so
from within coreboot. This causes the w83667hg-a chip and the
entire system to hang, and the watchdog cannot recover from this
state because it is part of the w83667hg-a chip too.

An even bigger problem is that the hardware monitor is initialized
*before* the fans are brought up to full speed. So if the CPU is
above the critical temperature it will remain there because the fans
are in their default low-speed boot state. The chip just keeps
getting hotter and hotter -- not enough to damage itself, but hot
enough that it won't come down to an acceptable temperature with
simple reboots and power-cycles; you have to leave the system off
for a while. Since the fans aren't running while the system is off
this takes quite a while (several minutes). It's a very fussy and
fidgety process, and not something you want to walk a remote-hands
guy at the datacenter through over the phone.

To avoid this whole mess, let's simply not assign PNP resources to
the hardware monitor from coreboot. Linux doesn't need these
anyways; it communicates with the hardware using I2C.
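After applying the patch, it can be useful to confirm that the hardware monitor device really is marked off in the devicetree. The following is a minimal sketch in Python, assuming the device line has the form shown in the diff (`device pnp 2e.b off`). The function name `hwmon_state` and the `SAMPLE` text are hypothetical illustrations, not part of coreboot.

```python
import re


def hwmon_state(devicetree_text, pnp_id="2e.b"):
    """Return 'on' or 'off' for the given PNP device, or None if absent."""
    # Match lines such as: device pnp 2e.b off  # HW Monitor
    pattern = re.compile(
        r"^\s*device\s+pnp\s+" + re.escape(pnp_id) + r"\s+(on|off)\b",
        re.MULTILINE,
    )
    match = pattern.search(devicetree_text)
    return match.group(1) if match else None


# Hypothetical excerpt modelled on the lines touched by the patch.
SAMPLE = """\
device pnp 2e.a on end # ACPI
device pnp 2e.b off # HW Monitor
    io 0x60 = 0x290
end
"""

if __name__ == "__main__":
    print(hwmon_state(SAMPLE))
```

In practice one would read the text from `src/mainboard/asus/kgpe-d16/devicetree.cb` in a patched tree and pass it to the function.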
parent 2d5f5d5bee
commit c0c6141971
@@ -0,0 +1,53 @@
From 9ab2d370ea17e86301d8de4134f4c0abf82b211e Mon Sep 17 00:00:00 2001
From: Your Name <you@example.com>
Date: Mon, 12 Jun 2023 02:29:45 -0700
Subject: [PATCH 2/4] kgpe-d16: do not enable hw monitor until kernel boots

The hardware monitor is one of the blocks within the w83667hg-a chip
(there are many others). It is basically a bunch of ADCs (analog to
digital converters) hooked up to voltage, current, and temperature
sensors in various locations on the motherboard.

This block has the ability to generate several different interrupts
(SMI#, OVT#, etc) in response to thermal conditions. It appears to
sometimes (about 10% of boot-ups, depending on temperature) spew
erroneous alarm interrupts the instant you enable it, when doing so
from within coreboot. This causes the w83667hg-a chip and the
entire system to hang, and the watchdog cannot recover from this
state because it is part of the w83667hg-a chip too.

An even bigger problem is that the hardware monitor is initialized
*before* the fans are brought up to full speed. So if the CPU is
above the critical temperature it will remain there because the fans
are in their default low-speed boot state. The chip just keeps
getting hotter and hotter -- not enough to damage itself, but hot
enough that it won't come down to an acceptable temperature with
simple reboots and power-cycles; you have to leave the system off
for a while. Since the fans aren't running while the system is off
this takes quite a while (several minutes). It's a very fussy and
fidgety process, and not something you want to walk a remote-hands
guy at the datacenter through over the phone.

To avoid this whole mess, let's simply not assign PNP resources to
the hardware monitor from coreboot. Linux doesn't need these
anyways; it communicates with the hardware using I2C.
---
 src/mainboard/asus/kgpe-d16/devicetree.cb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mainboard/asus/kgpe-d16/devicetree.cb b/src/mainboard/asus/kgpe-d16/devicetree.cb
index 3be328d1725..5725fce4a09 100644
--- a/src/mainboard/asus/kgpe-d16/devicetree.cb
+++ b/src/mainboard/asus/kgpe-d16/devicetree.cb
@@ -206,7 +206,7 @@ chip northbridge/amd/amdfam10/root_complex # Root complex
 	device pnp 2e.209 off end # GPIO4
 	device pnp 2e.309 off end # GPIO5
 	device pnp 2e.a on end # ACPI
-	device pnp 2e.b on # HW Monitor
+	device pnp 2e.b off # HW Monitor
 		io 0x60 = 0x290
 		# IRQ purposefully not assigned to prevent lockups
 	end
--
2.39.1