This blog shares our knowledge and expertise on Linux system administration and VMware administration.

Wednesday, February 17, 2016

ESXi 5.0 host experiences a purple diagnostic screen with the errors "Failed to ack TLB invalidate" or "no heartbeat" on HP servers with PCC support

Symptom: An ESXi 5.0 host fails with a purple diagnostic screen.

The purple diagnostic screen or core dump contains messages similar to:

PCPU 39 locked up. Failed to ack TLB invalidate (total of 1 locked up, PCPU(s): 39).
0x41228efc7b88:[0x41800646cd62]Panic@vmkernel#nover+0xa9 stack: 0x41228efe5000
0x41228efc7cb8:[0x4180064989af]TLBDoInvalidate@vmkernel#nover+0x45a stack: 0x41228efc7ce8

@BlueScreen: PCPU 0: no heartbeat, IPIs received (0/1)....

0x4122c27c7a68:[0x41800966cd62]Panic@vmkernel#nover+0xa9 stack: 0x4122c27c7a98
0x4122c27c7ad8:[0x4180098d80ec]Heartbeat_DetectCPULockups@vmkernel#nover+0x2d3 stack: 0x0

NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x7eb2e(0x418009600000):0x4122c2307688:0x4010](Src 0x1, CPU140)

Heartbeat: 618: PCPU 140 didn't have a heartbeat for 8 seconds. *may* be locked up
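
If you have a saved copy of the purple screen text or a log extract, a small script can check it for the messages shown above. This is only a rough sketch: the patterns are taken from the sample lines in this post, the names SIGNATURES and find_pcc_symptoms are made up for illustration, and real messages may be worded differently. A match shows the symptom is present, not that PCC is the cause.

```python
import re

# Patterns based only on the sample messages quoted in this post (assumption:
# real logs use the same wording). Each pattern captures the PCPU number.
SIGNATURES = {
    "tlb_invalidate": re.compile(r"PCPU (\d+) locked up\. Failed to ack TLB invalidate"),
    "no_heartbeat": re.compile(r"PCPU (\d+): no heartbeat"),
    "heartbeat_warning": re.compile(r"PCPU (\d+) didn't have a heartbeat for \d+ seconds"),
}


def find_pcc_symptoms(log_text):
    """Return a list of (signature_name, pcpu_number) tuples found in log_text."""
    hits = []
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits.append((name, int(match.group(1))))
    return hits


if __name__ == "__main__":
    sample = (
        "PCPU 39 locked up. Failed to ack TLB invalidate (total of 1 locked up, PCPU(s): 39).\n"
        "@BlueScreen: PCPU 0: no heartbeat, IPIs received (0/1)....\n"
        "Heartbeat: 618: PCPU 140 didn't have a heartbeat for 8 seconds. *may* be locked up\n"
    )
    for name, pcpu in find_pcc_symptoms(sample):
        print(name, pcpu)
```

Any hit is a reason to compare the host against the cause described below, nothing more.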

Cause: On some HP servers, the PCC (Processor Clocking Control, also called Collaborative Power Control) communication between the VMware ESXi kernel (VMkernel) and the server BIOS does not function correctly.
As a result, one or more PCPUs may remain in SMM (System Management Mode) for many seconds. When the VMkernel notices a PCPU is not available for an extended period of time, a purple diagnostic screen occurs.

Solution:

This issue is resolved in ESXi 5.0 Update 2, where PCC is disabled by default.
To work around this issue in versions prior to ESXi 5.0 U2, disable PCC manually.
To disable PCC:

    Connect to the ESXi host using the vSphere Client.
    Click the Configuration tab.
    In the Software menu, click Advanced Settings.
    Select vmkernel.
    Deselect the vmkernel.boot.usePCC option.
    Restart the host for the change to take effect.
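
When checking several hosts, the rule above (the workaround is only needed on ESXi 5.0 before Update 2) can be written as a small helper. This is a minimal sketch: needs_pcc_workaround is a hypothetical function, and it assumes you already know each host's version string and update level. It does not query the host and it does not check whether the server is an HP model with PCC support.

```python
def needs_pcc_workaround(version, update_level):
    """Return True if the manual PCC workaround from this post applies.

    version      -- ESXi version as a string, e.g. "5.0" (hypothetical input)
    update_level -- update number as an int, 0 for a release with no update

    The post only covers ESXi 5.0; any other version returns False here,
    which means "not covered by this post", not "known to be safe".
    """
    return version == "5.0" and update_level < 2


if __name__ == "__main__":
    # Hypothetical inventory: (host name, version, update level).
    hosts = [("esx01", "5.0", 1), ("esx02", "5.0", 2)]
    for name, version, update in hosts:
        if needs_pcc_workaround(version, update):
            print(name, "disable PCC manually")
        else:
            print(name, "no manual change needed per this post")
```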
