Decoding Ryzen Machine Check Exceptions in Linux

2022/06/15

Tags: linux AMD Ryzen

First, once your computer has rebooted, grep through the logs for MCE information. The kernel should print it on the first boot after the MCE.

 journalctl | grep 'mce:'
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: CPU 11: Machine Check: 0 Bank 5: bea0000001000108
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffabdf1106 MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000 
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1646673544 SOCKET 0 APIC 16 microcode a201016
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000001000108
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: TSC 0 ADDR 7f96d7c641d9 MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1655312140 SOCKET 0 APIC a microcode a201016
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: CPU 31: Machine Check: 0 Bank 5: bea0000000000108
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffb304100e MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1655312140 SOCKET 0 APIC 1f microcode a201016
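
If there are many events, the bank number and error code can be pulled out of the "Machine Check" lines with a small script. This is a minimal Python sketch, not part of any tool mentioned here; the function name extract_mce_events and the regular expression are hypothetical and assume the log format shown above.

```python
import re

# Assumed format, taken from the log lines above:
#   "... CPU 11: Machine Check: 0 Bank 5: bea0000001000108"
MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]+)"
)


def extract_mce_events(log_text):
    """Return a list of (cpu, bank, status_hex) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = MCE_LINE.search(line)
        if match:
            cpu, bank, status = match.groups()
            events.append((int(cpu), int(bank), status))
    return events


if __name__ == "__main__":
    # Two lines copied from the journal output above.
    sample = (
        "Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: "
        "CPU 11: Machine Check: 0 Bank 5: bea0000001000108\n"
        "Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: "
        "CPU 31: Machine Check: 0 Bank 5: bea0000000000108\n"
    )
    for cpu, bank, status in extract_mce_events(sample):
        print(f"CPU {cpu}: bank {bank}, status {status}")
```

Each tuple holds the CPU number plus the two values the decoder expects: the bank and the error code.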

Next, download the decoder tool from https://github.com/DimitriFourny/MCE-Ryzen-Decoder

Then run it with the bank number and error code as parameters. Both come from the "Machine Check" line, which here reads "Bank 5: bea0000001000108":

 ./run.py 5 bea0000001000108
Bank: Execution Unit (EX)
Error: Watchdog Timeout error (WDT 0x0)
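
As a rough cross-check, the generic flag bits at the top of the error code can be decoded by hand. The sketch below is an illustration only and does not reproduce the decoder tool's logic. It assumes the usual x86 MCA status layout for bits 63 to 57 and, as a further assumption that should be verified against AMD's Processor Programming Reference for your CPU, that the extended error code sits in bits 21:16 and the error code in bits 15:0. The name decode_status is hypothetical.

```python
# Assumed bit positions of the generic MCA status flags (bits 63..57).
STATUS_FLAGS = {
    63: "Val",       # the register holds a valid error
    62: "Overflow",  # a second error arrived before this one was cleared
    61: "UC",        # uncorrected error
    60: "En",        # error reporting was enabled
    59: "MiscV",     # the MISC value is valid
    58: "AddrV",     # the ADDR value is valid
    57: "PCC",       # processor context corrupt
}


def decode_status(status_hex):
    """Split a status value (hex string) into flags and error code fields."""
    status = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        # Assumption: extended error code in bits 21:16.
        "ext_error_code": (status >> 16) & 0x3F,
        # Assumption: error code in bits 15:0.
        "error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    print(decode_status("bea0000001000108"))
```

Under these assumptions the extended error code for bea0000001000108 comes out as 0, which is consistent with the "WDT 0x0" in the decoder output, and the UC and PCC flags are set, which fits an error that forced a reboot.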