
Decoding Ryzen Machine Check Exceptions in Linux


First, once your computer has rebooted, search the system logs for machine check exception (MCE) information. The kernel should print it on the first boot after the MCE.

 journalctl | grep 'mce:'
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: CPU 11: Machine Check: 0 Bank 5: bea0000001000108
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffabdf1106 MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000 
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1646673544 SOCKET 0 APIC 16 microcode a201016
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000001000108
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: TSC 0 ADDR 7f96d7c641d9 MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1655312140 SOCKET 0 APIC a microcode a201016
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: CPU 31: Machine Check: 0 Bank 5: bea0000000000108
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffb304100e MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
Jun 15 17:55:47 ed-work kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1655312140 SOCKET 0 APIC 1f microcode a201016
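
With several events in the log, it can help to pull out just the CPU, bank number and status code from each "Machine Check" line. The following is a minimal sketch, assuming the log lines follow the exact format shown above; the function name `extract_mce_events` and the `SAMPLE` text are illustrative and not part of any existing tool.

```python
import re

# Matches lines such as:
#   ... CPU 11: Machine Check: 0 Bank 5: bea0000001000108
# Assumption: the kernel log format is the one shown in the output above.
MCE_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check: \d+ "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]+)"
)


def extract_mce_events(log_text):
    """Return a list of (cpu, bank, status) tuples, one per machine check line."""
    events = []
    for line in log_text.splitlines():
        match = MCE_LINE.search(line)
        if match:
            events.append(
                (int(match["cpu"]), int(match["bank"]), match["status"])
            )
    return events


# Two lines copied from the log above, used as sample input.
SAMPLE = """\
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: Machine check events logged
Mar 07 17:19:10 ed-work kernel: mce: [Hardware Error]: CPU 11: Machine Check: 0 Bank 5: bea0000001000108
"""

if __name__ == "__main__":
    for cpu, bank, status in extract_mce_events(SAMPLE):
        print(f"cpu={cpu} bank={bank} status={status}")
```

To run it against the real journal, the output of `journalctl | grep 'mce:'` could be saved to a file and its contents passed to `extract_mce_events` in place of `SAMPLE`.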

Next, download the decoder tool from https://github.com/DimitriFourny/MCE-Ryzen-Decoder

Then run it with the bank number and the error code as parameters. Both appear on the "Machine Check" line of the log, here Bank 5 and bea0000001000108:

 ./run.py 5 bea0000001000108 
Bank: Execution Unit (EX)
Error: Watchdog Timeout error (WDT 0x0)
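
Independently of the decoder tool, the top bits of the status value can be read by hand. This is a rough sketch, assuming the architectural x86 machine check status layout for bits 63 to 57 (VAL, OVER, UC, EN, MISCV, ADDRV, PCC); it does not interpret the lower, vendor-specific fields, and the function name `decode_status_flags` is made up for illustration.

```python
# Assumption: bits 63..57 of the status value follow the architectural
# x86 MCA layout. Lower bits are model specific and are not decoded here.
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC value is valid
    58: "ADDRV",  # the ADDR value is valid
    57: "PCC",    # processor context may be corrupt
}


def decode_status_flags(status_hex):
    """Return the names of the flag bits set in a hex status string."""
    status = int(status_hex, 16)
    return [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]


if __name__ == "__main__":
    print(decode_status_flags("bea0000001000108"))
```

For the status code used above, this reports every flag except OVER as set, which under the stated assumption describes a valid, uncorrected error with usable ADDR and MISC values.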