My laptop is working just fine. It’s from 2018 and it has an NVME drive.
It has an EFI boot partition and other partition with LUKS and LVM on top of that.
Since this week I see these logs from time to time:
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: device [8086:34b6] error status/mask=00000001/00002000
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: [ 0] RxErr (First)
Mar 07 17:31:14 almendra kernel: pcieport 0000:00:1d.6: AER: Error of this Agent is reported first
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0: device [8086:0975] error status/mask=00000001/00002000
Mar 07 17:31:14 almendra kernel: nvme 0000:02:00.0: [ 0] RxErr (First)
The devices are:
$ lspci -vv | grep 1d.6
00:1d.6 PCI bridge: Intel Corporation Device 34b6 (rev 30) (prog-if 00 [Normal decode])
$ lspci -vv | grep 02:00.0
02:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier] (prog-if 02 [NVM Express])
The laptop works like always, but I have the impression that the NVME drive is telling me something bad.
It happens from time to time:
$ journalctl --since yesterday | grep -c "nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical"
9
Do you know what does it mean?
$ sudo smartctl -a /dev/nvme0 [sudo] password for ****: smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.7.6-arch1-2] (local build) Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Number: INTEL HBRPEKNX0202A Serial Number: BTTE95101RQM512B-1 Firmware Version: G002 PCI Vendor/Subsystem ID: 0x8086 IEEE OUI Identifier: 0x5cd2e4 Controller ID: 1 NVMe Version: 1.3 Number of Namespaces: 1 Namespace 1 Size/Capacity: 512,110,190,592 [512 GB] Namespace 1 Formatted LBA Size: 512 Local Time is: Fri Mar 8 12:09:53 2024 CET Firmware Updates (0x14): 2 Slots, no Reset required Optional Admin Commands (0x0016): Format Frmw_DL Self_Test Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Maximum Data Transfer Size: 32 Pages Warning Comp. Temp. Threshold: 77 Celsius Critical Comp. Temp. Threshold: 80 Celsius Supported Power States St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat 0 + 3.50W - - 0 0 0 0 0 0 1 + 2.70W - - 1 1 1 1 0 0 2 + 2.00W - - 2 2 2 2 0 0 3 - 0.0250W - - 3 3 3 3 2000 5000 4 - 0.0040W - - 4 4 4 4 5000 9000 Supported LBA Sizes (NSID 0x1) Id Fmt Data Metadt Rel_Perf 0 + 512 0 0 === START OF SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED SMART/Health Information (NVMe Log 0x02) Critical Warning: 0x00 Temperature: 30 Celsius Available Spare: 100% Available Spare Threshold: 10% Percentage Used: 32% Data Units Read: 6,877,173 [3.52 TB] Data Units Written: 9,397,485 [4.81 TB] Host Read Commands: 54,359,124 Host Write Commands: 239,213,047 Controller Busy Time: 2,412 Power Cycles: 536 Power On Hours: 6,350 Unsafe Shutdowns: 62 Media and Data Integrity Errors: 0 Error Information Log Entries: 0 Warning Comp. Temperature Time: 0 Critical Comp. Temperature Time: 0 Error Information (NVMe Log 0x01, 16 of 256 entries) No Errors Logged Self-test Log (NVMe Log 0x06) Self-test status: No self-test in progress Num Test_Description Status Power_on_Hours Failing_LBA NSID Seg SCT Code 0 Extended Completed without error 6334 - - - - - 1 Short Completed without error 6334 - - - - -