Easa Posted November 26

Basically everything on the PCIe bus: some Intel stuff, then Nvidia, Samsung NVMe drives... I'll try to get a complete list.

ACE-Floodland Project
CPU: Intel Core Ultra9 285K | MBD: MSI MEG Z890 ACE | RAM: G.Skill Z5CK 48GB 8400/40 | GPU: Gainward Phantom RTX 5090 GS 32GB | OS SSD: Intel Optane 905P M.2 380GB | STORAGE: 4x Intel Optane 905P U.2 1.5TB / 2x Kingston DC600M 960GB | PSU: CoolerMaster X Silent Edge Platinum 1100W | CASE: Lian Li V3000+
COOLING: CPU WB: Aquacomputer Cuplex Kryos NEXT NiSG | GPU WB: Watercool Heatkiller V Ultra 5090 | PUMP/TOP: Aquacomputer Ultitop Dual Brass / 2x AQC D5 Next | EXP: AQC Ultitube 150 / EK-Quantum Volume FLT 360 | SENSOR: AQC High Flow Next | RAD: 4x HardwareLabs SR2 480MP | FITTINGS: Bitspower Black Sparkle / 4x Koolance QD3 | TUBING: EK-ZMT 16/10 | FAN: 16x Phanteks T30-120 / 4x Noctua NF-A12x25 G2 / 2x Noctua NF-A4x20
SCREEN: Sony Inzone M9II 27" 4K | MOUSE: Razer Naga V2 Pro | KBD: Razer Huntsman V2 | PAD: Asus ROG Scabbard II | DAC: RME ADI2 DAC FS | HP: BeyerDynamic DT880 250Ω
MWS: Dell Pro Max Plus 18 | CPU: Ultra9 285HX | GPU: RTX Pro 4000 BW | RAM: 2x32GB CSODIMM 6400 | SCREEN: IPS 2560x1600 120Hz | SSD: PM9E1 1TB + 2x 990 PRO 2/4TB
MyPC8MyBrain Posted November 26

Check the WHEA + boot event stack. Open Event Viewer and filter for these:
Kernel-Power Event 41 / EventLog Event 6008 (unexpected shutdown)
System events 14, 1 and 0 around boot

You already saw WHEA 17; now verify whether you also see any of these higher-impact entries:
WHEA 1, 18, 19: CPU internal errors (these are far more serious than 17)
WHEA 20: PCIe fatal error at the root complex
DistributedCOM 10016 after boot: usually caused by the Dell power service hammering Windows
ACPI 15 during POST: Embedded Controller firmware instability

If you're seeing ACPI 15 + WHEA 17 together, that's a blinking neon sign of PCIe root-port resets triggered by bad power-state handling in firmware. It's only a warning at the OS level, but the root cause sits below it.

The impossible is not impossible, it just hasn't been done yet.
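If you prefer the command line to clicking through Event Viewer filters, the WHEA part of this check can be scripted. A minimal sketch that only builds (and on Windows, runs) a `wevtutil qe` query; it assumes the events live in the System log under the provider name `Microsoft-Windows-WHEA-Logger`, and `whea_xpath` / `wevtutil_args` are hypothetical helper names.

```python
import subprocess
import sys

# Assumption: WHEA events are in the System log under this provider name.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_xpath(event_ids):
    """Build an Event Log XPath filter for the given WHEA-Logger event IDs."""
    ids = sorted({int(i) for i in event_ids})
    if not ids:
        raise ValueError("need at least one event ID")
    clause = " or ".join(f"EventID={i}" for i in ids)
    return f"*[System[Provider[@Name='{WHEA_PROVIDER}'] and ({clause})]]"


def wevtutil_args(event_ids, count=100):
    """Argument list for `wevtutil qe System` with that filter."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{whea_xpath(event_ids)}",
        "/f:text",      # readable text instead of XML
        "/rd:true",     # newest events first
        f"/c:{count}",  # cap the number of events returned
    ]


if __name__ == "__main__":
    # 17 = the corrected error already seen; 1, 18, 19, 20 = the higher-impact IDs listed above.
    args = wevtutil_args([17, 1, 18, 19, 20], count=50)
    if sys.platform == "win32":
        subprocess.run(args, check=True)
    else:
        print(args)  # off Windows, only show the command that would run
```

Passing the arguments as a list lets `subprocess` handle quoting of the XPath, which contains spaces.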
Easa Posted November 26

14 minutes ago, MyPC8MyBrain said: Check the WHEA + boot event stack. [...]

No unexpected shutdowns.
No WHEA errors apart from 17.
Two ACPI 15 errors, which always come at the same time as the whole WHEA 17 stack.
These devices, for instance, throw errors, with multiple entries. It's 41 errors in total:
PCI\VEN_105B&DEV_E11D&SUBSYS_E11D105B&REV_00
PCI\VEN_10DE&DEV_22E9&SUBSYS_000010DE&REV_A1
PCI\VEN_144D&DEV_A80C&SUBSYS_A801144D&REV_00
PCI\VEN_144D&DEV_A810&SUBSYS_A801144D&REV_00
PCI\VEN_8086&DEV_272B&SUBSYS_40F08086&REV_1A
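To map hardware IDs like these to vendors without guessing, the `VEN`, `DEV`, `SUBSYS` and `REV` fields can be split out mechanically. A minimal sketch; the vendor table is a hand-picked subset of PCI-SIG vendor IDs (not a full database), and `parse_hwid` is a hypothetical helper name.

```python
import re
from dataclasses import dataclass

# Hand-picked PCI-SIG vendor IDs seen in this thread (not a full database).
VENDORS = {"8086": "Intel", "10DE": "NVIDIA", "144D": "Samsung",
           "105B": "Foxconn", "1028": "Dell"}

# PCI\VEN_vvvv&DEV_dddd&SUBSYS_ssssnnnn&REV_rr
# ssss = subsystem ID, nnnn = subsystem vendor ID.
HWID_RE = re.compile(
    r"PCI\\VEN_(?P<ven>[0-9A-F]{4})&DEV_(?P<dev>[0-9A-F]{4})"
    r"&SUBSYS_(?P<ssid>[0-9A-F]{4})(?P<ssvid>[0-9A-F]{4})"
    r"&REV_(?P<rev>[0-9A-F]{2})",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class PciId:
    vendor: str
    device: str
    subsys_device: str
    subsys_vendor: str
    revision: str

    @property
    def vendor_name(self) -> str:
        return VENDORS.get(self.vendor, "unknown")


def parse_hwid(hwid: str) -> PciId:
    """Split a Windows PCI hardware ID into its fields (hypothetical helper)."""
    m = HWID_RE.fullmatch(hwid.strip())
    if m is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    g = {k: v.upper() for k, v in m.groupdict().items()}
    return PciId(g["ven"], g["dev"], g["ssid"], g["ssvid"], g["rev"])


if __name__ == "__main__":
    ids = [
        r"PCI\VEN_105B&DEV_E11D&SUBSYS_E11D105B&REV_00",
        r"PCI\VEN_10DE&DEV_22E9&SUBSYS_000010DE&REV_A1",
        r"PCI\VEN_144D&DEV_A80C&SUBSYS_A801144D&REV_00",
        r"PCI\VEN_144D&DEV_A810&SUBSYS_A801144D&REV_00",
        r"PCI\VEN_8086&DEV_272B&SUBSYS_40F08086&REV_1A",
    ]
    for hwid in ids:
        p = parse_hwid(hwid)
        print(f"{p.vendor_name:8} VEN {p.vendor} DEV {p.device} "
              f"subsystem {p.subsys_vendor}:{p.subsys_device}")
```

The subsystem vendor is often the more telling field on a laptop: for example, `SUBSYS_0D131028` on the GPU identifies a Dell-specific board (vendor `1028`).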
MyPC8MyBrain Posted November 26

Next, check the PCIe root ports for reset loops. Run:

powercfg /devicequery wake_armed
pnputil /enum-devices | findstr "272B 2723 271F 272C 10DE 144D"

Look for:
Do the PCIe root ports reset repeatedly at boot?
Does Windows show link-state flapping, or ports re-training over and over?
Is anything on the PCIe bus reinitializing in a loop?

Warnings are fine. Repeated resets are not. That's not "normal"; that's Dell's ASPM implementation losing its mind.

Force PCIe link-state reporting and check the active power scheme state:

powercfg /qh SCHEME_CURRENT SUB_PCIEXPRESS PCIEXPRESS_ASPM_STATE

You want one answer: "0 = ASPM Off". If it shows anything else, turn it off before doing anything further.

Kill ASPM properly (likely the actual root of this mess). Dell turns on ASPM at the PCIe root ports for battery savings, but on 2025 Intel HX + Nvidia Blackwell + PCIe Gen5 NVMe, their implementation is flat-out unstable. Disable it:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\pci\Parameters" /v "ASPMOptOut" /t REG_DWORD /d 1 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Power" /v "EnableASPM" /t REG_DWORD /d 0 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Power" /v "PlatformAoAcOverride" /t REG_DWORD /d 0 /f

Then reboot. This has to be done first, or you'll be chasing ghosts forever. (Note that PlatformAoAcOverride = 0 also disables Modern Standby.)

After ASPM is off: if the WHEA warnings still show up after a reboot, only then move on to storage and replace the Windows inbox NVMe driver with Samsung's standard NVMe driver (through the Samsung Magician driver package). Don't do this step until the WHEA + ACPI + PCIe layer is calm.
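To put "warnings are fine, repeated resets are not" into numbers, WHEA 17 events exported from Event Viewer can be grouped per device into bursts. A minimal sketch, assuming you already have `(device ID, seconds since boot)` pairs; the 5-second quiet gap and the one-burst limit are arbitrary assumptions, not thresholds defined by Windows or Dell, and the function names are hypothetical.

```python
from collections import defaultdict


def group_bursts(timestamps, gap_s=5.0):
    """Split event times (seconds) into bursts separated by more than gap_s of quiet."""
    bursts = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= gap_s:
            bursts[-1].append(t)   # still inside the current burst
        else:
            bursts.append([t])     # quiet gap exceeded: a new burst starts
    return bursts


def flag_repeating(events, gap_s=5.0, max_bursts=1):
    """events: iterable of (device_id, seconds_since_boot).

    Returns {device_id: burst_count} for devices whose WHEA 17 events come
    back in more than max_bursts separate bursts, i.e. devices that keep
    re-reporting instead of logging one burst during POST and going quiet.
    """
    per_device = defaultdict(list)
    for dev, t in events:
        per_device[dev].append(t)
    flagged = {}
    for dev, times in per_device.items():
        n = len(group_bursts(times, gap_s))
        if n > max_bursts:
            flagged[dev] = n
    return flagged
```

A device that logs one burst during POST and then goes quiet is not flagged; one that keeps coming back in separate bursts is.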
Easa Posted November 26

powercfg /qh SCHEME_CURRENT SUB_PCIEXPRESS PCIEXPRESS_ASPM_STATE throws "invalid parameter".

I have run a powercfg /energy report, and it says that PCIe ASPM is disabled when running on AC power.

What I am also observing is that the machine sometimes freezes for about 2 seconds for no apparent reason: sometimes when maximizing a window, sometimes when launching an app or opening a context menu. No clear link to anything.
MyPC8MyBrain Posted November 26

Good, that confirms ASPM is off on AC, so the PCIe power state isn't the smoking gun.

You could have run:

powercfg /qh SCHEME_CURRENT SUB_PCIEXPRESS ASPM

If that also errors:

powercfg -query SCHEME_CURRENT SUB_PCIEXPRESS ASPM

What you want to see in the output is 0x0 (or Off), meaning ASPM is disabled. For good measure, apply the suggested registry entries, reboot (the system already reports ASPM as off anyway), and check whether the warnings still persist.
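The "0x0 or Off" check can be automated. A minimal sketch that runs the corrected `powercfg /qh SCHEME_CURRENT SUB_PCIEXPRESS ASPM` query on Windows and reads the current AC and DC indexes. It assumes the usual index mapping of that setting (0 = Off, 1 = Moderate power savings, 2 = Maximum power savings) and that the last two 0x-prefixed values in the output are the AC and DC indexes; `aspm_state` is a hypothetical helper name.

```python
import re
import subprocess
import sys

# Index values of the "Link State Power Management" setting (SUB_PCIEXPRESS ASPM).
ASPM_LABELS = {0: "Off", 1: "Moderate power savings", 2: "Maximum power savings"}

# Current AC/DC indexes are printed as 0x-prefixed 8-digit hex; GUIDs never contain "0x".
HEX_RE = re.compile(r"0x([0-9a-fA-F]{8})")


def aspm_state(powercfg_output: str) -> dict:
    """Map the current AC/DC ASPM indexes in powercfg output to labels.

    Assumption: the last two 0x-values are the AC and DC indexes, in that order.
    Matching the hex values instead of the English labels keeps this working
    on localized Windows (e.g. Czech output).
    """
    values = [int(h, 16) for h in HEX_RE.findall(powercfg_output)]
    if len(values) < 2:
        raise ValueError("AC/DC setting indexes not found")
    ac, dc = values[-2], values[-1]
    return {
        "AC": ASPM_LABELS.get(ac, f"unknown ({ac})"),
        "DC": ASPM_LABELS.get(dc, f"unknown ({dc})"),
    }


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(
        ["powercfg", "/qh", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(aspm_state(out))
```

Reading the numbers rather than the friendly names is the design choice here: a localized label such as "Vypnuté" would otherwise break a text match.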
MyPC8MyBrain Posted November 26

35 minutes ago, Easa said: ...am also observing that the machine sometimes freezes for like 2 seconds with no apparent reason

Even when ASPM is disabled, vendor drivers still run their own power polling and interrupt management under the OS radar. You can't fix it by searching logs alone; you fix it by trimming the excess until only the essentials remain.

Most likely suspects:
Realtek/Intel audio driver DPC spikes (these show up when menus open or apps launch, because system sounds try to initialize).
iGPU/dGPU handoff stalls (Optimus flapping). Not a "crash", just unstable driver behavior that Windows doesn't classify as a failure.
USB controller power polling at interrupt level. This shows up when opening a context menu, right-clicking, or maximizing a window triggers HID calls.

None of these will scream "error" in the logs. They produce invisible queue stalls.

Try disabling every Dell add-on service you don't actively use. Not the drivers that operate hardware, but the services that pretend to optimize things for you.

Turn off system sounds temporarily to test whether audio driver latency is involved: Control Panel > Sound > Sound Scheme > No Sounds. If the freezes vanish or reduce, that points straight at the audio stack.
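Because these stalls never reach the event log, one way to at least timestamp them is a user-mode heartbeat: a loop that sleeps briefly and records how late it wakes up. A minimal sketch; the 50 ms interval and 0.5 s threshold are arbitrary assumptions, and it only catches stalls long enough to starve an ordinary process (like the ~2-second freezes described), not per-driver DPC latency. The function names are hypothetical.

```python
import time


def find_stalls(ticks, interval_s, threshold_s):
    """Return (tick_before_stall, gap) for every gap longer than interval_s + threshold_s."""
    limit = interval_s + threshold_s
    return [(a, b - a) for a, b in zip(ticks, ticks[1:]) if b - a > limit]


def watch(duration_s=60.0, interval_s=0.05, threshold_s=0.5):
    """Sleep in short steps for duration_s and report late wake-ups (times relative to start)."""
    ticks = [time.monotonic()]
    end = ticks[0] + duration_s
    while ticks[-1] < end:
        time.sleep(interval_s)
        ticks.append(time.monotonic())
    t0 = ticks[0]
    return find_stalls([t - t0 for t in ticks], interval_s, threshold_s)


if __name__ == "__main__":
    # Short demo run; raise duration_s to cover a few minutes of normal use.
    for start, gap in watch(duration_s=3.0):
        print(f"stall of {gap:.2f} s at t={start:.1f} s")
```

Running it once with system sounds on and once with "No Sounds" gives a rough before/after count for the audio-stack test.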
Easa Posted November 26

5 hours ago, MyPC8MyBrain said: Good, that confirms ASPM is off on AC, so the PCIe power state isn't the smoking gun. [...]

Registry entries added. After running the command, the output shows "Vypnuté", which means Off.

The WHEA errors are still present; they were there even before I had the 990 Pro drives inside. Nothing has changed yet.

This: https://www.dell.com/support/kbdoc/en-us/000216115/laptops-with-12th-gen-and-12th-gen-hx-intel-core-processors-may-display-warning-message-whea-loggerid17 eh? 😄
MyPC8MyBrain Posted November 26

If turning Advanced Optimus off fixed the black screen, that already tells you the issue sits in the switching pipeline, not in the touchpad event.

The flood of WHEA-Logger Event 17 warnings on multiple PCIe endpoints during boot does not mean individual devices are failing. It means the system is struggling with PCIe lane initialization at POST, retrying link training before it settles.

Since the warnings were there even before adding the Samsung 990 Pro or making any drive changes, and they persist after disabling ASPM, the remaining root causes are:
the CPU PCIe controller (root complex), or
Dell's BIOS/EC layer mis-training lanes at boot, or
board-level power/clock signaling noise affecting PCIe during early boot.

A quick way to confirm the direction:
Remove all secondary PCIe storage.
Turn Advanced Optimus off in the BIOS.
Boot using only the main SSD.
Check whether the WHEA 17 warnings reduce or stop.

If the warnings still appear in the same volume, that points to a hardware or BIOS-level defect, most likely in the mainboard or the CPU PCIe path. No driver update is required for this kind of instability to surface; it can start after runtime state drift, stress, or long uptime, even when nothing visibly changed.

If stability matters more than battery life, running with a fixed MUX path or AO off is the reliable choice. But long term, a workstation laptop should not produce PCIe training retries at every boot. If it continues even with minimal population, it's a warranty case for the system.
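The A/B test above boils down to comparing WHEA 17 counts per device between a normal boot and a minimal boot. A minimal sketch, assuming each boot is exported as a list with one device ID per logged event; `min_drop` is an arbitrary assumption, since the post only distinguishes "reduce or stop" from "same volume", and both function names are hypothetical.

```python
from collections import Counter


def compare_runs(baseline, minimal):
    """Per-device WHEA 17 counts for two boots: {device: (baseline, minimal)}.

    Each argument is a list with one device ID per logged event.
    """
    a, b = Counter(baseline), Counter(minimal)
    return {dev: (a[dev], b[dev]) for dev in sorted(set(a) | set(b))}


def verdict(baseline, minimal, min_drop=0.5):
    """Read the A/B result with the rule of thumb from the post.

    min_drop is the fraction of warnings that must disappear to count as
    "reduced"; it is an assumed value, not one given in the thread.
    """
    before, after = len(baseline), len(minimal)
    if before == 0:
        return "no warnings in the baseline boot"
    if after == 0:
        return "stopped"
    if 1 - after / before >= min_drop:
        return "reduced"
    return "same volume: points to BIOS or board-level (mainboard/CPU PCIe path)"
```

The per-device table matters as much as the total: a drop that comes entirely from the removed drives says little about the GPU or Wi-Fi links.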
Easa Posted Thursday at 09:00 AM

10 hours ago, MyPC8MyBrain said: If Advanced Optimus off fixed the black screen, that already tells you the issue sits in the switching pipeline, not the touchpad event.

Yes, and I have agreed with you on this from the start, but as I said, I followed your instructions:
Clean uninstall of all GPU drivers, then installed the latest Intel driver (just the driver, without CC)
Installed the RTX driver (also the latest, full package)
Disabled PSR through the registry

After these steps, I decided to give it a try with AO still enabled, and the issue has not manifested yet. It crashed frequently during VLC video playback before; now it's all good and the battery life is great. If there are no more black-screen events or other crashes, I'll probably leave AO on.

10 hours ago, MyPC8MyBrain said: Since the warnings were there even before adding Samsung 990 Pro or any drive changes, and persist after disabling ASPM, the remaining root causes are [...]
Yes, the warnings were there right from the start, on the first day of running the system. I will try to shave off everything I can to pin down this issue, starting with a fixed MUX.

As for stability, except for the black screen, there is no issue so far (or I have not discovered any yet). The machine can run 24/7 without a single error entry; the errors appear only at POST (or when the screen wakes up).
Easa Posted Thursday at 09:56 AM

Running the dGPU via the BIOS MUX switch
Disabled WWAN and NFC
Disabled C-States for the dGPU

Now, however, the POST WHEA error count is reduced to 17, but the WHEA errors now pop up even during usage, without a specific trigger like a 3D workload. During 5 minutes of OS runtime, 14 more errors appeared; a few of them pop up each minute. Now only these devices are throwing errors:
PCI\VEN_8086&DEV_272B&SUBSYS_40F08086&REV_1A
PCI\VEN_10DE&DEV_2C39&SUBSYS_0D131028&REV_A1
PCI\VEN_10DE&DEV_22E9&SUBSYS_000010DE&REV_A1

I'm getting spontaneous ACPI 13 and 15 events now.

Total system power is now idling at 50-70 W even with no usage.
MyPC8MyBrain Posted Thursday at 03:16 PM

The devices throwing those boot warnings are PCIe endpoints. The IDs map to the Intel Wi-Fi 7 module, the Nvidia Blackwell GPU core, and the Nvidia audio controller, which runs through PCIe during early boot. The warning flood means the system is retrying PCIe lane initialization and link training at POST, then recovering before Windows fully boots.

Now that you also see WHEA 17 during normal use, plus ACPI 13/15 events and a 50-70 W idle draw, it's pointing at unstable lane bring-up and EC/BIOS power-state handling, not at individual devices failing. This is platform power and PCIe link instability showing up when endpoints attempt memory access over lanes that never fully stabilize after EC power gating or GPU switching. If it still repeats with one SSD and a fixed GPU path, the next stop is a warranty case for the system as a whole.

Don't hesitate to request an exchange. You paid for a premium workstation, so treat the warranty as part of what you purchased. Dell ships replacements while you keep the current unit, which gives you leverage. If the next one isn't solid, repeat the process calmly until they deliver a platform that boots cleanly and maintains stable PCIe links and power states.

Years ago, this is exactly how ThinkPads and Latitudes were handled: swap until it works. Today's Dell QC is hit-or-miss, so a systematic exchange cycle is often the fastest path to a clean, validated root complex and EC state. I've replaced Dell purchases multiple times before landing on a healthy unit. Once you get a stable chassis, migrate your data, hand back the old one, and move forward. Keep your current laptop until you confirm the incoming system is the one worth imaging. If it takes several rounds, so be it. That's how hardware quality has traditionally been forced out of vendors, and it still works if you stay polite and consistent.
Easa Posted Thursday at 07:01 PM

Thank you. I have already contacted Dell Support; they requested diagnostic results, a system log generated from PowerShell, the invoice, an adapter photo, a service tag photo, and various other things. I have requested a machine replacement.

The first-level support agent told me that the system is already 57 days old (which is weird, because I received the unit on 20 October) and that replacement is not possible past 30 days, but then he escalated it to the "L3 Team" and said they will consider it and ultimately decide on the solution.

They also mentioned a Collect & Return service that might take 5-12 days, but that is more or less unacceptable for me.
MyPC8MyBrain Posted Thursday at 08:13 PM

Here's the reality. When every major device on the PCIe bus logs WHEA-Logger Event 17 at boot, it doesn't mean the drives, Wi-Fi, or GPU are bad. It means the system had trouble training PCIe lanes at POST and had to retry the handshake before it stabilized. That's a firmware or board-level issue sitting above all endpoints. The spontaneous ACPI 13 and 15 events after boot point to the embedded controller and BIOS mishandling power states, and the 50-70 W idle draw shows the laptop isn't exiting early-boot high-power mode cleanly. The final symptom, the screen going black after an Advanced Optimus or MUX handoff without TDR logs, means the panel lost its display engine at the firmware level, not in Windows.

You bought a 2025 mobile workstation. A healthy system should bring up PCIe cleanly on the first try and never drop the panel without leaving a proper error trail. This isn't about drivers or storage population. This is a foundation problem, and the correct path is replacement, not shipping your only unit away for over a week. Being without your only unit for 5-12 days is unacceptable for a workstation role.

Replacement without Collect & Return is possible, but you must frame it as a workflow disruption and a hardware foundation fault, not a driver issue. Dell escalation teams (L3/L3.5/L4) normally ship a replacement first when it's framed as a "fault in platform bring-up and panel power state corruption." Don't let support push you into a collect-and-return cycle unless they commit to a full swap.

Stay polite, stay consistent, and make the warranty work for you; that's the only leverage we still have to ensure vendors deliver stable hardware. It's your money, time, and data they are putting on the line with their hit-or-miss QC strategy. They should be grateful you are giving them another try instead of asking for a full refund.
yslalan Posted Friday at 08:47 PM

I have been experiencing a persistent WHEA 17 error on my 7680 (reported only on the dGPU). Even swapping the GPU module hasn't resolved the issue. However, it doesn't seem to affect overall stability or performance, so I haven't made significant efforts to troubleshoot it.

Precision 7680 i9-13950HX - NVIDIA RTX 5000 Ada 16G - 96G DDR5 - UHD+ Display - 3840*2400 OLED - 6T NVMe
Dell Pro Max 16 Plus Ultra9-285HX - NVIDIA RTX Pro 4000 Blackwell - 16G DDR5 - UHD+ Display - 3840*2400 OLED - 512G NVMe
yslalan Posted Friday at 08:53 PM

On 11/27/2025 at 2:01 PM, Easa said: The lower level employee told me that the system is already 57 days old

The device's age is determined solely by its shipping date, which reflects the day it was manufactured, not the day you received it.

It's concerning that this may be related to a PCB circuit design problem, which may not be resolved through a system exchange. I recommend requesting a refund and considering other models.
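The two dates in this exchange can be reconciled with plain date arithmetic. A minimal sketch, assuming the support contact happened on the day of Easa's post (27 November 2025) and that the 57 days are counted back from that day; it shows the start date implied by the support figure, which would fit a ship date rather than the delivery date.

```python
from datetime import date, timedelta

received = date(2025, 10, 20)     # delivery date Easa gives
contacted = date(2025, 11, 27)    # assumption: support was contacted on the day of the post
claimed_age_days = 57             # age quoted by Dell support
window_days = 30                  # replacement window quoted by Dell support

days_since_delivery = (contacted - received).days
implied_start = contacted - timedelta(days=claimed_age_days)

print("days since delivery:", days_since_delivery)
print("start date implied by the support figure:", implied_start)
print("past the window counting from delivery:", days_since_delivery > window_days)
```

With these inputs, the delivery-based age is 38 days and the 57-day figure points back to 1 October 2025; either way, the 30-day window quoted by support had already passed.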
MyPC8MyBrain Posted Saturday at 12:12 AM

2 hours ago, yslalan said: Even swapping the GPU module hasn't resolved the issue

Did you test with the dGPU fully disabled or removed from the bus? You can run a clean test by cutting onboard dGPU power and rebooting. If the WHEA 17 warnings stop, it confirms the GPU endpoint is not failing in isolation, and the problem sits upstream in PCIe lane training or Embedded Controller/BIOS power sequencing.

A single WHEA 17 limited to one endpoint could be a one-off error corrected during boot, but once it repeats across several PCIe endpoints or appears after a GPU swap, it stops being a device issue and becomes a platform bring-up and lane-training problem.

The 2025 Dell HX + Blackwell workstation stack brings up PCIe lanes under very tight signal-integrity margins, with early power gating controlled by the BIOS and EC. If those layers mis-train or retain dGPU high-power states after POST, every endpoint sharing that root complex will log a corrected AER event (WHEA 17), even when Windows feels stable afterward.

If you're not seeing freezes, headless crashes, or ACPI 13/15 spam every minute, then you're sitting on a bus that ultimately settles after retries. That's why it looks fine until a power transition hits again. Traditionally, this is a motherboard or CPU PCIe root complex issue, best handled by replacement, not module swaps. If you ever start seeing black-screen flickers or runtime WHEA 17s increasing by the minute, you'll know the platform didn't settle cleanly. For now, it's stable only because it corrects itself on each boot, but a clean POST should not require correction at all.
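The rule of thumb in this post can be written down as a small decision function, if only to make its branches explicit. A sketch of the poster's heuristic, not a diagnostic tool; `classify` and its parameters are hypothetical, and treating any non-zero runtime rate as "did not settle" is a simplification of "increasing by the minute".

```python
def classify(boot_endpoints, runtime_events_per_min=0.0, gpu_swapped=False):
    """Encode the post's rule of thumb (hypothetical helper).

    boot_endpoints: IDs of PCIe endpoints that logged WHEA 17 during boot.
    runtime_events_per_min: WHEA 17 rate once the desktop is up.
    gpu_swapped: True if the warnings survived a GPU module swap.
    """
    endpoints = set(boot_endpoints)
    if runtime_events_per_min > 0:
        return "platform did not settle: runtime WHEA 17 keeps arriving"
    if not endpoints:
        return "clean POST"
    if len(endpoints) > 1 or gpu_swapped:
        return "platform bring-up / lane training, settles after retries"
    return "single endpoint: possibly a one-off corrected error"
```

Applied to the thread: a dGPU-only warning that survived a module swap lands in the bring-up branch, while 14 runtime errors in 5 minutes (2.8 per minute) lands in the "did not settle" branch.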
win32asmguy Posted Sunday at 05:55 PM

On 11/27/2025 at 1:13 PM, MyPC8MyBrain said: You bought a 2025 mobile workstation. A healthy system should bring up PCIe cleanly on the first try and never drop the panel without leaving a proper error trail. [...]

I have seen WHEA error 17 on multiple other Arrow Lake systems, including the Eluktronics Hydroc G2 and Lenovo Legion 9i Gen 10. Supposedly, setting PCIe Power State Management to disabled in the BIOS could fix this, but I have not tested that.

Desktop - Xeon W7-2495X, 64GB DDR5-6400 C32 ECC, 800GB Optane P5800X, MSI RTX 5090 Gaming Trio OC, Corsair HX1500i, Fractal Define 7 XL, Asus W790E-SAGE SE, Windows 10 Pro 22H2
Hydroc G2 / Uniwill IDY X6AR559Y - 275HX, 2x16GB DDR5-6400 CL38, 4TB WD SN850X, RTX 5090 mobile, 16.0 inch QHD+ 300hz MiniLED, Windows 11 Pro 24H2
MyPC8MyBrain Posted Sunday at 06:26 PM

@win32asmguy That lines up with what has been observed across multiple Arrow Lake systems: the platform is touchy with PCIe power-state transitions. Disabling PCIe Power State Management might mask the symptom by preventing the low-power handoff that triggers a retrain, but I don't think it explains the pattern here.

The key detail is this: most units with the same BIOS build are not throwing WHEA 17 across multiple endpoints during POST. If the BIOS setting alone were the root cause, the failure rate would be universal, not limited to a subset of machines.

That leaves two scenarios:
a BIOS/EC firmware bug that only manifests on hardware that's already marginal (signal integrity, lane quality, power-state timing), or
pure hardware variance on the board, where the firmware is exposing a weakness rather than causing it.

In both cases, the common thread is that healthy units don't log corrected link-training retries across the entire bus. That's why I lean toward a hardware-level instability that the BIOS setting can temporarily hide, not cure. It's still worth a try, and your advice is logical, but the failure pattern doesn't look like a pure firmware toggle issue. It behaves like hardware that only just meets the edge of the timing window and loses the race during POST.
yslalan Posted Sunday at 08:06 PM

2 hours ago, win32asmguy said: Eluktronics Hydroc G2

The interesting point is that the Hydroc G2 and the Mechrevo CangLong 16 series share the same chassis/hardware, both manufactured by the TongFang ODM. There have been some discussions about WHEA 17 errors related to the dGPU daughterboard on the CangLong 16 in Chinese laptop forums and video platforms.

I didn't look into it too deeply, but from what I skimmed, their theory is that this WHEA 17 issue is mainly caused by the PCIe bridge, which uses a design similar to Dell's DGFF solution.
Easa Posted Monday at 09:01 AM

15 hours ago, win32asmguy said: I have seen WHEA error 17 on multiple other Arrow Lake systems including the Eluktronics Hydroc G2 and Lenovo Legion 9i Gen 10. [...]

Funny thing: on ARL desktops, you REALLY want to have ASPM / PCIe Power State Management enabled in the BIOS, otherwise you get funny stuff like the iGPU not working or the NPU throwing an error.
OneSunOne Posted Monday at 07:15 PM On 11/22/2025 at 1:34 AM, Easa said: The laptop had been working fine for a month; since that day, I have been experiencing this weird issue. Without a clear link to any activity or app, whether browsing or watching a video, on battery or plugged in, the screen freezes, goes black (backlight off), then flashes two or three times and stays black with the backlight on, like when the MUX switches. After this, the laptop either freezes or keeps running but without an image, and I have to force a reset. Switching off Advanced Optimus in the BIOS seems to do the trick, but I would like to keep my battery life. Also, there was no issue for a month, and the drivers are the same. I have experienced this several times during the last few hours. Each time it happens, there is this entry in the event log, exactly like this one: Has anybody experienced this behaviour? Thank you. Yes, I have a similar issue with mine. It usually happens when I'm also playing audio or streaming video. The machine freezes, I hear a buzzing noise, and then the screen goes black. Sometimes it returns to Windows; other times I have to hard reset. I've noticed that unplugging the charging cable helps it return to Windows. Dell is coming to replace the system board at the beginning of next week.
Easa Posted 15 hours ago Today, I was greeted by a BSOD upon entering my credentials. 0x00000050 (0xffffe30be74da488, 0x0000000000000000, 0xfffff805e7dab1de, 0x0000000000000002) Where should I start to look? Any clues, please?
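The four values in parentheses after 0x00000050 are the bug check parameters. Going by Microsoft's bug check reference for 0x50, parameter 1 is the memory address that was referenced, parameter 2 is the kind of access (on Windows 10 1507 and later, 0 means read and 2 means write), parameter 3 is the address of the instruction that made the reference (if known), and parameter 4 is a page-fault type code. A small hypothetical decoder that labels only the first three and leaves parameter 4 raw:

```python
import re

# Access-type values for parameter 2 of bug check 0x50 (Windows 10 1507+).
# Anything else is reported as "other" rather than guessed.
ACCESS_TYPES = {0: "read", 2: "write"}


def parse_bugcheck(line: str) -> dict:
    """Parse a string like '0x00000050 (0x..., 0x..., 0x..., 0x...)'."""
    numbers = [int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]+", line)]
    if len(numbers) != 5:
        raise ValueError("expected a bug check code followed by four parameters")
    code, p1, p2, p3, p4 = numbers
    info = {"code": code, "params": (p1, p2, p3, p4)}
    if code == 0x50:
        info["name"] = "PAGE_FAULT_IN_NONPAGED_AREA"
        info["referenced_address"] = p1          # address the code tried to access
        info["access"] = ACCESS_TYPES.get(p2, f"other ({p2:#x})")
        info["faulting_ip"] = p3                 # instruction that made the access
        info["fault_type_raw"] = p4              # left undecoded on purpose
    return info
```

For the BSOD above this yields a read (parameter 2 = 0) of 0xffffe30be74da488 by code at 0xfffff805e7dab1de. Turning that instruction address into a driver name still requires the memory dump and a debugger.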
Aaron44126 (Author) Posted 12 hours ago 4 hours ago, Easa said: Where should I start to look? Any clues, please? 0x50 is PAGE_FAULT_IN_NONPAGED_AREA ...Not a fun one. Some kernel code (probably a driver) tried to read memory from a bad address. For clues, you need to open the memory dump in WinDbg and run the command "!analyze -v". It should hopefully point you to the driver that is to blame (look for "MODULE_NAME" and/or "FAULTING_MODULE" in the output). My money is on NVIDIA, just based on past experience. The BSOD screen itself sometimes shows the faulting driver (reported as a .sys file, like nvlddmkm.sys), so look out for that if it happens again. I can't point you more specifically to where to look, because I'm not really sure what the BSOD screen looks like now (didn't Microsoft change the layout and make it black instead of blue?)... I am using Windows less and less these days, as little as I can possibly get away with, and it has probably been over two years since I ran into a BSOD.
Apple MacBook Pro 16-inch, 2023 (personal) • Dell Precision 7560 (work) • Full specs below. Info posts (Windows): Turbo boost toggle • The problem with Windows 11 • About Windows 10/11 LTSC. Apple MacBook Pro 16-inch, 2023 (personal): M2 Max, 4 efficiency cores, 8 performance cores, 38-core Apple GPU, 96GB LPDDR5-6400, 8TB SSD, macOS 15 "Sequoia", 16.2" 3456×2234 120 Hz mini-LED ProMotion display, Wi-Fi 6E + Bluetooth 5.3, 99.6Wh battery, 1080p webcam, fingerprint reader. Also: iPhone 12 Pro 512GB, Apple Watch Series 8. Dell Precision 7560 (work): Intel Xeon W-11955M ("Tiger Lake"), 8×2.6 GHz base, 5.0 GHz turbo, hyperthreading ("Willow Cove"), 64GB DDR4-3200 ECC, NVIDIA RTX A2000 4GB, storage: 512GB system drive (Micron 2300) + 4TB additional storage (Sabrent Rocket Q4), Windows 11 Enterprise LTSC 2024, 15.6" 3840×2160 IPS display, Intel Wi-Fi AX210 (Wi-Fi 6E + Bluetooth 5.3), 95Wh battery, 720p IR webcam, fingerprint reader. Previous: Dell Precision 7770, 7530, 7510, M4800, M6700; Dell Latitude E6520; Dell Inspiron 1720, 5150; Dell Latitude CPi
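The WinDbg step can also be run non-interactively with `cdb`, the console debugger that ships in the same Debugging Tools for Windows package. A sketch, assuming `cdb.exe` is on PATH (or passed explicitly) and that the machine can reach Microsoft's public symbol server: `.symfix` sets that symbol path, `.reload` reloads symbols, and `q` quits once the analysis is printed. `run_analyze` and `extract_fields` are hypothetical helpers, and which of the listed fields actually appear varies between dumps and debugger versions.

```python
import re
import subprocess

# Fields in "!analyze -v" output that usually name the culprit.
FIELDS = ("MODULE_NAME", "IMAGE_NAME", "FAULTING_MODULE", "FAILURE_BUCKET_ID")


def run_analyze(dump_path: str, cdb_path: str = "cdb") -> str:
    """Open a dump with the console debugger and run !analyze -v (Windows only)."""
    result = subprocess.run(
        [cdb_path, "-z", dump_path, "-c", ".symfix; .reload; !analyze -v; q"],
        capture_output=True, text=True, errors="replace",
    )
    return result.stdout


def extract_fields(output: str) -> dict:
    """Collect the first 'KEY: value' line for each field of interest."""
    found = {}
    for line in output.splitlines():
        m = re.match(r"^\s*([A-Z_]+):\s*(.+?)\s*$", line)
        if m and m.group(1) in FIELDS and m.group(1) not in found:
            found[m.group(1)] = m.group(2)
    return found
```

A call such as `extract_fields(run_analyze(r"C:\Windows\MEMORY.DMP"))` returns a small dict; if `MODULE_NAME` or `IMAGE_NAME` names a .sys file such as nvlddmkm.sys, that points at the same driver the BSOD screen would show.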
SvenC Posted 11 hours ago In addition: if you are patient enough to wait for the automatic reboot after the BSOD, you might find a MEMORY.DMP in C:\Windows, or small dumps in C:\Windows\Minidump (it has been too long for me as well to remember the exact steps for getting the memory dump). If you find a dump file, try the WinDbg steps Aaron described above. If it is your dGPU, someone may be able to advise which version of the NVIDIA driver (from Dell or from NVIDIA's driver downloads?) to try with which version of the iGPU driver (from Dell or Intel downloads?). Some combinations might work better than others. DDU (Display Driver Uninstaller) might be your friend for starting over if your installed graphics drivers are in a messed-up state. Dell Pro Max Plus 18 * Ultra 7 265HX * 64GB CSoDIMM * 4TB, 2TB, 2560x1600, iGPU; previous: Dell Precision 7680 * i7-13850HX * 64GB SO-DIMM * 4TB, 2TB, 1920x1200; previous: Dell Precision 7740 * i7-9750H * 48GB * 512GB, 2TB, 4TB * RTX 3000 * 1920x1080
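By default Windows writes the kernel memory dump to %SystemRoot%\MEMORY.DMP and small memory dumps to %SystemRoot%\Minidump. Both locations can be changed under System Properties > Advanced > Startup and Recovery, and a dump is only written if dump creation is enabled there. A sketch that lists whatever is present in the default locations, newest first (`find_dumps` is a hypothetical helper; pass a different root if the paths were customized):

```python
import os
from pathlib import Path
from typing import List, Optional


def find_dumps(system_root: Optional[str] = None) -> List[Path]:
    """Return crash dump files under the Windows directory, newest first.

    Looks for the full/kernel dump (MEMORY.DMP) and the small dumps
    in the Minidump folder.
    """
    root = Path(system_root or os.environ.get("SystemRoot", r"C:\Windows"))
    candidates = []
    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        candidates.append(full_dump)
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(p for p in minidump_dir.glob("*.dmp") if p.is_file())
    # Sort by modification time so the dump from the latest crash comes first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

The first path returned is the natural one to open in WinDbg (or hand to the `!analyze -v` step) after a fresh BSOD.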