Aaron44126 Posted June 1, 2023

I've done a bit of a "deep dive" on getting NVIDIA graphics switching working properly on Linux. For now, I'm going to post everything that you need to know to make the GPU power down properly when not in use, which was a bit tricky for me to get working. I plan to update this post in the future with some tips for getting applications to run on a particular GPU, but I have found that the automatic behavior works decently well in that case.

Make the GPU power down properly when not in use

This is doable, but the implementation seems to be a bit "brittle": it has to be configured just so, and it was not set up properly out of the box in Ubuntu/Kubuntu. I had been sitting with the GPU always on and drawing 13-18 W of power even when no programs were using it.

Note that I am assuming the NVIDIA GPU's PCI address in Linux is 01:00.0. This seems to be standard, but you can check it with the command "lspci". Some commands below will need to be tweaked if the address is different.

Also note that everything here most likely applies only to NVIDIA GPUs based on the Turing architecture or later. My understanding is that NVIDIA's Linux driver does not support automatically powering off the GPU on Pascal and older architectures, so for those you will need to resort to older, messier methods (Bumblebee/bbswitch or direct ACPI commands). Right now I have only tested this on a GeForce RTX 3080 Ti laptop GPU, which is Ampere architecture.

To check whether the GPU is powered on, use:

    cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status

If you get "active" then the GPU is powered on, and if you get "suspended" then it is properly powered down. If you get "active" when you think that no programs are using the GPU and it should be powered off, there are a number of things to check.

First, check the value here:

    cat /sys/bus/pci/devices/0000\:01\:00.0/power/control

This should come back with the value "auto". If it instead returns "on", the NVIDIA GPU will never power off. To get it set to "auto", proper udev rules need to be in place:

    # Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
    ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
    ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"

    # Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
    ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
    ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"

You can find the udev rules in /etc/udev/rules.d and /lib/udev/rules.d. If there is not an existing rule that needs to be tweaked, you can add these in a new file; I put them in a file named "/lib/udev/rules.d/80-nvidia-pm.rules". (All of this was borrowed from the Arch wiki.)

Note that after making udev configuration changes, you should run:

    update-initramfs -u

...and then reboot.

The next thing to check is the NVIDIA kernel module configuration. You can check the current kernel module parameters with this command:

    cat /proc/driver/nvidia/params

The value for "DynamicPowerManagement" should be 2. (NVIDIA's documentation indicates that 3 is also OK for Ampere and later architectures.)
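Since there are a few different places to look, here is a small helper script that prints all three values at once. It is just a sketch wrapping the commands above, and it assumes the 01:00.0 PCI address; adjust the path if yours differs.

    #!/bin/sh
    # Print the runtime PM status, the power/control setting, and the
    # DynamicPowerManagement kernel module parameter in one go.
    # Assumes the NVIDIA GPU is at PCI address 01:00.0 (check with lspci).
    DEV=/sys/bus/pci/devices/0000:01:00.0

    echo "runtime_status: $(cat "$DEV/power/runtime_status")"   # want "suspended" when idle
    echo "power/control:  $(cat "$DEV/power/control")"          # want "auto"
    grep DynamicPowerManagement /proc/driver/nvidia/params      # want 2 (or 3 on Ampere+)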
You can set the "DynamicPowerManagement" value with a modprobe rule. Look in /etc/modprobe.d and /lib/modprobe.d for an existing rule to change; if there is not one, make a new file ending in ".conf" and add this line:

    options nvidia "NVreg_DynamicPowerManagement=0x02"

...and then run:

    update-initramfs -u

...and reboot. Use the command above to check the kernel module parameters and make sure that the change stuck.

Finally, if the GPU still won't power off, confirm that nothing is running in the background that could be using it. You can run the command "nvidia-smi" and it will print some status information, including a list of the processes using the NVIDIA GPU at the bottom. However, that may not be a sufficient check. In my case, I discovered that having the Folding@Home client service active would cause the GPU to stay powered on, even though it was not doing any GPU work and nothing in nvidia-smi showed Folding@Home using it. (Using "lsof" to look for processes that have libcuda.so open would catch that; a rough example is sketched at the end of this post.)

In "nvidia-smi", you might see the Xorg process using the GPU with a small amount of memory listed. Xorg automatically attaches to all available GPUs, even ones that are not driving a display. You can disable this behavior by adding this configuration block to a file in /etc/X11/xorg.conf.d (filename ending with ".conf" if you want to add a new one):

    Section "ServerFlags"
        Option "AutoAddGPU" "off"
    EndSection

This will cause Xorg to attach only to whatever GPU the BIOS considers the default, and it might break displays attached to a different GPU if you have a multi-monitor configuration. In any case, this isn't necessary to actually get the GPU to power off: the GPU can still suspend with Xorg listed in the process list of "nvidia-smi" as long as it is not actually driving a display.

Also note that running "nvidia-smi" causes the GPU to wake up, so you will need to wait several seconds after running it before the GPU returns to its "suspended" state. The same goes for most other tools that monitor the NVIDIA GPU, like nvtop. (Just one other thing that could trip you up when checking whether things are working right.)

If you do have the GPU powering off properly, you might also want to check the power draw of the whole system. There are apparently a few laptops out there with an incorrect BIOS configuration that end up drawing more power with the GPU "off" than with it "on" in a low-power state. (I've seen MSI mentioned specifically.)
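One quick way to check the whole-system draw while on battery is to read what the battery reports in sysfs. This is only a sketch: the battery name (BAT0 here) and which attributes are exposed vary by machine, so check /sys/class/power_supply on yours; tools like powertop report the same discharge rate in a friendlier way.

    # Report whole-system power draw from the battery (laptop unplugged).
    # Some firmwares expose power_now (in µW), others only current_now (µA)
    # and voltage_now (µV).
    BAT=/sys/class/power_supply/BAT0
    if [ -r "$BAT/power_now" ]; then
        awk '{printf "%.1f W\n", $1 / 1e6}' "$BAT/power_now"
    else
        awk -v v="$(cat "$BAT/voltage_now")" '{printf "%.1f W\n", $1 * v / 1e12}' "$BAT/current_now"
    fi

And as for the "lsof" check mentioned above: a rough way to find processes that have the CUDA driver library mapped (and can therefore keep the GPU awake) is to look the library up with ldconfig and hand each match to lsof. This assumes lsof is installed and that libcuda.so is registered with the dynamic linker, which is the usual case with the packaged driver.

    # List processes that have libcuda.so open or mapped.
    for lib in $(ldconfig -p | awk '/libcuda\.so/ {print $NF}'); do
        echo "== $lib =="
        sudo lsof "$lib"
    done

If something unexpected shows up there (like the Folding@Home client in my case), stopping that service should let the GPU suspend.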