GPU has fallen off the bus: Hive OS error - ErrorsMaster.ru - a big encyclopedia of errors and their solutions

Hi all, kind people, please help me figure out what's wrong. The first card in my rig keeps dropping out. I tried moving it to a different PCIe slot, but the problem doesn't go away. On Windows I ran a higher overclock with no problems, but then one day it froze and never came back up; even the motherboard wouldn't start. The monitor was connected to the same card that is now dropping out. In the screenshot, the overclock on that card is reduced.

(Attached screenshots: Screenshot_20220421-144524_Hive OS.jpg, Screenshot_20220421-204829_Hive OS.jpg)

Your overclock is just insane!

Why so harsh!

The Hynix memory has eaten its fill again and gone to sleep.

Either the GPU chip or the memory is coming unsoldered. Put the card in a separate PC and test it, and reset the overclock to stock.

A week ago I had a bad riser contact: it got warmer and the connection worked loose. A plastic retaining clip fixed the problem. As for Hynix, a hearty hello to the "it's the memory failing" crowd that shows up in every 3060 Ti thread.

Apr 15 15:53:51 Home kernel: [ 2686.810573][ C1] NVRM: GPU at PCI:0000:21:00: GPU-639efbeb-e5da-1cd4-45dd-7ec3b00535f1
Apr 15 15:53:51 Home kernel: [ 2686.810577][ C1] NVRM: Xid (PCI:0000:21:00): 79, pid=0, GPU has fallen off the bus.
Apr 15 15:53:51 Home kernel: [ 2686.810580][ C1] NVRM: GPU 0000:21:00.0: GPU has fallen off the bus.
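To see which card dropped and when, it can help to pull the Xid code, PCI address and kernel uptime out of a saved dmesg or syslog dump. The following is a minimal Python sketch that does not come from the thread. The regular expressions are an assumption written against the NVRM lines quoted here, and other driver versions may word the messages differently. The uptime stamp helps tell a card that drops out after hours of mining (about 2686 s here) from one that fails as soon as the driver loads.

```python
import re
from typing import List, NamedTuple, Optional

# Written against NVRM lines like the ones quoted above, e.g.
#   "NVRM: Xid (PCI:0000:21:00): 79, pid=0, GPU has fallen off the bus."
# Only the bus ID and the numeric Xid code are captured, because the text
# after the code differs between driver versions.
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<xid>\d+)")
# Kernel uptime stamp such as "[ 2686.810577]" (seconds since boot).
UPTIME_RE = re.compile(r"\[\s*(?P<secs>\d+\.\d+)\]")


class XidEvent(NamedTuple):
    uptime_s: Optional[float]  # None if the line carries no uptime stamp
    bus_id: str                # PCI address exactly as the driver printed it
    xid: int                   # 79 is the code logged with "GPU has fallen off the bus"


def parse_xid_events(log_text: str) -> List[XidEvent]:
    """Collect every NVRM Xid line from a dmesg or syslog dump."""
    events = []
    for line in log_text.splitlines():
        m = XID_RE.search(line)
        if m is None:
            continue
        t = UPTIME_RE.search(line)
        uptime = float(t.group("secs")) if t else None
        events.append(XidEvent(uptime, m.group("bus"), int(m.group("xid"))))
    return events
```

Feed it the text of `dmesg` (which may need root on some distributions) or a copy of the Hive OS syslog. Only the `NVRM: Xid` lines are returned, so the companion "GPU at PCI" and "GPU ... has fallen off the bus" lines are ignored.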

Thanks, I'll try. By the way, I already swapped the riser, and it didn't help.

Happy to sell a lively Hynix off a 1660 Super :))) only 6 chips though, you'd have to dig up the other two :))

Sorry, but the rest of your cards will give out soon too.
Lower the overclock and everything will run stably; for these cards 2500 is already pushing it... Don't kill your cards.

Put it on a separate test bench, check whether it is detected, and test it in different configurations. You can try running the memory through an older version of Kombustor; it has a memory test. :)

Depends which Hynix; I think there are two revisions. The second one holds around 3000 in Hive...

In my opinion, it's only a matter of time; these aren't cards that can take that much overclocking.

I have six of these from Gigabyte myself. A normal memory overclock for them is 1800-1900 in Hive, and each one does 47-47.5 MH/s at that setting.

Try lowering the overclock and bringing the fan target temperature down to 50 °C.

Again, people who don't fully understand are giving advice. 3200-3300 on the memory. Overclocking without raising the voltage doesn't kill memory; temperature does!

(Attached: 77.PNG)

Today, in one rig, the two middle cards (3060 Ti LHR Strix) started dropping out; they were dual-mining ETH+TON. Naturally, the cause isn't the overclock. So what is it? Right, as was said above: TEMPERATURE.

P.S. It's important to understand that the core temperature can be 40-45 °C while the memory is at 105 °C or higher. Options: improve the cooling or lower the overclock.

But to keep it running, people raise the voltage as well; you can't have one without the other, I get that.

Source

I recently purchased a new GPU card for my server, but have not been able to get the NVIDIA drivers to work with it, either as a graphics card or as a CUDA compute engine. The card in question is an NVIDIA GeForce GT 710 (https://www.zotac.com/us/product/graphi … b-pcie-x-1). (The motherboard I’m using is an Asus PRIME B250M-A, if that matters.)

I have all the needed packages installed:

[darose@darsys12 ~]$ pacman -Q | grep nvidia
nvidia-dkms 465.31-1
nvidia-settings 465.31-1
nvidia-utils 465.31-1
opencl-nvidia 465.31-1

The machine definitely *sees* the card:

[darose@darsys12 ~]$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)

But no matter what I try, I keep getting «GPU has fallen off the bus» errors:

[darose@darsys12 ~]$ dmesg | grep -i -e nvidia -e nvrm
[    0.000000] Command line: BOOT_IMAGE=../vmlinuz-linux root=LABEL=root rw nvidia-drm.modeset=1 initrd=../intel-ucode.img,../initramfs-linux.img
[    0.046539] Kernel command line: BOOT_IMAGE=../vmlinuz-linux root=LABEL=root rw nvidia-drm.modeset=1 initrd=../intel-ucode.img,../initramfs-linux.img
[    1.099115] nvidia: loading out-of-tree module taints kernel.
[    1.099123] nvidia: module license 'NVIDIA' taints kernel.
[    1.109036] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    1.132441] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[    1.133265] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    1.249990] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  465.31  Thu May 13 22:24:36 UTC 2021
[    1.253004] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  465.31  Thu May 13 22:14:23 UTC 2021
[    1.253833] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[    1.262431] nvidia-uvm: Loaded the UVM driver, major device number 237.
[    1.284546] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    1.692351] NVRM: GPU at PCI:0000:01:00: GPU-94909028-e642-5b1e-39c9-b3c510991de2
[    1.692354] NVRM: Xid (PCI:0000:01:00): 79, pid=140, GPU has fallen off the bus.
[    1.692355] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[    1.696537] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0xf:1242)
[    1.696593] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    1.696712] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[    1.696916] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
...

And the nvidia drivers can’t access it:

[darose@darsys12 ~]$ sudo nvidia-smi 
No devices were found
[darose@darsys12 ~]$ sudo clinfo 
Number of platforms                               0

That «GPU has fallen off the bus» is a pretty vague error message, and there aren't many good suggestions on the Net about how to fix it. Several pages suggested either that the card wasn’t seated correctly or that it's a hardware fault. I tried re-seating the card (as well as installing it in a different slot), but that made no difference. A hardware fault seems unlikely too, as a) this is a brand-new card, and b) it works *perfectly* when I use the nouveau driver instead (though without any compute capabilities, obviously).
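When switching between nouveau and the proprietary driver, it is worth confirming which kernel driver actually claimed the card. `lspci -k` shows this as "Kernel driver in use"; the same information is in sysfs, where `/sys/bus/pci/devices/<addr>/driver` is a symlink to the bound driver. A minimal Python sketch of that sysfs check follows; `bound_driver` is a hypothetical helper name, and the address must include the PCI domain (`0000:01:00.0`, not the short `01:00.0` that `lspci` prints).

```python
import os
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver bound to a PCI device (e.g. "nvidia" or "nouveau").

    While a driver is bound, /sys/bus/pci/devices/<addr>/driver is a symlink into
    /sys/bus/pci/drivers/<name>; when no driver is bound the link is absent.
    `pci_addr` must include the domain, e.g. "0000:01:00.0" rather than "01:00.0".
    `sysfs_root` is only a parameter so the function can be pointed at a test tree.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or no such device
    return os.path.basename(os.path.realpath(link))
```

On the machine above this would be called as `bound_driver("0000:01:00.0")`.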

I’ve tried upgrading and downgrading kernels, upgrading and downgrading the nvidia packages, booting using «rcutree.rcu_idle_gp_delay=1 pcie_aspm=off» and numerous other things, but nothing seems to fix the issue.
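Before concluding that a boot parameter such as `pcie_aspm=off` has no effect, it is worth checking that the running kernel actually received it, since `/proc/cmdline` holds the command line the kernel was booted with. A minimal Python sketch, assuming the parameters are compared as whole tokens; `missing_kernel_params` is a hypothetical helper, not part of any tool mentioned here.

```python
import shlex
from typing import Iterable, List


def missing_kernel_params(cmdline: str, wanted: Iterable[str]) -> List[str]:
    """Return the entries of `wanted` that are not present in `cmdline`.

    `cmdline` is the contents of /proc/cmdline. shlex.split() is applied to both
    sides so that a quoted value such as acpi_osi='Windows 2009' is compared as
    one argument, however it ends up quoted.
    """
    present = set(shlex.split(cmdline))
    return [p for p in wanted if " ".join(shlex.split(p)) not in present]
```

For example, `missing_kernel_params(open("/proc/cmdline").read(), ["rcutree.rcu_idle_gp_delay=1", "pcie_aspm=off"])` returns an empty list only if both parameters were applied.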

Yes, I can get the graphics working on the card if I need to (using nouveau), but that doesn’t really help me. This is a headless server, and so doesn’t really make any use of graphics capabilities, and I bought the card in order to use it for compute purposes.

Been wrestling with this for several days now, and I’m pretty frustrated and at my wits end on this.

Any suggestions on how to fix or debug this further would be most welcome!

Last edited by darose (2021-07-13 22:07:31)

Source

I recently made a post because I couldn’t get my NVIDIA GPU up and running. I have the GPU working now (through NVIDIA X Server Settings). These are my specs:

ubuntu version: 16.04.1

GPU: NVIDIA Corporation GM108M [GeForce 840M]

But every time I suspend my laptop and reboot it I get a black screen with this error message:

[ 5107.273042] usbhid 2-3:1.0: suspend error -5  
[ 5107.644336] NVRM: Xid (PCI:0000:03:00): 79, GPU has fallen off the bus.
[ 5107.644336]

The only way out is to force a complete shutdown with the power button.

— Extra information —

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 840M        Off  | 0000:03:00.0     Off |                  N/A |
| N/A   47C    P0    N/A /  N/A |    242MiB /  2002MiB |     24%      Default |
+-------------------------------+----------------------+----------------------+
                                                                             
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1409    G   /usr/lib/xorg/Xorg                             149MiB |
|    0      2471    G   compiz                                          92MiB |
|    0      2774    G   /usr/lib/firefox/firefox                         1MiB |
+-----------------------------------------------------------------------------+

So it only happens when my laptop goes into sleep/suspend mode (I have now disabled that, so it no longer suspends). Powering on the laptop normally works fine, and I never encountered this before either. I also switched back to my Intel GPU to test whether it still occurs, and it doesn't. So it has something to do with my NVIDIA GPU.

asked Jan 5, 2017 at 12:18

I had the exact same problem. I solved it by putting the graphics card into persistence mode:

$ sudo nvidia-smi -pm 1

I don’t know what this really does but it seems that it’s working for me.

I found the solution in this forum thread: https://bbs.archlinux.org/viewtopic.php?id=145527

answered May 28, 2018 at 9:14

Update: There was a related bug on the ubuntu issue tracker that has since been fixed and released. Not sure if this answer is helpful anymore. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1847937

Here is a viable solution that doesn’t require you to limit your low power state usage:

The fix is to add the following arguments to the kernel boot parameters:
rcutree.rcu_idle_gp_delay=1 acpi_osi=! acpi_osi='Windows 2009'
You can test this fix by rebooting and pressing «e» on your primary boot entry in GRUB. Add the arguments to the end of the line starting with linux and press Ctrl+X to boot. Try suspending and waking the system. If it works, you’re golden! To make the fix permanent, you need to edit your /etc/default/grub file:

Open a terminal window and paste the following command: sudo xed /etc/default/grub

Enter your password. Then, find the line that starts with GRUB_CMDLINE_LINUX_DEFAULT=

Add the arguments to the end of this line, inside the quotes. So it looks roughly like this: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rcutree.rcu_idle_gp_delay=1 acpi_osi=! acpi_osi='Windows 2009'"

Run sudo update-grub
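The same edit to `/etc/default/grub` can be scripted so that running it twice does not duplicate the arguments. A minimal Python sketch, assuming a Debian/Ubuntu/Mint-style file with a single `GRUB_CMDLINE_LINUX_DEFAULT="..."` line; `add_default_cmdline_args` is a hypothetical helper, and it only rewrites text, so `update-grub` still has to be run afterwards.

```python
import re
import shlex
from typing import List

# Assumes one line of the form  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
DEFAULT_RE = re.compile(r'^GRUB_CMDLINE_LINUX_DEFAULT="(?P<value>.*)"\s*$')


def add_default_cmdline_args(grub_text: str, args: List[str]) -> str:
    """Append kernel arguments to GRUB_CMDLINE_LINUX_DEFAULT, skipping ones already there.

    Each entry in `args` must be a single kernel argument and must not contain a
    double quote, because the value itself is wrapped in double quotes.
    """
    for arg in args:
        if '"' in arg or len(shlex.split(arg)) != 1:
            raise ValueError(f"unsupported argument: {arg!r}")
    out = []
    found = False
    for line in grub_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # keep the original line ending
        m = DEFAULT_RE.match(body)
        if m and not found:
            found = True
            value = m.group("value")
            # shlex.split() drops the single quotes, so acpi_osi='Windows 2009'
            # is recognized as already present on a second run.
            present = set(shlex.split(value))
            new = [a for a in args if shlex.split(a)[0] not in present]
            if new:
                value = " ".join([value] + new) if value else " ".join(new)
            line = f'GRUB_CMDLINE_LINUX_DEFAULT="{value}"{eol}'
        out.append(line)
    if not found:
        raise ValueError('no GRUB_CMDLINE_LINUX_DEFAULT="..." line found')
    return "".join(out)
```

With `["rcutree.rcu_idle_gp_delay=1", "acpi_osi=!", "acpi_osi='Windows 2009'"]` and a starting value of `"quiet splash"`, this produces the line shown in the steps above. Writing the result back requires root, and keeping a backup copy of the file is prudent.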

Reportedly, the same result can be achieved by replacing Windows 2009 with Linux, telling the BIOS which OS is in use. That did not work for me, but others commented below that it helped them. In my case, I left it as Windows 2009.

Source: https://forums.linuxmint.com/viewtopic.php?p=1728952&sid=d2f654dfa1082400eeea98c9fbf01918#p1728952

answered May 7, 2020 at 19:24

Next time, try logging in via SSH to halt or reboot your computer.
Another option is to press Magic SysRq+R to take the keyboard back from X, then press Ctrl+Alt+Del.

I have the same problem with this version of driver.
Try the ubuntu driver package!

answered Jan 8, 2017 at 20:27

Tried everything.
Only one thing helped: disable the ASPM.
Add this to kernel boot arguments: pcie_aspm=off

answered Jan 31, 2020 at 15:03

znd0

Having the same issue on Ubuntu 18.04, I’m using nvidia-prime for graphics switching with nvidia-driver-396(.24) installed. This issue only occurs when running on the dedicated card using:

sudo prime-select nvidia

On recovering from suspend, the desktop flashes up then black screens as mentioned above with the very same error message.

Hardware (Dell inspiron 7559):
Nvidia GTX 960m
Intel i7-6700HQ

Workaround:

A fix that worked for me was to delete the default swapfile created during installation and create a dedicated swap partition instead. Of course, remember to add it to fstab and point GRUB at the partition with resume=UUID=<partition UUID>.
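For the swap-partition workaround, the two pieces of configuration are an fstab entry and a `resume=` kernel argument that both name the partition by UUID, which can be read with `blkid` or `lsblk -f`. A minimal Python sketch that builds both strings; `swap_resume_config` is a hypothetical helper, and the `none swap sw 0 0` field layout is an assumption based on the usual Ubuntu convention, not something stated in the answer.

```python
import uuid
from typing import Tuple


def swap_resume_config(swap_uuid: str) -> Tuple[str, str]:
    """Build an /etc/fstab swap entry and a matching resume= kernel argument.

    `swap_uuid` is the UUID of the dedicated swap partition. uuid.UUID()
    rejects malformed input and normalizes it to lowercase, hyphenated form.
    """
    u = str(uuid.UUID(swap_uuid))
    fstab_line = f"UUID={u} none swap sw 0 0"
    kernel_arg = f"resume=UUID={u}"
    return fstab_line, kernel_arg
```

The kernel argument would then go into `GRUB_CMDLINE_LINUX_DEFAULT` in the same way as the other parameters in this thread.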

answered Jul 13, 2018 at 12:55

None of the solutions here and anywhere else helped for me. There was, though, one thing I noticed.

I was getting lots of "PCIe Bus Error: severity=Corrected" messages in dmesg, which most likely caused the NVIDIA driver to crash. So I looked for more information about this error and found an article on it. My solution was to edit /etc/default/grub and add pci=nomsi to the kernel arguments (by editing this line):

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"

Save the file, and run sudo update-grub. Then, reboot and the NVIDIA driver shouldn’t have any issues.
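Before and after changing a parameter like `pci=nomsi`, it helps to count the corrected PCIe errors per device rather than eyeball dmesg. A minimal Python sketch, assuming the kernel's AER message has the form `<driver> <address>: PCIe Bus Error: severity=...`; that wording is an assumption and can differ between kernel versions. A steadily growing count on the GPU's address or on the port above it points at the link, which fits the riser and ASPM suspicions elsewhere on this page, but it is only a hint, not a diagnosis.

```python
import re
from collections import Counter

# Assumed AER log form, e.g.
#   "pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, ..."
# Adjust the pattern if nothing matches on your kernel.
AER_RE = re.compile(
    r"(?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<sev>[A-Za-z]+)"
)


def count_aer_errors(log_text: str) -> Counter:
    """Count 'PCIe Bus Error' lines per (device address, severity)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = AER_RE.search(line)
        if m:
            counts[(m.group("dev").lower(), m.group("sev"))] += 1
    return counts
```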

I haven’t tested everything with this solution, but the NVIDIA driver finally started on boot, and even suspend was working. I’ll edit this post if I experience any issues again.

Edit: This didn’t help… it just made the drivers working for one boot, then I experienced the issue again. I asked for this as another question: GPU has fallen off the bus (on almost every boot)

answered Jan 2, 2021 at 21:39

adazem009

I found out that Ubuntu 18.04 offers an "Ubuntu on Wayland" session, which I learned is an alternative to the X server.
I logged in using the Ubuntu on Wayland option on the login screen:

Now I can use the suspend option without any problem.

answered Dec 27, 2018 at 19:38

I had a similar issue on Linux Mint.
Most forum posts suggested that this has something to do with Active State Power Management (ASPM) on Linux not playing well with the NVIDIA driver, and recommended turning off ASPM in the boot options. However, that doesn't work if ASPM is off in your BIOS. I finally got it to work by turning ASPM on in the BIOS but off in the boot options, so that Linux knows it is controlling ASPM rather than the BIOS.

answered Jun 19, 2019 at 4:06

Source

Intermittent «GPU has fallen off the bus» issues — How to troubleshoot

Not sure if this is a HW or software issue — Will replace the riser later today, although it was running 48hrs without problems.

[ 5484.100465] NVRM: GPU at PCI:0000:21:00: GPU-3e5bc452-b0c8-a404-5b10-246cf7fe8a76
[ 5484.100469] NVRM: GPU Board Serial Number:
[ 5484.100472] NVRM: Xid (PCI:0000:21:00): 79, pid=0, GPU has fallen off the bus.
[ 5484.100478] NVRM: GPU 0000:21:00.0: GPU has fallen off the bus.
[ 5484.100480] NVRM: GPU 0000:21:00.0: GPU is on Board .
[ 5484.100493] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
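On a multi-GPU rig, the first troubleshooting step is knowing which physical card and riser "PCI:0000:21:00" refers to. The Xid line omits the PCI function, `lspci` omits the domain, and recent `nvidia-smi` versions print an 8-digit domain (while the card is still visible, `nvidia-smi --query-gpu=index,pci.bus_id,serial --format=csv` lists them). A minimal Python sketch that normalizes these forms so they can be compared directly; `normalize_bus_id` is a hypothetical helper, and treating a missing function number as `.0` is an assumption.

```python
import re

# Accepts forms like:
#   "PCI:0000:21:00"    (NVRM Xid lines, no function number)
#   "0000:03:00.0"      (dmesg, older nvidia-smi)
#   "00000000:21:00.0"  (newer nvidia-smi, 8-digit domain)
#   "01:00.0"           (lspci short form, domain omitted)
BUS_RE = re.compile(
    r"^(?:PCI:)?(?:(?P<dom>[0-9a-fA-F]{4,8}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})(?:\.(?P<fn>[0-7]))?$"
)


def normalize_bus_id(raw: str) -> str:
    """Return a PCI address as dddd:bb:dd.f so IDs from different tools compare equal."""
    m = BUS_RE.match(raw.strip())
    if m is None:
        raise ValueError(f"unrecognized PCI bus ID: {raw!r}")
    domain = int(m.group("dom") or "0", 16)
    func = int(m.group("fn") or "0")  # Xid lines omit the function; assume .0
    return f"{domain:04x}:{int(m.group('bus'), 16):02x}:{int(m.group('dev'), 16):02x}.{func}"
```

With the IDs normalized, the bus that keeps appearing in Xid 79 lines can be matched to a specific card, which makes it possible to test whether the fault follows the card or stays with the riser and slot.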

Source

NVIDIA Developer Forums

Source