Bug 1105974 - Nouveau driver dumps many of these and the laptop freezes for 10 seconds
Status: RESOLVED NORESPONSE
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.1
Hardware: x86-64 Other
Priority: P5 - None, Severity: Normal
Assigned To: openSUSE Kernel Bugs
Reported: 2018-08-24 20:00 UTC by Andres Nogueiras
Modified: 2021-12-31 12:54 UTC (History)
tiwai: needinfo? (anogueiras)


Attachments
Full dmesg output 2018-08-24 (126.44 KB, text/plain), 2018-08-31 09:29 UTC, Andres Nogueiras
Full dmesg output 2018-08-27 (153.19 KB, text/plain), 2018-08-31 09:32 UTC, Andres Nogueiras
Last days dmesg (119.87 KB, text/plain), 2018-08-31 09:33 UTC, Andres Nogueiras
dmesg from upgraded version, boot and poweroff (108.44 KB, text/plain), 2019-12-02 20:57 UTC, Andres Nogueiras
dmesg from kernel 5.4.21-2.1.gfcf6204.x86_64 (265.15 KB, text/plain), 2019-12-06 14:49 UTC, Andres Nogueiras
report with kernel 4.12.14 (279.64 KB, text/plain), 2019-12-29 12:43 UTC, Andres Nogueiras
report with kernel 5.4.6 (430.36 KB, text/plain), 2019-12-29 12:44 UTC, Andres Nogueiras

Description Andres Nogueiras 2018-08-24 20:00:39 UTC
Since booting, the nouveau driver emits these:

nouveau 0000:01:00.0: timeout
[  +0.000012] ------------[ cut here ]------------
[  +0.000031] WARNING: CPU: 6 PID: 281 at ../drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c:86 nvkm_pmu_reset+0x146/0x160 [nouveau]
[  +0.000001] Modules linked in: btrfs xor raid6_pq sr_mod cdrom nouveau(+) rtsx_pci_sdmmc mxm_wmi mmc_core ttm i915(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci fb_sys_fops crc32c_intel xhci_hcd serio_raw ahci rtsx_pci drm libahci usbcore drm_panel_orientation_quirks wmi video button sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[  +0.000016] CPU: 6 PID: 281 Comm: systemd-udevd Not tainted 4.12.14-lp150.12.16-default #1 openSUSE Leap 15.0
[  +0.000000] Hardware name: HP HP ENVY Laptop 17-ae1xx/83AD, BIOS F.24 06/25/2018
[  +0.000001] task: ffff88045a71c000 task.stack: ffffc90002120000
[  +0.000018] RIP: 0010:nvkm_pmu_reset+0x146/0x160 [nouveau]
[  +0.000001] RSP: 0018:ffffc90002123820 EFLAGS: 00010282
[  +0.000001] RAX: 000000000000001d RBX: ffff88045e0c0200 RCX: 0000000000000000
[  +0.000001] RDX: ffff88046ed9fd40 RSI: ffff88046ed97a68 RDI: ffff88046ed97a68
[  +0.000000] RBP: ffff880459d90c00 R08: 0000000000000356 R09: 0000000000000004
[  +0.000001] R10: ffffe8ffffffffff R11: 0000000000000001 R12: ffff8804587e6600
[  +0.000000] R13: ffff8804587ecd00 R14: 00000000fe9964e0 R15: 0000000000000030
[  +0.000001] FS:  00007f2cd86cad40(0000) GS:ffff88046ed80000(0000) knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007fc9fd57d000 CR3: 000000045a0fa006 CR4: 00000000003606e0
[  +0.000000] Call Trace:
[  +0.000018]  nvkm_pmu_init+0x16/0x40 [nouveau]
[  +0.000015]  nvkm_subdev_init+0xb2/0x1f0 [nouveau]
[  +0.000020]  nvkm_device_init+0x132/0x260 [nouveau]
[  +0.000018]  nvkm_udevice_init+0x41/0x60 [nouveau]
[  +0.000015]  nvkm_object_init+0x3d/0x180 [nouveau]
[  +0.000013]  nvkm_ioctl_new+0x1a5/0x260 [nouveau]
[  +0.000013]  ? nvkm_client_notify+0x30/0x30 [nouveau]
[  +0.000016]  ? nvkm_udevice_rd08+0x20/0x20 [nouveau]
[  +0.000012]  nvkm_ioctl+0x10a/0x240 [nouveau]
[  +0.000012]  nvif_object_init+0xbf/0x110 [nouveau]
[  +0.000011]  nvif_device_init+0xe/0x30 [nouveau]
[  +0.000022]  nouveau_cli_init+0x134/0x190 [nouveau]
[  +0.000020]  nouveau_drm_load+0x56/0x8c0 [nouveau]
[  +0.000010]  ? drm_dev_register+0xfd/0x1c0 [drm]
[  +0.000005]  drm_dev_register+0x132/0x1c0 [drm]
[  +0.000006]  drm_get_pci_dev+0x93/0x170 [drm]
[  +0.000017]  nouveau_drm_probe+0x1a9/0x230 [nouveau]
[  +0.000003]  ? __pm_runtime_resume+0x47/0x50
[  +0.000002]  local_pci_probe+0x42/0xa0
[  +0.000001]  pci_device_probe+0x12c/0x150
[  +0.000003]  driver_probe_device+0x2e4/0x440
[  +0.000001]  __driver_attach+0xb8/0xe0
[  +0.000002]  ? driver_probe_device+0x440/0x440
[  +0.000000]  bus_for_each_dev+0x5e/0x90
[  +0.000002]  bus_add_driver+0x161/0x260
[  +0.000001]  ? 0xffffffffa0700000
[  +0.000001]  driver_register+0x57/0xc0
[  +0.000001]  ? 0xffffffffa0700000
[  +0.000002]  do_one_initcall+0x4e/0x190
[  +0.000002]  ? __vunmap+0x6d/0xb0
[  +0.000002]  ? __vunmap+0x6d/0xb0
[  +0.000002]  do_init_module+0x5b/0x1e4
[  +0.000002]  load_module+0x18db/0x1f70
[  +0.000003]  ? SYSC_finit_module+0xb7/0xd0
[  +0.000002]  SYSC_finit_module+0xb7/0xd0
[  +0.000002]  do_syscall_64+0x7b/0x150
[  +0.000002]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  +0.000001] RIP: 0033:0x7f2cd750c139
[  +0.000000] RSP: 002b:00007fffb94a4098 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  +0.000002] RAX: ffffffffffffffda RBX: 000055d3f4a0b100 RCX: 00007f2cd750c139
[  +0.000000] RDX: 0000000000000000 RSI: 00007f2cd7e4883d RDI: 0000000000000016
[  +0.000001] RBP: 00007f2cd7e4883d R08: 0000000000000000 R09: 000055d3f49f4530
[  +0.000000] R10: 0000000000000016 R11: 0000000000000246 R12: 0000000000020000
[  +0.000001] R13: 000055d3f4a0d420 R14: 0000000000000000 R15: 0000000003938700
[  +0.000001] Code: 0f 0b e9 02 ff ff ff 48 8b 7d 10 48 8b 5f 50 48 85 db 74 24 e8 dc f8 ff e0 48 89 da 48 89 c6 48 c7 c7 f3 90 6a a0 e8 90 d1 c2 e0 <0f> 0b e9 48 ff ff ff 48 8b 5f 10 eb b1 48 8b 5f 10 eb d6 0f 1f 
[  +0.000017] ---[ end trace 717dc0eeeb6983fe ]---
[  +0.000055] vga_switcheroo: enabled
[  +0.000037] nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
[  +0.000001] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[  +0.000002] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[  +0.000001] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[  +0.000001] nouveau 0000:01:00.0: DRM: Pointer to TMDS table invalid
[  +0.000005] nouveau 0000:01:00.0: DRM: DCB version 4.1
[  +0.213143] nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
[  +0.021028] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
[  +0.000757] [drm] Initialized i915 1.6.0 20171222 for 0000:00:02.0 on minor 0
[  +0.002062] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[  +0.001156] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input6
[  +0.000288] ACPI: Video Device [PXSX] (multi-head: no  rom: yes  post: no)
[  +0.000051] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:37/LNXVIDEO:01/input/input7
[  +0.004448] fbcon: inteldrmfb (fb0) is primary device
[  +1.220720] Console: switching to colour frame buffer device 240x67
[  +0.023815] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device


And it keeps repeating...
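The repetition can be quantified from a saved dmesg capture. The following is a minimal Python sketch, assuming the warning header keeps the format shown in the trace above. The names `count_warnings` and `WARNING_RE` are hypothetical helpers for illustration, not part of any kernel tooling, and the sample text simply repeats the one warning line from this report.

```python
import re
from collections import Counter

# Matches kernel warning headers of the form seen in this report, e.g.
# "WARNING: CPU: 6 PID: 281 at <file>:<line> <func>+0x146/0x160 [nouveau]".
# Assumption: other kernels may format this header differently.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<src>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def count_warnings(dmesg_text):
    """Return a Counter keyed by (source file, line number, function)."""
    counts = Counter()
    for m in WARNING_RE.finditer(dmesg_text):
        key = (m.group("src"), int(m.group("line")), m.group("func"))
        counts[key] += 1
    return counts


# The warning line from the description, repeated twice for illustration.
warning_line = (
    "[  +0.000031] WARNING: CPU: 6 PID: 281 at "
    "../drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c:86 "
    "nvkm_pmu_reset+0x146/0x160 [nouveau]"
)
sample = "\n".join([warning_line, warning_line])

counts = count_warnings(sample)
for (src, line, func), n in counts.items():
    print(f"{n}x {func} at {src}:{line}")
```

Run against a full capture (for example, text read from a saved dmesg file), this would show whether one call site accounts for all of the repeats or whether several different warnings are mixed together.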
Comment 1 Takashi Iwai 2018-08-29 18:47:10 UTC
Could you give the full dmesg output if available?

Basically, the nouveau driver code in Leap 15.0 is equivalent to that of the 4.14.x kernel, so this means 4.14.x is buggy, too...
Comment 2 Andres Nogueiras 2018-08-31 09:29:44 UTC
Created attachment 781495 [details]
Full dmesg output 2018-08-24

Some lines were omitted because they were repeated.
Comment 3 Andres Nogueiras 2018-08-31 09:31:39 UTC
This is one of the first I captured, and I started wondering whether this was a bug or just warnings. After searching the net for a while, I decided to report it as a bug (maybe it is not, as the chipset and laptop are new). I will add a few more.
Comment 4 Andres Nogueiras 2018-08-31 09:32:25 UTC
Created attachment 781496 [details]
Full dmesg output 2018-08-27
Comment 5 Andres Nogueiras 2018-08-31 09:33:09 UTC
Created attachment 781497 [details]
Last days dmesg
Comment 6 Andres Nogueiras 2018-08-31 09:34:49 UTC
(In reply to Takashi Iwai from comment #1)
> Could you give the full dmesg output if available?
> 
> Basically, the nouveau driver code in Leap 15.0 is equivalent to that of the
> 4.14.x kernel, so this means 4.14.x is buggy, too...

I've added a few dmesg outputs from the last few days (sessions).
If you need more info, don't hesitate to ask.
Comment 7 Miroslav Beneš 2019-12-02 11:47:30 UTC
Andres, is the issue still present with newer distribution/kernel?
Comment 8 Andres Nogueiras 2019-12-02 20:12:23 UTC
(In reply to Miroslav Beneš from comment #7)
> Andres, is the issue still present with newer distribution/kernel?

I'm now on openSUSE Leap 15.1 and yes, the issue is still present. It shows fewer messages now, but the behavior is still not entirely correct.

Let me put it in words:

On boot, the system takes longer than any of my other laptops (I have a few). It starts the login screen (sddm) and gets stuck there for around 15 to 25 seconds. Then I can enter my password and switch to consoles.

The laptop lets me work normally, but from time to time it gets stuck. Not too often, but for around 20 seconds each time.

And when KDE closes the session, it always throws an error and keeps the machine blocked for around a minute to a minute and a half. Sometimes I need to log in over SSH and power off the machine as root. A few times I couldn't log in at all and had to power it off by holding the power button for 4 or more seconds.

I will attach today's dmesg output once I finish this reply.

Thanks in advance for your interest.
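Stalls like the ones described in this comment may show up as large gaps between consecutive kernel messages. The following is a rough Python sketch, assuming the capture uses delta timestamps in the `[  +seconds]` form seen in the description. The function `find_gaps` and its `threshold` parameter are illustrative names, and a quiet log does not necessarily mean the machine was frozen, so any hit is only a starting point for reading the surrounding lines.

```python
import re

# Matches lines such as "[  +1.220720] Console: switching to ...".
# Assumption: the bracketed value is the delay in seconds since the
# previous kernel message (delta timestamps).
DELTA_RE = re.compile(r"^\[\s*\+(\d+\.\d+)\]\s?(.*)$")


def find_gaps(lines, threshold=1.0):
    """Return (delta_seconds, message) for lines whose delta >= threshold."""
    gaps = []
    for line in lines:
        m = DELTA_RE.match(line)
        if m is None:
            continue  # line without a delta timestamp
        delta = float(m.group(1))
        if delta >= threshold:
            gaps.append((delta, m.group(2)))
    return gaps


# Three consecutive lines taken from the description.
sample = [
    "[  +0.004448] fbcon: inteldrmfb (fb0) is primary device",
    "[  +1.220720] Console: switching to colour frame buffer device 240x67",
    "[  +0.023815] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device",
]

gaps = find_gaps(sample, threshold=1.0)
for delta, message in gaps:
    print(f"{delta:.3f}s before: {message}")
```

For 20-second freezes, a higher threshold such as 10.0 would be a more selective choice.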
Comment 9 Andres Nogueiras 2019-12-02 20:57:45 UTC
Created attachment 825294 [details]
dmesg from upgraded version, boot and poweroff
Comment 10 Takashi Iwai 2019-12-02 21:56:17 UTC
Unfortunately, there is little we can do from our side to help with nouveau for this kind of bug.

Could you try the latest upstream kernel in OBS Kernel:stable repo, and see whether the problem persists?  Just download the kernel-default.rpm from the repository and install on top of your Leap 15.1 system.

If the problem persists with the upstream kernel, it's better to report to upstream.  OTOH, if the latest upstream works, we still have some chance for backporting a fix.
Comment 11 Andres Nogueiras 2019-12-06 14:47:41 UTC
(In reply to Takashi Iwai from comment #10)
> Unfortunately, there is little we can do from our side to help with nouveau
> for this kind of bug.
> 
> Could you try the latest upstream kernel in OBS Kernel:stable repo, and see
> whether the problem persists?  Just download the kernel-default.rpm from the
> repository and install on top of your Leap 15.1 system.
> 
> If the problem persists with the upstream kernel, it's better to report to
> upstream.  OTOH, if the latest upstream works, we still have some chance for
> backporting a fix.

I've installed and tried kernel-default-5.4.1-2.1.gfcf6204.x86_64.

It works (it boots, lets me connect to the network, and X11 works; I haven't tested much more), but it throws even more messages in dmesg, and many of them are very similar to the previous ones.

Adding the dmesg dump (boot and poweroff) for this.
Comment 12 Andres Nogueiras 2019-12-06 14:49:38 UTC
Created attachment 825708 [details]
dmesg from kernel 5.4.21-2.1.gfcf6204.x86_64
Comment 13 Andres Nogueiras 2019-12-29 12:40:20 UTC
I've been a bit busy, but finally managed to install kernel version 5 from

  http://download.opensuse.org/repositories/Kernel:/stable/standard/

and followed some of the ideas on

  https://en.opensuse.org/openSUSE:Bugreport_X

to produce the two attached reports.

The procedure was as follows: start the laptop, let the machine display the sddm screen, log in, close the session, grab the data for the report, and power it off by force (otherwise it takes nearly 10 minutes to shut down).

The first one was generated with kernel 4 from the OSS 15.1 repo. The second one was generated with kernel 5. I hope this gives a bit more insight to those who can read the guts of nvidia :)
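To compare two such reports, one option is to tally kernel warnings per function in each and put the counts side by side. The following is a minimal Python sketch under that assumption. The helpers `warning_functions` and `compare` are hypothetical, and the sample strings are placeholders built from the warning line in the description, not the contents of the attached reports.

```python
import re
from collections import Counter

# Captures the function name from a warning header such as
# "WARNING: CPU: 6 PID: 281 at <file>:<line> nvkm_pmu_reset+0x146/0x160".
FUNC_RE = re.compile(r"WARNING: .* at \S+:\d+ ([\w.]+)\+0x")


def warning_functions(dmesg_text):
    """Count warnings per function name in one dmesg capture."""
    return Counter(FUNC_RE.findall(dmesg_text))


def compare(old_text, new_text):
    """Map each function to (count in old capture, count in new capture)."""
    old = warning_functions(old_text)
    new = warning_functions(new_text)
    return {f: (old.get(f, 0), new.get(f, 0)) for f in sorted(set(old) | set(new))}


# Placeholder input: the warning line from the description, once and twice.
warning_line = (
    "[  +0.000031] WARNING: CPU: 6 PID: 281 at "
    "../drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c:86 "
    "nvkm_pmu_reset+0x146/0x160 [nouveau]"
)
old_capture = warning_line
new_capture = "\n".join([warning_line, warning_line])

result = compare(old_capture, new_capture)
for func, (old_n, new_n) in result.items():
    print(f"{func}: old={old_n} new={new_n}")
```

A function that appears only in the newer capture, or whose count grows, would be a reasonable thing to highlight in an upstream report.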
Comment 14 Andres Nogueiras 2019-12-29 12:43:48 UTC
Created attachment 826770 [details]
report with kernel 4.12.14
Comment 15 Andres Nogueiras 2019-12-29 12:44:32 UTC
Created attachment 826771 [details]
report with kernel 5.4.6
Comment 16 Miroslav Beneš 2020-01-09 12:45:54 UTC
Since it also happens on the upstream kernel, you should report it there, at least the part of the problem that persists. I am not sure we can do anything about the nouveau driver on our side.

Takashi may provide more (useful) feedback.
Comment 17 Miroslav Beneš 2020-04-27 11:16:51 UTC
Andres, have you reported the problem to upstream?
Comment 18 Miroslav Beneš 2020-08-28 13:09:03 UTC
No response, closing.
Comment 19 Miroslav Beneš 2021-12-31 12:54:56 UTC
Closing for real this time.