Bug 1212019

Summary: Thunderbolt USB audio disconnecting with Kernel 6.3.4
Product: [openSUSE] openSUSE Tumbleweed Reporter: Michael Pujos <pujos.michael>
Component: Kernel Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: oneukum, pujos.michael, tiwai
Version: Current Flags: tiwai: needinfo? (pujos.michael)
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Michael Pujos 2023-06-05 11:01:20 UTC
I have a laptop with a TB2 dock connected via TB3 using the TB3->TB2 Apple adapter.

I'm using USB audio on this dock and, since kernel 6.3.4, I get random disconnections that can happen anywhere from 1h to a few hours after boot:

[Jun 5 11:25] xhci_hcd 0000:09:00.0: WARN Event TRB for slot 1 ep 1 with no TDs queued?
[  +0.033032] xhci_hcd 0000:09:00.0: WARN Event TRB for slot 1 ep 1 with no TDs queued?
[  +0.906993] xhci_hcd 0000:09:00.0: xHCI host controller not responding, assume dead
[  +0.000037] xhci_hcd 0000:09:00.0: HC died; cleaning up
[  +0.000041] usb 3-4: USB disconnect, device number 2
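
As a minimal sketch for spotting this failure in saved kernel log text, the "HC died" message can be matched with a regular expression. The function name `find_hc_deaths` and the `sample` variable are hypothetical, introduced only for illustration; the sample lines are copied from the log above.

```python
import re

# Matches the xhci_hcd "HC died" message and captures the PCI address.
# The pattern is an assumption based only on the log lines in this report.
HC_DIED = re.compile(r"xhci_hcd (\S+): HC died")


def find_hc_deaths(log_text):
    """Return the PCI address of each controller reported dead, in log order."""
    return [m.group(1) for m in HC_DIED.finditer(log_text)]


sample = """\
[  +0.906993] xhci_hcd 0000:09:00.0: xHCI host controller not responding, assume dead
[  +0.000037] xhci_hcd 0000:09:00.0: HC died; cleaning up
[  +0.000041] usb 3-4: USB disconnect, device number 2
"""

print(find_hc_deaths(sample))
```

Feeding it the text of a saved kernel log would list every controller that died in that log.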


When this happens and audio is playing, it stops playing via the dock and switches to my laptop speakers.

Replugging the TB3 adapter usually makes it work again for a while (if audio is playing, it switches automatically from the speakers back to TB USB audio).

That setup was rock solid until I updated to kernel 6.3.4.
Comment 1 Takashi Iwai 2023-06-05 13:13:07 UTC
So when you boot with 6.3.3 kernel on the same system now, the problem doesn't happen?
Comment 2 Takashi Iwai 2023-06-05 14:07:24 UTC
Also, I'm building a test kernel with a backport of the thunderbolt fix patch from the upstream subsystem tree.  It's being built in the OBS home:tiwai:bsc1212019 repo.  Once the build finishes (it takes an hour or so), could you check whether it works better?
Comment 3 Michael Pujos 2023-06-05 20:39:58 UTC
Unfortunately, I left home today for 2 weeks and do not have my TB dock for testing, as I only use it at home. I will be able to test when I'm back.
Comment 4 Michael Pujos 2023-06-21 08:12:52 UTC
I have access to my TB dock again and this issue just happened with kernel 6.3.7. Should I test the older kernel at home:tiwai:bsc1212019?
Comment 5 Takashi Iwai 2023-06-21 09:15:39 UTC
Not sure whether 6.3.7 contains the backport.  Ideally, check with the kernel in the OBS Kernel:stable tree instead.  That's the very latest one.
Comment 6 Michael Pujos 2023-06-28 08:58:58 UTC
Still happening with Kernel 6.3.9:

[ 2422.754635] xhci_hcd 0000:09:00.0: xHCI host controller not responding, assume dead
[ 2422.754674] xhci_hcd 0000:09:00.0: HC died; cleaning up
[ 2422.754716] usb 3-4: USB disconnect, device number 2
Comment 7 Takashi Iwai 2023-06-28 09:14:53 UTC
Could you check the 6.4 kernel in OBS Kernel:stable?  A known regression in TW seems to have already been addressed there.
Comment 8 Michael Pujos 2023-06-28 09:25:55 UTC
I just installed 6.4 and will report.
Comment 9 Michael Pujos 2023-06-28 12:57:18 UTC
It still happened with 6.4 after about 2h of playing audio, although with different logging:

[Jun28 12:42] xhci_hcd 0000:09:00.0: WARN Event TRB for slot 1 ep 1 with no TDs queued?
[  +0.007571] xhci_hcd 0000:09:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  +0.000011] xhci_hcd 0000:09:00.0: Looking for event-dma 00000001dba553e0 trb-start 00000001dba553d0 trb-end 00000001dba553d0 seg-start 00000001dba55000 seg-end 00000001dba55ff0
[  +0.000007] xhci_hcd 0000:09:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  +0.000004] xhci_hcd 0000:09:00.0: Looking for event-dma 00000001dba553f0 trb-start 00000001dba553d0 trb-end 00000001dba553d0 seg-start 00000001dba55000 seg-end 00000001dba55ff0
[  +0.000005] xhci_hcd 0000:09:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  +0.000004] xhci_hcd 0000:09:00.0: Looking for event-dma 00000001dba55400 trb-start 00000001dba553d0 trb-end 00000001dba553d0 seg-start 00000001dba55000 seg-end 00000001dba55ff0
[  +0.000004] xhci_hcd 0000:09:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  +0.000004] xhci_hcd 0000:09:00.0: Looking for event-dma 00000001dba55410 trb-start 00000001dba553d0 trb-end 00000001dba553d0 seg-start 00000001dba55000 seg-end 00000001dba55ff0
[  +0.000005] xhci_hcd 0000:09:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  +0.000003] xhci_hcd 0000:09:00.0: Looking for event-dma 00000001dba55420 trb-start 00000001dba553d0 trb-end 00000001dba553d0 seg-start 00000001dba55000 seg-end 00000001dba55ff0
[  +0.036511] xhci_hcd 0000:09:00.0: WARN Event TRB for slot 1 ep 1 with no TDs queued?
[  +1.746876] xhci_hcd 0000:09:00.0: xHCI host controller not responding, assume dead
[  +0.000038] xhci_hcd 0000:09:00.0: HC died; cleaning up
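
To read the "Looking for event-dma" lines above, here is a minimal sketch that parses one such line and checks whether the event address falls inside the current TD and inside the ring segment. The function name `classify` is hypothetical, and the check assumes a simple, non-wrapping address range, which is a simplification and not necessarily what the xhci driver itself does.

```python
import re

# Field layout is taken from the log lines in this report; all values are hex.
LOOKING = re.compile(
    r"event-dma (\w+) trb-start (\w+) trb-end (\w+) "
    r"seg-start (\w+) seg-end (\w+)"
)


def classify(line):
    """Compare the event DMA address with the TD and segment bounds."""
    m = LOOKING.search(line)
    if m is None:
        return None
    event, trb_start, trb_end, seg_start, seg_end = (
        int(group, 16) for group in m.groups()
    )
    return {
        # Assumes the TD does not wrap across segments (simplification).
        "in_td": trb_start <= event <= trb_end,
        "in_segment": seg_start <= event <= seg_end,
        "offset_from_td_end": event - trb_end,
    }


line = (
    "xhci_hcd 0000:09:00.0: Looking for event-dma 00000001dba553e0 "
    "trb-start 00000001dba553d0 trb-end 00000001dba553d0 "
    "seg-start 00000001dba55000 seg-end 00000001dba55ff0"
)
print(classify(line))
```

For the first such line in the log, the event address lies inside the segment but 16 bytes (one TRB) past the end of the current TD. The four following lines move a further 16 bytes each time, which suggests events arriving for TRBs just after the TD that the driver considers current.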



I'm wondering if my old TB2 dock is not just dying...
Comment 10 Michael Pujos 2023-06-29 06:38:55 UTC
I'm testing the possibility that the 135 mV undervolt that I apply to my laptop CPU is causing this issue and will report.
Comment 11 Michael Pujos 2023-07-06 08:28:59 UTC
Confirming it is still happening with no CPU undervolt.
Comment 12 Michael Pujos 2023-07-21 18:37:42 UTC
Still happening with Kernel 6.4.3 although it took a tad longer to happen this time. Only the USB hub part of that TB dock is disconnecting. I also have a DisplayPort monitor connected to that dock and it is not affected.
Comment 13 Takashi Iwai 2023-07-25 15:39:41 UTC
It would be better to move this report to the upstream bug tracker, I suppose, e.g. bugzilla.kernel.org.  Could you try it?
Comment 14 Michael Pujos 2023-07-25 17:21:20 UTC
I will if it happens again once I have updated to 6.4.4.
With 6.4.3 it only happened once in several days of usage.
With 6.3.x it was much more frequent.
Still wonder if my old TB2 dock is not the problem, overheating or something.
Comment 15 Michael Pujos 2023-07-27 10:18:43 UTC
Still happening regularly with kernel 6.4.4.
Issue submitted:

https://bugzilla.kernel.org/show_bug.cgi?id=217715
Comment 16 Michael Pujos 2023-11-08 14:22:03 UTC
I have not seen this issue happening since Kernel 6.5.6, so closing it.