Bug 1161867 - kernel 5.3.18-lp152.1 : Interrupts enabled after mce_syscore_resume
Status: RESOLVED DUPLICATE of bug 1164813
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.2
Hardware: Other Other
Importance: P2 - High : Normal
Target Milestone: ---
Assigned To: openSUSE Kernel Bugs
QA Contact: E-mail List
Depends on:
Blocks:
Reported: 2020-01-27 07:32 UTC by Bernhard Wiedemann
Modified: 2020-11-13 15:09 UTC
CC List: 8 users

tiwai: needinfo? (bwiedemann)


Attachments
dmesg (91.46 KB, text/plain)
2020-01-27 07:32 UTC, Bernhard Wiedemann
kernel-default-5.5.1 journal (143.18 KB, text/plain)
2020-02-04 18:58 UTC, Bernhard Wiedemann

Description Bernhard Wiedemann 2020-01-27 07:32:30 UTC
Created attachment 828310 [details]
dmesg

Steps to reproduce:
0. Have an ASRock DeskMini A300 with an Athlon 3000G
1. systemctl hibernate
2. Resume
3. (Possibly optional) Kill X11 by pressing Ctrl-Alt-Backspace twice

Actual Results:
dmesg contains two backtraces
Accelerated graphics no longer works
Comment 1 Takashi Iwai 2020-01-27 08:14:55 UTC
Looks like a nvme issue.
Comment 2 Daniel Wagner 2020-01-27 09:35:09 UTC
The message comes from chip.c:

static int
__irq_startup_managed(struct irq_desc *desc, struct cpumask *aff, bool force)
{
[...]
	if (cpumask_any_and(aff, cpu_online_mask) >= nr_cpu_ids) {
		/*
		 * Catch code which fiddles with enable_irq() on a managed
		 * and potentially shutdown IRQ. Chained interrupt
		 * installment or irq auto probing should not happen on
		 * managed irqs either.
		 */
		if (WARN_ON_ONCE(force))
			return IRQ_STARTUP_ABORT;
[...]
}

And the source of the problem:

/*
 * Poll for completions any queue, including those not dedicated to polling.
 * Can be called from any context.
 */
static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
{
	struct pci_dev *pdev = to_pci_dev(nvmeq->dev->dev);
	u16 start, end;
	int found;

	/*
	 * For a poll queue we need to protect against the polling thread
	 * using the CQ lock.  For normal interrupt driven threads we have
	 * to disable the interrupt to avoid racing with it.
	 */
	if (test_bit(NVMEQ_POLLED, &nvmeq->flags)) {
		spin_lock(&nvmeq->cq_poll_lock);
		found = nvme_process_cq(nvmeq, &start, &end, tag);
		spin_unlock(&nvmeq->cq_poll_lock);
	} else {
		disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
		found = nvme_process_cq(nvmeq, &start, &end, tag);
		enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
	}

	nvme_complete_cqes(nvmeq, start, end);
	return found;
}
Comment 3 Takashi Iwai 2020-01-30 16:26:42 UTC
Could you check whether the issue reproduces with the TW kernel and the 5.5 kernel in OBS Kernel:stable?  If recent upstream works, we have a good chance of backporting the fix.  Otherwise, it should be reported upstream first.
Comment 4 Bernhard Wiedemann 2020-02-04 18:58:45 UTC
Created attachment 829172 [details]
kernel-default-5.5.1 journal

5.5.1 only shows the first warning trace during shutdown,
then needs manual power off, but manages to correctly resume
without the 2nd backtrace.
Comment 5 Takashi Iwai 2020-02-12 09:06:58 UTC
OK, then the kernel WARNING is largely unrelated to the amdgpu problem itself.
It appears to be an invalid IRQ re-enablement during suspend, but it may be harmless, at least for hibernation.

Does suspend-to-RAM work, or does it hit a similar problem with amdgpu graphics?
Comment 6 Miroslav Beneš 2020-07-16 13:25:02 UTC
Adding Jiri, because my google-fu says he met nvme irq warning during suspend/resume cycle last year.

Bernhard, there is a chance that the latest kernel in TW/Kernel:HEAD may behave differently. Could you retest once again, please?
Comment 7 Michal Kubeček 2020-07-16 14:07:59 UTC
Could this be a duplicate of bug 1164813?
Comment 8 Jiri Kosina 2020-07-17 09:25:55 UTC
(In reply to Miroslav Beneš from comment #6)
> Adding Jiri, because my google-fu says he met nvme irq warning during
> suspend/resume cycle last year.

That particular issue is resolved by

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec527c318036a65a083ef68d8ba95789d2212246

Which is in 5.3 already.
Comment 9 Miroslav Beneš 2020-07-17 10:25:15 UTC
(In reply to Michal Kubeček from comment #7)
> Could this be a duplicate of bug 1164813?

Looks like it.

Bernhard, could you confirm the warning is gone?
Comment 10 Miroslav Beneš 2020-11-13 15:09:49 UTC
No response, so let me close as duplicate of bug 1164813, because it really looks the same.

*** This bug has been marked as a duplicate of bug 1164813 ***