Bug 1228425 - Trace in sysfs during setup of hvc with kernel-source 6.10.2-1.1
Summary: Trace in sysfs during setup of hvc with kernel-source 6.10.2-1.1
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: S/390-64 Other
Importance: P2 - High : Critical
Target Milestone: ---
Assignee: Miroslav Franc
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-29 15:39 UTC by Berthold Gunreben
Modified: 2024-08-26 09:55 UTC
CC List: 9 users

See Also:
Found By: Community User
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Replace dev_set_name() all (1.16 KB, text/plain)
2024-07-31 08:01 UTC, LTC BugProxy

Description Berthold Gunreben 2024-07-29 15:39:56 UTC
The latest Tumbleweed version, with kernel-source from Thu Jul 25 12:37:14 CEST 2024, fails in openQA because openQA cannot connect to /dev/hvc0.

Looking at autoinst-log.txt one can find the following trace:

expect_3270 queue content:
	[    1.846766][   T24] Freeing initrd memory: 75820K                            
	[    1.847487][    T1] Block layer SCSI generic (bsg) driver version 0.4 loaded 
	(major 250)                                                                     
	[    1.847540][    T1] io scheduler mq-deadline registered                      
	[    1.847543][    T1] io scheduler kyber registered                            
	[    1.847564][    T1] io scheduler bfq registered                              
	[    1.848365][    T1] shpchp: Standard Hot Plug PCI Controller Driver version: 
	0.4                                                                             
	[    1.848531][    T1] sysfs: cannot create duplicate filename '/devices/iucv/hv
	c_iucv1827699952'                                                               
	[    1.848535][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.1-1-defaul
	t #1 openSUSE Tumbleweed 84b24c83f5e66871052e51fb35f0e9a59b94613c               
	[    1.848540][    T1] Hardware name: IBM 8561 LT1 400 (z/VM 7.3.0)             
	[    1.848543][    T1] Call Trace:                                              
	[    1.848546][    T1]  [<000001aa6da5123a>] dump_stack_lvl+0x72/0x98           
	[    1.848556][    T1]  [<000001aa6d3fa698>] sysfs_warn_dup+0x78/0x90           
	[    1.848560][    T1]  [<000001aa6d3fa80a>] sysfs_create_dir_ns+0xda/0xf0      
	[    1.848563][    T1]  [<000001aa6da1d38c>] kobject_add_internal+0xdc/0x340  

See https://openqa.opensuse.org/tests/4363889/logfile?filename=autoinst-log.txt
Note that kernel version 6.9.9 still worked in openQA.
osc rdiff -r 36fd608fa90d991e150abb50963bb31d:5afd09fb4c6fb64db820aed23183b441 openSUSE:Factory:zSystems kernel-source
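
Because the 3270 console wraps at 80 columns, the duplicate sysfs path in the excerpt above is split across two lines. A minimal Python sketch for rejoining such output and pulling out the path, assuming that continuation lines are exactly those that do not start with a `[` timestamp (the helper names are illustrative and not part of openQA):

```python
import re

# Matches the sysfs warning once wrapped lines have been rejoined.
DUP_RE = re.compile(r"sysfs: cannot create duplicate filename '([^']+)'")


def unwrap_console(lines):
    """Join continuation lines onto the preceding timestamped record."""
    records = []
    for raw in lines:
        text = raw.strip()  # drops the leading tab and the 80-column padding
        if not text:
            continue
        if text.startswith("[") or not records:
            records.append(text)   # new record: starts with a timestamp
        else:
            records[-1] += text    # wrapped tail of the previous record
    return records


def duplicate_sysfs_paths(lines):
    """Return every path reported as a duplicate sysfs filename."""
    return [m.group(1)
            for record in unwrap_console(lines)
            for m in DUP_RE.finditer(record)]
```

Stripping the padding also drops a space that happened to fall exactly on the wrap column, so this is good enough for paths but not for reconstructing free text.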
Comment 1 Sarah Kriesch 2024-07-29 16:09:51 UTC
Thanks for the report!

The most interesting needle is this one: https://openqa.opensuse.org/tests/4363889#step/bootloader_s390/35

The error on the command line:
/dev/hvc0: No such device
Comment 2 Sarah Kriesch 2024-07-29 17:09:00 UTC
Nikolay added some IPL-related patches to s390-tools for activation within the last week:
https://build.opensuse.org/projects/openSUSE:Factory:zSystems/packages/s390-tools/files/s390-tools.changes?expand=1

That could be related.
Comment 3 Miroslav Franc 2024-07-30 01:12:58 UTC
I believe this to be a kernel bug.

[    0.341483] [    T1] sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv527465712'
[    0.341487] [    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.1-1.g178f0b6-default #1 openSUSE Tumbleweed (unreleased) af3a669763d3674057710f333a872a5bb858d543
[    0.341492] [    T1] Hardware name: IBM 8561 LT1 400 (z/VM 7.3.0)
[    0.341493] [    T1] Call Trace:
[    0.341495] [    T1]  [<000003412021923a>] dump_stack_lvl+0x72/0x98
[    0.341502] [    T1]  [<000003411fbc2698>] sysfs_warn_dup+0x78/0x90
[    0.341505] [    T1]  [<000003411fbc280a>] sysfs_create_dir_ns+0xda/0xf0
[    0.341507] [    T1]  [<00000341201e538c>] kobject_add_internal+0xdc/0x340
[    0.341510] [    T1]  [<00000341201e5662>] kobject_add+0x72/0xc0
[    0.341512] [    T1]  [<000003411fede77c>] device_add+0xcc/0x7d0
[    0.341517] [    T1]  [<0000034120cc63a6>] hvc_iucv_init+0x336/0x468
[    0.341521] [    T1]  [<000003411f74c9cc>] do_one_initcall+0x3c/0x220
[    0.341523] [    T1]  [<0000034120c8ea26>] kernel_init_freeable+0x2de/0x340
[    0.341526] [    T1]  [<000003412021ab3e>] kernel_init+0x2e/0x180
[    0.341529] [    T1]  [<000003411f74f08c>] __ret_from_fork+0x3c/0x60
[    0.341531] [    T1]  [<000003412022b76a>] ret_from_fork+0xa/0x30
[    0.341535] [    T1] kobject: kobject_add_internal failed for hvc_iucv527465712 with -EEXIST, don't try to register things with the same name in the same directory.
[    0.341539] [    T1] hvc_iucv: Creating a new HVC terminal device failed with error code=-17


Reverting
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=effb83572685eaa70d05a8dd6307ca574a11fcf3
and
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ccec5032291b108e694b55394cd035c9d840052a
makes it disappear.

I'm not yet sure what the problem is.  Those patches should not change the kernel's behavior.
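
For readers unfamiliar with the numeric code: the "error code=-17" in the last log line is the same failure as the -EEXIST in the line before it. A small Python sketch that decodes such a value, assuming the kernel's errno numbering matches the one Python exposes on the machine where it runs (true for EEXIST on Linux); the function name is illustrative:

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = describe_kernel_error(-17)
print(name, "-", message)
```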
Comment 4 Sarah Kriesch 2024-07-30 16:15:46 UTC
Hello Marcus,
Can you mirror this bug report to IBM and forward that to the Kernel Developers, please?

We cannot install openSUSE Tumbleweed with the latest kernel at the moment.
Here are our openQA results: https://openqa.opensuse.org/tests/4367326#step/bootloader_s390/35 

Ideally, please forward this bug to Heiko Carstens and Alexander Gordeev, because the referenced commits are theirs.

(In reply to Miroslav Franc from comment #3)
> Reverting
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=effb83572685eaa70d05a8dd6307ca574a11fcf3
> and
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ccec5032291b108e694b55394cd035c9d840052a
> makes it disappear.
> 
> I'm not yet sure what's the problem.  Those patches should not change the
> kernel's behavior.
Comment 5 Sarah Kriesch 2024-07-30 16:23:18 UTC
Used kernel-source version: 6.10.2-1.1
Comment 6 LTC BugProxy 2024-07-31 08:01:57 UTC
Created attachment 876396 [details]
Replace dev_set_name() all


------- Comment on attachment From h.carstens@de.ibm.com 2024-07-31 03:55 EDT-------


Can you give the attached patch a try, please?
Comment 7 Sarah Kriesch 2024-07-31 08:44:56 UTC
@Miroslav, can you support us in testing the patch, please?

If we as a community want to test it, the patch has to be added to the kernel-source project, the iso image has to be built based on it and then we would be able to test it in openQA.
Comment 8 Miroslav Franc 2024-07-31 13:51:53 UTC
(In reply to LTC BugProxy from comment #6)
> Created attachment 876396 [details]
> Replace dev_set_name() all
> 
> 
> ------- Comment on attachment From h.carstens@de.ibm.com 2024-07-31 03:55 EDT-------
> 
> 
> Can you give the attached patch a try, please?

Thanks a lot. Not only does it make sense, it also fixes the issue; I just tried a kernel with it.
Comment 9 Miroslav Franc 2024-07-31 13:52:51 UTC
(In reply to Sarah Kriesch from comment #7)
> @Miroslav, can you support us in testing the patch, please?
> 
> If we as a community want to test it, the patch has to be added to the
> kernel-source project, the iso image has to be built based on it and then we
> would be able to test it in openQA.

If you take my word for it, the patch fixes it.  I just tested it.  I assume Heiko Carstens will send the patch upstream, and once I have an upstream reference, I will push it to the stable branch.
Comment 10 LTC BugProxy 2024-07-31 14:42:02 UTC
------- Comment From h.carstens@de.ibm.com 2024-07-31 10:38 EDT-------
> If you take my word for it, the patch fixes it.  I just tested it.  I assume
> Heiko Carstens will send the patch upstream and once I have some upstream
> reference, I will push it to stable branch.

Yes, I will take care of upstreaming this.
If you want to have Tested-by and/or Reported-by tags please provide them here, and I'll add them to the patch.

Thanks a lot for reporting, analyzing, and verifying!
Comment 11 Miroslav Franc 2024-07-31 14:54:51 UTC
(In reply to LTC BugProxy from comment #10)
> ------- Comment From h.carstens@de.ibm.com 2024-07-31 10:38 EDT-------
> > If you take my word for it, the patch fixes it.  I just tested it.  I assume
> > Heiko Carstens will send the patch upstream and once I have some upstream
> > reference, I will push it to stable branch.
> 
> Yes, I will take care of upstreaming this.
> If you want to have Tested-by and/or Reported-by tags please provide them
> here, and I'll add them to the patch.
> 
> Thanks a lot for reporting, analyzing, and verifying!

Thanks a lot.  AFAIC I don't need those tags.
Comment 12 Sarah Kriesch 2024-07-31 15:44:06 UTC
Thanks for the verification, Miroslav.

@Heiko You can send the fix upstream to the Linux kernel.
We follow an "upstream first" policy (like Debian and Fedora). The kernel maintainers will then update our packages, and I will close this bug report as soon as the patch passes our openQA tests and all tests are green again.
Comment 13 LTC BugProxy 2024-08-01 11:11:18 UTC
------- Comment From WINTERA@de.ibm.com 2024-08-01 07:04 EDT-------
FYI, a simple way to detect the problem / verify the fix:

Symptom:
In a z/VM guest you expect:
> ls /sys/devices/iucv/
hvc_iucv0/  uevent
But instead you see e.g.:
> ls /sys/devices/iucv/
hvc_iucv8780520/  uevent
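
The manual check above can also be scripted. A minimal Python sketch, assuming that a healthy system only ever shows small terminal indices; the limit of 8 used as `max_terminals` is an assumption about the driver and is not stated in this report, and the function names are illustrative:

```python
import os
import re

# Device directories created by the driver look like hvc_iucv<index>.
HVC_NAME = re.compile(r"^hvc_iucv(\d+)$")


def suspicious_hvc_names(entries, max_terminals=8):
    """Return hvc_iucv entries whose index is implausibly large.

    max_terminals is an assumed upper bound, not taken from the report.
    """
    bad = []
    for entry in entries:
        match = HVC_NAME.match(entry.rstrip("/"))  # tolerate "ls -F" style names
        if match and int(match.group(1)) >= max_terminals:
            bad.append(entry.rstrip("/"))
    return bad


def check_sysfs(path="/sys/devices/iucv"):
    """Run the check against the live sysfs directory (z/VM guest only)."""
    return suspicious_hvc_names(sorted(os.listdir(path)))
```

An empty result corresponds to the expected listing; a non-empty one corresponds to the symptom shown above.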
Comment 14 Sarah Kriesch 2024-08-06 16:56:08 UTC
Yesterday, we received the next kernel-source update for openSUSE Tumbleweed.
@Heiko, has your kernel patch been accepted yet, so that you can reference it here in the bug?

Hi, Sandy! Nice to meet you in our openSUSE Bugzilla. :)
You should know that our community members use z/VM guests in the LinuxONE Community Cloud. Yes, I am responsible for all VMs, but the setup is OpenStack-based, so to test anything you have to take a detour: set up SLES first and then upgrade to openSUSE Tumbleweed. What if the kernel does not work? Because of this bug, we also cannot install openSUSE Tumbleweed from scratch on z/VM. And what if I install the previous kernel and everything breaks again after the next kernel update? That is why we do our verification via openQA.

What a pity I was not allowed to meet you in person this year (in Böblingen or Frankfurt). My next IBM conference will be in Las Vegas.
Comment 15 Berthold Gunreben 2024-08-12 07:28:05 UTC
(In reply to LTC BugProxy from comment #10)
> ------- Comment From h.carstens@de.ibm.com 2024-07-31 10:38 EDT-------
> Yes, I will take care of upstreaming this.
> If you want to have Tested-by and/or Reported-by tags please provide them
> here, and I'll add them to the patch.
> 
> Thanks a lot for reporting, analyzing, and verifying!

Do you have an estimate of when you will submit the patch? This bug has been blocking new Tumbleweed releases on s390x for two weeks now.
Comment 16 Miroslav Franc 2024-08-12 12:20:41 UTC
I pushed the known fix to stable.  However, I would like to keep this bug open until it can be eventually refreshed with the upstream version.
Comment 17 LTC BugProxy 2024-08-12 12:49:53 UTC
------- Comment From WINTERA@de.ibm.com 2024-08-12 08:40 EDT-------
The patch is currently in linux-next as
2dca436ca7e3 ("s390/iucv: Fix vargs handling in iucv_alloc_device()")
Comment 18 OBSbugzilla Bot 2024-08-16 11:55:02 UTC
This is an autogenerated message for OBS integration:
This bug (1228425) was mentioned in
https://build.opensuse.org/request/show/1194289 Factory / kernel-source
Comment 19 LTC BugProxy 2024-08-16 13:01:35 UTC
------- Comment From WINTERA@de.ibm.com 2024-08-16 08:51 EDT-------
There was some discussion on the mailing list about the fix in
linux-next.

We are working on an improved version.
Comment 20 Miroslav Franc 2024-08-16 13:20:34 UTC
(In reply to LTC BugProxy from comment #19)
> ------- Comment From WINTERA@de.ibm.com 2024-08-16 08:51 EDT-------
> There was some discussion on the mailing list about the fix in
> linux-next.
> 
> We are working on an improved version.

Yes, I saw it here*.  I intend to carry the current version until something else lands in Linus' tree.


* https://lore.kernel.org/linux-s390/cover.thread-d8267b.your-ad-here.call-01723545029-ext-2515@work.hours/T/
Comment 21 Sarah Kriesch 2024-08-16 15:52:09 UTC
We want to have a working openSUSE Tumbleweed again. :)
As long as the fix is working, that is ok as a first step.

I don't have any problems with optimizations or refactoring.
I am happy if the mainframe also delivers the best performance with Linux on Z, but "working code" has the highest priority. ^^
Therefore, this bug report has the priority "Critical".
Comment 22 Sarah Kriesch 2024-08-18 18:44:15 UTC
Our tests are green again: https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20240817&groupid=34

Thank you, Sandy (Alexandra) and Heiko for the fix/updates!
Thank you to Miroslav for your support from SUSE side!
Comment 23 Sarah Kriesch 2024-08-18 18:45:03 UTC
tested by openQA
Comment 24 LTC BugProxy 2024-08-19 07:02:24 UTC
------- Comment From WINTERA@de.ibm.com 2024-08-19 02:53 EDT-------
For the record:
As you probably saw in the upstream discussions, Heiko's patch causes a
compile error when iucv is compiled as a module.
This is NOT an issue for hvc over iucv usage in openSUSE or other enterprise
distros. hvc can only be built in and depends on iucv, so iucv will also be built in.

------- Comment From WINTERA@de.ibm.com 2024-08-19 02:55 EDT-------
(In reply to comment #22)
> For the record:
> As you probably saw in the upstream discussions, Heiko's patch causes a
> compile error when iucv is compiled as a module.
> This is NOT an issue for hvc over iucv usage in openSUSE or other enterprise
> distros. hvc can only be built in and depends on iucv, so iucv will also be
> built in.

So it's totally fine for you to take Heiko's patch until we have an improved version upstream.
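
Whether the module-build problem applies to a given kernel can be read off its configuration. A rough Python sketch, assuming the relevant Kconfig symbol is named `CONFIG_IUCV` (the symbol name is not quoted in this report) and that the `.config` text has already been loaded into a string; the function names are illustrative:

```python
def config_value(config_text, symbol):
    """Return the value of a symbol ('y', 'm', ...) or None if it is not set."""
    prefix = symbol + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None  # covers "# CONFIG_FOO is not set" and missing symbols


def module_build_issue_applies(config_text):
    """True only when iucv is built as a module, the case described above."""
    # CONFIG_IUCV is an assumed symbol name; adjust if the tree differs.
    return config_value(config_text, "CONFIG_IUCV") == "m"
```

With iucv built in, as described above for configurations that enable hvc over iucv, the function returns False.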
Comment 25 LTC BugProxy 2024-08-23 07:40:43 UTC
------- Comment From WINTERA@de.ibm.com 2024-08-23 03:33 EDT-------
FYI:
Corrected patch was accepted to the net repository:
0124fb0 ("s390/iucv: Fix vargs handling in iucv_alloc_device()")

If everything goes well, it should go into kernel v6.10
Remember: the issue was introduced in v6.10-rc1
4452e8e ("s390/iucv: Provide iucv_alloc_device() / iucv_release_device()")

Thanks again to openSUSE for noticing and reporting the issue.
Comment 26 Miroslav Franc 2024-08-26 09:55:22 UTC
After a quick test, I refreshed the "emergency" stable patch to 0124fb0ebf3b.  It should stay in place until the entire kernel is rebased to the version containing the patch.  Therefore, I consider the issue closed.