Bugzilla – Bug 1228425
Trace in sysfs during setup of hvc with kernel-source 6.10.2-1.1
Last modified: 2024-08-26 09:55:22 UTC
The latest Tumbleweed version with kernel-source from Thu Jul 25 12:37:14 CEST 2024 fails in openQA because openQA cannot connect to /dev/hvc0.
Looking at autoinst-log.txt, one can find the following trace:
    expect_3270 queue content:
    [ 1.846766][ T24] Freeing initrd memory: 75820K
    [ 1.847487][ T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
    [ 1.847540][ T1] io scheduler mq-deadline registered
    [ 1.847543][ T1] io scheduler kyber registered
    [ 1.847564][ T1] io scheduler bfq registered
    [ 1.848365][ T1] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
    [ 1.848531][ T1] sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv1827699952'
    [ 1.848535][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.1-1-default #1 openSUSE Tumbleweed 84b24c83f5e66871052e51fb35f0e9a59b94613c
    [ 1.848540][ T1] Hardware name: IBM 8561 LT1 400 (z/VM 7.3.0)
    [ 1.848543][ T1] Call Trace:
    [ 1.848546][ T1] [<000001aa6da5123a>] dump_stack_lvl+0x72/0x98
    [ 1.848556][ T1] [<000001aa6d3fa698>] sysfs_warn_dup+0x78/0x90
    [ 1.848560][ T1] [<000001aa6d3fa80a>] sysfs_create_dir_ns+0xda/0xf0
    [ 1.848563][ T1] [<000001aa6da1d38c>] kobject_add_internal+0xdc/0x340
See https://openqa.opensuse.org/tests/4363889/logfile?filename=autoinst-log.txt
Note that kernel version 6.9.9 still worked in openQA.
    osc rdiff -r 36fd608fa90d991e150abb50963bb31d:5afd09fb4c6fb64db820aed23183b441 openSUSE:Factory:zSystems kernel-source
Thanks for the report! The most interesting needle is this one: https://openqa.opensuse.org/tests/4363889#step/bootloader_s390/35 The error on the command line is: /dev/hvc0: No such device
Nikolay added some IPL-related patches to s390-tools for activation within the last week: https://build.opensuse.org/projects/openSUSE:Factory:zSystems/packages/s390-tools/files/s390-tools.changes?expand=1 That could be related.
I believe this to be a kernel bug.
    [ 0.341483] [ T1] sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv527465712'
    [ 0.341487] [ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.1-1.g178f0b6-default #1 openSUSE Tumbleweed (unreleased) af3a669763d3674057710f333a872a5bb858d543
    [ 0.341492] [ T1] Hardware name: IBM 8561 LT1 400 (z/VM 7.3.0)
    [ 0.341493] [ T1] Call Trace:
    [ 0.341495] [ T1] [<000003412021923a>] dump_stack_lvl+0x72/0x98
    [ 0.341502] [ T1] [<000003411fbc2698>] sysfs_warn_dup+0x78/0x90
    [ 0.341505] [ T1] [<000003411fbc280a>] sysfs_create_dir_ns+0xda/0xf0
    [ 0.341507] [ T1] [<00000341201e538c>] kobject_add_internal+0xdc/0x340
    [ 0.341510] [ T1] [<00000341201e5662>] kobject_add+0x72/0xc0
    [ 0.341512] [ T1] [<000003411fede77c>] device_add+0xcc/0x7d0
    [ 0.341517] [ T1] [<0000034120cc63a6>] hvc_iucv_init+0x336/0x468
    [ 0.341521] [ T1] [<000003411f74c9cc>] do_one_initcall+0x3c/0x220
    [ 0.341523] [ T1] [<0000034120c8ea26>] kernel_init_freeable+0x2de/0x340
    [ 0.341526] [ T1] [<000003412021ab3e>] kernel_init+0x2e/0x180
    [ 0.341529] [ T1] [<000003411f74f08c>] __ret_from_fork+0x3c/0x60
    [ 0.341531] [ T1] [<000003412022b76a>] ret_from_fork+0xa/0x30
    [ 0.341535] [ T1] kobject: kobject_add_internal failed for hvc_iucv527465712 with -EEXIST, don't try to register things with the same name in the same directory.
    [ 0.341539] [ T1] hvc_iucv: Creating a new HVC terminal device failed with error code=-17
Reverting
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=effb83572685eaa70d05a8dd6307ca574a11fcf3
and
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ccec5032291b108e694b55394cd035c9d840052a
makes it disappear. I'm not yet sure what the problem is. Those patches should not change the kernel's behavior.
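For anyone triaging similar boot logs, the two key messages in the trace above can be picked out mechanically. The following is only a minimal Python sketch and not part of openQA or any kernel tooling: the function names are hypothetical, and the regular expressions assume nothing beyond the message wording quoted in this report. It extracts the duplicate sysfs path and translates the numeric error code (-17) into its errno name.

```python
import errno
import re

# Patterns based solely on the message wording quoted in this bug report.
DUP_RE = re.compile(r"sysfs: cannot create duplicate filename '([^']+)'")
FAIL_RE = re.compile(r"failed with error code=(-?\d+)")


def find_duplicate_sysfs_names(log_text):
    """Return every sysfs path the kernel refused to create twice."""
    return DUP_RE.findall(log_text)


def describe_error_codes(log_text):
    """Return (code, errno name) pairs for 'failed with error code=N' lines."""
    result = []
    for match in FAIL_RE.finditer(log_text):
        code = int(match.group(1))
        # Kernel error codes are negative errno values.
        result.append((code, errno.errorcode.get(abs(code), "unknown")))
    return result


if __name__ == "__main__":
    sample = (
        "[ 0.341483] [ T1] sysfs: cannot create duplicate filename "
        "'/devices/iucv/hvc_iucv527465712'\n"
        "[ 0.341539] [ T1] hvc_iucv: Creating a new HVC terminal device "
        "failed with error code=-17\n"
    )
    print(find_duplicate_sysfs_names(sample))
    print(describe_error_codes(sample))
```

The errno lookup matches the "-EEXIST" that kobject_add_internal reports in the same trace.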
Hello Marcus, can you mirror this bug report to IBM and forward it to the kernel developers, please? We cannot install openSUSE Tumbleweed with the latest kernel at the moment. Here are our openQA results: https://openqa.opensuse.org/tests/4367326#step/bootloader_s390/35 Ideally, you will forward this bug to Heiko Carstens and Alexander Gordeev, because the referenced commits that affect this situation are from them.
(In reply to Miroslav Franc from comment #3)
> Reverting
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=effb83572685eaa70d05a8dd6307ca574a11fcf3
> and
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ccec5032291b108e694b55394cd035c9d840052a
> makes it disappear.
>
> I'm not yet sure what's the problem. Those patches should not change the
> kernel's behavior.
Used kernel-source version: 6.10.2-1.1
Created attachment 876396 [details] Replace dev_set_name() all
------- Comment on attachment From h.carstens@de.ibm.com 2024-07-31 03:55 EDT-------
Can you give the attached patch a try, please?
@Miroslav, can you support us in testing the patch, please? If we as a community want to test it, the patch has to be added to the kernel-source project, the iso image has to be built based on it and then we would be able to test it in openQA.
(In reply to LTC BugProxy from comment #6)
> Created attachment 876396 [details]
> Replace dev_set_name() all
>
> ------- Comment on attachment From h.carstens@de.ibm.com 2024-07-31 03:55 EDT-------
>
> Can you give the attached patch a try, please?
Thanks a lot. Not only does it make sense, it also fixes the issue. I just tried the kernel with it.
(In reply to Sarah Kriesch from comment #7)
> @Miroslav, can you support us in testing the patch, please?
>
> If we as a community want to test it, the patch has to be added to the
> kernel-source project, the iso image has to be built based on it and then we
> would be able to test it in openQA.
If you take my word for it, the patch fixes it. I just tested it. I assume Heiko Carstens will send the patch upstream, and once I have an upstream reference, I will push it to the stable branch.
------- Comment From h.carstens@de.ibm.com 2024-07-31 10:38 EDT-------
> If you take my word for it, the patch fixes it. I just tested it. I assume
> Heiko Carstens will send the patch upstream and once I have some upstream
> reference, I will push it to stable branch.
Yes, I will take care of upstreaming this. If you want to have Tested-by and/or Reported-by tags, please provide them here, and I'll add them to the patch. Thanks a lot for reporting, analyzing, and verifying!
(In reply to LTC BugProxy from comment #10)
> ------- Comment From h.carstens@de.ibm.com 2024-07-31 10:38 EDT-------
> > If you take my word for it, the patch fixes it. I just tested it. I assume
> > Heiko Carstens will send the patch upstream and once I have some upstream
> > reference, I will push it to stable branch.
>
> Yes, I will take care of upstreaming this.
> If you want to have Tested-by and/or Reported-by tags please provide them
> here, and I'll add them to the patch.
>
> Thanks a lot for reporting, analyzing, and verifying!
Thanks a lot. AFAIC I don't need those tags.
Thanks for the verification, Miroslav. @Heiko You can send the fix upstream to the Linux kernel. We follow an "upstream first" policy (as Debian and Fedora do). The kernel maintainers will then update our packages, and I will close this bug report as soon as the patch passes our openQA tests and all tests are green again.
------- Comment From WINTERA@de.ibm.com 2024-08-01 07:04 EDT-------
FYI, a simple way to detect the problem / verify the fix:
Symptom: In a z/VM guest you expect:
    > ls /sys/devices/iucv/
    hvc_iucv0/ uevent
But instead you see e.g.:
    > ls /sys/devices/iucv/
    hvc_iucv8780520/ uevent
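The manual check above could also be scripted, for example in a post-boot test. What follows is a rough Python sketch under stated assumptions: the function names are hypothetical, and the `max_terminals=8` bound is an assumed upper limit on sane hvc_iucv indices (it is not stated in this report), so adjust it as needed. Any `hvc_iucvN` entry whose index is at or above that bound is flagged as a likely corrupted name.

```python
import os
import re

HVC_NAME_RE = re.compile(r"^hvc_iucv(\d+)$")


def corrupted_hvc_names(entries, max_terminals=8):
    """Return hvc_iucv entries whose numeric suffix looks implausible.

    max_terminals is an assumed bound, not taken from the bug report.
    """
    bad = []
    for name in entries:
        match = HVC_NAME_RE.match(name)
        if match and int(match.group(1)) >= max_terminals:
            bad.append(name)
    return bad


def check_sysfs(path="/sys/devices/iucv"):
    """List the sysfs directory and return the suspicious device names."""
    return corrupted_hvc_names(os.listdir(path))


if __name__ == "__main__":
    try:
        suspicious = check_sysfs()
    except OSError:
        # Not a z/VM guest, or the iucv bus is not present on this system.
        print("/sys/devices/iucv is not available on this system")
    else:
        print(suspicious or "no corrupted hvc_iucv names found")
```

With the listing from a healthy guest (`hvc_iucv0`, `uevent`) nothing is flagged, while a name such as `hvc_iucv8780520` is reported.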
Yesterday, we received the next kernel-source update for openSUSE Tumbleweed. @Heiko Has your kernel contribution been accepted already, so that you can reference it here in the bug? Hi, Sandy! Nice to meet you in our openSUSE Bugzilla. :) You should know that our community members are using z/VM guests in the LinuxONE Community Cloud. Yes, I am responsible for all VMs, but the setup is OpenStack-based, and to test all of this you have to take a workaround via a SLES setup and an upgrade to openSUSE Tumbleweed. What if the kernel is not working? In our case, we also cannot install openSUSE Tumbleweed from scratch on z/VM because of this bug. What if I install the former kernel, and after the kernel update everything is broken again? Therefore, we are doing our verification via openQA. What a pity I was not allowed to meet you in person this year (in Böblingen or Frankfurt). My next IBM conference will be in Las Vegas.
(In reply to LTC BugProxy from comment #10)
> ------- Comment From h.carstens@de.ibm.com 2024-07-31 10:38 EDT-------
> Yes, I will take care of upstreaming this.
> If you want to have Tested-by and/or Reported-by tags please provide them
> here, and I'll add them to the patch.
>
> Thanks a lot for reporting, analyzing, and verifying!
Do you have an estimate of when you will submit the patch? This bug has been blocking new Tumbleweed releases on s390x for two weeks now.
I pushed the known fix to stable. However, I would like to keep this bug open until it can be eventually refreshed with the upstream version.
------- Comment From WINTERA@de.ibm.com 2024-08-12 08:40 EDT------- The patch is currently in linux-next as 2dca436ca7e3 ("s390/iucv: Fix vargs handling in iucv_alloc_device()")
This is an autogenerated message for OBS integration: This bug (1228425) was mentioned in https://build.opensuse.org/request/show/1194289 Factory / kernel-source
------- Comment From WINTERA@de.ibm.com 2024-08-16 08:51 EDT------- There was some discussion on the mailing list about the fix in linux-next. We are working on an improved version.
(In reply to LTC BugProxy from comment #19)
> ------- Comment From WINTERA@de.ibm.com 2024-08-16 08:51 EDT-------
> There was some discussion on the mailing list about the fix in
> linux-next.
>
> We are working on an improved version.
Yes, I can see that here*. I intend to carry the current version until something else lands in Linus' tree.
* https://lore.kernel.org/linux-s390/cover.thread-d8267b.your-ad-here.call-01723545029-ext-2515@work.hours/T/
We want to have a working openSUSE Tumbleweed again. :) As long as the fix works, that is OK as a first step. I don't have any problem with optimizations or refactoring. I am happy if the mainframe can deliver the best performance with Linux on Z as well. But "working code" has the highest priority. ^^ Therefore, this bug report has the priority "Critical".
Our tests are green again: https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20240817&groupid=34 Thank you, Sandy (Alexandra) and Heiko for the fix/updates! Thank you to Miroslav for your support from SUSE side!
tested by openQA
------- Comment From WINTERA@de.ibm.com 2024-08-19 02:53 EDT-------
For the record: As you probably saw in the upstream discussions, Heiko's patch causes a compile error when iucv is compiled as a module. This is NOT an issue for hvc over iucv usage in openSUSE or other enterprise distros. hvc can only be built in and depends on iucv, so iucv will also be built in.
------- Comment From WINTERA@de.ibm.com 2024-08-19 02:55 EDT-------
(In reply to comment #22)
> For the record:
> As you probably saw in the upstream discussions, Heiko's patch causes a
> compile error when iucv is compiled as a module.
> This is NOT an issue for hvc over iucv usage in openSUSE or other enterprise
> distros. hvc can only be built in and depends on iucv, so iucv will also be
> built in.
So it's totally fine for you to take Heiko's patch until we have an improved version upstream.
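Whether a given kernel falls into the "built in" case described above can be checked from its config file. This is a small Python sketch, not an official tool: the helper names are hypothetical, and the option names CONFIG_HVC_IUCV and CONFIG_IUCV are my assumption about the relevant Kconfig symbols (they are not named in this thread). It reports whether both options are set to `y`, in which case the module-only compile error would not apply.

```python
def parse_kernel_config(text):
    """Return a dict of CONFIG_* options set in a kernel .config text."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines like "# CONFIG_FOO is not set" are comments and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def hvc_iucv_is_builtin(text):
    """True if hvc over iucv and iucv itself are both built in (=y).

    The symbol names are assumptions, not taken from the bug report.
    """
    options = parse_kernel_config(text)
    return (
        options.get("CONFIG_HVC_IUCV") == "y"
        and options.get("CONFIG_IUCV") == "y"
    )


if __name__ == "__main__":
    sample = "CONFIG_IUCV=y\nCONFIG_HVC_IUCV=y\n"
    print(hvc_iucv_is_builtin(sample))
```

On a real system the text would typically come from a file such as /boot/config-$(uname -r), where the distribution installs one.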
------- Comment From WINTERA@de.ibm.com 2024-08-23 03:33 EDT-------
FYI: The corrected patch was accepted into the net repository:
    0124fb0 ("s390/iucv: Fix vargs handling in iucv_alloc_device()")
If everything goes well, it should go into kernel v6.10.
Remember: the issue was introduced in v6.10-rc1 by
    4452e8e ("s390/iucv: Provide iucv_alloc_device() / iucv_release_device()")
Thanks again to openSUSE for noticing and reporting the issue.
After a quick test, I refreshed the "emergency" stable patch to 0124fb0ebf3b. It should stay in place until the entire kernel is rebased to the version containing the patch. Therefore, I consider the issue closed.