Bugzilla – Full Text Bug Listing
| Summary: | nbd fails after a while (strange output from module when loading ...) | | |
|---|---|---|---|
| Product: | [openSUSE] SUSE Linux 10.1 | Reporter: | Forgotten User abccHJSkz0 <forgotten_abccHJSkz0> |
| Component: | Kernel | Assignee: | Pavel Machek <pavel> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P5 - None | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | Other | | |
| OS: | SuSE Linux 10.0 | | |
| Whiteboard: | | | |
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Forgotten User abccHJSkz0
2005-11-15 11:59:54 UTC
The other bug-ID was #76874 ... The messages during load come from hotplug scanning the block device. Kay, can you take a look at how to disable that?

Could you replace line 182 in /etc/udev/rules.d/50-udev.rules:

KERNEL=="ram*|loop*|fd*", GOTO="persistent_end"

with

KERNEL=="ram*|loop*|fd*|nbd*", GOTO="persistent_end"

and see if the messages go away?

Sorry, same procedure :-( I added KERNEL=="ram*|loop*|fd*|nbd*", GOTO="persistent_end" to that file and loaded the module via "modprobe nbd" afterwards:

nbd: registered device at major 43
nbd0: Request when not-ready
end_request: I/O error, dev nbd0, sector 4294965120
Buffer I/O error on device nbd0, logical block 536870640
nbd0: Request when not-ready
end_request: I/O error, dev nbd0, sector 4294965120
Buffer I/O error on device nbd0, logical block 536870640
nbd0: Request when not-ready
end_request: I/O error, dev nbd0, sector 4294964992
Buffer I/O error on device nbd0, logical block 536870624
nbd0: Request when not-ready
end_request: I/O error, dev nbd0, sector 4294964992
Buffer I/O error on device nbd0, logical block 536870624
nbd0: Request when not-ready
end_request: I/O error, dev nbd0, sector 4294964992
...

...but that is only a minor part of the problem :-(

Hi!! It seems to be a more generic kernel problem: a student of mine is programming an alternative network block device (called dnbd). It operates over UDP instead of TCP and will be able to use more than one server (for failover). I get exactly the same problems as with the standard nbd kernel module: after a while the client slows down, and finally the network block device stops working completely. So it does not seem to be a problem specific to either network block device (both perform rather well under the 9.3 kernel, updated versions; I never had the problems I have with the 10.0 kernel!!)
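The udev rule change suggested above relies on a KERNEL== match accepting several shell-style patterns separated by "|"; a matching device jumps to the LABEL="persistent_end" line, skipping the persistent-naming rules in between. As a rough illustration only (this is not udev's own code), the matching can be approximated in C with POSIX fnmatch(); match_alternatives is a hypothetical helper name, and the 256-byte pattern limit is an assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if `name` (e.g. "nbd0") matches any of
 * the '|'-separated shell-glob alternatives in `patterns`, approximating a
 * udev rule such as KERNEL=="ram*|loop*|fd*|nbd*". Illustration built on
 * POSIX fnmatch(), not udev's implementation. */
bool match_alternatives(const char *name, const char *patterns)
{
    assert(name != NULL && patterns != NULL);

    char buf[256];                      /* assumed limit for this sketch */
    size_t len = strlen(patterns);
    if (len >= sizeof buf)
        return false;                   /* reject overly long pattern lists */
    memcpy(buf, patterns, len + 1);     /* strtok_r modifies its input */

    char *saveptr = NULL;
    for (char *pat = strtok_r(buf, "|", &saveptr); pat != NULL;
         pat = strtok_r(NULL, "|", &saveptr)) {
        if (fnmatch(pat, name, 0) == 0) /* 0 means "matched" */
            return true;
    }
    return false;
}
```

With the original list, match_alternatives("nbd0", "ram*|loop*|fd*") is false, so nbd devices still go through the persistent-naming rules; after adding nbd* it becomes true and the GOTO skips them.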
Sorry, I forgot to click "provides the needed information" last time, so I'm doing it this round :-) The information provided here might give some more insight...

Just tested it again with an updated kernel on both sides (server and client). Both server and client are 10.0 installations. The client does not hang as early as described above, but how soon it hangs depends on the type of machine: on some it happens after a few megabytes of reads, on others it takes more than 50 MB. There was/seems to be an issue with the scheduler!? Kernel: Linux lsfks04 2.6.13-15.7-smp #1 SMP Tue Nov 29 14:32:29 UTC 2005 i686 athlon i386 GNU/Linux (different machines: the servers are 64-bit and 32-bit, the clients likewise but with only 32-bit binaries).

So, can we close the bug? I have no idea what to do here.

I will check it with newer kernels (I have now figured out how to pivot_root/run-init from the initial ramdisk). Hopefully the issue is fixed there...

Identical problem with the SuSE10.1 Beta3 version. I'm using kernel "Linux lsfks20 2.6.16-rc1-git3-7-default #1 Mon Jan 30 21:52:12 UTC 2006 i686 i686 i386 GNU/Linux". I read somewhere that there was a problem with some scheduler that makes the nbd-client/kernel hang after a while :-( The server is a patched SuSE10.0. Should I open a new bug for SuSE10.1?

Reassigning. "nbd-client/kernel hangs after a while" - I have no idea what's going wrong here.

Did exhaustive tests: nbd hangs after a while unless the kernel command line option "elevator=noop" is used (I did not test other elevators yet). It hangs with both ext2 and squashfs (the latter has problems of its own: it has to be patched so that it does not run into trouble with the scheduler itself). With the elevator option given I did not encounter problems, at least in the tests I did (around 500 MB sent over the network).

Jens, looks like a block layer issue... I don't think nbd is supported, but I'll reassign to Pavel to take a look since it's his baby.
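The elevator=noop workaround described above selects the I/O scheduler for all devices at boot. On 2.6 kernels of this era the choice is also exposed per device in /sys/block/<dev>/queue/scheduler, which lists the available schedulers and marks the active one in square brackets (for example "noop anticipatory deadline [cfq]"). A minimal sketch, assuming that file format, for checking which elevator a device such as nbd0 is actually using; active_elevator and read_active_elevator are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the bracketed (active) scheduler name from a line such as
 * "noop anticipatory deadline [cfq]". Writes it to `out` and returns 0;
 * returns -1 if there is no bracketed entry or it does not fit. */
int active_elevator(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read /sys/block/<dev>/queue/scheduler for one device (e.g. "nbd0")
 * and report its active elevator. Returns -1 on any I/O or parse error. */
int read_active_elevator(const char *dev, char *out, size_t outlen)
{
    char path[128];
    char line[256];
    int w = snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);
    if (w < 0 || (size_t)w >= sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    return active_elevator(line, out, outlen);
}
```

On a default 10.0/10.1 kernel this would be expected to report cfq for nbd0, and noop when booted with elevator=noop.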
Dirk, you should try and see whether elevator=anticipatory works for you as well or whether that hangs too.

Looking at the nbd request handling, it is doing an illegal down() inside the request_fn. nbd should be offloading the actual transmit to a work queue handler.

I just checked that: elevator=anticipatory seems to work like noop. The cfq scheduler (the default) seems to deadlock the system in many cases (the kernel could still be stopped with SysRq-O)...

I have checked in a CFQ fix that should make it work for you. Closing bug, please retest with the next beta kernel (not beta4, the one after that).

*** Bug 183818 has been marked as a duplicate of this bug. ***
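The comment about the illegal down() describes a general pattern: the request function runs with the queue lock held and must not sleep, so anything that can block (taking a semaphore, sending over a socket) should be handed to a separate context. The nbd driver code is not shown in this report, so the following is only a user-space analogy in C with POSIX threads, not the kernel workqueue API: xq_submit plays the role of the request function (it only enqueues and wakes the worker), and xq_worker plays the role of the work queue handler that performs the potentially blocking transmit. All names (struct xmit_queue, xq_submit, count_transmit, ...) and the fixed queue size are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 64                         /* hypothetical fixed queue size */

struct request { unsigned long sector; unsigned int nr_sectors; };

struct xmit_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    struct request  reqs[QCAP];         /* ring buffer of pending requests */
    size_t head, count;
    bool stopping;
    void (*transmit)(const struct request *); /* may block, e.g. a socket send */
};

/* Stand-in for the real blocking transmit: just counts sectors "sent". */
unsigned long transmitted_sectors;
void count_transmit(const struct request *r) { transmitted_sectors += r->nr_sectors; }

void xq_init(struct xmit_queue *q, void (*transmit)(const struct request *))
{
    assert(q != NULL && transmit != NULL);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->count = 0;
    q->stopping = false;
    q->transmit = transmit;
}

/* Analogue of the request function: queue and signal, never transmit here. */
bool xq_submit(struct xmit_queue *q, struct request r)
{
    bool queued = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < QCAP && !q->stopping) {
        q->reqs[(q->head + q->count) % QCAP] = r;
        q->count++;
        queued = true;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return queued;
}

/* Analogue of the work queue handler: the only place that may block. */
void *xq_worker(void *arg)
{
    struct xmit_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->stopping)
            pthread_cond_wait(&q->nonempty, &q->lock);
        if (q->count == 0) {            /* stopping and fully drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        struct request r = q->reqs[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        q->transmit(&r);                /* blocking work runs outside the lock */
    }
}

/* Ask the worker to finish the queued requests and exit. */
void xq_stop(struct xmit_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stopping = true;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}
```

The design point is that the submitting side holds its lock only long enough to append to the queue; in the kernel that lock would be a spinlock, which is exactly why the sleeping down() does not belong there.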