Bug 117541

Summary: Install gets stuck during initialization, lost interrupts
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Roger Larsson <roger.larsson>
Component: Kernel Assignee: Greg Kroah-Hartman <gregkh>
Status: RESOLVED FIXED QA Contact: Klaus Kämpf <kkaempf>
Severity: Blocker    
Priority: P5 - None    
Version: RC 1   
Target Milestone: ---   
Hardware: 32bit   
OS: SUSE Other   
Whiteboard:
Found By: Beta-Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 128931    
Attachments: The requested messages
Boot messages from AMD system
/proc/interrupts from AMD system
/proc/devices for AMD system
/proc/modules for AMD system
/proc/ide/sis from AMD system
lsmod output from AMD system
lspci from SuSE 9.2 booted system
boot.msg from 10.0 beta3
acpidmp from beta3
lsmod for beta3
lspci from beta3
lsusb from beta3
/proc/devices from beta3
/proc/ide/sis from beta3
/proc/interrupts from beta3
/proc/modules from beta3
y2log from beta3
boot.msg from 10.0 RC1 in safe settings
boot.msg from 10.0 RC1 with insmod=ide-generic
dmesg output from 10.0 RC1 with insmod=ide-generic
boot.msg from 2.6.13.2
dmesg output from 10.0 GM
boot.msg from 10.0 GM with insmod=ide-generic
dmesg output from 10.0 GM with insmod=ide-generic
y2log from 10.0 GM with insmod=ide-generic
boot.msg from 10.1 alpha 2
dmesg output from 10.1 alpha 2
boot.msg from 10.1 alpha 2 with irqpoll pci=usepirqmask
dmesg output from 10.1 alpha 2 with irqpoll pci=usepirqmask
boot.msg from 10.1 alpha 2 on new ULi based system
dmesg output from 10.1 alpha on ULi based system
one zip of boot.msg from different starts

Description Roger Larsson 2005-09-16 19:32:21 UTC
hdc CRN-8241B ATAPI CD/DVD-ROM drive  
ide1 at 0x170-0x177, 0x376 on irq 15  
hdc ATAPI 24x CD-ROM drive, 128 kB cache, DMA  
Uniform CD-ROM driver Revision 3.20 
- - - 
hdc: task_in_intr status=0x51 {DriveReady, SeekComplete, Error} 
                  error=0xb4 {AbortedCommand, LastFailedSense)=0x0b} 
failed opcode was 0xa1 
- - - 
after this it keeps spitting out 
lost interrupt...
Comment 1 Roger Larsson 2005-09-16 19:43:13 UTC
10.0 beta 3 could be installed on the Thinkpad, but installing from the first
CD was very slow (bug #113283). Could it be related?
 
When rebooting into 10.0 beta 3, I can read the 10.0 RC1 CD without getting
the errors or the lost interrupts.
 
I will retry RC1 with safe settings. 
Comment 2 Roger Larsson 2005-09-16 19:53:55 UTC
Install with safe settings seems to be a working workaround. 
Comment 3 Roger Larsson 2005-09-17 13:31:15 UTC
It is not working perfectly even in "safe settings": I ran into an I/O error
when reading unzip-5.52-2.i586.rpm. Retrying seems to work, but it is still slow.
 
Comment 4 Lukas Ocilka 2005-09-19 10:42:38 UTC
It seems to be a hardware or CD media problem. Attaching file /var/log/messages
would help us.
Comment 5 Roger Larsson 2005-09-19 19:50:54 UTC
Created attachment 50350 [details]
The requested messages

Or is it the messages from a failing (non "safe settings") install attempt you
want?
Comment 6 Roger Larsson 2005-09-20 16:49:57 UTC
Now tried to install on my main system - same problem! 
Quite different hardware (AMD Athlon, SiS chipset) 
Will attach some logs. 
CD Media is verified, once during CD-burn, once with the tool. 
Comment 7 Roger Larsson 2005-09-20 17:02:18 UTC
Created attachment 50442 [details]
Boot messages from AMD system

/var/log/messages was empty!
Comment 8 Roger Larsson 2005-09-20 17:03:45 UTC
Created attachment 50443 [details]
/proc/interrupts from AMD system
Comment 9 Roger Larsson 2005-09-20 17:04:37 UTC
Created attachment 50444 [details]
/proc/devices for AMD system
Comment 10 Roger Larsson 2005-09-20 17:05:15 UTC
Created attachment 50445 [details]
/proc/modules for AMD system
Comment 11 Roger Larsson 2005-09-20 17:05:57 UTC
Created attachment 50446 [details]
/proc/ide/sis from AMD system
Comment 12 Roger Larsson 2005-09-20 17:06:33 UTC
Created attachment 50447 [details]
lsmod output from AMD system
Comment 13 Roger Larsson 2005-09-20 17:09:08 UTC
Created attachment 50448 [details]
lspci from SuSE 9.2 booted system
Comment 14 Roger Larsson 2005-09-20 18:26:46 UTC
Verified the MD5SUM of the RC1 CD1, it matches with  
http://ftp.opensuse.org/pub/opensuse/distribution/SL-10.0-OSS-RC1/iso/MD5SUMS  
  
# md5sum /dev/dvdrecorder  
e479f35810ead9238f0cca363ace87e6  /dev/dvdrecorder  
  
CD is correct but does not work on two completely different systems...  
Suggestions? (Did not see the checkbox below earlier) 
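
The checksum comparison above can also be scripted. This is a minimal sketch, assuming the medium is readable as a plain file or block device; `md5_of` and `matches` are hypothetical helper names, and the device path in the usage line is simply the one from the md5sum command above.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file or device, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # Read fixed-size chunks so a full CD image never has to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the digest of `path` with a value copied from an MD5SUMS file."""
    return md5_of(path) == expected_hex.strip().lower()
```

Usage would look like `matches("/dev/dvdrecorder", "e479f35810ead9238f0cca363ace87e6")`. A matching sum only shows that the data on the medium is intact; it says nothing about how the installer kernel drives the IDE controller.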
Comment 15 Roger Larsson 2005-09-20 19:40:36 UTC
Created attachment 50460 [details]
boot.msg from 10.0 beta3

Since beta3 worked a lot better on both systems, and it should be very
similar, I retried it on the AMD.

Notice differences in APIC (IOAPIC v. PIC), ACPI, irq assignments, and last but
not least IDE

 <7>Probing IDE interface ide0...
-<4>hda: HDS722512VLAT80, ATA DISK drive
-<4>hdb: ST3200822A, ATA DISK drive
+<4>hda: V33OA63AHDS722512VLAT80, ATA DISK drive
+<4>hdb: 3.01 ST3200822A, ATA DISK drive
 <4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
-<6>hda: max request size: 1024KiB
-<6>hda: 241254720 sectors (123522 MB) w/7938KiB Cache, CHS=16383/255/63, UDMA(100)
+<6>hda: max request size: 128KiB
+<4>hda: cannot use LBA48 - full capacity 802213000 sectors (410733 MB)
+<6>hda: 268435456 sectors (137438 MB) w/8614KiB Cache, CHS=47189/85/200
+<4>hda: set_multmode: status=0x51 { DriveReady SeekComplete Error }
+<4>hda: set_multmode: error=0x04 { DriveStatusError }
+<4>ide: failed opcode was: 0xef
 <6>hda: cache flushes supported
-<6> hda: hda1 hda2
-<6>hdb: max request size: 1024KiB
-<6>hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
+<3>hda: INVALID GEOMETRY: 85 PHYSICAL HEADS?
+<6> hda: unknown partition table
+<6>hdb: max request size: 128KiB
+<6>hdb: 0 sectors (0 MB) w/9496KiB Cache, CHS=0/165/55
+<4>hdb: set_multmode: status=0x51 { DriveReady SeekComplete Error }
+<4>hdb: set_multmode: error=0x04 { DriveStatusError }
+<4>ide: failed opcode was: 0xef
 <6>hdb: cache flushes supported
-<6> hdb: hdb1 hdb2 hdb3 hdb4
+<3>hdb: INVALID GEOMETRY: 165 PHYSICAL HEADS?
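
The capacity figures in the diff above can be cross-checked with simple arithmetic. This is a minimal sketch, assuming 512-byte sectors and that the log prints decimal megabytes rounded down (both are assumptions; the helper names are made up for illustration).

```python
SECTOR_BYTES = 512        # assumed logical sector size
LBA28_LIMIT = 2 ** 28     # largest sector count addressable with 28-bit LBA


def sectors_to_mb(sectors: int) -> int:
    """Convert a sector count to decimal megabytes, rounded down."""
    return sectors * SECTOR_BYTES // 1_000_000


def chs_to_sectors(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Total sectors implied by a cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track


# Values copied from the "-" lines of the diff.
print(sectors_to_mb(241254720))       # 123522, as logged for hda
print(sectors_to_mb(390721968))       # 200049, as logged for hdb

# Values copied from the "+" lines of the diff.
print(sectors_to_mb(268435456))       # 137438, as logged for hda
print(268435456 == LBA28_LIMIT)       # True
print(chs_to_sectors(47189, 85, 200)) # 802213000, the "full capacity" figure
print(sectors_to_mb(802213000))       # 410733, as logged
```

In the "+" lines hda is reported with exactly 2**28 sectors, and the odd CHS triple multiplies out to the 802213000 "full capacity" value. Together with the extra characters in front of the model strings, this suggests the drive identification data was read incorrectly, although the log alone does not prove it.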
Comment 16 Roger Larsson 2005-09-20 19:42:25 UTC
Created attachment 50461 [details]
acpidmp from beta3
Comment 17 Roger Larsson 2005-09-20 19:43:00 UTC
Created attachment 50462 [details]
lsmod for beta3
Comment 18 Roger Larsson 2005-09-20 19:43:44 UTC
Created attachment 50463 [details]
lspci from beta3
Comment 19 Roger Larsson 2005-09-20 19:44:19 UTC
Created attachment 50464 [details]
lsusb from beta3
Comment 20 Roger Larsson 2005-09-20 19:45:00 UTC
Created attachment 50465 [details]
/proc/devices from beta3
Comment 21 Roger Larsson 2005-09-20 19:45:46 UTC
Created attachment 50466 [details]
/proc/ide/sis from beta3
Comment 22 Roger Larsson 2005-09-20 19:46:20 UTC
Created attachment 50467 [details]
/proc/interrupts from beta3
Comment 23 Roger Larsson 2005-09-20 19:46:58 UTC
Created attachment 50468 [details]
/proc/modules from beta3
Comment 24 Roger Larsson 2005-09-20 19:47:36 UTC
Created attachment 50469 [details]
y2log from beta3
Comment 25 Roger Larsson 2005-09-20 20:04:17 UTC
Summary of AMD system: 
motherboard: ASRock K7S8X R3 (not that uncommon) 
disks: 
 hda: HDS722512VLAT80 
 hdb: ST3200822A 
 hdc: _NEC DVD_RW ND-2500A 
video: GeForce FX 5200 
Comment 26 Klaus Kämpf 2005-09-21 06:49:11 UTC
This all looks like (the usual :-() broken BIOS problem. 
 
Roger, you did try "failsafe" booting, didn't you ?! 
Comment 27 Roger Larsson 2005-09-21 07:01:06 UTC
I have not tried failsafe on the AMD. Failsafe on the Thinkpad did work. 
But failure to install RC1 without failsafe on two completely different 
systems when Beta3 did work - I would count this as a blocker! 
 
I say it again: beta3 worked for both these systems! 
 
I installed it on the Thinkpad, did everything but the final command on the 
AMD. 
 
Comment 28 Roger Larsson 2005-09-21 07:25:37 UTC
Created attachment 50515 [details]
boot.msg from 10.0 RC1 in safe settings

This is a true blocker, it does not even start
with safe settings!!!
(I am home with my 1 year old son, he requires on-demand play RIGHT NOW...)
Comment 29 Roger Larsson 2005-09-21 09:02:20 UTC
Have already sent the needed info, but have to check that checkbox... 
Comment 30 Roger Larsson 2005-09-21 11:44:51 UTC
Created attachment 50535 [details]
boot.msg from 10.0 RC1 with insmod=ide-generic

With ide-generic you get a bit further.
Comment 31 Roger Larsson 2005-09-21 11:47:26 UTC
Created attachment 50536 [details]
dmesg output from 10.0 RC1 with insmod=ide-generic

But you get lots of task_in_intr errors and it is extremely slow, so it is not
practically installable (I gave up).
Comment 32 Roger Larsson 2005-09-21 20:18:06 UTC
Created attachment 50564 [details]
boot.msg from 2.6.13.2

So I downloaded and recompiled 2.6.13.2
It boots nicely on the AMD system (but lacks modules to become fully
operational)
Comment 33 Olaf Kirch 2005-09-22 08:49:20 UTC
Jens, any idea? This looks like a mix of several problems to me 
Comment 34 Roger Larsson 2005-09-23 08:54:41 UTC
Is there any newer RC that I could test? 
(only first CD is needed) 
Comment 35 Roger Larsson 2005-09-30 16:24:25 UTC
So I guess 10.0 will be of no use for me then... 
Nobody is working on this bug, it is NEW not ASSIGNED. 
No reaction to questions or other input... 
 
Comment 36 Roger Larsson 2005-10-14 18:13:59 UTC
Tried the new GM release on the Thinkpad - did not work!
(not even with Failsafe)
I have not tried on the AMD yet...
Comment 37 Roger Larsson 2005-10-17 07:16:31 UTC
Tried on the AMD now - lost interrupts etc. during udev startup.
Did not start properly. I have not tried Failsafe nor ide-generic on
the AMD yet.
Comment 38 Roger Larsson 2005-10-18 08:58:36 UTC
Created attachment 54503 [details]
dmesg output from 10.0 GM
Comment 39 Roger Larsson 2005-10-18 08:59:29 UTC
Created attachment 54504 [details]
boot.msg from 10.0 GM with insmod=ide-generic
Comment 40 Roger Larsson 2005-10-18 09:00:16 UTC
Created attachment 54506 [details]
dmesg output from 10.0 GM with insmod=ide-generic
Comment 41 Roger Larsson 2005-10-18 09:01:33 UTC
Created attachment 54508 [details]
y2log from 10.0 GM with insmod=ide-generic
Comment 42 Roger Larsson 2005-10-18 09:09:06 UTC
As you can see I have now tried to run with
failsafe - did not work any better
insmod=ide-generic - works "best"
but see the dmesg log... FULL of stuff like this
 hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
 hda: task_in_intr: error=0x10 { SectorIdNotFound }, CHS=15790/10/129, sector=268435328
 ide: failed opcode was: unknown

What is happening here? It is only hda in the list...
Could it be some incompatibility with that?
BTW Knoppix 4.0.2 DVD from Linux Magazine works!
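
The status bytes in these messages can be decoded by hand. This is a minimal sketch, assuming the standard ATA status register bit layout and the bit names the kernel messages appear to use; `decode_status` is a hypothetical helper, not a kernel API.

```python
# ATA status register bits, highest bit first (names as printed in the log).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value: int) -> list[str]:
    """Return the names of all status bits set in `value`."""
    return [name for bit, name in STATUS_BITS if value & bit]


print(decode_status(0x59))  # ['DriveReady', 'SeekComplete', 'DataRequest', 'Error']
print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']

# The failing sector from the log sits 128 sectors below the 28-bit LBA limit.
print(2 ** 28 - 268435328)  # 128
```

The failing sector number 268435328 lies just under 2**28, so the reads that fail with SectorIdNotFound are at the very end of the range the kernel believes the disk has. That is only an observation from the numbers in the log, not a confirmed diagnosis.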
Comment 43 Matiss Piesins 2005-10-21 22:48:23 UTC
I've got a very similar problem: the install failed with all these IDE opcode errors and lost interrupts on a quite reliable desktop machine with 2 CD-ROM drives and a VIA chipset. It never had any such problems with the many distros I have tried on it over time, from Slackware 9 to some bleeding-edge live CDs (with a 2.6.13 kernel).

Booting in safe mode was even worse: it froze up completely, whereas the normal or noacpi case just left the IDE light on, with the system otherwise responsive.

Disconnecting the other CD drive helped, so I could install and then, later on, reconnect that drive, update the CD integration, and it all worked as it should.

Is the install CD kernel different from the default one? If so, something might have gone wrong with IDE or mounting.
Comment 44 Roger Larsson 2005-10-29 21:28:03 UTC
Created attachment 55980 [details]
boot.msg from 10.1 alpha 2

Tried 10.1 alpha 2 - same problem...
Comment 45 Roger Larsson 2005-10-29 21:28:57 UTC
Created attachment 55981 [details]
dmesg output from 10.1 alpha 2
Comment 46 Roger Larsson 2005-10-29 21:39:44 UTC
BTW since this affects SiS systems with both Intel and AMD CPU
my guess is that the problem is in the south bridge driver/hw:
My south bridge is: SiS963L
Comment 47 Roger Larsson 2005-11-09 00:34:06 UTC
Created attachment 56736 [details]
boot.msg from 10.1 alpha 2 with irqpoll pci=usepirqmask

Found a possible workaround in bug #128931
Did not work.
But notice that a common theme when things do not work is that disk geometry is completely wrong!
Comment 48 Roger Larsson 2005-11-09 00:35:40 UTC
Created attachment 56737 [details]
dmesg output from 10.1 alpha 2 with irqpoll pci=usepirqmask
Comment 49 Roger Larsson 2005-11-14 01:32:53 UTC
Severe system breakdown!

Suddenly the AMD failed to boot, nothing on screen.
And since I really needed the system this weekend I went out shopping...

Bought a new motherboard (ASRock 939Dual-SATA2, ULi M1695+M1567 chipset)
and a matching processor AMD64 3500+ and put them in a new case.

I will try to find out the cause of the failure of my old system.
But will also try installation on the new one (same disks, DRAM, DVD, etc.)
Comment 50 Roger Larsson 2005-11-15 07:52:01 UTC
Created attachment 57337 [details]
boot.msg from 10.1 alpha 2 on new ULi based system

There are no problems with "10.1 alpha 2"
on the ULi based system.
* Same DISKs (other but identical IDE cable - 80pin)
* Same DVD
* Same Floppy
* Same VGA board - Nvidia AGP
* Same RAM (but right now only one module - 512 MB)
* Same Power supply
* New case, with case cooler
* New Motherboard - ASRock 939Dual-SATA2
* New CPU
* New CPU cooler
Cooling is better (but also louder) on the new system
Comment 51 Roger Larsson 2005-11-15 07:53:18 UTC
Created attachment 57338 [details]
dmesg output from 10.1 alpha on ULi based system
Comment 52 Jens Axboe 2005-11-15 14:31:32 UTC
Roger, I can't read any of your gzip'ed attachments. They come out garbled here, are you sure you attached the right ones?
Comment 53 Roger Larsson 2005-11-15 20:48:34 UTC
Created attachment 57417 [details]
one zip of boot.msg from different starts

Same for me... strange...
Trying to zip everything in one file (had it on a floppy)
Comment 54 Roger Larsson 2005-11-15 20:50:15 UTC
I have verified that I can download and read the new attachment.
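
Whether a downloaded attachment is still a valid gzip file can be checked before anyone tries to read it. This is a minimal sketch using only the Python standard library; `looks_like_gzip` and `can_decompress` are hypothetical helper names, and the file name in the usage line is only an example.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Cheap check: does the data start with the gzip magic number?"""
    return data[:2] == GZIP_MAGIC


def can_decompress(data: bytes) -> bool:
    """Full check: does the whole stream decompress without errors?"""
    try:
        gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # Bad header, truncated stream, or corrupted compressed data.
        return False
    return True
```

Usage would look like `can_decompress(open("boot.msg.gz", "rb").read())`. If the magic number is present but decompression fails, the file was probably damaged after it was compressed.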
Comment 55 Jens Axboe 2006-09-12 13:08:28 UTC
Reopen for 10.1 if the bug still exists.