Bug 299623

Summary: radeon: display distorted (strange colors) during first Xserver start (M24)
Product: [openSUSE] openSUSE 10.3 Reporter: Magnus Boman <mboman>
Component: X.OrgAssignee: Stefan Dirsch <sndirsch>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Major    
Priority: P5 - None CC: eich, lnussel, werner
Version: Beta 1   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: SaX log
Xorg log
Log when it starts ok
Distorted (haldaemon is started)
Not distorted (haldaemon is not running)
haldaemon not running
haldaemon running
lsmod-earlyxdm
lsmod after "blacklist" of specific modules
par=off
par=off after boot
diff
Messages
Warn
Log files as requested

Description Magnus Boman 2007-08-12 05:39:52 UTC
When restarting the machine and it boots to run level 5, the first gray screen can be seen before the screen get all messes up by strange (green and yellow) colors.
When switching to tty1, it will take around 10 seconds before the console can be seen. Then going to run level 3 and back to 5 again and the screen is ok.
Attaching SaX and Xorg logs.
Comment 1 Magnus Boman 2007-08-12 05:40:10 UTC
Created attachment 156937 [details]
SaX log
Comment 2 Magnus Boman 2007-08-12 05:40:23 UTC
Created attachment 156938 [details]
Xorg log
Comment 3 Magnus Boman 2007-08-12 05:40:58 UTC
I've seen a couple of other bugs where the screen goes all black. Not sure if this one is related since I don't get a black screen.
Comment 4 Stefan Dirsch 2007-08-12 07:09:16 UTC
This sounds similar to Bug #297441, but here is no XEN kernel involved.
Comment 5 Stefan Dirsch 2007-08-12 07:11:12 UTC
Could you attach Xorg.0.log for the first (failed) and second (successful) run
of the Xserver? Thanks. 
Comment 6 Magnus Boman 2007-08-12 07:20:00 UTC
Created attachment 156940 [details]
Log when it starts ok
Comment 7 Magnus Boman 2007-08-12 07:20:35 UTC
The first log is from a failed attempt and the second log is after I gone run level 3->5
Comment 8 Stefan Dirsch 2007-08-12 08:22:48 UTC
Hmm. I don't think it's related to ACPI.

--- bug-299623_Xorg.0.log.bad   2007-08-12 10:16:36.000000000 +0200
+++ bug-299623_Xorg.0.log.good  2007-08-12 10:16:21.000000000 +0200
@@ -11,7 +11,7 @@
 Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-(==) Log file: "/var/log/Xorg.0.log", Time: Sun Aug 12 13:53:09 2007
+(==) Log file: "/var/log/Xorg.0.log", Time: Sun Aug 12 17:10:22 2007
 (==) Using config file: "/etc/X11/xorg.conf"
 (==) ServerLayout "Layout[all]"
 (**) |-->Screen "Screen[0]" (0)
@@ -76,9 +76,7 @@
 (==) ModulePath set to "/usr/lib/xorg/modules/updates,/usr/lib/xorg/modules"
 (**) Option "AllowMouseOpenFail" "on"
 (**) Option "Xinerama" "off"
-(WW) Open ACPI failed (/var/run/acpid.socket) (Connection refused)
-(WW) Open ACPI failed (/var/run/acpid.socket) (Connection refused)
-(II) No APM support in BIOS or kernel
+(II) Open ACPI successful (/var/run/acpid.socket)
 (II) Loader magic: 0x81e6560
 (II) Module ABI versions:
        X.Org ANSI C Emulation: 0.3
@@ -826,7 +824,6 @@
 (II) XINPUT: Adding extended input device "Mouse[1]" (type: MOUSE)
 (II) XINPUT: Adding extended input device "Keyboard[0]" (type: KEYBOARD)
 (II) Mouse[1]: ps2EnableDataReporting: succeeded
-(II) Open ACPI successful (/var/run/acpid.socket)
 (II) RADEON(0): RADEONRestoreMemMapRegisters() : 
 (II) RADEON(0):   MC_FB_LOCATION   : 0x1fff0000
 (II) RADEON(0):   MC_AGP_LOCATION  : 0x27ff2000
Comment 9 Stefan Dirsch 2007-08-12 08:30:32 UTC
(II) PCI: 01:00:0: chip 1002,3154 card 1014,0570 rev 80 class 03,00,00 hdr 00
(--) Chipset ATI FireGL M24 GL 3154 (PCIE) found

I have a "ATI V3200/M24 PCIe" card. Maybe I can reproduce this issue. I'll give it a try.
Comment 10 Magnus Boman 2007-08-12 08:37:51 UTC
I booted to run level 3, then went to run level 5 and it worked fine.
Went back to run level 3, stopped acpid, deleted /var/run/acpid.socket then back to run level 5 and it was still fine. Had plenty of "(WW) Open ACPI failed (/var/run/acpid.socket) (Connection refused)" even when it started fine.

FYI;
This is a fresh install of Beta1 with the GNOME pattern and accepted default settings from SaX
Comment 11 Stefan Dirsch 2007-08-12 08:40:12 UTC
Thanks for confirmation that it's *not* related to ACPI.
Comment 12 Stefan Dirsch 2007-08-13 17:15:52 UTC
I can only reproduce the ACPI issue (which is fixed meanwhile by replacing earlykdm in /etc/init.d/xdm with earlyxdm). I can't reproduce the distorted display issue with my "ATI V3200/M24 PCIe" card with nearly the same
xorg.conf.
Comment 13 Egbert Eich 2007-08-13 18:54:47 UTC
Could this be one of those instances again where the Xserver starts too early and something else is writing to the graphics hardware (during boot?).
This should be protected by mutexes in the kernel (ie. the kernel should prevent any hardware access once the device goes to graphics mode).
But we have seen similar problems in the past.
One should try if serializing the boot helps.
/etc/sysconfig/boot: RUN_PARALLEL=no 
should do the trick.
Comment 14 Magnus Boman 2007-08-13 20:56:01 UTC
Setting RUN_PARRALEL=no did not help.

Setting earlyxdm to off worked though
Comment 15 Stefan Dirsch 2007-08-13 21:05:26 UTC
(In reply to comment #14 from Magnus Boman)
> Setting RUN_PARRALEL=no did not help.
Ok.
 
> Setting earlyxdm to off worked though
Not sure what you mean with this? What did you do exactly?

Comment 16 Magnus Boman 2007-08-13 21:32:30 UTC
mblxws01:/home/mboman # chkconfig earlyxdm off

Then restarted and the GUI came up alright.
Comment 17 Stefan Dirsch 2007-08-14 00:42:47 UTC
So obviously Egbert was right. It's a timing issue. I wonder what "else" is writing to the graphics hardware in the background during/after the Xserver starts/has started ...
Comment 18 Magnus Boman 2007-08-15 07:26:53 UTC
Stefan,

I started to disable services during the startup (with chkconfig). Once I disabled haldaemon, the screen came up without distortion.

Not sure if it helps you but I'll attach to dmesg, one where the GUI comes up fine and one where the GUI is distorted.
Comment 19 Magnus Boman 2007-08-15 07:27:38 UTC
Created attachment 157575 [details]
Distorted (haldaemon is started)
Comment 20 Magnus Boman 2007-08-15 07:28:02 UTC
Created attachment 157576 [details]
Not distorted (haldaemon is not running)
Comment 21 Stefan Dirsch 2007-08-15 08:09:53 UTC
Very interesting! Could you also attach the lsmod output when haldaemon is running / not running.

When haldaemon enabled. Does it help to add 

  haldaemon

to Required-Start: of /etc/init.d/earlyxdm? Run insserv afterwards to adjust
the init script run order in /etc/init.d/rc*.d.
Comment 22 Magnus Boman 2007-08-15 08:22:02 UTC
Attaching lsmod when haldaemon is running v not running.
Adding haldaemon as Required-Start to earlyxdm did not help.
Comment 23 Magnus Boman 2007-08-15 08:22:32 UTC
Created attachment 157587 [details]
haldaemon not running
Comment 24 Magnus Boman 2007-08-15 08:22:52 UTC
Created attachment 157588 [details]
haldaemon running
Comment 25 Stefan Dirsch 2007-08-15 09:06:00 UTC
--- bug-299623_haldaemon_not_running.txt.sort
+++ bug-299623_haldaemon_running.txt.sort
@@ -3,0 +4 @@
+acpi_cpufreq           12808  1 
@@ -8,0 +10,3 @@
+cpufreq_conservative    11272  0 
+cpufreq_powersave       5888  0 
+cpufreq_userspace       8704  0 
@@ -37 +41 @@
-processor              40488  1 thermal
+processor              40488  2 acpi_cpufreq,thermal
@@ -51,0 +56 @@
+speedstep_lib           8708  0 
Comment 26 Stefan Dirsch 2007-08-15 09:32:27 UTC
Could you add the line

  lsmod > /tmp/lsmod-earlyxdm

in /etc/init.d/earlyxdm right before

  exec /etc/init.d/xdm ${1+"$@"}

and attach /tmp/lsmod-earlyxdm? With haldaemon init script active.
Comment 27 Magnus Boman 2007-08-15 09:54:56 UTC
Created attachment 157623 [details]
lsmod-earlyxdm

Information as requested
Comment 28 Egbert Eich 2007-08-15 10:10:47 UTC
What we would need is a listing of all kernel modules that are loaded prior to xdm startup. (Also it would be instructive to see if this list has changed during xdm start).
Magnus, could you add a line like:
lsmod > /tmp/modlist.xdm 
just prior to the exec of xdm in earlyxdm?
maybe you could also change the line:
exec /etc/init.d/xdm ${1+"$@"}
to
/etc/init.d/xdm ${1+"$@"}; lsmod > /tmp/modlist.xdm

As a note: 
Even this may not be fully conclusive. X is starting in the background while some other drivers are also initialized. 
While in theory there shouldn't be any conflicts the situation could become quite racey.

Note2: This could be a duplicate of bug #297441.
Comment 29 Stefan Dirsch 2007-08-15 10:11:23 UTC
Thanks.

--- bug-299623_lsmod-earlyxdm.sort      2007-08-15 11:58:41.903718000 +0200
+++ bug-299623_haldaemon_running.txt.sort       2007-08-15 11:58:33.356274000 +0200
@@ -3,0 +4 @@
+acpi_cpufreq           12808  1 
@@ -8,0 +10,3 @@
+cpufreq_conservative    11272  0 
+cpufreq_powersave       5888  0 
+cpufreq_userspace       8704  0 
@@ -15 +19 @@
-firmware_class         13568  2 pcmcia,ipw2200
+firmware_class         13568  3 microcode,pcmcia,ipw2200
@@ -30,0 +35 @@
+microcode              14860  0 
@@ -36 +41 @@
-processor              40488  1 thermal
+processor              40488  2 acpi_cpufreq,thermal
@@ -50,0 +56 @@
+speedstep_lib           8708  0 

So we need to figure out, which module is the culprit here. For this blacklist
the modules in /etc/modprobe.d/blacklist. Add

blacklist acpi_cpufreq 
blacklist cpufreq_conservative
blacklist cpufreq_powersave
blacklist cpufreq_userspace
blacklist microcode  
blacklist speedstep_lib

Then comment out these lines one by one. Afterwards reboot (again and again).
I'm sorry. This is a lot of work for you  and it's boring I know.
Comment 30 Magnus Boman 2007-08-15 10:45:47 UTC
Stefan,
Not to worry. openSUSE 10.3 boots fast so not to much hard work for me :-)

The modules can not be blacklisted as they are modprobe'd from haldaemon and microcode.ctl. I edited the scripts to make sure they did not load. Unfortunately, the result was that the GUI still comes up distorted.
I'm attaching 'after' which is the lsmod after all the above modules have been rem'ed out.
Comment 31 Magnus Boman 2007-08-15 10:46:20 UTC
Created attachment 157626 [details]
lsmod after "blacklist" of specific modules
Comment 32 Stefan Dirsch 2007-08-15 11:18:39 UTC
So none of the modules is the culprit. Sure you can add some

  sleep <sec>

in /etc/init.d/earlyxdm right before

  exec /etc/init.d/xdm ${1+"$@"}

but this is no generic solution ...
Comment 33 Magnus Boman 2007-08-15 11:36:32 UTC
Yeah, it works if I delay it with 5 seconds or more. Anything less and the problem persists.
Comment 34 Stefan Dirsch 2007-08-15 12:26:31 UTC
That's bad.
Comment 35 Egbert Eich 2007-08-15 12:35:27 UTC
Magnus, I wanted to get back to you on IRC but you left while we were at lunch.
I think the modules list turned up by the diff was inconclusive.
We need to look at some other way to figure out what's causing the problem.
Can you disable the parallel mode for rc scripts to have more reproducible results ? RUN_PARALLEL=n in /etc/syconfig/boot should to the trick.
Comment 36 Magnus Boman 2007-08-15 20:37:39 UTC
Egbert, yeah, I'm in Australia and it was a bit late so had to go to bed :-)

I assume that you wanted me to get the output of lsmod before this line in earlyxdm;

   exec /etc/init.d/xdm ${1+"$@"}

and after the machine booted. Find them attached.
Comment 37 Magnus Boman 2007-08-15 20:38:04 UTC
Created attachment 157778 [details]
par=off
Comment 38 Magnus Boman 2007-08-15 20:38:23 UTC
Created attachment 157779 [details]
par=off after boot
Comment 39 Stefan Dirsch 2007-08-16 08:25:24 UTC
Created attachment 157847 [details]
diff

diff -U0 bug-299623_lsmod-earlyxdm-par-off.txt.sort \
         bug-299623_lsmod-earlyxdm-par-off-after-boot.txt.sort | grep -v ^@@
Comment 40 Stefan Dirsch 2007-08-16 09:06:05 UTC
Seems to be a dbus/haldeamon (init script) combination problem.

+dbus/+haldeamon --> broken
+dbus/-haldeamon --> works
-dbus/+haldaemon --> works
+dbus/+haldeamon --> broken

+: enabled
-: disabled

Adding haldaemon/dbus maintainers.
Comment 41 Danny Al-Gaaf 2007-08-16 09:35:14 UTC
I don't see what we (from the HAL/D-Bus side) can do here, or what the exact problem is. 
What you implies is that you have problems if HAL is running, but HAL doesn't touch your graphic card, it load only some ACPI drivers which we loaded in acpid or powersaved in the past.
Comment 42 Stefan Dirsch 2007-08-16 10:02:57 UTC
Is there a log, what haldaemon/dbus did exactly during boot (besides of dmesg)?
This might help us ...
Comment 43 Danny Al-Gaaf 2007-08-16 10:36:07 UTC
What exactly do you mean? What they do in the startscripts or what the programms itself do (hal don't touch your graphic card, it reads only stuff from proc, sysfs and maybe dmidecode)?
Comment 44 Magnus Boman 2007-08-16 11:05:22 UTC
Danny,

I disabled the following services (chkconfig <service> off);

postfix
powersaved
smartd
sshd
hotkey-setup
cups
nscd
fbset
splash
smbfs
setserial
resmgr
portmap
irq_balancer
boot.apparmor
auditd
alsasound
acpid
cron
syslog
network
haldaemon
SuSEfirewall2_setup
SuSEfirewall2_init
dbus
consolekit
boot.loadmodules
boot.md
boot.device-mapper
boot.fuse
boot.udev_retry
boot.ipconfig
boot.clock

If I turn them ALL back on, EXCEPT dbus and haldaemon, the GUI will work fine (ie NOT distorted). If I enable DBus AND haldaemon (with the rest of the services disabled OR enabled) the GUI WILL be distorted. That's the reason why we're asking you and Timo if you have any ideas on where to look what DBus and HAL is doing (not blaiming HAL/DBus, but they might load other modules/scripts that could be the cause of this issue)
Please also see comment#29 on what modules we already prevented from loading.
Comment 45 Danny Al-Gaaf 2007-08-16 12:07:06 UTC
If you don't start HAL may some resmgr related stuff wouldn't be done. This would be the only idea atm. Maybe ask lnussel.
Comment 46 Stefan Dirsch 2007-08-16 19:15:54 UTC
... which would mean that disabling regmgr alone fixes the isue as well. Magnus, could you check this, please?
Comment 47 Magnus Boman 2007-08-18 11:26:12 UTC
Disabling resmgr does not help.
Comment 48 Stefan Dirsch 2007-08-18 11:59:31 UTC
Maybe there is one thing you can still try.

1) disable dbus, haldaeomon, earlyxdm and xdm for boot
2) reboot
3) backup /var/log/warn and /var/log/messages
4) run "dbus; haldaemon; earlyxdm"
5) Attach backups for /var/log/{warn,messages} and the files themselves.

Does the problem persist?
Comment 49 Magnus Boman 2007-08-18 12:15:42 UTC
No, I don't get the problem doing it this way.
I tried to start the services in the following order (included network as it failed to start when dbus wasn't running);

dbus; haldaemon; earlyxdm
dbus; haldaemon; network; earlyxdm
earlyxdm; dbus; haldaemon; network
earlyxdm; dbus; haldaemon; network; xdm

Naturally restarted the machine between all of the above tests.

Find warn and messages attached.
Comment 50 Magnus Boman 2007-08-18 12:16:09 UTC
Created attachment 158325 [details]
Messages
Comment 51 Magnus Boman 2007-08-18 12:16:25 UTC
Created attachment 158326 [details]
Warn
Comment 52 Stefan Dirsch 2007-08-18 13:54:58 UTC
Strange, that it works now.

Since you didn't attach the backups of warn/messages, I still don't know what happens exactly when dbus/haldaemon starts. :-(

"dbus; haldaemon; network; earlyxdm" should be the right order. Also disable network during boot. 
Comment 56 Stefan Dirsch 2007-08-19 03:35:54 UTC
It seems you didn't get the idea. What I'm interesting in are the additional lines in /var/log/{warn,messages} when "dbus/haldaemon" are running (since the difference in loaded kernel modules didn't help us).
Comment 57 Magnus Boman 2007-08-19 04:30:50 UTC
Guess I didn't. Find the log files attached.
Comment 58 Magnus Boman 2007-08-19 04:31:21 UTC
Created attachment 158344 [details]
Log files as requested
Comment 59 Stefan Dirsch 2007-08-20 09:38:57 UTC
This is happening on a T43p.
Comment 60 Egbert Eich 2007-08-20 10:43:28 UTC
Rhe services that communicate with haldaemon do so over dbus, thus it seems natural that if dbus isn't working the problem doesn't exist.
Interestingly a 'sleep 5' at the start of earlyxdm fixed the problem, too.

I'd suggest the following approach:
1. Identify all system services that are started after haldaemon and before   earlyxdm.
2. move the 'sleep 5' from earlyxdm down thru the other system services down to
the haldaemon script. (perferrably at the beginning).
The service it's in once it stops working should be looked at.
Comment 62 Magnus Boman 2007-08-20 11:41:44 UTC
Using delay was very slow, so started to add "exit 0" to one script at a time instead. When adding "exit 0" to the start of kbd, the problem disappeared.
Comment 63 Magnus Boman 2007-08-20 12:22:40 UTC
After remarking the following line in /etc/init.d/kbd the issue disappears;

/bin/setfont -C $tty $CONSOLE_FONT $UMAP $SMAP
Comment 64 Stefan Dirsch 2007-08-20 12:42:39 UTC
Very interesting to see these changelog entries done after openSUSE 10.2!

kbd.changes:
-------------------------------------------------------------------
Mon Jun 18 14:07:53 CEST 2007 - jw@suse.de

- sleep 3 removed. bugzilla #284348

-------------------------------------------------------------------
Sat Jun 16 17:07:11 CEST 2007 - coolo@suse.de

- as discussed with jw and werner kbd does not need to run before
  xdm

Comment 65 Stefan Dirsch 2007-08-20 12:56:35 UTC
framebuffer is disabled on this machine.

egbert: the handling of vga and fb consoles is quite different. it's possible
egbert: that the problem goes away. the vga text console code path gets tested
egbert: much less often.
Comment 66 Stefan Dirsch 2007-08-20 13:31:20 UTC
So could you test again with an appropriate vesa mode set (vga=0x0XXX) in /boot/grub/menu.lst? Check with "hwinfo --framebuffer" which one to chose for 1400x1050@24bpp. Does this help?

Additionally verify if adding kbd to Required-Start: of earlyxdm helps. Run insserv afterwards.
Comment 67 Magnus Boman 2007-08-20 22:04:38 UTC
Stefan,

both adding the vga=0x0XXX and adding kbd as Required start to earlyxdm works  independently of each other.
So I guess the real solution is to add kdb to earlyxdm.

Thanks!
Comment 68 Stefan Dirsch 2007-08-21 00:36:07 UTC

*** This bug has been marked as a duplicate of bug 302010 ***
Comment 69 Egbert Eich 2007-08-21 04:39:23 UTC
Magnus, thanks for hanging in investigating this! I think I know where to look now and there is an easy workaround to avoid the problem.