|
Bugzilla – Full Text Bug Listing |
| Summary: | radeon: display distorted (strange colors) during first Xserver start (M24) | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.3 | Reporter: | Magnus Boman <mboman> |
| Component: | X.Org | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED DUPLICATE | QA Contact: | E-mail List <xorg-maintainer-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | eich, lnussel, werner |
| Version: | Beta 1 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
SaX log
Xorg log Log when it starts ok Distorted (haldaemon is started) Not distorted (haldaemon is not running) haldaemon not running haldaemon running lsmod-earlyxdm lsmod after "blacklist" of specific modules par=off par=off after boot diff Messages Warn Log files as requested |
||
|
Description
Magnus Boman
2007-08-12 05:39:52 UTC
Created attachment 156937 [details]
SaX log
Created attachment 156938 [details]
Xorg log
I've seen a couple of other bugs where the screen goes all black. Not sure if this one is related since I don't get a black screen. This sounds similar to Bug #297441, but here is no XEN kernel involved. Could you attach Xorg.0.log for the first (failed) and second (successful) run of the Xserver? Thanks. Created attachment 156940 [details]
Log when it starts ok
The first log is from a failed attempt and the second log is after I gone run level 3->5 Hmm. I don't think it's related to ACPI.
--- bug-299623_Xorg.0.log.bad 2007-08-12 10:16:36.000000000 +0200
+++ bug-299623_Xorg.0.log.good 2007-08-12 10:16:21.000000000 +0200
@@ -11,7 +11,7 @@
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-(==) Log file: "/var/log/Xorg.0.log", Time: Sun Aug 12 13:53:09 2007
+(==) Log file: "/var/log/Xorg.0.log", Time: Sun Aug 12 17:10:22 2007
(==) Using config file: "/etc/X11/xorg.conf"
(==) ServerLayout "Layout[all]"
(**) |-->Screen "Screen[0]" (0)
@@ -76,9 +76,7 @@
(==) ModulePath set to "/usr/lib/xorg/modules/updates,/usr/lib/xorg/modules"
(**) Option "AllowMouseOpenFail" "on"
(**) Option "Xinerama" "off"
-(WW) Open ACPI failed (/var/run/acpid.socket) (Connection refused)
-(WW) Open ACPI failed (/var/run/acpid.socket) (Connection refused)
-(II) No APM support in BIOS or kernel
+(II) Open ACPI successful (/var/run/acpid.socket)
(II) Loader magic: 0x81e6560
(II) Module ABI versions:
X.Org ANSI C Emulation: 0.3
@@ -826,7 +824,6 @@
(II) XINPUT: Adding extended input device "Mouse[1]" (type: MOUSE)
(II) XINPUT: Adding extended input device "Keyboard[0]" (type: KEYBOARD)
(II) Mouse[1]: ps2EnableDataReporting: succeeded
-(II) Open ACPI successful (/var/run/acpid.socket)
(II) RADEON(0): RADEONRestoreMemMapRegisters() :
(II) RADEON(0): MC_FB_LOCATION : 0x1fff0000
(II) RADEON(0): MC_AGP_LOCATION : 0x27ff2000
(II) PCI: 01:00:0: chip 1002,3154 card 1014,0570 rev 80 class 03,00,00 hdr 00 (--) Chipset ATI FireGL M24 GL 3154 (PCIE) found I have a "ATI V3200/M24 PCIe" card. Maybe I can reproduce this issue. I'll give it a try. I booted to run level 3, then went to run level 5 and it worked fine. Went back to run level 3, stopped acpid, deleted /var/run/acpid.socket then back to run level 5 and it was still fine. Had plenty of "(WW) Open ACPI failed (/var/run/acpid.socket) (Connection refused)" even when it started fine. FYI; This is a fresh install of Beta1 with the GNOME pattern and accepted default settings from SaX Thanks for confirmation that it's *not* related to ACPI. I can only reproduce the ACPI issue (which is fixed meanwhile by replacing earlykdm in /etc/init.d/xdm with earlyxdm). I can't reproduce the distorted display issue with my "ATI V3200/M24 PCIe" card with nearly the same xorg.conf. Could this be one of those instances again where the Xserver starts too early and something else is writing to the graphics hardware (during boot?). This should be protected by mutexes in the kernel (ie. the kernel should prevent any hardware access once the device goes to graphics mode). But we have seen similar problems in the past. One should try if serializing the boot helps. /etc/sysconfig/boot: RUN_PARALLEL=no should do the trick. Setting RUN_PARRALEL=no did not help. Setting earlyxdm to off worked though (In reply to comment #14 from Magnus Boman) > Setting RUN_PARRALEL=no did not help. Ok. > Setting earlyxdm to off worked though Not sure what you mean with this? What did you do exactly? mblxws01:/home/mboman # chkconfig earlyxdm off Then restarted and the GUI came up alright. So obviously Egbert was right. It's a timing issue. I wonder what "else" is writing to the graphics hardware in the background during/after the Xserver starts/has started ... Stefan, I started to disable services during the startup (with chkconfig). Once I disabled haldaemon, the screen came up without distortion. Not sure if it helps you but I'll attach to dmesg, one where the GUI comes up fine and one where the GUI is distorted. Created attachment 157575 [details]
Distorted (haldaemon is started)
Created attachment 157576 [details]
Not distorted (haldaemon is not running)
Very interesting! Could you also attach the lsmod output when haldaemon is running / not running. When haldaemon enabled. Does it help to add haldaemon to Required-Start: of /etc/init.d/earlyxdm? Run insserv afterwards to adjust the init script run order in /etc/init.d/rc*.d. Attaching lsmod when haldaemon is running v not running. Adding haldaemon as Required-Start to earlyxdm did not help. Created attachment 157587 [details]
haldaemon not running
Created attachment 157588 [details]
haldaemon running
--- bug-299623_haldaemon_not_running.txt.sort +++ bug-299623_haldaemon_running.txt.sort @@ -3,0 +4 @@ +acpi_cpufreq 12808 1 @@ -8,0 +10,3 @@ +cpufreq_conservative 11272 0 +cpufreq_powersave 5888 0 +cpufreq_userspace 8704 0 @@ -37 +41 @@ -processor 40488 1 thermal +processor 40488 2 acpi_cpufreq,thermal @@ -51,0 +56 @@ +speedstep_lib 8708 0 Could you add the line
lsmod > /tmp/lsmod-earlyxdm
in /etc/init.d/earlyxdm right before
exec /etc/init.d/xdm ${1+"$@"}
and attach /tmp/lsmod-earlyxdm? With haldaemon init script active.
Created attachment 157623 [details]
lsmod-earlyxdm
Information as requested
What we would need is a listing of all kernel modules that are loaded prior to xdm startup. (Also it would be instructive to see if this list has changed during xdm start).
Magnus, could you add a line like:
lsmod > /tmp/modlist.xdm
just prior to the exec of xdm in earlyxdm?
maybe you could also change the line:
exec /etc/init.d/xdm ${1+"$@"}
to
/etc/init.d/xdm ${1+"$@"}; lsmod > /tmp/modlist.xdm
As a note:
Even this may not be fully conclusive. X is starting in the background while some other drivers are also initialized.
While in theory there shouldn't be any conflicts the situation could become quite racey.
Note2: This could be a duplicate of bug #297441.
Thanks. --- bug-299623_lsmod-earlyxdm.sort 2007-08-15 11:58:41.903718000 +0200 +++ bug-299623_haldaemon_running.txt.sort 2007-08-15 11:58:33.356274000 +0200 @@ -3,0 +4 @@ +acpi_cpufreq 12808 1 @@ -8,0 +10,3 @@ +cpufreq_conservative 11272 0 +cpufreq_powersave 5888 0 +cpufreq_userspace 8704 0 @@ -15 +19 @@ -firmware_class 13568 2 pcmcia,ipw2200 +firmware_class 13568 3 microcode,pcmcia,ipw2200 @@ -30,0 +35 @@ +microcode 14860 0 @@ -36 +41 @@ -processor 40488 1 thermal +processor 40488 2 acpi_cpufreq,thermal @@ -50,0 +56 @@ +speedstep_lib 8708 0 So we need to figure out, which module is the culprit here. For this blacklist the modules in /etc/modprobe.d/blacklist. Add blacklist acpi_cpufreq blacklist cpufreq_conservative blacklist cpufreq_powersave blacklist cpufreq_userspace blacklist microcode blacklist speedstep_lib Then comment out these lines one by one. Afterwards reboot (again and again). I'm sorry. This is a lot of work for you and it's boring I know. Stefan, Not to worry. openSUSE 10.3 boots fast so not to much hard work for me :-) The modules can not be blacklisted as they are modprobe'd from haldaemon and microcode.ctl. I edited the scripts to make sure they did not load. Unfortunately, the result was that the GUI still comes up distorted. I'm attaching 'after' which is the lsmod after all the above modules have been rem'ed out. Created attachment 157626 [details]
lsmod after "blacklist" of specific modules
So none of the modules is the culprit. Sure you can add some
sleep <sec>
in /etc/init.d/earlyxdm right before
exec /etc/init.d/xdm ${1+"$@"}
but this is no generic solution ...
Yeah, it works if I delay it with 5 seconds or more. Anything less and the problem persists. That's bad. Magnus, I wanted to get back to you on IRC but you left while we were at lunch. I think the modules list turned up by the diff was inconclusive. We need to look at some other way to figure out what's causing the problem. Can you disable the parallel mode for rc scripts to have more reproducible results ? RUN_PARALLEL=n in /etc/syconfig/boot should to the trick. Egbert, yeah, I'm in Australia and it was a bit late so had to go to bed :-)
I assume that you wanted me to get the output of lsmod before this line in earlyxdm;
exec /etc/init.d/xdm ${1+"$@"}
and after the machine booted. Find them attached.
Created attachment 157778 [details]
par=off
Created attachment 157779 [details]
par=off after boot
Created attachment 157847 [details]
diff
diff -U0 bug-299623_lsmod-earlyxdm-par-off.txt.sort \
bug-299623_lsmod-earlyxdm-par-off-after-boot.txt.sort | grep -v ^@@
Seems to be a dbus/haldeamon (init script) combination problem. +dbus/+haldeamon --> broken +dbus/-haldeamon --> works -dbus/+haldaemon --> works +dbus/+haldeamon --> broken +: enabled -: disabled Adding haldaemon/dbus maintainers. I don't see what we (from the HAL/D-Bus side) can do here, or what the exact problem is. What you implies is that you have problems if HAL is running, but HAL doesn't touch your graphic card, it load only some ACPI drivers which we loaded in acpid or powersaved in the past. Is there a log, what haldaemon/dbus did exactly during boot (besides of dmesg)? This might help us ... What exactly do you mean? What they do in the startscripts or what the programms itself do (hal don't touch your graphic card, it reads only stuff from proc, sysfs and maybe dmidecode)? Danny, I disabled the following services (chkconfig <service> off); postfix powersaved smartd sshd hotkey-setup cups nscd fbset splash smbfs setserial resmgr portmap irq_balancer boot.apparmor auditd alsasound acpid cron syslog network haldaemon SuSEfirewall2_setup SuSEfirewall2_init dbus consolekit boot.loadmodules boot.md boot.device-mapper boot.fuse boot.udev_retry boot.ipconfig boot.clock If I turn them ALL back on, EXCEPT dbus and haldaemon, the GUI will work fine (ie NOT distorted). If I enable DBus AND haldaemon (with the rest of the services disabled OR enabled) the GUI WILL be distorted. That's the reason why we're asking you and Timo if you have any ideas on where to look what DBus and HAL is doing (not blaiming HAL/DBus, but they might load other modules/scripts that could be the cause of this issue) Please also see comment#29 on what modules we already prevented from loading. If you don't start HAL may some resmgr related stuff wouldn't be done. This would be the only idea atm. Maybe ask lnussel. ... which would mean that disabling regmgr alone fixes the isue as well. Magnus, could you check this, please? Disabling resmgr does not help. Maybe there is one thing you can still try.
1) disable dbus, haldaeomon, earlyxdm and xdm for boot
2) reboot
3) backup /var/log/warn and /var/log/messages
4) run "dbus; haldaemon; earlyxdm"
5) Attach backups for /var/log/{warn,messages} and the files themselves.
Does the problem persist?
No, I don't get the problem doing it this way. I tried to start the services in the following order (included network as it failed to start when dbus wasn't running); dbus; haldaemon; earlyxdm dbus; haldaemon; network; earlyxdm earlyxdm; dbus; haldaemon; network earlyxdm; dbus; haldaemon; network; xdm Naturally restarted the machine between all of the above tests. Find warn and messages attached. Created attachment 158325 [details]
Messages
Created attachment 158326 [details]
Warn
Strange, that it works now. Since you didn't attach the backups of warn/messages, I still don't know what happens exactly when dbus/haldaemon starts. :-( "dbus; haldaemon; network; earlyxdm" should be the right order. Also disable network during boot. It seems you didn't get the idea. What I'm interesting in are the additional lines in /var/log/{warn,messages} when "dbus/haldaemon" are running (since the difference in loaded kernel modules didn't help us).
Guess I didn't. Find the log files attached. Created attachment 158344 [details]
Log files as requested
This is happening on a T43p. Rhe services that communicate with haldaemon do so over dbus, thus it seems natural that if dbus isn't working the problem doesn't exist. Interestingly a 'sleep 5' at the start of earlyxdm fixed the problem, too. I'd suggest the following approach: 1. Identify all system services that are started after haldaemon and before earlyxdm. 2. move the 'sleep 5' from earlyxdm down thru the other system services down to the haldaemon script. (perferrably at the beginning). The service it's in once it stops working should be looked at. Using delay was very slow, so started to add "exit 0" to one script at a time instead. When adding "exit 0" to the start of kbd, the problem disappeared. After remarking the following line in /etc/init.d/kbd the issue disappears; /bin/setfont -C $tty $CONSOLE_FONT $UMAP $SMAP Very interesting to see these changelog entries done after openSUSE 10.2! kbd.changes: ------------------------------------------------------------------- Mon Jun 18 14:07:53 CEST 2007 - jw@suse.de - sleep 3 removed. bugzilla #284348 ------------------------------------------------------------------- Sat Jun 16 17:07:11 CEST 2007 - coolo@suse.de - as discussed with jw and werner kbd does not need to run before xdm framebuffer is disabled on this machine. egbert: the handling of vga and fb consoles is quite different. it's possible egbert: that the problem goes away. the vga text console code path gets tested egbert: much less often. So could you test again with an appropriate vesa mode set (vga=0x0XXX) in /boot/grub/menu.lst? Check with "hwinfo --framebuffer" which one to chose for 1400x1050@24bpp. Does this help? Additionally verify if adding kbd to Required-Start: of earlyxdm helps. Run insserv afterwards. Stefan, both adding the vga=0x0XXX and adding kbd as Required start to earlyxdm works independently of each other. So I guess the real solution is to add kdb to earlyxdm. Thanks! *** This bug has been marked as a duplicate of bug 302010 *** Magnus, thanks for hanging in investigating this! I think I know where to look now and there is an easy workaround to avoid the problem. |