Bug 152730 - Massive XRender corruption ...
Summary: Massive XRender corruption ...
Status: RESOLVED FIXED
Duplicates: 144659 159551 162166 164447 167235 179675
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: X.Org
Version: Beta 4
Hardware: Other Other
Priority: P1 - Urgent, Severity: Critical
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 174810
Reported: 2006-02-22 11:05 UTC by Michael Meeks
Modified: 2006-05-30 10:01 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
corrupt image (1.57 MB, image/png)
2006-02-22 11:07 UTC, Michael Meeks
what it should look like (196.36 KB, image/png)
2006-02-22 11:08 UTC, Michael Meeks
hwinfo ... (227.61 KB, text/plain)
2006-02-22 11:09 UTC, Michael Meeks
x conf. (8.68 KB, text/plain)
2006-02-22 11:10 UTC, Michael Meeks
presentation - press F9 to run it ... (382.25 KB, application/octet-stream)
2006-02-22 11:22 UTC, Michael Meeks
log (40.31 KB, text/plain)
2006-02-24 10:41 UTC, Michael Meeks
Patch for being able to build xorg 7.0 with --enable-debug (635 bytes, patch)
2006-03-30 10:56 UTC, Matthias Hopf
Patch for fixing missing initialization of pPicture->format (618 bytes, patch)
2006-03-30 10:59 UTC, Matthias Hopf
modified cairo canvas (3.82 MB, application/octet-stream)
2006-04-20 09:12 UTC, Radek Doulik
simple presentation with broken slide (36.20 KB, application/vnd.oasis.opendocument.presentation)
2006-04-20 09:14 UTC, Radek Doulik
simple test program trying to mimic what OOo does (2.33 KB, text/x-csrc)
2006-04-26 16:13 UTC, Radek Doulik
image for test program (16.96 KB, image/png)
2006-04-26 16:15 UTC, Radek Doulik
hwinfo log from hope.suse.cz (36.69 KB, application/x-bzip2)
2006-04-27 10:00 UTC, Petr Mladek
part of log from valgrind - after pressing F5 - run slideshow - until the slide is correctly rendered (9.70 KB, text/plain)
2006-04-27 11:51 UTC, Radek Doulik
valgrind log (41.75 KB, text/plain)
2006-04-27 16:38 UTC, Radek Doulik
1st version of the fix (6.23 KB, patch)
2006-05-02 14:14 UTC, Radek Doulik
2nd version of the patch with dx,dy tmp variables to avoid double dereferencing all the time (suggested by Michael) (10.00 KB, patch)
2006-05-02 16:27 UTC, Radek Doulik
Updated patch (10.72 KB, patch)
2006-05-11 10:58 UTC, Matthias Hopf

Description Michael Meeks 2006-02-22 11:05:23 UTC
I attach screenshots of this slide being rendered by OO.o HEAD with & without cairo (XRender) support. I have:

michael@linux:~> rpm -qa | grep xorg
xorg-x11-server-glx-6.9.0-15
xorg-x11-fonts-75dpi-6.9.0-15
xorg-x11-6.9.0-15
xorg-x11-driver-video-6.9.0-19
xorg-x11-sdk-6.9.0-15
xorg-x11-fonts-100dpi-6.9.0-15
xorg-x11-Xnest-6.9.0-15
xorg-x11-fonts-syriac-6.9.0-15
xorg-x11-Xvnc-6.9.0-15
xorg-x11-man-6.9.0-15
xorg-x11-libs-6.9.0-15
xorg-x11-devel-6.9.0-15
xorg-x11-fonts-scalable-6.9.0-15
xorg-x11-server-6.9.0-15
xorg-x11-fonts-cyrillic-6.9.0-15
michael@linux:~> rpm -qa | grep cairo
cairo-devel-1.0.2-13
cairo-doc-1.0.2-13
cairo-1.0.2-13
libsvg-cairo-0.1.6-6

To turn XRender on / off use Tools->Options->View-> 'Use Hardware acceleration' [ on right hand side ].
Comment 1 Michael Meeks 2006-02-22 11:07:01 UTC
Created attachment 69731
corrupt image
Comment 2 Michael Meeks 2006-02-22 11:08:09 UTC
Created attachment 69732
what it should look like
Comment 3 Michael Meeks 2006-02-22 11:09:56 UTC
Created attachment 69734
hwinfo ...

interestingly hwinfo claims it's a 16bpp mode, but AFAIR I configured it for 24bit. I'll attach my xorg.conf too.
Comment 4 Michael Meeks 2006-02-22 11:10:30 UTC
Created attachment 69735
x conf.
Comment 5 Michael Meeks 2006-02-22 11:22:48 UTC
Created attachment 69742
presentation - press F9 to run it ...

the problematic presentation.

Most curiously - turning off 'clone' [ re-configuring with sax2 -r & just not selecting that ], yields an Xserver crash rendering the 1st slide of that presentation - I'll try to get a trace.
Comment 6 Michael Meeks 2006-02-22 12:01:58 UTC
So - the crash is prolly related to the corruption - since I get both concurrently:

Xserver:

Program received signal SIGSEGV, Segmentation fault.
0xb77af201 in fbFetchPixel_a8r8g8b8 (bits=0xb7c8da00, offset=1408, indexed=0x0) at fbcompose.c:573
573     {
(gdb) bt
#0  0xb77af201 in fbFetchPixel_a8r8g8b8 (bits=0xb7c8da00, offset=1408, indexed=0x0) at fbcompose.c:573
#1  0xb77b53ec in fbFetchTransformed (pict=0x85f50b0, x=<value optimized out>, y=<value optimized out>, width=1399, 
    buffer=0xbffd84a0) at fbcompose.c:3159
#2  0xb77b30f5 in fbCompositeGeneral (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=314, ySrc=3194, xMask=0, 
    yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbcompose.c:3488
#3  0xb77c43cd in fbComposite (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=314, ySrc=3194, xMask=0, yMask=0, 
    xDst=0, yDst=0, width=1399, height=1050) at fbpict.c:1233
#4  0xb777aa91 in XAAComposite (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=0, ySrc=0, xMask=0, yMask=0, 
    xDst=0, yDst=0, width=1399, height=1050) at xaaPict.c:538
#5  0x081688b6 in damageComposite (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=0, ySrc=0, xMask=0, yMask=0, 
    xDst=0, yDst=0, width=1399, height=1050) at damage.c:539
#6  0x0817cdfe in ProcRenderComposite (client=0x8552e98) at render.c:755
#7  0x080c7810 in Dispatch () at dispatch.c:459
#8  0x080d44d5 in main (argc=9, argv=0xbffdec04, envp=Cannot access memory at address 0x8
) at main.c:450

or:
(gdb) bt full
#0  0xb77af201 in fbFetchPixel_a8r8g8b8 (bits=0xb7c8da00, offset=1408, indexed=0x0) at fbcompose.c:573
No locals.
#1  0xb77b53ec in fbFetchTransformed (pict=0x85f50b0, x=<value optimized out>, y=<value optimized out>, width=1399, 
    buffer=0xbffd84a0) at fbcompose.c:3159
        y1 = 3748
        tl = <value optimized out>
        br = <value optimized out>
        x1_out = 0
        y2_out = <value optimized out>
        x1 = 1093
        y2 = 3749
        distx = 155
        idistx = 101
        b = (FbBits *) 0xb7c8da00
        r = <value optimized out>
        x2_out = 0
        x2 = <value optimized out>
        disty = 178
        tr = <value optimized out>
        bl = <value optimized out>
        x_off = 1407
        y1_out = <value optimized out>
        stride = 1408
        xoff = <value optimized out>
        yoff = <value optimized out>
        fetch = (fetchPixelProc) 0xb77af200 <fbFetchPixel_a8r8g8b8>
        v = {vector = {71670655, 245674569, 65536}}
        i = 1243
        box = {x1 = 314, y1 = 3194, x2 = 1175, y2 = 3854}
        indexed = (miIndexedPtr) 0x0
        affine = 1
#2  0xb77b30f5 in fbCompositeGeneral (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=314, ySrc=3194, xMask=0, 
    yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbcompose.c:3488
        region = {extents = {x1 = 0, y1 = 0, x2 = 1399, y2 = 1050}, data = 0x0}
        n = 0
        pbox = <value optimized out>
        srcRepeat = 0
        maskRepeat = 0
        w = <value optimized out>
        h = dwarf2_read_address: Corrupted DWARF expression.

Trace from client (run with XSync):

(gdb) bt
#0  0xb7002627 in ___newselect_nocancel () from /lib/libc.so.6
#1  0xb728ff93 in _XWaitForReadable (dpy=0x813e660) at XlibInt.c:502
#2  0xb729032f in _XRead (dpy=0x813e660, data=0xbf834228 "`�023\b�a`\003\200\211M�", size=32) at XlibInt.c:1080
#3  0xb7290dd4 in _XReply (dpy=0x813e660, rep=0xbf834228, extra=0, discard=1) at XlibInt.c:1712
#4  0xb728b66a in XSync (dpy=0x813e660, discard=0) at Sync.c:48
#5  0xb728b7e5 in _XSyncFunction (dpy=0x813e660) at Synchro.c:37
#6  0xb444ac02 in XRenderComposite (dpy=0x813e660, op=3, src=56625130, mask=0, dst=56625089, src_x=0, src_y=0, mask_x=0, 
    mask_y=0, dst_x=0, dst_y=0, width=1399, height=1050) at Composite.c:66
#7  0xb44b28aa in cairo_xlib_surface_set_drawable () from /usr/lib/libcairo.so.2
#8  0xb449f845 in cairo_surface_status () from /usr/lib/libcairo.so.2
#9  0xb4497d24 in cairo_font_options_create () from /usr/lib/libcairo.so.2
#10 0xb4498017 in cairo_font_options_create () from /usr/lib/libcairo.so.2
#11 0xb44983fc in cairo_font_options_create () from /usr/lib/libcairo.so.2
#12 0xb4491412 in cairo_paint () from /usr/lib/libcairo.so.2
#13 0xafa3fd7c in cairocanvas::CanvasHelper::implDrawBitmapSurface () from ./cairocanvas.uno.so
#14 0xafa3fe3f in cairocanvas::CanvasHelper::drawBitmap () from ./cairocanvas.uno.so
#15 0xafa36eae in canvas::CanvasBase<canvas::BaseMutexHelper<cppu::WeakComponentImplHelper3<com::sun::star::rendering::XBitmapCanvas, com::sun::star::rendering::XIntegerBitmap, com::sun::star::lang::XServiceInfo> >, cairocanvas::CanvasHelper, osl::Guard<osl::Mutex>, cppu::OWeakObject>::drawBitmap () from ./cairocanvas.uno.so
#16 0xb179f2fb in cppcanvas::internal::(anonymous namespace)::BitmapAction::render () from ./libcppcanvas680li.so
#17 0xb179ee95 in cppcanvas::internal::CachedPrimitiveBase::render () from ./libcppcanvas680li.so
#18 0xb17a76fe in cppcanvas::internal::(anonymous namespace)::ActionRenderer::operator() () from ./libcppcanvas680li.so
#19 0xb17a7725 in _STL::for_each<cppcanvas::internal::ImplRenderer::MtfAction const*, cppcanvas::internal::(anonymous namespace)::ActionRenderer> () from ./libcppcanvas680li.so

of which perhaps:

#6  0xb444ac02 in XRenderComposite (dpy=0x813e660, op=3, src=56625130, mask=0, 
    .. **width=1399**, height=1050) at Composite.c:66

looks most interesting - the display is of course 1400x1050 - perhaps an under-tested corner case ? Radek - why are we passing funny sizes in here ?
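As an aside on the server trace above: fbFetchPixel_a8r8g8b8 is called with offset=1408 while fbFetchTransformed reports stride = 1408, so the index is one past the last pixel of a row. A minimal sketch of a bounds-checked neighbour fetch follows, only to illustrate the class of bug. The helper names (fetch_pixel_checked, fetch_pair_checked) are hypothetical and are not the fb code or the eventual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the fb implementation: fetch pixel `x` from a
 * row holding `row_width` 32-bit pixels. Anything outside the row yields
 * transparent black instead of a read past the end of the row. */
static uint32_t fetch_pixel_checked(const uint32_t *row, int x, int row_width)
{
    if (row == NULL || x < 0 || x >= row_width)
        return 0;
    return row[x];
}

/* A bilinear filter samples x and its right neighbour x + 1; both reads
 * go through the same bounds check. */
static void fetch_pair_checked(const uint32_t *row, int x, int row_width,
                               uint32_t *left, uint32_t *right)
{
    assert(left != NULL && right != NULL);
    *left = fetch_pixel_checked(row, x, row_width);
    *right = fetch_pixel_checked(row, x + 1, row_width);
}
```

With a row of 1408 pixels, asking for the pair at x = 1407 reads the last pixel and substitutes 0 for the neighbour at index 1408.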
Comment 7 Michael Meeks 2006-02-22 12:23:00 UTC
Re-tested with 16bpp & 'clone' turned on & it works perfectly [ modulo color problems: apparently we lose some red; perhaps that follows from the above - but ... ]
Comment 8 Michael Meeks 2006-02-24 09:27:16 UTC
the color issue is unrelated; and now fixed in OO.o.
Comment 9 Stefan Dirsch 2006-02-24 09:38:04 UTC
Could you also attach /var/log/Xorg.0.log? Thanks.
Comment 10 Michael Meeks 2006-02-24 10:40:13 UTC
Sure - so I also took another stack-trace of X; this time it's seemingly way less of an obvious corner case:

(gdb) bt
#0  0xb7835211 in fbFetchPixel_x8r8g8b8 (bits=0xb840dc00, offset=486, indexed=0x0) at fbcompose.c:579
#1  0xb783b3ec in fbFetchTransformed (pict=0x84d6950, x=<value optimized out>, y=<value optimized out>, 
    width=1399, buffer=0xbfb87050) at fbcompose.c:3159
#2  0xb78390f5 in fbCompositeGeneral (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=243, 
    ySrc=4121, xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbcompose.c:3488
#3  0xb784a3cd in fbComposite (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=243, ySrc=4121, 
    xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbpict.c:1233
#4  0xb7800a91 in XAAComposite (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=0, ySrc=0, 
    xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at xaaPict.c:538
#5  0x081688b6 in damageComposite (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=0, ySrc=0, 
    xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at damage.c:539
#6  0x0817cdfe in ProcRenderComposite (client=0x85355a0) at render.c:755
#7  0x080c7810 in Dispatch () at dispatch.c:459
#8  0x080d44d5 in main (argc=9, argv=0xbfb8d7b4, envp=Cannot access memory at address 0x8
) at main.c:450
Comment 11 Michael Meeks 2006-02-24 10:41:47 UTC
Created attachment 70149
log

hth.
Comment 13 Matthias Hopf 2006-03-01 12:06:52 UTC
(In reply to comment #3)
> hwinfo ...
> 
> interestingly hwinfo claims it's a 16bpp mode, but AFAIR I configured it for
> 24bit. I'll attach my xorg.conf too.

hwinfo has no idea about whether the Xserver runs in 16 or 24bpp mode. Forget about that.

Will have to try to reproduce this first. This is a bad issue, especially the server crash.

> (--) RADEON(0): Chipset: "ATI Radeon Mobility M7 LW (AGP)" (ChipID = 0x4c57)

That chip, again. Sigh.
Stefan (Behlert), do we have a free laptop with that chipset?
Stefan (Dirsch), maybe we should disable RenderAccel for M7 completely... :-(
Comment 14 Michael Meeks 2006-03-09 13:55:40 UTC
*** Bug 144659 has been marked as a duplicate of this bug. ***
Comment 16 Michael Meeks 2006-03-16 19:54:37 UTC
Ah - I must meet this Criddel; does he run Xgl ? ;-)
Comment 17 Egbert Eich 2006-03-17 11:39:21 UTC
Matthias, are you investigating this?
I can try to investigate this - if I'm able to reproduce.
The crash helps to pinpoint the problem. Plain corruptions are a pain to debug without a simple test case.
Comment 18 Matthias Hopf 2006-03-17 13:35:30 UTC
I know. I haven't really started investigating this, so if you *can* reproduce this easily...
Comment 19 Radek Doulik 2006-03-21 17:19:36 UTC
Similar bug (specific for ATI hardware as well) is https://bugzilla.novell.com/show_bug.cgi?id=159551
Comment 20 Michael Meeks 2006-03-21 17:38:31 UTC
Matthias - if you need access to my machine where this is repeatable every time - I can happily set that up for you; NEEDINFO is not my preferred state for this bug ;-)
Comment 21 Matthias Hopf 2006-03-21 18:10:18 UTC
Egbert, are you already working on that? Otherwise I'll take a look into this issue this week. We should not duplicate efforts.
Comment 22 Egbert Eich 2006-03-21 18:25:12 UTC
Nope. I was waiting for your reply. Something seems to be stepping on memory. It's not unlikely that it's the same thing that's causing #159551.
Comment 23 Matthias Hopf 2006-03-22 16:12:40 UTC
Trying to reproduce now...
Comment 24 Matthias Hopf 2006-03-24 15:14:15 UTC
I can reproduce this on a RV200 QW. Though the display corruption looks completely different (seems like an image is to be copied, and a broken pointer is used) and I do not get any crashes. Xorg from CVS doesn't even paint the background white.

Option "RenderAccel" "off"   doesn't help anything.
fbdev works fine. Have to test nv, maybe this is a general XAA related bug.
Comment 25 Matthias Hopf 2006-03-24 15:31:24 UTC
Same issue on nv. This seems to be a bug of the base XAA Compose implementation.
Comment 26 Stefan Dirsch 2006-03-24 15:49:06 UTC
We could try "EXA" to verify this assumption. ;-) An XGI card, which uses the sis driver is available.
Comment 27 Matthias Hopf 2006-03-24 18:55:26 UTC
With current CVS I even get corruption with the nv driver with the NoAccel option, and the fbdev driver crashes even during startup.

I will recompile xorg 7.0 for future tests. Testing the XGI card now.
Comment 32 Egbert Eich 2006-03-29 19:17:55 UTC
Can you check whether a resource with the same ID was added before, and what its type and pointer are? It should be easily doable by adding some logging to resource.c:AddResource().
Comment 33 Matthias Hopf 2006-03-30 10:53:52 UTC
Current state:

- when configured for 16bit all drivers work, even though most operations are apparently done with 24bit visuals in offscreen buffers in OOo.
- when configured for 24bit all drivers except the nvidia binary-only driver break.
- when configured for 24bit even rendertest fails for blend/over.

I have some patches done, but they don't fix the OOo issue yet.
Comment 34 Matthias Hopf 2006-03-30 10:56:45 UTC
Created attachment 75710
Patch for being able to build xorg 7.0 with --enable-debug

Simple patch for being able to build the 7.0 branch (dunno about head right now) with --enable-debug.
Comment 35 Matthias Hopf 2006-03-30 10:59:52 UTC
Created attachment 75712
Patch for fixing missing initialization of pPicture->format

This fixes a missing initialization of pPicture->format, because of which the wrong Composite function was chosen. I'm not sure about the initialization value here, as pFormat is NULL and stays NULL for SolidPicture (and I guess for GradientPicture as well).

With this patch rendertest runs through all tests without failures any more. OOo still has the broken images, though.
Comment 36 Matthias Hopf 2006-03-30 13:06:26 UTC
This issue is *not* MMX dependent. Same results with MMX disabled.
Comment 37 Matthias Hopf 2006-03-30 13:07:36 UTC
This is the difference in the Composite operation calling sequence between 16 and 24 bit for displaying the first slide:

--- /tmp/Xorg.composite.24      2006-03-30 14:26:24.000000000 +0200
+++ /tmp/Xorg.composite.16      2006-03-30 14:27:36.000000000 +0200
@@ -1,34 +1,40 @@
 
 X Window System Version 7.0.0
 Release Date: 21 December 2005
 X Protocol Version 11, Revision 0, Release 7.0
 Build Operating System:Linux 2.6.13-15.8-smp i686
 Current Operating System: Linux gkar 2.6.16-rc6-git1-4-default #1 Tue Mar 14 18:04:33 UTC 2006 i686
 Build Date: 29 March 2006
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
 Module Loader present
 Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-(==) Log file: "/var/log/Xorg.0.log", Time: Thu Mar 30 14:25:31 2006
+(==) Log file: "/var/log/Xorg.0.log", Time: Thu Mar 30 14:26:50 2006
 (==) Using config file: "/etc/X11/xorg.conf"
 Could not init font path element /usr/X11R6/lib/X11/fonts/local, removing from list!
 Could not init font path element /usr/X11R6/lib/X11/fonts/CID, removing from list!
 Comp 8 Op 3 src 20028888 mask 00000000 dst 20028888 9x16 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0 (10/742) -> 10/742
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1022x767 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1022x767 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1022x767 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0 (10/742) -> 10/742
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20028888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20028888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
Comment 38 Matthias Hopf 2006-03-30 13:14:24 UTC
 Comp 8 Op 3 src 20028888 mask 00000000 dst 20028888 9x16 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0 (10/742) -> 10/742
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
 Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0

After these calls rendering has completed, so the remaining calls are for prerendering the next slide. It is clearly visible that OpenOffice does something different for 16bit: it composes the image in 24bit first, and then copies/converts it to 16bit into the frame buffer. For 24bit visuals, it apparently does everything in the framebuffer.
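For readers of the Comp log lines: the src/mask/dst values look like Render picture format codes, and Op 1 / Op 3 match the Render operators PictOpSrc / PictOpOver. A minimal decoding sketch follows, assuming the values use the PICT_FORMAT() packing (bpp << 24 | type << 16 | a << 12 | r << 8 | g << 4 | b). The struct and function names are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a packed picture format code. The layout is assumed to be the
 * Render PICT_FORMAT() packing; this struct is illustrative only. */
struct pict_format {
    unsigned bpp, type, a, r, g, b;
};

static struct pict_format decode_pict_format(uint32_t code)
{
    struct pict_format f;
    f.bpp  = code >> 24;
    f.type = (code >> 16) & 0xff;
    f.a    = (code >> 12) & 0x0f;
    f.r    = (code >> 8) & 0x0f;
    f.g    = (code >> 4) & 0x0f;
    f.b    = code & 0x0f;
    return f;
}

/* Write a short name such as "x8r8g8b8" into buf. Only type 2
 * (PICT_TYPE_ARGB) is spelled out; other types are reported generically. */
static void pict_format_name(uint32_t code, char *buf, size_t len)
{
    struct pict_format f = decode_pict_format(code);
    unsigned used = f.a + f.r + f.g + f.b;

    assert(buf != NULL && len > 0);
    if (code == 0)
        snprintf(buf, len, "none");
    else if (f.type != 2)
        snprintf(buf, len, "type%u/%ubpp", f.type, f.bpp);
    else if (f.a > 0)
        snprintf(buf, len, "a%ur%ug%ub%u", f.a, f.r, f.g, f.b);
    else if (f.bpp > used)   /* unused padding bits, no alpha */
        snprintf(buf, len, "x%ur%ug%ub%u", f.bpp - used, f.r, f.g, f.b);
    else
        snprintf(buf, len, "r%ug%ub%u", f.r, f.g, f.b);
}
```

Under that assumption 20028888 decodes to a8r8g8b8, 20020888 to x8r8g8b8 and 10020565 to r5g6b5, which fits the description of 24bit composition followed by a copy into a 16bit frame buffer.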
Comment 39 Matthias Hopf 2006-03-30 13:21:53 UTC
Stefan, could you please provide an Xorg package with the patch from attachment #75712? It should at least fix the segfault.

Petr, is it possible to tell OpenOffice to render into offscreen 24bit visuals first even if the base visual is a 24bit one, so that the same render path is used? That way we can determine whether OpenOffice is doing something weird, or whether this is due to another bug in the render layer.
Comment 40 Petr Mladek 2006-03-30 13:38:26 UTC
Radek, you should know it better than me. Could you please answer the latest question?
Comment 41 Matthias Hopf 2006-03-30 13:39:50 UTC
*** Bug 159551 has been marked as a duplicate of this bug. ***
Comment 42 Radek Doulik 2006-03-30 14:01:33 UTC
How do I find out whether visual is offscreen?

We don't use visuals directly. I create pixmap for the cairo surface and pass it to cairo together with render format. It should be possible to ask for right render format if I know the visual though.

Here is the code that creates the pixmaps in OOo:

        Surface* Surface::getSimilar( Content aContent, int width, int height )
        {
                Pixmap hPixmap;

                if( mpSysData && mpDisplay && mhDrawable ) {
                        XRenderPictFormat *pFormat;
                        int     nFormat;

                        switch (aContent) {
                        case CAIRO_CONTENT_ALPHA:
                                nFormat = PictStandardA8;
                                break;
                        case CAIRO_CONTENT_COLOR:
                                nFormat = PictStandardRGB24;
                                break;
                        case CAIRO_CONTENT_COLOR_ALPHA:
                        default:
                                nFormat = PictStandardARGB32;
                                break;
                        }

                        pFormat = XRenderFindStandardFormat( (Display*) mpDisplay, nFormat );
                        hPixmap = XCreatePixmap( (Display*) mpDisplay, cairoHelperGetWindow( mpSysData ),
                                                                         width > 0 ? width : 1, height > 0 ? height : 1,
                                                                         pFormat->depth );

                        return new Surface( mpSysData, mpDisplay, (long) hPixmap, pFormat,
                                            cairo_xlib_surface_create_with_xrender_format( (Display*) mpDisplay, hPixmap,
                                                                                           DefaultScreenOfDisplay( (Display *) mpDisplay ),
                                                                                           pFormat, width, height ) );
                } else
                        return new Surface( mpSysData, mpDisplay, 0, NULL, cairo_surface_create_similar( mpSurface, aContent, width, height ) );
        }
Comment 43 Stefan Dirsch 2006-03-30 14:03:29 UTC
> Stefan, could you please provide an Xorg package with the patch from 
> attachment #75712? It should at least fix the segfault.
done.
Comment 44 Matthias Hopf 2006-03-30 14:07:17 UTC
Thanks Stefan, Petr, Radek.

Those who had the Xserver segfaulting, please test the new packages from Stefan as soon as they show up in stable. They won't fix the rendering issues, though.
Comment 45 Michael Meeks 2006-03-30 17:19:06 UTC
Matthias - thanks for the great progress here.

Radek - can you adjust this in VCL/cairocanvas & send Matthias just those 2 libs for testing ?
Comment 46 Radek Doulik 2006-03-31 12:54:31 UTC
Well, I don't know how to tell if visual is offscreen or not.

Matthias, don't you mean an offscreen buffer? In that case there should not be any difference, as cairo canvas always renders to a pixmap before it copies/composites that pixmap to the window.
Comment 48 Radek Doulik 2006-04-07 13:23:06 UTC
*** Bug 164447 has been marked as a duplicate of this bug. ***
Comment 49 Stefan Dirsch 2006-04-11 10:30:06 UTC
Radek, Matthias, any updates on this one?
Comment 50 Radek Doulik 2006-04-11 10:38:30 UTC
I am waiting for Matthias to answer my question to see if I can prepare a modified canvas for him.
Comment 51 Matthias Hopf 2006-04-11 10:52:02 UTC
Thanks Radek for the input, this is definitely helping. I have to get through my emails first (guess what stacks up in one week :-( ), and then scan the corresponding glitz code, in order to see what is really happening here.

I'll publish what I find out, and ask you again as soon as I know what I need to be tested.
Comment 52 Matthias Hopf 2006-04-11 10:54:11 UTC
Additionally:

(In reply to comment #44)
> Those who had the Xserver segfaulting, please test the new packages from Stefan
> as soon as they show up in stable. They won't fix the rendering issues, though.

Any news on that?
Comment 53 Radek Doulik 2006-04-11 11:44:00 UTC
I have tried xorg-x11-server-6.9.0-34 package and it doesn't crash anymore. Thanks for that fix. I see the rendering issues too.

We don't use glitz cairo backend IIRC, just cairo/Xlib(RENDER). The difference from other cairo enabled apps is that we let cairo create similar surface on pixmap created in OOo as we need to have access to the pixmap for text rendering. Normally apps just ask cairo for similar surface and cairo creates the pixmap internally.
Comment 54 Stefan Dirsch 2006-04-12 18:43:10 UTC
Could we lower severity here since the crash has been fixed?
Comment 55 Michael Meeks 2006-04-13 08:16:31 UTC
Preferably not - no. It's not acceptable to render people's slides like this:
https://bugzilla.novell.com/show_bug.cgi?id=152730

This is not a Glitz issue - but a core XRender issue: it doesn't work. Having done a ton of work for SLED10 to use XRender for nice anti-aliased slide-show rendering, it would be a tragedy to have to disable that for everyone because of a specific driver's buggy X server implementation.
Comment 56 Stefan Dirsch 2006-04-13 08:32:25 UTC
Are you sure that it works with any driver on 10.1 in 24bpp? IMHO it's broken with any accelerated driver on 10.1 in 24bpp. See comment #33 by Matthias. I'm afraid Xrender has been somewhat broken in 24bpp for a long time, but nobody noticed it so far. :-(
Comment 57 Matthias Hopf 2006-04-13 14:33:39 UTC
Actually, it seems to work with the intel driver. Getting closer, the intel driver uses a 24bit RGB visual, while the other drivers all seem to use 32bit XRGB visuals: wdiff of composite operators (-radeon +intel):

Comp 8 Op 3 src 20028888 mask 00000000 dst 20028888 9x16 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst [-20020888-] {+18020888+} 9x16 0/0 (10/742) -> 10/742
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst [-20020888-] {+18020888+} 9x16 0/0 (10/742) -> 10/742
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0

The same is true for vesa, so this is the reason why the vesa driver works as well.

This is not a critical bug any more, as no crashes are involved, and it can be worked around (though we don't want that). Changing severity will not influence my priority in fixing it.
Comment 58 Radek Doulik 2006-04-18 16:15:47 UTC
*** Bug 167235 has been marked as a duplicate of this bug. ***
Comment 59 Matthias Hopf 2006-04-18 16:21:26 UTC
The differences in the chosen render paths between any nonworking and working configuration I've tried so far are far too large to track down this issue inside OpenOffice; I've pretty much given up on that.

I guess I have to bite the bullet and create a simpler test case that shows the same behavior.

I finally noticed *one* major difference: all working versions do not seem to use hardware based offscreen surfaces, i.e. all pDrawable->x/y are 0, while the broken version draws to surface positions starting at 0/3595, 0/4363, 0/6667, or similar. These odd start positions really sound strange to me.

To debug this, we really need a simpler test case.
Comment 60 Radek Doulik 2006-04-18 16:37:42 UTC
Matthias: is there anything I can do for you on the OOo side?

What are these pDrawable->x/y? Does it define an offset in the offscreen memory on the card?
Comment 61 Matthias Hopf 2006-04-18 16:40:37 UTC
I have just been told this bug has not been open to the public. It should be, so I'm changing the product, as it is relevant for all SL 10.1 based products.
Comment 62 Matthias Hopf 2006-04-18 16:50:19 UTC
pDrawable->x/y define the starting point of a Drawable inside a Screen. A screen can be imagined as the frame buffer memory of a graphics card in this case.
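To make the role of such an origin concrete, here is a minimal sketch of how a drawable-local pixel coordinate maps to an index in screen memory. The struct and function are hypothetical stand-ins, not the server's DrawableRec or any fb routine, and the stride of 1408 pixels is borrowed from the earlier backtrace purely as an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a drawable: only the fields this sketch needs. */
struct drawable_sketch {
    int x, y;           /* origin of the drawable inside the screen */
    int width, height;  /* size of the drawable in pixels */
};

/* Index (in pixels) into screen memory of the pixel at drawable-local
 * coordinates (px, py). `stride` is the screen row length in pixels.
 * Returns -1 when the point lies outside the drawable. */
static long drawable_pixel_index(const struct drawable_sketch *d,
                                 int px, int py, int stride)
{
    assert(d != NULL && stride > 0);
    if (px < 0 || py < 0 || px >= d->width || py >= d->height)
        return -1;
    /* Forgetting d->x / d->y here would address the wrong part of the
     * screen whenever the drawable does not start at 0/0. */
    return (long)(d->y + py) * stride + (d->x + px);
}
```

For a drawable at 0/3595 the first pixel lands 3595 rows into screen memory, whereas a drawable at 0/0 starts at index 0. Code that ignores the origin only works in the second case.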

I have to read more code so I know which render calls are invoked by glitz for the OOo case (as rendertest works fine for all cases it tests, and that includes some offscreen hardware surfaces - but as far as I can see no rendering *to* hardware surfaces). Radek, could you point me to the drawing routines in OOo for the Render case?
Comment 63 Radek Doulik 2006-04-18 18:41:09 UTC
We don't make many Render calls in OOo (there are some, but the problematic rendering comes from the cairo canvas, which uses cairo). In the cairo canvas the only XRender call I use is XRenderFindStandardFormat, before I call cairo_xlib_surface_create_with_xrender_format.

Thinking more about it, I will try to disable calling cairo_xlib_surface_create_with_xrender_format and use cairo_surface_create_similar to see if it makes a difference (using cairo_xlib_surface_create_with_xrender_format is different from most of the other apps using cairo I guess).

Do you know if the problem lies for sure in the x server or might it be some problem in the client application as well (ie. in the cairo lib)?

The cairo canvas code is in ooo-build/build/src-m*/canvas/source/cairo if you want to take a look.
Comment 64 Radek Doulik 2006-04-19 08:02:19 UTC
*** Bug 162166 has been marked as a duplicate of this bug. ***
Comment 65 Radek Doulik 2006-04-19 09:39:48 UTC
xorg-x11-server-6.9.0-34 crashed for me today. Unfortunately I am unable to reproduce it :-(
Comment 66 Radek Doulik 2006-04-19 09:58:24 UTC
So using cairo_surface_create_similar doesn't help, it still renders images wrong.

I will try to write simple test using only cairo lib and no OOo code. One thing which came to my mind is that OOo is also probably creating windows with different visual than the default one.
Comment 67 Matthias Hopf 2006-04-19 13:35:27 UTC
(In reply to comment #63)
> We don't do much Render calls in OOo (there are some, but the problematic
> rendering comes from cairo canvas which uses the cairo) In cairo canvas I use

I'm sure that this is the case. It's just that apparently only the specific combination used in OOo is triggering the bug.

> Do you know if the problem lies for sure in the x server or might it be some
> problem in the client application as well (ie. in the cairo lib)?

It *could* be a bug in cairo, but from the type of output we're getting I very much suspect this is an Xserver problem.

> The cairo canvas code is in ooo-build/build/src-m*/canvas/source/cairo if you
> want to take a look.

Thanks.

(In reply to comment #66)
> I will try to write simple test using only cairo lib and no OOo code. One thing
> which came to my mind is that OOo is also probably creating windows with
> different visual than the default one.

Wow. That would be *extremely* helpful.

BTW - current (CVS) Xorg server behaves even worse. Even on 16bit visuals render is partially broken (background color is black), and the vesa driver doesn't work at all, so this issue hasn't been fixed upstream in the last weeks.
Comment 68 Radek Doulik 2006-04-19 16:05:25 UTC
Tried to reproduce it with a simple test program, but everything worked as expected.

So I went back to the cairo canvas. It looks like the cairo canvas already gets wrong data from the underlying vcl layer (part of OOo), so the data drawn are correct from cairo's point of view. I will have to dive into the vcl code to see if I can spot where the difference is - it is weird, as on nvidia I still get the correct data from the vcl.

When playing with it on the ati card (radeon driver) it started working at one point, but after a few slideshows the server crashed again. After a restart it stopped working again - I suppose the working period happened when the X server ran out of offscreen memory, so it used the working path you described. I have had about 3 or 4 crashes this afternoon :(
Comment 69 Radek Doulik 2006-04-20 09:08:19 UTC
I tracked the broken data (image content) in the vcl code to the XGetImage call in vcl salbmp.cxx:206. So we get the broken data from the server.

I tried to use the pixmap (which is used for XGetImage call) directly, but the server now crashes for me reliably. At least we have a reproducible way for the server crash again.

Here is the backtrace:

Program received signal SIGSEGV, Segmentation fault.
0xb77df211 in fbListInstalledColormaps () from /usr/X11R6/lib/modules/libfb.so
(gdb) bt
#0  0xb77df211 in fbListInstalledColormaps () from /usr/X11R6/lib/modules/libfb.so
#1  0xb77e4bf5 in fbCompositeGeneral () from /usr/X11R6/lib/modules/libfb.so
#2  0xb77f44bd in fbComposite () from /usr/X11R6/lib/modules/libfb.so
#3  0xb77aaa91 in XAAComposite () from /usr/X11R6/lib/modules/libxaa.so
#4  0x08168aa6 in DamageDamageRegion ()
#5  0x0817cfee in PanoramiXRenderReset ()
#6  0x080c79c0 in Dispatch ()
#7  0x080d4685 in main ()
(gdb)

I am attaching modified cairocanvas.uno.so and broken.odp document.

Matthias, please let me know if you are able to reproduce the crash. (replace your cairocanvas.uno.so with the attached one and try to run slideshow of broken.odp) It crashes on my machine every time I run it.

I hope it will help. Let me know if you need more info from me.
Comment 70 Radek Doulik 2006-04-20 09:12:00 UTC
Created attachment 79183 [details]
modified cairo canvas
Comment 71 Radek Doulik 2006-04-20 09:14:32 UTC
Created attachment 79184 [details]
simple presentation with broken slide
Comment 72 Radek Doulik 2006-04-24 09:01:38 UTC
Matthias: Any news on this bug? Were you able to reproduce the crash?

I tried newer server meanwhile (xorg-x11-server-6.9.0-39) and it still crashes for me.
Comment 73 Radek Doulik 2006-04-25 12:08:48 UTC
Increasing the severity as I am able to reproduce the crash again.
Comment 74 Matthias Hopf 2006-04-25 13:27:34 UTC
I cannot reproduce the crash, neither with xorg-x11-server-6.9.0-39, nor with a plain Xorg 7.0 with patches described here. I have copied the modified cairocanvas.uno.so to the OpenOffice program directory.

The simpler presentation and the bug description helps, though.

I'll try an OpenOffice update now.
Comment 75 Matthias Hopf 2006-04-26 09:26:12 UTC
Tried the update, no change. Also tried xorg-x11-server-6.9.0-44
I would really love to be able to reproduce the crash. The only difference in the configuration is that I nuked the synaptics mouse (this is no laptop). Are there any differences on your machine to a standard installation?

Radek, as bad as it sounds, we should stop using RENDER as default until this is fixed. This bug is not a blocker, as there is a workaround.
Comment 76 Matthias Hopf 2006-04-26 09:59:56 UTC
The much smaller sample reduced the amount of render actions massively:

=Comp 8 Op 3 src 20020888 mask 00000000 dst 20028888 9x16 0/0+0/0 (0/0) -> 0/0+0/0
fbCompose op 3 pMask 0 src 0:0 mask 0:0 dst 0:0 size 9x16
format 20020888 0 20028888 type 1 0 1 depth 24 0 32 drawable type 1 0 1 depth 24 0 32
=Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0+0/0 (10/742) -> 10/742+0/3409
fbCompose op 3 pMask 1 src 0:0 mask 10:742 dst 10:4151 size 9x16
format 20028888 20028888 20020888 type 1 1 1 depth 32 32 24 drawable type 1 1 1 depth 32 32 24
=Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0+614/1872 (0/0) -> 0/0+0/4177
fbCompose op 3 pMask 0 src 614:1872 mask 0:0 dst 0:4177 size 1023x768
format 20020888 0 20020888 type 1 0 1 depth 24 0 24 drawable type 1 0 1 depth 24 0 24
=Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0+0/0 (10/742) -> 10/742+0/3409

I will now single step each one.
Comment 77 Radek Doulik 2006-04-26 12:14:45 UTC
I have beta6 or beta8 with updated xorg. I will try to upgrade my whole system to the latest release to see if the crash remains. My system is a laptop with an x300 ati card, radeon driver.

From the render ops I think the interesting one is the one with size 1023x768. The other one, 9x16, is the hourglass icon, which is always rendered and thus doesn't crash on other slides.

I will also try to trace the data in the vcl to see if it creates the pixmap (which is read as image later) correctly.
Comment 78 Matthias Hopf 2006-04-26 15:42:32 UTC
ok, first thing I do not completely understand:

For the very first compose operator pSrc is connected to a drawable already sitting in the framebuffer. How does it get there?

Do you have the source code of the test program (which worked according to comment #68) available?
Comment 79 Radek Doulik 2006-04-26 16:12:24 UTC
> For the very first compose operator pSrc is connected to a drawable already
> sitting in the framebuffer. How does it get there?

Not sure, I do not work directly with RENDER, but I use cairo, which uses RENDER. I think the pSrc is a cairo surface (pixmap), which is produced by cairo from an image surface.

> Do you have the source code of the test program (which worked according to
> comment #68) available?

I am attaching it. I compile it with:

gcc -Wall my-test.c -o my-test `pkg-config --cflags cairo` `pkg-config --libs cairo` -O0 -g

It also needs water.png in cwd.
Comment 80 Radek Doulik 2006-04-26 16:13:40 UTC
Created attachment 80327 [details]
simple test program trying to mimic what OOo does
Comment 81 Radek Doulik 2006-04-26 16:15:17 UTC
Created attachment 80328 [details]
image for test program
Comment 82 Stefan Dirsch 2006-04-26 21:45:01 UTC
(comment #75)
> Radek, as bad as it sounds, we should stop using RENDER as default until
> this is fixed. This bug is not a blocker, as there is a workaround.
Since it's unlikely that Matthias can still find a fix in time. Could we at least change the default for SUSE 10.1? Maybe we'll still find a fix for SLED10 ...
Comment 83 Radek Doulik 2006-04-27 07:54:48 UTC
> Since it's unlikely that Matthias can still find a fix in time. Could we at
> least change the default for SUSE 10.1? Maybe we'll still find a fix for SLED10

Let's ask Michael about it.
Comment 84 Radek Doulik 2006-04-27 08:02:23 UTC
I have updated my whole system to beta10 and it still crashes for me reliably. Updating to xorg-x11..-44 doesn't help either.

The crash still looks related to the driver, it crashes my x300/ati/radeon xserver, while running remotely on nvidia 7800/nv xserver it doesn't crash and shows only pixmap with broken data.

I also have a backtrace with symbols (interestingly, it crashes in another place now):

(gdb) bt
#0  0xb7835211 in fbFetchPixel_x8r8g8b8 (bits=0xb85ffc00, offset=0,
    indexed=0x0) at fbcompose.c:579
#1  0xb783d00b in fbFetchTransformed (pict=0x84eff58, x=<value optimized out>,
    y=<value optimized out>, width=1399, buffer=0xbf9e1700) at fbcompose.c:3163
#2  0xb783abf5 in fbCompositeGeneral (op=3 '\003', pSrc=0x84eff58, pMask=0x0,
    pDst=0x847c380, xSrc=0, ySrc=7281, xMask=0, yMask=0, xDst=0, yDst=3711,
    width=1399, height=1050) at fbcompose.c:3494
#3  0xb784a4bd in fbComposite (op=3 '\003', pSrc=0x84eff58, pMask=0x0,
    pDst=0x847c380, xSrc=0, ySrc=7281, xMask=0, yMask=0, xDst=0, yDst=0,
    width=1399, height=1050) at fbpict.c:1233
#4  0xb7800a91 in XAAComposite (op=3 '\003', pSrc=0x84eff58, pMask=0x0,
    pDst=0x847c380, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0,
    width=1399, height=1050) at xaaPict.c:538
#5  0x08168aa6 in DamageDamageRegion ()
#6  0x0817cfee in PanoramiXRenderReset ()
#7  0x080c79c0 in Dispatch ()
#8  0x080d4685 in main ()
(gdb)
Comment 85 Radek Doulik 2006-04-27 08:09:18 UTC
I have also tried to run my Xorg binary with valgrind, but it gives up pretty soon.

Any hint how to valgrind the X server?
Comment 86 Michael Meeks 2006-04-27 08:59:47 UTC
> Stefan Dirsch 2006-04-26 15:45 MST
> Since it's unlikely that Matthias can still find a fix in time. Could we at
> least change the default for SUSE 10.1? Maybe we'll still find a fix for 
> SLED10

You ask this of the man whose months of work halving OO.o startup time were wasted because we have no glibc maintainer ? And now - you want us to discard yet many more man-months of work because (it now turns out) XRender is broken.

That would be a sad outcome. We disabled this feature in the last SUSE release because of pitiful performance of the XRender impl. it'd be a shame to disable it again now because of bugs in the impl. I'd rather have Radek working to try to rescue this big investment we made.

Of course, for SUSE RC<whatever> I guess if we can turnaround a package with this disabled we can do that - but I *Really* want to see this fixed for SLED.

Radek - since XRender seems to work fairly well (in general) for other apps - I bet OO.o is doing some real corner case broken thing for this - other apps like screen-shot apps also hate OO.o's Xwindows (or have done in the past). Can you do some xmon debugging to see if we can capture the state of the drawables, instrument VCL to the hilt to work out what it's doing / binary chop code out of VCL until it doesn't crash etc. until we get more data.
Comment 87 Matthias Hopf 2006-04-27 09:41:07 UTC
(In reply to comment #85)
> I have also tried to run my Xorg binary with valgrind, but it gives up pretty
> soon.
> 
> Any hint how to valgrind the x server?

Just one word: Don't!

Sorry to say that, but the Xserver has memory access bugs all over the place. Having them fixed with valgrind would be a great thing, but I guess we're much too underpowered to do so.
Comment 88 Petr Mladek 2006-04-27 09:49:15 UTC
Andreas, do we have time to disable this feature in OOo for SL 10.1?
Comment 89 Matthias Hopf 2006-04-27 09:51:19 UTC
(In reply to comment #86)
> > Stefan Dirsch 2006-04-26 15:45 MST
> > Since it's unlikely that Matthias can still find a fix in time. Could we at
> > least change the default for SUSE 10.1? Maybe we'll still find a fix for 
> > SLED10
> 
> You ask this of the man whose months of work halving OO.o startup time were
> wasted because we have no glibc maintainer ? And now - you want us to discard
> yet many more man-months of work because (it now turns out) XRender is broken.

So what? Read it again: our guess is that we simply *won't* be able to fix this for SL10.1, so the default should be to *not* enable XRender in OOo, because this would mean we would have broken slide shows on 60-80% of all notebooks (assuming NVidia has a larger percentage on desktop machines).
If we *do* find a fix tomorrow, we can still re-enable Render as default after that.

Additionally, I'll be at LinuxTag next week, so I guess I won't be able to be of any help then.

> That would be a sad outcome. We disabled this feature in the last SUSE release
> because of pitiful performance of the XRender impl. it'd be a shame to disable
> it again now because of bugs in the impl. I'd rather have Radek working to try
> to rescue this big investment we made.

The investment isn't lost. It's just that it won't surface for this release. It's not the first time something doesn't hit the shelf in time due to bugs in the base system. Which I wasn't aware of until I got this bug assigned.

> Of course, for SUSE RC<whatever> I guess if we can turnaround a package with
> this disabled we can do that - but I *Really* want to see this fixed for SLED.

Nobody said we should close this as WONTFIX. *That* would be a bad idea.

> Radek - since XRender seems to work fairly well (in general) for other apps - I
> bet OO.o is doing some real corner case broken thing for this - other apps like

I don't think it's a corner case, I do think OOo just uses more features than other applications. I do think now we're seeing some sort of memory corruption (the values of the drawable look really strange), so maybe this is the reason why this doesn't surface in the test case.
Comment 90 Petr Mladek 2006-04-27 10:00:44 UTC
Created attachment 80454 [details]
hwinfo log from hope.suse.cz

I have reproduced the crash on my workstation.

I have ASUSTeK GeForce 6200 TurboCache(TM) and use the default nv_drv.so from Xorg.

I am running SL10.1rc2-x86_64.
Comment 91 Andreas Jaeger 2006-04-27 10:13:09 UTC
Not sure whether we have time - let's disable it for now and I'll check.
Comment 92 Matthias Hopf 2006-04-27 10:17:48 UTC
(In reply to comment #84)
> The crash still looks related to the driver, it crashes my x300/ati/radeon
> xserver, while running remotely on nvidia 7800/nv xserver it doesn't crash and
> shows only pixmap with broken data.

This seems to be *extremely* dependent on the hardware, maybe on available video memory and its layout. I'm debugging on an ati card, and it still doesn't crash for me.

> (gdb) bt
> #0  0xb7835211 in fbFetchPixel_x8r8g8b8 (bits=0xb85ffc00, offset=0,
>     indexed=0x0) at fbcompose.c:579
> #1  0xb783d00b in fbFetchTransformed (pict=0x84eff58, x=<value optimized out>,
>     y=<value optimized out>, width=1399, buffer=0xbf9e1700) at fbcompose.c:3163
> #2  0xb783abf5 in fbCompositeGeneral (op=3 '\003', pSrc=0x84eff58, pMask=0x0,
>     pDst=0x847c380, xSrc=0, ySrc=7281, xMask=0, yMask=0, xDst=0, yDst=3711,
>     width=1399, height=1050) at fbcompose.c:3494

Ok, this hardens my assumption that the source drawable is already malformed.
Comment 93 Radek Doulik 2006-04-27 11:50:10 UTC
> (In reply to comment #85)
> > I have also tried to run my Xorg binary with valgrind, but it gives up pretty
> > soon.
> > 
> > Any hint how to valgrind the x server?
> 
> Just one word: Don't!
> 
> Sorry to say that, but the Xserver has memory access bugs all over the place.
> Having them fixed with valgrind would be a great thing, but I guess we're much
> too underpowered to do so.

Well, I tried it after all (I had to comment out one of the asserts in the valgrind code so that the server starts - thanks to dirk for encouraging me to play with the valgrind source code).

OOo runs OK under valgrinded server - no crash, no corruption, and I have got sensible reports from it (at least I hope).

I am attaching them, could you please take a look?
Comment 94 Radek Doulik 2006-04-27 11:51:33 UTC
Created attachment 80492 [details]
part of log from valgrind - after pressing F5 - run slideshow - until the slide is correctly rendered
Comment 95 Michael Meeks 2006-04-27 12:18:07 UTC
Radek - the log is IMHO not that useful - we want to bin all items except writes to freed memory etc. the archetypes:

Syscall param writev(vector[...]) points to uninitialised byte(s)

is almost always harmless & a false positive, and:

Conditional jump or move depends on uninitialised value(s)

is not really that likely to be the problem.
Comment 96 Radek Doulik 2006-04-27 12:25:48 UTC
Well, in that case I think the writev might be?

OTOH, the pixmap is probably created even before running the slideshow, so I will look at the rest as well.
Comment 97 Matthias Hopf 2006-04-27 14:48:50 UTC
(In reply to comment #96)
> Well, in that case I think the writev might be?

> OTOH, the pixmap is probably created even before running the slideshow, so I
> will look at the rest as well.

Yes, that would be great!

(In reply to comment #93)
> Well, I tried it after all (have to comment one of asserts in valgrind code so
> that server starts - thanks to dirk for encouraging me to play with valgrind
> source code).

Could you point me to that assert as well? Having a working valgrind for Xorg would be nice to have.

> OOo runs OK under valgrinded server - no crash, no corruption, and I have got
> sensible reports from it (at least I hope).

Too bad we don't see the corruption any longer. However, it could help me with debugging if I can get exactly the same code path running with and without corruption.

Is it possible to debug a valgrinded program with gdb? Never tried that...
Comment 98 Radek Doulik 2006-04-27 16:37:43 UTC
> Could you point me to that assert as well? Having a working valgrind for Xorg
> would be nice to have.

syswrap-generic.c:1697

if I comment out that assert I am able to run Xorg under valgrind.

> Too bad we don't see the corruption any longer. However, it could help me with
> debugging if I can get exactly the same code path running with and without corruption.

Well, I can still see it without valgrind.

Not seeing it under valgrind might mean that some conditional jump depending on an uninitialized value now happens to go the right way. There are also some illegal memory reads/writes during initialization, which might be harmful as well. Dunno, I will try to look at least at some of them.

> Is it possible to debug a valgrinded program with gdb? Never tried that...

I think valgrind doesn't work in gdb, but you can instruct valgrind to attach to the process when it finds something.

I am also attaching complete log this time.
Comment 99 Radek Doulik 2006-04-27 16:38:51 UTC
Created attachment 80595 [details]
valgrind log
Comment 100 Radek Doulik 2006-04-28 11:54:50 UTC
So it even crashes under valgrind :( No additional info, it reads invalid memory at the same place where it crashes normally.

I am now looking at this place:

==5521== Conditional jump or move depends on uninitialised value(s)
==5521==    at 0x4B52C8F: fbBltOne (fbbltone.c:352)
==5521==    by 0x4B669CB: fbOddStipple (fbstipple.c:265)
==5521==    by 0x4B66B14: fbStipple (fbstipple.c:313)
==5521==    by 0x4B5F9C8: fbFill (fbfill.c:119)
==5521==    by 0x4B5FE4D: fbPolyFillRect (fbfillrect.c:80)
==5521==    by 0x4B88F4D: ??? (xaaGC.c:521)
==5521==    by 0x8166710: (within /usr/X11R6/bin/Xorg-copy)
==5521==    by 0x80C43CD: ProcPolyFillRectangle (in /usr/X11R6/bin/Xorg-copy)
==5521==    by 0x80C79BF: Dispatch (in /usr/X11R6/bin/Xorg-copy)
==5521==    by 0x80D4684: main (in /usr/X11R6/bin/Xorg-copy)
==5521== 

I guess it might be related to the wrong pixmap content?
Comment 101 Petr Mladek 2006-04-28 12:00:07 UTC
Just for the record: I have submitted the OOo package where the cairo stuff is disabled by default for SL10.1. It would be used if we produce SL10.1-rc4.
Comment 102 Radek Doulik 2006-04-28 12:24:52 UTC
This is the place of crash as seen by valgrind:

==25795== Invalid read of size 4
==25795==    at 0x4C37848: fbFetchPixel_x8r8g8b8 (fbcompose.c:580)
==25795==    by 0x4C41ACE: fbFetchTransformed (fbcompose.c:3159)
==25795==    by 0x4C43231: fbCompositeRect (fbcompose.c:3488)
==25795==    by 0x4C43BA2: fbCompositeGeneral (fbcompose.c:3608)
==25795==    by 0x4C56EA8: fbComposite (fbpict.c:1233)
==25795==    by 0x4CAF08C: XAAComposite (xaaPict.c:529)
==25795==    by 0x81C2B92: damageComposite (damage.c:539)
==25795==    by 0x81DF532: CompositePicture (picture.c:1672)
==25795==    by 0x81E181E: ProcRenderComposite (render.c:755)
==25795==    by 0x81E47B2: ProcRenderDispatch (render.c:1995)
==25795==    by 0x80D870B: Dispatch (dispatch.c:459)
==25795==    by 0x80F19DB: main (main.c:450)
==25795==  Address 0xCA92C40 is not stack'd, malloc'd or (recently) free'd
Comment 103 Radek Doulik 2006-04-28 19:28:38 UTC
Spent whole day with Xserver code, gdb, valgrind, cairo and the test program. Pretty tired ;-)

So far it seems to me that Xserver is unable to access memory with offscreen pixmaps, or at least fb code is. (please forgive me my ignorance, if I miss basic things - I started looking into Xserver code just lately)

I have a trivial fix for now, but I don't know how it is going to influence the server performance.

Index: xaaInit.c
===================================================================
RCS file: /cvs/xorg/xc/programs/Xserver/hw/xfree86/xaa/xaaInit.c,v
retrieving revision 1.8
diff -u -p -r1.8 xaaInit.c
--- xaaInit.c   13 Sep 2005 01:33:19 -0000      1.8
+++ xaaInit.c   28 Apr 2006 19:21:47 -0000
@@ -519,6 +519,8 @@ XAACreatePixmap(ScreenPtr pScreen, int w
         FBAreaPtr area;
         int gran = 0;

+           goto BAILOUT;
+
        switch(pScrn->bitsPerPixel) {
         case 24:
         case 8:  gran = 4;  break;

I will continue on Tuesday (Monday is a national holiday) to see why this memory is inaccessible and whether it points to the right place.
Comment 104 Matthias Hopf 2006-05-02 08:51:27 UTC
I'm at LinuxTag this week (and spent most of last week finishing up my demo), so I won't be able to help.

The error you're experiencing is exactly what I meant in a comment above, that apparently the offsets of the pixmap are broken. The base pointer of the pixmap seems to point to the base of the mapped graphics memory (not exactly sure, but the memory map suggests so), so we have to track the pixmap, how it is created and where it might be modified.

I wanted to do this next, but time wasn't sufficient for that :-/
Comment 105 Radek Doulik 2006-05-02 09:06:52 UTC
It looks like there is a problem with the mapped memory, as I cannot even access devPrivate.ptr[0]
Comment 106 Radek Doulik 2006-05-02 11:18:33 UTC
After all it seems to be a problem in fbFetchTransformed, which is accessing memory outside the pixmap memory (happens only for offscreen pixmaps).

I have a fixed version here. It still shows the pixmap offset by a few pixels, but it doesn't crash and it shows the right content.
Comment 107 Radek Doulik 2006-05-02 14:13:20 UTC
I am attaching 1st version of the patch. It works OK for me here.

I will need to do more testing though to be sure I didn't break anything on the way.
Comment 108 Radek Doulik 2006-05-02 14:14:47 UTC
Created attachment 81265 [details]
1st version of the fix
Comment 109 Radek Doulik 2006-05-02 16:27:39 UTC
Created attachment 81340 [details]
2nd version of the patch with dx,dy tmp variables to avoid double dereferencing all the time (suggested by Michael)
Comment 110 Radek Doulik 2006-05-03 12:44:23 UTC
Created a bug in the xorg bugzilla for upstreaming the fix.

https://bugs.freedesktop.org/show_bug.cgi?id=6827
Comment 111 Matthias Hopf 2006-05-09 14:14:59 UTC
Wow.

Radek, you're getting a golden Xorg debugger medal from me.

The patch looks reasonable at first glance, but I want to dig a bit further (there might be other functions that are broken WRT off-screen surfaces).

Sorry I couldn't help debugging while I was at LinuxTag or preparing the demo. I also didn't think that something as basic as fbFetchTransformed could contain a bug as big as this one.
Comment 112 Matthias Hopf 2006-05-10 12:37:10 UTC
Further technical discussion on Xorg bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=6827

I'll post the final outcome here.
Comment 113 Matthias Hopf 2006-05-11 10:58:56 UTC
Created attachment 82989 [details]
Updated patch

Updated patch (see xorg bugzilla).
Comment 114 Matthias Hopf 2006-05-11 11:01:21 UTC
Stefan, please apply to STABLE (and 7.1 branch, if applicable). Please close bug when done.

Radek, for the issues we're seeing with CVS Xorg I'll open another bug. This one is way too long.
Comment 115 Stefan Dirsch 2006-05-11 14:06:43 UTC
fixed.
Comment 116 Andreas Jaeger 2006-05-19 07:26:06 UTC
I'd like to see xorg-x11 release for SL10.1 as well, please submit a patchinfo: MaintenanceTracker-4359
Comment 122 Stefan Dirsch 2006-05-23 15:59:55 UTC
I opened a different bug report for the updating issue (Bug #178025).
Comment 123 Radek Doulik 2006-05-30 09:23:51 UTC
*** Bug 179675 has been marked as a duplicate of this bug. ***
Comment 124 Radek Doulik 2006-05-30 09:51:45 UTC
Mathias, do you know which SLED release candidate has fix for this bug?
Comment 125 Radek Doulik 2006-05-30 09:52:40 UTC
Oops, I mean Matthias, sorry for the typo.
Comment 126 Stefan Dirsch 2006-05-30 10:01:19 UTC
At least SLED10 RC2 has.