Bugzilla – Full Text Bug Listing
Description
Michael Meeks
2006-02-22 11:05:23 UTC
Created attachment 69731 [details]
corrupt image
Created attachment 69732 [details]
what it should look like
Created attachment 69734 [details]
hwinfo ...
Interestingly, hwinfo claims it's a 16bpp mode, but AFAIR I configured it for 24bit. I'll attach my xorg.conf too.
Created attachment 69735 [details]
x conf.
Created attachment 69742 [details]
presentation - press F9 to run it ...
The problematic presentation.
Most curiously, turning off 'clone' [re-configuring with sax2 -r and just not selecting it] yields an Xserver crash while rendering the 1st slide of that presentation. I'll try to get a trace.
So the crash is probably related to the corruption, since I get both concurrently:
Xserver:
Program received signal SIGSEGV, Segmentation fault.
0xb77af201 in fbFetchPixel_a8r8g8b8 (bits=0xb7c8da00, offset=1408, indexed=0x0) at fbcompose.c:573
573 {
(gdb) bt
#0 0xb77af201 in fbFetchPixel_a8r8g8b8 (bits=0xb7c8da00, offset=1408, indexed=0x0) at fbcompose.c:573
#1 0xb77b53ec in fbFetchTransformed (pict=0x85f50b0, x=<value optimized out>, y=<value optimized out>, width=1399,
buffer=0xbffd84a0) at fbcompose.c:3159
#2 0xb77b30f5 in fbCompositeGeneral (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=314, ySrc=3194, xMask=0,
yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbcompose.c:3488
#3 0xb77c43cd in fbComposite (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=314, ySrc=3194, xMask=0, yMask=0,
xDst=0, yDst=0, width=1399, height=1050) at fbpict.c:1233
#4 0xb777aa91 in XAAComposite (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=0, ySrc=0, xMask=0, yMask=0,
xDst=0, yDst=0, width=1399, height=1050) at xaaPict.c:538
#5 0x081688b6 in damageComposite (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=0, ySrc=0, xMask=0, yMask=0,
xDst=0, yDst=0, width=1399, height=1050) at damage.c:539
#6 0x0817cdfe in ProcRenderComposite (client=0x8552e98) at render.c:755
#7 0x080c7810 in Dispatch () at dispatch.c:459
#8 0x080d44d5 in main (argc=9, argv=0xbffdec04, envp=Cannot access memory at address 0x8
) at main.c:450
Or, with locals (bt full):
(gdb) bt full
#0 0xb77af201 in fbFetchPixel_a8r8g8b8 (bits=0xb7c8da00, offset=1408, indexed=0x0) at fbcompose.c:573
No locals.
#1 0xb77b53ec in fbFetchTransformed (pict=0x85f50b0, x=<value optimized out>, y=<value optimized out>, width=1399,
buffer=0xbffd84a0) at fbcompose.c:3159
y1 = 3748
tl = <value optimized out>
br = <value optimized out>
x1_out = 0
y2_out = <value optimized out>
x1 = 1093
y2 = 3749
distx = 155
idistx = 101
b = (FbBits *) 0xb7c8da00
r = <value optimized out>
x2_out = 0
x2 = <value optimized out>
disty = 178
tr = <value optimized out>
bl = <value optimized out>
x_off = 1407
y1_out = <value optimized out>
stride = 1408
xoff = <value optimized out>
yoff = <value optimized out>
fetch = (fetchPixelProc) 0xb77af200 <fbFetchPixel_a8r8g8b8>
v = {vector = {71670655, 245674569, 65536}}
i = 1243
box = {x1 = 314, y1 = 3194, x2 = 1175, y2 = 3854}
indexed = (miIndexedPtr) 0x0
affine = 1
#2 0xb77b30f5 in fbCompositeGeneral (op=3 '\003', pSrc=0x85f50b0, pMask=0x0, pDst=0x85f5658, xSrc=314, ySrc=3194, xMask=0,
yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbcompose.c:3488
region = {extents = {x1 = 0, y1 = 0, x2 = 1399, y2 = 1050}, data = 0x0}
n = 0
pbox = <value optimized out>
srcRepeat = 0
maskRepeat = 0
w = <value optimized out>
h = dwarf2_read_address: Corrupted DWARF expression.
Trace from client (run with XSync):
(gdb) bt
#0 0xb7002627 in ___newselect_nocancel () from /lib/libc.so.6
#1 0xb728ff93 in _XWaitForReadable (dpy=0x813e660) at XlibInt.c:502
#2 0xb729032f in _XRead (dpy=0x813e660, data=0xbf834228 "`�023\b�a`\003\200\211M�", size=32) at XlibInt.c:1080
#3 0xb7290dd4 in _XReply (dpy=0x813e660, rep=0xbf834228, extra=0, discard=1) at XlibInt.c:1712
#4 0xb728b66a in XSync (dpy=0x813e660, discard=0) at Sync.c:48
#5 0xb728b7e5 in _XSyncFunction (dpy=0x813e660) at Synchro.c:37
#6 0xb444ac02 in XRenderComposite (dpy=0x813e660, op=3, src=56625130, mask=0, dst=56625089, src_x=0, src_y=0, mask_x=0,
mask_y=0, dst_x=0, dst_y=0, width=1399, height=1050) at Composite.c:66
#7 0xb44b28aa in cairo_xlib_surface_set_drawable () from /usr/lib/libcairo.so.2
#8 0xb449f845 in cairo_surface_status () from /usr/lib/libcairo.so.2
#9 0xb4497d24 in cairo_font_options_create () from /usr/lib/libcairo.so.2
#10 0xb4498017 in cairo_font_options_create () from /usr/lib/libcairo.so.2
#11 0xb44983fc in cairo_font_options_create () from /usr/lib/libcairo.so.2
#12 0xb4491412 in cairo_paint () from /usr/lib/libcairo.so.2
#13 0xafa3fd7c in cairocanvas::CanvasHelper::implDrawBitmapSurface () from ./cairocanvas.uno.so
#14 0xafa3fe3f in cairocanvas::CanvasHelper::drawBitmap () from ./cairocanvas.uno.so
#15 0xafa36eae in canvas::CanvasBase<canvas::BaseMutexHelper<cppu::WeakComponentImplHelper3<com::sun::star::rendering::XBitmapCanvas, com::sun::star::rendering::XIntegerBitmap, com::sun::star::lang::XServiceInfo> >, cairocanvas::CanvasHelper, osl::Guard<osl::Mutex>, cppu::OWeakObject>::drawBitmap () from ./cairocanvas.uno.so
#16 0xb179f2fb in cppcanvas::internal::(anonymous namespace)::BitmapAction::render () from ./libcppcanvas680li.so
#17 0xb179ee95 in cppcanvas::internal::CachedPrimitiveBase::render () from ./libcppcanvas680li.so
#18 0xb17a76fe in cppcanvas::internal::(anonymous namespace)::ActionRenderer::operator() () from ./libcppcanvas680li.so
#19 0xb17a7725 in _STL::for_each<cppcanvas::internal::ImplRenderer::MtfAction const*, cppcanvas::internal::(anonymous namespace)::ActionRenderer> () from ./libcppcanvas680li.so
Of which, perhaps, this frame:
#6 0xb444ac02 in XRenderComposite (dpy=0x813e660, op=3, src=56625130, mask=0,
.. **width=1399**, height=1050) at Composite.c:66
looks most interesting: the display is of course 1400x1050, so perhaps this is an under-tested corner case? Radek, why are we passing odd sizes in here?
Re-tested with 16bpp and 'clone' turned on, and it works perfectly [modulo color problems: apparently we lose some red; perhaps that follows from the above, but ...]
The color issue is unrelated, and now fixed in OO.o.
Could you also attach /var/log/Xorg.0.log? Thanks.
Sure. I also took another stack trace of X; this time it's seemingly much less of an obvious corner case:
(gdb) bt
#0 0xb7835211 in fbFetchPixel_x8r8g8b8 (bits=0xb840dc00, offset=486, indexed=0x0) at fbcompose.c:579
#1 0xb783b3ec in fbFetchTransformed (pict=0x84d6950, x=<value optimized out>, y=<value optimized out>,
width=1399, buffer=0xbfb87050) at fbcompose.c:3159
#2 0xb78390f5 in fbCompositeGeneral (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=243,
ySrc=4121, xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbcompose.c:3488
#3 0xb784a3cd in fbComposite (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=243, ySrc=4121,
xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbpict.c:1233
#4 0xb7800a91 in XAAComposite (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=0, ySrc=0,
xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at xaaPict.c:538
#5 0x081688b6 in damageComposite (op=3 '\003', pSrc=0x84d6950, pMask=0x0, pDst=0x84bcb78, xSrc=0, ySrc=0,
xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at damage.c:539
#6 0x0817cdfe in ProcRenderComposite (client=0x85355a0) at render.c:755
#7 0x080c7810 in Dispatch () at dispatch.c:459
#8 0x080d44d5 in main (argc=9, argv=0xbfb8d7b4, envp=Cannot access memory at address 0x8
) at main.c:450
Created attachment 70149 [details]
log
hth.
(In reply to comment #3)
> hwinfo ...
>
> interestingly hwinfo claims it's a 16bpp mode, but AFAIR I configured it for
> 24bit. I'll attach my xorg.conf too.
hwinfo has no idea whether the Xserver runs in 16 or 24bpp mode. Forget about that. I will have to try to reproduce this first. This is a bad issue, especially the server crash.
> (--) RADEON(0): Chipset: "ATI Radeon Mobility M7 LW (AGP)" (ChipID = 0x4c57)
That chip, again. Sigh. Stefan (Behlert), do we have a free laptop with that chipset? Stefan (Dirsch), maybe we should disable RenderAccel for M7 completely... :-(
*** Bug 144659 has been marked as a duplicate of this bug. ***
Ah, I must meet this Criddel; does he run Xgl? ;-) Matthias, are you investigating this?
I can try to investigate this, if I'm able to reproduce it. The crash is helpful for pinpointing the problem; plain corruptions are a pain to debug without a simple test case.
I know. I haven't really started investigating this, so if you *can* reproduce this easily...
A similar bug (also specific to ATI hardware) is https://bugzilla.novell.com/show_bug.cgi?id=159551
Matthias, if you need access to my machine, where this is repeatable every time, I can happily set that up for you; NEEDINFO is not my preferred state for this bug ;-)
Egbert, are you already working on that? Otherwise I'll take a look into this issue this week. We should not duplicate efforts.
Nope. I was waiting for your reply. Something seems to be scribbling over memory. It's not unlikely that it's the same thing that's causing #159551.
Trying to reproduce now... I can reproduce this on an RV200 QW, though the display corruption looks completely different (it seems like an image is to be copied and a broken pointer is used) and I do not get any crashes. Xorg from CVS doesn't even paint the background white. Option "RenderAccel" "off" doesn't help at all. fbdev works fine. I have to test nv; maybe this is a general XAA-related bug.
Same issue on nv.
This seems to be a bug in the base XAA Composite implementation. We could try "EXA" to verify this assumption. ;-) An XGI card, which uses the sis driver, is available.
With current CVS I even get corruption with the nv driver with the NoAccel option, and the fbdev driver crashes even during startup. I will recompile xorg 7.0 for future tests. Testing the XGI card now.
Can you check whether a resource with the same ID was added before? What are its type and pointer? It should be easily doable by adding some logging to resource.c:AddResource().
Current state:
- When configured for 16bit, all drivers work, even though most operations are apparently done with 24bit visuals in offscreen buffers in OOo.
- When configured for 24bit, all drivers break except the binary-only nvidia driver.
- When configured for 24bit, even rendertest fails for blend/over.
I have some patches done, but they don't fix the OOo issue yet.
Created attachment 75710 [details]
Patch for being able to build xorg 7.0 with --enable-debug
Simple patch for being able to build the 7.0 branch (dunno about head right now) with --enable-debug.
Created attachment 75712 [details]
Patch for fixing missing initialization of pPicture->format
This fixes a missing initialization of pPicture->format, based on which the wrong Composite function was chosen. I'm not sure about the initialization value here, as pFormat is NULL and stays NULL for SolidPicture (and I guess for GradientPicture as well).
With this patch, rendertest runs through all tests without failures. OOo still shows the broken images, though.
This issue is *not* MMX-dependent: same results with MMX disabled. This is the difference in the Composite operation calling sequence between 24 and 16 bit when displaying the first slide:
--- /tmp/Xorg.composite.24 2006-03-30 14:26:24.000000000 +0200
+++ /tmp/Xorg.composite.16 2006-03-30 14:27:36.000000000 +0200
@@ -1,34 +1,40 @@
X Window System Version 7.0.0
Release Date: 21 December 2005
X Protocol Version 11, Revision 0, Release 7.0
Build Operating System:Linux 2.6.13-15.8-smp i686
Current Operating System: Linux gkar 2.6.16-rc6-git1-4-default #1 Tue Mar 14 18:04:33 UTC 2006 i686
Build Date: 29 March 2006
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
-(==) Log file: "/var/log/Xorg.0.log", Time: Thu Mar 30 14:25:31 2006
+(==) Log file: "/var/log/Xorg.0.log", Time: Thu Mar 30 14:26:50 2006
(==) Using config file: "/etc/X11/xorg.conf"
Could not init font path element /usr/X11R6/lib/X11/fonts/local, removing from list!
Could not init font path element /usr/X11R6/lib/X11/fonts/CID, removing from list!
Comp 8 Op 3 src 20028888 mask 00000000 dst 20028888 9x16 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0 (10/742) -> 10/742
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1022x767 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0 (10/742) -> 10/742
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
Comp 8 Op 3 src 20028888 mask 00000000 dst 20028888 9x16 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0 (10/742) -> 10/742
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 9x16 10/742 (0/0) -> 10/742
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0 (0/0) -> 0/0
+Comp 8 Op 1 src 20020888 mask 00000000 dst 10020565 1024x768 0/0 (0/0) -> 0/0
After these calls rendering has completed, so the remainder are for prerendering the next slide. It's clearly visible that OpenOffice does something different for 16bit: it composes the image in 24bit first, and after that it is copied/converted to 16bit into the frame buffer. For 24bit visuals, it apparently does everything in the framebuffer.
Stefan, could you please provide an Xorg package with the patch from attachment #75712 [details]? It should at least fix the segfault.
Petr, is it possible to tell OpenOffice to render into offscreen 24bit visuals first even if the base visual is a 24bit one, so that the same render path is used? That way we can find out whether OpenOffice is doing something weird, or whether this is due to another bug in the render layer.
Radek, you know this better than I do. Could you please answer the latest question?
*** Bug 159551 has been marked as a duplicate of this bug. ***
How do I find out whether a visual is offscreen?
We don't use visuals directly. I create a pixmap for the cairo surface and pass it to cairo together with a render format. It should be possible to ask for the right render format if I know the visual, though.
Here is the code that creates the pixmaps in OOo:
Surface* Surface::getSimilar( Content aContent, int width, int height )
{
    if( mpSysData && mpDisplay && mhDrawable ) {
        Display* pDisplay = (Display*) mpDisplay;
        int nFormat;

        // Map the requested cairo content to a standard XRender format.
        switch( aContent ) {
            case CAIRO_CONTENT_ALPHA:
                nFormat = PictStandardA8;
                break;
            case CAIRO_CONTENT_COLOR:
                nFormat = PictStandardRGB24;
                break;
            case CAIRO_CONTENT_COLOR_ALPHA:
            default:
                nFormat = PictStandardARGB32;
                break;
        }

        XRenderPictFormat* pFormat = XRenderFindStandardFormat( pDisplay, nFormat );
        if( pFormat ) {
            // XCreatePixmap fails with BadValue for zero sizes, so clamp to 1x1
            // and give cairo the same size as the pixmap.
            const int nWidth  = width  > 0 ? width  : 1;
            const int nHeight = height > 0 ? height : 1;

            // OOo creates the pixmap itself (it needs access to it for text
            // rendering) and wraps it in a cairo xlib surface of the same format.
            Pixmap hPixmap = XCreatePixmap( pDisplay, cairoHelperGetWindow( mpSysData ),
                                            nWidth, nHeight, pFormat->depth );
            return new Surface( mpSysData, mpDisplay, (long) hPixmap, pFormat,
                                cairo_xlib_surface_create_with_xrender_format( pDisplay, hPixmap,
                                    DefaultScreenOfDisplay( pDisplay ),
                                    pFormat, nWidth, nHeight ) );
        }
    }

    // No X drawable, or no matching XRender format: let cairo create the
    // similar surface (and any pixmap) itself.
    return new Surface( mpSysData, mpDisplay, 0, NULL,
                        cairo_surface_create_similar( mpSurface, aContent, width, height ) );
}
> Stefan, could you please provide an Xorg package with the patch from
> attachment #75712 [details]? It should at least fix the segfault.
done.
Thanks Stefan, Petr, Radek. Those who had the Xserver segfaulting, please test the new packages from Stefan as soon as they show up in stable. They won't fix the rendering issues, though.
Matthias, thanks for the great progress here. Radek, can you adjust this in VCL/cairocanvas and send Matthias just those 2 libs for testing?
Well, I don't know how to tell whether a visual is offscreen or not. Matthias, don't you mean an offscreen buffer? In that case there should not be any difference, as the cairo canvas always renders into a pixmap before it copies/composites that pixmap to the window.
*** Bug 164447 has been marked as a duplicate of this bug. ***
Radek, Matthias, any updates on this one?
I am waiting for Matthias to answer my question to see if I can prepare a modified canvas for him.
Thanks Radek for the input, this is definitely helping. I have to get through my emails first (guess what stacks up in one week :-( ), and then scan the corresponding glitz code, in order to see what is really happening here. I'll publish what I find out, and ask you again as soon as I know what I need tested.
Additionally:
(In reply to comment #44)
> Those who had the Xserver segfaulting, please test the new packages from Stefan
> as soon as they show up in stable. They won't fix the rendering issues, though.
Any news on that?
I have tried the xorg-x11-server-6.9.0-34 package and it doesn't crash anymore. Thanks for that fix. I see the rendering issues too. We don't use the glitz cairo backend IIRC, just cairo/Xlib (RENDER). The difference from other cairo-enabled apps is that we let cairo create a similar surface on a pixmap created in OOo, as we need access to the pixmap for text rendering. Normally apps just ask cairo for a similar surface and cairo creates the pixmap internally.
Could we lower the severity here, since the crash has been fixed?
Preferably not, no.
It's not acceptable to render people's slides like this: https://bugzilla.novell.com/show_bug.cgi?id=152730 This is not a Glitz issue but a core XRender issue: it doesn't work. Having done a ton of work for SLED10 to use XRender for nice anti-aliased slide-show rendering, it would be a tragedy to have to disable that for everyone because of a specific driver's buggy X server implementation.
Are you sure that it works with any driver on 10.1 in 24bpp? IMHO it's broken with any accelerated driver on 10.1 in 24bpp. See comment #33 by Matthias. I'm afraid XRender has been somewhat broken in 24bpp for a long time, but nobody noticed it so far. :-(
Actually, it seems to work with the intel driver. Getting closer: the intel driver uses a 24bit RGB visual, while the other drivers all seem to use 32bit XRGB visuals. wdiff of composite operators (-radeon +intel):
Comp 8 Op 3 src 20028888 mask 00000000 dst 20028888 9x16 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst [-20020888-] {+18020888+} 9x16 0/0 (10/742) -> 10/742
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1022x767 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 20028888 dst [-20020888-] {+18020888+} 9x16 0/0 (10/742) -> 10/742
Comp 8 Op 3 src [-20020888-] {+18020888+} mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
Comp 8 Op 3 src 20028888 mask 00000000 dst [-20020888-] {+18020888+} 1023x768 0/0 (0/0) -> 0/0
The same is true for vesa, so this is the reason why the vesa driver works as well.
This is not a critical bug any more, as no crashes are involved, and it can be worked around (though we don't want that). Changing severity will not influence my priority in fixing it.
*** Bug 167235 has been marked as a duplicate of this bug. ***
The differences in the chosen render paths between any non-working and working configuration I've tried so far are far too large to track down this issue inside OpenOffice; I've pretty much given up on that. I guess I have to bite the bullet and create a simpler test case that shows the same behavior.
I finally noticed *one* major difference: none of the working versions seem to use hardware-based offscreen surfaces, i.e. all pDrawable->x/y are 0, while the broken version draws to surface positions starting at 0/3595, 0/4363, 0/6667, or similar. These odd start positions look really strange to me. To debug this, we really need a simpler test case.
Matthias: is there anything I can do for you on the OOo side? What are these pDrawable->x/y? Do they define an offset in the offscreen memory on the card?
I have just been told this bug has not been open to the public. It should be; I'm changing the product for that, as it is relevant for all SL 10.1-based products.
pDrawable->x/y define the starting point of a Drawable inside a Screen. In this case, a screen can be imagined as the frame buffer memory of a graphics card. I have to read more code to know which render calls are invoked by glitz for the OOo case (rendertest works fine for all cases it tests, and that includes some offscreen hardware surfaces, but as far as I can see no rendering *to* hardware surfaces). Radek, could you point me to the drawing routines in OOo for the Render case?
We don't do many Render calls in OOo (there are some, but the problematic rendering comes from the cairo canvas, which uses cairo). In the cairo canvas, the only XRender call I use is XRenderFindStandardFormat, before I call cairo_xlib_surface_create_with_xrender_format.
Thinking more about it, I will try to disable the call to cairo_xlib_surface_create_with_xrender_format and use cairo_surface_create_similar to see if it makes a difference (using cairo_xlib_surface_create_with_xrender_format is different from most of the other apps using cairo, I guess). Do you know for sure whether the problem lies in the X server, or might it be a problem in the client application as well (i.e. in the cairo lib)? The cairo canvas code is in ooo-build/build/src-m*/canvas/source/cairo if you want to take a look.
*** Bug 162166 has been marked as a duplicate of this bug. ***
xorg-x11-server-6.9.0-34 crashed for me today. Unfortunately I am unable to reproduce it :-(
So using cairo_surface_create_similar doesn't help; it still renders images wrong. I will try to write a simple test using only the cairo lib and no OOo code. One thing that came to mind is that OOo probably also creates windows with a different visual than the default one.
(In reply to comment #63)
> We don't do much Render calls in OOo (there are some, but the problematic
> rendering comes from cairo canvas which uses the cairo) In cairo canvas I use
I'm sure that this is the case. It's just that apparently only the specific combination used in OOo triggers the bug.
> Do you know if the problem lies for sure in the x server or might it be some
> problem in the client application as well (ie. in the cairo lib)?
It *could* be a bug in cairo, but my personal guess (from the type of output we're getting) is that this is very much an Xserver problem.
> The cairo canvas code is in ooo-build/build/src-m*/canvas/source/cairo if you
> want to take a look.
Thanks.
(In reply to comment #66)
> I will try to write simple test using only cairo lib and no OOo code. One thing
> which came to my mind is that OOo is also probably creating windows with
> different visual than the default one.
Wow. That would be *extremely* helpful.
BTW, the current (CVS) Xorg server behaves even worse.
Even on 16bit visuals, render is partially broken (the background color is black), and the vesa driver doesn't work at all, so this issue hasn't been fixed upstream in the last weeks.
I tried to reproduce it with a simple test program, but everything worked as it should. So I went back to the cairo canvas. It looks like the cairo canvas already gets wrong data from the underlying vcl layer (part of OOo), so from cairo's point of view the data it draws are correct. I will have to dive into the vcl code to see if I can spot where the difference is; it is weird, as on nvidia I still get the correct data from vcl.
When playing with it on an ati card (radeon driver) it started working at one point, but after a few slideshows the server crashed again. After a restart it didn't work again. I suppose it might have happened when the X server ran out of offscreen memory, so it used the working path you described. I have had about 3 or 4 crashes this afternoon :(
I tracked the broken data (image content) in the vcl code to the XGetImage call in vcl salbmp.cxx:206. So we get the broken data from the server. I tried to use the pixmap (the one used for the XGetImage call) directly, but now the server crashes for me reliably. At least we have a reproducible way to crash the server again. Here is the backtrace:
Program received signal SIGSEGV, Segmentation fault.
0xb77df211 in fbListInstalledColormaps () from /usr/X11R6/lib/modules/libfb.so
(gdb) bt
#0 0xb77df211 in fbListInstalledColormaps () from /usr/X11R6/lib/modules/libfb.so
#1 0xb77e4bf5 in fbCompositeGeneral () from /usr/X11R6/lib/modules/libfb.so
#2 0xb77f44bd in fbComposite () from /usr/X11R6/lib/modules/libfb.so
#3 0xb77aaa91 in XAAComposite () from /usr/X11R6/lib/modules/libxaa.so
#4 0x08168aa6 in DamageDamageRegion ()
#5 0x0817cfee in PanoramiXRenderReset ()
#6 0x080c79c0 in Dispatch ()
#7 0x080d4685 in main ()
(gdb)
I am attaching a modified cairocanvas.uno.so and the broken.odp document. Matthias, please let me know if you are able to reproduce the crash.
(Replace your cairocanvas.uno.so with the attached one and try to run the slideshow of broken.odp.) It crashes on my machine every time I run it. I hope it will help. Let me know if you need more info from me.
Created attachment 79183 [details]
modified cairo canvas
Created attachment 79184 [details]
simple presentation with broken slide
Matthias: any news on this bug? Were you able to reproduce the crash? I have tried a newer server in the meantime (xorg-x11-server-6.9.0-39) and it still crashes for me. Increasing the severity, as I am able to reproduce the crash again.
I cannot reproduce the crash, neither with xorg-x11-server-6.9.0-39 nor with a plain Xorg 7.0 with the patches described here. I have copied the modified cairocanvas.uno.so to the OpenOffice program directory. The simpler presentation and the bug description help, though. I'll try an OpenOffice update now.
Tried the update, no change. Also tried xorg-x11-server-6.9.0-44. I would really love to be able to reproduce the crash. The only difference in the configuration is that I nuked the synaptics mouse (this is not a laptop). Are there any differences between your machine and a standard installation?
Radek, as bad as it sounds, we should stop using RENDER as default until this is fixed. This bug is not a blocker, as there is a workaround.
The much smaller sample reduced the number of render actions massively:
=Comp 8 Op 3 src 20020888 mask 00000000 dst 20028888 9x16 0/0+0/0 (0/0) -> 0/0+0/0
fbCompose op 3 pMask 0 src 0:0 mask 0:0 dst 0:0 size 9x16 format 20020888 0 20028888 type 1 0 1 depth 24 0 32 drawable type 1 0 1 depth 24 0 32
=Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0+0/0 (10/742) -> 10/742+0/3409
fbCompose op 3 pMask 1 src 0:0 mask 10:742 dst 10:4151 size 9x16 format 20028888 20028888 20020888 type 1 1 1 depth 32 32 24 drawable type 1 1 1 depth 32 32 24
=Comp 8 Op 3 src 20020888 mask 00000000 dst 20020888 1023x768 0/0+614/1872 (0/0) -> 0/0+0/4177
fbCompose op 3 pMask 0 src 614:1872 mask 0:0 dst 0:4177 size 1023x768 format 20020888 0 20020888 type 1 0 1 depth 24 0 24 drawable type 1 0 1 depth 24 0 24
=Comp 8 Op 3 src 20028888 mask 20028888 dst 20020888 9x16 0/0+0/0 (10/742) -> 10/742+0/3409
I will now single-step each one.
I have beta6 or beta8 with updated xorg.
I will try to upgrade my whole system to the latest release to see if the crash remains. My system is a laptop with an x300 ati card, radeon driver. From the render ops, I think the interesting one is the one with size 1023x768. The other, 9x16, is the hourglass icon, which is always rendered and thus doesn't crash on other slides. I will also try to trace the data in vcl to see if it creates the pixmap (which is read as an image later) correctly.
OK, the first thing I do not completely understand: for the very first compose operation, pSrc is connected to a drawable already sitting in the framebuffer. How does it get there? Do you have the source code of the test program (which worked according to comment #68) available?
> For the very first compose operator pSrc is connected to a drawable already
> sitting in the framebuffer. How does it get there?
Not sure; I do not work directly with RENDER, but I use cairo, which uses RENDER. I think pSrc is a cairo surface (pixmap), which is produced by cairo from an image surface.
> Do you have the source code of the test program (which worked according to
> comment #68) available?
I am attaching it. I compile it with:
gcc -Wall my-test.c -o my-test `pkg-config --cflags cairo` `pkg-config --libs cairo` -O0 -g
It also needs water.png in the current working directory.
Created attachment 80327 [details]
simple test program trying to mimic what OOo does
Created attachment 80328 [details]
image for test program
(In reply to comment #75)
> Radek, is bad as it sounds, we should stop using RENDER as default until
> this is fixed. This bug is not a blocker, as there is a workaround.
Since it's unlikely that Matthias can still find a fix in time, could we at least change the default for SUSE 10.1? Maybe we'll still find a fix for SLED10 ...
> Since it's unlikely that Matthias can still find a fix in time. Could we at
> least change the default for SUSE 10.1? Maybe we'll still find a fix for SLED10
Let's ask Michael about it.
I have updated my whole system to beta10 and it still crashes reliably for me. Updating to xorg-x11..-44 doesn't help either.
The crash still looks driver-related: it crashes my x300/ati/radeon Xserver, while when running remotely on an nvidia 7800/nv Xserver it doesn't crash and only shows the pixmap with broken data.
I also have backtraces with symbols (interestingly, it now crashes in a different place):
(gdb) bt
#0 0xb7835211 in fbFetchPixel_x8r8g8b8 (bits=0xb85ffc00, offset=0, indexed=0x0) at fbcompose.c:579
#1 0xb783d00b in fbFetchTransformed (pict=0x84eff58, x=<value optimized out>, y=<value optimized out>, width=1399, buffer=0xbf9e1700) at fbcompose.c:3163
#2 0xb783abf5 in fbCompositeGeneral (op=3 '\003', pSrc=0x84eff58, pMask=0x0, pDst=0x847c380, xSrc=0, ySrc=7281, xMask=0, yMask=0, xDst=0, yDst=3711, width=1399, height=1050) at fbcompose.c:3494
#3 0xb784a4bd in fbComposite (op=3 '\003', pSrc=0x84eff58, pMask=0x0, pDst=0x847c380, xSrc=0, ySrc=7281, xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at fbpict.c:1233
#4 0xb7800a91 in XAAComposite (op=3 '\003', pSrc=0x84eff58, pMask=0x0, pDst=0x847c380, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1399, height=1050) at xaaPict.c:538
#5 0x08168aa6 in DamageDamageRegion ()
#6 0x0817cfee in PanoramiXRenderReset ()
#7 0x080c79c0 in Dispatch ()
#8 0x080d4685 in main ()
(gdb)
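Frame #0 carries enough to reconstruct the address the fetcher touched. The sketch below assumes the 32bpp fetchers index `bits` as an array of 32-bit pixels (`uint32_t` stands in for `CARD32`) and that pages are 4 KiB, as on x86; `fetch32_address` and `page_of` are hypothetical helpers that only do arithmetic on addresses copied from the backtraces and never dereference them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the address a 32bpp fetcher would read, assuming it
 * indexes 'bits' as an array of 32-bit pixels. Pure arithmetic on addresses
 * copied from a backtrace; nothing is dereferenced. */
static uint64_t fetch32_address(uint64_t bits, int64_t offset)
{
    return bits + (uint64_t)(offset * (int64_t)sizeof(uint32_t));
}

/* Start of the 4 KiB page containing 'addr' (x86 page size assumed). */
static uint64_t page_of(uint64_t addr)
{
    return addr & ~(uint64_t)0xfff;
}
```

For this trace, `fetch32_address(0xb85ffc00, 0)` is just `bits` itself. If `bits` is already the start of the scanline being read (an assumption about fbFetchTransformed), then the scanline pointer was out of range before any column offset was added. For the earlier trace (`bits=0xb7c8da00, offset=1408`) the result is 0xb7c8f000, exactly on a page boundary; that is consistent with, though not proof of, reading just past the end of a mapping.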
I have also tried to run my Xorg binary with valgrind, but it gives up pretty soon. Any hint on how to valgrind the X server?
> Stefan Dirsch 2006-04-26 15:45 MST
> Since it's unlikely that Matthias can still find a fix in time. Could we at
> least change the default for SUSE 10.1? Maybe we'll still find a fix for
> SLED10
You ask this of the man whose months of work halving OO.o startup time were wasted because we have no glibc maintainer? And now you want us to discard yet more man-months of work because (it now turns out) XRender is broken.
That would be a sad outcome. We disabled this feature in the last SUSE release because of the pitiful performance of the XRender implementation; it'd be a shame to disable it again now because of bugs in the implementation. I'd rather have Radek work on rescuing this big investment we made.
Of course, for SUSE RC<whatever>, if we can turn around a package with this disabled, I guess we can do that - but I *really* want to see this fixed for SLED.
Radek - since XRender seems to work fairly well (in general) for other apps, I bet OO.o is doing some real corner-case broken thing here; other apps like screenshot apps also hate OO.o's X windows (or have done in the past). Can you do some xmon debugging to see if we can capture the state of the drawables, instrument VCL to the hilt to work out what it's doing, binary-chop code out of VCL until it doesn't crash, etc., until we get more data?
(In reply to comment #85)
> I have also tried to run my Xorg binary with valgrind, but it gives up pretty
> soon.
>
> Any hint how to valdrind the x server?
Just one word: Don't!
Sorry to say that, but the Xserver has memory access bugs all over the place. Having them fixed with valgrind would be a great thing, but I guess we're much too underpowered to do so.
Andreas, do we have time to disable this feature in OOo for SL 10.1?
(In reply to comment #86)
> > Stefan Dirsch 2006-04-26 15:45 MST
> > Since it's unlikely that Matthias can still find a fix in time. Could we at
> > least change the default for SUSE 10.1? Maybe we'll still find a fix for
> > SLED10
>
> You ask this of the man whose months of work halving OO.o startup time were
> wasted because we have no glibc maintainer ? And now - you want us to discard
> yet many more man-months of work because (it now turns out) XRender is broken.
So what? Read it again: our guess is that we simply *won't* be able to fix this for SL10.1, so the default should be to *not* enable XRender in OOo; otherwise we would ship broken slide shows on 60-80% of all notebooks (assuming NVidia has a larger share on desktop machines). If we *do* find a fix tomorrow, we can still re-enable RENDER as the default after that. Additionally, I'll be at LinuxTag next week, so I guess I won't be of any help then.
> That would be a sad outcome. We disabled this feature in the last SUSE release
> because of pitiful performance of the XRender impl. it'd be a shame to disable
> it again now because of bugs in the impl. I'd rather have Radek working to try
> to rescue this big investment we made.
The investment isn't lost; it just won't surface in this release. It's not the first time something hasn't hit the shelf in time due to bugs in the base system, bugs I wasn't aware of until this one was assigned to me.
> Of course, for SUSE RC<whatever> I guess if we can turnaround a package with
> this disabled we can do that - but I *Really* want to see this fixed for SLED.
Nobody said we should close this as WONTFIX. *That* would be a bad idea.
> Radek - since XRender seems to work fairly well (in general) for other apps - I
> bet OO.o is doing some real corner case broken thing for this - other apps like
I don't think it's a corner case; I think OOo simply uses more features than other applications. I now believe we're seeing some sort of memory corruption (the values of the drawable look really strange), which might be why this doesn't surface in the test case.
Created attachment 80454 [details]
hwinfo log from hope.suse.cz
I have reproduced the crash on my workstation.
I have an ASUSTeK GeForce 6200 TurboCache(TM) and use the default nv_drv.so from Xorg.
I am running SL10.1rc2-x86_64.
Not sure whether we have time - let's disable it for now and I'll check.
(In reply to comment #84)
> The crash still looks related to the driver, it crashes my x300/ati/radeon
> xserver, while running remotely on nvidia 7800/nv xserver it doesn't crash and
> shows only pixmap with broken data.
This seems to be *extremely* dependent on the hardware, maybe on the available video memory and its layout. I'm debugging on an ATI card, and it still doesn't crash for me.
> (gdb) bt
> #0 0xb7835211 in fbFetchPixel_x8r8g8b8 (bits=0xb85ffc00, offset=0,
> indexed=0x0) at fbcompose.c:579
> #1 0xb783d00b in fbFetchTransformed (pict=0x84eff58, x=<value optimized
> out>,
> y=<value optimized out>, width=1399, buffer=0xbf9e1700) at
> fbcompose.c:3163
> #2 0xb783abf5 in fbCompositeGeneral (op=3 '\003', pSrc=0x84eff58,
> pMask=0x0,
> pDst=0x847c380, xSrc=0, ySrc=7281, xMask=0, yMask=0, xDst=0, yDst=3711,
> width=1399, height=1050) at fbcompose.c:3494
OK, this strengthens my assumption that the source drawable is already malformed.
> (In reply to comment #85)
> > I have also tried to run my Xorg binary with valgrind, but it gives up pretty
> > soon.
> >
> > Any hint how to valdrind the x server?
>
> Just one word: Don't!
>
> Sorry to say that, but the Xserver has memory access bugs all over the place.
> Having them fixed with valgrind would be a great thing, but I guess we're much
> too underpowered to do so.
Well, I tried it after all (I had to comment out one of the asserts in the valgrind code so that the server starts; thanks to Dirk for encouraging me to play with the valgrind source code).
OOo runs OK under the valgrinded server (no crash, no corruption), and I got sensible reports from it (at least I hope so).
I am attaching them; could you please take a look?
Created attachment 80492 [details]
part of log from valgrind - after pressing F5 - run slideshow - until the slide is correctly rendered
Radek - the log is IMHO not that useful; we want to bin all items except writes to freed memory etc. Of the archetypes,
Syscall param writev(vector[...]) points to uninitialised byte(s)
is almost always harmless and a false positive, and
Conditional jump or move depends on uninitialised value(s)
is not really that likely to be the problem.
Well, in that case I think the writev might be? OTOH, the pixmap is probably created even before running the slideshow, so I will look at the rest as well.
(In reply to comment #96)
> Well, in that case I think the writev might be?
> OTOH, the pixmap is probably created even before running the slideshow, so I
> will look at the rest as well.
Yes, that would be great!
(In reply to comment #93)
> Well, I tried it after all (have to comment one of asserts in valgrind code so
> that server starts - thanks to dirk for encouraging me to play with valgrind
> source code).
Could you point me to that assert as well? Having a working valgrind for Xorg would be nice.
> OOo runs OK under valgrinded server - no crash, no corruption, and I have got
> sensible reports from it (at least I hope).
Too bad we don't see the corruption any longer. However, it could help debugging if I can get exactly the same code path running with and without corruption. Is it possible to debug a valgrinded program with gdb? Never tried that...
> Could you point me to that assert as well? Having a working valgrind for Xorg
> would be nice to have.
It's syswrap-generic.c:1697; if I comment out that assert, I am able to run Xorg under valgrind.
> Too bad we don't see the corruption any longer. However, it could me debugging
> if I can get the exactly same code path running with and without corruption.
Well, I can still see it without valgrind. Not seeing it under valgrind might mean that some conditional jump on an uninitialised value now happens to go the right way. There are also some illegal memory reads/writes during initialization, which might be vicious as well.
Dunno; I will try to look at at least some of them.
> Is it possible to debug a valgrinded program with gdb? Never tried that...
I don't think valgrind works inside gdb, but you can instruct valgrind to attach a debugger to the process when it finds something. I am also attaching the complete log this time.
Created attachment 80595 [details]
valgrind log
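Michael's rule of thumb for pruning the memcheck log can be written down as a simple filter over report header lines. A minimal sketch: `worth_investigating` is a hypothetical helper, and the list of kept headers is an assumption based on the advice in this thread, not an official valgrind classification. Memcheck reports accesses to freed memory as invalid reads or writes, so those are kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical triage helper for memcheck report headers: keep invalid
 * reads/writes/frees (which include accesses to freed memory) and drop the
 * usual noise such as "Syscall param writev(...) points to uninitialised
 * byte(s)" and "Conditional jump or move depends on uninitialised value(s)". */
static bool worth_investigating(const char *header)
{
    static const char *const keep[] = {
        "Invalid read of size",
        "Invalid write of size",
        "Invalid free",
    };

    assert(header != NULL);
    for (size_t i = 0; i < sizeof keep / sizeof keep[0]; i++) {
        if (strstr(header, keep[i]) != NULL)
            return true;
    }
    return false;
}
```

As Radek points out, a report in one of the dropped classes can still matter (a conditional jump on uninitialised data may decide whether the corruption shows up), so this is a first pass for shortening the log, not a verdict.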
So it even crashes under valgrind :( No additional info; it reads invalid memory at the same place where it normally crashes. I am now looking at this place:
==5521== Conditional jump or move depends on uninitialised value(s)
==5521==    at 0x4B52C8F: fbBltOne (fbbltone.c:352)
==5521==    by 0x4B669CB: fbOddStipple (fbstipple.c:265)
==5521==    by 0x4B66B14: fbStipple (fbstipple.c:313)
==5521==    by 0x4B5F9C8: fbFill (fbfill.c:119)
==5521==    by 0x4B5FE4D: fbPolyFillRect (fbfillrect.c:80)
==5521==    by 0x4B88F4D: ??? (xaaGC.c:521)
==5521==    by 0x8166710: (within /usr/X11R6/bin/Xorg-copy)
==5521==    by 0x80C43CD: ProcPolyFillRectangle (in /usr/X11R6/bin/Xorg-copy)
==5521==    by 0x80C79BF: Dispatch (in /usr/X11R6/bin/Xorg-copy)
==5521==    by 0x80D4684: main (in /usr/X11R6/bin/Xorg-copy)
==5521==
I guess it might be related to the wrong pixmap content?
Just for the record: I have submitted the OOo package where the cairo stuff is disabled by default for SL10.1. It would be used if we produce SL10.1-rc4.
This is the place of the crash as seen by valgrind:
==25795== Invalid read of size 4
==25795==    at 0x4C37848: fbFetchPixel_x8r8g8b8 (fbcompose.c:580)
==25795==    by 0x4C41ACE: fbFetchTransformed (fbcompose.c:3159)
==25795==    by 0x4C43231: fbCompositeRect (fbcompose.c:3488)
==25795==    by 0x4C43BA2: fbCompositeGeneral (fbcompose.c:3608)
==25795==    by 0x4C56EA8: fbComposite (fbpict.c:1233)
==25795==    by 0x4CAF08C: XAAComposite (xaaPict.c:529)
==25795==    by 0x81C2B92: damageComposite (damage.c:539)
==25795==    by 0x81DF532: CompositePicture (picture.c:1672)
==25795==    by 0x81E181E: ProcRenderComposite (render.c:755)
==25795==    by 0x81E47B2: ProcRenderDispatch (render.c:1995)
==25795==    by 0x80D870B: Dispatch (dispatch.c:459)
==25795==    by 0x80F19DB: main (main.c:450)
==25795==  Address 0xCA92C40 is not stack'd, malloc'd or (recently) free'd
Spent the whole day with the Xserver code, gdb, valgrind, cairo and the test program. Pretty tired ;-)
So far it seems to me that the Xserver, or at least the fb code, is unable to access the memory of offscreen pixmaps. (Please forgive my ignorance if I miss basic things; I only recently started looking into the Xserver code.)
I have a trivial fix for now, but I don't know how it is going to influence server performance.
Index: xaaInit.c
===================================================================
RCS file: /cvs/xorg/xc/programs/Xserver/hw/xfree86/xaa/xaaInit.c,v
retrieving revision 1.8
diff -u -p -r1.8 xaaInit.c
--- xaaInit.c 13 Sep 2005 01:33:19 -0000 1.8
+++ xaaInit.c 28 Apr 2006 19:21:47 -0000
@@ -519,6 +519,8 @@ XAACreatePixmap(ScreenPtr pScreen, int w
FBAreaPtr area;
int gran = 0;
+ goto BAILOUT;
+
switch(pScrn->bitsPerPixel) {
case 24:
case 8: gran = 4; break;
I will continue on Tuesday (Monday is a national holiday) to see why this memory is inaccessible and whether it points to the right place.
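The workaround above makes `XAACreatePixmap` jump straight to `BAILOUT`. Assuming that label hands the request to the unaccelerated, system-memory pixmap allocator (which is how it would sidestep the broken offscreen case), its effect can be modelled as below. `PixLocation`, `choose_location` and the byte counts are hypothetical; this is not XAA code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PIX_OFFSCREEN, PIX_SYSTEM } PixLocation;

/* Hypothetical model: an allocator that prefers offscreen video memory and
 * falls back to system memory. 'force_system' stands in for the
 * unconditional 'goto BAILOUT' in the workaround. */
static PixLocation choose_location(long bytes_needed, long offscreen_free,
                                   bool force_system)
{
    assert(bytes_needed >= 0 && offscreen_free >= 0);
    if (force_system || bytes_needed > offscreen_free)
        return PIX_SYSTEM;
    return PIX_OFFSCREEN;
}
```

With `force_system` set, no pixmap ever lands in video memory, so operations XAA could have accelerated between offscreen pixmaps fall back to software; that is presumably why the performance impact of the trivial fix was an open question.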
I'm at LinuxTag this week (and spent most of last week finishing up my demo), so I won't be able to help. The error you're experiencing is exactly what I meant in a comment above: apparently the offsets of the pixmap are broken. The base pointer of the pixmap seems to point to the base of the mapped graphics memory (I'm not exactly sure, but the memory map suggests so), so we have to track the pixmap: how it is created and where it might be modified. I wanted to do this next, but there wasn't enough time :-/
It looks like there is a problem with the mapped memory, as I cannot even access devPrivate.ptr[0].
After all, it seems to be a problem in fbFetchTransformed, which accesses memory outside the pixmap memory (this happens only for offscreen pixmaps). I have a fixed version here, but it still shows the pixmap offset by a few pixels. But it doesn't crash and it shows the right content.
I am attaching the 1st version of the patch. It works OK for me here. I will need to do more testing, though, to be sure I didn't break anything along the way.
Created attachment 81265 [details]
1st version of the fix
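The diagnosis above (the transformed fetch reads outside the pixmap, only for offscreen pixmaps) can be illustrated with a small model. This is a sketch under stated assumptions, not the fbcompose.c code: it assumes an offscreen pixmap's base pointer is the start of the mapped framebuffer with the pixmap placed at a (dx, dy) offset, as the comments above suggest, and it uses an integer scale as a stand-in for the picture transform. `OffscreenPixmap`, `xform`, `fetch_transformed` and `buggy_fb_row` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an offscreen pixmap: 'bits' is the base of the
 * mapped framebuffer, and the pixmap occupies a width x height rectangle
 * at (dx, dy) inside it. */
typedef struct {
    const uint32_t *bits;
    int stride;      /* framebuffer stride, in pixels */
    int fb_height;   /* number of mapped framebuffer rows */
    int dx, dy;      /* pixmap origin inside the framebuffer */
    int width, height;
} OffscreenPixmap;

/* Stand-in for the picture transform: scale a coordinate by 'scale'. */
static int xform(int c, int scale) { return c * scale; }

/* Correct order: transform in pixmap-local space, return transparent for
 * samples outside the pixmap (no repeat), and only then add (dx, dy). */
static uint32_t fetch_transformed(const OffscreenPixmap *p, int x, int y, int scale)
{
    int u = xform(x, scale);
    int v = xform(y, scale);

    if (u < 0 || v < 0 || u >= p->width || v >= p->height)
        return 0;
    assert(v + p->dy < p->fb_height);
    return p->bits[(v + p->dy) * p->stride + (u + p->dx)];
}

/* Broken order: the offset is folded into y before the transform, so it is
 * scaled too. Returns the framebuffer row this would read. */
static int buggy_fb_row(const OffscreenPixmap *p, int y, int scale)
{
    return xform(y + p->dy, scale);
}
```

With the pixmap at dy = 5 inside an 8-row framebuffer and a 2x scale, `fetch_transformed(&p, 1, 1, 2)` reads framebuffer row 7, while the broken order computes row (1 + 5) * 2 = 12, past the end of the mapping. The backtrace above, where ySrc goes from 0 in XAAComposite to 7281 in fbComposite, fits the idea that a drawable offset is added before the fb code applies the transform.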
Created attachment 81340 [details]
2nd version of the patch with dx,dy tmp variables to avoid double dereferencing all the time (suggested by Michael)
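The second patch keeps the drawable offset in local `dx`/`dy` variables. A minimal sketch of that pattern, assuming the offset comes from `pict->pDrawable->x`/`y` as the variable names suggest; `MiniDrawable`, `MiniPicture` and `fetch_row` are hypothetical simplifications, not the real server structures.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down, hypothetical stand-ins for the server's drawable and picture
 * records; only the fields this sketch needs. */
typedef struct { int x, y; } MiniDrawable;
typedef struct { MiniDrawable *pDrawable; } MiniPicture;

/* Copy 'width' pixels starting at pixmap-local (u, v) into 'buffer'.
 * The drawable offset is loaded once into locals instead of going through
 * pict->pDrawable->x / ->y for every pixel. */
static void fetch_row(const MiniPicture *pict, const uint32_t *bits, int stride,
                      int u, int v, int width, uint32_t *buffer)
{
    const int dx = pict->pDrawable->x;
    const int dy = pict->pDrawable->y;
    const uint32_t *row = bits + (v + dy) * stride;

    assert(width >= 0);
    for (int i = 0; i < width; i++)
        buffer[i] = row[u + i + dx];
}
```

Besides saving the pointer chasing per pixel, the locals help the compiler: on platforms where `uint32_t` is `unsigned int`, stores through `buffer` may legally alias the `int` fields of the drawable, so without the locals it generally has to reload them on every iteration.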
Created a bug in the xorg bugzilla for upstreaming the fix: https://bugs.freedesktop.org/show_bug.cgi?id=6827
Wow. Radek, you're getting a golden Xorg debugger medal from me. The patch looks reasonable at first glance, but I want to dig a bit further (there might be other functions that are broken WRT off-screen surfaces). Sorry I couldn't help debugging while I was at LinuxTag or preparing the demo. I also didn't think that something as basic as fbFetchTransformed could contain a bug as big as this one.
Further technical discussion is on the Xorg bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=6827 I'll post the final outcome here.
Created attachment 82989 [details]
Updated patch
Updated patch (see xorg bugzilla).
Stefan, please apply to STABLE (and the 7.1 branch, if applicable). Please close the bug when done.
Radek, for the issues we're seeing with CVS Xorg I'll open another bug. This one is way too long.
fixed.
I'd like to see an xorg-x11 release for SL10.1 as well, please submit a patchinfo: MaintenanceTracker-4359
I opened a different bug report for the updating issue (Bug #178025).
*** Bug 179675 has been marked as a duplicate of this bug. ***
Mathias, do you know which SLED release candidate has the fix for this bug?
Oops, I mean Matthias, sorry for the typo.
At least SLED10 RC2 has.