Bugzilla – Bug 117163
Minimizing a window takes 1-2 seconds
Last modified: 2006-02-03 21:27:04 UTC
When minimizing the only window covering the desktop, the window contents disappear, leaving a black frame where the window's outer edges were. This black frame stays there for one to two seconds. If the window was maximized to start with, you get a blank desktop (no icons visible on the desktop) for that amount of time. If there is another maximized window underneath the window you are minimizing, you can actually see that it is an animated minimization. It's very annoying.
Upstream bugs:
http://bugzilla.gnome.org/show_bug.cgi?id=314616
https://bugs.freedesktop.org/show_bug.cgi?id=4320
Mandrake patch:
http://cvs.mandriva.com/cgi-bin/cvsweb.cgi/SPECS/cairo/cairo-1.0.0-brokenxrender.patch?rev=1.1&content-type=text/x-cvsweb-markup
Magnus, what video card do you have? Which X server are you using? Do you know if acceleration is turned on? [I can see the bug as well; I need to figure out the same things about my video card :) ]
I have an ATI Radeon 9200 SE. The bug is known to happen with it. I'll investigate.
My video card is an ATI Mobility Fire GL T2 (in an IBM Thinkpad T41p). I have not installed ATI's drivers. The X server is xorg-x11-server-6.8.2-96 and I don't think I have 3D acceleration enabled (attaching output from glxinfo; hopefully that'll help).
Created attachment 50106 [details] Output from glxinfo
I checked the upstream bugs... Just wanted you to know that even if I disable the background, I get the same issue. Difference is that it takes less than a second before the icons show up. Still annoying :)
*** Bug 117489 has been marked as a duplicate of this bug. ***
*** Bug 115565 has been marked as a duplicate of this bug. ***
I'm on the latest code and it seems to have been fixed.
Magnus - you're using a large pixmap desktop background, right? But perhaps you're testing with the code-10 stuff (?).
I always had the issue without a background picture. But yes, I'm using the code10 stuff and it seems to be fixed in there.
I'm still getting this issue in code 10.
*** Bug 141249 has been marked as a duplicate of this bug. ***
I'd love some help from an X person to fix this, or at least to determine what the culprit is inside the X server. Without a modular server, a profiler can't see the X server's functions.
I imagine it's simply slowness rendering the background - with a block-color background I guess it's fine. Some things to check: that we are rendering the pixmap at integer offsets ;-) Also, if we have a large pixmap, can we not do a high-quality re-render of it to the screen size and then just blit it across? There's little point in keeping more of the image around than we need, surely?
Could someone on the X team give us a hand here? The performance is fine on Xgl, so we suspect it's an Xorg issue. A possible culprit is XRenderComposite.
From David R:
Nautilus doesn't seem to be doing anything stupid. It's using the render extension and compositing the minimum area that has changed using the OVER operator. This is all done to temporary pixmaps and Xgl is not accelerating that by default, which means that the software code is used when running Xgl. However, it's hitting MMX optimized software paths all the time so performance is fine. Xorg 6.9 and 7.0 should have these same MMX optimizations. So with the latest version of Xorg and correctly built packages (MMX optimizations turned on) it should be fine.
The stack trace at http://bugzilla.gnome.org/show_bug.cgi?id=314616#c8 may be the culprit.
How can I know if MMX optimizations are turned on? I have NLD10, xorg-x11-6.9.0-3.
MMX optimizations are enabled in our X.Org build. I discussed this before with David R.
Created attachment 63671 [details] big-fill.c
This is a test case to show the slowness; it's the actual xlib/cairo calls that Nautilus makes when painting the background (indirectly through GDK). Compile with this:
gcc -o big-fill big-fill.c `pkg-config --cflags --libs cairo`
You can see that cairo_fill() takes more than 1 second.
I have a Thinkpad T41p with an ATI Radeon. If I add this to the "Device" section of the /etc/X11/xorg.conf that yast spit out, my test case becomes fast (about 0.01 seconds):
Option "AccelMethod" "EXA"
Three questions:
- Why is that disabled by default?
- Does it work fine with a normal session?
- Why is the non-EXA version so slow? Crappy compositing routines in the server?
(12:42:05) cworth: federico: It's probably just not noticing a case where it could be doing the equivalent of what XCopyArea would result it. (Though I am just guessing---I haven't looked.)
http://bugzilla.gnome.org/show_bug.cgi?id=314616#c9 shows the stack trace in the X server.
I sent the test case upstream: https://bugs.freedesktop.org/show_bug.cgi?id=4320
Do we have someone who knows the RENDER implementation code? Both the server and the client will need a change to detect when they can simply use XCopyArea() for non-alpha source and destination.
*** Bug 141944 has been marked as a duplicate of this bug. ***
X team? This includes a sample test case now.
We'll investigate. BTW,
gcc -Wall -O2 -I/usr/include/cairo/ -o big-fill big-fill.c -L/usr/X11R6/lib \
-lX11 -lcairo
Using nvidia driver:
# ./big-fill
cairo_fill() time: 0.026906 sec
So probably, as already assumed, a (radeon) driver issue.
The call trace indicates that fbCopyAreammx() is called. The area that's used is the full screen, so it doesn't seem as if the server is piecing things together. As a first test I would add some timestamp code to see how long it takes ProcRenderComposite to complete in ProcRenderDispatch.
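A minimal sketch of the kind of timestamp code suggested here, assuming a POSIX system with clock_gettime(). The helper names (now_seconds, time_call) and the copy workload are hypothetical stand-ins; nothing below is taken from the X server source, where the measured call would be the composite request handler instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes clock_gettime() in strict -std modes */
#endif
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run fn(arg) once, log the elapsed time under the given label, return it. */
double time_call(const char *label, void (*fn)(void *), void *arg)
{
    double start = now_seconds();
    fn(arg);
    double elapsed = now_seconds() - start;
    fprintf(stderr, "%s: %.6fs\n", label, elapsed);
    return elapsed;
}

/* Stand-in workload: a plain memory copy in place of the call under test. */
struct copy_job {
    unsigned char *dst;
    const unsigned char *src;
    size_t len;
};

void run_copy(void *arg)
{
    struct copy_job *job = arg;
    memcpy(job->dst, job->src, job->len);
}
```

Wrapping the suspect call in time_call() gives one log line per request, which is enough to tell a 0.03 s copy from a 1 s copy without a profiler.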
I'm no Intel assembler expert, but the assembler output of fbCopyAreammx() doesn't look like MMX assembler at all. Given the source code I cannot blame the compiler for not optimizing this. It just looks like CopyArea has never been implemented in MMX (as e.g. fbCompositeSrcAdd* has). The problem is twofold: a) why is the fb function called at all (I remember the radeon driver accelerates most of Render, so it should at least accelerate CopyArea)? b) why has this function never been optimized? I will have to investigate this some more.
We are copying from a pixmap to another pixmap, obviously without alpha. From the discussion I had with Keith Packard, the problem is just that the relevant code path in the server-side RENDER code does not detect this. In that case, it can simply use XCopyArea() instead of "hand-written" compositing code.
#0 0xb7ae6daa in fbCopyAreammx (pSrc=0x8471578, pDst=0x84af108, src_x=0, src_y=774, dst_x=0, dst_y=1542, width=1024, height=768) at fbmmx.c:2241
#1 0xb7ae6eb0 in fbCompositeCopyAreammx (op=3 '\003', pSrc=0xffe5c793, pMask=0x0, pDst=0xffe5c793, xSrc=0, ySrc=774, xMask=0, yMask=0, xDst=0, yDst=-8640, width=51091, height=51091) at fbmmx.c:2283
#2 0xb7ada38d in fbComposite (op=3 '\003', pSrc=0x8477610, pMask=0x0, pDst=0x84af068, xSrc=0, ySrc=774, xMask=0, yMask=0, xDst=0, yDst=1542, width=1024, height=768) at fbpict.c:1297
#3 0xb7a91ede in XAAComposite (op=3 '\003', pSrc=0x8477610, pMask=0x0, pDst=0x84af068, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1024, height=768) at xaaPict.c:529
#4 0x0814b541 in damageComposite (op=3 '\003', pSrc=0xffe5c793, pMask=0xffe5c793, pDst=0x84af068, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1024, height=768) at damage.c:539
#5 0x0813e90b in CompositePicture (op=3 '\003', pSrc=0x8477610, pMask=0x0, pDst=0x84af068, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1024, height=768) at picture.c:1667
#6 0x081400d8 in ProcRenderComposite (client=0x824f9b0) at render.c:755
#7 0x0814292f in ProcRenderDispatch (client=0x824f9b0) at render.c:1995
#8 0x080861de in Dispatch () at dispatch.c:459
#9 0x0806d9b5 in main (argc=10, argv=0xbfcc9ac4, envp=0xffe5c793) at main.c:450
Somewhere in that stack trace (frame 5, perhaps?) you need to see that both the source and destination are pixmaps without alpha, and that you can simply use the normal implementation of XCopyArea().
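The check described above could look roughly like the following sketch. The struct picture_info type and can_use_plain_copy() are hypothetical simplifications made up for illustration; they are not the server's PictureRec or any real function in the RENDER code. The operator values mirror PictOpSrc (1) and PictOpOver (3) from the RENDER protocol, which matches the op=3 seen in the trace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Operator codes; values mirror PictOpSrc and PictOpOver in the RENDER protocol. */
enum { OP_SRC = 1, OP_OVER = 3 };

/* Hypothetical, simplified picture description (not the server's PictureRec). */
struct picture_info {
    bool has_alpha;     /* pixel format carries an alpha channel */
    bool has_transform; /* a non-identity transform is set */
    int depth;          /* depth of the backing pixmap */
    int width, height;  /* size of the backing pixmap */
};

/* Decide whether a composite request is equivalent to a plain rectangle copy. */
bool can_use_plain_copy(int op,
                        const struct picture_info *src,
                        const struct picture_info *mask,
                        const struct picture_info *dst,
                        int x_src, int y_src, int width, int height)
{
    /* A mask changes the result per pixel, so no shortcut. */
    if (mask != NULL)
        return false;
    /* OVER only degenerates to a copy when the source is fully opaque. */
    if (op != OP_SRC && !(op == OP_OVER && !src->has_alpha))
        return false;
    /* The copy must not need any format conversion or resampling. */
    if (src->has_transform || src->has_alpha != dst->has_alpha ||
        src->depth != dst->depth)
        return false;
    /* The rectangle must lie inside the source, so tiling never kicks in. */
    if (x_src < 0 || y_src < 0 ||
        x_src + width > src->width || y_src + height > src->height)
        return false;
    return true;
}
```

With the frame 5 arguments (op=3, no mask, a 1024x768 rectangle at 0,0) and two alpha-less pixmaps of equal depth, this predicate returns true, which is the case where the normal CopyArea path would do.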
Right, still the fbCopyArea should be optimized. Hm, seems that I cannot reproduce this issue here at all. I get 0.03-0.042 sec with the radeon driver on a very low end machine, with a Radeon 9700 and a Radeon 7500. Will have to test on JP's laptop first.
Matthias, is that the free driver (EXA or non-EXA?) or the proprietary one? With the free driver and without EXA, you get the slow copy. If you turn on EXA, the copy is fast - but the desktop repaint is still slow in some cases. However, some things get mis-painted all over the place, including the mouse cursor at login. Maybe this is why we disable EXA by default for Radeon.
Mena, I think it's time to attach your X.Org config and logfile. I think we need more detailed information about your system to be able to reproduce this problem here. :-)
Created attachment 65931 [details] xorf.conf I'm on a Thinkpad T41p.
Logfile?
What logfile do you need and how can I obtain it?
/var/log/Xorg.0.log. :-)
Created attachment 65960 [details] Xorg.0.log.gz
Those log files grow incredibly big, by the way:
-rw-r--r-- 1 root root 144884207 2006-01-31 17:24 Xorg.0.log.old
... that's the log file previous to the one I attached, and represents a day's worth of info. When I logged in, my current logfile was at about 70 KB. It is now 5 minutes later, and it is at 320 KB. The end of the file gets a constant stream of these:
(**) RADEON(0): WaitForIdle (entering): 62 entries, stat=0x8802613e
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x88020140
(**) RADEON(0): WaitForIdle (entering): 59 entries, stat=0x8802613b
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
any idea what that is?
This is debug info from the patch by Benjamin Herrenschmidt. :-) This is mainly what I wanted to know.
(--) RADEON(0): Chipset: "ATI FireGL Mobility T2 (M10) NT (AGP)" (ChipID = 0x4e54)
This is also interesting:
(II) RADEON(0): Render acceleration unsupported on Radeon 9500/9700 and newer.
(II) RADEON(0): Render acceleration disabled
The annoying debug messages that benh has added to the radeon driver and which should never be in a production driver don't help performance. Therefore my first reaction would be to remove them. Then one can go back and look for other performance sinks. It is possible that we are seeing problems in the WaitForIdle call. If we perform the CopyArea in software we must wait for the engine to become idle so that we don't write things in the wrong order. Since I've got an M10 here I can test this later today.
Egbert, the debug messages have been removed in Beta2. Probably Federico is still using Beta1. (BTW, why is this still marked as a 10.0 RC1 bug? Could this be updated accordingly, please?) Unfortunately, by updating the patch by Benjamin I've enabled DEBUG info for Beta3 again. :-( But we've seen several bug reports about a freezing radeon driver, which has been caused by disabling debug info between Beta1 and Beta2, anyway. So we also have a timing problem here. :-(
(In reply to comment #29)
> Matthias, is that the free driver (EXA or non-EXA?) or the proprietary one?
Radeon (free), no EXA.
> With the free driver and without EXA, you get the slow copy.
Not on my setup.
> If you turn on EXA, the copy is fast - but the desktop repaint is still slow
I won't try EXA. This is experimental stuff, you could get about any performance figure you want with it, depending on date, time, and moon phase. If it runs correctly at that time anyway.
(In reply to comment #37)
> This is also interesting:
> (II) RADEON(0): Render acceleration unsupported on Radeon 9500/9700 and newer.
> (II) RADEON(0): Render acceleration disabled
I couldn't reproduce the bug with either the Radeon 7500 or the Radeon 9700. With the first, Render acceleration is enabled; with the second, it is disabled. Both take approx. 0.04sec for the test program. Guess I have to single step into the driver to see which function is actually called (too fast to be interrupted).
I have xorg-x11-6.9.0-3 from preview4. What server do you guys have?
NLD10 preview4, that is.
We're talking about Beta2 ...
What server is in beta2? I haven't gotten the DVD yet.
6.9.0 with new patches for the radeon driver.
Is that 6.9.0-8? I just need the version+revision number.
Ok, I single stepped into my machine (which doesn't have the slowdown). This is the backtrace:
#0 fbCopyAreammx (pSrc=0xaf1d8008, pDst=0xaec64008, src_x=0, src_y=0, dst_x=0, dst_y=0, width=1400, height=1021) at fbmmx.c:2163
#1 0xb77f8916 in fbCompositeCopyAreammx (op=3 '\003', pSrc=0x831c598, pMask=0x0, pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1400, height=1021) at fbmmx.c:2265
#2 0xb77f577b in fbComposite (op=3 '\003', pSrc=0x831c598, pMask=0x0, pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1400, height=1021) at fbpict.c:1297
#3 0xb77aba0b in XAAComposite (op=3 '\003', pSrc=0x831c598, pMask=0x0, pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1400, height=1021) at xaaPict.c:529
#4 0x0816f4b6 in cwComposite (op=3 '\003', pSrcPicture=0x831c598, pMskPicture=0x0, pDstPicture=0x831c738, xSrc=0, ySrc=0, xMsk=0, yMsk=0, xDst=0, yDst=0, width=1400, height=1021) at cw_render.c:273
#5 0x08168796 in damageComposite (op=3 '\003', pSrc=0x831c598, pMask=0x0, pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1400, height=1021) at damage.c:539
#6 0x0817ccde in ProcRenderComposite (client=0x831ca60) at render.c:755
#7 0x080c76f0 in Dispatch () at dispatch.c:459
#8 0x080d43b5 in main (argc=1, argv=0xbfff3954, envp=Cannot access memory at address 0x8 ) at main.c:450
Major differences are the cwComposite() call and the sizes. Other than that it seems to step into the same functions. Also, your fbComposite call gets different y coordinates than the previous calls, probably related to your dualhead setup. Maybe the slowdown is related to the debug messages, or not related to the copy function at all. Can you please retest with Beta 3 (three!), which has just been released? And with just an X server running and no other applications (calling the test program remotely).
Tested on my machine: xorg-x11-server-6.9.0-9 xorg-x11-driver-video-6.9.0-10
I have now reproduced the issue. It does not occur with my setup, but it does with your configuration file. I'll check this tomorrow. Presumably this has something to do with a dual monitor setup or with panning.
This is UNBELIEVABLE. Changing the resolution from 1400x1050 to 1280x1024 does the trick! 1400x1050 is slow, 1280x1024 fast.
I don't really have a dual-head setup. I'm on a Thinkpad T41p. I *think* I may have enabled "clone the display to the VGA port". Would that affect things?
I should have written that the dual head setup does not influence this. Only the effective resolution does.
Are any other resolutions affected?
Important differences in logfiles (1280x1024 vs. 1400x1050):
-(--) RADEON(0): Virtual size is 1400x1050 (pitch 1408)
+(--) RADEON(0): Virtual size is 1280x1024 (pitch 1280)
-1400x1050 155.80 1400 1464 1784 1912 1050 1052 1064 1090 (24,32) +H +V
-1400x1050 155.80 1400 1464 1784 1912 1050 1052 1064 1090 (24,32) +H +V
-(**) RADEON(0): Pitch = 11534512 bytes (virtualX = 1400, displayWidth = 1408)
-(**) RADEON(0): dc=15580, of=31160, fd=138, pd=2
+1280x1024 190.96 1280 1376 1520 1760 1024 1025 1028 1085 (24,32)
+1280x1024 190.96 1280 1376 1520 1760 1024 1025 1028 1085 (24,32)
+(**) RADEON(0): Pitch = 10485920 bytes (virtualX = 1280, displayWidth = 1280)
+(**) RADEON(0): dc=19096, of=38192, fd=170, pd=2
-(**) RADEON(0): Wrote: 0x0000000c 0x0001008a 0x00000000 (0x0000a400)
-(**) RADEON(0): Wrote: rd=12, fd=138, pd=1
-(**) RADEON(0): GRPH_BUFFER_CNTL from 20204c4c to 20227c7c
+(**) RADEON(0): Wrote: 0x0000000c 0x000100aa 0x00000000 (0x0000a400)
+(**) RADEON(0): Wrote: rd=12, fd=170, pd=1
+(**) RADEON(0): GRPH_BUFFER_CNTL from 20204c4c to 202a7c7c
-(II) RADEON(0): Memory manager initialized to (0,0) (1408,8191)
-(II) RADEON(0): Reserved area from (0,1050) to (1408,1052)
-(II) RADEON(0): Largest offscreen area available: 1408 x 7139
+(II) RADEON(0): Memory manager initialized to (0,0) (1280,8191)
+(II) RADEON(0): Reserved area from (0,1024) to (1280,1026)
+(II) RADEON(0): Largest offscreen area available: 1280 x 7165
-(**) RADEON(0): Pitch for acceleration = 176
+(**) RADEON(0): Pitch for acceleration = 160
-(II) RADEON(0): Using hardware cursor (scanline 1052)
-(II) RADEON(0): Largest offscreen area available: 1408 x 7136
+(II) RADEON(0): Using hardware cursor (scanline 1026)
+(II) RADEON(0): Largest offscreen area available: 1280 x 7161
 (**) RADEON(0): RADEONDoAdjustFrame(0,0,0)
 (**) RADEON(0): -> reg : 0x0224 = 0x00000000
-(**) RADEON(0): regcntl : 0x0350 = 0xbfbc24a4
+(**) RADEON(0): regcntl : 0x0350 = 0xbfae1bc4
No obvious memory setup differences, so we have to dig deeper. I also just confirmed that the whole time is spent in fbCopyAreammx - in both cases.
(In reply to comment #53) > Are any other resolutions affected? 1600x1200 is as well, so I assume that anything higher than 1280x1024 is.
Ok, I have an explanation, but no solution so far:
For 1280x1024 framebuffers the off-screen pixmap seems to be created in regular memory (src+dest bytes):
fbCopyAreammx: src 0xaf4d3008 0/0 dest 0xaef5f008 0/0 size 1400/1021
fbCopyAreammx: src bytes 0xaf4d3060 stride 15e0 dest bytes 0xaef5f060 stride 15e0 byte_width 15e0
fbCopyAreammx: 0.037907s
aef5f000-afa6f000 rw-p aef5f000 00:00 0
afa6f000-b7a6f000 rw-s d8000000 03:02 26724 /dev/mem
while for 1400x1050 the pixmap is created in card memory:
fbCopyAreammx: src 0x820fcc0 0/1055 dest 0x820fd20 0/2105 size 1400/1021
fbCopyAreammx: src bytes 0xb0048a00 stride 1600 dest bytes 0xb05ec600 stride 1600 byte_width 15e0
fbCopyAreammx: 0.981807s
081c8000-0838a000 rw-p 081c8000 00:00 0 [heap]
afa9e000-b7a9e000 rw-s d8000000 03:02 26724 /dev/mem
This happens because the requested pixmaps are of size 1400x1021 and 1400x1050. The current memory manager is stride based, that is, it can only allocate pixmaps in graphics memory up to the framebuffer width. Card memory is incredibly slow to read, and that's what's hitting us here. So what actually should happen is that the XAA CopyArea function should have been called, which does everything in graphics memory with the GPU.
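The numbers in this comment can be checked with a little arithmetic. The sketch below assumes 32 bits (4 bytes) per pixel, which is how the "(24,32)" in the mode lines is read here, and takes the pitches 1408 and 1280 from the log diff. Both function names are hypothetical; the second one only restates the allocator rule as described in the comment and is not code from the XAA memory manager.

```c
#include <stdbool.h>

/* Bytes per scanline for a surface that is width_pixels wide. */
unsigned stride_bytes(unsigned width_pixels, unsigned bytes_per_pixel)
{
    return width_pixels * bytes_per_pixel;
}

/*
 * Stride-based placement rule as described in the comment: a pixmap can only
 * be put in card memory if it is no wider than the framebuffer pitch.
 */
bool fits_in_card_memory(unsigned pixmap_width, unsigned fb_pitch_pixels)
{
    return pixmap_width <= fb_pitch_pixels;
}
```

Under these assumptions a 1400 pixel wide pixmap has a stride of 0x15e0 bytes and a 1408 pixel pitch gives 0x1600 bytes, matching the two stride values logged by fbCopyAreammx. The same 1400 pixel pixmap fits under a pitch of 1408 (card memory, slow to read back) but not under a pitch of 1280 (system memory, fast for the software copy).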
FWIW - and I guess since you're hot on the trail here - someone asked if this still occurs in B3 - and it does :-)
XAAComposite should catch this situation before one gets down there. I wonder why the MMX layer is not implemented as another wrapper layer.
XAACopy should probably catch the same situations as for which fbCopyAreammx is called. I would be curious what happens if fbCopyAreammx isn't used at all but the general fallback code in the fb layer. Would be funny if this was faster but it would explain why this situation hasn't been caught before.
It seems that it actually has only been pretty fast in this setup due to bugs in the fb layer code. E.g. the repeat flag was on but not honored correctly. This didn't do any harm in this case, but it could in others.
Created attachment 66232 [details] Proposed patch for improving XAAComposite Fastpath
This patch improves the Fastpath from XAAComposite so that this corner case (yes, it is a corner case) is accelerated as well. I'll have to discuss this patch upstream, as I'm not 100% sure about some variables. As the Slowpath is definitely broken WRT some others, this shouldn't create any regressions, though.
Results here (Radeon 7500):
Slow path (Framebuffer width < Pixmap width): 37ms
Fast path (Framebuffer width >= Pixmap width):
without patch: 900-1200ms
with patch: 1-1.2ms
Acceleration factor: approx. 1000 =-)
Stefan, please apply to stable.
From the Xorg bugzilla #4320:
> This makes a lot of sense. As Keith and Carl mentioned once, there is code in
> the server-side implementation of RENDER that needs to detect that it is copying
> a non-alpha pixmap to another pixmap, and it can just use CopyArea instead of
> copying the pixels by hand.
Actually, that is (partially) already done in XComposite; Cairo only hit a corner case that wasn't accelerated. I would call this a bug in Cairo as well, 1st) because it uses PictOpOver with a source without alpha and without a mask (should do PictOpSrc in this case), 2nd) because it enables repeat even for 1:1 copies. Anyone willing to report this to the Cairo guys? I know David is working on glitz, but this should be detected and worked around earlier in the library.
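Why PictOpOver with an alpha-less, unmasked source is the same as PictOpSrc can be seen from the operator itself: OVER computes src + (1 - src_alpha) * dst, and a source without an alpha channel is treated as fully opaque. The sketch below is an illustrative per-pixel version on premultiplied ARGB32 values. The function names are made up for this example and this is not the fb layer's implementation.

```c
#include <stdint.h>

/* Multiply two 8-bit values read as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Porter-Duff OVER on premultiplied ARGB32: src + (1 - src_alpha) * dst. */
uint32_t over_argb32(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        unsigned c = s + mul_un8(d, inv_alpha);
        if (c > 255)
            c = 255; /* saturate in case the input was not premultiplied */
        out |= (uint32_t)c << shift;
    }
    return out;
}
```

With an opaque source (alpha byte 0xff) the destination term is multiplied by zero, so every pixel comes out equal to the source regardless of what was in the destination. That is exactly a copy, which is why the request could be issued as PictOpSrc or served by CopyArea.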
Thanks for following up in https://bugs.freedesktop.org/show_bug.cgi?id=4320, Matthias - we are making excellent progress there. In the meantime, I'll attach a patch which works around this for our gtk2 package. With this, my Nautilus desktop is fast again. Summary: CAIRO_EXTEND_REPEAT with a source pixmap pattern and a destination pixmap surface is slow because of bugs in Cairo and XRENDER. The patch makes GTK+ turn on REPEAT only when absolutely necessary.
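The idea of turning on REPEAT only when necessary can be sketched as a bounds test: tiling is needed only if the area being painted reads outside the source pixmap. The function below is a hypothetical illustration of that rule, with coordinates assumed to be in source-pixmap space. It is not taken from the attached GTK+ patch, whose contents are not reproduced in this bug.

```c
#include <stdbool.h>

/*
 * True if painting the rectangle (x, y, width, height) would read pixels
 * outside a source of size src_width x src_height, so the source must tile.
 */
bool needs_repeat(int src_width, int src_height,
                  int x, int y, int width, int height)
{
    return x < 0 || y < 0 ||
           x + width > src_width || y + height > src_height;
}
```

For a background pixmap that already covers the whole screen the test is false, and the caller could leave the pattern's extend mode unset instead of requesting CAIRO_EXTEND_REPEAT. A small tile painted across a larger area still gets repeat.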
Created attachment 66252 [details] gtk2-117163-cairo-repeat-pattern-workaround.diff
Submitted to autobuild; this will be in gtk2-2.8.10-7.
Damn. My patch works when you log in, and then it varies between working and not working depending on how full of pixmaps the video card becomes as you use your session.
Matthias' patch is submitted now for Beta4. --> FIXED
Created attachment 66427 [details] Updated gtk2-117163-cairo-repeat-pattern-workaround.diff I'll put this updated patch in our gtk2 package. It fixes an offsetting problem with the previous patch, and falls back to known-to-be-fast code.