Bug 117163 - Minimizing a window takes 1-2 seconds
Summary: Minimizing a window takes 1-2 seconds
Status: RESOLVED FIXED
Duplicates: 115565 117489 141249 141944
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: GNOME
Version: RC 1
Hardware: Other All
Priority: P1 - Urgent, Severity: Critical
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-15 05:07 UTC by Magnus Boman
Modified: 2006-02-03 21:27 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Output from glxinfo (15.65 KB, text/plain), 2005-09-15 22:55 UTC, Magnus Boman
big-fill.c (1.59 KB, text/plain), 2006-01-17 18:12 UTC, Federico Mena Quintero
xorf.conf (5.33 KB, text/plain), 2006-01-31 19:10 UTC, Federico Mena Quintero
Xorg.0.log.gz (9.47 KB, application/x-gzip), 2006-01-31 23:28 UTC, Federico Mena Quintero
Proposed patch for improving XAAComposite Fastpath (1.61 KB, patch), 2006-02-02 17:56 UTC, Matthias Hopf
gtk2-117163-cairo-repeat-pattern-workaround.diff (1.99 KB, patch), 2006-02-02 19:53 UTC, Federico Mena Quintero
Updated gtk2-117163-cairo-repeat-pattern-workaround.diff (6.19 KB, patch), 2006-02-03 21:27 UTC, Federico Mena Quintero

Description Magnus Boman 2005-09-15 05:07:10 UTC
When minimizing the only window covering the desktop, the window contents
disappear, leaving a black frame where the window's outer edges were. This black
frame stays there for one to two seconds. If the window was maximized to
start with, you get a blank desktop (no icons visible on the desktop) for that
amount of time.
If there is another maximized window underneath the window you are minimizing,
you can actually see that it is an animated minimization.
It's very annoying.
Comment 1 Federico Mena Quintero 2005-09-15 14:55:35 UTC
Upstream bugs:
http://bugzilla.gnome.org/show_bug.cgi?id=314616
https://bugs.freedesktop.org/show_bug.cgi?id=4320

Mandrake patch:
http://cvs.mandriva.com/cgi-bin/cvsweb.cgi/SPECS/cairo/cairo-1.0.0-brokenxrender.patch?rev=1.1&content-type=text/x-cvsweb-markup

Magnus, what video card do you have?  Which X server are you using?  Do you know
if acceleration is turned on?

[I can see the bug as well; I need to figure out the same things about my video
card :) ]
Comment 2 Federico Mena Quintero 2005-09-15 15:11:50 UTC
I have an ATI Radeon 9200 SE.  The bug is known to happen with it.  I'll
investigate.
Comment 3 Magnus Boman 2005-09-15 22:54:42 UTC
My video card is an ATI Mobility Fire GL T2 (in an IBM Thinkpad T41p). I have not
installed ATI's drivers.
The X server is xorg-x11-server-6.8.2-96 and I don't think I have 3D acceleration
enabled (attaching output from glxinfo; hopefully that'll help).

Comment 4 Magnus Boman 2005-09-15 22:55:44 UTC
Created attachment 50106 [details]
Output from glxinfo
Comment 5 Magnus Boman 2005-09-16 05:22:38 UTC
I checked the upstream bugs... Just wanted you to know that even if I disable
the background, I get the same issue. Difference is that it takes less than a
second before the icons show up. Still annoying :)
Comment 6 Federico Mena Quintero 2005-09-19 16:32:48 UTC
*** Bug 117489 has been marked as a duplicate of this bug. ***
Comment 7 Federico Mena Quintero 2005-09-19 17:12:24 UTC
*** Bug 115565 has been marked as a duplicate of this bug. ***
Comment 8 Magnus Boman 2005-12-22 23:50:51 UTC
I'm on the latest code and it seems to have been fixed.
Comment 9 Michael Meeks 2005-12-23 10:52:57 UTC
Magnus - you're using a large pixmap desktop background right ? but perhaps you're testing with the code-10 stuff (?).
Comment 10 Magnus Boman 2005-12-23 21:06:15 UTC
I always had the issue without a background picture. But yes, I'm using the code10 stuff and it seems to be fixed in there.
Comment 11 JP Rosevear 2006-01-11 04:37:07 UTC
I'm still getting this issue in code 10.
Comment 12 JP Rosevear 2006-01-11 04:38:00 UTC
*** Bug 141249 has been marked as a duplicate of this bug. ***
Comment 13 Federico Mena Quintero 2006-01-11 17:47:55 UTC
I'd love some help from an X person to fix this, or at least to determine what the culprit is inside the X server.  Without a modular server, a profiler can't see the X server's functions.
Comment 14 Michael Meeks 2006-01-11 18:20:32 UTC
I imagine it's simply slowness rendering the background - with a block-color background I guess it's fine.

Some things to check: we are rendering the pixmap at integer offsets ;-)
Also - if we have a large pixmap - can we not high-quality re-render it to the screen size & then just blit it across ? - there's little point in keeping more of the image around than we need surely ?
Comment 15 JP Rosevear 2006-01-17 15:36:57 UTC
Could someone on the X team give us a hand here?

The performance is fine on Xgl, so we suspect it's an Xorg issue. A possible culprit is XRenderComposite.

From David R:

Nautilus doesn't seem to be doing anything stupid. It's using the render
extension and compositing the minimum area that has changed using the
OVER operator.

This is all done to temporary pixmaps and Xgl is not accelerating that
by default, which means that the software code is used when running Xgl.
However, it's hitting MMX optimized software paths all the time so
performance is fine.

Xorg 6.9 and 7.0 should have these same MMX optimizations. So with
latest version of Xorg and correctly built packages (MMX optimizations
turned on) it should be fine.

The stack trace at:
http://bugzilla.gnome.org/show_bug.cgi?id=314616#c8

May be the culprit.
Comment 16 Federico Mena Quintero 2006-01-17 15:43:11 UTC
How can I know if MMX optimizations are turned on?  I have NLD10, xorg-x11-6.9.0-3.
Comment 17 Stefan Dirsch 2006-01-17 15:49:05 UTC
MMX optimizations are enabled in our X.Org build. I discussed this before with David R.
Comment 18 Federico Mena Quintero 2006-01-17 18:12:30 UTC
Created attachment 63671 [details]
big-fill.c

This is a test case to show the slowness; it's the actual xlib/cairo calls that Nautilus makes when painting the background (indirectly through GDK).

Compile with this:

  gcc -o big-fill big-fill.c `pkg-config --cflags --libs cairo`

You can see that cairo_fill() takes more than 1 second.
Comment 19 Federico Mena Quintero 2006-01-17 18:39:28 UTC
I have a Thinkpad T41p with an ATI Radeon.

If I add this to the "Device" section of the /etc/X11/xorg.conf that yast spit out, my test case becomes fast (about 0.01 seconds):

  Option "AccelMethod" "EXA"

Three questions:

- Why is that disabled by default?
- Does it work fine with a normal session?
- Why is the non-EXA version so slow?  Crappy compositing routines in the server?
Comment 20 Federico Mena Quintero 2006-01-17 19:01:01 UTC
(12:42:05) cworth: federico: It's probably just not noticing a case where it could be doing the equivalent of what XCopyArea would result in. (Though I am just guessing---I haven't looked.)

http://bugzilla.gnome.org/show_bug.cgi?id=314616#c9 shows the stack trace in the X server.

I sent the test case upstream: https://bugs.freedesktop.org/show_bug.cgi?id=4320
Comment 21 Federico Mena Quintero 2006-01-17 19:09:08 UTC
Do we have someone who knows the RENDER implementation code?  Both the server and the client will need a change to detect when they can simply use XCopyArea() for non-alpha source and destination.
Comment 22 JP Rosevear 2006-01-25 16:39:24 UTC
*** Bug 141944 has been marked as a duplicate of this bug. ***
Comment 23 JP Rosevear 2006-01-25 16:40:49 UTC
X team?  This includes a sample test case now.
Comment 24 Stefan Dirsch 2006-01-31 11:26:19 UTC
We'll investigate. BTW,

gcc -Wall -O2 -I/usr/include/cairo/ -o big-fill big-fill.c -L/usr/X11R6/lib \
    -lX11 -lcairo

Using nvidia driver:

# ./big-fill 
cairo_fill() time: 0.026906 sec

So probably as already assumed a (radeon) driver issue.
Comment 25 Egbert Eich 2006-01-31 11:29:06 UTC
The call trace indicates that fbCopyAreammx() is called. The area that's used is the full screen, so it doesn't seem as if the server is piecing things together.
As a first test I would add some timestamp code to see how long it takes ProcRenderComposite to complete in ProcRenderDispatch.
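
A minimal sketch of the timestamp approach suggested above, written in Python for brevity rather than as X server code. The names `timed` and `fake_composite` are hypothetical; in the server, the equivalent would be timestamps taken immediately before and after the ProcRenderComposite call.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_composite(width, height):
    """Hypothetical stand-in for the request handler being measured."""
    return sum(range(width * height // 1000))


if __name__ == "__main__":
    _, elapsed = timed(fake_composite, 1400, 1021)
    print(f"composite time: {elapsed:.6f} sec")
```

Comparing this figure with the client-side cairo_fill() time would show whether the delay is inside the request handler or elsewhere.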
Comment 26 Matthias Hopf 2006-01-31 16:44:42 UTC
I'm no Intel assembler expert, but the assembler output of fbCopyAreammx() doesn't look like MMX assembler at all. Given the source code I cannot blame the compiler for not optimizing this. It just looks like CopyArea has never been implemented in MMX (as e.g. fbCompositeSrcAdd* has).

The problem is twofold:
a) why is the fb function called at all (I remember the radeon driver accelerates most of Render, so it should at least accelerate CopyArea)?
b) why has this function never been optimized?

I will have to investigate this some more.
Comment 27 Federico Mena Quintero 2006-01-31 16:49:12 UTC
We are copying from a pixmap to another pixmap, obviously without alpha.  From the discussion I had with Keith Packard, the problem is just that the relevant code path in the server-side RENDER code does not detect this.  In that case, it can simply use XCopyArea() instead of "hand-written" compositing code.

#0  0xb7ae6daa in fbCopyAreammx (pSrc=0x8471578, pDst=0x84af108, src_x=0,
src_y=774, dst_x=0, dst_y=1542, width=1024, height=768) at fbmmx.c:2241
#1  0xb7ae6eb0 in fbCompositeCopyAreammx (op=3 '\003', pSrc=0xffe5c793,
pMask=0x0, pDst=0xffe5c793, xSrc=0, ySrc=774, xMask=0, yMask=0, xDst=0,
yDst=-8640, width=51091, height=51091) at fbmmx.c:2283
#2  0xb7ada38d in fbComposite (op=3 '\003', pSrc=0x8477610, pMask=0x0,
pDst=0x84af068, xSrc=0, ySrc=774, xMask=0, yMask=0, xDst=0, yDst=1542,
width=1024, height=768) at fbpict.c:1297
#3  0xb7a91ede in XAAComposite (op=3 '\003', pSrc=0x8477610, pMask=0x0,
pDst=0x84af068, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1024,
height=768) at xaaPict.c:529
#4  0x0814b541 in damageComposite (op=3 '\003', pSrc=0xffe5c793,
pMask=0xffe5c793, pDst=0x84af068, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0,
yDst=0, width=1024, height=768) at damage.c:539
#5  0x0813e90b in CompositePicture (op=3 '\003', pSrc=0x8477610, pMask=0x0,
pDst=0x84af068, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, width=1024,
height=768) at picture.c:1667
#6  0x081400d8 in ProcRenderComposite (client=0x824f9b0) at render.c:755
#7  0x0814292f in ProcRenderDispatch (client=0x824f9b0) at render.c:1995
#8  0x080861de in Dispatch () at dispatch.c:459
#9  0x0806d9b5 in main (argc=10, argv=0xbfcc9ac4, envp=0xffe5c793) at main.c:450

Somewhere in that stack trace (frame 5, perhaps?) you need to see that both the source and destination are pixmaps without alpha, and that you can simply use the normal implementation of XCopyArea().
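
A minimal sketch of the check described above, assuming the decision depends only on the operator, the presence of a mask, a transform, the picture formats, and whether the source carries alpha. It is written in Python for brevity and is not the server's code; `can_use_copy_area` and its parameters are hypothetical names, and repeat and clipping are deliberately ignored. The two constants are the RENDER protocol values of PictOpSrc and PictOpOver (the `op=3` in the trace is PictOpOver).

```python
PICT_OP_SRC = 1   # RENDER protocol value of PictOpSrc
PICT_OP_OVER = 3  # RENDER protocol value of PictOpOver


def can_use_copy_area(op, src_has_alpha, has_mask, same_format, transformed=False):
    """Return True if the composite request is equivalent to a plain copy.

    Hypothetical helper: a real implementation would also have to
    consider repeat, clipping, and alpha maps.
    """
    if has_mask or transformed or not same_format:
        return False
    if op == PICT_OP_SRC:
        # SRC replaces the destination outright.
        return True
    # OVER with an opaque source yields src + dst * (1 - 1) = src.
    return op == PICT_OP_OVER and not src_has_alpha


if __name__ == "__main__":
    # The case from the trace: OVER, no mask, pixmaps without alpha.
    print(can_use_copy_area(PICT_OP_OVER, src_has_alpha=False,
                            has_mask=False, same_format=True))
```

With an opaque source and no mask, OVER and SRC produce the same pixels, which is why the request can be handed to the ordinary copy path.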
Comment 28 Matthias Hopf 2006-01-31 17:48:55 UTC
Right, still the fbCopyArea should be optimized.

Hm, seems that I cannot reproduce this issue here at all.
I get 0.03-0.042 sec with the radeon driver. On a very low end machine, with a Radeon 9700 and a Radeon 7500. Will have to test on JP's laptop first.

Comment 29 Federico Mena Quintero 2006-01-31 18:20:11 UTC
Matthias, is that the free driver (EXA or non-EXA?) or the proprietary one?

With the free driver and without EXA, you get the slow copy.

If you turn on EXA, the copy is fast - but the desktop repaint is still slow in some cases.  However, some things get mis-painted all over the place, including the mouse cursor at login.  Maybe this is why we disable EXA by default for Radeon.
Comment 30 Stefan Dirsch 2006-01-31 18:23:16 UTC
Mena, I think it's time to attach your X.Org config and logfile. I think we need more detailed information about your system to be able to reproduce this problem here. :-)
Comment 31 Federico Mena Quintero 2006-01-31 19:10:28 UTC
Created attachment 65931 [details]
xorf.conf

I'm on a Thinkpad T41p.
Comment 32 Stefan Dirsch 2006-01-31 19:48:21 UTC
Logfile?
Comment 33 Federico Mena Quintero 2006-01-31 20:04:17 UTC
What logfile do you need and how can I obtain it?
Comment 34 Stefan Dirsch 2006-01-31 20:06:16 UTC
/var/log/Xorg.0.log. :-)
Comment 35 Federico Mena Quintero 2006-01-31 23:28:31 UTC
Created attachment 65960 [details]
Xorg.0.log.gz
Comment 36 Federico Mena Quintero 2006-01-31 23:31:06 UTC
Those log files grow incredibly big, by the way:

-rw-r--r-- 1 root root    144884207 2006-01-31 17:24 Xorg.0.log.old

... that's the log file previous to the one I attached, and represents a day's worth of info.

When I logged in, my current logfile was at about 70 KB.  It is now 5 minutes later, and it is at 320 KB.  The end of the file gets a constant stream of these:

(**) RADEON(0): WaitForIdle (entering): 62 entries, stat=0x8802613e
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x88020140
(**) RADEON(0): WaitForIdle (entering): 59 entries, stat=0x8802613b
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140
(**) RADEON(0): WaitForIdle (entering): 64 entries, stat=0x00000140

any idea what that is?
Comment 37 Stefan Dirsch 2006-01-31 23:38:58 UTC
This is debug info from the patch by Benjamin Herrenschmidt. :-)

This is mainly what I wanted to know.
(--) RADEON(0): Chipset: "ATI FireGL Mobility T2 (M10) NT (AGP)" (ChipID = 0x4e54)

This is also interesting:
(II) RADEON(0): Render acceleration unsupported on Radeon 9500/9700 and newer.
(II) RADEON(0): Render acceleration disabled

Comment 38 Egbert Eich 2006-02-01 09:18:22 UTC
The annoying debug messages that benh has added to the radeon driver and which should never be in a production driver don't help performance. Therefore my first reaction would be to remove them. Then one can go back and look for other performance sinks.
It is possible that we are seeing problems in the WaitForIdle call. If we perform the CopyArea in software we must wait for the engine to become idle so that we don't write things in the wrong order.
Since I've got an M10 here I can test this later today.
Comment 39 Stefan Dirsch 2006-02-01 09:51:13 UTC
Egbert, the debug messages were removed in Beta2. Federico is probably still using Beta1. (BTW, why is this still marked as a 10.0 RC1 bug? Could this be updated accordingly, please?) Unfortunately, by updating Benjamin's patch I've enabled debug info again for Beta3. :-( But we've seen several bug reports about the radeon driver freezing, which was caused by disabling debug info between Beta1 and Beta2 anyway. So we also have a timing problem here. :-(
Comment 40 Matthias Hopf 2006-02-01 12:32:42 UTC
(In reply to comment #29)
> Matthias, is that the free driver (EXA or non-EXA?) or the proprietary one?

Radeon (free), no EXA.

> With the free driver and without EXA, you get the slow copy.

Not on my setup.

> If you turn on EXA, the copy is fast - but the desktop repaint is still slow

I won't try EXA. This is experimental stuff, you could get about any performance figure you want with it, depending on date, time, and moon phase. If it runs correctly at that time anyway.

(In reply to comment #37)
> This is also interesting:
> (II) RADEON(0): Render acceleration unsupported on Radeon 9500/9700 and newer.
> (II) RADEON(0): Render acceleration disabled

I couldn't reproduce the bug with either the Radeon 7500 or the Radeon 9700. With the first, Render acceleration is enabled; with the second, it is disabled. Both take approx. 0.04 sec for the test program.

Guess I have to single step into the driver to see which function is actually called (too fast to be interrupted).
Comment 41 Federico Mena Quintero 2006-02-01 16:51:25 UTC
I have xorg-x11-6.9.0-3 from preview4.  What server do you guys have?
Comment 42 Federico Mena Quintero 2006-02-01 16:51:54 UTC
NLD10 preview4, that is.
Comment 43 Stefan Dirsch 2006-02-01 16:53:45 UTC
We're talking about Beta2 ...
Comment 44 Federico Mena Quintero 2006-02-01 17:27:14 UTC
What server is in beta2?  I haven't gotten the DVD yet.
Comment 45 Stefan Dirsch 2006-02-01 17:31:06 UTC
6.9.0 with new patches for the radeon driver.
Comment 46 Federico Mena Quintero 2006-02-01 17:48:06 UTC
Is that 6.9.0-8?  I just need the version+revision number.
Comment 47 Matthias Hopf 2006-02-01 17:50:15 UTC
Ok, I single stepped into my machine (which doesn't have the slowdown). This is the backtrace:

#0  fbCopyAreammx (pSrc=0xaf1d8008, pDst=0xaec64008, src_x=0, src_y=0, 
    dst_x=0, dst_y=0, width=1400, height=1021) at fbmmx.c:2163
#1  0xb77f8916 in fbCompositeCopyAreammx (op=3 '\003', pSrc=0x831c598, 
    pMask=0x0, pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, 
    yDst=0, width=1400, height=1021) at fbmmx.c:2265
#2  0xb77f577b in fbComposite (op=3 '\003', pSrc=0x831c598, pMask=0x0, 
    pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, 
    width=1400, height=1021) at fbpict.c:1297
#3  0xb77aba0b in XAAComposite (op=3 '\003', pSrc=0x831c598, pMask=0x0, 
    pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, 
    width=1400, height=1021) at xaaPict.c:529
#4  0x0816f4b6 in cwComposite (op=3 '\003', pSrcPicture=0x831c598, 
    pMskPicture=0x0, pDstPicture=0x831c738, xSrc=0, ySrc=0, xMsk=0, yMsk=0, 
    xDst=0, yDst=0, width=1400, height=1021) at cw_render.c:273
#5  0x08168796 in damageComposite (op=3 '\003', pSrc=0x831c598, pMask=0x0, 
    pDst=0x831c738, xSrc=0, ySrc=0, xMask=0, yMask=0, xDst=0, yDst=0, 
    width=1400, height=1021) at damage.c:539
#6  0x0817ccde in ProcRenderComposite (client=0x831ca60) at render.c:755
#7  0x080c76f0 in Dispatch () at dispatch.c:459
#8  0x080d43b5 in main (argc=1, argv=0xbfff3954, envp=Cannot access memory at address 0x8
) at main.c:450

Major differences are cwComposite() call and sizes. Other than that it seems to step into the same functions. Also your fbComposite call gets different y coordinates than the previous calls, probably related to your dualhead setup.
Maybe the slowdown is related to the debug messages, or not related to the copy function at all.

Can you please retest with Beta 3 (three!), which has just been released? And with just running an X server with no other applications (calling the test program remotely).
Comment 48 Matthias Hopf 2006-02-01 17:51:11 UTC
Tested on my machine:

xorg-x11-server-6.9.0-9
xorg-x11-driver-video-6.9.0-10
Comment 49 Matthias Hopf 2006-02-01 17:57:06 UTC
I have now reproduced the issue. It does not occur with my setup, but it does with your configuration file. I'll check this tomorrow. Presumably this has something to do with a dual monitor setup or with panning.
Comment 50 Matthias Hopf 2006-02-01 18:10:29 UTC
This is UNBELIEVABLE.

Changing the resolution from 1400x1050 to 1280x1024 does the trick!
1400x1050 is slow, 1280x1024 fast.
Comment 51 Federico Mena Quintero 2006-02-01 20:09:13 UTC
I don't really have a dual-head setup.  I'm on a Thinkpad T41p.  I *think* I may have enabled "clone the display to the VGA port".  Would that affect things?
Comment 52 Matthias Hopf 2006-02-02 11:14:21 UTC
I should have written that the dual-head setup does not influence this. Only the effective resolution matters.
Comment 53 JP Rosevear 2006-02-02 12:31:49 UTC
Are any other resolutions affected?
Comment 54 Matthias Hopf 2006-02-02 13:25:43 UTC
Important differences in logfiles (1280x1024 vs. 1400x1050):

-(--) RADEON(0): Virtual size is 1400x1050 (pitch 1408)
+(--) RADEON(0): Virtual size is 1280x1024 (pitch 1280)
-1400x1050     155.80  1400 1464 1784 1912  1050 1052 1064 1090 (24,32) +H +V
-1400x1050     155.80  1400 1464 1784 1912  1050 1052 1064 1090 (24,32) +H +V
-(**) RADEON(0): Pitch = 11534512 bytes (virtualX = 1400, displayWidth = 1408)
-(**) RADEON(0): dc=15580, of=31160, fd=138, pd=2
+1280x1024     190.96  1280 1376 1520 1760  1024 1025 1028 1085 (24,32)
+1280x1024     190.96  1280 1376 1520 1760  1024 1025 1028 1085 (24,32)
+(**) RADEON(0): Pitch = 10485920 bytes (virtualX = 1280, displayWidth = 1280)
+(**) RADEON(0): dc=19096, of=38192, fd=170, pd=2
-(**) RADEON(0): Wrote: 0x0000000c 0x0001008a 0x00000000 (0x0000a400)
-(**) RADEON(0): Wrote: rd=12, fd=138, pd=1
-(**) RADEON(0): GRPH_BUFFER_CNTL from 20204c4c to 20227c7c
+(**) RADEON(0): Wrote: 0x0000000c 0x000100aa 0x00000000 (0x0000a400)
+(**) RADEON(0): Wrote: rd=12, fd=170, pd=1
+(**) RADEON(0): GRPH_BUFFER_CNTL from 20204c4c to 202a7c7c
-(II) RADEON(0): Memory manager initialized to (0,0) (1408,8191)
-(II) RADEON(0): Reserved area from (0,1050) to (1408,1052)
-(II) RADEON(0): Largest offscreen area available: 1408 x 7139
+(II) RADEON(0): Memory manager initialized to (0,0) (1280,8191)
+(II) RADEON(0): Reserved area from (0,1024) to (1280,1026)
+(II) RADEON(0): Largest offscreen area available: 1280 x 7165
-(**) RADEON(0): Pitch for acceleration = 176
+(**) RADEON(0): Pitch for acceleration = 160
-(II) RADEON(0): Using hardware cursor (scanline 1052)
-(II) RADEON(0): Largest offscreen area available: 1408 x 7136
+(II) RADEON(0): Using hardware cursor (scanline 1026)
+(II) RADEON(0): Largest offscreen area available: 1280 x 7161
 (**) RADEON(0): RADEONDoAdjustFrame(0,0,0)
 (**) RADEON(0):  -> reg     : 0x0224 = 0x00000000
-(**) RADEON(0):     regcntl : 0x0350 = 0xbfbc24a4
+(**) RADEON(0):     regcntl : 0x0350 = 0xbfae1bc4

No obvious memory setup differences, so we have to dig deeper.

I also just confirmed that the whole time is spent in fbCopyAreammx - in both cases.
Comment 55 Matthias Hopf 2006-02-02 13:26:25 UTC
(In reply to comment #53)
> Are any other resolutions affected?

1600x1200 is as well, so I assume that anything higher than 1280x1024 is.
Comment 56 Matthias Hopf 2006-02-02 14:12:26 UTC
Ok, I have an explanation, but no solution so far:

For 1280x1024 framebuffers the off-screen pixmap seems to be created in regular memory (src+dest bytes):

fbCopyAreammx: src 0xaf4d3008 0/0 dest 0xaef5f008 0/0 size 1400/1021
fbCopyAreammx: src bytes 0xaf4d3060 stride 15e0 dest bytes 0xaef5f060 stride 15e0 byte_width 15e0
fbCopyAreammx: 0.037907s
aef5f000-afa6f000 rw-p aef5f000 00:00 0 
afa6f000-b7a6f000 rw-s d8000000 03:02 26724      /dev/mem

while for 1400x1050 the pixmap is created in card memory:
fbCopyAreammx: src 0x820fcc0 0/1055 dest 0x820fd20 0/2105 size 1400/1021
fbCopyAreammx: src bytes 0xb0048a00 stride 1600 dest bytes 0xb05ec600 stride 1600 byte_width 15e0
fbCopyAreammx: 0.981807s
081c8000-0838a000 rw-p 081c8000 00:00 0          [heap]
afa9e000-b7a9e000 rw-s d8000000 03:02 26724      /dev/mem

This happens because the requested pixmaps are of size 1400x1021 and 1400x1050. The current memory manager is stride based; that is, it can only allocate pixmaps in graphics memory up to the framebuffer width.

Card memory is incredibly slow to read, that's what's hitting us here. So what actually should happen is that the XAA CopyArea function should have been called, which does everything in graphics memory with the GPU.
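
The numbers in the debug output above can be checked with a little arithmetic. The following is a sketch in Python, assuming 4 bytes per pixel (which matches a byte_width of 0x15e0 for 1400 pixels) and a simplified placement rule in which a pixmap lands in card memory only if its width does not exceed the framebuffer pitch. The function names are hypothetical; the pitches (1280 and 1408) and timings are taken from the logs in this report.

```python
BYTES_PER_PIXEL = 4  # assumption: 32 bits per pixel


def row_bytes(pixels):
    """Bytes needed for one row of the given width in pixels."""
    return pixels * BYTES_PER_PIXEL


def fits_in_card_memory(pixmap_width, framebuffer_pitch):
    """Simplified stride-based rule: pixmap must be no wider than the pitch."""
    return pixmap_width <= framebuffer_pitch


def throughput_mb_per_s(width, height, seconds):
    """Copy throughput in MB/s (1 MB = 10**6 bytes) for one full copy."""
    return row_bytes(width) * height / seconds / 1e6


if __name__ == "__main__":
    print(hex(row_bytes(1400)))             # 0x15e0: byte_width and system-memory stride
    print(hex(row_bytes(1408)))             # 0x1600: stride at pitch 1408
    print(fits_in_card_memory(1400, 1280))  # False: 1280x1024 case, pixmap in regular memory
    print(fits_in_card_memory(1400, 1408))  # True: 1400x1050 case, pixmap in card memory
    print(throughput_mb_per_s(1400, 1021, 0.037907))  # regular memory
    print(throughput_mb_per_s(1400, 1021, 0.981807))  # card memory
```

Under these assumptions the copy moves roughly 150 MB/s in regular memory but under 6 MB/s when the CPU reads from card memory, which accounts for the one-second delay.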
Comment 57 Michael Meeks 2006-02-02 14:51:52 UTC
FWIW - and I guess since you're hot on the trail here - someone asked if this still occurs in B3 - and it does :-)
Comment 58 Egbert Eich 2006-02-02 16:04:23 UTC
XAAComposite should catch this situation before one gets down there.
I wonder why the MMX layer is not implemented as another wrapper layer.
Comment 59 Egbert Eich 2006-02-02 16:53:26 UTC
XAACopy should probably catch the same situations as those for which fbCopyAreammx is called.
I would be curious what happens if fbCopyAreammx isn't used at all but the general fallback code in the fb layer. Would be funny if this was faster but it would explain why this situation hasn't been caught before. 
Comment 60 Matthias Hopf 2006-02-02 17:30:07 UTC
It seems that it actually has only been pretty fast in this setup due to bugs in the fb layer code. E.g. the repeat flag was on but not honored correctly. This didn't do any harm in this case, but it could in others.
Comment 61 Matthias Hopf 2006-02-02 17:56:17 UTC
Created attachment 66232 [details]
Proposed patch for improving XAAComposite Fastpath

This patch improves the Fastpath from XAAComposite so that this corner case (yes, it is a corner case) is accelerated as well.
I have to discuss this patch upstream, as I'm not 100% sure about some variables. As the Slowpath is definitely broken WRT some others, this shouldn't create any regressions, though.

Results here (Radeon 7500):
Slow path (Framebuffer width < Pixmap width): 37ms
Fast path (Framebuffer width >= Pixmap width):
   without patch:   900-1200ms
   with patch:      1-1.2ms

Acceleration factor: approx. 1000   =-)
Comment 62 Matthias Hopf 2006-02-02 17:59:07 UTC
Stefan, please apply to stable.


From the Xorg bugzilla #4320:
> This makes a lot of sense.  As Keith and Carl mentioned once, there is code in
> the server-side implementation of RENDER that needs to detect that it is copying
> a non-alpha pixmap to another pixmap, and it can just use CopyArea instead of
> copying the pixels by hand.

Actually, that is (partially) already done in XComposite, Cairo only hit a
corner case that wasn't accelerated.

I would call this a bug in Cairo as well, 1st) because it uses PictOpOver with a source without alpha and without a mask (should do PictOpSrc in this case), 2nd) because it enables repeat even for 1:1 copies.

Anyone willing to report this to Cairo guys? I know David is working on glitz,
but this should be detected and worked around earlier in the library.
Comment 63 Federico Mena Quintero 2006-02-02 19:48:46 UTC
Thanks for following up in https://bugs.freedesktop.org/show_bug.cgi?id=4320, Matthias - we are making excellent progress there.

In the meantime, I'll attach a patch which works around this for our gtk2 package.  With this, my Nautilus desktop is fast again.

Summary: CAIRO_EXTEND_REPEAT with a source pixmap pattern and a destination pixmap surface is slow because of bugs in Cairo and XRENDER.  The patch makes GTK+ turn on REPEAT only when absolutely necessary.
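
One way to decide when REPEAT is "absolutely necessary" is a simple geometry test. The sketch below is in Python and is not the attached patch; `needs_repeat` and its parameters are hypothetical. It assumes the source pixmap is placed once at a given origin and that repeat is needed only when the fill rectangle reaches outside that single copy. A caller would then choose CAIRO_EXTEND_REPEAT or CAIRO_EXTEND_NONE accordingly.

```python
def needs_repeat(fill_x, fill_y, fill_w, fill_h, tile_w, tile_h,
                 origin_x=0, origin_y=0):
    """Return True if the fill rectangle is not covered by one copy of the tile.

    All values are in pixels; (origin_x, origin_y) is where the tile's
    top-left corner sits in the destination.
    """
    left = fill_x - origin_x
    top = fill_y - origin_y
    return (left < 0 or top < 0
            or left + fill_w > tile_w
            or top + fill_h > tile_h)


if __name__ == "__main__":
    # A background pixmap the size of the screen: a 1:1 copy, no repeat needed.
    print(needs_repeat(0, 0, 1024, 768, 1024, 768))
    # A small tile filling a large area: repeat is required.
    print(needs_repeat(0, 0, 1400, 1050, 64, 64))
```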
Comment 64 Federico Mena Quintero 2006-02-02 19:53:24 UTC
Created attachment 66252 [details]
gtk2-117163-cairo-repeat-pattern-workaround.diff
Comment 65 Federico Mena Quintero 2006-02-02 20:38:28 UTC
Submitted to autobuild; this will be in gtk2-2.8.10-7.
Comment 66 Federico Mena Quintero 2006-02-03 01:33:34 UTC
Damn.  My patch works when you log in, and then it varies between working and not working depending on how full of pixmaps the video card becomes as you use your session.
Comment 67 Stefan Dirsch 2006-02-03 09:25:30 UTC
Matthias' patch is submitted now for Beta4. --> FIXED
Comment 68 Federico Mena Quintero 2006-02-03 21:27:04 UTC
Created attachment 66427 [details]
Updated gtk2-117163-cairo-repeat-pattern-workaround.diff

I'll put this updated patch in our gtk2 package.  It fixes an offsetting problem with the previous patch, and falls back to known-to-be-fast code.