Bugzilla – Bug 144902
MozillaFirefox sometimes very slow with several opened windows and tabs.
Last modified: 2007-11-02 18:54:31 UTC
I'm using MozillaFirefox 1.5, and now 1.5.0.1, and most of the time I have 4 windows open, some of them with several tabs. When opening a URL or link I very often see the CPU load increase for a few minutes, and all windows are locked (no refresh), mostly for the same length of time. I'm using FasterFox, NoScript, Adblock, Extended Cookie Manager and Tab Mix Plus as extensions. But even FasterFox does not help. The question is: what causes this lock (DNS lookup, data stream from the proxy or server), and why does such an event block the refresh and use of all other windows and tabs rather than only the new one?
Could you please test if this still happens if you use a clean profile without those extensions? You can create another profile with firefox -ProfileManager (while firefox is not running)
Hmmm ... I've done that, and it seems a bit better. Even after downloading all extensions except FasterFox and using the Phoenity Modern theme it still seems better. Using my old setup with FasterFox disabled I saw some speedup, but it isn't as fast as a virgin firefox that has never seen FasterFox (the restore isn't 100%). IMHO this is a problem with the pipelining of the data from the servers or proxy. As long as the pipelining is active and the load is high, an strace attached to the pid of firefox shows that firefox polls network sockets. Compared with home, where I have an ISDN connection, here at work the data flood overruns firefox. Maybe a sched_yield() and/or pthread_yield(), and setting locks only on the window/tab for the new content, would help a bit.
Btw: the former version 1.0 doesn't have this problem, but version 1.5 does.
Just tried disabling IPv6 DNS lookup and using .suse.de instead of suse.de in the No Proxy list. It seems faster now.
Argh, it seems that there are sites that still show this problem after all. No cache, DNS expired, and the redraws are gone for slow sites.
IMHO the thread engine of Firefox is not NPTL safe. It uses neither pthread_yield() nor sigtimedwait() and assumes that threads have their own PID, which is wrong on SL 10.1.
Created attachment 65052 [details] first shot for better NPTL support, but this shot does not build due to an exception in shlibsign. Don't know what the difference is for the program shlibsign between the NPTL and LinuxThreads versions ... it simply does not use this.
Werner, thanks for the patch and the analysis. I will check if we can take this. As Robert might be interested as well I'm reassigning this back to bnc-team-mozilla (which is actually Robert and me).
One more point: at home I have an SMP system connected via ISDN. Just to mention it, firefox works flawlessly there with the same extension and theme setup. And these nasty locks only occur if a server, with or without proxy, takes some seconds longer than normal to respond (e.g. innerweb.novell.com). IMHO there is a timeout value which, if expired, leads to a busy loop ... or a timeout value was never set for a special case triggered by e.g. TabMixPlus. The last message is `Waiting for ...' for several minutes.
I run on an SMP machine all the time and I don't think I've seen this problem in 10.0, which also has all threads sharing the same pid. So what has changed in 10.1?
By using LinuxThreads instead of NPTL, and monitoring with a netstat call repeated every second by the watch program (watch -n 1 netstat -tuep), I've found that during the periods of high load there is either no connection to the proxy or the existing one(s) are being closed. When a new connection is made, the windows become accessible and get a refresh. On some pages this happens several times because of several included references to other servers.
I've added .novell.com to the no proxy list. Now I can use *.novell.com as fast as the local suse.de.
This sounds like a bug in the proxy code. Wolfgang, can you reproduce it with the suse.de proxy? Can I use that proxy?
I will try it tomorrow. You can try to use the proxy through VPN. It has an official Novell ip-address and is available as proxy.suse.de. Don't know if client ip range is defined.
Hmmm ... I've learned that this does not depend on whether a proxy is used, but only on the CPU and memory of the system. On a slow system in combination with slow network connections/data rates I found that the sockets disappear or are removed after a timeout, but no new sockets are created for a long time. And this is AFAICS a major difference between Firefox 1.5 and 1.0.7 -> netwerk/base/src/nsSocketTransportService2.cpp. This is visible with the `watch -n 1 netstat -tuep' command, which lists all tcp and udp sockets every second, including the process name and its pid.
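The check done with `watch -n 1 netstat -tuep' can also be scripted, so that the number of sockets owned by the browser is logged over time. The following is a minimal Python sketch, not anything taken from the Firefox sources; the helper names, the program name `firefox-bin' and the sample netstat rows are assumptions made up for illustration.

```python
import subprocess
import time

SOCKET_PROTOCOLS = ("tcp", "tcp6", "udp", "udp6")


def sockets_of(netstat_output, program="firefox-bin"):
    """Return the netstat rows whose last column (PID/Program name) names `program`."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or fields[0] not in SOCKET_PROTOCOLS:
            continue  # skip header lines and anything that is not a socket row
        if fields[-1].split("/", 1)[-1] == program:
            rows.append(line)
    return rows


def watch_sockets(program="firefox-bin", interval=1.0, rounds=60):
    """Print a timestamped socket count once per interval, similar to `watch -n 1`."""
    for _ in range(rounds):
        result = subprocess.run(["netstat", "-tuep"], capture_output=True, text=True)
        count = len(sockets_of(result.stdout, program))
        print(time.strftime("%H:%M:%S"), count, "socket(s)")
        time.sleep(interval)


# Hypothetical sample rows, invented only to exercise the parser.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address   Foreign Address  State       User   Inode  PID/Program name
tcp        0      0 host:41234      proxy:3128       ESTABLISHED user   12345  22297/firefox-bin
tcp        0      0 host:41240      mail:imap        ESTABLISHED user   12350  4711/thunderbird-bin
udp        0      0 host:32768      dns:domain                   user   12360  22297/firefox-bin
"""

firefox_rows = sockets_of(SAMPLE)
print(len(firefox_rows), "row(s) belong to firefox-bin in the sample")
```

Calling watch_sockets() while a slow page loads would show whether the count drops to zero during the periods of high load, which is the pattern described above.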
Can you get an strace with timestamps?
I've done that but it is extremely large even when bzip2ed.
Created attachment 66706 [details] strace log with LD_ASSUME_KERNEL=2.4.21 and timestamps
Help ... the more windows and tabs are open, the slower a page loads over slow connections. If the page is not loaded after 2 seconds, everything locks up for up to several minutes. I've used ltrace and found that many JS_ functions combined with PR_AtomicIncrement()/PR_AtomicDecrement() pairs are called, with a load of 100% on the first thread of 3 or more threads. Why does this occur? I suppose it would be better to load the data first before doing this ... but the creation of sockets is normally rare, as netstat told me. Besides this, the behaviour is also visible with Firefox 1.0.7. I cannot remember this with the old mozilla. There was only an old bug with DNS lookups which had also caused locks of the graphical output/refresh of all windows and tabs.
firefox/default> strace -r -ttt -p 22297
Process 22297 attached - interrupt to quit
0.000000 gettimeofday({1139505107, 245845}, NULL) = 0
6.592096 gettimeofday({1139505113, 837935}, NULL) = 0
10.839461 gettimeofday({1139505124, 677391}, NULL) = 0
3.340653 gettimeofday({1139505128, 18178}, NULL) = 0
No polling for about 20 seconds ...
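The gaps between these calls can be checked from the gettimeofday() arguments themselves. This is a minimal Python sketch that uses only the four trace lines quoted above; the helper names are made up for illustration, and the arithmetic is done in integer microseconds to avoid floating-point rounding.

```python
import re

# The four syscall lines from the strace output quoted in this comment.
STRACE_LINES = """\
0.000000 gettimeofday({1139505107, 245845}, NULL) = 0
6.592096 gettimeofday({1139505113, 837935}, NULL) = 0
10.839461 gettimeofday({1139505124, 677391}, NULL) = 0
3.340653 gettimeofday({1139505128, 18178}, NULL) = 0
"""

# Captures the seconds and microseconds that gettimeofday() returned.
TIMEVAL = re.compile(r"gettimeofday\(\{(\d+), (\d+)\}")


def wall_clock_us(text):
    """Return each gettimeofday() result as integer microseconds since the epoch."""
    return [int(sec) * 1_000_000 + int(usec) for sec, usec in TIMEVAL.findall(text)]


def gaps_us(times):
    """Return the time between consecutive calls, in microseconds."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


times = wall_clock_us(STRACE_LINES)
gaps = gaps_us(times)
total_us = times[-1] - times[0]

for gap in gaps:
    print(f"gap: {gap / 1_000_000:.6f} s")
print(f"total: {total_us / 1_000_000:.6f} s")
```

For these four lines the gaps come to roughly 6.6 s, 10.8 s and 3.3 s, about 20.8 s in total, which agrees with the relative timestamps in the first column.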
OK, just tried out Deer Park Alpha 2 version 1.6a1 and, besides the missing extensions `Form History Manager' and `Extended Cookie Manager', it is fast _and_ during the download of one or more windows or tabs the other windows and tabs are accessible, and I can also access the taskbars and menus. Even when restoring a session with `Tab Mix Plus' with a huge number of windows and tabs, all windows and tabs are accessible within a few seconds.
> OK, just tried out Deer Park Alpha 2 version 1.6a1
1.6a1 is not Deer Park Alpha 2. Which one was it?
I guess trunk still identifies as Deer Park alpha2 :-( Werner, if it was a nightly it would be important which one exactly. What was the download location?
The version is exactly what I've found in `Help' -> `About Deer Park Alpha 2'. The installation was done on February the 10th. Is there a unique identifier within that tarball `firefox-1.6a1.en-US.linux-i686.tar.bz2'?
In the Help --- About window, there's a string that looks like Gecko/20050920. That's what we need. Actually, what would help us most of all is if you went to http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly and did a binary search through the builds there to find the first build that works. Then we can probably figure out what patch fixed the bug, and port/backport it to our builds.
OK, here we are: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20060210 Firefox/1.6a1
Btw: even the innerweb is very fast, much faster than with the old Mozilla.
Would you be able to do what I mentioned in comment #26?
This is what I've provided in comment #27 ... see Gecko/20060210
How about this?
> Actually, what would help us most of all is if you went to
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly and did a binary search
> through the builds there to find the first build that works.
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2006-02-10-05-trunk/firefox-1.6a1.en-US.linux-i686.tar.bz2 and now what about this?
Are you saying that that is the first build that works? Builds from 2006-02-09 are slow?
No, the only fact is that this was almost my first try and it worked the first time. I've no idea if the problem was already fixed in any earlier version. It was just an attempt to get back a working browser even with many opened windows and tabs.
If you do a binary search backwards through time to locate the first build that works again for you, then we might be able to pinpoint the change that fixed the bug and backport it to our 1.5 release for Code10 or whatever. You would only have to download and test 10 builds or so.
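The binary search suggested here can be sketched as follows. This is only an illustration in Python: the list of builds and the `works' check are hypothetical stand-ins for downloading a nightly and testing it by hand, and the sketch assumes that every build after the fix works and that the newest build in the list is known to work.

```python
def first_working_build(builds, works):
    """Find the earliest working build by bisection.

    builds: list ordered from oldest to newest.
    works:  callable returning True if the given build does not show the bug.
    Assumes all builds before the fix fail, all builds from the fix onward
    work, and the newest build works.
    Returns (build, number_of_builds_tested).
    """
    lo, hi = 0, len(builds) - 1
    tested = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tested += 1
        if works(builds[mid]):
            hi = mid        # the fix landed at mid or earlier
        else:
            lo = mid + 1    # the fix landed after mid
    return builds[lo], tested


# Hypothetical example: 1000 nightlies numbered 0..999, fix landed in number 700.
nightlies = list(range(1000))
first_good, tested = first_working_build(nightlies, lambda build: build >= 700)
print(f"first working build: {first_good}, builds tested: {tested}")
```

With about 1000 candidate builds this needs at most 10 tests, which is consistent with the estimate of 10 builds or so.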
Same problem for me. I'm using Suse 10.1 RC1 64 bit. Firefox is ALWAYS very, very, very slow in opening windows and contacts. Konqueror and thunderbird work fast instead. I've tried re-installing, and also tried on two different PCs with socket 939 and AMD 64 dual core CPUs. The problem is unchanged (it was the same on suse 10.1 beta 9). Bye.
Just to be noted: plain firefox 2.0.0.1 works flawlessly.
Well, we've shipped 2.0.0.1 (since updated to 2.0.0.2) for both 10.1 and Code10 (for the sake of security fixes, IIRC). Based on the last comment, then, I assume we can call this FIXED.