Bugzilla – Bug 555803
zypper package download hangs forever (waiting for aria2c)
Last modified: 2011-01-25 09:03:34 UTC
Hi, I started 'zypper -v up' on a Factory system (last updated shortly before 11.2 release). [I don't find the component/product Factory in Bugzilla, hence the report for 11.2. But the software might not match exactly 11.2.) # rpm -q aria2 zypper libzypp aria2-1.5.2-1.3.i586 zypper-1.2.5-1.2.i586 libzypp-6.18.0-1.2.i586 The zypper update eventually hangs when downloading one package. Here's debugging info. root@linux-7q6o:~ # cat /dev/vcs1 | fmt -w 96 | tail -2 Retrieving package opensuse-manuals_en-11.2-5.2.noarch (5/667), 20.6 MiB (30.7 MiB unpacked) Retrieving: opensuse-manuals_en-11.2-5.2.noarch.rpm [80% (192.0 KiB/s)] root 2680 0.0 0.3 5176 2000 ? Ss 00:13 0:00 login -- root root 2793 0.0 0.4 5484 2344 tty1 Ss 00:14 0:00 \_ -zsh root 2814 0.0 0.4 4484 2296 tty1 S 00:14 0:00 \_ zsh root 2819 0.3 7.7 49128 39396 tty1 S+ 00:14 0:08 \_ zypper -v up root 9716 0.2 0.8 8404 4184 tty1 S+ 00:17 0:06 \_ aria2c --user-agent=ZYpp 6.18.0 (aria2 version 1.5.2) openSUSE-11.2-i586 --summary-interval=1 --follow-metalink=mem --check-integrity=true --file-allocation=none --server-stat-of=/var/cache/zypp/aria2.stats --server-stat-if=/var/cache/zypp/aria2.stats --uri-selector=adaptive --use-head=false --max-tries=5 --max-concurrent-downloads=2 --header=X-ZYpp-AnonymousId: da93627e-e5ca-42ba-8ebc-122fe15c9a8d --header=X-ZYpp-DistributionFlavor: ftp --header=Pragma: --connect-timeout=180 --dir=/var/adm/mount/AP_0x00000003/suse/noarch http://download.opensuse.org/factory/repo/oss/suse/noarch/opensuse-manuals_en-11.2-5.2.noarch.rpm root@linux-7q6o:~ # tail /var/log/aria2.log 2009-11-15 00:17:20.772930 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:20.780089 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:20.795122 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:20.833372 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:20.844119 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:20.850661 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:20.850825 INFO - CUID#10 - The download for one segment completed successfully. 2009-11-15 00:17:20.850872 DEBUG - Hash is available! index=29 2009-11-15 00:17:20.850998 INFO - Good chunk checksum. hash=50e4e715bbeefbbe73f484b9744ef82ba943f3ea 2009-11-15 00:17:20.851053 INFO - The download was complete. root@linux-7q6o:~ # lsof -p 9716 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME aria2c 9716 root cwd DIR 8,2 4096 113121 /root aria2c 9716 root rtd DIR 8,2 4096 2 / aria2c 9716 root txt REG 8,2 2620264 17518 /usr/bin/aria2c aria2c 9716 root DEL REG 8,2 444624 /var/run/nscd/dblOHSt1 aria2c 9716 root mem REG 8,2 256316 41319 /usr/lib/locale/en_US.utf8/LC_CTYPE aria2c 9716 root mem REG 8,2 14016 258605 /lib/libdl-2.10.1.so aria2c 9716 root mem REG 8,2 1430104 258573 /lib/libc-2.10.1.so aria2c 9716 root mem REG 8,2 124540 258848 /lib/libgcc_s.so.1 aria2c 9716 root mem REG 8,2 165924 258626 /lib/libm-2.10.1.so aria2c 9716 root mem REG 8,2 966984 457527 /usr/lib/libstdc++.so.6.0.12 aria2c 9716 root mem REG 8,2 75416 258726 /lib/libz.so.1.2.3 aria2c 9716 root mem REG 8,2 157180 258974 /lib/libexpat.so.1.5.2 aria2c 9716 root mem REG 8,2 1495732 457241 /usr/lib/libcrypto.so.0.9.8 aria2c 9716 root mem REG 8,2 303232 457593 /usr/lib/libssl.so.0.9.8 aria2c 9716 root mem REG 8,2 26048 24738 /usr/lib/gconv/gconv-modules.cache aria2c 9716 root mem REG 8,2 125684 258567 /lib/ld-2.10.1.so aria2c 9716 root 0r FIFO 0,8 0t0 24895 pipe aria2c 9716 root 1w FIFO 0,8 0t0 24896 pipe aria2c 9716 root 2w FIFO 0,8 0t0 24896 pipe aria2c 9716 root 3w CHR 1,3 0t0 2276 /dev/null aria2c 9716 root 4w REG 8,2 242220104 445232 /var/log/aria2.log aria2c 9716 root 5u 0000 0,9 0 357 anon_inode aria2c 9716 root 6u IPv4 24901 0t0 TCP linux-7q6o.localdomain:58075->pontifex.opensuse.org:http (CLOSE_WAIT) aria2c 9716 root 7u 0000 0,9 0 357 anon_inode aria2c 9716 root 8u REG 8,2 21563559 145660 /var/adm/mount/AP_0x00000003/suse/noarch/opensuse-manuals_en-11.2-5.2.noarch.rpm aria2c 9716 root 15u IPv4 24986 0t0 TCP linux-7q6o.localdomain:55526->mirror.leaseweb.net:http (CLOSE_WAIT) aria2c 9716 root 16u 0000 0,9 0 357 anon_inode root@linux-7q6o:~ # strace -f -p 9716 Process 9716 attached - interrupt to quit epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258241244 time(NULL) = 1258241244 time(NULL) = 1258241244 time(NULL) = 1258241244 gettimeofday({1258241244, 386798}, NULL) = 0 epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258241245 time(NULL) = 1258241245 time(NULL) = 1258241245 time(NULL) = 1258241245 gettimeofday({1258241245, 412522}, NULL) = 0 epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258241246 time(NULL) = 1258241246 time(NULL) = 1258241246 time(NULL) = 1258241246 gettimeofday({1258241246, 440871}, NULL) = 0 epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258241247 time(NULL) = 1258241247 time(NULL) = 1258241247 time(NULL) = 1258241247 gettimeofday({1258241247, 484179}, NULL) = 0 epoll_wait(5, ^C <unfinished ...> Process 9716 detached Two peculiarities here. aria2c keeps epoll'ing on fd 5, which is listed as anon_inode by the system(?). Second, while aria2c logged "The download was complete", the last output of zypper was "80%", and aria2c still runs...
'rpm' output shows that the system had last been updated on the 5th of November: root@linux-7q6o:~ # rpm -qa --last | head opensuse-manuals_en-pdf-11.2-5.2 Sun Nov 15 01:17:08 2009 poppler-data-0.2.1-5.1 Sun Nov 15 01:17:05 2009 providers-2006.11.21-108.1 Sun Nov 15 01:16:58 2009 release-notes-openSUSE-11.2.4-1.2 Sun Nov 15 01:16:54 2009 gnome-control-center-2.28.0-1.4 Thu Nov 5 21:28:51 2009 libcanberra-gtk-0.18-2.3 Thu Nov 5 21:28:49 2009 libgstinterfaces-0_10-0-0.10.24-1.10 Thu Nov 5 21:28:47 2009 metacity-2.28.0-1.2 Thu Nov 5 21:28:36 2009 gstreamer-0_10-plugins-base-0.10.24-1.10 Thu Nov 5 21:28:34 2009 libcanberra-gtk0-0.18-2.3 Thu Nov 5 21:28:32 2009
Upon pressing Ctrl-C at zypper, aria2c only logs a line that it plans to shut down, but keeps going: 2009-11-15 01:05:57.694419 NOTICE - Shutdown sequence commencing... Press Ctrl-C again for emergency shutdown. ^Z [1] + 17830 suspended tail -f /var/log/aria2.log root@linux-7q6o:~ # bg [1] + 17830 continued tail -f /var/log/aria2.log root@linux-7q6o:~ # strace -f -p 9716 Process 9716 attached - interrupt to quit epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258243596 time(NULL) = 1258243596 gettimeofday({1258243596, 430183}, NULL) = 0 epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258243597 time(NULL) = 1258243597 gettimeofday({1258243597, 470358}, NULL) = 0 epoll_wait(5, {}, 1024, 1000) = 0 time(NULL) = 1258243598 time(NULL) = 1258243598 gettimeofday({1258243598, 514590}, NULL) = 0
Created attachment 327693 [details] aria2.log (that I luckily have configured...)
I wonder if this is a common aria2 problem or specific to running from zypper.
Tatsuhiro, maybe you can make something out of this? It is aria2c version 1.5.2.
Can you reproduce the problem by directly invoking aria2c? aria2c http://download.opensuse.org/factory/repo/oss/suse/noarch/opensuse-manuals_en-11.2-5.2.noarch.rpm I tried the above command and the only (slight) problem was that one server was slow so last stage of download took more time. But the log said very long pause after pressing ctrl-c.
That command works well for me. It downloads quickly and terminates immediately once finished. The file descriptor 5 being "anon_inode" is normal, by the way, and it's always present; it simply is the epoll file descriptor acquired by epoll_create(), and that's how the system shows it. The system where I observed the problem was suspended during the last days; I kept it in that state to leave it available for further debugging. Believe it or not, it continued downloading (and installing) today. The exact same zypper process is now continuing the update. So it looks that the aria2c process finally did terminate after all.
I think #5 fd is no problem if it is created by epoll_create(). It continues to sit there until aria2 process is shutdown. From the log, aria2 checked all 83 piece hashes. It means it correctly downloaded all pieces(a complete file). But zypper showed 80% complete. That's weird. Invoking aria2c works well, then, is there some problems between zypper and aria2c, ah something like communication problem?
I'm not sure. It looked as if zypper is waiting for aria2c, and aria2c is waiting for something as well (keeping looking at the poll set). When aria2c was done (which the logs seem to indicate) it should have terminated I think.
I checked code, but still could not find the cause of problem. It may be fixed in latest version 1.6.3. >Believe it >or not, it continued downloading (and installing) today. It means aria2 process finally exited? If so, is there a log after "2009-11-15 01:05:57.694419 NOTICE - Shutdown sequence commencing... Press Ctrl-C again for emergency shutdown." ?
Yes, I think that aria2c finally resumed from the hang and terminated, thus zypper continued its work. However, I don't think I'll be able to find another log line, because all aria2c calls log to the same log file (also my attempted manual downloads), thus the log is cluttered with other aria2c calls meanwhile. That's because I configured the log via .aria2/aria2.conf. It would be helpful if the PID was also logged I guess. It would also be helpful if libzypp would maintain its own aria2c log file. It doesn't log at all, in fact.
I can confirm this bug. I did a fresh install of 11.2 x86_64 and the updating process hung quite a few times, e.g. when downloading the openssl, libtalloc and ntfs-3g updates. Very annoying. I can't remember this happening on 11.1. Killing aria2c and retrying the download helped.
I think I found the cause of this mysterious hang. I just noticed this in the log Peter attached: 2009-11-15 01:17:16.809663 DEBUG - CUID#9 - socket: read:1, write:0, hup:0, err:0 2009-11-15 00:17:18.846709 DEBUG - CUID#12 - socket: read:1, write:0, hup:0, err:0 The time is rewinded about 1hr. In aria2c, I use gettimeofday() to get instant time as a base and do some deferred tasks after certain timeout. But because of this rewind of time, it takes long time to timeout and from outside world, aria2c appears to be hanged. I think after long delay(I think 1hr) aria2c finally exited and this matches exactly what Peter said. I'll fix this problem in the next release.
Eagle eye ;)
Ah, that makes sense. Thank you very much, Tatsuhiro!
I had the same issue with 11.2 today while calling "zypper patch". After I waited maybe 5 or 10 minutes while nothing happened on stdout zypper asked for confirmation. But now it hangs again while downloading /var/adm/mount/AP_0x00000001/rpm/i586/gimp-help-browser-2.6.7_2.6.8-4.4_1.1.1.i586.delta.rpm.aria2 As soon as it finished I'll attach the log. While following the log output I saw failing downloads from a host using an IPv6 address. Next I checked my v6 tunnel and it was broken. Therefore I disabled it for this test.
Created attachment 354391 [details] aria2 log while zypper patch on openSUSE 11.2
I checked the log, aria2 tried to connect IPv6 address but all the attempt(5times) were timed out. Each timeout is 60secs, so 5min hang. I think disabling IPv6 will fix the problem. BTW, we fixed the original cause of this issue reported by Peter in the latest release 1.9.1.
Thanks. Keeping the bug open as reminder to try to improve the logging. If we'd print a message on timeouts and retries, it would be easier to guess whats going on.
For me, the same hanging happens with the following versions: openSUSE 11.2 # rpm -q aria2 zypper libzypp aria2-1.9.3-0.1.1.i586 zypper-1.3.4-0.1.3.i586 libzypp-6.31.4-0.1.1.i586
Workaround: export ZYPP_ARIA2C=0 (or remove/rename the /usr/bin/aria2c binary. Removing the aria2 package fails because libzypp still requires it). libzypp will fallback to libcurl then. On 11.4 a libcurl based implementation supporting zsync and metalink will replace aria2 as default downloader.
Yesterday I installed opensuse11.3 to see how zypper with aria2 behaves. I installed gcc package and I experienced "hang". After successful several package downloads, zypper seemed to hang for several minutes. Luckily, I enabled aria2 log, so inspected it. The cause of hang was that aria2 was trying to connect a server but for some reason it could not to connect it and aria2 waited 3minutes timeout and retried 5 times. This means 15 minutes hang. Actually, after 15mins, aria2 gave up this server and tried another one and all download completed eventually. I think zypper's default 3mins timeout is rather long. Lowering it reduces hang. Also --uri-selector=adaptive tends to return low prioritized servers which is more likely unreachable, so I recommend to not to use it. I tried to change these aria2 option values, but due to my short opensuse experience(just 2days), I have no idea how to do that. Is there a way to change these option value?
*** Bug 666429 has been marked as a duplicate of this bug. ***
In case of trouble disable the aria backend (see comment#21)