Bug 555803

Summary: zypper package download hangs forever (waiting for aria2c)
Product: [openSUSE] openSUSE 11.2 Reporter: Forgotten User 4yKfwBnymp <forgotten_4yKfwBnymp>
Component: libzyppAssignee: E-mail List <zypp-maintainers>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Enhancement    
Priority: P3 - Medium CC: albryan, lmuelle, ma, madigens, Markus.Elfring, tatsuhiro.t, verdelyi
Version: Final   
Target Milestone: ---   
Hardware: 32bit   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: aria2.log (that I luckily have configured...)
aria2 log while zypper patch on openSUSE 11.2

Description Forgotten User 4yKfwBnymp 2009-11-16 12:18:32 UTC
Hi,

I started 'zypper -v up' on a Factory system (last updated shortly before 11.2 release).

[I don't find the component/product Factory in Bugzilla, hence the report for 11.2. But the software might not match exactly 11.2.)

 # rpm -q aria2 zypper libzypp
aria2-1.5.2-1.3.i586
zypper-1.2.5-1.2.i586
libzypp-6.18.0-1.2.i586


The zypper update eventually hangs when downloading one package. Here's debugging info.


root@linux-7q6o:~ # cat /dev/vcs1 | fmt -w 96 | tail -2
Retrieving package opensuse-manuals_en-11.2-5.2.noarch (5/667), 20.6 MiB (30.7 MiB unpacked)
Retrieving: opensuse-manuals_en-11.2-5.2.noarch.rpm [80% (192.0 KiB/s)]




root      2680  0.0  0.3   5176  2000 ?        Ss   00:13   0:00 login -- root     
root      2793  0.0  0.4   5484  2344 tty1     Ss   00:14   0:00  \_ -zsh
root      2814  0.0  0.4   4484  2296 tty1     S    00:14   0:00      \_ zsh
root      2819  0.3  7.7  49128 39396 tty1     S+   00:14   0:08          \_ zypper -v up
root      9716  0.2  0.8   8404  4184 tty1     S+   00:17   0:06              \_ aria2c --user-agent=ZYpp 6.18.0 (aria2 version 1.5.2) openSUSE-11.2-i586 --summary-interval=1 --follow-metalink=mem --check-integrity=true --file-allocation=none --server-stat-of=/var/cache/zypp/aria2.stats --server-stat-if=/var/cache/zypp/aria2.stats --uri-selector=adaptive --use-head=false --max-tries=5 --max-concurrent-downloads=2 --header=X-ZYpp-AnonymousId: da93627e-e5ca-42ba-8ebc-122fe15c9a8d --header=X-ZYpp-DistributionFlavor: ftp --header=Pragma: --connect-timeout=180 --dir=/var/adm/mount/AP_0x00000003/suse/noarch http://download.opensuse.org/factory/repo/oss/suse/noarch/opensuse-manuals_en-11.2-5.2.noarch.rpm

root@linux-7q6o:~ # tail /var/log/aria2.log
2009-11-15 00:17:20.772930 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:20.780089 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:20.795122 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:20.833372 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:20.844119 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:20.850661 DEBUG - CUID#10 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:20.850825 INFO - CUID#10 - The download for one segment completed successfully.
2009-11-15 00:17:20.850872 DEBUG - Hash is available! index=29
2009-11-15 00:17:20.850998 INFO - Good chunk checksum. hash=50e4e715bbeefbbe73f484b9744ef82ba943f3ea
2009-11-15 00:17:20.851053 INFO - The download was complete.

root@linux-7q6o:~ # lsof -p 9716
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF   NODE NAME
aria2c  9716 root  cwd    DIR    8,2      4096 113121 /root
aria2c  9716 root  rtd    DIR    8,2      4096      2 /
aria2c  9716 root  txt    REG    8,2   2620264  17518 /usr/bin/aria2c
aria2c  9716 root  DEL    REG    8,2           444624 /var/run/nscd/dblOHSt1
aria2c  9716 root  mem    REG    8,2    256316  41319 /usr/lib/locale/en_US.utf8/LC_CTYPE
aria2c  9716 root  mem    REG    8,2     14016 258605 /lib/libdl-2.10.1.so
aria2c  9716 root  mem    REG    8,2   1430104 258573 /lib/libc-2.10.1.so
aria2c  9716 root  mem    REG    8,2    124540 258848 /lib/libgcc_s.so.1
aria2c  9716 root  mem    REG    8,2    165924 258626 /lib/libm-2.10.1.so
aria2c  9716 root  mem    REG    8,2    966984 457527 /usr/lib/libstdc++.so.6.0.12
aria2c  9716 root  mem    REG    8,2     75416 258726 /lib/libz.so.1.2.3
aria2c  9716 root  mem    REG    8,2    157180 258974 /lib/libexpat.so.1.5.2
aria2c  9716 root  mem    REG    8,2   1495732 457241 /usr/lib/libcrypto.so.0.9.8
aria2c  9716 root  mem    REG    8,2    303232 457593 /usr/lib/libssl.so.0.9.8
aria2c  9716 root  mem    REG    8,2     26048  24738 /usr/lib/gconv/gconv-modules.cache
aria2c  9716 root  mem    REG    8,2    125684 258567 /lib/ld-2.10.1.so
aria2c  9716 root    0r  FIFO    0,8       0t0  24895 pipe
aria2c  9716 root    1w  FIFO    0,8       0t0  24896 pipe
aria2c  9716 root    2w  FIFO    0,8       0t0  24896 pipe
aria2c  9716 root    3w   CHR    1,3       0t0   2276 /dev/null
aria2c  9716 root    4w   REG    8,2 242220104 445232 /var/log/aria2.log
aria2c  9716 root    5u  0000    0,9         0    357 anon_inode
aria2c  9716 root    6u  IPv4  24901       0t0    TCP linux-7q6o.localdomain:58075->pontifex.opensuse.org:http (CLOSE_WAIT)
aria2c  9716 root    7u  0000    0,9         0    357 anon_inode
aria2c  9716 root    8u   REG    8,2  21563559 145660 /var/adm/mount/AP_0x00000003/suse/noarch/opensuse-manuals_en-11.2-5.2.noarch.rpm
aria2c  9716 root   15u  IPv4  24986       0t0    TCP linux-7q6o.localdomain:55526->mirror.leaseweb.net:http (CLOSE_WAIT)
aria2c  9716 root   16u  0000    0,9         0    357 anon_inode

root@linux-7q6o:~ # strace -f -p 9716
Process 9716 attached - interrupt to quit
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258241244
time(NULL)                              = 1258241244
time(NULL)                              = 1258241244
time(NULL)                              = 1258241244
gettimeofday({1258241244, 386798}, NULL) = 0
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258241245
time(NULL)                              = 1258241245
time(NULL)                              = 1258241245
time(NULL)                              = 1258241245
gettimeofday({1258241245, 412522}, NULL) = 0
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258241246
time(NULL)                              = 1258241246
time(NULL)                              = 1258241246
time(NULL)                              = 1258241246
gettimeofday({1258241246, 440871}, NULL) = 0
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258241247
time(NULL)                              = 1258241247
time(NULL)                              = 1258241247
time(NULL)                              = 1258241247
gettimeofday({1258241247, 484179}, NULL) = 0
epoll_wait(5, ^C <unfinished ...>
Process 9716 detached



Two peculiarities here. aria2c keeps epoll'ing on fd 5, which is listed as anon_inode by the system(?). Second, while aria2c logged "The download was complete", the last output of zypper was "80%", and aria2c still runs...
Comment 1 Forgotten User 4yKfwBnymp 2009-11-16 12:23:45 UTC
'rpm' output shows that the system had last been updated on the 5th of November:

root@linux-7q6o:~ # rpm -qa --last | head
opensuse-manuals_en-pdf-11.2-5.2              Sun Nov 15 01:17:08 2009
poppler-data-0.2.1-5.1                        Sun Nov 15 01:17:05 2009
providers-2006.11.21-108.1                    Sun Nov 15 01:16:58 2009
release-notes-openSUSE-11.2.4-1.2             Sun Nov 15 01:16:54 2009
gnome-control-center-2.28.0-1.4               Thu Nov  5 21:28:51 2009
libcanberra-gtk-0.18-2.3                      Thu Nov  5 21:28:49 2009
libgstinterfaces-0_10-0-0.10.24-1.10          Thu Nov  5 21:28:47 2009
metacity-2.28.0-1.2                           Thu Nov  5 21:28:36 2009
gstreamer-0_10-plugins-base-0.10.24-1.10      Thu Nov  5 21:28:34 2009
libcanberra-gtk0-0.18-2.3                     Thu Nov  5 21:28:32 2009
Comment 2 Forgotten User 4yKfwBnymp 2009-11-16 12:31:48 UTC
Upon pressing Ctrl-C at zypper, aria2c only logs a line that it plans to shut down, but keeps going:

2009-11-15 01:05:57.694419 NOTICE - Shutdown sequence commencing... Press Ctrl-C again for emergency shutdown.
^Z
[1]  + 17830 suspended  tail -f /var/log/aria2.log
root@linux-7q6o:~ # bg
[1]  + 17830 continued  tail -f /var/log/aria2.log
root@linux-7q6o:~ # strace -f -p 9716
Process 9716 attached - interrupt to quit
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258243596
time(NULL)                              = 1258243596
gettimeofday({1258243596, 430183}, NULL) = 0
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258243597
time(NULL)                              = 1258243597
gettimeofday({1258243597, 470358}, NULL) = 0
epoll_wait(5, {}, 1024, 1000)           = 0
time(NULL)                              = 1258243598
time(NULL)                              = 1258243598
gettimeofday({1258243598, 514590}, NULL) = 0
Comment 3 Forgotten User 4yKfwBnymp 2009-11-16 12:33:00 UTC
Created attachment 327693 [details]
aria2.log (that I luckily have configured...)
Comment 4 Michael Andres 2009-11-18 16:34:44 UTC
I wonder if this is a common aria2 problem or specific to running from zypper.
Comment 5 Forgotten User 4yKfwBnymp 2009-11-18 21:56:31 UTC
Tatsuhiro, maybe you can make something out of this? It is aria2c version 1.5.2.
Comment 6 Tatsuhiro Tsujikawa 2009-11-19 16:46:28 UTC
Can you reproduce the problem by directly invoking aria2c?

aria2c http://download.opensuse.org/factory/repo/oss/suse/noarch/opensuse-manuals_en-11.2-5.2.noarch.rpm

I tried the above command and the only (slight) problem was that one server was slow so last stage of download took more time. But the log said very long pause after pressing ctrl-c.
Comment 7 Forgotten User 4yKfwBnymp 2009-11-19 18:15:31 UTC
That command works well for me. It downloads quickly and terminates immediately once finished.

The file descriptor 5 being "anon_inode" is normal, by the way, and it's always present; it simply is the epoll file descriptor acquired by epoll_create(), and that's how the system shows it.

The system where I observed the problem was suspended during the last days; I kept it in that state to leave it available for further debugging. Believe it or not, it continued downloading (and installing) today. The exact same zypper process is now continuing the update. So it looks that the aria2c process finally did terminate after all.
Comment 8 Tatsuhiro Tsujikawa 2009-11-20 08:03:41 UTC
I think #5 fd is no problem if it is created by epoll_create(). It continues to sit there until aria2 process is shutdown.
From the log, aria2 checked all 83 piece hashes. It means it correctly downloaded all pieces(a complete file). But zypper showed 80% complete. That's weird.
Invoking aria2c works well, then, is there some problems between zypper and aria2c, ah something like communication problem?
Comment 9 Forgotten User 4yKfwBnymp 2009-11-20 15:41:42 UTC
I'm not sure. It looked as if zypper is waiting for aria2c, and aria2c is waiting for something as well (keeping looking at the poll set). When aria2c was done (which the logs seem to indicate) it should have terminated I think.
Comment 10 Tatsuhiro Tsujikawa 2009-11-21 09:14:20 UTC
I checked code, but still could not find the cause of problem.
It may be fixed in latest version 1.6.3.

>Believe it
>or not, it continued downloading (and installing) today.

It means aria2 process finally exited? If so, is there a log after 
"2009-11-15 01:05:57.694419 NOTICE - Shutdown sequence commencing... Press Ctrl-C again for emergency shutdown." ?
Comment 11 Forgotten User 4yKfwBnymp 2009-11-24 11:26:17 UTC
Yes, I think that aria2c finally resumed from the hang and terminated, thus zypper continued its work.

However, I don't think I'll be able to find another log line, because all aria2c calls log to the same log file (also my attempted manual downloads), thus the log is cluttered with other aria2c calls meanwhile. That's because I configured the log via .aria2/aria2.conf. 

It would be helpful if the PID was also logged I guess.

It would also be helpful if libzypp would maintain its own aria2c log file. It doesn't log at all, in fact.
Comment 12 Nikolaus Waxweiler 2010-04-07 19:55:02 UTC
I can confirm this bug. I did a fresh install of 11.2 x86_64 and the updating process hung quite a few times, e.g. when downloading the openssl, libtalloc and ntfs-3g updates. Very annoying. I can't remember this happening on 11.1. Killing aria2c and retrying the download helped.
Comment 13 Tatsuhiro Tsujikawa 2010-04-09 12:48:06 UTC
I think I found the cause of this mysterious hang.
I just noticed this in the log Peter attached:

2009-11-15 01:17:16.809663 DEBUG - CUID#9 - socket: read:1, write:0, hup:0, err:0
2009-11-15 00:17:18.846709 DEBUG - CUID#12 - socket: read:1, write:0, hup:0, err:0

The time is rewinded about 1hr.
In aria2c, I use gettimeofday() to get instant time as a base and do some deferred tasks after certain timeout.
But because of this rewind of time, it takes long time to timeout and from outside world, aria2c appears to be hanged.
I think after long delay(I think 1hr) aria2c finally exited and this matches exactly what Peter said.

I'll fix this problem in the next release.
Comment 14 Michael Andres 2010-04-09 14:41:57 UTC
Eagle eye ;)
Comment 15 Forgotten User 4yKfwBnymp 2010-04-12 12:31:55 UTC
Ah, that makes sense. Thank you very much, Tatsuhiro!
Comment 16 Lars Müller 2010-04-14 15:19:43 UTC
I had the same issue with 11.2 today while calling "zypper patch".

After I waited maybe 5 or 10 minutes while nothing happened on stdout zypper asked for confirmation.

But now it hangs again while downloading

/var/adm/mount/AP_0x00000001/rpm/i586/gimp-help-browser-2.6.7_2.6.8-4.4_1.1.1.i586.delta.rpm.aria2

As soon as it finished I'll attach the log.

While following the log output I saw failing downloads from a host using an IPv6 address.  Next I checked my v6 tunnel and it was broken.  Therefore I disabled it for this test.
Comment 17 Lars Müller 2010-04-14 15:27:13 UTC
Created attachment 354391 [details]
aria2 log while zypper patch on openSUSE 11.2
Comment 18 Tatsuhiro Tsujikawa 2010-04-14 15:45:24 UTC
I checked the log, aria2 tried to connect IPv6 address but all the attempt(5times) were timed out. Each timeout is 60secs, so 5min hang.
I think disabling IPv6 will fix the problem.

BTW, we fixed the original cause of this issue reported by Peter in the latest release 1.9.1.
Comment 19 Michael Andres 2010-04-16 10:20:30 UTC
Thanks. Keeping the bug open as reminder to try to improve the logging. If we'd print a message on timeouts and retries, it would be easier to guess whats going on.
Comment 20 Viktor Erdélyi 2010-07-08 15:17:01 UTC
For me, the same hanging happens with the following versions:

openSUSE 11.2

# rpm -q aria2 zypper libzypp
aria2-1.9.3-0.1.1.i586
zypper-1.3.4-0.1.3.i586
libzypp-6.31.4-0.1.1.i586
Comment 21 Michael Andres 2010-09-08 17:33:53 UTC
Workaround:

  export ZYPP_ARIA2C=0

(or remove/rename the /usr/bin/aria2c binary. Removing the aria2 package fails because libzypp still requires it). libzypp will fallback to libcurl then. 

On 11.4 a libcurl based implementation supporting zsync and metalink will replace aria2 as default downloader.
Comment 22 Tatsuhiro Tsujikawa 2010-09-09 16:40:32 UTC
Yesterday I installed opensuse11.3 to see how zypper with aria2 behaves.
I installed gcc package and I experienced "hang". After successful several package downloads, zypper seemed to hang for several minutes.
Luckily, I enabled aria2 log, so inspected it. The cause of hang was that aria2 was trying to connect a server but for some reason it could not to connect it and aria2 waited 3minutes timeout and retried 5 times. This means 15 minutes hang. Actually, after 15mins, aria2 gave up this server and tried another one and all download completed eventually.
I think zypper's default 3mins timeout is rather long. Lowering it reduces hang.
Also --uri-selector=adaptive tends to return low prioritized servers which is  more likely unreachable, so I recommend to not to use it.
I tried to change these aria2 option values, but due to my short opensuse experience(just 2days), I have no idea how to do that.
Is there a way to change these option value?
Comment 23 Stefan Schubert 2011-01-25 09:01:33 UTC
*** Bug 666429 has been marked as a duplicate of this bug. ***
Comment 24 Michael Andres 2011-01-25 09:03:34 UTC
In case of trouble disable the aria backend (see comment#21)