Bug 152709

Summary: yast2/zypp does not release media before a rcnetwork restart
Product: [openSUSE] SUSE Linux 10.1
Reporter: Andreas Jaeger <aj>
Component: Installation
Assignee: Jiri Srain <jsrain>
Status: RESOLVED FIXED
QA Contact: Klaus Kämpf <kkaempf>
Severity: Blocker
Priority: P5 - None
CC: jsrain, mt, suse-beta
Version: Beta 5
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: yast2 log files
             Output of ps xa

Description Andreas Jaeger 2006-02-22 08:53:07 UTC
After the final yast screen, when I press "Finish", the system hangs completely.

load is there, nothing is going on...
Comment 1 Andreas Jaeger 2006-02-22 08:54:31 UTC
Created attachment 69708
yast2 log files
Comment 2 Andreas Jaeger 2006-02-22 08:55:08 UTC
Created attachment 69709
Output of ps xa
Comment 3 Andreas Jaeger 2006-02-22 09:12:54 UTC
strace on process 4043 shows that it hangs in poll.

Attaching via gdb shows the following backtrace:
poll
Curl_select in libcurl.so.3
Curl_nbftpsendf in libcurl
Curl_ftp_disconnect in libcurl
Curl_done .. in libcurl
Curl_close ... in libcurl
curl_easy_cleanup in libcurl
zypp::media::MediaCurl::disconnectFrom
zypp::media::MediaHandler::disconnect
zypp::media::MediaCurl::releaseFrom
zypp::media::MediaAccess::close
zypp::media::MediaAccess::~MediaAccess
boost::detail::sp_counted_impl_p<zypp::media::MediaAccess>::dispose
...
Comment 4 Marius Tomaschewski 2006-02-22 10:17:02 UTC
As far as I can see, MediaCurl does not set any timeout options (CURLOPT_TIMEOUT + CURLOPT_NOSIGNAL) on its curl handle.

I'm not sure what the result is, but it may be that curl never reports a timeout. By default, timeouts (e.g. CURLOPT_TIMEOUT) make curl use alarm(), which
needs a SIGALRM handler unless CURLOPT_NOSIGNAL is set.

I'm going to test it...
Comment 5 Marius Tomaschewski 2006-02-22 12:55:14 UTC
OK, I think I've got it now:

26528      0.000305 write(2, "2006-02-22 12:49:31 <0> xanthos(26528) [media] MediaHandler.cc(release):511 Releasing media ftp<
26528      0.000183 write(2, "\n", 1)   = 1
26528      0.000099 gettimeofday({1140608971, 232703}, NULL) = 0
26528      0.000077 time([1140608971])  = 1140608971
26528      0.000129 send(4, "QUIT\r\n", 6, 0) = 6
26528      0.000120 gettimeofday({1140608971, 233030}, NULL) = 0
26528      0.000072 gettimeofday({1140608971, 233101}, NULL) = 0
26528      0.000072 poll(

Curl seems to wait in this poll call after sending "QUIT" to the ftp
server. Because I've shut down the network interface, no reply can
arrive, and the call blocks for a very long time (AJ said it was about
17 minutes), because MediaCurl does not set any timeout options
on its libcurl handle.
I'll fix it by adding timeout handling to MediaCurl, but it needs some
time to test it properly...
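
The hang described above comes down to a poll() without a deadline. A minimal Python sketch of the difference, assuming a local socket pair as a stand-in for the FTP control connection; wait_for_reply is a hypothetical helper and not part of libcurl or libzypp:

```python
import select
import socket


def wait_for_reply(sock, timeout_s):
    """Return True if `sock` becomes readable within `timeout_s` seconds.

    timeout_s=None would block until the peer answers, which is the
    situation in the trace once the network interface is down.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    timeout_ms = None if timeout_s is None else int(timeout_s * 1000)
    events = poller.poll(timeout_ms)
    return bool(events)


# A connected local socket pair stands in for the FTP control connection.
client, server = socket.socketpair()
client.sendall(b"QUIT\r\n")  # the peer stays silent, like the unreachable server

# With a deadline the caller gets control back and can report an error.
timed_out = not wait_for_reply(client, timeout_s=0.2)

# Once the peer does answer, the same call returns immediately.
server.sendall(b"221 Goodbye\r\n")
got_reply = wait_for_reply(client, timeout_s=0.2)

client.close()
server.close()
```

The 0.2 second deadline is only chosen to keep the sketch fast; it says nothing about a sensible value for a real FTP transfer.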

If I've not overlooked something, timeouts may also happen in yast2/zypp
during the installation (not only with ftp, but also e.g. nfs),
because the media is not released before "rcnetwork restart".
What currently happens is:

  - install from ftp (keep media instance open)
  - configure network
    (e.g. user may change the IP)
    - restart network
       (e.g. dhcp may change the IP)
  - configure other hardware
  - continue using media
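
One possible ordering that avoids holding a connection across the restart is to release before "rcnetwork restart" and re-attach on the next request. A rough Python sketch under that assumption; the Media class, restart_network and the example.rpm path are hypothetical and do not mirror the libzypp API:

```python
class Media:
    """Hypothetical stand-in for an attached installation source."""

    def __init__(self, url, log):
        self.url = url
        self.log = log
        self.attached = False

    def attach(self):
        if not self.attached:
            self.attached = True
            self.log.append(("attach", self.url))

    def release(self):
        if self.attached:
            self.attached = False
            self.log.append(("release", self.url))

    def provide_file(self, path):
        # Re-attach on demand, so a released media is usable again later.
        self.attach()
        self.log.append(("provide", path))
        return path


def restart_network(media_list, log):
    """Release every attached media before the interface goes down."""
    for media in media_list:
        media.release()
    log.append(("rcnetwork", "restart"))


log = []
ftp = Media("ftp://10.10.0.5/CDs/SUSE-Linux-10.1-beta5-i386/CD3", log)
ftp.provide_file("/suse/setup/descr")        # install from ftp
restart_network([ftp], log)                  # configure network
ftp.provide_file("/suse/i586/example.rpm")   # continue using media (made-up path)
actions = [kind for kind, _ in log]
```

Here `actions` ends up as attach, provide, release, rcnetwork, attach, provide: no connection is open while the network restarts.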

$ egrep "rcnetwork restart" y2log*
y2log-2:2006-02-22 09:32:15 <1> linux(4043) [YCP] NetworkService.ycp:70 rcnetwork restart

$ egrep -H "\[media\]" y2log-5 | tail -3
y2log-5:2006-02-22 09:25:32 <0> linux(4043) [media] MediaCurl.cc(doGetFileCopy):432 destNew: /var/adm/mount/AP_0x00000003/suse/i586/kdebindings3-python-3.5.1-6.i586.rpm.new.zypp.37456
y2log-5:2006-02-22 09:25:32 <0> linux(4043) [media] MediaCurl.cc(doGetFileCopy):440 URL: ftp://10.10.0.5/CDs/SUSE-Linux-10.1-beta5-i386/CD3/suse/i586/kdebindings3-python-3.5.1-6.i586.rpm
y2log-5:2006-02-22 09:25:33 <0> linux(4043) [media] MediaHandler.cc(provideFile):571 provideFile(/suse/i586/kdebindings3-python-3.5.1-6.i586.rpm)
$ egrep -H "\[media\]" y2log-4 | tail -3
$ egrep -H "\[media\]" y2log-3 | tail -3
$ egrep -H "\[media\]" y2log-2 | tail -3
$ egrep -H "\[media\]" y2log-1 | tail -3
$ egrep -H "\[media\]" y2log | tail -3
y2log:2006-02-22 09:42:52 <1> d95(4043) [media] MediaHandler.cc(provideDirTree):610 provideDirTree(./suse/setup/descr)
y2log:2006-02-22 09:42:53 <0> d95(4043) [media] MediaHandler.cc(release):504 Request to release attached media ftp<ftp://10.10.0.5/CDs/SUSE-Linux-10.1-beta5-i386/CD3>, use count=1
y2log:2006-02-22 09:42:53 <0> d95(4043) [media] MediaHandler.cc(release):511 Releasing media ftp<ftp://10.10.0.5/CDs/SUSE-Linux-10.1-beta5-i386/CD3>

There are several exceptions that look to me to be related to this:

2006-02-22 09:34:51 <5> d95(4043) [DEFINE_LOGGROUP] Exception.cc(log):83 SourceManager.cc(restore):194 THROW:    SourceManager.cc(restore):194: At least one source already registered, cannot restore sources from persistent store.

IMO this is a workflow bug not to release the media after you are finished
with installation and are going to configure.
If a package is needed later because of the installation, you should
reconnect the media.

To avoid refetching the files in case of http/ftp, you can provide a
mountpoint to MediaCurl - it will not remove the downloaded files on release.
You can close it and, if you need it again, just provide the same mountpoint
and the existing files will be reused.
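
The reuse idea can be illustrated with a small Python sketch. It assumes a caller-supplied directory as the mountpoint and a made-up fetch callback; CachingMedia, fake_fetch and the example.rpm path are hypothetical and only model the behaviour described here, not MediaCurl itself:

```python
import os
import tempfile


class CachingMedia:
    """Hypothetical handler that downloads into a caller-supplied mountpoint."""

    def __init__(self, attach_point, fetch):
        self.attach_point = attach_point
        self.fetch = fetch      # callable(path) -> bytes, e.g. an FTP download
        self.downloads = 0

    def provide_file(self, path):
        local = os.path.join(self.attach_point, path.lstrip("/"))
        if not os.path.exists(local):
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with open(local, "wb") as out:
                out.write(self.fetch(path))
            self.downloads += 1
        return local

    def release(self):
        # Nothing is deleted: the mountpoint belongs to the caller.
        pass


def fake_fetch(path):
    """Pretend download used in place of a real transfer."""
    return b"payload for " + path.encode()


with tempfile.TemporaryDirectory() as attach_point:
    first = CachingMedia(attach_point, fake_fetch)
    first.provide_file("/suse/i586/example.rpm")
    first.release()

    # A new handler on the same mountpoint finds the file already there.
    second = CachingMedia(attach_point, fake_fetch)
    cached = second.provide_file("/suse/i586/example.rpm")
    reused = second.downloads == 0 and os.path.exists(cached)
```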
Comment 6 Marius Tomaschewski 2006-02-22 14:42:21 UTC
(In reply to comment #5)
> I'll fix it by adding timeout handling to MediaCurl, but it needs some
> time to test it properly...

Done in svn: svn diff -r1960:1961 zypp/media/MediaCurl.cc

I've set the timeout to 60 sec - seems to work fine with it. Curl tries
to recover / reconnect itself if possible, before it reports an error.
Comment 7 Marius Tomaschewski 2006-02-22 14:47:11 UTC
The behaviour is now:
  - if the network is down during an attempt to fetch a file => exception
  - if the network is up and the ftp server has a timeout of e.g. 30
    sec, curl tries to reconnect, and if that succeeds we don't get
    any error at all.
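
The two cases can be modelled roughly as follows. This is a simplified Python sketch, not what curl does internally: it assumes a hypothetical connect callback that raises OSError when the network is down, and a connection that raises ConnectionResetError when the server has dropped it for being idle. MediaException and fetch are made-up names.

```python
class MediaException(Exception):
    """Hypothetical error raised when a file cannot be fetched."""


def fetch(path, connect, max_reconnects=1):
    """Fetch `path`, reconnecting if the server dropped an idle connection."""
    attempts = 0
    while True:
        try:
            conn = connect()    # raises OSError if the network is down
            return conn(path)   # raises ConnectionResetError on idle timeout
        except ConnectionResetError:
            attempts += 1
            if attempts > max_reconnects:
                raise MediaException(f"giving up on {path}")
        except OSError as err:
            raise MediaException(f"network down while fetching {path}") from err


def network_down():
    raise OSError("Network is unreachable")


calls = {"n": 0}


def server_with_idle_timeout():
    def get(path):
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionResetError("idle timeout")
        return b"data:" + path.encode()
    return get


# Idle timeout on the server: one silent reconnect, no error for the caller.
data = fetch("/suse/setup/descr", server_with_idle_timeout)

# Network down: the caller gets an exception instead of a hang.
try:
    fetch("/suse/setup/descr", network_down)
    down_error = None
except MediaException as err:
    down_error = str(err)
```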
Comment 8 Klaus Kämpf 2006-02-22 16:25:15 UTC
comment #5 "IMO this is a workflow bug not to release the media after you are finished with installation and are going to configure."

YaST must release the source after it committed all changes.

Comment 9 Jiri Srain 2006-02-22 16:40:15 UTC
comment #8: The media should be released when the application finishes. Releasing them at some point during installation might not make sense - the media might be required later for installing more packages, e.g. for configured hardware...
Comment 10 Klaus Kämpf 2006-02-22 16:42:46 UTC
But you don't know what happens between the end of one commit (end of installation) and the start of the next commit (hw packages).

Maybe the network was restarted.
Maybe your host has got another IP.
Maybe your USB CD-drive was detached.
Maybe ...

Comment 11 Stanislav Visnovsky 2006-02-23 08:46:16 UTC
This is a semantic change of libzypp vs. old package manager.

YaST never needed to indicate that the connection no longer needs to be kept
alive.

What we need the connection for:

1) source creation/parsing
2) package installation

In both cases, yast2-pkg-binding _could_ indicate that the connection does not
need to be kept alive, but IMO it's rather libzypp's responsibility to close unneeded connections once a commit has finished.

A design question in principle. Klaus?
Comment 16 Stanislav Visnovsky 2006-03-01 16:40:28 UTC
No, I can't :(

I think it makes sense to release the sources after every commit done by YaST - in pkg-bindings.
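
In outline, "release after every commit" is a try/finally around the commit. A minimal Python sketch, assuming hypothetical Source objects with a release() method; it is not the pkg-bindings code:

```python
from contextlib import contextmanager


class Source:
    """Hypothetical stand-in for an installation source held by the bindings."""

    def __init__(self, name):
        self.name = name
        self.attached = True

    def release(self):
        self.attached = False


@contextmanager
def commit(sources):
    """Run one commit and release every source afterwards, even on failure."""
    try:
        yield sources
    finally:
        for source in sources:
            source.release()


sources = [Source("CD3 over ftp")]
with commit(sources):
    pass  # package installation would happen here

# A failing commit still releases its sources.
failing = [Source("nfs")]
try:
    with commit(failing):
        raise RuntimeError("install failed")
except RuntimeError:
    pass
```

With this shape nothing stays attached between the end of one commit and the start of the next, whatever happens to the network in between.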
Comment 17 Klaus Kämpf 2006-03-01 17:15:03 UTC
Yeah, that's probably a good solution
Comment 18 Jiri Srain 2006-03-01 18:06:17 UTC
Done in SVN, will submit as soon as it rebuilds.