Bug 119813 - Improve Network Install/Update a lot - automatically retry failed downloads
Summary: Improve Network Install/Update a lot - automatically retry failed downloads
Status: RESOLVED FIXED
Duplicates: 396159 437991 458050 485887
Alias: None
Product: openSUSE 10.3
Classification: openSUSE
Component: YaST2
Version: unspecified
Hardware: All All
Priority: P3 - Medium; Severity: Enhancement
Target Milestone: ---
Assignee: Ladislav Slezák
QA Contact: Klaus Kämpf
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-10-01 20:18 UTC by Dirk Stoecker
Modified: 2009-07-09 13:00 UTC
CC List: 9 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Dirk Stoecker 2005-10-01 20:18:13 UTC
Network installation got a lot better in recent SuSE releases, but there are
still improvements to be made:
 
a) Include a built-in (or server-based) list of installation sources to allow
the user to select an installation source. The boot ISO should include at least
one installation source in the boot-manager selection.
 
b) When downloading files, download at least 2 files in parallel to prevent
long waiting times. While one package is being installed, the next file can
already be downloaded.
 
c) Reduce timeouts for files (I think it sometimes takes 10 MINUTES or more
to detect a broken transfer).
 
d) Handle transfer problems automatically: 
  1) Retry it at least once. 
  2) Skip the file and pop up error messages at the end of the installation process.
 
e) Allow switching the server in case of problems.
 
f) The abort button should abort IMMEDIATELY and not only after the current
package has been installed. It should have an option to abort the current
download and restart it (in case of network problems).
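Point d) can be sketched as follows. This is a minimal illustration under stated assumptions, not YaST code: `install_with_deferred_errors`, `download` and `install` are hypothetical names, and a download is assumed to report success as a boolean. Each package gets one automatic retry; packages that still fail are skipped and collected so they can be shown together in one dialog at the end of the installation.

```python
from typing import Callable, List


def install_with_deferred_errors(
    packages: List[str],
    download: Callable[[str], bool],   # hypothetical: True if the download succeeded
    install: Callable[[str], None],    # hypothetical: installs a downloaded package
    retries: int = 1,
) -> List[str]:
    """Install packages, retrying each failed download `retries` times.

    Packages whose download still fails are skipped and returned, so the
    caller can report them all in one dialog at the end.
    """
    failed: List[str] = []
    for pkg in packages:
        # One initial attempt plus `retries` automatic retries.
        for _ in range(1 + retries):
            if download(pkg):
                install(pkg)
                break
        else:
            # No attempt succeeded: remember the package instead of blocking on a popup.
            failed.append(pkg)
    return failed
```

Skipping is the risky part: as comment 8 points out, a skipped package can break later packages that PreRequire it, so the collected list has to be shown to the user rather than silently dropped.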
 
Motivation: 
I did two complete network installations of 10.0 RC1 and had to restart the
whole process at least 3 times because it hung completely. An overnight
installation stopped due to popups telling me that a file could not be loaded
(due to network problems, e.g. the DSL connection being dropped after 24 hours).
The installation process should do as much as possible automatically and only
ask in cases where automatic handling fails, and only after every other possible
task has already been done. Maybe add an additional window showing accumulated
problems which pops up at the end of the installation.
 
The result should be a 
- more error resistant installation/update process 
- faster responses to user commands (abort). 
- easier server handling (selection from list, switching during install). 
 
A last note: 
Thanks for the improvements already implemented during the last years.
Comment 1 Michael Gross 2005-10-04 09:52:25 UTC
All of this sounds like part of the installation and packager modules of YaST.
These enhancements sound reasonable and should be taken into consideration.

Dirk: a) Server-lists tend to be outdated after a short period of time, which
would result in a large list of `broken links'. But at least some of the really
reliable sources should be added into a list.

b) Downloading 2 files at once could be critical, as many servers might
allow only one connection at a time for a non-privileged user, so in most cases
it could cause more errors than it prevents.

d) It would be useful if the user had an _option_ to `collect errors' until the
end of the process.

Jiří: What do you think about this?
Comment 2 Jiri Srain 2005-10-04 11:18:52 UTC
Steffen, Michael, please comment on the points where I mentioned your login.
Dirk, please comment on the rest.
   
a) (snwint) Fully agree. A dynamically loaded server list would be great, but
it is not possible from isolinux (limited resources, lack of network card
drivers, ...). Setting the source later doesn't solve the problem, as Linuxrc
must load the source from somewhere. Steffen, do you think it would be possible
for Linuxrc to download a list of servers from our server and let the user
select one? At that point, you already have the network card up.
  
b) (ma) I agree with Michael G., but downloading a package while another  
package is being installed might be a good idea.  
  
c) (ma) Can we do something about it, or is it issue of the libraries below  
(curl etc.)?  
  
d) One more automatic retry might make sense, provided we solve c). Skipping
messages is IMHO not a good idea; we might consider putting a timeout on the
popups.
  
e) (ma) How much effort would it be to implement this in the package
manager? I'm afraid it could cause problems in case of a different list of
packages, different versions, ...
  
f) I don't think it is a good idea to abort during package installation; it is
better to have a consistent system. Aborting while a package is being downloaded
is addressed in a different bug.
   
Steffen, Michael, what do you think about the requests? 
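The compromise in b), downloading the next package while the current one is being installed, can be sketched with a single background worker. The function and callback names are hypothetical. With `max_workers=1` at most one download is in flight at any time, which sidesteps the concern in comment 1 about servers that allow only one connection per user.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def install_pipelined(
    packages: List[str],
    download: Callable[[str], bytes],         # hypothetical: returns the package data
    install: Callable[[str, bytes], None],    # hypothetical: installs downloaded data
) -> None:
    """Install packages in order while the next one is downloaded in the background."""
    if not packages:
        return
    # A single worker thread: never more than one download (one connection) at a time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(download, packages[0])
        for i, pkg in enumerate(packages):
            data = pending.result()  # wait for this download; re-raises its errors
            if i + 1 < len(packages):
                # Start fetching the next package before installing this one.
                pending = pool.submit(download, packages[i + 1])
            install(pkg, data)
```

Installation still happens strictly in order on the calling thread; only the waiting for the network overlaps with the installation of the previous package.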
Comment 3 Steffen Winterfeldt 2005-10-04 12:23:44 UTC
ad a) Both a fixed list (in the boot loader) and a variable list (in linuxrc)
make sense to me. If someone gives me a list of install servers, I can put
it in.
 
This will introduce a difference between openSuSE and SuSE Linux when 
preparing boot CDs, though. 
 
Andreas, Adrian, do we want it? If yes, I'd need a server list. 
Comment 4 Dirk Stoecker 2005-10-07 07:27:09 UTC
Hello Jiri, 
 
some comments from my point of view: 
 
d) Timeouts on popups are OK, but the user needs a way to see the
error. So at the end of the installation process, a list of "all collected
timeout messages" must appear, without a timeout.
 
e) I know the problems with this. I would not do it in a complicated way, but
only switch the base part of the URL. So if there are differences in packages
(version, number of files, etc.), the user gets the normal "cannot load"
request with the option to change the server again (for a single file or all
following files). This means you need not change the installation process, but
only the loading mechanism and the failed-to-load popup.
 
f) I agree, aborting during installation is not a good idea. I meant the download.
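Dirk's suggestion in e) amounts to keeping each file's path relative to the repository and changing only the base URL. A minimal sketch with hypothetical names and placeholder mirror URLs; it makes no claim about how the libzypp failover mentioned in comment 20 works.

```python
from typing import Callable, Iterable, Optional


def fetch_with_base_switch(
    relative_path: str,
    bases: Iterable[str],           # candidate repository base URLs, in order of preference
    fetch: Callable[[str], bool],   # hypothetical: True if the URL was downloaded
) -> Optional[str]:
    """Try the same repository-relative path on each base URL in turn.

    Returns the URL that worked, or None so the caller can fall back to
    the usual "cannot load" popup.
    """
    for base in bases:
        # Only the base part changes; the path inside the repository stays the same.
        url = base.rstrip("/") + "/" + relative_path.lstrip("/")
        if fetch(url):
            return url
    return None
```

If a mirror carries a different package version, the file simply is not found under the same path, the function returns None, and the user gets the ordinary popup, which is the fallback Dirk describes.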
 
Comment 5 Stanislav Visnovsky 2005-10-20 06:09:38 UTC
Dirk, just a note for the future: please file a bug report for each
problem. With a report like this, we have a really hard time tracking whether we've addressed all the problems properly.

Thanks.
Comment 6 Jiri Srain 2005-10-21 11:08:12 UTC
b) was reported separately as bug 128050, let's solve it there...
Comment 7 Ladislav Slezák 2007-02-09 08:42:55 UTC
To comment #4 d) - something similar is already implemented during autoinstallation.

The long delay after pressing abort is caused by the fact that libzypp calls the progress callback (which returns the abort/continue value) only at long intervals.

(A built-in list of installation sources is reported in bug #240834, we should track the problem there.)

I have questions about the timed popups: How should they be activated? What should the default timeout value be? What should the default action be after the timeout?

Andreas, Jiri, any idea?
Comment 8 Jiri Srain 2007-08-09 10:57:06 UTC
My suggestion is to default the timeout to Retry operation, but only once or twice (not forever). If the operation fails the 2nd (or 3rd) time, display a popup without timeout.

Aborting the installation when the user is not at the computer is IMO not an option. Ignoring the error may cause failures in packages installed later (because their post-scripts cannot run when a PreRequired package is missing), and the user will not realize that these failures are related to earlier packages that failed without being noticed.
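Jiri's proposal can be sketched as a small policy loop. The names and values here are assumptions, not YaST's: `ask_user(timeout)` is a hypothetical popup that returns "retry", "ignore" or "abort", or None when a timed popup expires unanswered. `max_auto_retries=2` reflects "once or twice"; the 30-second popup timeout is only a placeholder, since comment 7's question about the default value is not settled here.

```python
from typing import Callable, Optional

# "retry", "ignore", "abort", or None when a timed popup expired unanswered.
Answer = Optional[str]


def download_with_timed_popup(
    download: Callable[[], bool],                    # hypothetical: True on success
    ask_user: Callable[[Optional[float]], Answer],   # hypothetical popup; None timeout = wait forever
    max_auto_retries: int = 2,
    popup_timeout: float = 30.0,
) -> str:
    """Return "ok", "ignored" or "aborted"."""
    auto_retries = 0
    while True:
        if download():
            return "ok"
        if auto_retries < max_auto_retries:
            # Timed popup: if nobody answers, default to Retry.
            answer = ask_user(popup_timeout)
            if answer is None:
                auto_retries += 1
                answer = "retry"
        else:
            # Automatic retries used up: wait for the user, without a timeout.
            answer = ask_user(None)
        if answer == "abort":
            return "aborted"
        if answer == "ignore":
            return "ignored"
        # Anything else ("retry") loops and downloads again.
```

Only unanswered popups consume the automatic retry budget; an explicit Retry from a user who is present does not.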
Comment 9 Lukas Ocilka 2007-09-07 13:42:49 UTC
Closing as LATER - to be reevaluated after 10.3
Comment 10 Ladislav Slezák 2008-09-01 13:49:19 UTC
Reopening LATER bugs (LATER will be changed to WONTFIX on September 2).
Comment 11 Ladislav Slezák 2008-10-14 15:05:56 UTC
*** Bug 396159 has been marked as a duplicate of this bug. ***
Comment 12 Ladislav Slezák 2008-10-14 15:07:04 UTC
See https://bugzilla.novell.com/show_bug.cgi?id=396159#c3 for additional information.
Comment 13 Michal Seben 2008-10-27 10:34:55 UTC
*** Bug 437991 has been marked as a duplicate of this bug. ***
Comment 14 Brandon Philips 2008-10-27 17:10:07 UTC
Bug 437991 was mine and is apparently a duplicate of "part d" of this bug's summary. 

Is there any hope of getting this fixed in Beta 4? It pretty much kills the netinstall experience since I have to babysit and click retry every 10 minutes because of broken mirrors.

Thanks, Brandon
Comment 15 Ladislav Slezák 2008-10-30 12:11:02 UTC
(In reply to comment #14 from Brandon Philips)
> Is there any hope of getting this fixed in Beta 4?

Unfortunately not. Beta4 is frozen now and changing that in RC1 is quite risky.
I'll change that in the next release.
Comment 16 Brandon Philips 2008-11-13 17:26:36 UTC
(In reply to comment #15 from Ladislav Slezak)
> (In reply to comment #14 from Brandon Philips)
> > Is there any hope of getting this fixed in Beta 4?
> 
> Unfortunately not. Beta4 is frozen now and changing that in RC1 is quite risky.
> I'll change that in the next release.

Should I create a FATE for this?
Comment 17 Ladislav Slezák 2008-11-14 09:18:17 UTC
I think there is no need to create a FATE. This should be quite easy to implement. The change is not big, but I don't want to break something in 11.1 RC phase. That's why I'm postponing it...
Comment 18 Ladislav Slezák 2008-12-15 11:05:47 UTC
*** Bug 458050 has been marked as a duplicate of this bug. ***
Comment 19 macias - 2008-12-15 14:14:19 UTC
ad.d.1) I repeat my report because I would like not to be ignored for the second time:
"
It is continuation of:
https://bugzilla.novell.com/show_bug.cgi?id=328822

The report is closed as fixed but for real life it is not fixed at all.

Use case: I started the installation from the network CD, so basically the whole
system is downloaded from the net. In my case it is about 2GB of download while
installing, and with my connection it certainly takes more than two hours (500 MB
were downloaded and that took 2 hours). So my ETA is about 8
hours.

Good job to put it on the night shift, right? With the current auto-retry, the
installer made all 3 download attempts on error, which took 1.5 minutes in total
(see the mentioned report for the "fix"), and spent the remaining 478.5 minutes
displaying an error dialog.

It is not productive, useful, helpful or smart. 

The facts:
---------
* displaying an error dialog is not productive at all.
* while an error is being displayed, a productive task could take place at the same time.

Productive task = real auto-retry.

error? -> auto-retry in 10 seconds -> error? -> auto-retry in 30 seconds ->
error? -> auto-retry in 1 minute -> error? -> auto-retry in 10 minutes (and then do not increase it any more, just keep this period)

and so on

So the period between tries increases and there is no limit on the number of
tries (why would there be?).

And this design is exactly what I described more than a year ago:
https://bugzilla.novell.com/show_bug.cgi?id=332175

Now with openSUSE 11.1, I noticed two errors waiting for my response (a manual
retry "solved" the download problem), so I hope that this time I have stressed
the _smart_ aspect enough. I would be happy to see the computer finally do the
computer's job.


"
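The schedule proposed in the quoted report (10 s, 30 s, 1 min, then every 10 minutes with no limit on attempts) can be sketched as a generator. `proposed_retry_delays` and `retry_forever` are hypothetical names; `download` is assumed to return True on success, and `sleep` is injectable so the loop can be tested without actually waiting.

```python
import time
from itertools import islice
from typing import Callable, Iterator


def proposed_retry_delays() -> Iterator[int]:
    """Yield the delay in seconds before each automatic retry, forever."""
    yield from (10, 30, 60)  # short, growing delays for the first retries
    while True:
        yield 600            # then retry every 10 minutes with no upper limit


def retry_forever(download: Callable[[], bool],
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Call `download` until it succeeds, waiting between failed attempts."""
    if download():
        return
    for delay in proposed_retry_delays():
        sleep(delay)
        if download():
            return


print(list(islice(proposed_retry_delays(), 6)))  # [10, 30, 60, 600, 600, 600]
```

A real installer would also need a way for a user who is present to cancel the loop, which this sketch leaves out.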
Comment 20 Peter Poeml 2008-12-15 15:55:45 UTC
11.1 ships with an implementation of 
http://en.opensuse.org/Libzypp/Failover
which isn't available during installation, unfortunately, but it would
be highly appreciated if people start testing it in installed systems
real soon now. And hopefully we can enable it for installation later,
too. Any testing and feedback would help us move forward here.
Comment 21 Ladislav Slezák 2009-03-23 13:29:53 UTC
*** Bug 485887 has been marked as a duplicate of this bug. ***
Comment 22 Hendrik Müller 2009-04-06 08:35:08 UTC
Remark on duplicate bug 485887: This request is for openSUSE 11.1, not for 10.3. I'm not sure about your internal handling, but please take that into account.
Comment 23 Ladislav Slezák 2009-07-09 12:28:22 UTC
I have improved the retry algorithm in yast2-2.18.20 (should be in the next 11.2 milestone release):

- there is an exponential back-off for timeout values: the values double from 30 seconds to 1 minute, 2 minutes... up to 15 minutes, then it keeps 15-minute timeouts
- there are 100 attempts, which means yast gives up after ~24hours
- retry is used always for FTP/HTTP/SMB/NFS repositories

I think this improves network installation quite well.
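The schedule in comment 23 can be reproduced to check the "~24 hours" figure. This is a sketch under assumptions, not the yast2-2.18.20 code: it assumes the timeout doubles starting at 30 seconds, is capped at 15 minutes (900 s), and that one timeout precedes each of the 100 attempts. `retry_timeouts` is a hypothetical name.

```python
from typing import List


def retry_timeouts(attempts: int = 100, first: int = 30, cap: int = 900) -> List[int]:
    """Return the timeout in seconds that precedes each retry attempt.

    The timeout doubles after every attempt until it reaches `cap`,
    after which it stays at `cap`.
    """
    timeouts: List[int] = []
    timeout = first
    for _ in range(attempts):
        timeouts.append(timeout)
        timeout = min(timeout * 2, cap)
    return timeouts


schedule = retry_timeouts()
print(schedule[:7])            # [30, 60, 120, 240, 480, 900, 900]
print(sum(schedule) / 3600)    # total waiting time in hours, about 24.01
```

Under these assumptions the timeouts add up to 930 s for the first five steps plus 95 × 900 s, i.e. 86,430 seconds, just over 24 hours, which matches the figure in the comment.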
Comment 24 macias - 2009-07-09 13:00:54 UTC
Ladislav, great news and thank you very much for implementing this.