Bugzilla – Bug 119813
Improve Network Install/Update a lot - automatically retry failed downloads
Last modified: 2009-07-09 13:00:54 UTC
Network installation got a lot better in recent SuSE releases, but there are nevertheless improvements to be made:
a) Include a built-in (or server-based) list of installation sources so the user can select an installation source. The boot ISO should include at least one installation source in the boot-manager selection.
b) When downloading files, do this in parallel for at least 2 files to prevent long waiting times. During installation, the next file can already be downloaded.
c) Reduce timeouts for files (I think it sometimes takes about 10 or more MINUTES to detect a broken transfer).
d) Handle transfer problems automatically: 1) Retry at least once. 2) Skip the file and pop up the error messages at the end of the installation process.
e) Allow switching the server in case of problems.
f) The abort button should abort IMMEDIATELY and not after successful installation of the current package. It should have an option to abort the current download and restart it (in case of network problems).
Motivation: I did two complete network installations of 10.0 RC1 and had to restart the whole process at least 3 times because it hung completely. An overnight installation stopped due to popups telling me that a file could not be loaded (due to network problems, e.g. the DSL line disconnecting after 24 hours). The installation process should do as much as possible automatically and only ask in cases where automatic handling fails. Ask only after every other possible task has already been done. Maybe add an additional window showing accumulated problems, which pops up at the end of installation.
The result should be:
- a more error-resistant installation/update process
- faster responses to user commands (abort)
- easier server handling (selection from list, switching during install)
A last note: Thanks for the improvements already implemented during the last years.
This all sounds like part of the installation and packager modules of YaST. These enhancements sound reasonable and should be taken into consideration. Dirk:
a) Server lists tend to be outdated after a short period of time, which would result in a large list of `broken links'. But at least some of the really reliable sources should be added to a list.
b) Downloading 2 files at once could be critical, as many servers might allow only one connection at a time for a non-privileged user, so in most cases it could do more harm than good.
d) It would be useful if the user had an _option_ to `collect errors' until the end of the process.
Jiří: What do you think about this?
Steffen, Michael, please comment on the points where I mentioned your login. Dirk, please comment on the rest.
a) (snwint) Fully agree. A dynamically loaded server list would be great, but it is not possible from isolinux (limited resources, lack of network card drivers,...). Setting the source later doesn't solve the problem, as Linuxrc must load the source from somewhere. Steffen, do you think it would be possible for Linuxrc to download a list of servers from our server and offer it to the user to select from? At that point, you already have the network card up.
b) (ma) I agree with Michael G., but downloading a package while another package is being installed might be a good idea.
c) (ma) Can we do something about it, or is it an issue of the libraries below (curl etc.)?
d) One more automatic retry might make sense, provided we solve c). Skipping messages is IMHO not a good idea; we might consider putting a timeout on the popups.
e) (ma) How much effort would it be to implement this in the package manager? I'm afraid it could cause problems in case of a different list of packages, different versions,...
f) I don't think it is a good idea to abort during package installation; it is better to have a consistent system. Abort while a package is being downloaded is handled as a different bug.
Steffen, Michael, what do you think about the requests?
ad a) Both a fixed list (in boot loader) and a variable list (in linuxrc) make sense to me. If someone gives me a list of install servers I can put this in. This will introduce a difference between openSuSE and SuSE Linux when preparing boot CDs, though. Andreas, Adrian, do we want it? If yes, I'd need a server list.
Hello Jiri, some comments from my point of view:
d) Timeouts on popups are ok, but the user needs a way to see the error. So at the end of the installation process a sort of "all collected timeout messages" window must appear, and without a timeout.
e) I know the problems with this. I would not do it in a complicated way, but only switch the base part of the URL. So if there are differences in packages (version, number of files, etc.) the user gets the normal "cannot load" request with the option to change the server again (for a single file or for all following files). This means you need not change the installation process, only the loading mechanism and the failed-to-load popup.
f) I agree, aborting during installation is not a good idea. I meant the download.
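The "only switch the base part of the URL" idea from e) can be sketched roughly as follows. This is a minimal illustration, not libzypp or YaST code; the function name `rebase_url` and the mirror URLs are hypothetical.

```python
def rebase_url(file_url: str, old_base: str, new_base: str) -> str:
    """Swap only the base part of a package URL, keeping the relative path.

    Hypothetical helper: the path below the repository root is assumed
    to be the same on both servers.
    """
    # Normalise both bases so they end with exactly one slash.
    old_base = old_base.rstrip("/") + "/"
    new_base = new_base.rstrip("/") + "/"
    if not file_url.startswith(old_base):
        raise ValueError(f"{file_url!r} does not live under {old_base!r}")
    return new_base + file_url[len(old_base):]


# Hypothetical mirrors, used only for illustration.
print(rebase_url(
    "http://mirror-a.example/suse/10.0/rpm/foo.rpm",
    "http://mirror-a.example/suse/10.0",
    "ftp://mirror-b.example/pub/suse/10.0",
))
```

If the new server carries a different package version, the rebased URL simply fails to load and the usual failed-to-load popup appears, which is the behaviour described above.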
Dirk, just a note for the future: please file a separate bug report for each problem. With a report like this, we have a really hard time tracking whether we've addressed all the problems properly. Thanks.
b) was reported separately as bug 128050, let's solve it there...
To comment #4 d) - something similar is already implemented during autoinstallation. The long delay after pressing abort is caused by the fact that libzypp calls the progress callback (which returns the abort/continue value) only after a long time. (A built-in list of installation sources is reported in bug #240834, we should track the problem there.) I have questions about the timed popups: How should they be activated? What should the default timeout value be? What should the default action after the timeout be? Andreas, Jiri, any idea?
My suggestion is to default the timeout to the Retry operation, but only once or twice (not forever). If the operation fails the 2nd (or 3rd) time, display a popup without a timeout. Aborting the installation when the user is not at the computer is IMO not an option. Ignoring the error may result in errors with packages which are installed later (because their post-scripts cannot run due to a missing package which is PreRequired), and the user will not be aware that these failures are related to the earlier package failures that went unnoticed.
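A rough sketch of this suggestion: retry automatically a limited number of times, then fall back to a popup that waits for the user. The names `fetch` and `ask_user` are hypothetical callbacks, not YaST or libzypp APIs, and the timed popup itself is not modelled.

```python
from typing import Callable


def download_with_auto_retry(
    fetch: Callable[[], bool],
    ask_user: Callable[[], str],
    auto_retries: int = 2,
) -> str:
    """Return "ok" on success, otherwise the user's final choice.

    fetch() returns True on success. ask_user() stands for the popup
    without timeout and returns "retry", "abort" or "ignore".
    """
    # First attempt plus a limited number of automatic retries
    # (the timed popup whose default action is Retry).
    for _ in range(1 + auto_retries):
        if fetch():
            return "ok"
    # Still failing: block on a popup without timeout, so the
    # installer never aborts or ignores on its own.
    while True:
        choice = ask_user()
        if choice != "retry":
            return choice
        if fetch():
            return "ok"
```

With `auto_retries=2`, the user is only asked after the third failure, which matches "fails the 2nd (or 3rd) time" depending on the value chosen.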
Closing as LATER - to be reevaluated after 10.3
Reopening LATER bugs (LATER will be changed to WONTFIX on September 2).
*** Bug 396159 has been marked as a duplicate of this bug. ***
See https://bugzilla.novell.com/show_bug.cgi?id=396159#c3 for additional information.
*** Bug 437991 has been marked as a duplicate of this bug. ***
Bug 437991 was mine and is apparently a duplicate of "part d" of this bug's summary. Is there any hope of getting this fixed in Beta 4? It pretty much kills the netinstall experience since I have to babysit and click retry every 10 minutes because of broken mirrors. Thanks, Brandon
(In reply to comment #14 from Brandon Philips)
> Is there any hope of getting this fixed in Beta 4?
Unfortunately not. Beta4 is frozen now and changing that in RC1 is quite risky. I'll change that in the next release.
(In reply to comment #15 from Ladislav Slezak)
> (In reply to comment #14 from Brandon Philips)
> > Is there any hope of getting this fixed in Beta 4?
>
> Unfortunately not. Beta4 is frozen now and changing that in RC1 is quite risky.
> I'll change that in the next release.
Should I create a FATE for this?
I think there is no need to create a FATE. This should be quite easy to implement. The change is not big, but I don't want to break something in 11.1 RC phase. That's why I'm postponing it...
*** Bug 458050 has been marked as a duplicate of this bug. ***
ad d.1) I repeat my report because I would not like to be ignored for the second time:
"
It is a continuation of: https://bugzilla.novell.com/show_bug.cgi?id=328822
The report is closed as fixed, but in real life it is not fixed at all.
Use case: I started the installation (from the network CD), so basically the whole system is downloaded from the net. In my case it is about 2 GB of download while installing; with my connection it takes for sure more than two hours (for sure because 500 MB were downloaded and it took 2 hours). So my ETA is about 8 hours. A good job to put on the night shift, right? With the current auto-retry, the installer used up all 3 download tries on error, which took 1.5 minutes in total (see the mentioned report for the "fix"), and spent the remaining 478.5 minutes displaying an error dialog. It is not productive, useful, helpful or smart.
The facts:
---------
* displaying an error dialog is not productive at all.
* while the error is displayed, a productive task can take place at the same time. Productive task = real auto-retry.
error? -> auto-retry in 10 seconds -> error? -> auto-retry in 30 seconds -> error? -> auto-retry in 1 minute -> error? -> auto-retry in 10 minutes (and then do not increase it any more, but just keep this period) and so on
So the period between tries keeps increasing and there is no limit on the number of tries (why would there be?). And this design is exactly what I described more than a year ago: https://bugzilla.novell.com/show_bug.cgi?id=332175
Now with opensuse 11.1, I noticed two errors waiting for my response (manual retry "solved" the problem with downloading), so I hope that this time I have stressed the _smart_ aspect enough -- so I would be happy to see the computer finally do the computer's job.
"
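The schedule proposed in this report (10 seconds, 30 seconds, 1 minute, then 10 minutes repeated with no limit on tries) could be sketched as an endless sequence of waits. The function name `retry_delays` is hypothetical; only the four step values come from the report.

```python
from itertools import chain, islice, repeat
from typing import Iterator, Sequence


def retry_delays(steps: Sequence[int] = (10, 30, 60, 600)) -> Iterator[int]:
    """Yield waits in seconds: each step once, then the last step forever."""
    return chain(steps, repeat(steps[-1]))


# The first six waits before each auto-retry.
print(list(islice(retry_delays(), 6)))  # [10, 30, 60, 600, 600, 600]
```

Because the iterator never ends, the caller decides when to stop, for example when the download finally succeeds or the user presses abort.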
11.1 ships with an implementation of http://en.opensuse.org/Libzypp/Failover which unfortunately isn't available during installation, but it would be highly appreciated if people started testing it on installed systems real soon now. Hopefully we can enable it for installation later, too. All testing and feedback would help us move forward here.
*** Bug 485887 has been marked as a duplicate of this bug. ***
Remark on duplicate Bug 485887: This request is for opensuse 11.1, not for 10.3. I'm not sure about your internal handling, but please take that into account.
I have improved the retry algorithm in yast2-2.18.20 (should be in the next 11.2 milestone release):
- there is an exponential back-off for timeout values: they go from 30 seconds, 1 minute, 2 minutes... up to 15 minutes, then it keeps 15-minute timeouts
- there are 100 attempts, which means yast gives up after ~24 hours
- retry is always used for FTP/HTTP/SMB/NFS repositories
I think this improves network installation quite well.
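As a quick check of the "~24 hours" figure, the schedule can be reconstructed from the numbers above. This is only a sketch, not the yast2 code: it assumes the timeout doubles each time, is capped at 15 minutes, and that exactly one timeout elapses per attempt. The function name `backoff_timeouts` is hypothetical.

```python
def backoff_timeouts(attempts: int = 100, first: int = 30, cap: int = 900) -> list[int]:
    """Timeouts in seconds: start at `first`, double each time, cap at `cap`."""
    timeouts = []
    timeout = first
    for _ in range(attempts):
        timeouts.append(timeout)
        timeout = min(timeout * 2, cap)
    return timeouts


schedule = backoff_timeouts()
print(schedule[:7])            # [30, 60, 120, 240, 480, 900, 900]
print(sum(schedule) / 3600.0)  # about 24.0 hours in total
```

Under these assumptions the first five timeouts add up to 930 seconds and the remaining 95 attempts wait 900 seconds each, giving 86430 seconds in total, just over 24 hours.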
Ladislav, great news and thank you very much for implementing this.