Bug 156995

Summary: forgetting services ...
Product: [openSUSE] SUSE Linux 10.1
Reporter: Michael Meeks <mmeeks>
Component: Zenworks
Assignee: Tambet Ingo <tambet>
Status: VERIFIED DUPLICATE
QA Contact: Nat Budin <nbudin>
Severity: Blocker
Priority: P5 - None
CC: aj, alberto.passalacqua, burnus, rbremer, suse-beta
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Yesterday Zmd log files
             Today Zmd log file
             ZMD log updated

Description Michael Meeks 2006-03-10 11:32:18 UTC
So - I frequently find myself having to re-configure the service I use for zen updating; [ RCE / hazard.provo ... ]

I've no idea where zlm is supposed to store this information - possibly I guess some RPM upgrade wiped it out; [ but should it !? ]

Apart from that the zen-installer GUI looks nice :-)
Comment 1 Naresh Wignarajah 2006-03-10 19:50:59 UTC
Nat, can you or someone else test this and also check with the desktop test team?  I haven't seen this on Avenger.
Comment 2 Nat Budin 2006-03-13 05:16:21 UTC
Sorry, Naresh, fixing the assignee/QA contact situation on this one as well.  Please don't assign bugs directly to testers.
Comment 3 Naresh Wignarajah 2006-03-13 15:02:40 UTC
No problem.  Assigning to snopr and needinfo'ing to testing.
Comment 4 Nat Budin 2006-03-15 19:43:08 UTC
Yeah, I haven't seen this one ever, and I've been testing the zen-updater/installer/remover GUIs quite a bit over the past week.  If someone could provide a set of steps to reproduce, I could verify it.
Comment 5 Michael Meeks 2006-03-16 09:23:31 UTC
So - one thing I suffer from [ as all VPN users do ] is an unreliable connection; ie. one where the DNS lookup will succeed [ eg. to hazard.provo.novell.com ] but result in a non-routable IP address. I guess it's likely that this might result in some situation you [ presumably inside the VPN, with a reliable connection ] will not see.

I guess simulating that inside the VPN is perhaps quite fun: you could edit /etc/hosts to clobber the address of <your-rce-server> and make it point at some remote machine you know is not turned on (eg.).
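
The failure mode described above (the name resolves, but the address cannot be reached) can be told apart from a plain DNS failure with a small probe. This is only a sketch in Python, not anything zmd does: the function name `probe`, the default port 80 and the 5 second timeout are assumptions made for illustration.

```python
import socket


def probe(host, port=80, timeout=5.0):
    """Classify a host as 'dns-failure', 'resolves-but-unreachable' or 'ok'.

    Hypothetical helper for illustration; not part of zmd or rug.
    """
    try:
        # Name resolution only; an /etc/hosts entry is honoured here too.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    for family, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)  # times out or is refused if unreachable
                return "ok"
        except OSError:
            continue  # try the next resolved address, if any
    return "resolves-but-unreachable"
```

With a bogus /etc/hosts entry in place for <your-rce-server>, a call such as `probe("<your-rce-server>")` would be expected to report the host as resolving but unreachable once the timeout expires.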
Comment 6 James Willcox 2006-03-21 18:05:07 UTC
I still can't reproduce this.  What I tried:

1) add hazard.provo.novell.com as a service
2) stop zmd
3) insert appropriate bogus line in /etc/hosts
4) start zmd

zmd showed the service in 'Pending' status until it eventually timed out.  After that it showed 'Inactive' status.  After restarting zmd, the same thing repeated.  After removing the bogus /etc/hosts line and doing a 'rug ref', the service successfully came up.  Michael if you could attach your log file (/var/log/zmd-messages.log) when this happens, it might help us.
Comment 7 Naresh Wignarajah 2006-03-22 04:17:57 UTC
Nat or Mauro any ideas on how testing can help as well?
Comment 8 Michael Meeks 2006-03-22 13:53:13 UTC
James - why do you start / stop zmd ? - most certainly that resets a ton of stuff; that's indeed likely to work.

Instead you need to simulate the case where an IP is resolvable, but connections time-out I guess; short of playing with the VPN it's not clear to me how best to do that. Of course, it could be done by using another host to set up port forwarding [ that could be disabled ] to connect to hazard or not - but of course, to get a decent timeout you'd want that machine not to be on the LAN.

Given the problems that we've had (past & present) with this setup; it may be well worth creating such a setup; no doubt this can be easily done with 'ssh' (-f/-D/-P/-L etc.?). It would prolly be good to have an 'unreliable' host [ you could easily kill the ssh & re-start it every N minutes ] to use during testing.

Comment 9 James Willcox 2006-03-22 16:31:32 UTC
I got a decent timeout situation going by just setting hazard.provo.novell.com in /etc/hosts to 9.9.9.9.  It took the full 180 seconds for each http request to time out.

I repeated the test by just disabling the vpn, and got identical results.

The reason I restarted zmd was because that's the only case where we are supposed to keep a service if adding it fails.  And obviously if a service is already added (from 'rug sa' or startup or whatever), a failed refresh shouldn't remove it.  So to make sure that wasn't what was causing it, I tried:

1) rug sa hazard.provo.novell.com
2) nuke vpn
3) rug ref

The refresh timed out, as expected, and the service was not removed.  I don't know what else to try :/
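
The behaviour expected in the test above (a refresh that times out leaves the service in the list) can be written down as a tiny model. This is a Python sketch under stated assumptions, not zmd's code: the `Service` class, the `refresh` function and the 'Active' status name are made up for illustration; only 'Pending' and 'Inactive' appear in this report.

```python
from dataclasses import dataclass


@dataclass
class Service:
    url: str
    status: str = "Pending"  # status shown while the first fetch is running


def refresh(services, fetch):
    """Refresh every service; a failure marks it Inactive but keeps it listed.

    `fetch` is any callable that raises OSError (e.g. TimeoutError) on
    network failure. Both names are hypothetical.
    """
    for svc in services:
        try:
            fetch(svc.url)
        except OSError:
            svc.status = "Inactive"  # keep the entry so a later refresh can retry
        else:
            svc.status = "Active"    # assumed name for the healthy state
    return services
```

The point of the sketch is that the list is never shortened inside `refresh`; only the status changes, so the service is still there to retry once the VPN comes back.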
Comment 10 Michael Meeks 2006-03-22 16:55:29 UTC
interesting; well - I have no clue then. Is there any chance that on a package update the configuration is erroneously obliterated ? I guess that could be a cause. What about doing service operations - list, add etc. while there is a long timeout ?
Comment 11 Michael Meeks 2006-03-24 15:36:31 UTC
Well - would you believe it; there was I starting to doubt my sanity; and yet just now:

michael@linux:~> sudo rug refresh
Successfully refreshed.

[ thinks: wow that was fast & perfect ! ]

michael@linux:~> sudo rug lu
No updates are available.

WARNING: Updates are only visible when you are subscribed to a catalog.

[ how odd ! ]

michael@linux:~> sudo rug ca
--- No catalogs available ---
michael@linux:~> sudo rug sl
--- No services found ---

I can assure you that I did not manually un-subscribe from anything, nor did I upgrade my zmd package etc. nor have I re-started it [that I recall]. It just magically lost the services / catalogs: nice.

Of course, I suspended & resumed the laptop to RAM overnight, had some broken networking for a while etc. but nothing pathological.
Comment 13 Naresh Wignarajah 2006-04-04 13:48:57 UTC
*** Bug 159974 has been marked as a duplicate of this bug. ***
Comment 14 Tambet Ingo 2006-04-10 13:41:22 UTC
*** Bug 163771 has been marked as a duplicate of this bug. ***
Comment 15 Michael Meeks 2006-04-11 21:36:51 UTC
OK - I just saw this again in the upgrade to the latest SLED10 packages from hazard.
I decided "hey, let's try this updater thing again, it's never worked before - but it might now ?!" and with trepidation right clicked / watched the "refreshing services" thing happen.

Then I noticed my VPN was not up; put that up & as I looked back: an empty list of services.

Quite possibly related to a transient failure on the line - surely we don't remove services with transient problems ? - as in remove them completely. Either way - my list was empty then.

Strange thing is - running it again; I got another 'Refreshing Services...' legend, but then an empty dialog again: most odd.
Comment 16 Michael Meeks 2006-04-12 08:39:05 UTC
And again this morning - an empty list after a refresh.
Imagine my surprise then having hit 'add service' - to see the service details populated in the 'add service' box: how odd. It knows it but it won't tell you.

Any chance of actually displaying that information in the list along with some icon signifying a non-responsive source ?
Comment 17 Ronny Bremer 2006-04-16 18:47:59 UTC
It does not seem to affect only VPN users: as someone outside of Novell's network, I just installed 10.1 RC1 and added 2 services to rug, then rebooted twice, and now the list is empty (both rug sl and rug ca).
Comment 18 Tambet Ingo 2006-04-21 08:09:59 UTC
*** Bug 167954 has been marked as a duplicate of this bug. ***
Comment 19 Tambet Ingo 2006-04-21 08:10:44 UTC
I'll take it.
Comment 20 Tambet Ingo 2006-04-21 08:27:13 UTC
There was a race condition - if zmd got "network restored" event before loading initial services, it rewrote the services file with currently loaded services.

Fixed. 
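
For readers unfamiliar with this kind of race, here is a minimal Python model of it. It is a sketch only: zmd is not written in Python, the actual fix is not described in this report, and the `ServiceStore` class, the JSON file format and the `guard` flag are all assumptions made for illustration. The guard shown is one possible way to avoid the overwrite, not necessarily the one that was applied.

```python
import json
import os
import tempfile


class ServiceStore:
    """Hypothetical model of a services file plus an in-memory list."""

    def __init__(self, path, guard=True):
        self.path = path
        self.guard = guard
        self.services = []   # empty until load() has run
        self.loaded = False

    def load(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.services = json.load(f)
        self.loaded = True

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.services, f)

    def on_network_restored(self):
        # Without the guard, an early event writes the still-empty list
        # over the saved services, which is the race described above.
        if self.guard and not self.loaded:
            return
        self.save()


def simulate(guard):
    """Deliver 'network restored' before the initial load and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "services.json")
        with open(path, "w") as f:
            json.dump(["hazard.provo.novell.com"], f)
        store = ServiceStore(path, guard=guard)
        store.on_network_restored()  # event arrives first
        store.load()                 # initial load happens afterwards
        return store.services
```

In this model `simulate(False)` comes back with an empty list, matching the "No services found" symptom, while `simulate(True)` still returns the saved service.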
Comment 21 Nat Budin 2006-05-11 19:36:29 UTC
I haven't seen any reports of this one since Tambet's fix, nor have I been able to reproduce it myself.  Michael, everyone, if it's ok with you all I am closing this and assuming that Tambet's fix solved the problem.
Comment 22 Alberto Passalacqua 2006-06-12 20:33:31 UTC
This bug is still present in SuSE 10.1 after the official patches for Zen/rug/Yast/libzypp.

Every time I reboot my notebook Zen tells me all catalogs are inactive, and rug in the command line tells me that no catalog is available.



Comment 23 Tambet Ingo 2006-06-12 21:17:41 UTC
Inactive services != forgetting services. Can you attach your /var/log/zmd-messages.log file here? zmd marks services as inactive if it can't download them at startup so it can retry at refresh or, if you're using NetworkManager, when the network becomes available.
Comment 24 Alberto Passalacqua 2006-06-12 23:17:57 UTC
Here is what happened today on my notebook, which uses NetworkManager:

- I set catalogs on my notebook at the office, which is connected to the wired network.
- I switch the notebook off.
- I switch the notebook on again on the train, without the network this time. Zen marks catalogs as Inactive, but they're still there.
- I switch the notebook off again.
- When at home I use the notebook again. It connects to the wireless network, but catalogs are still inactive. I refresh Zen through the zen-updater -> refresh function. Catalogs are still there, inactive. I check in a terminal using "rug sl" and the answer is that no catalog is available.

I'll attach the log tomorrow, now I'm on my desktop.
Comment 25 Alberto Passalacqua 2006-06-13 07:38:57 UTC
Created attachment 88917 [details]
Yesterday Zmd log files
Comment 26 Alberto Passalacqua 2006-06-13 07:40:27 UTC
Today I switched my notebook on. Network is on but the update catalog is marked as inactive.

I attach the new log too.
Comment 27 Alberto Passalacqua 2006-06-13 07:42:06 UTC
Created attachment 88918 [details]
Today Zmd log file
Comment 28 Alberto Passalacqua 2006-06-13 09:51:59 UTC
I've just checked and now the Update catalog has disappeared from Zen. I attach the updated log.
Comment 29 Alberto Passalacqua 2006-06-13 09:53:24 UTC
Created attachment 88957 [details]
ZMD log updated
Comment 30 Tambet Ingo 2006-06-13 20:29:13 UTC
*** Bug 181714 has been marked as a duplicate of this bug. ***
Comment 31 Joe Shaw 2006-06-15 16:34:23 UTC
Bug 185206 is probably a dup, and has additional logs.
Comment 33 Tambet Ingo 2006-06-15 17:17:29 UTC
The original issue this bug was opened for was a totally different one. The new logs attached here indicate it's a duplicate of 185206.

(13 Jun 2006 09:28:02 ERROR ServiceManager       Service Refresh Failed: Failed to parse XML metadata: Child exited due to SIGIOT)
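
Entries like the one quoted above can be pulled out of /var/log/zmd-messages.log with a short script. This is a Python sketch: the line layout (timestamp, level, component, message) is inferred from that single quoted line, so other lines in a real log may not match, and the names `LINE_RE` and `refresh_failures` are hypothetical.

```python
import re

# Layout assumed from the one line quoted in this report:
# "<day> <Mon> <year> <hh:mm:ss> <LEVEL> <Component>   <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)\s+(?P<message>.*)$"
)


def refresh_failures(lines):
    """Return the parsed ERROR entries that report a failed service refresh."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # line does not follow the assumed layout
        if match["level"] == "ERROR" and match["message"].startswith(
            "Service Refresh Failed"
        ):
            failures.append(match.groupdict())
    return failures
```

Passing an open log file to `refresh_failures` gives one dictionary per failed refresh, which makes it easier to compare timestamps across the attached logs.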



*** This bug has been marked as a duplicate of 185206 ***