Bug 409881 - way to stop/drop workers gracefully
Summary: way to stop/drop workers gracefully
Status: CONFIRMED
Alias: None
Product: openSUSE.org
Classification: openSUSE
Component: BuildService
Version: unspecified
Hardware: Other Other
Importance: P3 - Medium : Normal
Target Milestone: ---
Assignee: Adrian Schröter
QA Contact: Adrian Schröter
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-17 08:55 UTC by Forgotten User 0RO3Kla3Ru
Modified: 2011-04-18 22:10 UTC
CC List: 4 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
adrian.schroeter: needinfo? (mls)



Description Forgotten User 0RO3Kla3Ru 2008-07-17 08:55:59 UTC
At the moment, the only way to stop a worker is "rcobsworker stop" on the worker host.
Ideally this should happen when the worker(s) have no jobs running; otherwise the jobs need to be deleted manually from the backend's /srv/obs/jobs/*/.
This works, but it will likely lose unfinished jobs, and it needs manual intervention plus triggered rebuilds or restarts of the scheduler.

My proposal for an enhancement is therefore to implement some switches (possibly, long term, on the backend) to e.g. shut down the workers on request _after_ a job has finished.

Quick solution:
e.g. "touch /tmp/root_1/SHUTDOWN", then evaluate the existence of the file in the worker,
stop after the build and don't fetch new data (the backend will try to assign a new job and fail, as it does now, but there will be no "stale" jobs)
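
Editorial illustration of the quick solution above: a minimal sketch of a worker loop that checks for the sentinel file before it fetches each job, so a running build is never interrupted and no new job is accepted once the file exists. This is not the actual OBS worker code; run_worker, fetch_job and build are hypothetical names, and only the idea of a SHUTDOWN file comes from the report.

```python
import tempfile
from pathlib import Path


def run_worker(sentinel, fetch_job, build):
    """Build jobs until the sentinel file exists or no job is available.

    Returns the number of jobs that were completed.
    """
    done = 0
    while True:
        # Check *before* fetching, so no job is accepted and then abandoned.
        if Path(sentinel).exists():
            break
        job = fetch_job()
        if job is None:
            break
        build(job)  # a build that has started always runs to completion
        done += 1
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        sentinel = Path(root) / "SHUTDOWN"
        queue = ["job-a", "job-b", "job-c"]

        def fetch_job():
            return queue.pop(0) if queue else None

        def build(job):
            # Simulate an admin touching the file while job-a is building.
            if job == "job-a":
                sentinel.touch()

        print(run_worker(sentinel, fetch_job, build), "job(s) built,",
              len(queue), "left for other workers")
```

Because the check happens only between jobs, the cost of a graceful stop is at most one remaining build time; the jobs that were never fetched stay with the backend.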

Long term idea:
Make this possible in the admin backend via the web interface.
Checkboxes for activating/deactivating discovered workers?
Comment 1 Adrian Schröter 2008-07-17 09:17:39 UTC
Michael, what is a proper API call to discard a job, which I can call in the init script? Or do you want to handle this within the worker?

We could have two different methods:
* Shutdown immediately, discard the job.
* Shutdown after this job.

For the init script, we need the first one, I think.
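
Editorial illustration of the two methods above: a minimal sketch, assuming a job can be modelled as a number of build steps and that the worker can look at a shutdown flag between steps. Shutdown, Worker and the discard callback are hypothetical names; discard stands in for whatever API call would hand the job back, which this thread leaves open.

```python
from enum import Enum


class Shutdown(Enum):
    NONE = 0
    AFTER_JOB = 1   # finish the current job, then exit
    IMMEDIATE = 2   # abandon the current job and hand it back


class Worker:
    def __init__(self, jobs, discard):
        self.jobs = list(jobs)    # pending (name, steps) pairs
        self.discard = discard    # hypothetical "give the job back" callback
        self.mode = Shutdown.NONE
        self.finished = []

    def request(self, mode):
        """Record a shutdown request; it takes effect at the next check."""
        self.mode = mode

    def run(self, on_step=lambda worker, job, step: None):
        # New jobs are only taken while no shutdown has been requested.
        while self.jobs and self.mode is Shutdown.NONE:
            name, steps = self.jobs.pop(0)
            for step in range(steps):
                on_step(self, name, step)  # stands in for one unit of build work
                if self.mode is Shutdown.IMMEDIATE:
                    self.discard(name)     # so the job can be reassigned
                    return
            self.finished.append(name)


if __name__ == "__main__":
    discarded = []
    worker = Worker([("job-a", 3), ("job-b", 3)], discarded.append)
    # Request a graceful stop while the first job is building.
    worker.run(lambda w, job, step: w.request(Shutdown.AFTER_JOB))
    print("finished:", worker.finished, "discarded:", discarded)
```

In this sketch an init script would map to Shutdown.IMMEDIATE, while an admin request would map to Shutdown.AFTER_JOB; how the request reaches the worker (signal, file, backend call) is deliberately left out.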

Comment 2 Martin Mohring 2008-07-17 09:23:35 UTC
Somehow the dispatcher needs to be informed for the first option to reassign the jobs to another worker, which then fails. Then the worker will be deleted from the worker list also on the server.
Comment 3 Martin Mohring 2008-07-17 09:28:07 UTC
Adrian has written a tool, bs_admin. Should bs_admin have some new command like "check workers"?
Comment 4 Forgotten User 0RO3Kla3Ru 2008-07-17 09:35:39 UTC
to #3: the command would be good for checking for stale/no longer existent workers, but exiting gracefully would be the better way.
Comment 5 Forgotten User 0RO3Kla3Ru 2008-09-14 23:59:51 UTC
Is this still being considered? Possibly for the resource-management code?
Comment 6 Michael Schröder 2008-09-15 09:52:20 UTC
Yes, but we're currently a bit overloaded with the sles11/11.1 beta1 preparation...
Comment 7 Forgotten User 0RO3Kla3Ru 2008-09-15 09:55:07 UTC
Tnx, Michael! Consider it just a bump/reminder ;).
Comment 8 Forgotten User 0RO3Kla3Ru 2009-05-14 10:20:40 UTC
ping ;).
Comment 9 Forgotten User 0RO3Kla3Ru 2009-09-23 06:54:32 UTC
pong ;)