Closed
Bug 671647
Opened 14 years ago
Closed 14 years ago
add nagios checks for w64-ix-slaves once they are all cloned
Categories
(Infrastructure & Operations :: RelOps: General, task)
x86_64
Windows Server 2008
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Assigned: mlarrain)
The checks needed are:
* PING
* disk - C
* disk - E
* buildbot-start
Thanks!
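For illustration only, the four checks above could be rendered as Nagios service definitions along these lines. This is a minimal sketch, not the configuration actually deployed: the `generic-service` template, the `check_ping` thresholds, and the NRPE command names (`check_disk_c`, `check_disk_e`, `check_buildbot_start`) are all assumptions.

```python
# Hypothetical sketch: render Nagios service definitions for the four
# requested checks. Command names and thresholds are assumptions, not
# taken from the real releng Nagios configuration.
CHECKS = {
    "PING": "check_ping!100.0,20%!500.0,60%",
    "disk - C": "check_nrpe!check_disk_c",
    "disk - E": "check_nrpe!check_disk_e",
    "buildbot-start": "check_nrpe!check_buildbot_start",
}


def render_service(host, description, command):
    """Return one 'define service' block as text."""
    return (
        "define service {\n"
        "    use                  generic-service\n"
        f"    host_name            {host}\n"
        f"    service_description  {description}\n"
        f"    check_command        {command}\n"
        "}\n"
    )


def render_host_checks(host):
    """Return all four service blocks for a single host."""
    return "\n".join(render_service(host, d, c) for d, c in CHECKS.items())


if __name__ == "__main__":
    print(render_host_checks("w64-ix-slave20"))
```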
Comment 1 (Reporter) • 14 years ago
Adding the other slaves as well 10,12,17 & 19.
Summary: add nagios checks for w64-ix-slave[20-24] → add nagios checks for w64-ix-slave{10,12,17,19-24}
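The `{10,12,17,19-24}` shorthand in the new summary stands for nine hosts. A small sketch of how such a spec could be expanded into hostnames when generating config follows; the helper name `expand_hosts` is hypothetical.

```python
def expand_hosts(prefix, spec):
    """Expand a spec such as '10,12,17,19-24' into full hostnames.

    Hypothetical helper: single numbers are kept as written, and ranges
    are inclusive and padded to the width of their first number.
    """
    hosts = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            width = len(start)
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
        else:
            hosts.append(f"{prefix}{part}")
    return hosts


if __name__ == "__main__":
    for host in expand_hosts("w64-ix-slave", "10,12,17,19-24"):
        print(host)
```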
Comment 2 • 14 years ago
This needs to wait till the machines are on the new network.
Assignee: server-ops-releng → arich
Comment 3 (Reporter) • 14 years ago
Bug 668521 got resolved. What is the next step? Shall I disable a few w64 slaves, ask for them to be moved to vlan76, and then try the nagios checks?
Depends on: 668521
Comment 4 • 14 years ago
IMHO we should wait until the w64 systems are all up and all moved to the winbuild VLAN before monitoring them - anything more will be unnecessary nagios config churn.
Comment 5 (Reporter) • 14 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> IMHO we should wait until the w64 systems are all up and all moved to the
> winbuild VLAN before monitoring them - anything more will be unnecessary
> nagios config churn.
Fair enough. Adjusting dependencies. We will wait for all of them to be cloned.
Comment 6 (Reporter) • 14 years ago
95% of the slaves have been reimaged.
Most of the slaves are
Is this something that can be started now, or should we deal with it a little later?
Comment 7 • 14 years ago
I've added in the checks but downtimed the hosts for now (till next Wed) since I want to make sure we iron out any issues before they start alerting. Over to Matt to verify status of some hosts:
w64-ix-slave02
w64-ix-slave17
w64-ix-slave19
w64-ix-slave41
The failures range from hosts being down to hosts responding to NRPE checks but not ping checks. I'm wondering if the ones not responding to ping are those that need to be reimaged. Matt, please take a look at the nagios GUI.
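As a rough illustration of the downtime step, Nagios accepts a `SCHEDULE_HOST_DOWNTIME` external command with the fields host, start, end, fixed, trigger ID, duration, author, and comment. The sketch below only formats that command line; the one-week window, dates, author, and comment text are hypothetical, and it is not a record of how the downtimes were actually set.

```python
from datetime import datetime, timedelta, timezone


def host_downtime_command(host, start, end, author, comment, now=None):
    """Format a SCHEDULE_HOST_DOWNTIME external command line.

    The downtime is fixed (fixed=1) and not triggered by another
    downtime (trigger_id=0). 'now' is the submission time and defaults
    to 'start' for this sketch.
    """
    submitted = int((now or start).timestamp())
    start_ts = int(start.timestamp())
    end_ts = int(end.timestamp())
    duration = end_ts - start_ts
    return (
        f"[{submitted}] SCHEDULE_HOST_DOWNTIME;{host};{start_ts};{end_ts};"
        f"1;0;{duration};{author};{comment}"
    )


if __name__ == "__main__":
    # Hypothetical one-week window; the real dates are not in this bug.
    start = datetime(2011, 9, 1, tzinfo=timezone.utc)
    end = start + timedelta(days=7)
    for host in ["w64-ix-slave02", "w64-ix-slave17"]:
        print(host_downtime_command(host, start, end, "releng", "new checks, ironing out issues"))
```

In a real setup the resulting line would be written to the Nagios command file, whose path depends on the installation.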
Assignee: arich → mlarrain
Comment 8 • 14 years ago
All of the downtimes for ping and disk have been removed (buildbot_start remains deactivated until idleizer works). armen fixed the firewall rules on w64-ix-slave17 and w64-ix-slave19. w64-ix-slave02 and w64-ix-slave41 are acked and destined for a trip to iX.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations