Closed Bug 671647 Opened 14 years ago Closed 14 years ago

add nagios checks for w64-ix-slaves once they are all cloned

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86_64
Windows Server 2008
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: mlarrain)

The checks needed are:
* PING
* disk - C
* disk - E
* buildbot-start

Thanks!
Adding the other slaves as well: 10, 12, 17 and 19.
Summary: add nagios checks for w64-ix-slave[20-24] → add nagios checks for w64-ix-slave{10,12,17,19-24}
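For reference, the host list in the new summary can be expanded mechanically. The following is only a minimal sketch: `expand_slaves` is a hypothetical helper, not part of any nagios tooling, and the two-digit zero padding is an assumption based on hostnames such as w64-ix-slave02 mentioned elsewhere in this bug.

```python
def expand_slaves(prefix, spec):
    """Expand a spec such as '10,12,17,19-24' into full hostnames.

    Hypothetical helper for illustration only.
    """
    hosts = []
    for part in spec.split(","):
        if "-" in part:
            # An inclusive range such as '19-24'.
            start, end = (int(n) for n in part.split("-"))
            numbers = range(start, end + 1)
        else:
            numbers = [int(part)]
        # Assumption: slave numbers are zero-padded to two digits.
        hosts.extend(f"{prefix}{n:02d}" for n in numbers)
    return hosts


hosts = expand_slaves("w64-ix-slave", "10,12,17,19-24")
for host in hosts:
    print(host)
```

Each resulting hostname would then need the four checks listed above (PING, disk - C, disk - E, buildbot-start).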
This needs to wait till the machines are on the new network.
Assignee: server-ops-releng → arich
Bug 668521 got resolved. What is the next step? Shall I disable a few w64 slaves, ask for them to be moved to vlan76, and then try the nagios check?
Depends on: 668521
IMHO we should wait until the w64 systems are all up and all moved to the winbuild VLAN before monitoring them - anything more will be unnecessary nagios config churn.
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> IMHO we should wait until the w64 systems are all up and all moved to the
> winbuild VLAN before monitoring them - anything more will be unnecessary
> nagios config churn.

Fair enough. Adjusting dependencies. We will wait for all of them to be cloned.
No longer blocks: 645024
Depends on: 670761, 645024
Summary: add nagios checks for w64-ix-slave{10,12,17,19-24} → add nagios checks for w64-ix-slaves once they are all cloned
95% of the slaves have been reimaged. Most of the slaves are Is this something that can be started now, or will we deal with it a little later?
OS: Mac OS X → Windows Server 2008
Hardware: x86 → x86_64
I've added the checks but downtimed the hosts for now (until next Wed), since I want to make sure we iron out any issues before they start alerting. Over to Matt to verify the status of some hosts: w64-ix-slave02, w64-ix-slave17, w64-ix-slave19, w64-ix-slave41. There's a mix, from hosts that are down to hosts that respond to NRPE checks but not ping checks. I'm wondering if the ones not responding to ping are those that need to be reimaged. Matt, please take a look at the nagios GUI.
Assignee: arich → mlarrain
All of the downtimes for ping and disk have been removed (buildbot_start remains deactivated until idleizer works). Armen fixed the firewall rules on w64-ix-slave17 and w64-ix-slave19. w64-ix-slave02 and w64-ix-slave41 are acked and destined for a trip to iX.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations