Bug 694251: rev4 slaves are running amok after certain errors
Status: RESOLVED FIXED (opened 14 years ago, closed 14 years ago)
Category: Release Engineering :: General (defect)
Tracking: not tracked
People: Reporter: bear; Assignee: Unassigned
talos-r4-snow-007 is the latest slave to do this; it has been disabled.
http://buildbot-master04.build.scl1.mozilla.com:8201/builders/Rev4%20MacOSX%20Snow%20Leopard%2010.6%20ionmonkey%20opt%20test%20mochitests-5%2F5/builds/5/steps/clean_old_builds/logs/stdio
Comment 1 • 14 years ago (Reporter)
https://tbpl.mozilla.org/php/getParsedLog.php?id=6821606&tree=Ionmonkey&full=1
relevant part of the above log:
========= Started unpack (results: 0, elapsed: 14 secs) ==========
bash ../tools/buildfarm/utils/installdmg.sh firefox-10.0a1.en-US.mac64.dmg
in dir /Users/cltbld/talos-slave/test/build (timeout 1200 secs)
watching logfiles {}
argv: ['bash', '../tools/buildfarm/utils/installdmg.sh', u'firefox-10.0a1.en-US.mac64.dmg']
environment:
Apple_PubSub_Socket_Render=/tmp/launch-Y50gOn/Render
CVS_RSH=ssh
DISPLAY=/tmp/launch-E48YuD/org.x:0
HOME=/Users/cltbld
LOGNAME=cltbld
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
PWD=/Users/cltbld/talos-slave/test/build
PYTHONPATH=/Library/Python/2.5/site-packages
SHELL=/bin/bash
SSH_AUTH_SOCK=/tmp/launch-5BwMo3/Listeners
TMPDIR=/var/folders/Hs/HsDn6a9SG8idoIya6p9mtE+++TI/-Tmp-/
USER=cltbld
VERSIONER_PYTHON_PREFER_32_BIT=no
VERSIONER_PYTHON_VERSION=2.6
__CF_USER_TEXT_ENCODING=0x1F5:0:0
using PTY: False
Initializing…
DIBackingStoreInstantiatorProbe: interface 0, score 100, CBSDBackingStore
DIBackingStoreInstantiatorProbe: interface 1, score -1000, CBundleBackingStore
DIBackingStoreInstantiatorProbe: interface 2, score -1000, CRAMBackingStore
DIBackingStoreInstantiatorProbe: interface 3, score 100, CCarbonBackingStore
DIBackingStoreInstantiatorProbe: interface 4, score -1000, CDevBackingStore
DIBackingStoreInstantiatorProbe: interface 5, score -1000, CCURLBackingStore
DIBackingStoreInstantiatorProbe: interface 6, score -1000, CVectoredBackingStore
DIBackingStoreInstantiatorProbe: selecting CBSDBackingStore
DIBackingStoreInstantiatorProbe: interface 0, score 100, CBSDBackingStore
DIBackingStoreInstantiatorProbe: interface 1, score -1000, CBundleBackingStore
DIBackingStoreInstantiatorProbe: interface 2, score -1000, CRAMBackingStore
DIBackingStoreInstantiatorProbe: interface 3, score 100, CCarbonBackingStore
DIBackingStoreInstantiatorProbe: interface 4, score -1000, CDevBackingStore
DIBackingStoreInstantiatorProbe: interface 5, score -1000, CCURLBackingStore
DIBackingStoreInstantiatorProbe: interface 6, score -1000, CVectoredBackingStore
DIBackingStoreInstantiatorProbe: selecting CBSDBackingStore
DIFileEncodingInstantiatorProbe: interface 0, score -1000, CMacBinaryEncoding
DIFileEncodingInstantiatorProbe: interface 1, score -1000, CAppleSingleEncoding
DIFileEncodingInstantiatorProbe: interface 2, score -1000, CEncryptedEncoding
DIFileEncodingInstantiatorProbe: nothing to select.
DIFileEncodingInstantiatorProbe: interface 0, score 900, CUDIFEncoding
DIFileEncodingInstantiatorProbe: selecting CUDIFEncoding
DIFileEncodingNewWithBackingStore: CUDIFEncoding
DIFileEncodingNewWithBackingStore: instantiator returned 0
DIFileEncodingInstantiatorProbe: interface 0, score -1000, CSegmentedNDIFEncoding
DIFileEncodingInstantiatorProbe: interface 1, score -1000, CSegmentedUDIFEncoding
DIFileEncodingInstantiatorProbe: interface 2, score -1000, CSegmentedUDIFRawEncoding
DIFileEncodingInstantiatorProbe: nothing to select.
DIDiskImageInstantiatorProbe: interface 0, score 0, CDARTDiskImage
DIDiskImageInstantiatorProbe: interface 1, score 0, CDiskCopy42DiskImage
DIDiskImageInstantiatorProbe: interface 2, score -1000, CNDIFDiskImage
DIDiskImageInstantiatorProbe: interface 3, score 1000, CUDIFDiskImage
CRawDiskImage: data fork length 0x00000000017CA53C (24945980) not a multiple of 512.
DIDiskImageInstantiatorProbe: interface 5, score -100, CRawDiskImage
DIDiskImageInstantiatorProbe: interface 6, score -100, CShadowedDiskImage
DIDiskImageInstantiatorProbe: interface 7, score 0, CSparseDiskImage
DIDiskImageInstantiatorProbe: interface 8, score 0, CSparseBundleDiskImage
DIDiskImageInstantiatorProbe: interface 9, score -1000, CCFPlugInDiskImage
DIDiskImageInstantiatorProbe: interface 10, score -100, CWrappedDiskImage
DIDiskImageInstantiatorProbe: selecting CUDIFDiskImage
DIDiskImageNewWithBackingStore: CUDIFDiskImage
DIDiskImageNewWithBackingStore: instantiator returned 0
Verifying…
Checksumming Driver Descriptor Map (DDM : 0)…
Driver Descriptor Map (DDM : 0): verified CRC32 $8E9803F7
Checksumming Apple (Apple_partition_map : 1)…
Apple (Apple_partition_map : 1): verified CRC32 $DA4FFAC7
Checksumming DiscRecording 5.0.3d2 (Apple_HFS : 2)…
DiscRecording 5.0.3d2 (Apple_HFS : 2: verified CRC32 $20181BD6
Verification completed…
Error 0 (Unknown error: 0).
verified CRC32 $3674FFAD
Attaching…
DI_kextWaitQuiet: about to call IOServiceWaitQuiet...
DI_kextWaitQuiet: IOServiceWaitQuiet took 0.000004 seconds
2011-10-12 17:36:08.274 diskimages-helper[312:4223] DIHelperHDID serveImage: attaching drive
{
autodiskmount = 1;
"hdiagent-drive-identifier" = "7C218785-187E-4A62-BFD6-0196AEFE3BE3";
"unmount-timeout" = 0;
}
2011-10-12 17:36:08.281 diskimages-helper[312:4223] DIHelperHDID serveImage: connecting to myDrive 0x4B0B
2011-10-12 17:36:08.282 diskimages-helper[312:4223] DIHelperHDID serveImage: register _readBuffer 0x1031a4000
2011-10-12 17:36:08.282 diskimages-helper[312:4223] DIHelperHDID serveImage: activating drive port 0x4c07
2011-10-12 17:36:08.283 diskimages-helper[312:4223] DIHelperHDID serveImage: set cache enabled=TRUE returned SUCCESS.
2011-10-12 17:36:08.283 diskimages-helper[312:4223] DIHelperHDID serveImage: set on IO thread=TRUE returned SUCCESS.
2011-10-12 17:36:08.284 diskimages-helper[312:4223] DIHelperHDID serveImage: terminating UI Agent
2011-10-12 17:36:08.284 diskimages-helper[312:4223] DIHelperHDID serveImage: starting server loop - myPort is 0x4c07
Mounting…
2011-10-12 17:36:08.471 diskimages-helper[312:2103] _mountDevEntries: disk1s2 aborting mountpoint postflight because disk image has no band size specified.
Finishing…
Finishing…
/dev/disk1 Apple_partition_scheme
/dev/disk1s1 Apple_partition_map
/dev/disk1s2 Apple_HFS /Users/cltbld/talos-slave/test/build/mnt
building file list ... done
Summary: rev4 slaves are running amok after certain errors → rev4 slaves are running amok after dmg unpack error
Comment 2 • 14 years ago (Reporter)
(In reply to Mike Taylor [:bear] from comment #1)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=6821606&tree=Ionmonkey&full=1
>
> relevant part of the above log:
Bah, ignore that comment. I'm trying to help debug while tired.
Comment 3 • 14 years ago
I looked through the steps, and it initially failed to check out the tools repo because of a DNS issue.
> could not lookup DNS configuration info service: (ipc/send) invalid destination port
Updated • 14 years ago
Summary: rev4 slaves are running amok after dmg unpack error → rev4 slave failed to checkout tools repo because of DNS issue
Comment 4 • 14 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #3)
> I looked through the steps and it failed initially to checkout the tools
> repo because of DNS issue.
In that sentence, what is "it"?
You can't mean the log in comment 1, which checked out the tools repo successfully, so I presume you mean some later run on the same slave. That isn't really the point of this bug: it shows *how* they run amok rather than *why*.
The steps go like this:
1. Slave is fine
2. Something happens
3. Slave takes a job, DNS fails, it can't do the job
4. Repeat 3 until someone manually stops it
The interesting part is step 2. In the case of the log linked just above (https://tbpl.mozilla.org/php/getParsedLog.php?id=6821606&tree=Ionmonkey&full=1#error0), it was apparently "2011-10-12 17:36:53.432 firefox-bin[351:903] HIToolbox: received notification of WindowServer event port death." I found the same message in the log immediately preceding one of the two incidents on the night of the 11th, although in the case nthomas looked at in bug 693470 comment 5, it was apparently dmg unpacking.
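The failure sequence described in this comment can be sketched as a hypothetical log scan: look for the step-2 precursor messages quoted in this bug, and for the step-3 DNS symptom, and stop handing out jobs after a run of DNS failures instead of waiting for someone to notice (step 4). The signature strings are copied from the logs in this bug; the names `classify_log`, `should_disable`, and the threshold of 3 are illustrative assumptions, not part of any Release Engineering tooling.

```python
from typing import List

# Step-2 precursor messages quoted in this bug (plain substring matches).
PRECURSORS = {
    "windowserver_port_death": "received notification of WindowServer event port death",
    "hdiutil_detach_failure": "hdiutil: detach: could not eject",
}

# Step-3 symptom: DNS lookups fail once the mach port is gone.
DNS_SYMPTOM = "could not lookup DNS configuration info service"


def classify_log(text: str) -> List[str]:
    """Return the names of every known signature found in one build log."""
    found = [name for name, needle in PRECURSORS.items() if needle in text]
    if DNS_SYMPTOM in text:
        found.append("dns_failure")
    return found


def should_disable(job_logs: List[str], threshold: int = 3) -> bool:
    """Hypothetical policy: True after `threshold` consecutive DNS failures."""
    streak = 0
    for log in job_logs:
        if DNS_SYMPTOM in log:
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0  # a healthy job resets the count
    return False


if __name__ == "__main__":
    sample = (
        "2011-10-12 17:36:53.432 firefox-bin[351:903] HIToolbox: "
        "received notification of WindowServer event port death.\n"
    )
    print(classify_log(sample))  # ['windowserver_port_death']
```

Counting only consecutive failures keeps a single transient DNS hiccup from taking a slave out of the pool, while still catching the repeat-until-stopped loop from step 4.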
Summary: rev4 slave failed to checkout tools repo because of DNS issue → rev4 slaves are running amok after certain errors
Comment 5 • 14 years ago
"received notification of WindowServer event port death" - that's the same kind of port (mach port) as is causing the DNS errors you've been seeing. That the port is *dying* is quite interesting. Off the top of my head:
- screensaver activates, locking the screen, which would disconnect the console session
- automatic logout
- ??
Comment 6 • 14 years ago
This morning on Slaves Gone Wild, it's talos-r4-snow-049.
It started with https://tbpl.mozilla.org/php/getParsedLog.php?id=6854726&tree=Mozilla-Inbound, another "hdiutil: detach: could not eject /dev/disk1s2: Inappropriate ioctl for device" failure.
Comment 8 • 14 years ago
talos-r4-snow-049 was already disabled in slavealloc.
talos-r4-snow-041 and talos-r4-snow-062 have been rebooted (I'm not aware of any repeat offenders; each slave just seems to have a random chance of going bang).
Comment 9 • 14 years ago
Part of the puppet startup launches and kills the screensaver engine to verify that we are properly turning off the screensaver. My testing on 10.6 showed that the screensaver engine never needed to start for the setting to take effect, but I was advised to keep that step in. On 10.7, the slave breaks completely if we ever start the screensaver engine.
Even though this script is guaranteed to run before buildbot starts, it might sever required connections somewhere higher in the process hierarchy.
I am going to remove the launch and kill of the screensaver engine, as I did to get Lion working.
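If a verification step is still wanted, one option is to read the stored preference instead of starting ScreenSaverEngine. The sketch below is a hypothetical illustration, not the actual startup script: it assumes the delay lives in the per-host `com.apple.screensaver` preferences under the `idleTime` key (seconds, with 0 meaning the screensaver never starts) and reads it with the macOS `defaults` tool. The function names are made up for this example.

```python
import subprocess
from typing import Optional

# Assumption: the screensaver delay is stored per host in com.apple.screensaver
# under "idleTime" (seconds; 0 means the screensaver never starts).
DEFAULTS_ARGV = ["defaults", "-currentHost", "read", "com.apple.screensaver", "idleTime"]


def parse_idle_time(output: str) -> Optional[int]:
    """Parse the text printed by `defaults read`; None if it is not an integer."""
    try:
        return int(output.strip())
    except ValueError:
        return None


def screensaver_disabled() -> bool:
    """Check the setting without ever launching ScreenSaverEngine (macOS only)."""
    result = subprocess.run(DEFAULTS_ARGV, capture_output=True, text=True)
    if result.returncode != 0:
        # Key not set: assume the system default delay applies, i.e. not disabled.
        return False
    return parse_idle_time(result.stdout) == 0
```

Because the check only reads a preference file, it never creates or tears down a WindowServer connection, which is the behavior this comment suspects of breaking the slaves.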
Comment 10 • 14 years ago
It looks like this was fixed by no longer launching and killing the screen saver engine. Please reopen if we see more of these issues.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 11 • 14 years ago
-007 and -049 have been re-enabled now.
Comment 12 • 14 years ago
(In reply to John Ford [:jhford] from comment #10)
> It looks like this was fixed by no longer launching and killing the screen
> saver engine. Please reopen if we see more of these issues.
How was this change made? I was expecting to see a landing in puppet-manifests but can't find one.
Comment 13 • 14 years ago
It's a script in /N/; there is currently nowhere to land it. I'd like to create a place to land these sorts of things, with a lookaside cache for big files, but that's for the future.
A copy of it is located at scl-production-puppet.build.mozilla.org:/N/production/darwin10-i386/test/usr/local/bin/disable-screensaver.sh if I am remembering correctly.
Updated • 12 years ago
Product: mozilla.org → Release Engineering