Closed Bug 694251 Opened 14 years ago Closed 14 years ago

rev4 slaves are running amok after certain errors

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bear, Unassigned)

References

Details

https://tbpl.mozilla.org/php/getParsedLog.php?id=6821606&tree=Ionmonkey&full=1 relevant part of the above log: ========= Started unpack (results: 0, elapsed: 14 secs) ========== bash ../tools/buildfarm/utils/installdmg.sh firefox-10.0a1.en-US.mac64.dmg in dir /Users/cltbld/talos-slave/test/build (timeout 1200 secs) watching logfiles {} argv: ['bash', '../tools/buildfarm/utils/installdmg.sh', u'firefox-10.0a1.en-US.mac64.dmg'] environment: Apple_PubSub_Socket_Render=/tmp/launch-Y50gOn/Render CVS_RSH=ssh DISPLAY=/tmp/launch-E48YuD/org.x:0 HOME=/Users/cltbld LOGNAME=cltbld PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin PWD=/Users/cltbld/talos-slave/test/build PYTHONPATH=/Library/Python/2.5/site-packages SHELL=/bin/bash SSH_AUTH_SOCK=/tmp/launch-5BwMo3/Listeners TMPDIR=/var/folders/Hs/HsDn6a9SG8idoIya6p9mtE+++TI/-Tmp-/ USER=cltbld VERSIONER_PYTHON_PREFER_32_BIT=no VERSIONER_PYTHON_VERSION=2.6 __CF_USER_TEXT_ENCODING=0x1F5:0:0 using PTY: False Initializing… DIBackingStoreInstantiatorProbe: interface 0, score 100, CBSDBackingStore DIBackingStoreInstantiatorProbe: interface 1, score -1000, CBundleBackingStore DIBackingStoreInstantiatorProbe: interface 2, score -1000, CRAMBackingStore DIBackingStoreInstantiatorProbe: interface 3, score 100, CCarbonBackingStore DIBackingStoreInstantiatorProbe: interface 4, score -1000, CDevBackingStore DIBackingStoreInstantiatorProbe: interface 5, score -1000, CCURLBackingStore DIBackingStoreInstantiatorProbe: interface 6, score -1000, CVectoredBackingStore DIBackingStoreInstantiatorProbe: selecting CBSDBackingStore DIBackingStoreInstantiatorProbe: interface 0, score 100, CBSDBackingStore DIBackingStoreInstantiatorProbe: interface 1, score -1000, CBundleBackingStore DIBackingStoreInstantiatorProbe: interface 2, score -1000, CRAMBackingStore DIBackingStoreInstantiatorProbe: interface 3, score 100, CCarbonBackingStore DIBackingStoreInstantiatorProbe: interface 4, score -1000, CDevBackingStore DIBackingStoreInstantiatorProbe: 
interface 5, score -1000, CCURLBackingStore DIBackingStoreInstantiatorProbe: interface 6, score -1000, CVectoredBackingStore DIBackingStoreInstantiatorProbe: selecting CBSDBackingStore DIFileEncodingInstantiatorProbe: interface 0, score -1000, CMacBinaryEncoding DIFileEncodingInstantiatorProbe: interface 1, score -1000, CAppleSingleEncoding DIFileEncodingInstantiatorProbe: interface 2, score -1000, CEncryptedEncoding DIFileEncodingInstantiatorProbe: nothing to select. DIFileEncodingInstantiatorProbe: interface 0, score 900, CUDIFEncoding DIFileEncodingInstantiatorProbe: selecting CUDIFEncoding DIFileEncodingNewWithBackingStore: CUDIFEncoding DIFileEncodingNewWithBackingStore: instantiator returned 0 DIFileEncodingInstantiatorProbe: interface 0, score -1000, CSegmentedNDIFEncoding DIFileEncodingInstantiatorProbe: interface 1, score -1000, CSegmentedUDIFEncoding DIFileEncodingInstantiatorProbe: interface 2, score -1000, CSegmentedUDIFRawEncoding DIFileEncodingInstantiatorProbe: nothing to select. DIDiskImageInstantiatorProbe: interface 0, score 0, CDARTDiskImage DIDiskImageInstantiatorProbe: interface 1, score 0, CDiskCopy42DiskImage DIDiskImageInstantiatorProbe: interface 2, score -1000, CNDIFDiskImage DIDiskImageInstantiatorProbe: interface 3, score 1000, CUDIFDiskImage CRawDiskImage: data fork length 0x00000000017CA53C (24945980) not a multiple of 512. 
DIDiskImageInstantiatorProbe: interface 5, score -100, CRawDiskImage DIDiskImageInstantiatorProbe: interface 6, score -100, CShadowedDiskImage DIDiskImageInstantiatorProbe: interface 7, score 0, CSparseDiskImage DIDiskImageInstantiatorProbe: interface 8, score 0, CSparseBundleDiskImage DIDiskImageInstantiatorProbe: interface 9, score -1000, CCFPlugInDiskImage DIDiskImageInstantiatorProbe: interface 10, score -100, CWrappedDiskImage DIDiskImageInstantiatorProbe: selecting CUDIFDiskImage DIDiskImageNewWithBackingStore: CUDIFDiskImage DIDiskImageNewWithBackingStore: instantiator returned 0 Verifying… Checksumming Driver Descriptor Map (DDM : 0)… Driver Descriptor Map (DDM : 0): verified CRC32 $8E9803F7 Checksumming Apple (Apple_partition_map : 1)… Apple (Apple_partition_map : 1): verified CRC32 $DA4FFAC7 Checksumming DiscRecording 5.0.3d2 (Apple_HFS : 2)… DiscRecording 5.0.3d2 (Apple_HFS : 2: verified CRC32 $20181BD6 Verification completed… Error 0 (Unknown error: 0). verified CRC32 $3674FFAD Attaching… DI_kextWaitQuiet: about to call IOServiceWaitQuiet... DI_kextWaitQuiet: IOServiceWaitQuiet took 0.000004 seconds 2011-10-12 17:36:08.274 diskimages-helper[312:4223] DIHelperHDID serveImage: attaching drive { autodiskmount = 1; "hdiagent-drive-identifier" = "7C218785-187E-4A62-BFD6-0196AEFE3BE3"; "unmount-timeout" = 0; } 2011-10-12 17:36:08.281 diskimages-helper[312:4223] DIHelperHDID serveImage: connecting to myDrive 0x4B0B 2011-10-12 17:36:08.282 diskimages-helper[312:4223] DIHelperHDID serveImage: register _readBuffer 0x1031a4000 2011-10-12 17:36:08.282 diskimages-helper[312:4223] DIHelperHDID serveImage: activating drive port 0x4c07 2011-10-12 17:36:08.283 diskimages-helper[312:4223] DIHelperHDID serveImage: set cache enabled=TRUE returned SUCCESS. 2011-10-12 17:36:08.283 diskimages-helper[312:4223] DIHelperHDID serveImage: set on IO thread=TRUE returned SUCCESS. 
2011-10-12 17:36:08.284 diskimages-helper[312:4223] DIHelperHDID serveImage: terminating UI Agent 2011-10-12 17:36:08.284 diskimages-helper[312:4223] DIHelperHDID serveImage: starting server loop - myPort is 0x4c07 Mounting… 2011-10-12 17:36:08.471 diskimages-helper[312:2103] _mountDevEntries: disk1s2 aborting mountpoint postflight because disk image has no band size specified. Finishing… Finishing… /dev/disk1 Apple_partition_scheme /dev/disk1s1 Apple_partition_map /dev/disk1s2 Apple_HFS /Users/cltbld/talos-slave/test/build/mnt building file list ... done
Summary: rev4 slaves are running amok after certain errors → rev4 slaves are running amok after dmg unpack error
(In reply to Mike Taylor [:bear] from comment #1)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=6821606&tree=Ionmonkey&full=1
>
> relevant part of the above log:

bah - ignore that comment. i'm trying to help debug while tired
I looked through the steps and it initially failed to check out the tools repo because of a DNS issue:
> could not lookup DNS configuration info service: (ipc/send) invalid destination port
Summary: rev4 slaves are running amok after dmg unpack error → rev4 slave failed to checkout tools repo because of DNS issue
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #3)
> I looked through the steps and it failed initially to checkout the tools
> repo because of DNS issue.

In that sentence, what is "it"? You must not mean the log in comment 1, which successfully checked out the tools repo, so I presume you mean some later run on the same slave. That is not really the point of this bug: it's *how* they run amok rather than *why*.

The steps go like this:

1. Slave is fine
2. Something happens
3. Slave takes a job, DNS fails, it can't do the job
4. Repeat 3 until someone manually stops it

The interesting part is step 2, which, in the log just above https://tbpl.mozilla.org/php/getParsedLog.php?id=6821606&tree=Ionmonkey&full=1#error0, was apparently "2011-10-12 17:36:53.432 firefox-bin[351:903] HIToolbox: received notification of WindowServer event port death." That's the same thing I found in the log immediately before one of the two on the night of the 11th, though in the case of one that nthomas looked at in bug 693470 comment 5, it was apparently dmg unpacking.
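Steps 3 and 4 above can be pictured as a loop that nothing interrupts. As a minimal sketch only, here is a hypothetical guard that counts back-to-back jobs whose log contains the DNS error quoted earlier in this bug and reports when the slave should stop taking work. The class name `AmokGuard` and the threshold are assumptions made for illustration; nothing like this is described in the bug or known to exist in buildbot.

```python
from dataclasses import dataclass

# Error text quoted earlier in this bug. Everything else here is hypothetical.
DNS_ERROR = "could not lookup DNS configuration info service"


@dataclass
class AmokGuard:
    """Hypothetical guard: flags a slave after too many back-to-back DNS failures."""

    max_consecutive: int = 3  # arbitrary threshold, chosen for illustration
    consecutive: int = 0

    def record_job(self, log_text: str) -> bool:
        """Record one finished job; return True if the slave should stop taking jobs."""
        if DNS_ERROR in log_text:
            self.consecutive += 1
        else:
            # A job without the DNS error breaks the streak.
            self.consecutive = 0
        return self.consecutive >= self.max_consecutive
```

A guard like this would only address step 4 (the repetition); it says nothing about step 2, which is the actual subject of this bug.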
Summary: rev4 slave failed to checkout tools repo because of DNS issue → rev4 slaves are running amok after certain errors
"received notification of WindowServer event port death" - that's the same kind of port (mach port) as is causing the DNS errors you've been seeing. That the port is *dying* is quite interesting. Off the top of my head:
- screensaver activates, locking the screen, which would disconnect the console session
- automatic logout
- ??
This morning on Slaves Gone Wild, it's talos-r4-snow-049. Started with https://tbpl.mozilla.org/php/getParsedLog.php?id=6854726&tree=Mozilla-Inbound, another "hdiutil: detach: could not eject /dev/disk1s2: Inappropriate ioctl for device"
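Since the same handful of messages keeps turning up right before a slave goes rogue, a small log scan can make the pattern easier to spot. The sketch below searches a log for the three strings quoted in this bug and returns them in order of first appearance. The labels and the function name `find_signatures` are made up for illustration, and the list of signatures is only what this bug happens to mention, not a complete set.

```python
from typing import List

# Substrings copied from the logs quoted in this bug. The labels are invented here.
TRIGGER_SIGNATURES = {
    "windowserver_port_death": "received notification of WindowServer event port death",
    "dmg_detach_failure": "hdiutil: detach: could not eject",
    "dns_lookup_failure": "could not lookup DNS configuration info service",
}


def find_signatures(log_text: str) -> List[str]:
    """Return labels of known signatures, ordered by first appearance in the log."""
    hits = []
    for label, needle in TRIGGER_SIGNATURES.items():
        pos = log_text.find(needle)
        if pos != -1:
            hits.append((pos, label))
    # Sorting by position shows which message came first.
    return [label for _, label in sorted(hits)]
```

Ordering by position matters here because the question in this bug is which error precedes the DNS failures, not merely which errors occur.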
Currently rogue: talos-r4-snow-041, talos-r4-snow-062
Blocks: 683734
talos-r4-snow-049 was already disabled in slavealloc. talos-r4-snow-041, talos-r4-snow-062 have been rebooted (not aware of any repeat offenders, just a random chance that each slave will go bang).
Part of the puppet startup involves launching and killing the screen saver engine to verify that we are properly turning off the screen saver. My testing on 10.6 showed that the screen saver engine never needs to start for it to pick up the setting, but I was advised to keep that step in. On 10.7, the slave completely breaks if we ever start the screen saver engine. Even though this script is bound to run before buildbot starts, it might sever the required connections somewhere higher in the process hierarchy. I am going to remove the launch and kill of the screen saver engine, like I did to get Lion to work.
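If the goal is only to confirm that the screen saver is turned off, one option is to read the preference back instead of starting the engine. The sketch below is an assumption-heavy illustration: it assumes the relevant setting is the per-host `idleTime` key in the `com.apple.screensaver` domain, read with the macOS `defaults` tool, and that a value of 0 means the screen saver never activates. This bug does not say which preference the startup script actually sets, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional

# Assumption: the setting of interest is the per-host screen saver idle time.
READ_IDLE_TIME = ["defaults", "-currentHost", "read", "com.apple.screensaver", "idleTime"]


def _run(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stdout, or None if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def screensaver_disabled(run: Callable[[List[str]], Optional[str]] = _run) -> bool:
    """Return True only if the idle time preference reads back as 0."""
    output = run(READ_IDLE_TIME)
    if output is None:
        return False  # key missing or command failed: treat as not verified
    try:
        return int(output.strip()) == 0
    except ValueError:
        return False  # unexpected output: treat as not verified
```

The `run` parameter exists so the check can be exercised without a Mac, for example `screensaver_disabled(lambda cmd: "0\n")`.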
It looks like this was fixed by no longer launching and killing the screen saver engine. Please reopen if we see more of these issues.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
-007 and -049 have been re-enabled now.
(In reply to John Ford [:jhford] from comment #10)
> It looks like this was fixed by no longer launching and killing the screen
> saver engine. Please reopen if we see more of these issues.

How was this change made? I was expecting to see a landing in puppet-manifests but can't find one.
It's a script in /N/; there is nowhere (currently) to land it. I'd like to create a place to land these sorts of things, with a lookaside cache for big files, but that's the future. A copy of it is located at scl-production-puppet.build.mozilla.org:/N/production/darwin10-i386/test/usr/local/bin/disable-screensaver.sh, if I am remembering correctly.
Product: mozilla.org → Release Engineering