Closed Bug 1166107 Opened 10 years ago Closed 10 years ago

shutdown hang SharedDecoderManager::DrainComplete()/Flush() deadlock

Categories

(Core :: Audio/Video, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox40 + fixed
firefox41 + fixed

People

(Reporter: karlt, Assigned: karlt)

References

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

https://crash-stats.mozilla.com/report/index/930399a3-9302-499f-a7b6-deb492150517#allthreads SharedDecoderManager::SetIdle() is holding its monitor while calling Flush(). http://hg.mozilla.org/releases/mozilla-aurora/annotate/b03c5f1a77a1/dom/media/fmp4/SharedDecoderManager.cpp#l173 WMFMediaDataDecoder::Flush() waits to run a task on the same queue as DrainComplete() runs. SharedDecoderManager::DrainComplete() is waiting for the monitor held by SharedDecoderManager::SetIdle(). http://hg.mozilla.org/releases/mozilla-aurora/annotate/b03c5f1a77a1/dom/media/fmp4/SharedDecoderManager.cpp#l196 Holding the lock during Flush() is a regression from http://hg.mozilla.org/releases/mozilla-aurora/rev/380b0a052ef0 I suspect there was an incomplete external Drain before SetIdle() set mWaitForInternalDrain. The wait for the internal drain has picked up the DrainComplete() from the external Drain and the DrainComplete() from the internal Drain runs after the Wait() acquires the monitor again.
[Tracking Requested - why for this release]: Recently introduced cause of video playback failure and shutdown hang.
Enabled tracking for 40 and 41 — because what Karl said. :)
The DrainComplete() caught with mWaitForInternalDrain still won't necessarily be from the internal Drain(), but all we need is that one DrainComplete() is caught for the internal Drain() because one more will be generated if there is a Drain() in progress. What protecting mWaitForInternalDrain access with the monitor provides here is that the compiler won't use its address for storage of other data meaningless in the context of mWaitForInternalDrain and so, for example, two DrainComplete() calls won't unintentionally think that they are both for one internal drain. And TSan warnings.
Attachment #8609159 - Flags: review?(gsquelart)
Assignee: nobody → karlt
Status: NEW → ASSIGNED
Comment on attachment 8609159 [details] [diff] [review] release internal drain monitor before calling Flush() Review of attachment 8609159 [details] [diff] [review]: ----------------------------------------------------------------- r+. Optional: Fix typo if easy enough (not your typo, but we might as well correct it while we're nearby). ::: dom/media/platforms/SharedDecoderManager.cpp @@ +169,1 @@ > // We don't want to hold the lock while calling Drain() has some "has" -> "as"
Attachment #8609159 - Flags: review?(gsquelart) → review+
Attachment #8609160 - Flags: review?(gsquelart) → review+
Comment on attachment 8609159 [details] [diff] [review] release internal drain monitor before calling Flush() Review of attachment 8609159 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/platforms/SharedDecoderManager.cpp @@ +173,2 @@ > if (NS_SUCCEEDED(rv)) { > + MonitorAutoLock mon(mMonitor); Is it possible that this monitor might make DrainComplete() unable to acquire the lock and cause a dead lock in next while below?
Comment on attachment 8609159 [details] [diff] [review] release internal drain monitor before calling Flush() Review of attachment 8609159 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/platforms/SharedDecoderManager.cpp @@ +173,2 @@ > if (NS_SUCCEEDED(rv)) { > + MonitorAutoLock mon(mMonitor); wait() releases the lock
(In reply to Jean-Yves Avenard [:jya] from comment #7) > Comment on attachment 8609159 [details] [diff] [review] > release internal drain monitor before calling Flush() > > Review of attachment 8609159 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/media/platforms/SharedDecoderManager.cpp > @@ +173,2 @@ > > if (NS_SUCCEEDED(rv)) { > > + MonitorAutoLock mon(mMonitor); > > wait() releases the lock Oh,my bad! I read it carelessly...XD
Attached patch 40 branch patch (obsolete) — Splinter Review
Approval Request Comment [Feature/regressing bug #]: bug 1161443 [User impact if declined]: video playback failure and shutdown hang [Describe test coverage new/current, TreeHerder]: no new tests; this is a potential cause of intermittent failure in existing tests. [Risks and why]: low risk because the deviation from existing code is very small. [String/UUID change made/needed]: none
Attachment #8611598 - Flags: approval-mozilla-aurora?
Comment on attachment 8611598 [details] [diff] [review] 40 branch patch Shutdown hangs are a pain. Taking it.
Attachment #8611598 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
applying 1166107-8611598 failed to synchronize metadata for "dom/media/fmp4/SharedDecoderManager.cpp"
Flags: needinfo?(karlt)
In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #14) > applying 1166107-8611598 > failed to synchronize metadata for "dom/media/fmp4/SharedDecoderManager.cpp" Thanks. I guess something was being fussy about the arguments listed against diff.
Flags: needinfo?(karlt)
Attachment #8611598 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: