Closed
Bug 1129455
Opened 11 years ago
Closed 9 years ago
shutdownhang in mozilla::MediaShutdownManager::Shutdown() with a thread in WMFMediaDataDecoder::Flush()
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
People
(Reporter: kairo, Assigned: karlt)
References
Details
(Keywords: crash)
Crash Data
This bug was filed from the Socorro interface and is
report bp-80eb894c-0602-4e2e-be37-13c412150128.
=============================================================
It seems that some of our shutdownhangs are in media shutdown code, we should see if we can get those fixed. Unsurprisingly, many different YouTube URLs are on those crash reports.
For stats and reports, please see https://crash-stats.mozilla.com/report/list?signature=shutdownhang%20%7C%20WaitForSingleObjectEx%20%7C%20WaitForSingleObject%20%7C%20PR_Wait%20%7C%20nsThread%3A%3AProcessNextEvent%28bool%2C%20bool%2A%29%20%7C%20NS_ProcessNextEvent%28nsIThread%2A%2C%20bool%29%20%7C%20mozilla%3A%3AMediaShutdownManager%3A%3AShutdown%28%29
![]() |
Reporter | |
Updated•11 years ago
|
Summary: crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown() → shutdownhang in mozilla::MediaShutdownManager::Shutdown()
Updated•11 years ago
|
Priority: -- → P1
Comment 1•11 years ago
|
||
I still see some crashes in crash-stats on Aurora - https://crash-stats.mozilla.com/report/index/aa6779b5-e05d-40fe-8fde-a75122150205 is one example. The most recent crashes in this stack are buildid=2015020700.
![]() |
Reporter | |
Updated•11 years ago
|
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()]
[@ shutdownhang | ntdll.dll@0x3c6bc]
Comment 2•11 years ago
|
||
Tracking all MSE P1 bugs for Firefox 37.
Comment 3•11 years ago
|
||
Anthony - Can you please help find an owner for this bug?
Flags: needinfo?(ajones)
Updated•11 years ago
|
Assignee: nobody → karlt
Flags: needinfo?(ajones)
Comment 4•11 years ago
|
||
I looked at these crashes - most of the recent ones seem audio-related, and almost certainly fallout from padenot's stuff.
Comment 5•11 years ago
|
||
If that's the case, it's my understanding that all the stuff is now backed out so this should be gone. Kairo, Marcia can you check if we are still seeing any of these? I searched and couldn't find anything with a build id later than 2/16 but I might have missed one of the signatures.
![]() |
Reporter | |
Comment 6•11 years ago
|
||
For beta, https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=beta&signature=~mozilla%3A%3AMediaShutdownManager%3A%3AShutdown&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id says there are some in build ID 20150217132925 which is 36.0b10, but it's not in the top 50 crashes.
In terms of Dev Edition, https://crash-stats.mozilla.com/search/?product=Firefox&release_channel=aurora&signature=~mozilla%3A%3AMediaShutdownManager%3A%3AShutdown&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-build_id also still shows some reports for the latest builds.
Assignee | ||
Comment 7•11 years ago
|
||
Of the hangs in comment 6 not covered by bug 1135562, the majority seem to have threads in MediaTaskQueue::FlushAndDispatch().
e.g. https://crash-stats.mozilla.com/report/index/5836e867-b5e4-42a1-9e6c-f5dce2150221
That suggests that perhaps bug 1125970 may have an effect here.
There are also threads in WMFMediaDataDecoder::Flush(), and typically MFTDecoder::Output() and BaseOutputPin::GetDeliveryBuffer().
Those stacks are from non-MSE code paths. The MSE code path in comment 0 also involved WMFMediaDataDecoder::Flush(), but FlushAndDispatch() was not on any stack.
Depends on: 1125970
Assignee | ||
Updated•11 years ago
|
Summary: shutdownhang in mozilla::MediaShutdownManager::Shutdown() → shutdownhang in mozilla::MediaShutdownManager::Shutdown() with a thread in WMFMediaDataDecoder::Flush()
Comment 8•11 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #7)
> Those stacks are from non-MSE code paths. The MSE code path in comment 0
> also involved WMFMediaDataDecoder::Flush(), but FlushAndDispatch() was not
> on any stack.
The first thing that WMFMediaDataDecoder::Flush does is invoke mTaskQueue->Flush(), so there's a good chance it's related. Note that one odd side-effect of flushing is that the runnable gets destroyed on a different thread than it would otherwise, so if the runnable is holding onto things that do interesting things in their destructor, there's a potential for trouble. Additionally, the decoder might be expecting its tasks to run, and get confused in ::Flush if those tasks disappeared.
Assignee | ||
Comment 9•11 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #8)
> The first thing that WMFMediaDataDecoder::Flush does is invoke
> mTaskQueue->Flush(), so there's a good chance it's related.
Yes, thanks for your comments.
Related, but this task queue won't be directly affected by changes bug 1125970.
> Note that one
> odd side-effect of flushing is that the runnable gets destroyed on a
> different thread than it would otherwise, so if the runnable is holding onto
> things that do interesting things in their destructor, there's a potential
> for trouble.
Runnables on this task queue hold references to WMFMediaDataDecoder and
sometimes auto pointers to mp4_demuxer::MP4Sample. Since the WMFMediaDataDecoder owns the task queue, and MediaDataDecoder has thread-safe ref counting, I think that should be fine. I assume it is expected that MP4Sample can deleted on different threads.
> Additionally, the decoder might be expecting its tasks to run,
> and get confused in ::Flush if those tasks disappeared.
There are only a few different runnables queued on this task queue. I'm looking at those and when they are called to see if I can identify a sequence that would be problematic. A try run says we could run those tasks instead of emptying them, but I don't have evidence yet to indicate that would help.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc72e2711a5b
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4da9a9fb8648
Comment 10•11 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #9)
> There are only a few different runnables queued on this task queue. I'm
> looking at those and when they are called to see if I can identify a
> sequence that would be problematic. A try run says we could run those tasks
> instead of emptying them, but I don't have evidence yet to indicate that
> would help.
In general, cpearce wants to remove flushing from all the PDMs. It's possible that this would adversely affect latency in the WMF case, in which case we might need a different mechanism.
Comment 11•11 years ago
|
||
During triage today, I was asked to report back on crash volume - this data is for the last 7 days:
The latest beta in FF 37 has 151 crashes. There were 231 crashes in B2 and 150 in B1.
Comment 12•11 years ago
|
||
Sheila - This is still marked as an MSE P1. It's not a high volume crash but is a shutdown crash. Is this still on the list of MSE bugs to fix in 37? If so, are we going to have a fix by Thursday for Beta 7?
Flags: needinfo?(smooney)
Assignee | ||
Comment 13•11 years ago
|
||
I don't know the cause of these hangs, and so we have only speculative fixes and no real expectation of a fix by Thursday.
The patch uplifted from bug 1139380 may help, but that is speculation.
The number of similar but different sets of thread stacks here, suggests that either there are a number of different issues or the threads are busy processing media frames to drain their queues and taking a very long time to do so.
Something like the ResetDecode changes in https://hg.mozilla.org/mozilla-central/rev/5f2bef7e0b6c from bug 1135170 may help by flushing / dropping samples instead of processing the remaining queue. However, that patch, as is, is too involved to be a candidate for uplift, and any reduced version would be speculative and not well tested. Emptying queues should not take as long as our 60s shutdown timeout, so I suspect there is another undiagnosed confounding issue.
Depends on: 1139380
Comment 14•11 years ago
|
||
I reviewed this bug with the media team today. While this is a priority bug, it won't block the 37 release. Given that we're building the final 37 Beta on Thursday, I'm marking this as wontfix in 37.
Comment 15•10 years ago
|
||
Marking this for followup later in aurora and after Karl gets back.
Whiteboard: [followup-2015-04-20]
Comment 16•10 years ago
|
||
Karl, we are now past the middle of the beta cycle and shutdownhang crashes are an important part of our crashes. Could you have a look asap? Thanks
Flags: needinfo?(karlt)
Whiteboard: [followup-2015-04-20]
Comment 17•10 years ago
|
||
We need to check when bug 1155432 makes it to beta to see if it has moved the needle.
Assignee | ||
Comment 18•10 years ago
|
||
This bug still exists on 38.0b7/20150424064750 which includes the changes from
bug 1155432.
These hang reports have a threads in different stages of
WMFVideoMFTManager::Output(), while WMFMediaDataDecoder::Flush() is Wait()ing
on the monitor condition variable.
https://crash-stats.mozilla.com/report/index/b901bff8-3d02-4fda-bcb8-dec762150427#allthreads
https://crash-stats.mozilla.com/report/index/09ccddd4-36fd-4633-b19c-e722e2150426#allthreads
Searching the threads for media/ identifies the threads involved.
A similar report for 39.0a2/20150417004243 indicates that changes from bug 1135170 have not fixed this. e.g.
https://crash-stats.mozilla.com/report/index/ae2376d6-48a5-4b76-9652-4da972150417#allthreads
Comment 19•10 years ago
|
||
Like 37, wontfix for 38.
Karl, any news on this for 39? Thanks
Flags: needinfo?(karlt)
Assignee | ||
Comment 20•10 years ago
|
||
I filed bug 1161402 in case that gives us some clues as to what is happening.
Bug 1159456 may help with some of these, but not most.
I'm going to have a look into what info might be worth recording on the stacks, to extract for clues.
Depends on: 1161402
Flags: needinfo?(karlt)
Updated•10 years ago
|
Flags: needinfo?(smooney)
Comment 21•10 years ago
|
||
Karl, I know it may be hard to tell where this is at right now because we don't have a lot of crash data for 39 (yet), but do you think that bug 1159456 fixed this issue?
Flags: needinfo?(karlt)
Assignee | ||
Comment 22•10 years ago
|
||
I don't know whether or not fixing bug 1159456 had much impact. It was only going to fix some of the hangs here anyway.
I think fixing bug 1161892 made some difference, because that eliminated one class of hangs, but there were still plenty of hangs matching the description here.
I'm looking out for changes from fixing bug 1162364, but need more crash data from aurora builds newer than 20150529004007 to make any analysis yet.
Flags: needinfo?(karlt)
Comment 23•10 years ago
|
||
From discussion above and with k17e it sounds like this may become more clear as some of the other shutdown hangs get resolved. So we are waiting-and-seeing for a bit longer.
Comment 24•10 years ago
|
||
There are crashes from 39.0b2, but none from 39.0b3, 4, or 5. Seems like a good sign!
I'm going to mark this as fixed; Karl, if you think too hasty, please comment and change the flag back to "affected" for 39.
status-firefox40:
--- → affected
status-firefox41:
--- → ?
tracking-firefox40:
--- → +
Flags: needinfo?(karlt)
Updated•10 years ago
|
tracking-firefox40:
+ → ---
Assignee | ||
Comment 25•10 years ago
|
||
I see some hangs in 39.0b3 which may be addressed by bug 1162364, which is fixed in beta 5.
https://crash-stats.mozilla.com/report/index/593db919-cb94-4d84-9f6d-617732150608#allthreads
https://crash-stats.mozilla.com/report/index/2ecd41d1-d552-4db5-b5eb-a19442150608#allthreads
There are also a number of MediaShutdownManager::Shutdown() hangs where there is no media code in the stack traces of the other threads, but that would not be what is tracked in this bug.
e.g. https://crash-stats.mozilla.com/report/index/f4e1f4a5-bf56-4dcb-b427-967192150608#allthreads
I can't explain the lack of MediaShutdownManager::Shutdown() reports on b4 and b5, because the complete lack of these signatures would imply that several different bugs have been fixed concurrently. I suspect that is unlikely.
However, we do seem to have a much lower relative incidence of this bug, and it may now be less important than other MediaShutdownManager::Shutdown() hangs.
I suspect there may still be something to fix in this bug, but I'm short on ideas and I don't think we have time to address what remains of this bug before 39 release anyway.
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(karlt)
Comment 26•10 years ago
|
||
Tracking this for 40 just in case and to follow up once 40 moves to beta.
tracking-firefox40:
--- → +
Comment 27•10 years ago
|
||
Anthony, as I mentioned in my email, I do see this shutdown hang on FF40.0b4 on crash-stats page. Could you please help find an owner so we can address this in time for FF40 release?
Flags: needinfo?(ajones)
Assignee | ||
Comment 28•10 years ago
|
||
Looking at reports from 40.0b4, the variation in stack traces seems to have
reduced considerably.
The common situation is now that one thread is in WaitForSingleObject from
CCoreH264Decoder::GetDisplayFrame() of msmpeg2vdec.dll. This is usually from
MFTDecoder::Output() msmpeg2vdec.dll, but I saw one instance from
MFTDecoder::Flush().
A typical report is in [1].
The only other threads with msmpeg2vdec.dll on the stack are in
WaitForMultipleObjects from ThreadedDecode.
I don't know what these are waiting for, but deadlock seems a possibility.
In 40.0b4 this hang is about 3 time more common than bug 1135562.
On 41.0a2, however, bug 1135562 seems to be about 4 times more common than
this hang. I don't have an explanation for that.
342 hangs with the signatures of these bugs on 331 installations implies that
very few installations are seeing these hangs more than once. The data is
similar on 39.0.
[1] https://crash-stats.mozilla.com/report/index/851f59f2-bc42-45f4-b2da-28ce42150719#allthreads
Assignee | ||
Comment 30•10 years ago
|
||
Ritu, I don't see this signature in the top 50 on release, beta, or aurora.
Do you have data indicating that this bug needs to be prioritized?
Flags: needinfo?(rkothari)
Comment 31•10 years ago
|
||
Karl, the only reason this bug came up in my search is because we (RelMan) are following up on FF40 tracked bugs and pinging devs on bugs that have been unchanged for a while.
I don't see this crash signature in top 25 on beta channel. Based on that, do we want to mark status-firefox40: "fixed" instead of "affected"?
Flags: needinfo?(rkothari)
![]() |
Reporter | |
Comment 32•10 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #31)
> I don't see this crash signature in top 25 on beta channel. Based on that,
> do we want to mark status-firefox40: "fixed" instead of "affected"?
It's not fixed if no fix has landed and it's still happening at low volume, but it may make sense to flip it to "wontfix" for that train if we don't find it important enough to fix.
Comment 33•10 years ago
|
||
We've fixed a lot of things properly in 42 so it is a better release target for further efforts to reduce crashes/hangs. If this is low enough volume I'd rather not try to tackle it right now.
Updated•10 years ago
|
Flags: needinfo?(ajones)
Comment 34•10 years ago
|
||
Can we untrack this?
Comment 35•10 years ago
|
||
I have marked 40 as wontfix and not tracked 41+ as there is a low incidence of occurrence for the associated crash report.
Updated•10 years ago
|
Component: Audio/Video → Audio/Video: Playback
Updated•10 years ago
|
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()]
[@ shutdownhang | ntdll.dll@0x3c6bc] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_Wait | nsThread::ProcessNextEvent(bool, bool*) | NS_ProcessNextEvent(nsIThread*, bool) | mozilla::MediaShutdownManager::Shutdown()]
[@ shutdownhang | ntdll.dll@0x3c6bc]
[@ shutdownhang |…
Comment 37•10 years ago
|
||
Not seeing this signature in 42 or later.
Comment 38•10 years ago
|
||
With no crashes in 28 days and 42 now in release I think we can close this bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 39•9 years ago
|
||
It happens again, see thread 42.
https://crash-stats.mozilla.com/report/index/39fd3300-7278-4661-a74b-11c162160517#allthreads
In addition, bug 1274498 is also stuck in MFTDecoder::Output, but it calls msmpeg2vdec.dll instead of MFH264Dec.dll.
Comment 40•9 years ago
|
||
Comment 41•9 years ago
|
||
mfh264dec.dll is the DLL in which the WMF H.264 decoder resides on Windows Vista. On Windows 7 and later, it's msmpeg2vdec.dll.
Comment 42•9 years ago
|
||
(In reply to Alastor Wu [:alwu] from comment #40)
> See thread 68.
> https://crash-stats.mozilla.com/report/index/c98046a7-2a43-4d63-8b53-
> 3c8ce2160513#allthreads
alwu: This is a Firefox 46 crash report. Are there any crashes in 47 or later? I don't see any in the Crash Signature URLs above. Maybe this is fixed.
Flags: needinfo?(alwu)
Comment 43•9 years ago
|
||
According the comment42, we didn't see the crash signature in FF47, it might be fixed.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Flags: needinfo?(alwu)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•