Open Bug 1372102 Opened 8 years ago Updated 3 years ago

Investigate how zero duration causes playback stuttering

Categories

(Core :: Audio/Video: Playback, enhancement, P3)

People

(Reporter: jwwang, Unassigned)

References

Details

(Keywords: stale-bug)

Attachments

(1 file)

Bug 1371290 fixes the stuttering by giving each sample a reasonable duration. We need to figure out how duration affects playback in order to remove duration and depend only on the sample start time in our playback pipeline.
Priority: -- → P1
See Also: → 1314458, 1371290
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #1)
> Created attachment 8876635 [details]
> Bug 1372102 - set zero duration to repro playback stuttering.
>
> Review commit: https://reviewboard.mozilla.org/r/147978/diff/#index_header
> See other reviews: https://reviewboard.mozilla.org/r/147978/

The patch deliberately sets zero duration, which lets you reproduce the issue on all platforms.
OK, I see what's going on with zero duration. Suppose we have some frames in the video queue (all with zero duration): [0], [100], [200], [300], [400].

Render loop 1 (t=0): We send all frames to the image container and schedule the next loop at t=100.

Render loop 2 (t=100): [0] and [100] are removed from the queue according to their end times. See (1) for the code. So we send [200], [300], [400] to the image container. Depending on the vsync timing, [100] might not have been displayed to the user yet when we replace the current images. Therefore [100] is dropped and never displayed to the user. This is where the jitter comes from.

Also, |frame->IsSentToCompositor()| doesn't correctly report dropped frames. A frame that is sent to the image container and then replaced before reaching the user is not reported as a dropped frame.

I will add some tweaks to VideoSink to take zero duration into account.

(1) http://searchfox.org/mozilla-central/rev/d840ebd5858a61dbc1622487c1fab74ecf235e03/dom/media/mediasink/VideoSink.cpp#419
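[Editorial note: a minimal sketch of the pruning problem described above, using hypothetical types rather than the actual VideoSink code linked at (1). With zero duration, a frame's end time equals its start time, so the queue drops frames the compositor may not have shown yet:

#include <cstdint>
#include <deque>

// Hypothetical stand-in for VideoData; the real pruning logic lives in
// VideoSink.cpp around the line linked at (1).
struct Frame {
  int64_t mTime;     // start time (microseconds)
  int64_t mDuration; // 0 in the zero-duration repro
  int64_t GetEndTime() const { return mTime + mDuration; }
};

// Remove frames whose end time has passed the clock. With mDuration == 0,
// at clock t=100 both [0] and [100] satisfy GetEndTime() <= 100 and are
// pruned, even though [100] may never have reached the user -- the jitter
// described in this comment.
void PruneExpiredFrames(std::deque<Frame>& aQueue, int64_t aClockTime) {
  while (!aQueue.empty() && aQueue.front().GetEndTime() <= aClockTime) {
    aQueue.pop_front();
  }
}
]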
(In reply to walmartguy from bug 1314458 comment #146)
> I was using this video in 4k 60fps:
> https://www.youtube.com/watch?v=aqz-KE-bpKQ
> In the beginning, the camera is moving down gradually which makes stutter
> easy to spot.

Attaching the video link that reproduces the issue.
Well, after thinking more about it, there is no easy way to determine whether a frame is late and should be dropped without looking at its end time (calculated as mTime + mDuration). Using the start time of the next frame as the end time of the previous frame doesn't work when it comes to frame-skipping. This is preventing us from removing duration from our media playback pipeline.

Hi Jya,
Can you share your thoughts on how we can move forward?
Flags: needinfo?(jyavenard)
Isn't a frame only late if we have a new frame to play? If there's no new frame, it's not late (yet) and should be displayed.

:kentuckyfriedtakahe did raise the issue that if we always display the frames we have, when we have them, things look very weird, like badly broken A/V sync, as the frame being displayed may no longer have any link to what you're listening to.

If we assume that the duration of a frame is always defined as start_next_frame_time - start_previous_frame_time, then we don't need an explicit duration.
Flags: needinfo?(jyavenard)
Will think more about it...
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #6)
> If we assume that the duration of a frame is always defined as
> start_next_frame_time - start_previous_frame_time, then we don't need an
> explicit duration.

For 2 frames in the video queue, VideoSink doesn't know whether they are adjacent frames or whether some frames in between are missing due to skip-to-next-key-frame. |start_next_frame_time - start_previous_frame_time| could give a frame a wrong duration in that case, and the frame would be wrongly reported as presented instead of dropped.

My idea is that since the demuxer knows the start time of each video frame, it should be able to calculate the duration using |start_next_frame_time - start_previous_frame_time| before sending a demuxed frame to the decoder. That works even in the case of skip-to-next-key-frame. So we still need 'duration', but it will be calculated by the demuxer and we won't have to depend on the mercy of the decoder to provide a valid duration. This is another strong reason to remove the legacy reader, where we have no control over the demuxer.
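[Editorial note: a minimal sketch of this idea, with hypothetical names (the real sample type would be MediaRawData). The duration is filled in on the demuxer side, before the decoder ever sees the samples, so decoder-side frame skipping cannot distort it:

#include <cstdint>
#include <vector>

// Hypothetical demuxed sample.
struct Sample {
  int64_t mTime;     // start time in microseconds, known to the demuxer
  int64_t mDuration; // filled in below
};

// Assign each sample a duration from the start time of the sample after it.
// Both samples come straight from the demuxer, so this works even when the
// decoder later drops frames via skip-to-next-key-frame. The last sample is
// left alone here (see the later comments for how to estimate it).
void AssignDurations(std::vector<Sample>& aSamples) {
  for (size_t i = 0; i + 1 < aSamples.size(); ++i) {
    aSamples[i].mDuration = aSamples[i + 1].mTime - aSamples[i].mTime;
  }
}
]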
I'm not sure I understand how the example you give (skip-to-next-frame got triggered) justifies requiring a duration.

Obviously a frame is only to be displayed if the frame's start time is <= currentTime. Until the new frame is ready to be displayed, it's the previous one that is to be seen, no? So even in the case of skip_to_next_frame, the rule of duration = next_frame - current_frame holds.

And you always have the issue with the last video frame: you can't calculate how long that frame will last. It is to be displayed until either its duration is reached or there's no more audio to play.
Flags: needinfo?(jyavenard)
Suppose we have 3 frames in the encoded file: [100], [200], [300]. Now we have [100] and [300] in the video queue, and [200] was dropped due to skip-to-next-key-frame.

Now with currentTime=150, and both [100] and [300] not displayed yet, should we still display [100] or just drop it? I think we should drop it, because we are supposed to render [200] when currentTime is 150. However, since [200] is missing, there is no way for VideoSink to know how long a frame should last without the duration. Using |duration=next_frame-current_frame| will bring bug 1258870 back.

For the last frame, it is fine to assume the duration is infinity since there is no frame after it.
If skip-to-next-key-frame has kicked in, we must have audio. As such, we have a value for currentTime.

If currentTime is 150, then 100 is to be displayed. 300 can only be displayed once currentTime is >= 300. Why would you render 200 when currentTime is 150? It's too early to display that one.
Sorry. I meant currentTime=250, when it is too late to render [100] but too early for [300].
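[Editorial note: with the corrected numbers, a tiny worked example (hypothetical, just the arithmetic) shows why the two ways of computing [100]'s end time disagree:

#include <cstdint>
#include <cstdio>

int main() {
  const int64_t clock = 250; // currentTime from the corrected example

  // With the real duration (100us), [100] ends at 200 < 250: it is late,
  // so it should be dropped and reported as dropped.
  const int64_t endWithDuration = 100 + 100;
  std::printf("real duration: drop [100]? %s\n",
              endWithDuration <= clock ? "yes" : "no"); // prints "yes"

  // With duration = next_frame - current_frame and [200] missing from the
  // queue, [100] appears to end at 300 > 250: it is kept and wrongly
  // reported as presented -- the regression of bug 1258870.
  const int64_t endFromNextFrame = 300;
  std::printf("next-frame rule: drop [100]? %s\n",
              endFromNextFrame <= clock ? "yes" : "no"); // prints "no"
  return 0;
}
]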
The only way we'd have [100],[300] in the queue is if playback is linear. 100 should be displayed (as we have to display something) since it's the first frame anyway... If 100 isn't the first frame, we have broken A/V sync anyway, so I'm not sure displaying 100 would be a big deal.
Yeah, the A/V sync is broken anyway whether you display [100] or not. However, the point is that we want to report [100] as dropped instead of presented, so the player can detect frame dropping and switch to a lower resolution when the decoder can't keep up. Using |duration=next_frame-current_frame| will give a wrong duration to [100] and won't report dropped frames correctly.
At this stage I would argue that no frames dropped by the compositor are reported anyway (bug 1245400).
I am still not convinced we should report dropped frames based on whether they are rendered by the compositor or not. This is what we used to have in bug 1258870:

Render loop 1: With currentTime=250 and [100] in the queue, [100] is sent to the compositor.
Render loop 2: With currentTime=350 and [100],[200] in the queue, [100] is removed from the queue and [200] is sent to the compositor.
Render loop 3: With currentTime=450 and [200],[300] in the queue, [200] is removed from the queue and [300] is sent to the compositor.

Even though we send every frame to the compositor without any dropped frames, A/V sync is broken. The fix for bug 1258870 ensures those late frames are reported as dropped so the player has a chance to switch to a lower resolution for a better playback experience. Therefore we should report dropped frames according to their end time instead of whether they are sent to or rendered by the compositor.

Hi Chris,
Can you share your opinions here since you fixed bug 1258870?
Flags: needinfo?(cpearce)
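[Editorial note: a minimal sketch of the end-time reporting rule argued for above; the types are hypothetical (FrameStats stands in for the real FrameStatistics). Late frames are counted as dropped by their end time, regardless of whether they were sent to the compositor:

#include <cstdint>

struct Frame {
  int64_t mTime;
  int64_t mDuration;
  int64_t GetEndTime() const { return mTime + mDuration; }
};

// Hypothetical stand-in for FrameStatistics.
struct FrameStats {
  uint32_t mDropped = 0;
  uint32_t mPresented = 0;
};

// A frame whose end time has already passed the clock when we get to it was
// late, whether or not it reached the compositor. This is what lets the
// player detect decoder lag and switch to a lower resolution.
void AccountFrame(const Frame& aFrame, int64_t aClockTime,
                  FrameStats& aStats) {
  if (aFrame.GetEndTime() <= aClockTime) {
    ++aStats.mDropped;
  } else {
    ++aStats.mPresented;
  }
}
]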
If the frames were dropped because they were late, they definitely should be reported, as that indicates the machine is unable to keep up...

> Therefore we should report dropped frames according to their end time
> instead of whether they are sent to or rendered by the compositor.

But I'd like to get rid of that end time; it's causing pain with webrtc (which has no concept of end time) and even webm.
Here is my proposal:
1. Preload some video frames before starting playback. This is already done by MDSM.
2. When VideoSink starts playback, peek all frames in the queue to calculate an average duration.
3. Use the average duration to determine the end of the last frame in the queue.

The end time of each frame is calculated either by:
1. current_end_time = next_start_time, or
2. current_end_time = current_start_time + average_duration.

Then we can remove mDuration and GetEndTime() from VideoData.
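[Editorial note: a minimal sketch of steps 2 and 3 of this proposal, assuming frames carry only a start time; all names are hypothetical:

#include <cstdint>
#include <vector>

struct Frame {
  int64_t mTime; // start time only; no mDuration anymore
};

// Step 2: peek the preloaded queue and derive an average frame duration.
// (last - first) / (n - 1) is the mean of the consecutive start-time gaps.
int64_t AverageDuration(const std::vector<Frame>& aQueue) {
  if (aQueue.size() < 2) {
    return 0; // caller falls back to some default
  }
  return (aQueue.back().mTime - aQueue.front().mTime) /
         static_cast<int64_t>(aQueue.size() - 1);
}

// End time of frame i: the next frame's start time, or, for the last frame
// in the queue, its start time plus the average duration (step 3).
int64_t EndTime(const std::vector<Frame>& aQueue, size_t aIndex,
                int64_t aAverageDuration) {
  if (aIndex + 1 < aQueue.size()) {
    return aQueue[aIndex + 1].mTime;
  }
  return aQueue[aIndex].mTime + aAverageDuration;
}
]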
That sounds like a good plan. This is what the WebMDemuxer does already: it always demuxes one frame ahead to determine the duration of the previous frame. When it gets to the last frame, it checks the average frame duration and uses that instead. It's not 100% accurate, but it's good enough.
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #16)
> Therefore we should report dropped frames according to their end time
> instead of whether they are sent to or rendered by the compositor.
>
> Hi Chris,
> Can you share your opinions here since you fixed bug 1258870?

I'm not keen on regressing bug 1258870. Could we have the MediaFormatReader tag a frame which occurs after a skip-to-next-keyframe as a discontinuity, and special-case that in the dropping logic so we drop up to that frame whenever it's in the queue? Maybe if we just gave discontinuities a timestamp of 0 that would be sufficient?
Flags: needinfo?(cpearce)
(In reply to Chris Pearce (:cpearce) from comment #20)
> Maybe if we just gave discontinuities a timestamp of 0 that would be
> sufficient?

This is a bit vague; what I mean is: drop all frames with start time less than the clock time, provided the start time of the *following* frame is also less than or equal to the clock time.
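[Editorial note: this rule needs no duration at all; a minimal sketch (hypothetical names) of the dropping loop it implies:

#include <cstdint>
#include <deque>

struct Frame {
  int64_t mTime; // start time only; the rule never consults a duration
};

// Drop a frame whose start time is behind the clock, but only if the
// *following* frame's start time is also at or behind the clock --
// otherwise the late-ish frame is still the best one we have to display.
void DropLateFrames(std::deque<Frame>& aQueue, int64_t aClockTime) {
  while (aQueue.size() >= 2 &&
         aQueue[0].mTime < aClockTime &&
         aQueue[1].mTime <= aClockTime) {
    aQueue.pop_front();
  }
}
]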
I don't think the issue is related to skip-to-next-keyframe in any way. In the scenario described above, skip-to-next-keyframe was never triggered.
(In reply to Jean-Yves Avenard [:jya] from comment #22)
> I don't think the issue is related to skip-to-next-keyframe in any way.
> In the scenario described above, skip-to-next-keyframe was never triggered.

IIUC, you are talking about the example in comment 16, where the decoder doesn't fall behind playback enough to trigger skip-to-next-keyframe. However, the A/V sync issue is still noticeable to the user.
(In reply to Jean-Yves Avenard [:jya] from comment #22)
> I don't think the issue is related to skip-to-next-keyframe in any way.
> In the scenario described above, skip-to-next-keyframe was never triggered.

Comment #10? That's the scenario that bug 1258870 fixed.
Depending on how incapable the computer is, it could be the example in comment 10 or the one in comment 16 (where the decoder isn't slow enough to trigger skip-to-next-keyframe).
A/V sync would typically still be good; it just causes visible jitter.
(In reply to Chris Pearce (:cpearce) from comment #24)
> (In reply to Jean-Yves Avenard [:jya] from comment #22)
> > I don't think the issue is related to skip-to-next-keyframe in any way.
> > In the scenario described above, skip-to-next-keyframe was never triggered.
>
> Comment #10? That's the scenario that bug 1258870 fixed.

I think there are too many misunderstandings at hand here. I personally never referred to "stnk" when describing the problem (the ability to remove the concept of frame duration).

We have a decoder that returned 100, 300; we only have a start time. The video queue contains 100,200.

100 arrived a tad too late, 300 arrived early, currentTime is 150.

Which frame do we display now?
(In reply to Jean-Yves Avenard [:jya] from comment #27)
> I think there are too many misunderstandings at hand here. I personally
> never referred to "stnk" when describing the problem (the ability to
> remove the concept of frame duration).

Sure, but STNKF is an important case to consider, as it exacerbates any problems here.

> We have a decoder that returned 100, 300; we only have a start time. The
> video queue contains 100,200.
>
> 100 arrived a tad too late, 300 arrived early, currentTime is 150.
>
> Which frame do we display now?

I'm confused by your example here now; it's not clear whether the queue contains 100,200 or 100,300. In either case, one would think we should display the frame covering 150. But if we're struggling to keep up, chances are that a lot of the time we have only 1 frame in the queue, as I discovered in https://bugzilla.mozilla.org/show_bug.cgi?id=1258870#c6. And so if we don't drop this frame, we'll end up always showing frames which are late, and break A/V sync.

Also, re-reading https://bugzilla.mozilla.org/show_bug.cgi?id=1258870#c6 made me realise that if we often have only 1 frame in the queue when we're struggling, we're unlikely to be able to infer a duration. At least at this level.
This is a P1 bug without an assignee. P1 are bugs which are being worked on for the current release cycle/iteration/sprint. If the bug is not assigned by Monday, 28 August, the bug's priority will be reset to '--'.
Keywords: stale-bug
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Severity: normal → S3