Closed Bug 1550531 Opened 6 years ago Closed 6 years ago

-moz-element leaks memory and hangs whole browser with massive CPU usage at closing browser window with large image Theme after landing patch from bug #1330487

Categories

(Core :: SVG, defect, P1)

64 Branch
x86_64
Windows 7
defect

Tracking

()

VERIFIED FIXED
mozilla69
Performance Impact medium
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- wontfix
firefox67 --- wontfix
firefox67.0.1 --- wontfix
firefox68 --- verified
firefox69 --- verified

People

(Reporter: Virtual, Assigned: jwatt)

References

(Regression)

Details

(8 keywords)

Attachments

(3 files)

STR:

  1. Open few browser windows
  2. Start long browsing session
  3. Close one window
    and enjoy browser hang for few seconds and in some cases even many minutes

Regression range:
TODO - started few days ago

I will try to catch profile with Gecko Profiler extension.

I suspect that it could be related to relanded LSNG patches or maybe some Fission work.

:Virtual, could you try to find a regression range in using for example mozregression?

I'm having trouble to easily reproduce this issue.
In my case sometimes it happens, sometimes it doesn't,
so it will take some time to find regression range.
If someone can reproduce this issue easily, help to find regression range will be nice.

For what it's worth, I had noted extended shut-down times with one of the Firefox processes taking about 32% of the CPU (the equivalent of about 1.28 cores in my 4-core CPU) for up to a few minutes if I had viewed a number of YouTube videos earlier in the session. No-YouTube = no elongated shut-down time. Note: when the problem exists, if I shut down the YouTube tabs but other tabs are open, no CPU usage is noticed at that time, but rather later when I shut down Firefox. Also, it seems the more I had been running YouTube videos that session, the longer it takes Firefox to shut down. Since the CPU shoots up after the Firefox windows had closed, I don't have opportunity to run the Gecko Profiler on it.

My system is Windows 10 (64-bit), Firefox Nightly 68.0a1 (2019-05-16) (64-bit), running on Intel i5-7400 (4-core), with an Intel(R) HD Graphics 630. Maybe relevant: I have the uBlock Origin extension installed with default block lists.

(In reply to Mark from comment #5)

CPU usage happens in the Firefox main process. (I just downloaded "Process Explorer" from www.sysinternals.com and had it running when I shut down Firefox Nightly after a few YouTube videos and an audio podcast.)

I will try to catch profile with Gecko Profiler extension, as looking for regression range is unreliable in my case.

Has Regression Range: --- → no

I've had similar problems for the last probably 1 -2 weeks as well. In fact, today I installed the new nightly and it took just over 25 (yes, twenty five) mins for firefox to shutdown. And that was maxing 1 core of the CPU the whole time.
I don't get any indication from firefox that anything hung/crashes (no new reports in about:crashes).
If there is a way to get a profile of the shutdown process in firefox, I'm more than willing to do it and post the results for analysis.

Whiteboard: [qf]

FYI - Looks like it's not caused by LSNG (Local Storage Next Generation), because when I disable it in about:config by setting "dom.storage.next_gen" to "false", it still happens.

As I said before I have problems with shutdowns taking a long time, but the problem seems to manifest the same way when I close a private browsing window as well.
So I tried to use Gecko Profiler to get some more info about this, but I couldn't get it to work when you have a private window open.
But I took a screenshot (that I will attach) of the Threads tab in Process Hacker for both the main Firefox process and a child one that was using a lot of CPU. The "firefox.exe!TargetNtUnmapViewOfSection" in the child process screenshot seemed to continuously spawn and disappear.
Don't know if they are of any help.

Next profile - https://perfht.ml/2HKxBtD

STR:

  1. Open few tabs in one browser window
  2. Open second browser window and start long browsing session in this window with many tabs
  3. Close all tabs from #2, except one
  4. Start recording profile with Gecko Profiler
  5. Wait about 10s doing nothing and close browser window from #2 & #3
  6. and enjoy waiting nearly 3 mins for Firefox to close browser window...
  7. Wait 10s doing nothing after profile capture

Looks like 80% of time Firefox is spending at "mozilla::SVGRenderingObserver::StopObserving".
Looking on https://searchfox.org/mozilla-central/search?q=mozilla%3A%3ASVGRenderingObserver%3A%3AStopObserving
latest changes to this function name were done in bug #1525372 which landed 2019-03-19 10:52:47 CET,
but oddly I don't think this issue started before 2019-04-15.

@ Matt Woodrow (:mattwoodrow) & Jeff Muizelaar [:jrmuizel] & Jonathan Watt [:jwatt] - Good day, I'm asking about some help in analyzing my profiles and some hints. Could it be that bug #1525372 caused this issue or it's missed call by me? Thank you very much in advance!

Info for later regression hunting - bugs which touched SVG code from 2019-04-15 to 2019-05-09 are:
bug #1548985
bug #1547209
bug #1544270
bug #1544216
bug #1544209
bug #1543584
bug #1542646
bug #1541350
bug #1539846
bug #1539318
bug #1533491
bug #1525753
bug #1372537
bug #1323962
bug #1247218
bug #972041
bug #937268
bug #874811
bug #608812

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jwatt)
Flags: needinfo?(jmuizelaar)

The stacks in your profile seem a bit unreliable, but most of that is likely due to identical functions being compiled to a single symbol.

I think the issue is this code, leading to massive hash table leaking - https://searchfox.org/mozilla-central/rev/952521e6164ddffa3f34bc8cfa5a81afc5b859c4/layout/svg/SVGObserverUtils.cpp#1378

This changeset adds a new usage of -moz-element, seems like about the right date range - https://hg.mozilla.org/mozilla-central/rev/e330d90e540084087b6a9c155eabe069ec690c49

Flags: needinfo?(matt.woodrow)

@ Matt Woodrow (:mattwoodrow) - Thank you very much for your detailed information!

@ Kris Maglione [:kmag] - Could you please look on this issue, as it may be that your patch to fix bug my issue in bug #1540984 caused this issue.

Flags: needinfo?(kmaglione+bmo)

(In reply to Matt Woodrow (:mattwoodrow) from comment #17)

I think the issue is this code, leading to massive hash table leaking - https://searchfox.org/mozilla-central/rev/952521e6164ddffa3f34bc8cfa5a81afc5b859c4/layout/svg/SVGObserverUtils.cpp#1378

Let's start with this.

Assignee: nobody → jwatt

Why is this code using -moz-element? Doesn't background-image work? I'd expect -moz-element to be much more expensive.

Regressed by: 1330487

First, nice sleuthing, Matt, identifying in comment 17 the relevant leak and the bug that triggered it. For completeness, it looks like the memory leak was introduced by the Part 6 patch from bug 1330487. See bug 1330487 comment 101 for details.

I could indeed only reproduce the leak after installing a theme with a large banner image, to trigger the -moz-element introduced in bug 1540984 comment 31.

When bug 1330487 regressed this, it goes all the way back to 64. Maybe we could include it in a possible security fix for an upcoming 67.0.x release?

@ Jonathan Watt [:jwatt] - Thank you very much for rapid diagnosis and fix. I'm testing it right now and I will report back later with results.

STR:

  1. Install Theme with large image (like Virtualfox [ https://addons.mozilla.org/addon/virtualfox/ ] or Firefox Quantum Nightly [ https://addons.mozilla.org/addon/firefox-quantum-nightly/ ] )
  2. In first browser window open few tabs and leave it untouched
  3. Open second browser window and start long browsing session in this window with many tabs
  4. After long browsing session close second browser window from #3
    and enjoy waiting from few seconds to many minutes for Firefox to close browser window,
    my record from yesterday was nearly 1 hour of hang with 85-93% of RAM usage, but I waited patiently, as I know session won't be restored properly when I will close Firefox with Windows hang information notification that Firefox stopped responding.
Severity: major → critical
Status: NEW → ASSIGNED
Has Regression Range: no → yes
Regressed by: 1540984
Summary: Mozilla Firefox Nightly 68.0a1 uses too much CPU and sometimes even hangs for some time at closing browser window → -moz-element leaks memory and hangs whole browser with massive CPU usage at closing browser window with large image Theme after landing patch from bug #1330487
Version: 68 Branch → 64 Branch
Pushed by jwatt@jwatt.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/f75308d601f2 Fix -moz-element memory leak and related shutdown hangs. r=emilio

(In reply to Emilio Cobos Álvarez (:emilio) from comment #21)

Why is this code using -moz-element? Doesn't background-image work? I'd expect -moz-element to be much more expensive.

It doesn't work, no. At least not with large background images, which flicker every time a window opens or changes focus. See bug 1540984.

I was initially going to use -moz-element with a canvas with a cropped image when the background image was too large, but just using it with an uncropped image element solved the problem on its own, so it didn't seem worth the extra complexity.

(In reply to Kris Maglione [:kmag] from comment #28)

It doesn't work, no. At least not with large background images, which flicker every time a window opens or changes focus. See bug 1540984.

Which protocol are these images loaded with, moz-extension://? I suspect that's an underlying imagelib bug, we no longer properly cache a bunch of images from those protocols (see https://bugzilla.mozilla.org/show_bug.cgi?id=1406134#c5).

Flags: needinfo?(kmaglione+bmo)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #30)

(In reply to Kris Maglione [:kmag] from comment #28)

It doesn't work, no. At least not with large background images, which flicker every time a window opens or changes focus. See bug 1540984.

Which protocol are these images loaded with, moz-extension://? I suspect that's an underlying imagelib bug, we no longer properly cache a bunch of images from those protocols (see https://bugzilla.mozilla.org/show_bug.cgi?id=1406134#c5).

At the moment, yes, but we had this problem in the past even when we used to persist theme images from https: URLs to disk and load them from file: URLs.

Flags: needinfo?(kmaglione+bmo)

file:// has the same issue actually, I think...

Well, in any case, I'd love for the -moz-element hack not to be necessary, but I spent more time than I could spare digging through the image loading and caching code trying to figure out why it didn't work correctly for large images before I had to give up and make do with a workaround. If the caching problem gets fixed, I would be more than happy to remove it.

Kris, can you file a bug on the caching issue? It seems like it shouldn't be too hard to fix.

Flags: needinfo?(kmaglione+bmo)
See Also: → 1553923

(In reply to Jeff Muizelaar [:jrmuizel] from comment #34)

Kris, can you file a bug on the caching issue? It seems like it shouldn't be too hard to fix.

Filed bug 1553923.

Flags: needinfo?(kmaglione+bmo)
Whiteboard: [qf] → [qf:p2:resource]

Note also on this profile it's also spending a lot of time deleting RetainedDisplayLists from a hashtable, it appears.

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #36)

Note also on this profile it's also spending a lot of time deleting RetainedDisplayLists from a hashtable, it appears.

Do you mean the DeleteValue<RetainedDisplayListData> line?

I think that's just a shared symbol, since it doesn't make sense for RetainedDisplayListData deletion to trigger the rest of the code below it.

I think that's just a shared symbol, since it doesn't make sense for RetainedDisplayListData deletion to trigger the rest of the code below it.

Aha, that makes sense. Darn linkers....

This regression dates back to 64 (bug 1330487) but became a real problem when bug 1540984 landed in 68. Given that, I think this is not a good candidate for uplifting to 67, so wontfix 67 and we don't take ridealongs in 67.05.

@ Jonathan Watt [:jwatt] - I'm reporting back. So far so good. I didn't encounter issue for over 1 day of testing with Mozilla Firefox Nightly 69.0a1 (2019-05-23) (64-bit) [Built from https://hg.mozilla.org/try/rev/663b5d1bd8fdb5c098d3f9bb37eef7f6b986de05] from Comment #24 with STR from Comment #26.

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69

(In reply to Virtual_ManPL [:Virtual] - (please needinfo? me - so I will see your comment/reply/question/etc.) from comment #40)

@ Jonathan Watt [:jwatt] - I'm reporting back. So far so good. I didn't encounter issue for over 1 day of testing

Awesome, thanks for testing and reporting back!

Comment on attachment 9067012 [details]
Bug 1550531. Fix -moz-element memory leak and hangs. r=emilio

Beta/Release Uplift Approval Request

  • User impact if declined: Indefinitely long shutdowns for users using a theme with a large banner image (due to memory leak). Possibly other users are affected by other scenarios that hit the same leak.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: See comment 26.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch fixes a memory leak, but otherwise should not change behavior. I've followed the code execution through in a debugger both with and without the patch to confirm that it works as expected.
  • String changes made/needed: none
Attachment #9067012 - Flags: approval-mozilla-beta?

I'm confirming that bug is fixed, starting in Mozilla Firefox Nightly 68.0a1 (2019-05-25), so I'm marking this bug as VERIFIED.
Thank you very much! \o/

Status: RESOLVED → VERIFIED
Priority: -- → P1

Comment on attachment 9067012 [details]
Bug 1550531. Fix -moz-element memory leak and hangs. r=emilio

fix for a mem leak, approved for 68.0b5

Attachment #9067012 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Managed to verify the fix as well using 68.0b5 on Windows 7x32.

Flags: qe-verify+
Performance Impact: --- → P2
Whiteboard: [qf:p2:resource]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: