Closed Bug 1952699 Opened 7 months ago Closed 7 months ago

Firefox takes 1.4GB to run a testcase. On reloading and running again, it barely takes a 200MB

Categories

(Core :: JavaScript Engine, task, P1)

task

Tracking

()

RESOLVED FIXED
138 Branch
Tracking Status
firefox138 --- fixed

People

(Reporter: mayankleoboy1, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Attached file Nested Map.HTML
  1. Open attached testcase
  2. Enter 5000000 and click "Generate Maps"
  • Notice it will take ~1.4GB RAM
  1. Once the maps are generated, Force-reload the page and wait till GC/CC clears the memory
  2. Enter 5000000 and click on "Generate Maps" again
  • Notice the page barely uses 200-300MB RAM.

Profile: https://share.firefox.dev/3FrLjD6

  • The first blip is the first time the testcase is run.
  • The second blip at 15s mark is when the page does GC/CC and releases memory
  • The third blip at 20s is when i rerun the testcase

Jan, needinfoing you as you are working on related areas and this may be of interest.

Flags: needinfo?(jdemooij)

Infact, whatever magic is happening in the second run, it works so well that you can enter a number that is 100x the original and still get barely more than 300MB total RAM use:

On second run, N= 500000000
Nightly: https://share.firefox.dev/3DmxaXr (300MB total RAM used by tab)
Chrome for comparison: https://share.firefox.dev/3FeQyGt

Bisection: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=38967ad7e8f29d5aeb963ccc61480124a6bae15a&tochange=0bcf2642f5a6e7175812623451eda2ab6cb35a0d
Disabling javascript.options.jithints makes the testcase use 1.4GB RAM on every run. So a positive behaviour change from Bug 1824772

To be clear, Bug 1824772 improved things. Prior to this, each time the testcase is run, it would take 1.4GB. After this bisection, the testcase takes 200MB on subsequent runs.

Depends on: 1824772

I think this is due to a combination of how the test case is written + how OSR works with the C++ Interpreter.

The test creates a long 'chain' of nested Map objects, but it only keeps the last 1-2 alive (with current). This means a nursery GC can typically clean up old objects in the chain because they're now garbage.

When we OSR from C++ interpreter to Baseline code, we leave the interpreter frame on the stack and mark it as "entered JIT code". We still trace this frame though and in this case we do end up marking an ancestor in the chain. This then keeps a lot more objects alive.

It looks like the nursery GC is still tenuring all of these objects and the major GCs do the actual cleanup. Let me see what's going on there before I close this bug.

To avoid this you could change the test to do something like this:

  let set = new Set();
  let root = set;
  let current = set;
  
  for (let i = 0; i < n; i++) {
      let nextSet = new Set();
      current.add(nextSet);
      current = nextSet;
  }
  return root;

(In reply to Jan de Mooij [:jandem] from comment #3)

It looks like the nursery GC is still tenuring all of these objects and the major GCs do the actual cleanup. Let me see what's going on there before I close this bug.

Oh nevermind this makes sense because after a nursery collection there's always a tenured object that points to the chain of nursery objects, so a nursery GC has to tenure the whole chain. You could try this with semi space enabled.

An easy thing we could do is clear the InterpreterFrame slots when we OSR into Baseline to avoid it keeping things alive. Let me try that...

Second Run:
SemispaceGC enabled with 100x(!) the loop count: https://share.firefox.dev/4ij6HJm
SemispaceGC disabled: https://share.firefox.dev/3DiN3yd (from comment 0)

So with semisapceGC enabled, the second run becomes 8x faster compared to non-semisapceGC?1?!

These interpreter frames are effectively dead, so clear its variable/expression stack slots
after copying the values to the Baseline frame.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

The attached patch won't affect this measurably because the test also relies on Ion realizing that the set local is dead, and that depends on which parts of the function run in Baseline, but it might help other things.

Flags: needinfo?(jdemooij)
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cbb538a57813 Clear InterpreterFrame stack slots when we OSR into Baseline. r=iain
Blocks: sm-js-perf
Severity: -- → N/A
Priority: -- → P1

Backed out for causing valgrid failures.

Flags: needinfo?(jdemooij)
Pushed by agoloman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5903b3ab6eb6 Clear InterpreterFrame stack slots when we OSR into Baseline. r=iain
Flags: needinfo?(jdemooij)
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 138 Branch
Duplicate of this bug: 1982547
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: