Open Bug 1914491 Opened 1 year ago Updated 5 months ago

Investigate moving Ion linking off-thread

Categories

(Core :: JavaScript Engine: JIT, task, P1)

task

Tracking

()

People

(Reporter: iain, Unassigned)

References

(Blocks 1 open bug)

Details

After an off-thread Ion compile is complete, there's a linking step that runs on the main thread. See LinkIonScript, which bottoms out in CodeGenerator::link. This shows up in Speedometer profiles.

It would be nice if we didn't have to spend main-thread time on linking. Markus is very keen on eliminating this. It could potentially let us Ion-compile more aggressively. However, linking touches a bunch of data that is currently only available on the main thread.

As a first step, we should audit the Ion linking code to figure out whether there's any low-hanging fruit that we could move off-thread incrementally.

js-perf-next: Investigate ion linking to identify which steps are most expensive. Determine what changes would be necessary to move expensive steps off-thread. Open bugs for any low-hanging fruit.

Whiteboard: [sm-perf-next] → [js-perf-next]

Please don't do it yet, or make it such that it can be toggled back …

While I agree from the performance point of view, and I want us to go this way in the future. I would prefer if we could have a degraded mode which can be toggle in about:config such that users might still fallback on main-thread linking when there is no-hardware / no-OS support to help mitigate against JIT Spraying attacks.

Not keeping a main-thread linking will prevent us from having an mprotect-based protection based on code-data separation.

It seems very reasonable to make sure that we preserve the ability to do main-thread writing-to-executable-pages. Among other things, we still use mprotect in the parent process, so we need to remain compatible with that.

There are other steps of CodeGenerator::link (random example: generateCompactNativeToBytecodeMap) where I can imagine that it would be less problematic to move it offthread. (Or maybe not, if it depends on knowing the final address of the code. This is very much intended as an exploratory bug.)

See Also: → 1490849

I did a quick investigation of this, looking at this JS3 profile (which I assume should have a similar distribution of time within linking as any other code). We spend:

  • 50% of the total time is spent in Linker::newCode. Of this, 35% is spent copying the code into the executable area (with lots of time spent pagefaulting?), and almost all the remaining time is spent allocating executable memory.
  • 11% is spent adding entries to the JitcodeGlobalEntry table.
  • 7% of our time recording glean metrics ("ION_COMPILE_TIME"). We claim that the embedder is responsible for locking in the telemetry code, but I can't find an existing example of off-thread telemetry inside SM. (And there's no explicit synchronization around JSRuntime::setTelemetryCallback, so I expect ASAN would complain if we tried accessing it offthread.)
  • 6% is spent updating JitHints.
  • 5% is spent in MacroAssembler::finish, updating relocations etc.
  • 5% is allocating the IonScript itself.
  • 3% is spent initializating the IonScript.
  • 3% is registering fuses on the main thread.
  • 2% is in AddInlinedCompilations.

I'd say this broadly breaks down into two categories: work that we do on the main thread because we're storing metadata about the compilation in a global data structure (JitcodeGlobalEntry, JitHints, fuses, AddInlinedCompilation, arguably telemetry), and work that we do on the main thread because it needs to happen after we have executable memory, and the executable memory pool is single-threaded (Linker::newCode, MacroAssembler::finish, probably IonScript initialization). I count roughly 29% for the first category and 63% for the second. (The remaining 8% is tiny fragments and rounding error).

If we were going to do something in this area, off-thread executable memory allocation is probably the right choice: it's delicate work, but the payoff is larger than incrementally adding locking to each of 4 or 5 different kinds of metadata.

One big obstacle to off-thread memory allocation is that it doesn't play nicely with W^X (which IIRC we still use in the parent process): because permissions are flipped on a per-page basis, off-thread compilation can't write to any page that might contain currently executing code. So we would likely end up with more fragmentation.

Looking at a couple older profiles of SP3, it looks like LinkIonScript takes up 550/114K samples here, and 734/193K samples here. (The percentages in those profiles are not exactly the same as the one measured above, but the overall picture is the same.) So overall I would estimate that Ion linking is just under 0.5% of SP3. If we estimate that off-thread memory allocation roughly doubles the speed, then we would expect about 0.25% improvement across the benchmark, which isn't nothing but is a pretty small win relative to the difficulty. (We might also get a small boost from making the same changes to offthread baseline compilation.)

I don't think this is one of our most valuable opportunities. Removing the js-perf-next tag and closing this bug. I'll open a separate bug for off-thread allocation of executable memory and mark it as P3.

No longer blocks: sm-perf-next
Whiteboard: [js-perf-next]
See Also: → 1965082
You need to log in before you can comment on or make changes to this bug.