Closed Bug 1167409 Opened 10 years ago Closed 10 years ago

Enable off-main-thread parsing of blocking scripts

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: djvj, Assigned: djvj)

References

(Blocks 2 open bugs)

Details

Attachments

(8 files, 10 obsolete files)

move-offthread-token-into-script-load-request.patch 10 years ago Kannan Vijayan [:djvj] 9.56 KB, patch		Details \| Diff \| Splinter Review
factor-off-thread-parsing-into-separate-function.patch 10 years ago Kannan Vijayan [:djvj] 3.01 KB, patch		Details \| Diff \| Splinter Review
change-mloading-to-mprogress.patch 10 years ago Kannan Vijayan [:djvj] 13.32 KB, patch		Details \| Diff \| Splinter Review
off-thread-compile-blocking-scripts.patch 10 years ago Kannan Vijayan [:djvj] 12.48 KB, patch		Details \| Diff \| Splinter Review
off-thread-compile-blocking-scripts.patch 10 years ago Kannan Vijayan [:djvj] 12.72 KB, patch		Details \| Diff \| Splinter Review
test-failure-1.patch 10 years ago Kannan Vijayan [:djvj] 4.27 KB, patch		Details \| Diff \| Splinter Review
test-failure-2.patch 10 years ago Kannan Vijayan [:djvj] 1.15 KB, patch		Details \| Diff \| Splinter Review
test-failure-1.patch 10 years ago Kannan Vijayan [:djvj] 6.38 KB, patch		Details \| Diff \| Splinter Review
test-failure-2.patch 10 years ago Kannan Vijayan [:djvj] 1.32 KB, patch		Details \| Diff \| Splinter Review
1-move-offthread-token-into-script-load-request.patch 10 years ago Kannan Vijayan [:djvj] 9.56 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
2-factor-process-request-and-off-thread-parsing-into-separate-functions.patch 10 years ago Kannan Vijayan [:djvj] 3.01 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
3-change-mloading-to-mprogress.patch 10 years ago Kannan Vijayan [:djvj] 13.32 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
4-off-thread-compile-blocking-scripts.patch 10 years ago Kannan Vijayan [:djvj] 11.42 KB, patch		Details \| Diff \| Splinter Review
0-move-offthread-token-into-script-load-request.patch 10 years ago Kannan Vijayan [:djvj] 9.15 KB, patch		Details \| Diff \| Splinter Review
1-factor-process-request-and-off-thread-parsing-into-separate-functions.patch 10 years ago Kannan Vijayan [:djvj] 3.02 KB, patch		Details \| Diff \| Splinter Review
2-change-mloading-to-mprogress.patch 10 years ago Kannan Vijayan [:djvj] 13.40 KB, patch		Details \| Diff \| Splinter Review
3-off-thread-compile-blocking-scripts.patch 10 years ago Kannan Vijayan [:djvj] 11.37 KB, patch		Details \| Diff \| Splinter Review
4-initialize-sourceobject-when-compilation-fails.patch 10 years ago Kannan Vijayan [:djvj] 11.27 KB, patch	jandem : review+	Details \| Diff \| Splinter Review

Kannan Vijayan [:djvj]

Assignee

Description

•

10 years ago

Sub-bug to enable off-main-thread parsing of scripts that block the main HTML parser. These are "typical" scripts that are present in <script> tags which are not marked 'async'.

Kannan Vijayan [:djvj]

Assignee

Comment 1

•

10 years ago

Attached patch move-offthread-token-into-script-load-request.patch (obsolete) — Details — Splinter Review

Current off-main-thread parsing of async scripts uses a token value that's passed back from the JS compiler to the browser. This token is then used to look up the compiled JSScript when the pre-compiled script needs to be run. This patch moves the off-thread token from being a parameter that's passed around to the various processing functions, to a field that lives on nsScriptLoadRequest.

Kannan Vijayan [:djvj]

Assignee

Comment 2

•

10 years ago

Try run for above patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=38adfa30fe7e

Kannan Vijayan [:djvj]

Assignee

Comment 3

•

10 years ago

Attached patch factor-off-thread-parsing-into-separate-function.patch (obsolete) — Details — Splinter Review

Currently, the attempt to off-thread parse a script is made within the body of 'ProcessRequest'. The attempt method checks the request to only try off-thread parsing on async scripts. This patch factors the call to AttemptAsyncScriptParse into a separate method: 'ParseOffThreadOrProcessRequest', which is called only from the async-script processing code. This is to allow cleaner usage of attempting off-thread compilation in places where we don't necessarily want to process the script immediately. Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba652c20f6ea

Kannan Vijayan [:djvj]

Assignee

Comment 4

•

10 years ago

Attached patch change-mloading-to-mprogress.patch (obsolete) — Details — Splinter Review

Currently, nsScriptLoadRequest has an 'mLoading' field that is true when the script data is being loaded, and set to false when it's done loading. In the new code, scripts may be a few different states depending on how they are to be processed: loading, done loading, compiling, and done compiling. Some scripts will go from loading -> done loading -> execute, some scripts will go from loading -> done loading -> compiling -> done compiling -> execute. This patch changes 'bool mLoading' into 'Progress mProgress', where Progress is an enumeration of the states above. Logic which currently checks 'mLoading' is changed to use better-named methods that read the underlying mProgress var.

Kannan Vijayan [:djvj]

Assignee

Comment 5

•

10 years ago

Attached patch off-thread-compile-blocking-scripts.patch (obsolete) — Details — Splinter Review

This is the final patch, and I think it has some bugs still. It changes logic in 'ProcessScriptElement' to attempt off-thread compilation of parser-blocking scripts. It also changes the 'OnStreamComplete' handler to try off-thread parsing of scripts which are blocking the parser. It has a number of oranges on try which need to be tracked down, but should structurally be pretty solid. Basically, we change some places where we are currengly 'ProcessRequest'ing a request, and instead try an off-thread compile first. If the off-thread compile is successfully initiated, then the parser-blocking script is kept around, and the callback run at completion-of-compile processes the request - either directly, or by issuing an async command on the event loop. Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5246f5c7e4e5

Kannan Vijayan [:djvj]

Assignee

Comment 6

•

10 years ago

Attached patch off-thread-compile-blocking-scripts.patch (obsolete) — Details — Splinter Review

Updated patch to appropriately call unblockOnload in the case where an off-thread compiled script is handled via a call to ProcessPendingRequests. Latest try runs still show oranges, but far fewer (the bc1 and cgc and dt errors are separate): https://treeherder.mozilla.org/#/jobs?repo=try&revision=32ce366d80ef I'm tracking down the bc2 errors now, and it seems that "bocking script execution" is not done quite correctly in the new logic. Somehow, there is script execution that continues despite the main page being blocked on an off-thread compiling synchronous script.

Attachment #8609037 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 7

•

10 years ago

Fixing the remaining batches of orange on try, I've been focusing on the |bc2| errors above: browser/components/uitour/test/browser_UITour_loop.js The basic issue is this: one of the tests in the UITour deals with the "loop" or "Firefox Hello" feature. Referring to firefox hello as "loop" for this discussion as that's how it's referred to in code. The test clicks the 'loop button' (presumably the hello button at top), and then waits for the 'loopButton.open' field to become true, then does somethign with the ContentAPI, then checks to see if the 'getting started button' exists: Services.prefs.setBoolPref("loop.gettingStarted.seen", false); Services.prefs.setCharPref("loop.gettingStarted.url", "http://example.com"); is(loopButton.open, false, "Menu should initially be closed"); loopButton.click(); yield waitForConditionPromise(() => { return loopButton.open; }, "Menu should be visible after showMenu()"); gContentAPI.registerPageID("hello-tour_OpenPanel_testPage"); yield new Promise(resolve => { gContentAPI.ping(() => resolve()); }); let loopDoc = document.getElementById("loop-notification-panel").children[0].contentDocument; let gettingStartedButton = loopDoc.getElementById("fte-button"); ok(gettingStartedButton, "Getting Started button should be found"); The test fails on the last line, noting that |gettingStartedButton| is not found. Internally comparing logs between my patches firefox and reference firefox, I can see that the reference firefox loads and executes MANY scripts before the |fte-button| check occurs. In my patched firefox, the logs indicate that the |react.js| script is scheduled for asynchronous parsing, but the failing assertion is reached before that compile even completes. Furthermore, |fte-button| is a button added using React APIs. So it seems like pretty strong evidence that the test code is continuing execution before the necessary scripts are parsed and executed. I'm not sure how or why this happens, as my patched firefox _should_ be blocking page parsing when it's off-thread compiling scripts (using the same mechanism that pages are blocked when loading js code from netwerk). Yet somehow the test code is running ahead of the time it usually should.

Kannan Vijayan [:djvj]

Assignee

Comment 8

•

10 years ago

Attached patch test-failure-1.patch (obsolete) — Details — Splinter Review

Patch for replicating test failure 1 (browser/devtools/markupview/test/browser_markupview_search_01.js)

Kannan Vijayan [:djvj]

Assignee

Comment 9

•

10 years ago

Attached patch test-failure-2.patch — Details — Splinter Review

Patch for recreating test failure 2 (browser/components/uitour/test/browser_UITour_loop.js)

Kannan Vijayan [:djvj]

Assignee

Comment 10

•

10 years ago

I spent the last week meticulously tracing between test code, browser frontend code, and browser DOM code to track these failures. Eventually I ended up resorting to bisecting my own patches, and used the results of that to come up with these two reduced patches, which cause the test failures when applied to tip. The second patch (test-failure-2.patch) is particularly interesting to me. It simply comments out an "optimization" - where nsScriptLoader processes a script syncrhonously instead of putting it on the mParserBlockingScript field and running through the event loop once and processing it asynchronously. The first patch (test-failure-1.patch) modifies the behaviour of '::OnStreamComplete'. Typically, OnStreamComplete will process incoming mParserBlockingScript scripts immediately and synchronously. This patch modifies OnStreamComplete to push script processing off by scheduling script processing for the next event loop. It seems like neither of these should affect execution semantics, and should only affect timing or incidental script execution ordering.

Kannan Vijayan [:djvj]

Assignee

Comment 11

•

10 years ago

Attached patch test-failure-1.patch — Details — Splinter Review

Updated test-failure-1 patch changed to BlockOnload when deferring script processing, and UnblockOnload after processing. Still shows test failure.

Attachment #8614199 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 12

•

10 years ago

Attached patch test-failure-2.patch — Details — Splinter Review

Updated test-failure-2 patch changed to BlockOnload when deferring script processing, and UnblockOnload after processing. Still shows test failure.

Kannan Vijayan [:djvj]

Assignee

Updated

•

10 years ago

Depends on: 1173490

Kannan Vijayan [:djvj]

Assignee

Comment 13

•

10 years ago

After a bunch of investigation, I'm reasonably sure these patches are good. Going to go ahead and get review on them (after a final try run) and wait for test issues to clear up.

Kannan Vijayan [:djvj]

Assignee

Comment 14

•

10 years ago

Attached patch 1-move-offthread-token-into-script-load-request.patch (obsolete) — Details — Splinter Review

Attachment #8609028 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 15

•

10 years ago

Attached patch 2-factor-process-request-and-off-thread-parsing-into-separate-functions.patch (obsolete) — Details — Splinter Review

Attachment #8609031 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 16

•

10 years ago

Attached patch 3-change-mloading-to-mprogress.patch (obsolete) — Details — Splinter Review

Attachment #8609034 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 17

•

10 years ago

Attached patch 4-off-thread-compile-blocking-scripts.patch (obsolete) — Details — Splinter Review

Attachment #8610258 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Updated

•

10 years ago

Attachment #8620614 - Flags: review?(bugs)

Kannan Vijayan [:djvj]

Assignee

Updated

•

10 years ago

Attachment #8620616 - Flags: review?(bugs)

Kannan Vijayan [:djvj]

Assignee

Updated

•

10 years ago

Attachment #8620617 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 18

•

10 years ago

Comment on attachment 8620614 [details] [diff] [review] 1-move-offthread-token-into-script-load-request.patch >+ JSRuntime *rt; * goes with the type > >+ void** OffThreadTokenPtr() { Nit, { goes to the next line >+ return mOffThreadToken ? &mOffThreadToken : nullptr; >+ } Hmm, a tiny bit odd to have getter for &mOffThreadToken, but not for mOffThreadToken. Perhaps add also void* OffThreadToken() { return mOffThreadToken; } and use use and not aRequest->mOffThreadToken > using super::getNext; > using super::isInList; > > nsCOMPtr<nsIScriptElement> mElement; > bool mLoading; // Are we still waiting for a load to complete? > bool mIsInline; // Is the script inline or loaded? > bool mHasSourceMapURL; // Does the HTTP header have a source map url? > bool mIsDefer; // True if we live in mDeferRequests. > bool mIsAsync; // True if we live in mLoadingAsyncRequests or mLoadedAsyncRequests. > bool mIsNonAsyncScriptInserted; // True if we live in mNonAsyncExternalScriptInsertedRequests > bool mIsXSLT; // True if we live in mXSLTRequests. > bool mIsCanceled; // True if we have been explicitly canceled. >+ void* mOffThreadToken; // Off-thread parsing token. Perhaps worth to document when it is set

Attachment #8620614 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 19

•

10 years ago

Comment on attachment 8620616 [details] [diff] [review] 2-factor-process-request-and-off-thread-parsing-into-separate-functions.patch > nsresult >+nsScriptLoader::ParseOffThreadOrProcessRequest(nsScriptLoadRequest* aRequest) >+{ >+ NS_ASSERTION(nsContentUtils::IsSafeToRunScript(), >+ "Processing requests when running scripts is unsafe."); >+ NS_ASSERTION(!aRequest->mOffThreadToken, >+ "Candidate for off-thread parsing is already parsed off-thread"); >+ >+ nsresult rv = AttemptAsyncScriptParse(aRequest); >+ if (rv != NS_ERROR_FAILURE) >+ return rv; always {} with if (I know you just moved this code, but still) > while (mEnabled && !mLoadedAsyncRequests.isEmpty()) { > request = mLoadedAsyncRequests.StealFirst(); >- ProcessRequest(request); >+ ParseOffThreadOrProcessRequest(request); It is not clear at all after this patch when one should call ProcessRequest and when ParseOffThreadOrProcessRequest but I assume the next patch explains that. (it is about the async-ness)

Attachment #8620616 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 20

•

10 years ago

Comment on attachment 8620617 [details] [diff] [review] 3-change-mloading-to-mprogress.patch >+ enum Progress { >+ Progress_Loading, >+ Progress_DoneLoading, >+ Progress_Compiling, >+ Progress_DoneCompiling >+ }; >+ bool IsReadyToRun() { >+ return mProgress == Progress_DoneLoading || >+ mProgress == Progress_DoneCompiling; >+ } >+ bool IsLoading() { >+ return mProgress == Progress_Loading; >+ } >+ bool IsDoneLoading() { >+ return mProgress == Progress_DoneLoading; >+ } >+ bool InLoadingStage() { >+ return IsLoading() || IsDoneLoading(); >+ } { goes to its own line with methods. IsLoadingStage() and IsReadyToRun() may both return true at the same time. For a random (non-native-English) reader the difference between IsLoading() and InLoadingStage() is rather vague I'd say. Given that you use InLoadingStage() in assertions (which probably should be MOZ_ASSERTs), perhaps make this InCompilingStage() and then !InCompilingStage() in assertions and change the assertion text a bit.

Attachment #8620617 - Flags: review?(bugs) → review+

Kannan Vijayan [:djvj]

Assignee

Comment 21

•

10 years ago

(In reply to Olli Pettay [:smaug] (for generic DOM reviews, you may want to look for other reviewers too ;)) from comment #18) > Hmm, a tiny bit odd to have getter for &mOffThreadToken, but not for > mOffThreadToken. > Perhaps add also > void* OffThreadToken() > { > return mOffThreadToken; > } > and use use and not aRequest->mOffThreadToken I think your comment got mangled when you were writing. Could you restate? I don't understand what you're suggesting here.

Kannan Vijayan [:djvj]

Assignee

Comment 22

•

10 years ago

So the latest try push looks pretty good (the other oranges are all intermittent or from the base revision the patches were applied to), except for a consistent |g2| crash in the debugger when talos-testing devtools on opt builds. Seems to happen pretty consistently across platforms: https://treeherder.mozilla.org/#/jobs?repo=try&revision=70baeefe36ff I'm trying to replicate this locally but not having any on an opt 64-bit build on linux. Trying an opt 32-bit build in a linux VM now.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 23

•

10 years ago

(In reply to Kannan Vijayan [:djvj] from comment #21) > (In reply to Olli Pettay [:smaug] (for generic DOM reviews, you may want to > look for other reviewers too ;)) from comment #18) > > Hmm, a tiny bit odd to have getter for &mOffThreadToken, but not for > > mOffThreadToken. > > Perhaps add also > > void* OffThreadToken() > > { > > return mOffThreadToken; > > } > > and use use and not aRequest->mOffThreadToken > > I think your comment got mangled when you were writing. Could you restate? > I don't understand what you're suggesting here. s/use use/use that/ So, add a getter for mOffThreadToken

Kannan Vijayan [:djvj]

Assignee

Comment 24

•

10 years ago

(In reply to Olli Pettay [:smaug] from comment #23) > > I think your comment got mangled when you were writing. Could you restate? > > I don't understand what you're suggesting here. > s/use use/use that/ > > So, add a getter for mOffThreadToken I don't need a getter for mOffThreadToken because it's a public var. The reason I need an get-address-of is that the address of the field is passed into nsJsUtils::EvaluateString which doesn't know about the request. It receives a pointer-to-void-pointer which represents the token, and if non-null, uses it. If the token is consumed by EvaluateString, EvaluateString sets the token to nullptr. So the field is passed in as an outvar to EvaluateString. I can work around this by having the caller to EvaluateString cache the void-star in a local var, and pass in a pointer to the local var, and after return update mOffThreadToken, but it's effectively equivalent to this. I can go either way on this depending on how strongly you feel about it, but I think the get-address-of method we have now is sufficient and mirrors pre-existing usage pattern of the off-thread-token (it's passed around currentely as a pointer-to-void-star).

Flags: needinfo?(bugs)

Kannan Vijayan [:djvj]

Assignee

Comment 25

•

10 years ago

(In reply to Kannan Vijayan [:djvj] from comment #22) > So the latest try push looks pretty good (the other oranges are all > intermittent or from the base revision the patches were applied to), except > for a consistent |g2| crash in the debugger when talos-testing devtools on > opt builds. Seems to happen pretty consistently across platforms: > > https://treeherder.mozilla.org/#/jobs?repo=try&revision=70baeefe36ff > > I'm trying to replicate this locally but not having any on an opt 64-bit > build on linux. Trying an opt 32-bit build in a linux VM now. I can't replicate this g2 crash I'm seeing in that try build. Tried a local opt 32-bit and 64-bit linux build, no luck. Got a loaner talos machine and ran the same talos build with a download of the try build, and the crash doesn't replicate. Going to try to reduce my patches again and try to get a simple set of changes that reproduces this error. That might be a better avenue.

Kannan Vijayan [:djvj]

Assignee

Updated

•

10 years ago

Depends on: 1175252

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 26

•

10 years ago

The getter for mOffThreadToken would be just for consistency - nothing else. So either have getter for the address of mOffThreadToken and mOffThreadToken itself or don't have getter for neither.

Flags: needinfo?(bugs)

Kannan Vijayan [:djvj]

Assignee

Comment 27

•

10 years ago

Ok. I'll remove the pointer-getter then. We never need to access the bool directly, it gets passed in as a pointer and set in js-specific code that doesn't have access to the request.

Kannan Vijayan [:djvj]

Assignee

Comment 28

•

10 years ago

Finally managed to replicate the g2 crash seen here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5090fa33cf9b The crash itself is occurring within the getter-function for '.element' on debugger-proxy for a script source. It occurs inside |DebuggerSource_getElement|. It turns out that the |element| slot on the ScriptSourceObject for the script is initialized to a poison Magic value. Looking into the initialization of ScriptSourceObject, we run into this (https://dxr.mozilla.org/mozilla-central/source/js/src/frontend/BytecodeCompiler.cpp?from=CreateScriptSourceObject&case=true#195): // Off-thread compilations do all their GC heap allocation, including the // SSO, in a temporary compartment. Hence, for the SSO to refer to the // gc-heap-allocated values in |options|, it would need cross-compartment // wrappers from the temporary compartment to the real compartment --- which // would then be inappropriate once we merged the temporary and real // compartments. // // Instead, we put off populating those SSO slots in off-thread compilations // until after we've merged compartments. if (cx->isJSContext()) { if (!ScriptSourceObject::initFromOptions(cx->asJSContext(), sso, options)) return nullptr; } the |initFromOptions| call is what initializes the |element| field in a ScriptSourceObject. For off-thread compiles, this is delayed until compartment merge time. Somehow, these partially-initialized ScriptSourceObjects are being made visible to chrome (specifically the debugger), which is using them before they are initialized. The javascript stack at the time of the crash is: #0 (nil) resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/utils/TabSources.js:288 (0x7f21644d0830 @ 61) #1 (nil) resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/utils/TabSources.js:360 (0x7f21644d0c18 @ 13) #2 (nil) resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:922 (0x7f21644d3e70 @ 100) #3 (nil) resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:801 (0x7f21644d3a88 @ 72) #4 (nil) resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:740 (0x7f21644d38f8 @ 8) So it's some event triggering a promise resolution that calls into some code that lists script source objects. This code gets its hands on not-fully-initialized script source objects. Next step is to figure out how and where ScriptSourceObjects are escaping. There may be an assumption somewhere else in the code that emits events for particular scripts getting compiled, before the compartment merge happens and the ScriptSourceObject gets fully initialized.

Kannan Vijayan [:djvj]

Assignee

Comment 29

•

10 years ago

Shu pointed to this as https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Debugger.cpp#3688 The findScripts code figures out which scripts to expose to debugger. Might need to fix some of the conditionals there.

Kannan Vijayan [:djvj]

Assignee

Comment 30

•

10 years ago

Hey guys, I need some insight into this issue. To give some context to this patchset (so you don't have to read all the comments), I'm trying to move most script parsing off the main thread and onto a background thread. The patches for this are written and mostly green, but opt build crash on Talos G2 builds. I discovered that this is due to the issues described on comment 28 - a ScriptSourceObject that has not been fully initialized is leaking out to the debugger frontend code. This happens because off-main-thread-parsed scripts are parsed in a separate compartment from the main compartment. This means that memory allocations during parsing (such as the allocation of ScriptSourceObject) happen in a separate compartment. To handle this, the full initialization of a ScriptSourceObject is put off until an off-main-thread parse completes and the parse compartment is merged with the main compartment. The issue is that before this happens, the SSO is getting leaked to the debugger frontend, which tries to get the '.element' property off of it, which has not been initialized, and thus crashes. I need to figure out what control flow paths the SSO object takes from creation to escaping-to-debugger-frontend. For all of those paths, I need to identify SSO objects created on off-main-thread parsed scripts, and ensure that they are "held back" until the parse completes. Insight and advice would be greatly appreciated.

Flags: needinfo?(jimb)

Flags: needinfo?(ejpbruel)

Kannan Vijayan [:djvj]

Assignee

Comment 31

•

10 years ago

After more digging, and printf instrumentation, I've learned some more about what's going on: 1. The SSO we are crashing on is created as part of an off-main-thread compile. 2. The reason the SSO is never ititialized is because the compile (fronted::CompileScript) fails, returning null to HelperThread::handleParseWorkload. This causes the ParseTask's script to get set to null. 3. I printf-instrumented the Debugger::ScriptQuery::consider function to print all JSScripts. The script in question doesn't show up in it. 4. Which means that the SSO is somehow escaping into chrome js, and it's not going through Debugger::findScripts.

Kannan Vijayan [:djvj]

Assignee

Comment 32

•

10 years ago

Figured out the problem. This is what happens: 1. Off-main-thread compile is started, and creates an incomplete SSO object as part of the compile. The expectation is that the SSO object will be fully initialized once the OMTC (off-main-thread compile) compartment is merged. 2. During the compile, the top-level JSScript is allocated, pointing to the SSO. The top-level JSScript's code_ field is null (it's not set until the end of compilation), so this is ok, because the script can't be used. 3. However, inner JSScripts that are allocated can complete their compilation fully. These scripts reference the same SSO, and their code_ field is not null. 4. The top-level compile fails (for whatever reason). This sets the OMTC ParseTask->script to null. 5. On the main thread, after compartment merge, the code tries to fully initialize the SSO, but it can't because it gets the SSO from the ParseTask->script, which is now null. The assumed logic here is "the script failed to compile and is not valid anyway, so it doesn't matter if the SSO for it is initialized or not". 6. Post-merge, the JSScript, and all of the inner JSScripts are in the new compartment. The top-level JSScript is invalid (null code_ pointer), but the inner JSScripts are perfectly valid (valid code_ pointer). The inner JSScripts point to a partially initialized SSO. 7. findScripts discovers the "successfully compiled" inner JSScripts, and accesses the partially initialized SSO from it, leading to the crash.

Flags: needinfo?(jimb)

Flags: needinfo?(ejpbruel)

Kannan Vijayan [:djvj]

Assignee

Comment 33

•

10 years ago

Ok, fixed the crash. Try runs look clean (couple of them): https://treeherder.mozilla.org/#/jobs?repo=try&revision=bb349bc1f1a3 https://treeherder.mozilla.org/#/jobs?repo=try&revision=b71b149efd99 Also, talos shows a significant perf IMPROVEMENT (yay) of 30% or more on V8 on windows: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=8a9734a9d97d&newProject=try&newRevision=bb349bc1f1a3 The improvement is not a fluke. Holds between different try runs. I am not sure _exactly_ why, but I'll take a look at the horse's mouth later.

Kannan Vijayan [:djvj]

Assignee

Comment 34

•

10 years ago

Attached patch 0-move-offthread-token-into-script-load-request.patch — Details — Splinter Review

Attachment #8620614 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 35

•

10 years ago

Attached patch 1-factor-process-request-and-off-thread-parsing-into-separate-functions.patch — Details — Splinter Review

Attachment #8620616 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 36

•

10 years ago

Attached patch 2-change-mloading-to-mprogress.patch — Details — Splinter Review

Attachment #8620617 - Attachment is obsolete: true

Kannan Vijayan [:djvj]

Assignee

Comment 37

•

10 years ago

Attached patch 3-off-thread-compile-blocking-scripts.patch — Details — Splinter Review

Attachment #8620618 - Attachment is obsolete: true

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 38

•

10 years ago

I wonder what the v8 test is actually testing. Does it test js parsing/compiling time + some benchmark, when one would expect it to test only the benchmark part.

Kannan Vijayan [:djvj]

Assignee

Comment 39

•

10 years ago

Attached patch 4-initialize-sourceobject-when-compilation-fails.patch — Details — Splinter Review

Kannan Vijayan [:djvj]

Assignee

Comment 40

•

10 years ago

(In reply to Olli Pettay [:smaug] from comment #38) > I wonder what the v8 test is actually testing. Does it test js > parsing/compiling time + some benchmark, when one would expect it to test > only the benchmark part. I'm not sure what's going on here, specifically, but the improvements hold across multiple comparisons with builds off of different tips, so they seem reliable. I tried to get a breakdown on which tests were doing better, but the perfherder breakdown page didn't load. I was just looking for "no regressions", so I'm not too concerned about getting to the bottom of this just yet.

Kannan Vijayan [:djvj]

Assignee

Comment 41

•

10 years ago

Comment on attachment 8639439 [details] [diff] [review] 4-initialize-sourceobject-when-compilation-fails.patch So, I'm giving this review to you since you did the rest of the reviews. It touches JS-land a bit more heavily, but nothing serious. It just makes sure that ScriptSourceObject gets initialized properly even if script compilation failed. Mostly it's ensuring that frontend::CompileScript emits a ScriptSourceObject independently of the returned JSScript*.

Attachment #8639439 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 42

•

10 years ago

Comment on attachment 8639439 [details] [diff] [review] 4-initialize-sourceobject-when-compilation-fails.patch This is way too js-eng-y for me... >+++ b/js/src/vm/HelperThreads.h >@@ -475,16 +475,19 @@ struct ParseTask > JS::OffThreadCompileCallback callback; > void* callbackData; > > // Holds the final script between the invocation of the callback and the > // point where FinishOffThreadScript is called, which will destroy the > // ParseTask. > JSScript* script; > >+ // Holds the ScriptSourceObject generated for the script compilation. >+ ScriptSourceObject* sourceObject; >+ ...especially since I have no idea what the lifetime management is here. Any Foo* member variable looks scary to me, and I treat them as a security critical bug unless proved otherwise.

Attachment #8639439 - Flags: review?(bugs) → review?(jdemooij)

Kannan Vijayan [:djvj]

Assignee

Comment 43

•

10 years ago

(In reply to Olli Pettay [:smaug] from comment #42) > ...especially since I have no idea what the lifetime management is here. > Any Foo* member variable looks scary to me, and I treat them as a security > critical bug unless proved otherwise. I realize it's in Jan's hands now, but just to explain that part: It's OK for the same reason the bare-pointer |JSScript* script| hanging off of that struct is ok. The only thing that happens after compilation finishes is the ParseTask getting queued back up for finishing-up in the main thread. The finishing-up is just merging the compartment. No more allocations or any other activity should be happening on that compartment. If it were, the |JSScript*| on the ParseTask would also be unsafe.

Kannan Vijayan [:djvj]

Assignee

Comment 44

•

10 years ago

Comment on attachment 8639439 [details] [diff] [review] 4-initialize-sourceobject-when-compilation-fails.patch Review of attachment 8639439 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsscript.cpp @@ +2782,5 @@ > return global(); > } > > void > +JSScript::makeInvalid() I just realized this method is unused. Please ignore it. It was for an earlier attempt at the fix that didn't work out.

Jan de Mooij [:jandem]

Comment 45

•

10 years ago

Comment on attachment 8639439 [details] [diff] [review] 4-initialize-sourceobject-when-compilation-fails.patch Review of attachment 8639439 [details] [diff] [review]: ----------------------------------------------------------------- Ugh, nice detective work tracking this down. ::: js/src/frontend/BytecodeCompiler.cpp @@ +783,5 @@ > compiler.setEnclosingStaticScope(enclosingStaticScope); > + JSScript* script = compiler.compileScript(scopeChain, evalCaller, staticLevel); > + > + if (sourceObjectOut) > + *sourceObjectOut = compiler.sourceObjectPtr(); To avoid accidentally reintroducing the bug, this could use a comment explaining we need to return the sourceObject even if the compilation failed, because inner scripts may be referencing it. ::: js/src/jsscript.h @@ +1838,5 @@ > #ifdef DEBUG > uint32_t stepModeCount() { return hasDebugScript_ ? debugScript()->stepMode : 0; } > #endif > > + void makeInvalid(); (as you said, don't forget to remove this one) ::: js/src/vm/HelperThreads.cpp @@ +232,5 @@ > > bool > ParseTask::finish(JSContext* cx) > { > + if (sourceObject) { Nit: it seems we could MOZ_ASSERT(sourceObject) ?

Attachment #8639439 - Flags: review?(jdemooij) → review+

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Updated

•

10 years ago

Assignee: nobody → kvijayan

Status: NEW → ASSIGNED

Kannan Vijayan [:djvj]

Assignee

Comment 46

•

10 years ago

(In reply to Jan de Mooij [:jandem] from comment #45) > ::: js/src/vm/HelperThreads.cpp > @@ +232,5 @@ > > > > bool > > ParseTask::finish(JSContext* cx) > > { > > + if (sourceObject) { > > Nit: it seems we could MOZ_ASSERT(sourceObject) ? It's possible for the compilation to fail early enough that the SSO itself is not created. We need to allow for a null sourceObject after the compilation attempt.

Pulsebot

Comment 47

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b88cb51c1f77 https://hg.mozilla.org/integration/mozilla-inbound/rev/95a7f1251dd4 https://hg.mozilla.org/integration/mozilla-inbound/rev/4264d30f4b03 https://hg.mozilla.org/integration/mozilla-inbound/rev/211b839cfcda https://hg.mozilla.org/integration/mozilla-inbound/rev/08653d62e6c0

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 48

•

10 years ago

Backed out for mochitest-chrome failures in https://hg.mozilla.org/integration/mozilla-inbound/rev/3d882ef61333 https://treeherder.mozilla.org/logviewer.html#?job_id=12222096&repo=mozilla-inbound

Flags: needinfo?(kvijayan)

Kannan Vijayan [:djvj]

Assignee

Comment 49

•

10 years ago

Trying to build b2g emulator to replicate this. Running into build problems (the B2G build instructions don't work). Installing ubuntu on a VM to try again and see if that works. The B2G Emulator Mochitest chrome crash is a shutdown crash, in: std::auto_ptr<std::basic_streambuf<char, std::char_traits<char> > >::~auto_ptr() This destructor is called on one of the static global streambuf instances that stl uses to keep the underlying buffers for its cin/cout/cerr streams. Something in the destructor ends up jumping to null (pc == 0x00000000 in crash). I have no clue why this might be happening. Investigating.

Flags: needinfo?(kvijayan)

Emanuel Hoogeveen [:ehoogeveen]

Comment 50

•

10 years ago

(In reply to Kannan Vijayan [:djvj] from comment #33) > Also, talos shows a significant perf IMPROVEMENT (yay) of 30% or more on V8 > on windows: > https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla- > inbound&originalRevision=8a9734a9d97d&newProject=try&newRevision=bb349bc1f1a3 (In reply to Kannan Vijayan [:djvj] from comment #40) > I tried to get a breakdown on which tests were doing better, but the > perfherder breakdown page didn't load. I was just looking for "no > regressions", so I'm not too concerned about getting to the bottom of this > just yet. The subtest view loads now, but the scores make no sense! For one thing, why are Earley and Boyer, and Decrypt and Encrypt separate? Even ignoring those, why does RegExp have the highest score when it should have the lowest? It looks like Talos is displaying the *times*, but treating them as scores! I don't know how this could be, though, since the score at [1] does seem to be going up over time (as you'd expect from our gradually improving score). [1] http://graphs.mozilla.org/graph.html#tests=[[230,131,35]]&sel=none&displayrange=365&datatype=geo

Emanuel Hoogeveen [:ehoogeveen]

Comment 51

•

10 years ago

Joel, could you shed some light on this? Which view is correct? It goes without saying that if the scores are actually times, then the large improvement is actually a large regression.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 52

•

10 years ago

ah, sorry for the confusion, and thanks for looking at the perf on your own! V8- while this is a few years old, it still has a lot of useful data. V8 has the 8 subtests and they are then summarized into the final v8 score. This is done inside of Talos and uploaded to graph server, so the graph server score for v8 is the number that v8 intended to report. Keep in mind that we have each of these subtests which are run (e.g. Earley, Boyer, RegExp) and we harvest the numbers from them. In fact all of the raw replicates of them are posted to perfherder which we take all of the data and just report a geometric mean, not a benchmark score. This is mentioned in bug 1184966. We are fixing this and have already done 1 of 2 parts where talos now sends the computed benchmark scores for everything in the data blog to perfherder. What remains is that perfherder needs to ingest the new data in the json blob so we report it properly. Does that shed light and make sense?

Flags: needinfo?(jmaher)

Emanuel Hoogeveen [:ehoogeveen]

Comment 53

•

10 years ago

I think so, but this means that the perfherder results are currently wrong, correct? They are interpreting an increase in raw runtime as an increase in scores, and thus reporting an improvement? Kannan, the affected tests are DeltaBlue, RayTrace and Richards, with RayTrace regressing particularly badly (if I'm interpreting this right). See e.g. [1] and [2] for the linux32 and linux64 results.

Emanuel Hoogeveen [:ehoogeveen]

Comment 54

•

10 years ago

Er, where [1] and [2] are as below. [1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=8a9734a9d97d&newProject=try&newRevision=bb349bc1f1a3&originalSignature=f3cdc5746374e146c7cc7c679e89f6dc5931f68e&newSignature=f3cdc5746374e146c7cc7c679e89f6dc5931f68e [2] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=8a9734a9d97d&newProject=try&newRevision=bb349bc1f1a3&originalSignature=e81533b82bb54362bdb9a24b08da2d09b79c8956&newSignature=e81533b82bb54362bdb9a24b08da2d09b79c8956

Joel Maher ( :jmaher ) (UTC -8)

Comment 55

•

10 years ago

the suite summary in perfherder will be incorrect, but the subtests like RayTrace and Richards should be useful. Right now graph server doesn't have an easy way to get those subtest results.

Kannan Vijayan [:djvj]

Assignee

Comment 56

•

10 years ago

I want to go over what I've discovered so far about this bug: 1. No luck in reproducing locally yet. I took a while to get a build of B2G ICS Emulator going. Then took a while more to get mochitests running on it. I'm still not sure if it's runnign fine because the screen is black due to a known bug in B2G that is not fixed yet. 2. I bisected my patchset. The problem gets introduced at patch 3-off-thread-compile-blocking-scripts.patch . This patch is what "turns on" the OMTC (off-main-thread compilation) machinery. Within the patch, the specific logic that directs scripts to off-thread parsing is what induces the crash. 3. Interestingly, there are 2 places where the patch directs previously synchronous script compilation to an off-main-thread compilation. The first is when we find a preloaded (speculatively loaded) script in the preload cache. The second is when we get a script through a regular network request. If either one of these is commented out, the crash goes away. It's only when both mechanisms are used that the crash shows up. I'm requesting a try machine loan again and am going to try to replicate it there. Here is a failing try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=213da61817be (In reply to Kannan Vijayan [:djvj] from comment #49) > Trying to build b2g emulator to replicate this. Running into build problems > (the B2G build instructions don't work). Installing ubuntu on a VM to try > again and see if that works. > > The B2G Emulator Mochitest chrome crash is a shutdown crash, in: > > std::auto_ptr<std::basic_streambuf<char, std::char_traits<char> > > >::~auto_ptr() > > This destructor is called on one of the static global streambuf instances > that stl uses to keep the underlying buffers for its cin/cout/cerr streams. > Something in the destructor ends up jumping to null (pc == 0x00000000 in > crash). > > I have no clue why this might be happening. Investigating.

Kannan Vijayan [:djvj]

Assignee

Updated

•

10 years ago

Keywords: leave-open

Kannan Vijayan [:djvj]

Assignee

Comment 57

•

10 years ago

In the interests of making this easier to track, I've landed all the lead-up infrastructure to off-main-thread parsing, but left off the actual usage of that parsing.

Pulsebot

Comment 58

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e7251438cac7 https://hg.mozilla.org/integration/mozilla-inbound/rev/5afb8edd05f7 https://hg.mozilla.org/integration/mozilla-inbound/rev/c467f1b2bc71 https://hg.mozilla.org/integration/mozilla-inbound/rev/49dbd2c38b9e

Ryan VanderMeulen [:RyanVM]

Comment 59

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/e7251438cac7 https://hg.mozilla.org/mozilla-central/rev/5afb8edd05f7 https://hg.mozilla.org/mozilla-central/rev/c467f1b2bc71 https://hg.mozilla.org/mozilla-central/rev/49dbd2c38b9e

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

10 years ago

Blocks: 1164539

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 60

•

10 years ago

Would it make sense to close this one and move the rest of the work to happen in a different bug. That way patches from one bug would land all in one release.

Kannan Vijayan [:djvj]

Assignee

Comment 61

•

10 years ago

Sure, I'm ok for that. The infrastructure is in now. It's just the actually-using-it part that's blocked on crash.

Kannan Vijayan [:djvj]

Assignee

Comment 62

•

10 years ago

Resolving this, moving remaining work to bug 1084009.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Firefox Bug Husbandry Bot

Comment 63

•

8 years ago

Removing leave-open keyword from resolved bugs, per :sylvestre.

Keywords: leave-open

You need to log in before you can comment on or make changes to this bug.