HTML parser should supply column numbers for <script> elements
Categories
(Core :: DOM: HTML Parser, defect, P2)
Tracking
()
People
(Reporter: loganfsmyth, Assigned: hsivonen)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [debugger-mvp])
Attachments
(3 files, 1 obsolete file)
This was spawned from https://bugzilla.mozilla.org/show_bug.cgi?id=1538056 but I'm going to file it separately since there is at least some work that can be done there on its own without this.
In 67, the JS debugger will begin treating breakpoints as line/column positions, where traditionally they have been treated as line-based. One of the outstanding issues is that while that is fine for
<script>
console.log("foo");
</script>
where a breakpoint might be specifically at column 2 (the c
in console.log
)
we run into issues with
<script>console.log("foo");</script>
because the position of c
, as far as the engine knows is 0
, even though visually it is column 8
, which causes our breakpoint markers to render in the wrong place.
While hand-written markup usually has that newline and doesn't tend to encounter this, we see it pop up in contexts where the inline code has been minified, which can be quite frustrating for users since debugging minified code is already tough and this will make it even tougher. Things can be especially frustrating if there are multiple <script> elements on the same line, because then it isn't just that the breakpoints are rendered off by 8 characters, but also that there could in fact be multiple sets of breakpoints overlapping.
The nsIScriptElement
type already has a SetScriptColumnNumber
, which is set by the XML parser, but not the HTML parser, so that would be the objective of this bug.
The XML case also doesn't work, so I'm going to co-opt https://bugzilla.mozilla.org/show_bug.cgi?id=1538056 to address that (we just never used the column from nsIScriptElement
at all right now). Then once this lands we should be all set.
Assignee | ||
Comment 1•6 years ago
|
||
This patch is untested beyond checking that it compiles. This is the part that is affected by Java to C++ translation. The rest should be buildable on top of this in non-generated C++.
Please check the impact on the innerHTML
setter early (with text input that has no tags).
Reporter | ||
Comment 2•6 years ago
|
||
One issue here is that this is a code-unit column count, but SpiderMonkey's column values expect a code-point count, so it's not just an char16_t indexed counter that we need, but a codepoint count.
The alternate approach I've thought about is storing the start offset of the current line start, and in the rare case where getColumnNumber
is actually called, it could run through the buffer between that location and the current location, calculating the codepoint offset. That has the downside of the tokenizer needing to be aware of the current buffer and position, which are currently scoped to only the stateLoop, but it would potentially have less of a performance effect since checkChar
would have to do less per character.
Thoughts?
Updated•6 years ago
|
Assignee | ||
Comment 3•6 years ago
|
||
(In reply to Logan Smyth [:loganfsmyth] from comment #2)
One issue here is that this is a code-unit column count, but SpiderMonkey's column values expect a code-point count, so it's not just an char16_t indexed counter that we need, but a codepoint count.
:-( Code point counts instead of code unit counts are generally bad when identifying a position in a text buffer, but I guess these are supposed to "make sense" independently of buffer encoding in messages.
Since line numbers in document.written scripts are hopeless anyway and innerHTML
-inserted scripts don't run, you can rely on data from the network being guaranteed to be valid UTF-16, so you can simply make checkChar
not increment the column upon encountering a low surrogate.
The alternate approach I've thought about is storing the start offset of the current line start, and in the rare case where
getColumnNumber
is actually called, it could run through the buffer between that location and the current location, calculating the codepoint offset. That has the downside of the tokenizer needing to be aware of the current buffer and position, which are currently scoped to only the stateLoop, but it would potentially have less of a performance effect sincecheckChar
would have to do less per character.
Since the tokenizer tokenizes a sequence of buffers such that a boundary can be anywhere, dealing with syncing things on buffer boundary is potentially complicated and/or error-prone, so I suggest seeing what the perf impact of the simple solution is first.
Which reminds me that my untested patch doesn't handle restoring the column number when rewinding the input stream after a failed speculation. I'll post a new patch.
Assignee | ||
Comment 4•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•3 years ago
|
Comment 5•3 years ago
|
||
Comment 6•3 years ago
|
||
So I applied patch from Comment 4 and rebased + fixed compilation errors I was seeing, and this seems to give us the information we need.
I added a patch on top of this one to actually consume the tokenizer information and set the column on inline script load context (see https://phabricator.services.mozilla.com/D170580 , for Bug 1815937)
I pushed to TRY with mach try auto: https://treeherder.mozilla.org/jobs?repo=try&revision=cda6d06715d8418ee608012a86427889f9c990f1
Aside from the devtools failures, which we'll take care of, it looks like only one other test fails: parser/htmlparser/tests/mochitest/test_bug672453.html , but I'm not sure which message is missing and why
Henri, do you think you could take ownership of this patch, check that I didn't do anything stupid when rebasing and ask review to a peer?
Assignee | ||
Comment 7•3 years ago
|
||
I read the rebased patch, and it looks OK.
smaug, the main concern here is if this affects innerHTML
performance. Can you suggest a benchmark that should be used to check whether this is harmless enough for innerHTML
perf?
Comment 9•3 years ago
|
||
TRY push of speedometer with attached patch: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&newProject=try&newRevision=60d0dd9d309ecbae659042c7e4c9453575709c57&framework=13&originalRevision=574e19c9f0db5a543f7ca0b1cb0263c69df7045c&page=1
It's still ongoing, but there seems to be a (small) impact on jQuery-TodoMVC/Adding100Items
, which smaug said we should pay attention to:
Assignee | ||
Comment 10•3 years ago
|
||
nchevobbe, does changing
inline char16_t checkChar(char16_t* buf, int32_t pos) {
char16_t c = buf[pos];
if (nextCharOnNewLine) {
line++;
col = 1;
nextCharOnNewLine = false;
} else if (!(c >= 0xDC00 && c <= 0xDFFF)) {
// SpiderMonkey wants to count scalar values
// instead of UTF-16 code units.
col++;
}
return c;
}
to
inline char16_t checkChar(char16_t* buf, int32_t pos) {
char16_t c = buf[pos];
if (MOZ_UNLIKELY(nextCharOnNewLine)) {
line++;
col = 1;
nextCharOnNewLine = false;
} else if (MOZ_LIKELY(!NS_IS_LOW_SURROGATE(c))) {
// SpiderMonkey wants to count scalar values
// instead of UTF-16 code units.
col++;
}
return c;
}
improve things?
Comment 11•3 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #10)
nchevobbe, does changing
[…]
improve things?
I feel like I don't see the same regression as before, but also, it looks like the results don't have a super high confidence.
Anyhow, it doesn't seem like there would be a very obvious regression, so it sounds okay?
Comment 12•3 years ago
|
||
Talking with sparky on Element, it seems like overall, there is no regression on the speedometer score overall
Ok so it isn't so bad, but you got unlucky with a single outlier data point: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&highlightedRevisions=2828aaa0c115&highlightedRevisions=0e0670eaee59&series=try,3448328,1,13&timerange=86400&zoom=1677840218136,1677843859926,122.8025877384645,161.669612285737
The same thing seems to have happened on linux: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&highlightedRevisions=2828aaa0c115&highlightedRevisions=0e0670eaee59&series=try,3445603,1,13&timerange=86400&zoom=1677839357024,1677845474908,85.9995250547984,125.63516599693689
On mac there might be an improvement to the score if you ignore the outlier: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&highlightedRevisions=2828aaa0c115&highlightedRevisions=0e0670eaee59&series=try,3448328,1,13&timerange=86400&zoom=1677840245133,1677843800247,127.58475711769015,136.84425414218742
Assignee | ||
Comment 13•3 years ago
|
||
smaug, are you OK with landing this with the refinement from comment 10?
Assignee | ||
Updated•3 years ago
|
Comment 14•3 years ago
•
|
||
I think I'm ok, but if we see some regression after all, it may need to get backed out.
With perf tests one need to run them quite a few times to get reliable numbers.
Assignee | ||
Comment 15•3 years ago
|
||
nchevobbe, I suggest updating the patch with the change from comment 10 and requesting review from smaug. (It seems inappropriate for me to review, since most of the code in the patch comes from me.)
Once the patch is on m-c, I can then reconstruct the appropriate changeset for the htmlparser repo on GitHub.
Comment 16•3 years ago
|
||
If you profile microbenchmark like http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html, does the new code show up in the profile? Might be useful to change that test to use a bit longer string for innerHTML.
Comment 17•3 years ago
|
||
Here's the test page modified
The only thing I changed is that
// in loop
document.getElementById("speed_dom").innerHTML =
"<b><span> Testing...</span> (" + i + "/1000)</b>"
into
const suffix = "<i></i>".repeat(100);
// ...
// in loop
document.getElementById("speed_dom").innerHTML =
"<b><span> Testing...</span> (" + i + "/1000)</b>" + suffix
Here's a base profile from latest central: https://share.firefox.dev/3JfgPE4
And a profile with henri's latest patch: https://share.firefox.dev/3EWsN2D
I don't see too much differences in set Element.innerHTML
timings, but I'm not sure I'm able to analyze those graphs accurately
Updated•3 years ago
|
Comment 18•3 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #15)
nchevobbe, I suggest updating the patch with the change from comment 10 and requesting review from smaug. (It seems inappropriate for me to review, since most of the code in the patch comes from me.)
Once the patch is on m-c, I can then reconstruct the appropriate changeset for the htmlparser repo on GitHub.
Patch queue rebased and pushed on phab https://phabricator.services.mozilla.com/D170579
Updated•3 years ago
|
Comment 19•3 years ago
|
||
Comment 20•3 years ago
|
||
Backed out changeset 1b5481ac73dc (Bug 1552008) for mochitest failures on test_bug672453.html.
Backout link
Push with failures <--> 1
Failure Log
Comment 21•3 years ago
|
||
looks like the failure I was talking about in https://bugzilla.mozilla.org/show_bug.cgi?id=1552008#c6
Comment 22•3 years ago
|
||
This does fix the issue
diff --git a/parser/htmlparser/tests/mochitest/test_bug672453.html b/parser/htmlparser/tests/mochitest/test_bug672453.html
--- a/parser/htmlparser/tests/mochitest/test_bug672453.html
+++ b/parser/htmlparser/tests/mochitest/test_bug672453.html
@@ -104,7 +104,7 @@ var expectedErrors = [
isWarning: true },
{ errorMessage: "The start of the document was reparsed, because there were non-ASCII characters in the part of the document that was unsuccessfully searched for a meta tag before falling back to the XML declaration syntax. A meta tag at the start of the head part should be used instead of the XML declaration syntax.",
sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_xml_speculation_fail.html",
- lineNumber: 11,
+ lineNumber: 10,
isWarning: true },
];
line 10 points to the closing html tag https://searchfox.org/mozilla-central/rev/cd2121e7d83af1b421c95e8c923db70e692dab5f/parser/htmlparser/tests/mochitest/file_bug672453_xml_speculation_fail.html#10
Assignee | ||
Comment 23•3 years ago
|
||
(In reply to Nicolas Chevobbe [:nchevobbe] from comment #22)
This does fix the issue
Thanks. Relanding queued with this tweak. Sorry about the delay.
Comment 24•3 years ago
|
||
Comment 25•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Assignee | ||
Comment 26•3 years ago
|
||
Description
•