Closed
Bug 1181345
Opened 10 years ago
Closed 10 years ago
When accent is found, subsequent code isn't shown in debugger (Firefox Nightly)
Categories
(DevTools :: Debugger, defect)
Tracking
(firefox40 unaffected, firefox41+ fixed, firefox42+ fixed)
RESOLVED
FIXED
Firefox 42
| | Tracking | Status |
| --- | --- | --- |
| firefox40 | --- | unaffected |
| firefox41 | + | fixed |
| firefox42 | + | fixed |
People
(Reporter: contato, Assigned: sjakthol)
References
Details
(Keywords: regression)
Attachments
(3 files)
12.82 KB, image/png | Details
70 bytes, text/html | Details
6.96 KB, patch | past: review+, ritu: approval-mozilla-aurora+ | Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:42.0) Gecko/20100101 Firefox/42.0
Build ID: 20150707030206
Steps to reproduce:
Write code containing an accented character and open it in the debugger from the web developer tools.
Actual results:
The accented character and the subsequent code aren't shown.
Expected results:
The code should be displayed as it is in the standard Firefox release.
I'm able to reproduce the issue when I load the testcase locally (as file://).
Reg range:
good=2015-06-19
bad=2015-06-20
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2694ff2ace6a&tochange=c319f262ce3e
My guess is bug 1169343.
Blocks: 1169343
Status: UNCONFIRMED → NEW
status-firefox40: --- → unaffected
tracking-firefox41: --- → ?
tracking-firefox42: --- → ?
Component: Untriaged → Developer Tools: Debugger
Ever confirmed: true
Flags: needinfo?(ejpbruel)
Keywords: regression
Version: 42 Branch → 41 Branch
Comment 3 • 10 years ago
Pushlog:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=543e22ba9046&tochange=07b5f3ee0315
Regressed by: a6d0ac7916ff Sami Jaktholm — Bug 1170864 - Refactor DevToolsUtils.fetch to use NetUtil more extensively and make it fall back to OS.File if a local file cannot be read. r=past
Assignee | Comment 4 • 10 years ago
This is my fault; it's an encoding issue. If no charset information is available, the code assumes UTF-8. However, if the file is not UTF-8 (like the test case here, which is ISO-8859-1), the bytes → JS string conversion is cut short.
I have a patch but since try is closed I can't test it at the moment.
Flags: needinfo?(sjakthol)
Adding a tracking flag for FF41 and FF42 as this is a regression.
status-firefox41: --- → affected
status-firefox42: --- → affected
Assignee | Comment 6 • 10 years ago
So this is an encoding issue and it happens to be a fun one.
When we try to read a file from the disk, we can only guess its encoding[1]. If no charset hints are provided the code defaults to UTF-8. However, the attached test case file is in Windows-1252 (which is a superset of ISO-8859-1 [2]). And this is where the problem lies.
Why did this work before?
The earlier version first read the raw bytes from the file and converted them to a JavaScript string using a locale-specific default encoding[3], which is Windows-1252 for most locales. Only after that initial conversion did the code try to decode the data as UTF-8; if that failed, it just returned the initial, Windows-1252-decoded string.
So in essence the result of the conversion was the one that worked. If the file was Windows-1252, it returned the Windows-1252 decoding of the data. If it was UTF-8, it returned the UTF-8 decoding of the data. If it was Windows-1252 but did not contain any sequences that are invalid in UTF-8, it likely returned something that looked broken (we cannot really avoid this: if we don't know the encoding, we can't decode it correctly).
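As a rough sketch of that old fallback logic, using the standard TextDecoder API (the actual pre-regression code used Mozilla-internal converters, so this is only an approximation of the behavior described above):

// Illustrative sketch only, not the actual DevToolsUtils.fetch code.
// bytes is a Uint8Array read from the file.
function decodeWithFallback(bytes) {
  // First pass: legacy locale-style decoding (Windows-1252 for most locales).
  const legacy = new TextDecoder("windows-1252").decode(bytes);
  try {
    // Prefer UTF-8 if the bytes form a valid UTF-8 sequence.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Invalid UTF-8: fall back to the legacy decoding.
    return legacy;
  }
}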
Why doesn't it work anymore?
When I fixed bug 1170864 I removed the second UTF-8 conversion step and instead defaulted directly to UTF-8 in the first conversion if no encoding hints were given. The code uses NetUtil.readInputStreamToString() to do the conversion. And that seems to silently truncate the string if invalid UTF-8 byte sequences are found (filed bug 1182755).
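The truncation can be pictured roughly like this (an illustration of the observable behavior only, not NetUtil's actual implementation):

// Keep only what decodes cleanly before the first invalid UTF-8 sequence.
function decodeUntilInvalid(bytes) {
  const strict = new TextDecoder("utf-8", { fatal: true });
  let out = "";
  for (let end = 1; end <= bytes.length; end++) {
    try {
      out = strict.decode(bytes.subarray(0, end));
    } catch (e) {
      return out; // everything from the invalid byte onwards is dropped
    }
  }
  return out;
}
decodeUntilInvalid(new Uint8Array([0x64, 0xe9, 0x66])); // "d" -- the rest is lost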
The fun part:
When you encode the string "déf" with Windows-1252, you get the byte sequence \x64\xe9\x66. When you try to decode that sequence as UTF-8 you're in trouble. According to the UTF-8 rules, if a lead byte of the form 0xE? is found (the middle one, 0xE9, in the above sequence), the next two bytes must have the binary form 10xxxxxx 10xxxxxx.[4] Since the next byte is 0x66, or 01100110 in binary, the decoding fails because it is not a valid UTF-8 continuation byte.
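The same situation in code, using the standard TextDecoder API (purely illustrative; this is not the code path involved in the bug):

// "déf" encoded as Windows-1252 (and ISO-8859-1):
const bytes = new Uint8Array([0x64, 0xe9, 0x66]);

// Decoding as Windows-1252 gives the expected text:
new TextDecoder("windows-1252").decode(bytes);            // "déf"

// Strict UTF-8 decoding fails: 0xE9 announces a three-byte sequence,
// but 0x66 (01100110) is not a 10xxxxxx continuation byte.
new TextDecoder("utf-8", { fatal: true }).decode(bytes);  // throws TypeError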
To fix this problem I will add the previous conversion behavior back even though it is discouraged and deprecated (bug 1003087).
[1] https://developer.mozilla.org/en-US/docs/Reading_textual_data#Determining_the_character_encoding_of_data
[2] https://en.wikipedia.org/wiki/Windows-1252#Details
[3] https://developer.mozilla.org/en-US/docs/Web/Guide/Localizations_and_character_encodings#Details_and_browser_internals
[4] http://stackoverflow.com/a/5552623
Assignee | Comment 7 • 10 years ago
Here's a patch that restores the pre-bug 1170864 conversion behavior. More information in the commit message and the previous comment.
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=96f0fe793042
Comment 8 • 10 years ago
Comment on attachment 8632448 [details] [diff] [review]
bug-1181345-non-utf8-files.patch
Review of attachment 8632448 [details] [diff] [review]:
-----------------------------------------------------------------
Splendid job, as always, Sami!
Attachment #8632448 - Flags: review?(past) → review+
Assignee | Updated • 10 years ago
Keywords: checkin-needed
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 42
Assignee | Comment 11 • 10 years ago
Comment on attachment 8632448 [details] [diff] [review]
bug-1181345-non-utf8-files.patch
Approval Request Comment
[Feature/regressing bug #]: Bug 1170864.
[User impact if declined]: Debugger shows truncated sources if the files are not UTF-8 encoded and contain invalid UTF-8 byte sequences such as accents in Latin-1 (ISO 8859-1) encoded files.
[Describe test coverage new/current, TreeHerder]: Unit tests for the basic cases still pass, and this patch adds tests for this bug. The tests for tools that rely on the modified utility method (mostly the Debugger and the Style Editor) also still pass.
[Risks and why]: Minimal, as the patch only restores the pre-bug 1170864 behavior for charset conversion. In the worst-case scenario the debugger won't load sources or will show garbled text due to a charset conversion failure.
[String/UUID change made/needed]: N/A.
Attachment #8632448 - Flags: approval-mozilla-aurora?
Comment 12 • 10 years ago
Comment on attachment 8632448 [details] [diff] [review]
bug-1181345-non-utf8-files.patch
The patch adds unit tests covering this scenario, which is great. Let's uplift to Aurora.
Attachment #8632448 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 13 • 10 years ago
Flags: in-testsuite+
Comment 14 • 8 years ago
(In reply to Sami Jaktholm from comment #6)
> Why this worked before?
> The earlier version first read the raw bytes from a file and the bytes were
> converted to a JavaScript string according to a locale specific default
> encoding[3] which is Windows-1252 for most locales. Only after the initial
> conversion the code tried to decode the data as UTF-8 but if that failed it
> just returned the initial, Windows-1252 decoded, string.
It seems to me that there's nothing in the code tying the behavior to the locale. Instead, the code seems to rely on bytes getting zero-extended to UTF-16 code units, which happens to match ISO-8859-1 to UTF-16 conversion.
Am I missing something?
Flags: needinfo?(sjakthol)
Comment 15 • 8 years ago
More importantly:
Why do our dev tools try to support creating new non-UTF-8 content? (Presumably, local files are stuff you are developing yourself and not someone else's code you are debugging to figure out a Web compat problem.)
Assignee | Comment 16 • 8 years ago
Sorry, I can't remember any extra details that are not already written down in this bug.
But it could very well be that my analysis at the time was flawed. I was not, and still am not, an expert in the area of string encodings.
Flags: needinfo?(sjakthol)
Comment 17 • 8 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #14)
> It seems to me that there's nothing in the code tying the behavior to the
> locale. Instead, it seems to me that the code is relying on bytes getting
> zero-extended to UTF-16 code units, which happens to match ISO-8859-1 to
> UTF-16 conversion.
I managed to track down what I changed to regress the test case for this bug, and, indeed, this has nothing to do with the locale. If the input isn't valid UTF-8, the bytes are zero-extended to UTF-16 code units (which happens to resemble the locale-specific legacy conversion for some locales, including en-US, which our tests run as).
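To make the zero-extension concrete (illustration only, not code from the patch): treating each byte directly as a UTF-16 code unit is exactly the ISO-8859-1 → UTF-16 mapping, which is why Latin-1 files appeared to decode correctly.

// Zero-extending bytes to UTF-16 code units, e.g. via String.fromCharCode,
// is equivalent to decoding the bytes as ISO-8859-1 (Latin-1):
const bytes = [0x64, 0xe9, 0x66];   // "déf" in Windows-1252 / Latin-1
String.fromCharCode(...bytes);      // "déf" -- 0xE9 maps to U+00E9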
Updated • 7 years ago
Product: Firefox → DevTools