Closed Bug 1848025 Opened 2 years ago Closed 2 years ago

1.78% (2 MB) apk-size-fenixDebug (Android) regression on Mon August 7 2023

Tracking

(firefox116 unaffected, firefox117 unaffected, firefox118+ wontfix)

Status:

RESOLVED WONTFIX

Tracking Flags:

Tracking

Status

firefox116

---

unaffected

firefox117

---

unaffected

firefox118

wontfix

People

(Reporter: cpeterson, Assigned: m_kato)

References

(Regression)

Details

(Keywords: perf-alert, regression)

Chris Peterson [:cpeterson]

Reporter

Description

•

2 years ago

Perfherder has detected a build_metrics performance regression from push aa7f3681b1656ab5ba5b76adcc919fe16f412379. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio	Test	Platform	Options	Absolute values (old vs new)
2%	apk-size-fenixDebug	fenix-android-all		119,559,411.17 -> 121,683,661.75

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Chris Peterson [:cpeterson]

Reporter

Comment 1

•

2 years ago

This APK size regression was caused by new ICU4x data added in bug 1719535.

Severity: -- → S2

Component: Performance → General

Product: Testing → GeckoView

Regressed by: 1719535

Donal Meehan [:dmeehan]

Updated

•

2 years ago

tracking-firefox118: --- → +

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

2 years ago

:m_kato, since you are the author of the regressor, bug 1719535, could you take a look?

For more information, please visit BugBot documentation.

Flags: needinfo?(m_kato)

arthur200126

Comment 3

•

2 years ago

The C-FFI layer may be introducing unneeded data. If I am reading the ICU4x segmenter crate, specifically line.rs, correctly:

ICU4XLineSegmenter_create_auto calls down to ffi::ICU4XLineSegmenter_::create_auto
that calls down to LineSegmenter::new_auto(), try_new_auto_with_buffer_provider, and try_new_auto_with_buffer_provider
new_auto() should only call down to the LSTM stuff, ~70 KB segment_line_v1.data.rs
The two macro provider things don't seem to ultimately go down to any data.

I guess turning off the compiled_data feature on create segmenter can help -- if it's not off yet. That still could bring in unneeded word-level segmentation data though.

Makoto Kato [:m_kato]

Assignee

Comment 4

•

2 years ago

•

Edited

By bug 1847958, APK size will be reduced on Nigtly before landing ICU4X. Most incremented data is CJ (Chinese and Japanese). ICU4X team is considering/investigating to move dictionary based implementation to other approaches such as ML.

Assignee: nobody → m_kato

Flags: needinfo?(m_kato)

Makoto Kato [:m_kato]

Assignee

Updated

•

2 years ago

Duplicate of this bug: 1848091

Shane Carr

Comment 6

•

2 years ago

Hi, I'm from the ICU4X team. I know from previous benchmarks that the whole LSTM-based line segmentation component is about 378 KB, code and data together. Here is an example: https://github.com/unicode-org/icu4x/pull/3866

If you need word segmentation on top of line segmentation, the Chinese/Japanese word segmentation dictionary in ICU4X is identical to the one in ICU4C. It is about 2 MB uncompressed and 1.5 MB gzipped. That is big, but it should not be bigger than the ICU4C equivalent. We're happy to mentor anyone who wants to implement an ML-based solution for CJ word segmentation, like we did for Southeast Asian in the LSTM. There are already some early design proposals written.

Chris Peterson [:cpeterson]

Reporter

Updated

•

2 years ago

Comment 7

•

2 years ago

Thanks for the reference points Shane. Since no references to word segmentation has been added as far as I can tell, the larger-than-expected (378 K -> 2 MB) growth must be because of something else having been added. Could be dependencies, could be inadvertent introduction of unused code & data (is LTO having a good enough peek through the FFI?).

Makoto Kato [:m_kato]

Assignee

Comment 8

•

2 years ago

ICU4X datagen won't generate more custom filter. If locale is th and ja, and data is LSTM and dictionary, th data will be generated for LSTM and dictionary. So I think that I will patch to datagen in update_icu4x.sh script when generating data.

Makoto Kato [:m_kato]

Assignee

Comment 9

•

2 years ago

Ah, I have already remove dictionaries for EA.

Makoto Kato [:m_kato]

Assignee

Comment 10

•

2 years ago

It is necessary change to support Chinese and Japanese word segmenter since this is implemented by dictionary. Before landing this, we cannot handle these languages.
ICU4X have a future plan to move CJ dictionary to ML-based data to reduce data size etc.
We already move EA (th, lo, my and km) languages from the dictionary based implementation to LSTM's to reduce data size.
I have fixed other APK size issue by bug 1847958 for Nightly.
compiled_data feature (comment #3) is new feature from ICU4X 1.3. But current stable is 1.2. Even if upgrading to 1.3, it seems to case other problem that is cargo vendor bug. (I will investigate and file this issue)

So this is wontfix.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → WONTFIX

Donal Meehan [:dmeehan]

Updated

•

2 years ago

status-firefox118: affected → wontfix

You need to log in before you can comment on or make changes to this bug.

Bugzilla

1.78% (2 MB) apk-size-fenixDebug (Android) regression on Mon August 7 2023

Categories

(GeckoView :: General, defect)

Tracking

(firefox116 unaffected, firefox117 unaffected, firefox118+ wontfix)

People

(Reporter: cpeterson, Assigned: m_kato)

References

(Regression)

Details

(Keywords: perf-alert, regression)

Crash Data

Security

(public)

User Story

Description

Regressions:

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Updated

Comment 6

Updated

Comment 7

Comment 8

Comment 9

Comment 10

Updated