1.78% (2 MB) apk-size-fenixDebug (Android) regression on Mon August 7 2023
Categories
(GeckoView :: General, defect)
Tracking
(firefox116 unaffected, firefox117 unaffected, firefox118+ wontfix)
| Tracking | Status | |
|---|---|---|
| firefox116 | --- | unaffected |
| firefox117 | --- | unaffected |
| firefox118 | + | wontfix |
People
(Reporter: cpeterson, Assigned: m_kato)
References
(Regression)
Details
(Keywords: perf-alert, regression)
Perfherder has detected a build_metrics performance regression from push aa7f3681b1656ab5ba5b76adcc919fe16f412379. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 2% | apk-size-fenixDebug | fenix-android-all | 119,559,411.17 -> 121,683,661.75 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
| Reporter | ||
Comment 1•2 years ago
|
||
This APK size regression was caused by new ICU4x data added in bug 1719535.
Updated•2 years ago
|
Comment 2•2 years ago
|
||
:m_kato, since you are the author of the regressor, bug 1719535, could you take a look?
For more information, please visit BugBot documentation.
Comment 3•2 years ago
|
||
The C-FFI layer may be introducing unneeded data. If I am reading the ICU4x segmenter crate, specifically line.rs, correctly:
ICU4XLineSegmenter_create_autocalls down toffi::ICU4XLineSegmenter_::create_auto- that calls down to
LineSegmenter::new_auto(),try_new_auto_with_buffer_provider, andtry_new_auto_with_buffer_provider new_auto()should only call down to the LSTM stuff, ~70 KBsegment_line_v1.data.rs- The two macro provider things don't seem to ultimately go down to any data.
I guess turning off the compiled_data feature on create segmenter can help -- if it's not off yet. That still could bring in unneeded word-level segmentation data though.
| Assignee | ||
Comment 4•2 years ago
•
|
||
By bug 1847958, APK size will be reduced on Nigtly before landing ICU4X. Most incremented data is CJ (Chinese and Japanese). ICU4X team is considering/investigating to move dictionary based implementation to other approaches such as ML.
Comment 6•2 years ago
|
||
Hi, I'm from the ICU4X team. I know from previous benchmarks that the whole LSTM-based line segmentation component is about 378 KB, code and data together. Here is an example: https://github.com/unicode-org/icu4x/pull/3866
If you need word segmentation on top of line segmentation, the Chinese/Japanese word segmentation dictionary in ICU4X is identical to the one in ICU4C. It is about 2 MB uncompressed and 1.5 MB gzipped. That is big, but it should not be bigger than the ICU4C equivalent. We're happy to mentor anyone who wants to implement an ML-based solution for CJ word segmentation, like we did for Southeast Asian in the LSTM. There are already some early design proposals written.
Comment 7•2 years ago
|
||
Thanks for the reference points Shane. Since no references to word segmentation has been added as far as I can tell, the larger-than-expected (378 K -> 2 MB) growth must be because of something else having been added. Could be dependencies, could be inadvertent introduction of unused code & data (is LTO having a good enough peek through the FFI?).
| Assignee | ||
Comment 8•2 years ago
|
||
ICU4X datagen won't generate more custom filter. If locale is th and ja, and data is LSTM and dictionary, th data will be generated for LSTM and dictionary. So I think that I will patch to datagen in update_icu4x.sh script when generating data.
| Assignee | ||
Comment 9•2 years ago
|
||
Ah, I have already remove dictionaries for EA.
| Assignee | ||
Comment 10•2 years ago
|
||
- It is necessary change to support Chinese and Japanese word segmenter since this is implemented by dictionary. Before landing this, we cannot handle these languages.
- ICU4X have a future plan to move CJ dictionary to ML-based data to reduce data size etc.
- We already move EA (th, lo, my and km) languages from the dictionary based implementation to LSTM's to reduce data size.
- I have fixed other APK size issue by bug 1847958 for Nightly.
compiled_datafeature (comment #3) is new feature from ICU4X 1.3. But current stable is 1.2. Even if upgrading to 1.3, it seems to case other problem that iscargo vendorbug. (I will investigate and file this issue)
So this is wontfix.
Updated•2 years ago
|
Description
•