Closed Bug 1233642 (opened 10 years ago, closed 9 years ago)

Self-host Array.prototype.concat.

Categories: Core :: JavaScript Engine, defect
Status: RESOLVED FIXED
Target Milestone: mozilla48
People: Reporter: arai; Assigned: arai

Attachments (3 files, 26 obsolete files):
- 3.12 KB, patch, arai: review+ (Details | Diff | Splinter Review)
- 40.16 KB, patch, efaust: review+ (Details | Diff | Splinter Review)
- 1.47 KB, patch, bholley: review+ (Details | Diff | Splinter Review)
Derived from bug 1165052.
Before supporting @@species in Array.prototype.concat, it would be better to self-host it.
With a simple perf test (bug 1165052 comment #25), a straightforward implementation seems to be fast enough.
Comment 1 • 10 years ago (Assignee)

Tested a WIP patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=231cb879bd41
Only ecma_3/Array/regress-322135-02.js fails, with a timeout. However, the test itself behaves somewhat strangely even in m-c.
https://dxr.mozilla.org/mozilla-central/source/js/src/tests/ecma_3/Array/regress-322135-02.js
> var length = 4294967295;
> var array1 = new Array(length);
> var array2 = ['Kibo'];
> var array;
>
> try
> {
> array = array1.concat(array2);
> }
> catch(ex)
> {
> printStatus(ex.name + ': ' + ex.message);
> }
|array| there becomes a 0-length Array without throwing any exception, probably because of uint32_t overflow. I think it should throw |RangeError: invalid array length| or something like that.
Anyway, is it required that this code run quickly?
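The 0-length result is consistent with the combined length wrapping around in uint32_t arithmetic. JavaScript's `>>>` operator (which applies ToUint32) can illustrate the suspected overflow; this is a sketch of the arithmetic, not the engine's actual code path:

```javascript
// The combined length 4294967295 + 1 does not fit in a uint32_t.
// ">>> 0" applies ToUint32, modeling the suspected C++ truncation.
const maxLength = 4294967295;     // 2**32 - 1, the maximum Array length
const combined = maxLength + 1;   // 4294967296 as a JS double
console.log(combined >>> 0);      // 0: wrapped around, matching the 0-length result
```

Under this reading, the native code computes the new length modulo 2**32, so 4294967295 + 1 silently becomes 0 instead of raising a RangeError.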
Comment 2 • 10 years ago (Assignee)

Here's the benchmark result:
          | before  | concat only | concat+slice
----------+---------+-------------+--------------
SunSpider | 164.5ms | 154.6ms     | 154.7ms
Kraken    | 897.3ms | 898.7ms     | 966.0ms
JetStream | 180.28  | 181.57      | 181.95
Octane    | 29061   | 27882       | 31316
There is some regression in Kraken (concat+slice) and Octane (concat only), and some improvement in SunSpider (concat only) and Octane (concat+slice).
Comment 3 • 10 years ago (Assignee)

So, about the regress-322135-02.js case: it uses the following branch
https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2633
> if (isArray && !ObjectMayHaveExtraIndexedProperties(aobj)) {
> if (!GetLengthProperty(cx, aobj, &length))
> return false;
>
> size_t initlen = GetAnyBoxedOrUnboxedInitializedLength(aobj);
> narr = NewFullyAllocatedArrayTryReuseGroup(cx, aobj, initlen);
> if (!narr)
> return false;
> CopyAnyBoxedOrUnboxedDenseElements(cx, narr, aobj, 0, 0, initlen);
> SetAnyBoxedOrUnboxedArrayLength(cx, narr, length);
>
> args.rval().setObject(*narr);
> ...
with initlen = 0, length = 4294967295, so it skips iterating |this| value's elements.
https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2650
> if (length == initlen) {
> while (argc) {
> ...
> for (size_t i = 0; i < initlen; i++) {
> ...
> }
> ...
> }
> }
This case could be optimized while keeping the current behavior, even in self-hosted code, but it might need a somewhat complicated optimization path once we implement @@species.
Also, I'm not sure if this case really matters on the real web. Does anyone know of an actual use case with similar code?
Anyway, I'll try implementing it and see how hard it is :)
Comment 4 • 10 years ago (Assignee)

Oh, I misunderstood the code.
The |if (length == initlen)| check is used for iterating the arguments, not |this|.
The |this| elements are copied by |CopyAnyBoxedOrUnboxedDenseElements|.
Comment 5 • 10 years ago (Assignee)

I copied the CopyAnyBoxedOrUnboxedDenseElements optimization from the native array_concat to the self-hosted concat, but pure-JS code is faster when the Array length fits into int32_t.
Also, if the Array length doesn't fit into int32_t, it gets extremely slow; that's the reason for the TIMEOUT in comment #1.
Filed bug 1234947 for it.
If that case is unusual in ordinary web apps, maybe we could just disable the testcase and not apply the CopyAnyBoxedOrUnboxedDenseElements optimization?
Do you know if such a case exists?
Flags: needinfo?(efaustbmo)
Comment 6 • 10 years ago (Assignee)

As suggested on IRC, redirecting the ni?
Is there any known case in the wild where we should perform Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly with an Array whose length doesn't fit into int32_t?
Flags: needinfo?(luke)
Flags: needinfo?(jdemooij)
Flags: needinfo?(efaustbmo)
Comment 7 • 10 years ago
I'm not aware of any, but I haven't followed Array stuff much.
Flags: needinfo?(luke)
Comment 8 • 10 years ago
(In reply to Tooru Fujisawa [:arai] from comment #6)
> Is there any known case in wild that we should perform
> Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly
> with an Array that its length doesn't fit into int32_t?
I'm not aware of such cases and I don't expect it to be common. Sparse arrays definitely show up in the wild (see bug 1088189 and bug 1087963 for instance), but I think that's mostly orthogonal and most of these should still have an int32 length.
Flags: needinfo?(jdemooij)
Comment 9 • 10 years ago (Assignee)

Thanks :)
I'll focus on the other cases then.
Performance on sparse arrays is fast enough, I think.
Short dense arrays may need some more optimization.
Comment 10 • 10 years ago (Assignee)

This is from bug 1165052.
Attachment #8707459 - Flags: review?(efaustbmo)
Comment 11 • 10 years ago (Assignee)

This implements the general case of Array#concat, and removes the native one, including the Ion code.
Attachment #8707460 - Flags: review?(efaustbmo)
Comment 12 • 10 years ago (Assignee)

This improves performance with a single short array.
Will post a perf comparison.
Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc18dc840676
Attachment #8707465 - Flags: review?(efaustbmo)
Comment 13 • 10 years ago (Assignee)

Updated • 10 years ago

Attachment #8707459 - Flags: review?(efaustbmo) → review+
Comment 14 • 10 years ago
I must be confused or misreading the graphs, or something. Do we understand why these patches make small dense array concats much slower? I imagine that length 10 must be more common than length 2**10, so...
Flags: needinfo?(arai.unmht)
Comment 15 • 10 years ago (Assignee)

Thank you for reviewing :)
Here's the characteristic curve of the time taken by 50000 iterations of |A.concat(A).length| against the length of the dense array.
In the attached graph, current m-c (before) is faster than the patched builds in the [0, 63] and [208, 1023] ranges, and the patched performance gets much worse at length >= 1024.
Part 1+2 is faster than before in the [64, 192] range.
Part 1+2+3 is faster than before in the [1024-] range; it's also faster than Part 1+2 in the [0, 7], [16, 58], and [512-] ranges. So applying Part 3 introduces both performance improvements and regressions relative to Part 1+2.
Will look into the reason for the spikes at length = 8, 64, 256, 512, etc.
Comment 16 • 10 years ago (Assignee)

So, the spike comes from the std_Array(len1 + len2) allocation.
The graph is the result of the following stub function:
function ArrayConcat(arg1) {
    var len1 = ToLength(this.length);
    var len2 = ToLength(arg1.length);
    // Step 2 (reordered).
    var A = std_Array(len1 + len2);
    return A;
}
There seems to be a difference in allocation logic between the original native concat and std_Array. Will look into it.
Flags: needinfo?(arai.unmht)
Comment 17 • 10 years ago

Note that the spike with > ~7 elements is probably because the elements no longer fit inline in the object, so we need either malloc or nursery-allocated elements. Wild guess: maybe Ion does not have a fast path to allocate elements in the nursery when we do Array(x) and x is not a constant?
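A micro-benchmark in the spirit of this discussion, straddling the suspected inline-slots boundary (the ~7-element threshold and the allocation-path explanation are hypotheses from the comment above, not something this sketch can verify by itself):

```javascript
// Times A.concat(A) for a given dense-array length. Lengths 7 and 8 straddle
// the point where the result (14 vs. 16 elements) is suspected to stop
// fitting in the object's inline slots.
function time(len, iters) {
  const A = new Array(len).fill(1);
  let result = 0;
  const start = Date.now();
  for (let i = 0; i < iters; i++)
    result = A.concat(A).length;
  return { ms: Date.now() - start, length: result };
}
console.log(time(7, 50000)); // result length 14
console.log(time(8, 50000)); // result length 16, past the suspected boundary
```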
Comment 18 • 10 years ago (Assignee)

Thanks :)
Yeah, N=7 (len1+len2 == 14) and N=8 (len1+len2 == 16) take different branches in the following condition:
https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/js/src/jit/CodeGenerator.cpp#4803
> void
> CodeGenerator::visitNewArrayDynamicLength(LNewArrayDynamicLength* lir)
> {
> ...
> if (templateObject->as<ArrayObject>().hasFixedElements()) {
> size_t numSlots = gc::GetGCKindSlots(templateObject->asTenured().getAllocKind());
> inlineLength = numSlots - ObjectElements::VALUES_PER_HEADER;
> } else {
> canInline = false;
> }
Will investigate how the native concat does this quickly.
Comment 19 • 10 years ago (Assignee)

Added an intrinsic for NewFullyAllocatedArrayTryReuseGroup, and called it instead of std_Array.
It removes the spike at N=8 (length=16) and draws almost the same curve as "before" in the N<1024 cases, but the spike at N=64 (length=128) is still large, bigger than in Part 2's case.
Attachment #8715135 - Attachment is obsolete: true
Comment 20 • 10 years ago (Assignee)

Here's the patch for the intrinsic.
Comment 21 • 10 years ago (Assignee)

Attachment #8707460 - Attachment is obsolete: true
Attachment #8707460 - Flags: review?(efaustbmo)
Comment 22 • 10 years ago (Assignee)

Attachment #8707465 - Attachment is obsolete: true
Attachment #8707465 - Flags: review?(efaustbmo)
Comment 23 • 10 years ago (Assignee)

Attachment #8715373 - Attachment is obsolete: true
Comment 24 • 10 years ago (Assignee)

Here's the current performance comparison on Octane between m-c and Parts 2, 3, 4, average of 10 runs.
There is at most a 5% improvement in SplayLatency, but the other cases have <2% improvement/regression, depending on the testcase.
Will check other benchmarks.
Comment 25 • 10 years ago (Assignee)

In Kraken, stanford-crypto-pbkdf2 regresses about 20-25% in all cases.
stanford-crypto-ccm regresses about 15% in the 1+2+3 and 1+2+3+4 cases.
stanford-crypto-sha256-iterative and json-stringify-tinderbox regress about 5% in the 1+2+3+4 case.
Comment 26 • 10 years ago (Assignee)

With SunSpider, again, all cases improve/regress by about 5%, depending on the testcase.
Comment 27 • 10 years ago (Assignee)

Calculated the characteristic for 2 cases:
1. A.concat(A), where the length of A is variable.
2. A.concat(B), where the length of A is variable and the length of B is fixed.
Attachment #8715372 - Attachment is obsolete: true
Updated • 10 years ago (Assignee)

Attachment #8703112 - Attachment is obsolete: true

Updated • 10 years ago (Assignee)

Attachment #8707466 - Attachment is obsolete: true
Comment 28 • 10 years ago (Assignee)

Now I'm thinking that using only Part 2 might be the best way, as Part 3 and Part 4 add extra code and complexity, while the performance improvements in benchmarks are not that big :/
Comment 29 • 10 years ago (Assignee)

efaust, can I have your opinion on what we should do here?
Do you think Part 3 and Part 4 (or maybe some other optimization?) should be applied?
Flags: needinfo?(efaustbmo)
Comment 30 • 10 years ago (Assignee)

In stanford-crypto-pbkdf2 and stanford-crypto-ccm, concat is called with only 1 argument every time:
stanford-crypto-pbkdf2
called | this.length | arguments[0].length
----------+-------------+---------------------
32768 | 0 | 8
32760 | 8 | 1
16 | 0 | 16
8 | 3 | 1
8 | 2 | 1
8 | 0 | 3
stanford-crypto-ccm
called | this.length | arguments[0].length
----------+-------------+---------------------
2254 | 1 | 4
1056 | 3 | 3
944 | 4 | 3
11 | 28 | 4
9 | 44 | 4
8 | 8 | 4
8 | 4 | 4
8 | 36 | 4
8 | 34 | 4
8 | 24 | 4
8 | 0 | 4
7 | 47 | 4
7 | 29 | 4
7 | 25 | 4
7 | 2 | 4
6 | 9 | 4
6 | 6 | 4
6 | 46 | 4
6 | 45 | 4
6 | 41 | 4
6 | 39 | 4
6 | 23 | 4
5 | 7 | 4
5 | 49 | 4
5 | 21 | 4
5 | 18 | 4
5 | 17 | 4
5 | 12 | 4
4 | 5 | 4
4 | 48 | 4
4 | 38 | 4
4 | 37 | 4
4 | 35 | 4
4 | 33 | 4
4 | 27 | 4
4 | 22 | 4
4 | 20 | 4
4 | 14 | 4
4 | 11 | 4
4 | 10 | 4
3 | 43 | 4
3 | 42 | 4
3 | 40 | 4
3 | 31 | 4
3 | 30 | 4
3 | 3 | 4
3 | 19 | 4
3 | 16 | 4
2 | 15 | 4
1 | 32 | 4
1 | 13 | 4
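Call statistics like the tables above can be gathered by temporarily wrapping concat; this is a hypothetical instrumentation sketch, not necessarily how these numbers were collected:

```javascript
// Wrap Array.prototype.concat to count calls per (this.length, argument length).
const counts = new Map();
const origConcat = Array.prototype.concat;
Array.prototype.concat = function (...args) {
  const argLen = args.length === 1 ? args[0].length : "n/a";
  const key = `${this.length},${argLen}`;
  counts.set(key, (counts.get(key) || 0) + 1);
  return origConcat.apply(this, args); // preserve the original behavior
};

[].concat([1, 2, 3]);
[].concat([1, 2, 3]);
[7, 8].concat([9]);
console.log([...counts]); // [ [ '0,3', 2 ], [ '2,1', 1 ] ]
```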
Will look for another way to improve performance with short arrays.
Comment 31 • 10 years ago

Yeah, I think that's my main hangup here. We need to drive that Kraken regression closer to 0, and then the rest of this looks great!
Flags: needinfo?(efaustbmo)
Comment 32 • 10 years ago (Assignee)

Here's the performance characteristic with the @@species patch.
Both in the native and self-hosted case, almost offset is applied.
Currently the self-hosted @@species support is not as optimized as the native one, since it always accesses the constructor and @@species. Maybe we could apply some optimization there.
Comment 33 • 10 years ago (Assignee)

And the Kraken times, with @@species.
Comment 34 • 10 years ago (Assignee)

stanford-crypto-pbkdf2 time:
with original LIST file
before: 79.9
native + @@species: 82.2
part 2 + @@species: 100.7
part 3 + @@species: 96.2
part 4 + @@species: 91.3
move stanford-crypto-pbkdf2 to top
before: 85.6
native + @@species: 86.4
part 2 + @@species: 97.2
part 3 + @@species: 90.1
part 4 + @@species: 88.0
only stanford-crypto-pbkdf2
before: 86.4
native + @@species: 86.9
part 2 + @@species: 98.3
part 3 + @@species: 90.0
part 4 + @@species: 86.2
So, other tests that run before it affect it a lot.
They're good for the native versions, and bad for the self-hosted ones.
Comment 35 • 10 years ago (Assignee)
(In reply to Tooru Fujisawa [:arai] from comment #32)
> here's the performance characteristic with @@species patch.
> both in native and self-hosted case, almost offset is applied.
I meant, almost *same* offset.
Comment 36 • 10 years ago (Assignee)

Placing other stanford-crypto-* tests before stanford-crypto-pbkdf2 affects the time of stanford-crypto-pbkdf2.
The `sjcl` object is re-initialized at the top of each test, so functions are not shared directly.
Maybe a GC-related issue, or something related to shapes?
Comment 37 • 10 years ago (Assignee)
https://github.com/h4writer/arewefastyet/blob/master/benchmarks/kraken/resources/sunspider-standalone-driver.js#L41
> for (var krakenCounter = 0; krakenCounter < tests.length; krakenCounter++) {
> var testBase = "tests/" + suiteName + "/" + tests[krakenCounter];
> var testName = testBase + ".js";
> var testData = testBase + "-data.js";
> // load test data
> load(testData);
> var startTime = new Date;
> load(testName);
> times[krakenCounter] = new Date() - startTime;
> gc(); <----- this one
> }
In the test harness, gc() is called after each test. When I increase the number of gc() calls from 1 to 2, the score changes, and it's almost the same as the score when stanford-crypto-pbkdf2 runs alone.
stanford-crypto-pbkdf2 is affected by this a lot, and when we call gc() twice, the Part 4 score is better than before.
The others are also affected, but the differences are small.
Comment 38 • 10 years ago (Assignee)

Changed gc(); to the following to see the result of the GC:
    print(gc());
    print(gc());
and executed the following command in arewefastyet/benchmarks/kraken:
    js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js
The output is the following:
before 3809280, after 3805184
before 3809280, after 856064
before 958464, after 958464
before 962560, after 655360
{
"stanford-crypto-ccm": 95,
"stanford-crypto-pbkdf2": 86
}
So, the first gc() call after a test does not reduce the memory usage.
Comment 39 • 10 years ago (Assignee)

terrence, what would be the reason for the difference between the 1st and 2nd gc() calls?
Flags: needinfo?(terrence)
Comment 40 • 10 years ago (Assignee)

So far, it looks like a `gc()` call starts background sweeping but does not wait for the sweeping it started; it only waits for already-running sweeping. So the 2nd gc() call waits for the sweep started by the 1st gc() call, and only after the 2nd gc() call returns is gcBytes reduced.
Is this the expected behavior for the gc() testing function?
[main thread]
gc(); // JavaScript
|
+- GC
|
+- JS::GCForReason
|
+- GCRuntime::gc
|
+- GCRuntime::collect
|
+- GCRuntime::gcCycle
|
+- waitBackgroundSweepEnd
| |
| +- state
| | # returns IDLE on 1st gc() call
| | # returns SWEEPING on 2nd gc() call
| |
| +- waitForBackgroundThread
| # called only on 2nd gc() call
| # after returned, gcBytes is reduced
|
+- GCRuntime::incrementalCollectSlice
|
+- GCRuntime::sweepPhase
|
+- GCRuntime::endSweepingZoneGroup
|
+- GCRuntime::queueZonesForBackgroundSweep
|
+- GCHelperState::maybeStartBackgroundSweep
|
+- GCHelperState::startBackgroundThread(SWEEPING)
# starts background sweeping
[background thread]
HelperThread::threadLoop()
|
+- HelperThread::handleGCHelperWorkload
|
+- GCHelperState::work
|
+- GCHelperState::doSweep
|
+- sweepBackgroundThings
|
+- LifoAlloc::freeAll
# take some time on 1st call, and 1st gc() returns while
# LifoAlloc::freeAll is running
Comment 41 • 10 years ago
Yes, this is the current behavior of direct calls to gc(). GC invoked by the system is smart enough to skip the second GC if the first is still going, but direct calls to gc() from script is generally an indication that the invoker wants to verify that some piece of memory is gone, so we have to be thorough. I am currently improving this situation in bug 1249367.
Flags: needinfo?(terrence)
Comment 42 • 10 years ago (Assignee)

Thanks. If bug 1249367 changes the gc() function's behavior to wait for the GC it starts, I can just wait for that bug.
I'll look into stanford-crypto-ccm next, as it's still slower even when I call gc() twice.
Comment 43 • 10 years ago (Assignee)

Hmm, the order in the LIST file and the number of gc() calls still affect the test result when Parts 1-4 are applied.
Seems there's yet another reason.
Comment 44 • 10 years ago (Assignee)

So far, each function is Ion-compiled a different number of times depending on whether stanford-crypto-pbkdf2 is executed alone or after stanford-crypto-ccm.
For example, when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm, sjcl.misc.pbkdf2 is compiled 5 times (compared to 3 times).
Maybe ArrayConcat runs in the interpreter more often when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm?
I will look into it.
Comment 45 • 10 years ago (Assignee)

The time consumed by ArrayConcat itself is not so different:
    stanford-crypto-pbkdf2 only: 11.5 [ms]
    stanford-crypto-pbkdf2 after stanford-crypto-ccm: 12.8 [ms]
Anyway, will try to figure out why the Ion code is discarded 2 more times.
Comment 46 • 10 years ago (Assignee)

Testing with Parts 1-4 applied:
1. The performance difference disappears when I rename all `sjcl` to `sjcl2` in stanford-crypto-pbkdf2.js and stanford-crypto-pbkdf2-data.js.
2. I see both the stanford-crypto-pbkdf2-data.js and stanford-crypto-ccm-data.js filenames in a single iongraph when I execute the following:
    js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js
Looks like `sjcl.codec.utf8String.toBits` and `sjcl.bitArray.P` from stanford-crypto-ccm-data.js are called inside some function in stanford-crypto-pbkdf2-data.js (not yet investigated which function, though).
Comment 47 • 10 years ago (Assignee)

1. The performance difference in stanford-crypto-pbkdf2 between running alone and running with other tests has changed somewhat:
    with bug 1252228 patch, Part 1+2+3+4, pbkdf2 alone: 88.1 [ms]
    with bug 1252228 patch, Part 1+2+3+4, pbkdf2 after ccm: 83.2 [ms]
Now stanford-crypto-pbkdf2 is faster when stanford-crypto-ccm is executed before it, but the score no longer depends on the number of gc() calls, nor on whether it runs in a different global.
2. The patch in bug 1252228 makes stanford-crypto-pbkdf2 faster even without this bug's patches.
So, we now have yet another performance regression issue for stanford-crypto-pbkdf2, and a pre-existing regression for stanford-crypto-ccm, relative to m-i (with the bug 1252228 patch applied).
Comment 48 • 10 years ago (Assignee)

Looks like inlineNewArrayTryReuseGroup fails several times because it tries to optimize too much.
It expects its argument to always have the same group, but Array#concat can be called with different groups.
Maybe I have to fall back to std_Array's inlining in that case.
Comment 49 • 10 years ago (Assignee)

Here are the current WIP patches, including an extra Part 5 that changes the inlining of NewArrayTryReuseGroup to fall back to std_Array's inlining when there are multiple groups for the template.
Comment 50 • 10 years ago (Assignee)

Also, here's what I'm testing now with stanford-crypto-ccm.
The difference between Part 2 and Part 3 that impacts the performance of stanford-crypto-ccm is something like the following.
The following 3 versions of ArrayConcat are simplified Array#concat implementations that work with stanford-crypto-ccm, i.e., the fast path used in Part 3 (it works with array.concat(array)).
The difference between them is how they get the 1st argument; the score regresses by about 10ms with the latter 2 versions.
// fast
function ArrayConcat(arg1) {
    var i = 0; i *= 1; var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}
// slow
function ArrayConcat(arg1) {
    var i = 0; var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}
// also slow
function ArrayConcat(arg1) {
    var E = arg1;
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}
Then, here's a minimal testcase that seems to reproduce almost the same situation as stanford-crypto-ccm's regression.
The script takes a single argument: when it's "A", it uses ArrayConcat_A, which corresponds to the slow version above; when it's anything else, like "B", it uses ArrayConcat_B, which corresponds to the fast version above.
"use strict";
function ArrayConcat_A() {
    var i = 0;
    arguments[i];
    for (var n = 0; n < 0; n++) {
        n in this;
    }
}
function ArrayConcat_B() {
    var i = 0;
    i *= 1;
    arguments[i];
    for (var n = 0; n < 0; n++) {
        n in this;
    }
}
function f() {
    for (var i = 0; i < 10000; i++) {
        [].my_concat([]);
        for (var n = 0; n < 0; n++)
            [].slice();
    }
}
if (scriptArgs[0] == "A")
    Array.prototype.my_concat = ArrayConcat_A;
else
    Array.prototype.my_concat = ArrayConcat_B;
var T = elapsed();
f();
print(elapsed() - T);
Then, looking at the iongraph, ArrayConcat_A is inlined but ArrayConcat_B is not, so I think the difference comes from inlining, and the cause is the way I accessed |arguments|.
However, when I increase the number of loop iterations (like i < 100000), the score changes and ArrayConcat_A gets faster.
And now I'm stuck here.
Comment 51 • 10 years ago (Assignee)

Fixed Part 4 to use a dedicated temp register instead of the output register, which can alias an input.
Attachment #8716587 - Attachment is obsolete: true
Comment 52 • 10 years ago (Assignee)

Now the Kraken score gets better by checking IsPackedArray and skipping the `k in E` check in that case (Part 2.1), thanks to jandem :)
Part 2.2 is just some possible cleanup.
Parts 3-5 from the previous posts are not used.
Here are the Kraken and Octane performance comparisons, thanks to bbouvier :)
Kraken benchmark score (average of 10 runs)
http://bnjbvr.github.io/simplegraph/#title=Kraken%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=audio-fft,stanford-crypto-pbkdf2,audio-beat-detection,stanford-crypto-ccm,imaging-darkroom,json-parse-financial,audio-oscillator,ai-astar,audio-dft,stanford-crypto-sha256-iterative,json-stringify-tinderbox,imaging-gaussian-blur,stanford-crypto-aes,imaging-desaturate&values=before%2047.8%2074.6%2088%2074.9%2081.6%2036.4%2063.8%2065.8%20109.2%2039.2%2041%2064.2%2051.8%2053%0Apart2%2046.4%2092.6%2089.4%2076.7%2082.1%2036.8%2064.2%2065.6%20109.2%2039.3%2041%2064.3%2051.3%2053.6%0Apart2.1%2046.6%2081.6%2088.3%2074.9%2081.9%2036.6%2064.1%2065.7%20109%2036.4%2038.8%2064.4%2051.5%2053.3%0Apart2.2%2046.6%2081.8%2088.6%2075%2082%2036.6%2063.9%2065.6%20109.4%2036.4%2038.7%2064.4%2051.5%2053.2
Octane benchmark score (average of 10 runs)
http://bnjbvr.github.io/simplegraph/#title=Octane%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=Richards,DeltaBlue,Crypto,RayTrace,EarleyBoyer,RegExp,Splay,SplayLatency,NavierStokes,PdfJS,Mandreel,MandreelLatency,Gameboy,CodeLoad,Box2D,zlib,Typescript,Score&values=before%2032093%2064102%2030975%20117954%2032595%203880%2017804%2012523%2037782%2014402%2031255%2034548%2052844%2017265%2057200%2080568%2031340%2030397%0Apart2%2031867%2065302%2030837%20116592%2032310%203826%2017989%2013004%2037871%2014717%2031203%2033782%2053609%2017144%2057031%2080819%2031307%2030444%0Apart2.1%2032139%2065006%2030676%20116296%2032513%203876%2019411%2018949%2038127%2014807%2031053%2037245%2050540%2017517%2057930%2080317%2031216%2031416%0Apart2.2%2032204%2064728%2030976%20116430%2032611%203890%2018791%2017367%2038041%2014796%2030792%2038123%2050104%2017412%2057067%2080691%2031729%2031222
Kraken stanford-crypto-ccm is as fast as today's m-c, and the performance regression in stanford-crypto-pbkdf2 is about 9% (+7ms).
However, Octane SplayLatency and MandreelLatency now show a somewhat strange regression :/
The following are the SplayLatency results for each run, with Part 2.1:
13665
21148
21168
13144
13569
21102
21596
21513
21556
21026
and same for MandreelLatency:
38947
38947
38947
37767
37483
38348
37767
28984
37203
38055
Sometimes they show a nice score.
Also, the following are the SplayLatency results with m-c:
13224
12421
12574
12452
12379
11439
12123
12914
12839
12862
and same for MandreelLatency:
38947
28487
28816
28165
37483
38947
38055
28984
38645
38947
SplayLatency doesn't show scores around 21000 with m-c, so this is something I have to investigate next.
MandreelLatency shows scores around 28000 and 38000 with both m-c and Part 2.1, but there seems to be some difference in their distribution (maybe I have to check with more runs).
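The Part 2.1 idea (check IsPackedArray once per operand so the per-element `k in E` test can be skipped) can be sketched in plain JS; `isPacked` below is a stand-in for the IsPackedArray intrinsic, and `_DefineDataProperty` is modeled with plain stores:

```javascript
// Stand-in for IsPackedArray: a packed (dense, hole-free) array has every
// index from 0 to length-1 present.
function isPacked(arr) {
  for (let i = 0; i < arr.length; i++)
    if (!(i in arr)) return false;
  return true;
}

// Copy one concat operand E into A starting at index n, hoisting the hole
// check out of the loop, as Part 2.1 does.
function copyOperand(A, n, E) {
  const len = E.length;
  if (isPacked(E)) {
    for (let k = 0; k < len; k++, n++)
      A[n] = E[k];              // fast path: no per-element "k in E" test
  } else {
    for (let k = 0; k < len; k++, n++)
      if (k in E) A[n] = E[k];  // slow path: preserve holes
  }
  return n;
}

const out = [];
copyOperand(out, 0, [1, , 3]);    // sparse operand keeps its hole
console.log(1 in out);            // false
console.log(isPacked([1, 2, 3])); // true
```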
Comment 53 • 10 years ago (Assignee)

Again, it looks like an issue that depends on execution order.
SplayLatency scores for 20 runs, running splay.js only:
11699
12216
11524
16435
11666
15628
11565
12836
12246
11930
12039
11977
11938
11728
11779
11591
12053
11961
12594
11645
SplayLatency scores for 20 runs, running splay.js after regexp.js:
18539
18684
13321
14097
14019
12608
21694
18504
13918
14068
13870
14162
14233
13706
14287
18777
22132
13793
14151
21276
Comment 54 • 10 years ago (Assignee)

Just confirmed `SplayLatency: 22321` on m-c, so both the SplayLatency and MandreelLatency differences seem to be a matter of run-to-run variance.
Comment 55 • 10 years ago (Assignee)

Scratch that. I totally misread the Octane results.
They show bigger numbers when Part 2.1 is applied, which means it improves the performance.
Comment 56 • 10 years ago (Assignee)

Attachment #8707459 - Attachment is obsolete: true
Attachment #8715141 - Attachment is obsolete: true
Attachment #8716585 - Attachment is obsolete: true
Attachment #8716586 - Attachment is obsolete: true
Attachment #8716588 - Attachment is obsolete: true
Attachment #8716592 - Attachment is obsolete: true
Attachment #8716593 - Attachment is obsolete: true
Attachment #8716597 - Attachment is obsolete: true
Attachment #8721042 - Attachment is obsolete: true
Attachment #8721045 - Attachment is obsolete: true
Attachment #8721609 - Attachment is obsolete: true
Attachment #8724434 - Attachment is obsolete: true
Attachment #8725172 - Attachment is obsolete: true
Attachment #8725579 - Attachment is obsolete: true
Attachment #8725604 - Attachment is obsolete: true
Attachment #8725669 - Attachment is obsolete: true
Attachment #8725746 - Flags: review+
Comment 57 • 10 years ago (Assignee)

Folded Part 2.1 and 2.2 together, and updated the spec date.
Attachment #8725747 - Flags: review?(efaustbmo)
Comment 58 • 10 years ago (Assignee)

Sorry, I forgot to revert the unnecessary reordering of Step 2.
Attachment #8725747 - Attachment is obsolete: true
Attachment #8725747 - Flags: review?(efaustbmo)
Attachment #8725770 - Flags: review?(efaustbmo)
Comment 59 • 10 years ago (Assignee)

Comment on attachment 8725770 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.
Sorry again.
I should've checked IsArray (or at least IsObject) before IsPackedArray, as IsPackedArray accepts only objects.
Will run jstests/jit-tests first and then post a fixed patch.
Attachment #8725770 - Flags: review?(efaustbmo)
Comment 60 • 10 years ago (Assignee)

Moved the IsPackedArray check into the IsArray branch, as we will change the IsArray call there to IsConcatSpreadable, and the difference is only in the loop there.
IsConcatSpreadable also ensures the value is an object, so it can simply be replaced.
Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cb1f5318301
Attachment #8725770 - Attachment is obsolete: true
Attachment #8725896 - Flags: review?(efaustbmo)
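For orientation, here is a rough plain-JS sketch of the self-hosted shape under review. Names are approximations: std_Array, _DefineDataProperty, IsArray/IsConcatSpreadable, and IsPackedArray are modeled with standard JS, so this is an illustration of the algorithm, not the patch itself:

```javascript
// Approximate shape of the self-hosted Array.prototype.concat.
function arrayConcatSketch(...items) {
  const O = Object(this);          // Step 1: ToObject(this)
  const A = [];                    // Step 2: std_Array(0); ArraySpeciesCreate later
  let n = 0;                       // Step 3
  for (const E of [O, ...items]) { // |this| first, then each argument
    if (Array.isArray(E)) {        // will become IsConcatSpreadable
      const len = E.length;
      for (let k = 0; k < len; k++, n++) {
        if (k in E)                // IsPackedArray(E) lets this check be skipped
          A[n] = E[k];             // models _DefineDataProperty(A, n, E[k])
      }
    } else {
      A[n++] = E;                  // non-spreadable values are appended as-is
    }
  }
  A.length = n;                    // final length, including trailing holes
  return A;
}
console.log(arrayConcatSketch.call([1, 2], [3], 4)); // [ 1, 2, 3, 4 ]
```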
Comment 61 • 10 years ago
Comment on attachment 8725896 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.
Review of attachment 8725896 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
::: js/src/builtin/Array.js
@@ +876,5 @@
> + // Step 1.
> + var O = ToObject(this);
> +
> + // Step 2.
> + var A = std_Array(0);
This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I assume.
@@ +881,5 @@
> +
> + // Step 3.
> + var n = 0;
> +
> + // Step 4 (skipped).
nit: prefer (implicit in |arguments|), or something
::: js/src/tests/ecma_3/Array/regress-322135-02.js
@@ +1,1 @@
> +// |reftest| skip -- slow
Will we need to skip this? Do we want or need a followup bug to fix it?
::: js/src/vm/Xdr.h
@@ +35,4 @@
> static const uint32_t XDR_BYTECODE_VERSION =
> uint32_t(0xb973c0de - XDR_BYTECODE_VERSION_SUBTRAHEND);
>
> +static_assert(JSErr_Limit == 446,
I've had to review this for a long time, I guess. Luckily, we don't need this anymore
Attachment #8725896 -
Flags: review?(efaustbmo) → review+
Comment 62 • 10 years ago (Assignee)
Thank you for reviewing :D
(In reply to Eric Faust [:efaust] from comment #61)
> Comment on attachment 8725896 [details] [diff] [review]
> Part 2: Self-host Array.prototype.concat.
>
> Review of attachment 8725896 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Looks good!
>
> ::: js/src/builtin/Array.js
> @@ +876,5 @@
> > + // Step 1.
> > + var O = ToObject(this);
> > +
> > + // Step 2.
> > + var A = std_Array(0);
>
> This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I
> assume.
Yes :)
> ::: js/src/tests/ecma_3/Array/regress-322135-02.js
> @@ +1,1 @@
> > +// |reftest| skip -- slow
>
> Do we will need to skip this? Do we want or need a followup bug to fix this?
Yes, we need to skip it for now.
It's related to bug 1234947; not sure if that bug can be fixed easily.
The test is slow because `4294967295` doesn't fit in int32_t, so Ion cannot use Int32 for the loop in concat.
> var length = 4294967295;
> var array1 = new Array(length);
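The range problem behind the slowness can be seen with plain number arithmetic (a sketch of the int32_t limit, not engine code):

```javascript
// 4294967295 (2**32 - 1) is a valid Array length but exceeds the int32_t
// maximum of 2**31 - 1, so Ion's Int32 loop indices cannot represent it.
const arrayLength = 4294967295;
const INT32_MAX = 2 ** 31 - 1;
console.log(arrayLength > INT32_MAX); // true: the loop bound overflows int32
console.log(arrayLength | 0);         // -1: the value does not survive int32 coercion
```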
Comment 63 • 10 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/17a5a95bf6cc6ddf4251dec76201baf20bbd3452
Bug 1233642 - Part 1: Add IsArray intrinsic. r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/b535cc24f7d0b2703a43cf43fa371c6087dbb5e4
Bug 1233642 - Part 2: Self-host Array.prototype.concat. r=efaust
Comment 64 • 10 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc9e586a8f1944d87e53f601d41ea860b272a3bc
Backed out changeset b535cc24f7d0 (bug 1233642)
https://hg.mozilla.org/integration/mozilla-inbound/rev/203c7e4b8b20e9912f330ac630520508ec1e2f98
Backed out changeset 17a5a95bf6cc (bug 1233642)
Comment 65 • 10 years ago (Assignee)

In Part 2, Array#concat was changed from native to self-hosted, which causes xray to expose the Array.concat generic.
It looks like other self-hosted generics are also not in the skip list, so I think I can remove concat from there too.
What do you think?
Attachment #8736028 - Flags: review?(efaustbmo)
Comment 66 • 10 years ago
Comment on attachment 8736028 [details] [diff] [review]
Part 2.1: Exclude concat from Array ctor property skip list in xray.
Review of attachment 8736028 [details] [diff] [review]:
-----------------------------------------------------------------
Forwarding to an xpconnect peer.
Attachment #8736028 - Flags: review?(efaustbmo) → review?(bobbyholley)

Updated • 10 years ago

Attachment #8736028 - Flags: review?(bobbyholley) → review+
Comment 67 • 9 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef8f61469172c5f62bf990e1b38f5c25526ce3a0
Bug 1233642 - Part 1: Add IsArray intrinsic. r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/70e78d669f9de949dcbe972f74742045ca094fae
Bug 1233642 - Part 2: Self-host Array.prototype.concat. r=efaust,bholley
Comment 68 • 9 years ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/ef8f61469172
https://hg.mozilla.org/mozilla-central/rev/70e78d669f9d
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48