Closed Bug 1233642 (opened 10 years ago, closed 9 years ago)

Self-host Array.prototype.concat.

Categories: Core :: JavaScript Engine, defect
Status: RESOLVED FIXED
Target Milestone: mozilla48
People: Reporter: arai; Assigned: arai

Attachments (3 files, 26 obsolete files):
- 3.12 KB, patch, arai: review+ (Details | Diff | Splinter Review)
- 40.16 KB, patch, efaust: review+ (Details | Diff | Splinter Review)
- 1.47 KB, patch, bholley: review+ (Details | Diff | Splinter Review)
Derived from bug 1165052.
Before supporting @@species in Array.prototype.concat, it would be better to self-host it.
With a simple perf test (bug 1165052 comment #25), a straightforward implementation seems to be fast enough.
Comment 1 • 10 years ago (Assignee)

Tested a WIP patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=231cb879bd41
Only ecma_3/Array/regress-322135-02.js fails, with a timeout. However, the test itself behaves somewhat strangely even in m-c.
https://dxr.mozilla.org/mozilla-central/source/js/src/tests/ecma_3/Array/regress-322135-02.js
> var length = 4294967295;
> var array1 = new Array(length);
> var array2 = ['Kibo'];
> var array;
>
> try
> {
> array = array1.concat(array2);
> }
> catch(ex)
> {
> printStatus(ex.name + ': ' + ex.message);
> }
|array| there becomes a 0-length Array without throwing any exception, probably because of uint32_t overflow. I think it should throw |RangeError: invalid array length| or something like that.
Anyway, is it required that this code run quickly?
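The 0-length result is consistent with the combined length wrapping around in uint32_t arithmetic. JavaScript's `>>>` operator (which applies ToUint32) can illustrate the suspected overflow; this is a sketch of the arithmetic, not the engine's actual code path:

```javascript
// The combined length 4294967295 + 1 does not fit in a uint32_t.
// ">>> 0" applies ToUint32, modeling the suspected C++ truncation.
const maxLength = 4294967295;     // 2**32 - 1, the maximum Array length
const combined = maxLength + 1;   // 4294967296 as a JS double
console.log(combined >>> 0);      // 0: wrapped around, matching the 0-length result
```

Under this reading, the native code computes the new length modulo 2**32, so 4294967295 + 1 silently becomes 0 instead of raising a RangeError.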
Comment 2 • 10 years ago (Assignee)

Here's the benchmark result:
          | before  | concat only | concat+slice
----------+---------+-------------+--------------
SunSpider | 164.5ms | 154.6ms     | 154.7ms
Kraken    | 897.3ms | 898.7ms     | 966.0ms
JetStream | 180.28  | 181.57      | 181.95
Octane    | 29061   | 27882       | 31316
There is some regression in Kraken (concat+slice) and Octane (concat only), and some improvement in SunSpider (concat only) and Octane (concat+slice).
Comment 3 • 10 years ago (Assignee)

So, about the regress-322135-02.js case: it uses the following branch
https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2633
> if (isArray && !ObjectMayHaveExtraIndexedProperties(aobj)) {
> if (!GetLengthProperty(cx, aobj, &length))
> return false;
>
> size_t initlen = GetAnyBoxedOrUnboxedInitializedLength(aobj);
> narr = NewFullyAllocatedArrayTryReuseGroup(cx, aobj, initlen);
> if (!narr)
> return false;
> CopyAnyBoxedOrUnboxedDenseElements(cx, narr, aobj, 0, 0, initlen);
> SetAnyBoxedOrUnboxedArrayLength(cx, narr, length);
>
> args.rval().setObject(*narr);
> ...
with initlen = 0, length = 4294967295, so it skips iterating |this| value's elements.
https://dxr.mozilla.org/mozilla-central/rev/388bdc46ba51ee31da8b8abe977e0ca38d117434/js/src/jsarray.cpp#2650
> if (length == initlen) {
> while (argc) {
> ...
> for (size_t i = 0; i < initlen; i++) {
> ...
> }
> ...
> }
> }
This case could be optimized while keeping the current behavior, even in self-hosted code, but it might need a somewhat complicated optimization path once we implement @@species.
Also, I'm not sure if this case really matters on the real web. Does anyone know of an actual use case with similar code?
Anyway, I'll try implementing it and see how hard it is :)
Comment 4 • 10 years ago (Assignee)

Oh, I misunderstood the code.
The |if (length == initlen)| check is used for iterating the arguments, not |this|.
The |this| elements are copied by |CopyAnyBoxedOrUnboxedDenseElements|.
Comment 5 • 10 years ago (Assignee)

I copied the CopyAnyBoxedOrUnboxedDenseElements optimization from the native array_concat to the self-hosted concat, but pure-JS code is faster when the Array length fits into int32_t.
Also, if the Array length doesn't fit into int32_t, it gets extremely slow; that's the reason for the TIMEOUT in comment #1.
Filed bug 1234947 for it.
If that case is unusual in ordinary web apps, maybe we could just disable the testcase and not apply the CopyAnyBoxedOrUnboxedDenseElements optimization?
Do you know if such a case exists?
Flags: needinfo?(efaustbmo)
Comment 6 • 10 years ago (Assignee)

As suggested on IRC, redirecting the ni?
Is there any known case in the wild where we should perform Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly with an Array whose length doesn't fit into int32_t?
Flags: needinfo?(luke)
Flags: needinfo?(jdemooij)
Flags: needinfo?(efaustbmo)
Comment 7 • 10 years ago
I'm not aware of any, but I haven't followed Array stuff much.
Flags: needinfo?(luke)
Comment 8 • 10 years ago
(In reply to Tooru Fujisawa [:arai] from comment #6)
> Is there any known case in wild that we should perform
> Array.prototype.concat (and Array.prototype.slice for bug 1233643) quickly
> with an Array that its length doesn't fit into int32_t?
I'm not aware of such cases and I don't expect it to be common. Sparse arrays definitely show up in the wild (see bug 1088189 and bug 1087963 for instance), but I think that's mostly orthogonal and most of these should still have an int32 length.
Flags: needinfo?(jdemooij)
Comment 9 • 10 years ago (Assignee)

Thanks :)
I'll focus on the other cases then.
Performance on sparse arrays is fast enough, I think.
Short dense arrays may need some more optimization.
Comment 10 • 10 years ago (Assignee)

This is from bug 1165052.
Attachment #8707459 - Flags: review?(efaustbmo)
Comment 11 • 10 years ago (Assignee)

This implements the general case of Array#concat, and removes the native one, including the Ion code.
Attachment #8707460 - Flags: review?(efaustbmo)
Comment 12 • 10 years ago (Assignee)

This improves performance with a single short array.
Will post a perf comparison.
Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc18dc840676
Attachment #8707465 - Flags: review?(efaustbmo)
Comment 13 • 10 years ago (Assignee)

Updated • 10 years ago

Attachment #8707459 - Flags: review?(efaustbmo) → review+
Comment 14 • 10 years ago
I must be confused or misreading the graphs, or something. Do we understand why these patches make small dense array concats much slower? I imagine that length 10 must be more common than length 2**10, so...
Flags: needinfo?(arai.unmht)
Comment 15 • 10 years ago (Assignee)

Thank you for reviewing :)
Here's the characteristic curve of the time taken by 50000 iterations of |A.concat(A).length| against the length of the dense array.
In the attached graph, current m-c (before) is faster than the patched builds in the [0, 63] and [208, 1023] ranges, and the patched performance gets much worse at length >= 1024.
Part 1+2 is faster than before in the [64, 192] range.
Part 1+2+3 is faster than before in the [1024-] range; it's also faster than Part 1+2 in the [0, 7], [16, 58], and [512-] ranges. So applying Part 3 introduces both performance improvements and regressions relative to Part 1+2.
Will look into the reason for the spikes at length = 8, 64, 256, 512, etc.
Comment 16 • 10 years ago (Assignee)

So, the spike comes from the std_Array(len1 + len2) allocation.
The graph is the result of the following stub function:
function ArrayConcat(arg1) {
    var len1 = ToLength(this.length);
    var len2 = ToLength(arg1.length);
    // Step 2 (reordered).
    var A = std_Array(len1 + len2);
    return A;
}
There seems to be a difference in allocation logic between the original native concat and std_Array. Will look into it.
Flags: needinfo?(arai.unmht)
Comment 17 • 10 years ago

Note that the spike with > ~7 elements is probably because the elements no longer fit inline in the object, so we need either malloc or nursery-allocated elements. Wild guess: maybe Ion does not have a fast path to allocate elements in the nursery when we do Array(x) and x is not a constant?
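A micro-benchmark in the spirit of this discussion, straddling the suspected inline-slots boundary (the ~7-element threshold and the allocation-path explanation are hypotheses from the comment above, not something this sketch can verify by itself):

```javascript
// Times A.concat(A) for a given dense-array length. Lengths 7 and 8 straddle
// the point where the result (14 vs. 16 elements) is suspected to stop
// fitting in the object's inline slots.
function time(len, iters) {
  const A = new Array(len).fill(1);
  let result = 0;
  const start = Date.now();
  for (let i = 0; i < iters; i++)
    result = A.concat(A).length;
  return { ms: Date.now() - start, length: result };
}
console.log(time(7, 50000)); // result length 14
console.log(time(8, 50000)); // result length 16, past the suspected boundary
```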
Comment 18 • 10 years ago (Assignee)

Thanks :)
Yeah, N=7 (len1+len2 == 14) and N=8 (len1+len2 == 16) take different branches in the following condition:
https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/js/src/jit/CodeGenerator.cpp#4803
> void
> CodeGenerator::visitNewArrayDynamicLength(LNewArrayDynamicLength* lir)
> {
> ...
> if (templateObject->as<ArrayObject>().hasFixedElements()) {
> size_t numSlots = gc::GetGCKindSlots(templateObject->asTenured().getAllocKind());
> inlineLength = numSlots - ObjectElements::VALUES_PER_HEADER;
> } else {
> canInline = false;
> }
Will investigate how the native concat does this quickly.
Comment 19 • 10 years ago (Assignee)

Added an intrinsic for NewFullyAllocatedArrayTryReuseGroup, and called it instead of std_Array.
It removes the spike at N=8 (length=16) and draws almost the same curve as "before" in the N<1024 cases, but the spike at N=64 (length=128) is still large, bigger than in Part 2's case.
Attachment #8715135 - Attachment is obsolete: true
Comment 20 • 10 years ago (Assignee)

Here's the patch for the intrinsic.
Comment 21 • 10 years ago (Assignee)

Attachment #8707460 - Attachment is obsolete: true
Attachment #8707460 - Flags: review?(efaustbmo)
Comment 22 • 10 years ago (Assignee)

Attachment #8707465 - Attachment is obsolete: true
Attachment #8707465 - Flags: review?(efaustbmo)
Comment 23 • 10 years ago (Assignee)

Attachment #8715373 - Attachment is obsolete: true
Comment 24 • 10 years ago (Assignee)

Here's the current performance comparison on Octane between m-c and Parts 2, 3, 4, average of 10 runs.
There is at most a 5% improvement in SplayLatency, but the other cases have <2% improvement/regression, depending on the testcase.
Will check other benchmarks.
Comment 25 • 10 years ago (Assignee)

In Kraken, stanford-crypto-pbkdf2 regresses about 20-25% in all cases.
stanford-crypto-ccm regresses about 15% in the 1+2+3 and 1+2+3+4 cases.
stanford-crypto-sha256-iterative and json-stringify-tinderbox regress about 5% in the 1+2+3+4 case.
Comment 26 • 10 years ago (Assignee)

With SunSpider, again, all cases improve/regress by about 5%, depending on the testcase.
Comment 27 • 10 years ago (Assignee)

Calculated the characteristic for 2 cases:
1. A.concat(A), where the length of A is variable.
2. A.concat(B), where the length of A is variable and the length of B is fixed.
Attachment #8715372 - Attachment is obsolete: true
Updated • 10 years ago (Assignee)

Attachment #8703112 - Attachment is obsolete: true

Updated • 10 years ago (Assignee)

Attachment #8707466 - Attachment is obsolete: true
Comment 28 • 10 years ago (Assignee)

Now I'm thinking that using only Part 2 might be the best way, as Part 3 and Part 4 add extra code and complexity, while the performance improvements in benchmarks are not that big :/
Comment 29 • 10 years ago (Assignee)

efaust, can I have your opinion on what we should do here?
Do you think Part 3 and Part 4 (or maybe some other optimization?) should be applied?
Flags: needinfo?(efaustbmo)
Comment 30 • 10 years ago (Assignee)

In stanford-crypto-pbkdf2 and stanford-crypto-ccm, concat is called with only 1 argument every time:
stanford-crypto-pbkdf2
called | this.length | arguments[0].length
----------+-------------+---------------------
32768 | 0 | 8
32760 | 8 | 1
16 | 0 | 16
8 | 3 | 1
8 | 2 | 1
8 | 0 | 3
stanford-crypto-ccm
called | this.length | arguments[0].length
----------+-------------+---------------------
2254 | 1 | 4
1056 | 3 | 3
944 | 4 | 3
11 | 28 | 4
9 | 44 | 4
8 | 8 | 4
8 | 4 | 4
8 | 36 | 4
8 | 34 | 4
8 | 24 | 4
8 | 0 | 4
7 | 47 | 4
7 | 29 | 4
7 | 25 | 4
7 | 2 | 4
6 | 9 | 4
6 | 6 | 4
6 | 46 | 4
6 | 45 | 4
6 | 41 | 4
6 | 39 | 4
6 | 23 | 4
5 | 7 | 4
5 | 49 | 4
5 | 21 | 4
5 | 18 | 4
5 | 17 | 4
5 | 12 | 4
4 | 5 | 4
4 | 48 | 4
4 | 38 | 4
4 | 37 | 4
4 | 35 | 4
4 | 33 | 4
4 | 27 | 4
4 | 22 | 4
4 | 20 | 4
4 | 14 | 4
4 | 11 | 4
4 | 10 | 4
3 | 43 | 4
3 | 42 | 4
3 | 40 | 4
3 | 31 | 4
3 | 30 | 4
3 | 3 | 4
3 | 19 | 4
3 | 16 | 4
2 | 15 | 4
1 | 32 | 4
1 | 13 | 4
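Call statistics like the tables above can be gathered by temporarily wrapping concat; this is a hypothetical instrumentation sketch, not necessarily how these numbers were collected:

```javascript
// Wrap Array.prototype.concat to count calls per (this.length, argument length).
const counts = new Map();
const origConcat = Array.prototype.concat;
Array.prototype.concat = function (...args) {
  const argLen = args.length === 1 ? args[0].length : "n/a";
  const key = `${this.length},${argLen}`;
  counts.set(key, (counts.get(key) || 0) + 1);
  return origConcat.apply(this, args); // preserve the original behavior
};

[].concat([1, 2, 3]);
[].concat([1, 2, 3]);
[7, 8].concat([9]);
console.log([...counts]); // [ [ '0,3', 2 ], [ '2,1', 1 ] ]
```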
Will look for another way to improve performance with short arrays.
Comment 31 • 10 years ago

Yeah, I think that's my main hangup here. We need to drive that Kraken regression closer to 0, and then the rest of this looks great!
Flags: needinfo?(efaustbmo)
Comment 32 • 10 years ago (Assignee)

Here's the performance characteristic with the @@species patch.
Both in the native and self-hosted case, almost offset is applied.
Currently the self-hosted @@species support is not as optimized as the native one, since it always accesses the constructor and @@species. Maybe we could apply some optimization there.
Comment 33 • 10 years ago (Assignee)

And the Kraken times, with @@species.
Comment 34 • 10 years ago (Assignee)

stanford-crypto-pbkdf2 time:
with original LIST file
before: 79.9
native + @@species: 82.2
part 2 + @@species: 100.7
part 3 + @@species: 96.2
part 4 + @@species: 91.3
move stanford-crypto-pbkdf2 to top
before: 85.6
native + @@species: 86.4
part 2 + @@species: 97.2
part 3 + @@species: 90.1
part 4 + @@species: 88.0
only stanford-crypto-pbkdf2
before: 86.4
native + @@species: 86.9
part 2 + @@species: 98.3
part 3 + @@species: 90.0
part 4 + @@species: 86.2
So, other tests that run before it affect it a lot.
They're good for the native versions, and bad for the self-hosted ones.
Comment 35 • 10 years ago (Assignee)
(In reply to Tooru Fujisawa [:arai] from comment #32)
> here's the performance characteristic with @@species patch.
> both in native and self-hosted case, almost offset is applied.
I meant, almost *same* offset.
Comment 36 • 10 years ago (Assignee)

Placing other stanford-crypto-* tests before stanford-crypto-pbkdf2 affects the time of stanford-crypto-pbkdf2.
The `sjcl` object is re-initialized at the top of each test, so functions are not shared directly.
Maybe a GC-related issue, or something related to shapes?
Comment 37 • 10 years ago (Assignee)
https://github.com/h4writer/arewefastyet/blob/master/benchmarks/kraken/resources/sunspider-standalone-driver.js#L41
> for (var krakenCounter = 0; krakenCounter < tests.length; krakenCounter++) {
> var testBase = "tests/" + suiteName + "/" + tests[krakenCounter];
> var testName = testBase + ".js";
> var testData = testBase + "-data.js";
> // load test data
> load(testData);
> var startTime = new Date;
> load(testName);
> times[krakenCounter] = new Date() - startTime;
> gc(); <----- this one
> }
In the test harness, gc() is called after each test. When I increase the number of gc() calls from 1 to 2, the score changes, and it's almost the same as the score when stanford-crypto-pbkdf2 runs alone.
stanford-crypto-pbkdf2 is affected by this a lot, and when we call gc() twice, the Part 4 score is better than before.
The others are also affected, but the differences are small.
Comment 38 • 10 years ago (Assignee)

Changed gc(); to the following to see the result of the GC:
    print(gc());
    print(gc());
and executed the following command in arewefastyet/benchmarks/kraken:
    js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js
The output is the following:
before 3809280, after 3805184
before 3809280, after 856064
before 958464, after 958464
before 962560, after 655360
{
"stanford-crypto-ccm": 95,
"stanford-crypto-pbkdf2": 86
}
So, the first gc() call after a test does not reduce the memory usage.
Comment 39 • 10 years ago (Assignee)

terrence, what would be the reason for the difference between the 1st and 2nd gc() calls?
Flags: needinfo?(terrence)
Comment 40 • 10 years ago (Assignee)

So far, it looks like a `gc()` call starts background sweeping but does not wait for the sweeping it started; it only waits for already-running sweeping. So the 2nd gc() call waits for the sweep started by the 1st gc() call, and only after the 2nd gc() call returns is gcBytes reduced.
Is this the expected behavior for the gc() testing function?
[main thread]
gc(); // JavaScript
|
+- GC
|
+- JS::GCForReason
|
+- GCRuntime::gc
|
+- GCRuntime::collect
|
+- GCRuntime::gcCycle
|
+- waitBackgroundSweepEnd
| |
| +- state
| | # returns IDLE on 1st gc() call
| | # returns SWEEPING on 2nd gc() call
| |
| +- waitForBackgroundThread
| # called only on 2nd gc() call
| # after returned, gcBytes is reduced
|
+- GCRuntime::incrementalCollectSlice
|
+- GCRuntime::sweepPhase
|
+- GCRuntime::endSweepingZoneGroup
|
+- GCRuntime::queueZonesForBackgroundSweep
|
+- GCHelperState::maybeStartBackgroundSweep
|
+- GCHelperState::startBackgroundThread(SWEEPING)
# starts background sweeping
[background thread]
HelperThread::threadLoop()
|
+- HelperThread::handleGCHelperWorkload
|
+- GCHelperState::work
|
+- GCHelperState::doSweep
|
+- sweepBackgroundThings
|
+- LifoAlloc::freeAll
# take some time on 1st call, and 1st gc() returns while
# LifoAlloc::freeAll is running
Comment 41 • 10 years ago
Yes, this is the current behavior of direct calls to gc(). GC invoked by the system is smart enough to skip the second GC if the first is still going, but direct calls to gc() from script is generally an indication that the invoker wants to verify that some piece of memory is gone, so we have to be thorough. I am currently improving this situation in bug 1249367.
Flags: needinfo?(terrence)
Comment 42 • 10 years ago (Assignee)

Thanks. If bug 1249367 changes the gc() function's behavior to wait for the GC it starts, I can just wait for that bug.
I'll look into stanford-crypto-ccm next, as it's still slower even when I call gc() twice.
Comment 43 • 10 years ago (Assignee)

Hmm, the order in the LIST file and the number of gc() calls still affect the test result when Parts 1-4 are applied.
Seems there's yet another reason.
Comment 44 • 10 years ago (Assignee)

So far, each function is Ion-compiled a different number of times depending on whether stanford-crypto-pbkdf2 is executed alone or after stanford-crypto-ccm.
For example, when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm, sjcl.misc.pbkdf2 is compiled 5 times (compared to 3 times).
Maybe ArrayConcat runs in the interpreter more often when stanford-crypto-pbkdf2 is executed after stanford-crypto-ccm?
I will look into it.
Comment 45 • 10 years ago (Assignee)

The time consumed by ArrayConcat itself is not so different:
    stanford-crypto-pbkdf2 only: 11.5 [ms]
    stanford-crypto-pbkdf2 after stanford-crypto-ccm: 12.8 [ms]
Anyway, will try to figure out why the Ion code is discarded 2 more times.
Comment 46 • 10 years ago (Assignee)

Testing with Parts 1-4 applied:
1. The performance difference disappears when I rename all `sjcl` to `sjcl2` in stanford-crypto-pbkdf2.js and stanford-crypto-pbkdf2-data.js.
2. I see both the stanford-crypto-pbkdf2-data.js and stanford-crypto-ccm-data.js filenames in a single iongraph when I execute the following:
    js -e "var tests = ['stanford-crypto-ccm', 'stanford-crypto-pbkdf2'], suitePath = 'test/kraken-1.1'; suiteName='kraken-1.1';" -f resources/sunspider-standalone-driver.js
Looks like `sjcl.codec.utf8String.toBits` and `sjcl.bitArray.P` from stanford-crypto-ccm-data.js are called inside some function in stanford-crypto-pbkdf2-data.js (not yet investigated which function, though).
Comment 47 • 10 years ago (Assignee)

1. The performance difference in stanford-crypto-pbkdf2 between running alone and running with other tests has changed somewhat:
    with bug 1252228 patch, Part 1+2+3+4, pbkdf2 alone: 88.1 [ms]
    with bug 1252228 patch, Part 1+2+3+4, pbkdf2 after ccm: 83.2 [ms]
Now stanford-crypto-pbkdf2 is faster when stanford-crypto-ccm is executed before it, but the score no longer depends on the number of gc() calls, nor on whether it runs in a different global.
2. The patch in bug 1252228 makes stanford-crypto-pbkdf2 faster even without this bug's patches.
So, we now have yet another performance regression issue for stanford-crypto-pbkdf2, and a pre-existing regression for stanford-crypto-ccm, relative to m-i (with the bug 1252228 patch applied).
Comment 48 • 10 years ago (Assignee)

Looks like inlineNewArrayTryReuseGroup fails several times because it tries to optimize too much.
It expects its argument to always have the same group, but Array#concat can be called with different groups.
Maybe I have to fall back to std_Array's inlining in that case.
Comment 49 • 10 years ago (Assignee)

Here are the current WIP patches, including an extra Part 5 that changes the inlining of NewArrayTryReuseGroup to fall back to std_Array's inlining when there are multiple groups for the template.
Comment 50 • 10 years ago (Assignee)

Also, here's what I'm testing now with stanford-crypto-ccm.
The difference between Part 2 and Part 3 that impacts the performance of stanford-crypto-ccm is something like the following.
The following 3 versions of ArrayConcat are simplified Array#concat implementations that work with stanford-crypto-ccm, i.e., the fast path used in Part 3 (it works with array.concat(array)).
The difference between them is how they get the 1st argument; the score regresses by about 10ms with the latter 2 versions.
// fast
function ArrayConcat(arg1) {
    var i = 0; i *= 1; var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}
// slow
function ArrayConcat(arg1) {
    var i = 0; var E = arguments[i];
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}
// also slow
function ArrayConcat(arg1) {
    var E = arg1;
    var len1 = this.length;
    var len2 = E.length;
    var A = std_Array(len1 + len2);
    for (var n = 0; n < len1; n++) {
        if (n in this)
            _DefineDataProperty(A, n, this[n]);
    }
    for (var k = 0; k < len2; k++) {
        if (k in E)
            _DefineDataProperty(A, n, E[k]);
        n++;
    }
    return A;
}
Then, here's a minimal testcase that seems to reproduce almost the same situation as stanford-crypto-ccm's regression.
The script takes a single argument: when it's "A", it uses ArrayConcat_A, which corresponds to the slow version above; when it's anything else, like "B", it uses ArrayConcat_B, which corresponds to the fast version above.
"use strict";
function ArrayConcat_A() {
    var i = 0;
    arguments[i];
    for (var n = 0; n < 0; n++) {
        n in this;
    }
}
function ArrayConcat_B() {
    var i = 0;
    i *= 1;
    arguments[i];
    for (var n = 0; n < 0; n++) {
        n in this;
    }
}
function f() {
    for (var i = 0; i < 10000; i++) {
        [].my_concat([]);
        for (var n = 0; n < 0; n++)
            [].slice();
    }
}
if (scriptArgs[0] == "A")
    Array.prototype.my_concat = ArrayConcat_A;
else
    Array.prototype.my_concat = ArrayConcat_B;
var T = elapsed();
f();
print(elapsed() - T);
Then, looking at the iongraph, ArrayConcat_A is inlined but ArrayConcat_B is not, so I think the difference comes from inlining, and the cause is the way I accessed |arguments|.
However, when I increase the number of loop iterations (like i < 100000), the score changes and ArrayConcat_A gets faster.
And now I'm stuck here.
Comment 51 • 10 years ago (Assignee)

Fixed Part 4 to use a dedicated temp register instead of the output register, which can alias an input.
Attachment #8716587 - Attachment is obsolete: true
Comment 52 • 10 years ago (Assignee)

Now the Kraken score gets better by checking IsPackedArray and skipping the `k in E` check in that case (Part 2.1), thanks to jandem :)
Part 2.2 is just some possible cleanup.
Parts 3-5 from the previous posts are not used.
Here are the Kraken and Octane performance comparisons, thanks to bbouvier :)
Kraken benchmark score (average of 10 runs)
http://bnjbvr.github.io/simplegraph/#title=Kraken%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=audio-fft,stanford-crypto-pbkdf2,audio-beat-detection,stanford-crypto-ccm,imaging-darkroom,json-parse-financial,audio-oscillator,ai-astar,audio-dft,stanford-crypto-sha256-iterative,json-stringify-tinderbox,imaging-gaussian-blur,stanford-crypto-aes,imaging-desaturate&values=before%2047.8%2074.6%2088%2074.9%2081.6%2036.4%2063.8%2065.8%20109.2%2039.2%2041%2064.2%2051.8%2053%0Apart2%2046.4%2092.6%2089.4%2076.7%2082.1%2036.8%2064.2%2065.6%20109.2%2039.3%2041%2064.3%2051.3%2053.6%0Apart2.1%2046.6%2081.6%2088.3%2074.9%2081.9%2036.6%2064.1%2065.7%20109%2036.4%2038.8%2064.4%2051.5%2053.3%0Apart2.2%2046.6%2081.8%2088.6%2075%2082%2036.6%2063.9%2065.6%20109.4%2036.4%2038.7%2064.4%2051.5%2053.2
Octane benchmark score (average of 10 runs)
http://bnjbvr.github.io/simplegraph/#title=Octane%20benchmark%20score%20%28average%20of%2010%20runs%29&categories=Richards,DeltaBlue,Crypto,RayTrace,EarleyBoyer,RegExp,Splay,SplayLatency,NavierStokes,PdfJS,Mandreel,MandreelLatency,Gameboy,CodeLoad,Box2D,zlib,Typescript,Score&values=before%2032093%2064102%2030975%20117954%2032595%203880%2017804%2012523%2037782%2014402%2031255%2034548%2052844%2017265%2057200%2080568%2031340%2030397%0Apart2%2031867%2065302%2030837%20116592%2032310%203826%2017989%2013004%2037871%2014717%2031203%2033782%2053609%2017144%2057031%2080819%2031307%2030444%0Apart2.1%2032139%2065006%2030676%20116296%2032513%203876%2019411%2018949%2038127%2014807%2031053%2037245%2050540%2017517%2057930%2080317%2031216%2031416%0Apart2.2%2032204%2064728%2030976%20116430%2032611%203890%2018791%2017367%2038041%2014796%2030792%2038123%2050104%2017412%2057067%2080691%2031729%2031222
Kraken stanford-crypto-ccm is as fast as today's m-c, and the performance regression in stanford-crypto-pbkdf2 is about 9% (+7ms).
However, Octane SplayLatency and MandreelLatency now show a somewhat strange regression :/
The following are the SplayLatency results for each run, with Part 2.1:
13665
21148
21168
13144
13569
21102
21596
21513
21556
21026
and same for MandreelLatency:
38947
38947
38947
37767
37483
38348
37767
28984
37203
38055
Sometimes they show a nice score.
Also, the following are the SplayLatency results with m-c:
13224
12421
12574
12452
12379
11439
12123
12914
12839
12862
and same for MandreelLatency:
38947
28487
28816
28165
37483
38947
38055
28984
38645
38947
SplayLatency doesn't show scores around 21000 with m-c, so this is something I have to investigate next.
MandreelLatency shows scores around 28000 and 38000 with both m-c and Part 2.1, but there seems to be some difference in their distribution (maybe I have to check with more runs).
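The Part 2.1 idea (check IsPackedArray once per operand so the per-element `k in E` test can be skipped) can be sketched in plain JS; `isPacked` below is a stand-in for the IsPackedArray intrinsic, and `_DefineDataProperty` is modeled with plain stores:

```javascript
// Stand-in for IsPackedArray: a packed (dense, hole-free) array has every
// index from 0 to length-1 present.
function isPacked(arr) {
  for (let i = 0; i < arr.length; i++)
    if (!(i in arr)) return false;
  return true;
}

// Copy one concat operand E into A starting at index n, hoisting the hole
// check out of the loop, as Part 2.1 does.
function copyOperand(A, n, E) {
  const len = E.length;
  if (isPacked(E)) {
    for (let k = 0; k < len; k++, n++)
      A[n] = E[k];              // fast path: no per-element "k in E" test
  } else {
    for (let k = 0; k < len; k++, n++)
      if (k in E) A[n] = E[k];  // slow path: preserve holes
  }
  return n;
}

const out = [];
copyOperand(out, 0, [1, , 3]);    // sparse operand keeps its hole
console.log(1 in out);            // false
console.log(isPacked([1, 2, 3])); // true
```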
Comment 53 • 10 years ago (Assignee)

Again, it looks like an issue that depends on execution order.
SplayLatency scores for 20 runs, running splay.js only:
11699
12216
11524
16435
11666
15628
11565
12836
12246
11930
12039
11977
11938
11728
11779
11591
12053
11961
12594
11645
SplayLatency scores for 20 runs, running splay.js after regexp.js:
18539
18684
13321
14097
14019
12608
21694
18504
13918
14068
13870
14162
14233
13706
14287
18777
22132
13793
14151
21276
Comment 54 • 10 years ago (Assignee)

Just confirmed `SplayLatency: 22321` on m-c, so both the SplayLatency and MandreelLatency differences seem to be a matter of run-to-run variance.
Comment 55 • 10 years ago (Assignee)

Scratch that. I totally misread the Octane results.
They show bigger numbers when Part 2.1 is applied, which means it improves the performance.
Comment 56 • 10 years ago (Assignee)

Attachment #8707459 - Attachment is obsolete: true
Attachment #8715141 - Attachment is obsolete: true
Attachment #8716585 - Attachment is obsolete: true
Attachment #8716586 - Attachment is obsolete: true
Attachment #8716588 - Attachment is obsolete: true
Attachment #8716592 - Attachment is obsolete: true
Attachment #8716593 - Attachment is obsolete: true
Attachment #8716597 - Attachment is obsolete: true
Attachment #8721042 - Attachment is obsolete: true
Attachment #8721045 - Attachment is obsolete: true
Attachment #8721609 - Attachment is obsolete: true
Attachment #8724434 - Attachment is obsolete: true
Attachment #8725172 - Attachment is obsolete: true
Attachment #8725579 - Attachment is obsolete: true
Attachment #8725604 - Attachment is obsolete: true
Attachment #8725669 - Attachment is obsolete: true
Attachment #8725746 - Flags: review+
Comment 57 • 10 years ago (Assignee)

Folded Part 2.1 and 2.2 together, and updated the spec date.
Attachment #8725747 - Flags: review?(efaustbmo)
Comment 58 • 10 years ago (Assignee)

Sorry, I forgot to revert the unnecessary reordering of Step 2.
Attachment #8725747 - Attachment is obsolete: true
Attachment #8725747 - Flags: review?(efaustbmo)
Attachment #8725770 - Flags: review?(efaustbmo)
Comment 59 • 10 years ago (Assignee)

Comment on attachment 8725770 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.
Sorry again.
I should've checked IsArray (or at least IsObject) before IsPackedArray, as IsPackedArray accepts only objects.
Will run jstests/jit-tests first and then post a fixed patch.
Attachment #8725770 - Flags: review?(efaustbmo)
Comment 60 • 10 years ago (Assignee)

Moved the IsPackedArray check into the IsArray branch, as we will change the IsArray call there to IsConcatSpreadable, and the difference is only in the loop there.
IsConcatSpreadable also ensures the value is an object, so it can simply be replaced.
Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cb1f5318301
Attachment #8725770 - Attachment is obsolete: true
Attachment #8725896 - Flags: review?(efaustbmo)
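For orientation, here is a rough plain-JS sketch of the self-hosted shape under review. Names are approximations: std_Array, _DefineDataProperty, IsArray/IsConcatSpreadable, and IsPackedArray are modeled with standard JS, so this is an illustration of the algorithm, not the patch itself:

```javascript
// Approximate shape of the self-hosted Array.prototype.concat.
function arrayConcatSketch(...items) {
  const O = Object(this);          // Step 1: ToObject(this)
  const A = [];                    // Step 2: std_Array(0); ArraySpeciesCreate later
  let n = 0;                       // Step 3
  for (const E of [O, ...items]) { // |this| first, then each argument
    if (Array.isArray(E)) {        // will become IsConcatSpreadable
      const len = E.length;
      for (let k = 0; k < len; k++, n++) {
        if (k in E)                // IsPackedArray(E) lets this check be skipped
          A[n] = E[k];             // models _DefineDataProperty(A, n, E[k])
      }
    } else {
      A[n++] = E;                  // non-spreadable values are appended as-is
    }
  }
  A.length = n;                    // final length, including trailing holes
  return A;
}
console.log(arrayConcatSketch.call([1, 2], [3], 4)); // [ 1, 2, 3, 4 ]
```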
Comment 61 • 10 years ago
Comment on attachment 8725896 [details] [diff] [review]
Part 2: Self-host Array.prototype.concat.
Review of attachment 8725896 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
::: js/src/builtin/Array.js
@@ +876,5 @@
> + // Step 1.
> + var O = ToObject(this);
> +
> + // Step 2.
> + var A = std_Array(0);
This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I assume.
@@ +881,5 @@
> +
> + // Step 3.
> + var n = 0;
> +
> + // Step 4 (skipped).
nit: prefer (implicit in |arguments|), or something
::: js/src/tests/ecma_3/Array/regress-322135-02.js
@@ +1,1 @@
> +// |reftest| skip -- slow
Will we need to skip this? Do we want or need a followup bug to fix it?
::: js/src/vm/Xdr.h
@@ +35,4 @@
> static const uint32_t XDR_BYTECODE_VERSION =
> uint32_t(0xb973c0de - XDR_BYTECODE_VERSION_SUBTRAHEND);
>
> +static_assert(JSErr_Limit == 446,
I've had to review this for a long time, I guess. Luckily, we don't need this anymore
Attachment #8725896 -
Flags: review?(efaustbmo) → review+
Comment 62 • 10 years ago (Assignee)
Thank you for reviewing :D
(In reply to Eric Faust [:efaust] from comment #61)
> Comment on attachment 8725896 [details] [diff] [review]
> Part 2: Self-host Array.prototype.concat.
>
> Review of attachment 8725896 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Looks good!
>
> ::: js/src/builtin/Array.js
> @@ +876,5 @@
> > + // Step 1.
> > + var O = ToObject(this);
> > +
> > + // Step 2.
> > + var A = std_Array(0);
>
> This will go to ArraySpeciesCreate(0,O) in the other bug I'm reviewing, I
> assume.
Yes :)
> ::: js/src/tests/ecma_3/Array/regress-322135-02.js
> @@ +1,1 @@
> > +// |reftest| skip -- slow
>
> Do we will need to skip this? Do we want or need a followup bug to fix this?
Yes, we need to skip it for now.
It's related to bug 1234947; not sure if that bug can be fixed easily.
The test is slow because `4294967295` doesn't fit in int32_t, so Ion cannot use Int32 for the loop in concat.
> var length = 4294967295;
> var array1 = new Array(length);
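The range problem behind the slowness can be seen with plain number arithmetic (a sketch of the int32_t limit, not engine code):

```javascript
// 4294967295 (2**32 - 1) is a valid Array length but exceeds the int32_t
// maximum of 2**31 - 1, so Ion's Int32 loop indices cannot represent it.
const arrayLength = 4294967295;
const INT32_MAX = 2 ** 31 - 1;
console.log(arrayLength > INT32_MAX); // true: the loop bound overflows int32
console.log(arrayLength | 0);         // -1: the value does not survive int32 coercion
```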
Comment 63 • 10 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/17a5a95bf6cc6ddf4251dec76201baf20bbd3452
Bug 1233642 - Part 1: Add IsArray intrinsic. r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/b535cc24f7d0b2703a43cf43fa371c6087dbb5e4
Bug 1233642 - Part 2: Self-host Array.prototype.concat. r=efaust
Comment 64 • 10 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc9e586a8f1944d87e53f601d41ea860b272a3bc
Backed out changeset b535cc24f7d0 (bug 1233642)
https://hg.mozilla.org/integration/mozilla-inbound/rev/203c7e4b8b20e9912f330ac630520508ec1e2f98
Backed out changeset 17a5a95bf6cc (bug 1233642)
Comment 65 • 10 years ago (Assignee)

In Part 2, Array#concat was changed from native to self-hosted, which causes xray to expose the Array.concat generic.
It looks like other self-hosted generics are also not in the skip list, so I think I can remove concat from there too.
What do you think?
Attachment #8736028 - Flags: review?(efaustbmo)
Comment 66 • 10 years ago
Comment on attachment 8736028 [details] [diff] [review]
Part 2.1: Exclude concat from Array ctor property skip list in xray.
Review of attachment 8736028 [details] [diff] [review]:
-----------------------------------------------------------------
Forwarding to an xpconnect peer.
Attachment #8736028 - Flags: review?(efaustbmo) → review?(bobbyholley)

Updated • 10 years ago

Attachment #8736028 - Flags: review?(bobbyholley) → review+
Comment 67 • 9 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef8f61469172c5f62bf990e1b38f5c25526ce3a0
Bug 1233642 - Part 1: Add IsArray intrinsic. r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/70e78d669f9de949dcbe972f74742045ca094fae
Bug 1233642 - Part 2: Self-host Array.prototype.concat. r=efaust,bholley
Comment 68 • 9 years ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/ef8f61469172
https://hg.mozilla.org/mozilla-central/rev/70e78d669f9d
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48