Closed
Bug 1294606
Opened 9 years ago
Closed 9 years ago
Folds Lsh/Rsh same bits to SignExntend
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: hev, Assigned: hev)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 4 obsolete files)
7.08 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
13.60 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
2.29 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
466 bytes,
patch
|
Details | Diff | Splinter Review |
e.g.
original instructions:
Lsh v1, v0, 24
Rsh v2, v1, 24
folds to sign-extend:
SignExtend(byte) v2, v0
Assignee | ||
Comment 1•9 years ago
|
||
Simulate Test Results:
==== x86_64 Intel i7-4790 3.60GHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
.text
.global main
main:
movabs $0x6fc23ac00,%rdx
mov $0xff,%ecx
1:
#if 0
movsbl %cxl,%ecx
#else
sall $24, %ecx
sarl $24, %ecx
#endif
sub $0x1,%rdx
jne 1b
xor %eax,%eax
retq
.end main
.size main, . - main
Results (Unit: second) Test Loop count = 30000000000:
Sign-extned: 8.497, 8.283, 8.083, 8.103, 8.463, 8.613, 8.137, 8.827, 8.533, 8.267
Average: 8.3806
Shift : 15.067, 15.18, 15.09, 15.097, 15.243, 15.163, 15.027, 15.027, 15.043, 15.03
Average: 15.0967
==== ARM Rpi2 Model-B ARMv7 900MHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
.text
.align 2
.global main
main:
movw r0, #0x5e00
movt r0, #0xb2d0
mov r3, #0xff
1:
#if 0
sxtb r3, r3
#else
mov r3, r3, asl #24
mov r3, r3, asr #24
#endif
subs r0, r0, #1
bne 1b
bx lr
.end main
.size main, . - main
Results (Unit: second) Test Loop count = 3000000000:
Sign-extned: 6.84, 6.69, 6.71, 6.68, 6.7, 6.71, 6.71, 6.69, 6.71, 6.69
Average: 6.713
Shift : 13.53 ,13.39, 13.42, 13.39, 13.42, 13.38, 13.42, 13.39, 13.39, 13.39
Average: 13.412
==== MIPS Loongson-3A2000 1.0GHz & Ci20 1.0GHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
.text
.align 2
.global main
.ent main, 0
.type main, @function
main:
li $v0, 3000000000
li $t0, 0xff
1:
#if 0
seb $t0, $t0
#else
sll $t0, 24
sra $t0, 24
#endif
addiu $v0, -1
bnez $v0, 1b
jr $ra
.end main
.size main, . - main
Results (Unit: second) Test Loop count = 3000000000:
Loongson-3A2000:
Sign-extned: 6.02, 6.023, 6.02, 6.02, 6.02, 6.023, 6.02, 6.023, 6.016, 6.02
Average: 6.0205
Shift : 9.031, 9.027, 9.031, 9.027, 9.027, 9.031, 9.027, 9.031, 9.031, 9.031
Average: 9.0294
Ci20:
Sign-extned: 12.51, 12.53, 12.52, 12.54, 12.53, 12.53, 12.52, 12.54, 12.53, 12.53
Average: 12.528
Shift : 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51
Average: 12.51
Assignee | ||
Comment 2•9 years ago
|
||
Attachment #8780423 -
Flags: review?(nicolas.b.pierron)
Comment 3•9 years ago
|
||
Comment on attachment 8780423 [details] [diff] [review]
0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch
Review of attachment 8780423 [details] [diff] [review]:
-----------------------------------------------------------------
Nice work, the main remark from nits bellow is that this code is too redundant, and you can factor most of it and leave the differences for the */MacroAssembler-*-inl.h implementations.
::: js/src/jit/MIR.cpp
@@ +3004,5 @@
> +{
> + MDefinition* lhs = getOperand(0);
> + MDefinition* rhs = getOperand(1);
> +
> + if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op())
nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is more common in the code base, and less verbose.
::: js/src/jit/MIR.h
@@ +5994,5 @@
> + public:
> + enum Mode {
> + Byte,
> + Half,
> + Word,
nit: Word is not handled anywhere, remove it, and remove the default clauses from the switch statements, as these would generate a compiler warning if the switch is not complete.
@@ +6014,5 @@
> + return new(alloc) MSignExtend(op, mode);
> + }
> + static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) {
> + return new(alloc) MSignExtend(op, mode);
> + }
nit: replace these New functions, including the NewAsmJS by the TRIVIAL_NEW_WRAPPER macro.
@@ +6018,5 @@
> + }
> +
> + Mode mode() { return mode_; }
> +
> + ALLOW_CLONE(MSignExtend)
follow-up: Add a recover instruction for MSignExtend. (see js/src/jit/Recover.h)
::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +821,5 @@
> + Register out = ToRegister(ins->output());
> +
> + switch (ins->mode()) {
> + case MSignExtend::Byte:
> + masm.as_seb(out, input);
nit: Move this CodeGenerator function to the generic CodeGenerator, and add the following signature to the MacroAssembler.h file:
inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH
inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH;
Add these functions under the "Move instructions" section.
::: js/src/jit/x86/Lowering-x86.cpp
@@ +700,5 @@
> +
> +void
> +LIRGeneratorX86::visitSignExtend(MSignExtend* ins)
> +{
> + define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
Looking at the Intel documentation does not seem to mention any specific register allocation for MOVSX instructions.
nit: This implies that we have the same LIRGenetator for all platform, thus move this to the Lowering.cpp file instead of having one for each architecture.
Attachment #8780423 -
Flags: review?(nicolas.b.pierron) → feedback+
Assignee | ||
Comment 4•9 years ago
|
||
Thank you for review. :)
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> Comment on attachment 8780423 [details] [diff] [review]
> 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch
>
> Review of attachment 8780423 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Nice work, the main remark from nits bellow is that this code is too
> redundant, and you can factor most of it and leave the differences for the
> */MacroAssembler-*-inl.h implementations.
>
> ::: js/src/jit/MIR.cpp
> @@ +3004,5 @@
> > +{
> > + MDefinition* lhs = getOperand(0);
> > + MDefinition* rhs = getOperand(1);
> > +
> > + if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op())
>
> nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is
> more common in the code base, and less verbose.
>
> ::: js/src/jit/MIR.h
> @@ +5994,5 @@
> > + public:
> > + enum Mode {
> > + Byte,
> > + Half,
> > + Word,
>
> nit: Word is not handled anywhere, remove it, and remove the default clauses
> from the switch statements, as these would generate a compiler warning if
> the switch is not complete.
>
> @@ +6014,5 @@
> > + return new(alloc) MSignExtend(op, mode);
> > + }
> > + static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) {
> > + return new(alloc) MSignExtend(op, mode);
> > + }
>
> nit: replace these New functions, including the NewAsmJS by the
> TRIVIAL_NEW_WRAPPER macro.
>
> @@ +6018,5 @@
> > + }
> > +
> > + Mode mode() { return mode_; }
> > +
> > + ALLOW_CLONE(MSignExtend)
>
> follow-up: Add a recover instruction for MSignExtend. (see
> js/src/jit/Recover.h)
How to verify recover instruction is correct?
>
> ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
> @@ +821,5 @@
> > + Register out = ToRegister(ins->output());
> > +
> > + switch (ins->mode()) {
> > + case MSignExtend::Byte:
> > + masm.as_seb(out, input);
>
> nit: Move this CodeGenerator function to the generic CodeGenerator, and add
> the following signature to the MacroAssembler.h file:
>
> inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH
> inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH;
>
> Add these functions under the "Move instructions" section.
>
> ::: js/src/jit/x86/Lowering-x86.cpp
> @@ +700,5 @@
> > +
> > +void
> > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins)
> > +{
> > + define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
>
> Looking at the Intel documentation does not seem to mention any specific
> register allocation for MOVSX instructions.
At the start, The LIR::visitSignExtend is a common function, but jit-test is busted on x86 (32-bit).
Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a subset of GPRs can be allocated here, not only %ebx. Do you know which use?? function is appropriate?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
movsbl %dil,%ecx
test.S: Assembler messages:
test.S:8: Error: bad register name `%dil'
or
js/src/jit/Lowering.cpp:
void
LIRGenerator::visitSignExtend(MSignExtend* ins)
{
#ifdef JS_CODEGEN_X86
define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
#else
define(new(alloc()) LSignExtend(useRegisterAtStart(ins->input()), ins->mode()), ins);
#endif
}
What do you think?
>
> nit: This implies that we have the same LIRGenetator for all platform, thus
> move this to the Lowering.cpp file instead of having one for each
> architecture.
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 5•9 years ago
|
||
Attachment #8780423 -
Attachment is obsolete: true
Comment 6•9 years ago
|
||
(In reply to Heiher [:hev] from comment #4)
> (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > > + ALLOW_CLONE(MSignExtend)
> >
> > follow-up: Add a recover instruction for MSignExtend. (see
> > js/src/jit/Recover.h)
>
> How to verify recover instruction is correct?
Have a look at
http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-with-rinstructions.js
> > Looking at the Intel documentation does not seem to mention any specific
> > register allocation for MOVSX instructions.
>
> At the start, The LIR::visitSignExtend is a common function, but jit-test is
> busted on x86 (32-bit).
> Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g.
> movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a
> subset of GPRs can be allocated here, not only %ebx. Do you know which use??
> function is appropriate?
>
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
>
> movsbl %dil,%ecx
>
> test.S: Assembler messages:
> test.S:8: Error: bad register name `%dil'
> […]
>
> What do you think?
Currently the register allocator is not aware of byte-size GPR. Currently we have a function named useByteOpRegister [1] which either takes any register on platform where this is doable (such as x64), or a fixed register (such as x86).
Unfortunately, this function does not make use of the AtStart flag. I suggest to either add an enum optional argument for AtStart flag, or copy these useByteOpRegister to make a AtStart variant.
[1] http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-32
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 7•9 years ago
|
||
Attachment #8781468 -
Attachment is obsolete: true
Attachment #8781524 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 8•9 years ago
|
||
Attachment #8781533 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 9•9 years ago
|
||
Attachment #8781534 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 10•9 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> (In reply to Heiher [:hev] from comment #4)
> > (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > > > + ALLOW_CLONE(MSignExtend)
> > >
> > > follow-up: Add a recover instruction for MSignExtend. (see
> > > js/src/jit/Recover.h)
> >
> > How to verify recover instruction is correct?
>
> Have a look at
> http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-
> with-rinstructions.js
>
> > > Looking at the Intel documentation does not seem to mention any specific
> > > register allocation for MOVSX instructions.
> >
> > At the start, The LIR::visitSignExtend is a common function, but jit-test is
> > busted on x86 (32-bit).
> > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g.
> > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a
> > subset of GPRs can be allocated here, not only %ebx. Do you know which use??
> > function is appropriate?
> >
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
> >
> > movsbl %dil,%ecx
> >
> > test.S: Assembler messages:
> > test.S:8: Error: bad register name `%dil'
> > […]
> >
> > What do you think?
>
> Currently the register allocator is not aware of byte-size GPR. Currently
> we have a function named useByteOpRegister [1] which either takes any
> register on platform where this is doable (such as x64), or a fixed register
> (such as x86).
>
> Unfortunately, this function does not make use of the AtStart flag. I
> suggest to either add an enum optional argument for AtStart flag, or copy
> these useByteOpRegister to make a AtStart variant.
>
> [1]
> http://searchfox.org/mozilla-central/rev/
> 9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-
> 32
Thank you reply. :)
Assignee | ||
Comment 11•9 years ago
|
||
Assignee | ||
Comment 12•9 years ago
|
||
Fix build failure on arm.
Attachment #8781533 -
Attachment is obsolete: true
Attachment #8781533 -
Flags: review?(nicolas.b.pierron)
Attachment #8781853 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 13•9 years ago
|
||
Updated•9 years ago
|
Attachment #8781524 -
Flags: review?(nicolas.b.pierron) → review+
Comment 14•9 years ago
|
||
Comment on attachment 8781534 [details] [diff] [review]
Part 3 - Add tests for SignExtend.
Review of attachment 8781534 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit-test/tests/ion/dce-with-rinstructions.js
@@ +165,5 @@
> + if (uceFault_signextend(i) || uceFault_signextend(i))
> + assertEq(x, 99 /* = (99 << 24) >> 24 */);
> + var y = (i << 16) >> 16;
> + if (uceFault_signextend(i) || uceFault_signextend(i))
> + assertEq(y, 99 /* = (99 << 16) >> 16 */);
This assert won't work as expected, create a new function to test the shifts by 16.
Also add 2 new functions to test with negative numbers. (-1 * i)
The reason this does not work as expect is that "uceFault" ensure the removal of the branch and the assertion is made to check the Recover instruction result. In this case, recover instructions are executed during the bailout path, thus the first assertion will used the recover instruction path, while the second will compute the result within baseline.
Attachment #8781534 -
Flags: review?(nicolas.b.pierron)
Comment 15•9 years ago
|
||
Comment on attachment 8781853 [details] [diff] [review]
Part 2 - Folds Lsh/Rsh same bits to SignExntend.
Review of attachment 8781853 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/Lowering.cpp
@@ +1307,5 @@
>
> void
> +LIRGenerator::visitSignExtend(MSignExtend* ins)
> +{
> + define(new(alloc()) LSignExtend(useByteOpRegisterAtStart(ins->input()), ins->mode()), ins);
nit: Only use useByteOpRegisterAtStart for MSignExtend::Byte.
::: js/src/vm/Interpreter-inl.h
@@ +766,5 @@
> +template <typename T>
> +static MOZ_ALWAYS_INLINE bool
> +SignExtendOperation(JSContext* cx, HandleValue in, int* out)
> +{
> + int i;
nit: int32_t
Attachment #8781853 -
Flags: review?(nicolas.b.pierron) → review+
Assignee | ||
Comment 16•9 years ago
|
||
Attachment #8781534 -
Attachment is obsolete: true
Attachment #8782723 -
Flags: review?(nicolas.b.pierron)
Updated•9 years ago
|
Attachment #8782723 -
Flags: review?(nicolas.b.pierron) → review+
Comment 17•9 years ago
|
||
Pushed by r@hev.cc:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3172e3fa6e24
Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/defd76119eda
Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d05a40172ca
Part 3: Add tests for SignExtend. r=nbp
![]() |
||
Comment 18•9 years ago
|
||
sorry had to back this out for Spider Monkey ARM64 bustage, e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-inbound#L979
Flags: needinfo?(r)
Comment 19•9 years ago
|
||
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/835eee39bb16
Backed out changeset 4d05a40172ca for Spider Monkey ARM64 bustage
https://hg.mozilla.org/integration/mozilla-inbound/rev/6568553f7432
Backed out changeset defd76119eda
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9dcfd11f12d
Backed out changeset 3172e3fa6e24
Comment 20•9 years ago
|
||
Pushed by r@hev.cc:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b67594606d54
Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/803ae1fb9740
Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/d547167d7b56
Part 3: Add tests for SignExtend. r=nbp
Assignee | ||
Comment 21•9 years ago
|
||
(In reply to Iris Hsiao [:ihsiao] from comment #18)
> sorry had to back this out for Spider Monkey ARM64 bustage, e.g.,
> https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-
> inbound#L979
Fixed, thank you.
Flags: needinfo?(r)
Comment 22•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b67594606d54
https://hg.mozilla.org/mozilla-central/rev/803ae1fb9740
https://hg.mozilla.org/mozilla-central/rev/d547167d7b56
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 23•9 years ago
|
||
Hi,
I'm running into severe compilation time issues with recent versions of SpiderMonkey when running a (very) large asm.js file. I've bisected the problem down to patch part 2 of this bug. My first guess at what's going wrong is that the new MRSh::foldsTo function just gives up if it's not looking at a sign extension, rather than calling MBinaryBitwiseInstruction::foldsTo.
The details:
I'm working on a gcc (binutils,glibc) backend for asmjs (and wasm) at https://github.com/pipcet/asmjs. I'm producing large switch statements, one for each function, rather than using an Emscripten-like relooper.
When compiling Perl, the Perl_yylex function turns into about 30,000 lines of code. When switching from SpiderMonkey-as-of-a-few-months ago to the current inbound branch, compilation time drastically increased, from one minute real time (which is painful to begin with) to about three minutes. Most of that time was indeed spent compiling Perl_yylex, and the problem appeared to be in the Ion register allocators. --wasm-always-baseline resulted in acceptable startup times, and nodejs --harmony similarly didn't seem to be too problematic.
I bisected it down to part 2 of the patch for this bug, noticed the lack of inheritance for ::foldsTo, tried fixing it, and performance appears to have gone back to the previous level.
I've attached a proposed patch which applies to the current (well, a few days old) inbound tree. If it looks okay I'll convert to SVN format.
Comment 24•9 years ago
|
||
I believe this fixes a minor issue that was causing major compile time performance degradation.
Comment 25•9 years ago
|
||
pipcet, thanks for the patch! Can you open a new bug for this and ask Nicolas (the reviewer of the patches in this bug) for feedback on your patch?
This bug is closed as the original issue it was about is fixed and all patches have been landed. Attaching a new patch here runs the risk of it being overlooked.
Flags: needinfo?(pipcet)
Comment 26•9 years ago
|
||
Thanks! I've done that at bug 1318926. Do let me know if there's anything else I can do (I'll try whittling down the problematic script to something manageable in size to see whether that still triggers the bug).
Flags: needinfo?(pipcet)
Updated•9 years ago
|
Blocks: sm-js-perf
Priority: -- → P1
You need to log in
before you can comment on or make changes to this bug.
Description
•