Closed Bug 1294606 Opened 9 years ago Closed 9 years ago

Folds Lsh/Rsh same bits to SignExntend

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: hev, Assigned: hev)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 4 obsolete files)

e.g. original instructions: Lsh v1, v0, 24 Rsh v2, v1, 24 folds to sign-extend: SignExtend(byte) v2, v0
Simulate Test Results: ==== x86_64 Intel i7-4790 3.60GHz ==== Source code: // test.S // gcc -O3 -o test test.S .text .global main main: movabs $0x6fc23ac00,%rdx mov $0xff,%ecx 1: #if 0 movsbl %cxl,%ecx #else sall $24, %ecx sarl $24, %ecx #endif sub $0x1,%rdx jne 1b xor %eax,%eax retq .end main .size main, . - main Results (Unit: second) Test Loop count = 30000000000: Sign-extned: 8.497, 8.283, 8.083, 8.103, 8.463, 8.613, 8.137, 8.827, 8.533, 8.267 Average: 8.3806 Shift : 15.067, 15.18, 15.09, 15.097, 15.243, 15.163, 15.027, 15.027, 15.043, 15.03 Average: 15.0967 ==== ARM Rpi2 Model-B ARMv7 900MHz ==== Source code: // test.S // gcc -O3 -o test test.S .text .align 2 .global main main: movw r0, #0x5e00 movt r0, #0xb2d0 mov r3, #0xff 1: #if 0 sxtb r3, r3 #else mov r3, r3, asl #24 mov r3, r3, asr #24 #endif subs r0, r0, #1 bne 1b bx lr .end main .size main, . - main Results (Unit: second) Test Loop count = 3000000000: Sign-extned: 6.84, 6.69, 6.71, 6.68, 6.7, 6.71, 6.71, 6.69, 6.71, 6.69 Average: 6.713 Shift : 13.53 ,13.39, 13.42, 13.39, 13.42, 13.38, 13.42, 13.39, 13.39, 13.39 Average: 13.412 ==== MIPS Loongson-3A2000 1.0GHz & Ci20 1.0GHz ==== Source code: // test.S // gcc -O3 -o test test.S .text .align 2 .global main .ent main, 0 .type main, @function main: li $v0, 3000000000 li $t0, 0xff 1: #if 0 seb $t0, $t0 #else sll $t0, 24 sra $t0, 24 #endif addiu $v0, -1 bnez $v0, 1b jr $ra .end main .size main, . - main Results (Unit: second) Test Loop count = 3000000000: Loongson-3A2000: Sign-extned: 6.02, 6.023, 6.02, 6.02, 6.02, 6.023, 6.02, 6.023, 6.016, 6.02 Average: 6.0205 Shift : 9.031, 9.027, 9.031, 9.027, 9.027, 9.031, 9.027, 9.031, 9.031, 9.031 Average: 9.0294 Ci20: Sign-extned: 12.51, 12.53, 12.52, 12.54, 12.53, 12.53, 12.52, 12.54, 12.53, 12.53 Average: 12.528 Shift : 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51 Average: 12.51
Comment on attachment 8780423 [details] [diff] [review] 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch Review of attachment 8780423 [details] [diff] [review]: ----------------------------------------------------------------- Nice work, the main remark from nits bellow is that this code is too redundant, and you can factor most of it and leave the differences for the */MacroAssembler-*-inl.h implementations. ::: js/src/jit/MIR.cpp @@ +3004,5 @@ > +{ > + MDefinition* lhs = getOperand(0); > + MDefinition* rhs = getOperand(1); > + > + if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op()) nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is more common in the code base, and less verbose. ::: js/src/jit/MIR.h @@ +5994,5 @@ > + public: > + enum Mode { > + Byte, > + Half, > + Word, nit: Word is not handled anywhere, remove it, and remove the default clauses from the switch statements, as these would generate a compiler warning if the switch is not complete. @@ +6014,5 @@ > + return new(alloc) MSignExtend(op, mode); > + } > + static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) { > + return new(alloc) MSignExtend(op, mode); > + } nit: replace these New functions, including the NewAsmJS by the TRIVIAL_NEW_WRAPPER macro. @@ +6018,5 @@ > + } > + > + Mode mode() { return mode_; } > + > + ALLOW_CLONE(MSignExtend) follow-up: Add a recover instruction for MSignExtend. (see js/src/jit/Recover.h) ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp @@ +821,5 @@ > + Register out = ToRegister(ins->output()); > + > + switch (ins->mode()) { > + case MSignExtend::Byte: > + masm.as_seb(out, input); nit: Move this CodeGenerator function to the generic CodeGenerator, and add the following signature to the MacroAssembler.h file: inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH; Add these functions under the "Move instructions" section. ::: js/src/jit/x86/Lowering-x86.cpp @@ +700,5 @@ > + > +void > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins) > +{ > + define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins); Looking at the Intel documentation does not seem to mention any specific register allocation for MOVSX instructions. nit: This implies that we have the same LIRGenetator for all platform, thus move this to the Lowering.cpp file instead of having one for each architecture.
Attachment #8780423 - Flags: review?(nicolas.b.pierron) → feedback+
Thank you for review. :) (In reply to Nicolas B. Pierron [:nbp] from comment #3) > Comment on attachment 8780423 [details] [diff] [review] > 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch > > Review of attachment 8780423 [details] [diff] [review]: > ----------------------------------------------------------------- > > Nice work, the main remark from nits bellow is that this code is too > redundant, and you can factor most of it and leave the differences for the > */MacroAssembler-*-inl.h implementations. > > ::: js/src/jit/MIR.cpp > @@ +3004,5 @@ > > +{ > > + MDefinition* lhs = getOperand(0); > > + MDefinition* rhs = getOperand(1); > > + > > + if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op()) > > nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is > more common in the code base, and less verbose. > > ::: js/src/jit/MIR.h > @@ +5994,5 @@ > > + public: > > + enum Mode { > > + Byte, > > + Half, > > + Word, > > nit: Word is not handled anywhere, remove it, and remove the default clauses > from the switch statements, as these would generate a compiler warning if > the switch is not complete. > > @@ +6014,5 @@ > > + return new(alloc) MSignExtend(op, mode); > > + } > > + static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) { > > + return new(alloc) MSignExtend(op, mode); > > + } > > nit: replace these New functions, including the NewAsmJS by the > TRIVIAL_NEW_WRAPPER macro. > > @@ +6018,5 @@ > > + } > > + > > + Mode mode() { return mode_; } > > + > > + ALLOW_CLONE(MSignExtend) > > follow-up: Add a recover instruction for MSignExtend. (see > js/src/jit/Recover.h) How to verify recover instruction is correct? > > ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp > @@ +821,5 @@ > > + Register out = ToRegister(ins->output()); > > + > > + switch (ins->mode()) { > > + case MSignExtend::Byte: > > + masm.as_seb(out, input); > > nit: Move this CodeGenerator function to the generic CodeGenerator, and add > the following signature to the MacroAssembler.h file: > > inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH > inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH; > > Add these functions under the "Move instructions" section. > > ::: js/src/jit/x86/Lowering-x86.cpp > @@ +700,5 @@ > > + > > +void > > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins) > > +{ > > + define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins); > > Looking at the Intel documentation does not seem to mention any specific > register allocation for MOVSX instructions. At the start, The LIR::visitSignExtend is a common function, but jit-test is busted on x86 (32-bit). Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a subset of GPRs can be allocated here, not only %ebx. Do you know which use?? function is appropriate? https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358 movsbl %dil,%ecx test.S: Assembler messages: test.S:8: Error: bad register name `%dil' or js/src/jit/Lowering.cpp: void LIRGenerator::visitSignExtend(MSignExtend* ins) { #ifdef JS_CODEGEN_X86 define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins); #else define(new(alloc()) LSignExtend(useRegisterAtStart(ins->input()), ins->mode()), ins); #endif } What do you think? > > nit: This implies that we have the same LIRGenetator for all platform, thus > move this to the Lowering.cpp file instead of having one for each > architecture.
Flags: needinfo?(nicolas.b.pierron)
Attachment #8780423 - Attachment is obsolete: true
(In reply to Heiher [:hev] from comment #4) > (In reply to Nicolas B. Pierron [:nbp] from comment #3) > > > + ALLOW_CLONE(MSignExtend) > > > > follow-up: Add a recover instruction for MSignExtend. (see > > js/src/jit/Recover.h) > > How to verify recover instruction is correct? Have a look at http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-with-rinstructions.js > > Looking at the Intel documentation does not seem to mention any specific > > register allocation for MOVSX instructions. > > At the start, The LIR::visitSignExtend is a common function, but jit-test is > busted on x86 (32-bit). > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a > subset of GPRs can be allocated here, not only %ebx. Do you know which use?? > function is appropriate? > > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358 > > movsbl %dil,%ecx > > test.S: Assembler messages: > test.S:8: Error: bad register name `%dil' > […] > > What do you think? Currently the register allocator is not aware of byte-size GPR. Currently we have a function named useByteOpRegister [1] which either takes any register on platform where this is doable (such as x64), or a fixed register (such as x86). Unfortunately, this function does not make use of the AtStart flag. I suggest to either add an enum optional argument for AtStart flag, or copy these useByteOpRegister to make a AtStart variant. [1] http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-32
Flags: needinfo?(nicolas.b.pierron)
Attachment #8781468 - Attachment is obsolete: true
Attachment #8781524 - Flags: review?(nicolas.b.pierron)
Attachment #8781533 - Flags: review?(nicolas.b.pierron)
Attachment #8781534 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #6) > (In reply to Heiher [:hev] from comment #4) > > (In reply to Nicolas B. Pierron [:nbp] from comment #3) > > > > + ALLOW_CLONE(MSignExtend) > > > > > > follow-up: Add a recover instruction for MSignExtend. (see > > > js/src/jit/Recover.h) > > > > How to verify recover instruction is correct? > > Have a look at > http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce- > with-rinstructions.js > > > > Looking at the Intel documentation does not seem to mention any specific > > > register allocation for MOVSX instructions. > > > > At the start, The LIR::visitSignExtend is a common function, but jit-test is > > busted on x86 (32-bit). > > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. > > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a > > subset of GPRs can be allocated here, not only %ebx. Do you know which use?? > > function is appropriate? > > > > https://treeherder.mozilla.org/#/ > > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358 > > > > movsbl %dil,%ecx > > > > test.S: Assembler messages: > > test.S:8: Error: bad register name `%dil' > > […] > > > > What do you think? > > Currently the register allocator is not aware of byte-size GPR. Currently > we have a function named useByteOpRegister [1] which either takes any > register on platform where this is doable (such as x64), or a fixed register > (such as x86). > > Unfortunately, this function does not make use of the AtStart flag. I > suggest to either add an enum optional argument for AtStart flag, or copy > these useByteOpRegister to make a AtStart variant. > > [1] > http://searchfox.org/mozilla-central/rev/ > 9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28- > 32 Thank you reply. :)
Fix build failure on arm.
Attachment #8781533 - Attachment is obsolete: true
Attachment #8781533 - Flags: review?(nicolas.b.pierron)
Attachment #8781853 - Flags: review?(nicolas.b.pierron)
Attachment #8781524 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8781534 [details] [diff] [review] Part 3 - Add tests for SignExtend. Review of attachment 8781534 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit-test/tests/ion/dce-with-rinstructions.js @@ +165,5 @@ > + if (uceFault_signextend(i) || uceFault_signextend(i)) > + assertEq(x, 99 /* = (99 << 24) >> 24 */); > + var y = (i << 16) >> 16; > + if (uceFault_signextend(i) || uceFault_signextend(i)) > + assertEq(y, 99 /* = (99 << 16) >> 16 */); This assert won't work as expected, create a new function to test the shifts by 16. Also add 2 new functions to test with negative numbers. (-1 * i) The reason this does not work as expect is that "uceFault" ensure the removal of the branch and the assertion is made to check the Recover instruction result. In this case, recover instructions are executed during the bailout path, thus the first assertion will used the recover instruction path, while the second will compute the result within baseline.
Attachment #8781534 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8781853 [details] [diff] [review] Part 2 - Folds Lsh/Rsh same bits to SignExntend. Review of attachment 8781853 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/Lowering.cpp @@ +1307,5 @@ > > void > +LIRGenerator::visitSignExtend(MSignExtend* ins) > +{ > + define(new(alloc()) LSignExtend(useByteOpRegisterAtStart(ins->input()), ins->mode()), ins); nit: Only use useByteOpRegisterAtStart for MSignExtend::Byte. ::: js/src/vm/Interpreter-inl.h @@ +766,5 @@ > +template <typename T> > +static MOZ_ALWAYS_INLINE bool > +SignExtendOperation(JSContext* cx, HandleValue in, int* out) > +{ > + int i; nit: int32_t
Attachment #8781853 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8781534 - Attachment is obsolete: true
Attachment #8782723 - Flags: review?(nicolas.b.pierron)
Attachment #8782723 - Flags: review?(nicolas.b.pierron) → review+
Pushed by r@hev.cc: https://hg.mozilla.org/integration/mozilla-inbound/rev/3172e3fa6e24 Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/defd76119eda Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/4d05a40172ca Part 3: Add tests for SignExtend. r=nbp
sorry had to back this out for Spider Monkey ARM64 bustage, e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-inbound#L979
Flags: needinfo?(r)
Backout by ihsiao@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/835eee39bb16 Backed out changeset 4d05a40172ca for Spider Monkey ARM64 bustage https://hg.mozilla.org/integration/mozilla-inbound/rev/6568553f7432 Backed out changeset defd76119eda https://hg.mozilla.org/integration/mozilla-inbound/rev/c9dcfd11f12d Backed out changeset 3172e3fa6e24
Pushed by r@hev.cc: https://hg.mozilla.org/integration/mozilla-inbound/rev/b67594606d54 Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/803ae1fb9740 Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/d547167d7b56 Part 3: Add tests for SignExtend. r=nbp
(In reply to Iris Hsiao [:ihsiao] from comment #18) > sorry had to back this out for Spider Monkey ARM64 bustage, e.g., > https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla- > inbound#L979 Fixed, thank you.
Flags: needinfo?(r)
Hi, I'm running into severe compilation time issues with recent versions of SpiderMonkey when running a (very) large asm.js file. I've bisected the problem down to patch part 2 of this bug. My first guess at what's going wrong is that the new MRSh::foldsTo function just gives up if it's not looking at a sign extension, rather than calling MBinaryBitwiseInstruction::foldsTo. The details: I'm working on a gcc (binutils,glibc) backend for asmjs (and wasm) at https://github.com/pipcet/asmjs. I'm producing large switch statements, one for each function, rather than using an Emscripten-like relooper. When compiling Perl, the Perl_yylex function turns into about 30,000 lines of code. When switching from SpiderMonkey-as-of-a-few-months ago to the current inbound branch, compilation time drastically increased, from one minute real time (which is painful to begin with) to about three minutes. Most of that time was indeed spent compiling Perl_yylex, and the problem appeared to be in the Ion register allocators. --wasm-always-baseline resulted in acceptable startup times, and nodejs --harmony similarly didn't seem to be too problematic. I bisected it down to part 2 of the patch for this bug, noticed the lack of inheritance for ::foldsTo, tried fixing it, and performance appears to have gone back to the previous level. I've attached a proposed patch which applies to the current (well, a few days old) inbound tree. If it looks okay I'll convert to SVN format.
I believe this fixes a minor issue that was causing major compile time performance degradation.
pipcet, thanks for the patch! Can you open a new bug for this and ask Nicolas (the reviewer of the patches in this bug) for feedback on your patch? This bug is closed as the original issue it was about is fixed and all patches have been landed. Attaching a new patch here runs the risk of it being overlooked.
Flags: needinfo?(pipcet)
Thanks! I've done that at bug 1318926. Do let me know if there's anything else I can do (I'll try whittling down the problematic script to something manageable in size to see whether that still triggers the bug).
Flags: needinfo?(pipcet)
Blocks: sm-js-perf
Priority: -- → P1
Duplicate of this bug: 1022946
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: