Skip to content

Enhancement of parser operand handling, compaction of instruction dumper #488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 28, 2015

Conversation

ruben-ayrapetyan
Copy link
Contributor

Binary size (Linux armv7-hf): 198328 -> 194232

Benchmark Rss Perf
Benchmark Rss Perf
./sunspider-1.0.2.stable/3d-cube.js 132 -> 132 (0.000%) 1.93 -> 1.844 (4.456%)
./sunspider-1.0.2.stable/access-binary-trees.js 88 -> 88 (0.000%) 1.647 -> 1.616 (1.882%)
./sunspider-1.0.2.stable/access-fannkuch.js 44 -> 44 (0.000%) 5.084 -> 4.796 (5.665%)
./sunspider-1.0.2.stable/access-nbody.js 64 -> 64 (0.000%) 2.535 -> 2.325 (8.284%)
./sunspider-1.0.2.stable/bitops-3bit-bits-in-byte.js 36 -> 36 (0.000%) 1.564 -> 1.553 (0.703%)
./sunspider-1.0.2.stable/bitops-bits-in-byte.js 36 -> 36 (0.000%) 2.039 -> 2.006 (1.618%)
./sunspider-1.0.2.stable/bitops-bitwise-and.js 28 -> 28 (0.000%) 2.01 -> 1.926 (4.179%)
./sunspider-1.0.2.stable/controlflow-recursive.js 212 -> 212 (0.000%) 1.697 -> 1.691 (0.354%)
./sunspider-1.0.2.stable/date-format-xparb.js 96 -> 96 (0.000%) 1.793 -> 1.791 (0.112%)
./sunspider-1.0.2.stable/math-cordic.js 40 -> 40 (0.000%) 2.385 -> 2.37 (0.629%)
./sunspider-1.0.2.stable/math-partial-sums.js 40 -> 40 (0.000%) 1.438 -> 1.435 (0.209%)
./sunspider-1.0.2.stable/math-spectral-norm.js 48 -> 48 (0.000%) 1.616 -> 1.572 (2.723%)
./sunspider-1.0.2.stable/string-fasta.js 52 -> 52 (0.000%) 15.453 -> 15.233 (1.424%)
Geometric mean 0% 2.5098%

@ruben-ayrapetyan ruben-ayrapetyan added enhancement An improvement parser Related to the JavaScript parser development Feature implementation labels Jul 28, 2015
@ruben-ayrapetyan ruben-ayrapetyan changed the title Enhancement of parser operand handling and compaction of instruction dumper Enhancement of parser operand handling, compaction of instruction dumper Jul 28, 2015
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from cb8bec7 to b7f9faa Compare July 28, 2015 20:41
@egavrin
Copy link
Contributor

egavrin commented Jul 29, 2015

Please fix precommit

/**
* Operand (descriptor of value or reference in context of parser)
*/
class jsp_operand_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to C++ is still under discussion in #470 with no conclusion yet. This change would already start adding such features (i.e., classes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akiss77, I'll provide measurements for both versions (i.e. before switching to class, and just after) today.

I think, we should perform the measurements on engine code, rather on some external tests / benchmarks. Because of this, I prepared the patch with usage of classes.

So, we could see if usage of classes could be ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akiss77, I've performed the measurements on the following versions (if I didn't missed anything, the versions are only different usage of struct / class for parser operand):

https://github.com/Samsung/jerryscript/compare/Ruben-class-usage-investigation-prev...Ruben-class-usage-investigation-new?expand=1

Performance was measured in --parse-only mode, as the changes involve only parser.
Benchmark, used for the test was generated by the following script:

rm -f bench.js
for i in `seq 1 1 100`
do
    echo "a = 1 + 1 + 1 + 1 ? 1 * 1 * 1 : 1 & (1 * a * b + o.p * q - 1 + 'str');"
    echo "if (a) {b = 1 + 1 + 1 + 1 ? 1 * 1 * 1 : 1 ^ (1 * a * b + o.p * q - 1 + 'str');}"
    echo "else { for (i = 0; i < 1000; i++) { c = 1 + 1 + 1 + 1 ? 1 * 1 * 1 : 1 | (1 * a * b + o.p * q - 1 + 'str'); } }"
done >> bench.js
  • release.mcu_stm32f4-cp (gcc version 4.8.2 (4.8.2-14ubuntu1+6)):
    • binary size
      • struct: 150700 bytes
      • class: 151012 bytes
  • release.linux (gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-16ubuntu4), hard float):
    • binary size (after strip):
      • struct: 190152 bytes
      • class: 190152 bytes
    • performance (every value in the sequence is a mean by 5 consecutive measurements):
      • struct:
        • 3.332
        • 3.328
        • 3.334
        • 3.34
        • 3.334
      • class:
        • 3.331
        • 3.334
        • 3.33
        • 3.337
        • 3.334

On mcu version binary size of build with class-based implementation is greater on 312 bytes.
Seems that this is caused by extra inline of a functions in build with C++ class-based implementation, relatively to build with C struct-based implementation.

In the performance measurements above share of parse_* functions was very small - about 0.01% or less.

To increase share of parse_* functions, I've added some performance optimizations (for heap allocator and linked list collection) to both builds.
On the new versions I've got the following (same environment; each value is mean of 20 measurements):

  • struct:
    • 0.1445
    • 0.14475
    • 0.14475
    • 0.14475
    • 0.14475
  • class:
    • 0.14525
    • 0.145
    • 0.14475
    • 0.145
    • 0.14525

So, time of build that uses class implementation is slower on about 0.2%.

However, the following is part of performance profile, consisting of first two parse_*expression functions in each profile, collected using perf record utility with frequency argument -F 50000:

  • struct:
    • parse_member_expression: 0.36%
    • parse_unary_expression: 0.27%
  • class:
    • parse_member_expression: 0.35% (a bit less)
    • parse_unary_expression: 0.27% (same)

Considering the profile, we can see that actually reduce of performance is not relevant to change of parse_*expression functions' performance.

For example, in the same profile:

  • struct:
    • serializer_dump_op_meta: 0.22%
    • generate_instr: 0.29%
    • current_locus: 0.22%
    • linked_list_element: 12.86%
    • memcpy: 5.23%
  • class:
    • serializer_dump_op_meta: 0.16%
    • generate_instr: 0.37%
    • current_locus: 0.49%
    • linked_list_element: 13.33%
    • memcpy: 5.22%

Considering the above, in the case the C++-feature itself doesn't involve binary size or performance.

For me it seems that, the C++-feature itself doesn't involve binary size of performance in general case too, and so can be used, with accurate consideration of consequences in each case (as for any other way of implementation).

In other words, there definitely can be some cases, in which performance is reduced because of using some feature, but it seems to me that the cases should be considered in the feature-independent way, i.e. by analysing actual origin of the reduce.

Concerning classes, I don't see any "physical" difference between basic classes and structures. For me, it seems to be just different way to declare the same behaviour.
Actual difference can occur, for example, during implementation of some default operations (like default constructor) that would add unnecessary operations in some cases, in which the operations are not optimized out. The cases should definitely be considered very accurately.

@akiss77, @zherczeg, @egavrin, @sand1k, could you, please, share your opinion about this?

@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from b7f9faa to 438036a Compare July 29, 2015 21:04
@ruben-ayrapetyan
Copy link
Contributor Author

@egavrin, precommit fixed.

@egavrin
Copy link
Contributor

egavrin commented Jul 30, 2015

This change is OK for me. I don't see any changes in binary size, memory consumption or performance. Additionally, classes are already used in other parts of the engine. Anyway, significant binary size reduction should be done other way, not by limiting C++ -> C.

@egavrin egavrin assigned zherczeg and unassigned egavrin Jul 30, 2015
@egavrin
Copy link
Contributor

egavrin commented Jul 31, 2015

@akiss77 @zherczeg can we merge this?

@akosthekiss
Copy link
Member

It's hard to object against the current patch because the measurements don't show big differences. However:

  • The code size increase, however small in this case, only cca. 300 bytes, shows that we don't have everything under control as much as we have (had) with C (i.e., inlining in this case). This can escalate the more we start using C++ features.
  • The other concern of mine is more subtle: should we go for classes, we will be left with no other option but try/catch exception handling -- setjmp/longjmp is a no-no in the presence of objects with constructors/destructors. However, we should not want to introduce the bloat that comes with EH frames, should we? (BTW, we are using setjmp/longjmp in the parser!) So, this current change with seemingly no big effect may be limiting our options in the future.

So, my vote is still against classes, exceptions, and C++ in general -- unless size is not that important anymore.

That being said, that's just my two cents. I have no formal LGTM right, so no formal protest right either. (But I do protest informally :) )

@ruben-ayrapetyan
Copy link
Contributor Author

It's hard to object against the current patch because the measurements don't show big differences.
However:
The code size increase, however small in this case, only cca. 300 bytes, shows that we don't have >everything under control as much as we have (had) with C (i.e., inlining in this case). This can escalate the >more we start using C++ features.

@akiss77, I'm not sure that inlining is controlled less or more, depending on C or C++. Is it so?

I think that in the case inlining can be evidence of more efficient code generation: usually, inline algorithns use some heuristics, like size of function to inline, etc. So, in the case, maybe, code of some function became a little bit smaller after changing to classes, and so, the function was inlined. Can't it be the cause?

The other concern of mine is more subtle: should we go for classes, we will be left with no other option >but > try/catch exception handling -- setjmp/longjmp is a no-no in the presence of objects with >constructors/destructors.

It seems to me, that if we don't use classes with constructors / destructors, then we should emulate the constructors / desctructors mechanism manually, and calling them somehow upon longjmp, manually unwinding stack.
I think that in the case, language built-in feature could be more efficient than manual implementation of the same behaviour, achieved with same level of effort.
However, by switching to classes with constructors / destructors, we don't lose possibility to implement more efficient own exception handling mechanism - we just need to call appropriate constructors / destructors ourself upon exceptions, the same way as we would call another per-stack-frame handlers in the case. So, exception handling mechanism can be left under discussion for now.

So, my vote is still against classes, exceptions, and C++ in general -- unless size is not that important anymore.
I think that C++ can be used in a way, that allows achieving the same performance / binary size and memory consumption, more conveniently. On the other hand, any other language also can be used in a way, that disallows achieving of enough performance / binary size / memory consumption level.

What do you think about this?

@akosthekiss
Copy link
Member

So, in the case, maybe, code of some function became a little bit smaller after changing to classes, and so, the function was inlined.

Well, we have release builds here, which implies -Os. Then, the goal of the compiler should be optimization for size, which almost always contradicts with function inlining. In this current case it certainly did not shrink the size, however. That's bad in my eyes.

we don't lose possibility to implement more efficient own exception handling mechanism

You mean implementing a totally self-invented exception handling solution? How could we make the compiler know about it and emit code that conforms to our mechanism?

I think that C++ can be used in a way, that allows achieving the same performance / binary size and memory consumption, more conveniently.

Eventhough that's picky, the above 300 bytes code increase does not underpin that completely. (Don't misunderstand me, I use C++ often in desktop environment, and I'm fairly content with its performance and convenience. But that's normally the -O2 and -O3 "world". I haven't been convinced yet that C++ is really the language to use in embedded, unless we restrict it so much that it becomes Clean C.)

On the other hand, any other language also can be used in a way, that disallows achieving of enough performance / binary size / memory consumption level.

That's true, but not the other way around, though.

@ruben-ayrapetyan
Copy link
Contributor Author

So, in the case, maybe, code of some function became a little bit smaller after changing to classes, and so, the function was inlined.

Well, we have release builds here, which implies -Os. Then, the goal of the compiler should be optimization for size, which almost always contradicts with function inlining. In this current case it certainly did not shrink the size, however. That's bad in my eyes.

I agree that gcc / g++ with -Os doesn't actually output minimal possible code (I observed many cases of possible reduce of code size). However, I don't see any evidence that switching between C and C++ changes anything from this view point.

we don't lose possibility to implement more efficient own exception handling mechanism

You mean implementing a totally self-invented exception handling solution? How could we make the compiler know about it and emit code that conforms to our mechanism?

Actually, in the same way as we can do it in C. As C++ compiler has native mechanism of exception handling, and C compiler - doesn't have, I would prefer to use C++ compiler.

I think that C++ can be used in a way, that allows achieving the same performance / binary size and memory consumption, more conveniently.

Eventhough that's picky, the above 300 bytes code increase does not underpin that completely. (Don't misunderstand me, I use C++ often in desktop environment, and I'm fairly content with its performance and convenience. But that's normally the -O2 and -O3 "world". I haven't been convinced yet that C++ is really the language to use in embedded, unless we restrict it so much that it becomes Clean C.)

As I wrote above, I agree that gcc / g++ with -Os doesn't actually output minimal possible code, but I see no evidence that switching between C and C++ influences code size, in average.

@akiss77, @zherczeg, thank you very much for sharing your opinion.
Don't you mind landing this specific patch, considering provided measurements?

@akosthekiss
Copy link
Member

Since we don't agree completely on all points, I humbly propagate the chance of giving a +1 LGTM to someone else.

@ruben-ayrapetyan ruben-ayrapetyan assigned egavrin and unassigned zherczeg Jul 31, 2015

return _data.idx_const;
} /* get_idx_const */
private:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a newline before private.

@zherczeg
Copy link
Member

zherczeg commented Aug 3, 2015

Looks good to me (we have our concerns about C++ but we don't want to slow you down). We should introduce style guides for C++ code.

@egavrin
Copy link
Contributor

egavrin commented Aug 3, 2015

@zherczeg thank you!

@ruben-ayrapetyan rebase to master and make push

@egavrin egavrin assigned ruben-ayrapetyan and unassigned egavrin Aug 4, 2015
* @return constructed operand
*/
static jsp_operand_t
make_idx_const_operand (vm_idx_t cnst) /**< integer in vm_idx_t range */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cnst -> idx_const

@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch 2 times, most recently from c4fbe97 to f54547a Compare August 12, 2015 13:29
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch 4 times, most recently from 8af9143 to a416184 Compare August 19, 2015 11:53
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch 2 times, most recently from e1116a1 to c80704c Compare August 27, 2015 18:24
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from c80704c to 9f2c803 Compare August 28, 2015 11:44
JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
… types to enum defined in the class.

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from 9f2c803 to 28ffadd Compare August 28, 2015 12:35
…ALUE -> VM_IDX_EMPTY / VM_IDX_REWRITE_GENERAL_CASE, LITERAL_TO_REWRITE -> VM_IDX_REWRITE_LITERAL_UID.

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
…e dumper, by extracting them to several separate routines that can be used for most cases, remove getop_* routines from vm.

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from 28ffadd to a00079e Compare August 28, 2015 13:22
@ruben-ayrapetyan ruben-ayrapetyan merged commit a00079e into master Aug 28, 2015
@ruben-ayrapetyan ruben-ayrapetyan deleted the Ruben-compaction-of-opcodes-dumper branch August 28, 2015 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Feature implementation enhancement An improvement parser Related to the JavaScript parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants