Enhancement of parser operand handling, compaction of instruction dumper #488

ruben-ayrapetyan · 2015-07-28T20:27:05Z

Binary size (Linux armv7-hf): 198328 -> 194232

Benchmark	Rss	Perf
Benchmark	Rss	Perf
./sunspider-1.0.2.stable/3d-cube.js	132 -> 132 (0.000%)	1.93 -> 1.844 (4.456%)
./sunspider-1.0.2.stable/access-binary-trees.js	88 -> 88 (0.000%)	1.647 -> 1.616 (1.882%)
./sunspider-1.0.2.stable/access-fannkuch.js	44 -> 44 (0.000%)	5.084 -> 4.796 (5.665%)
./sunspider-1.0.2.stable/access-nbody.js	64 -> 64 (0.000%)	2.535 -> 2.325 (8.284%)
./sunspider-1.0.2.stable/bitops-3bit-bits-in-byte.js	36 -> 36 (0.000%)	1.564 -> 1.553 (0.703%)
./sunspider-1.0.2.stable/bitops-bits-in-byte.js	36 -> 36 (0.000%)	2.039 -> 2.006 (1.618%)
./sunspider-1.0.2.stable/bitops-bitwise-and.js	28 -> 28 (0.000%)	2.01 -> 1.926 (4.179%)
./sunspider-1.0.2.stable/controlflow-recursive.js	212 -> 212 (0.000%)	1.697 -> 1.691 (0.354%)
./sunspider-1.0.2.stable/date-format-xparb.js	96 -> 96 (0.000%)	1.793 -> 1.791 (0.112%)
./sunspider-1.0.2.stable/math-cordic.js	40 -> 40 (0.000%)	2.385 -> 2.37 (0.629%)
./sunspider-1.0.2.stable/math-partial-sums.js	40 -> 40 (0.000%)	1.438 -> 1.435 (0.209%)
./sunspider-1.0.2.stable/math-spectral-norm.js	48 -> 48 (0.000%)	1.616 -> 1.572 (2.723%)
./sunspider-1.0.2.stable/string-fasta.js	52 -> 52 (0.000%)	15.453 -> 15.233 (1.424%)
Geometric mean	0%	2.5098%

egavrin · 2015-07-29T08:59:02Z

Please fix precommit

akosthekiss · 2015-07-29T11:40:34Z

jerry-core/parser/js/opcodes-dumper.h

+/**
+ * Operand (descriptor of value or reference in context of parser)
+ */
+class jsp_operand_t


Moving to C++ is still under discussion in #470 with no conclusion yet. This change would already start adding such features (i.e., classes).

@akiss77, I'll provide measurements for both versions (i.e. before switching to class, and just after) today.

I think, we should perform the measurements on engine code, rather on some external tests / benchmarks. Because of this, I prepared the patch with usage of classes.

So, we could see if usage of classes could be ok.

@akiss77, I've performed the measurements on the following versions (if I didn't missed anything, the versions are only different usage of struct / class for parser operand):

https://github.com/Samsung/jerryscript/compare/Ruben-class-usage-investigation-prev...Ruben-class-usage-investigation-new?expand=1

Performance was measured in --parse-only mode, as the changes involve only parser.
Benchmark, used for the test was generated by the following script:

rm -f bench.js for i in `seq 1 1 100` do echo "a = 1 + 1 + 1 + 1 ? 1 * 1 * 1 : 1 & (1 * a * b + o.p * q - 1 + 'str');" echo "if (a) {b = 1 + 1 + 1 + 1 ? 1 * 1 * 1 : 1 ^ (1 * a * b + o.p * q - 1 + 'str');}" echo "else { for (i = 0; i < 1000; i++) { c = 1 + 1 + 1 + 1 ? 1 * 1 * 1 : 1 | (1 * a * b + o.p * q - 1 + 'str'); } }" done >> bench.js

release.mcu_stm32f4-cp (gcc version 4.8.2 (4.8.2-14ubuntu1+6)):

binary size

struct: 150700 bytes

class: 151012 bytes

release.linux (gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-16ubuntu4), hard float):

binary size (after strip):

struct: 190152 bytes

class: 190152 bytes

performance (every value in the sequence is a mean by 5 consecutive measurements):

struct:

3.332

3.328

3.334

3.34

3.334

class:

3.331

3.334

3.33

3.337

3.334

On mcu version binary size of build with class-based implementation is greater on 312 bytes.
Seems that this is caused by extra inline of a functions in build with C++ class-based implementation, relatively to build with C struct-based implementation.

In the performance measurements above share of parse_* functions was very small - about 0.01% or less.

To increase share of parse_* functions, I've added some performance optimizations (for heap allocator and linked list collection) to both builds.
On the new versions I've got the following (same environment; each value is mean of 20 measurements):

struct:

0.1445

0.14475

0.14475

0.14475

0.14475

class:

0.14525

0.145

0.14475

0.145

0.14525

So, time of build that uses class implementation is slower on about 0.2%.

However, the following is part of performance profile, consisting of first two parse_*expression functions in each profile, collected using perf record utility with frequency argument -F 50000:

struct:

parse_member_expression: 0.36%

parse_unary_expression: 0.27%

class:

parse_member_expression: 0.35% (a bit less)

parse_unary_expression: 0.27% (same)

Considering the profile, we can see that actually reduce of performance is not relevant to change of parse_*expression functions' performance.

For example, in the same profile:

struct:

serializer_dump_op_meta: 0.22%

generate_instr: 0.29%

current_locus: 0.22%

linked_list_element: 12.86%

memcpy: 5.23%

class:

serializer_dump_op_meta: 0.16%

generate_instr: 0.37%

current_locus: 0.49%

linked_list_element: 13.33%

memcpy: 5.22%

Considering the above, in the case the C++-feature itself doesn't involve binary size or performance.

For me it seems that, the C++-feature itself doesn't involve binary size of performance in general case too, and so can be used, with accurate consideration of consequences in each case (as for any other way of implementation).

In other words, there definitely can be some cases, in which performance is reduced because of using some feature, but it seems to me that the cases should be considered in the feature-independent way, i.e. by analysing actual origin of the reduce.

Concerning classes, I don't see any "physical" difference between basic classes and structures. For me, it seems to be just different way to declare the same behaviour.
Actual difference can occur, for example, during implementation of some default operations (like default constructor) that would add unnecessary operations in some cases, in which the operations are not optimized out. The cases should definitely be considered very accurately.

@akiss77, @zherczeg, @egavrin, @sand1k, could you, please, share your opinion about this?

ruben-ayrapetyan · 2015-07-29T21:12:11Z

@egavrin, precommit fixed.

egavrin · 2015-07-30T07:46:44Z

This change is OK for me. I don't see any changes in binary size, memory consumption or performance. Additionally, classes are already used in other parts of the engine. Anyway, significant binary size reduction should be done other way, not by limiting C++ -> C.

egavrin · 2015-07-31T05:18:20Z

@akiss77 @zherczeg can we merge this?

akosthekiss · 2015-07-31T09:55:23Z

It's hard to object against the current patch because the measurements don't show big differences. However:

The code size increase, however small in this case, only cca. 300 bytes, shows that we don't have everything under control as much as we have (had) with C (i.e., inlining in this case). This can escalate the more we start using C++ features.
The other concern of mine is more subtle: should we go for classes, we will be left with no other option but try/catch exception handling -- setjmp/longjmp is a no-no in the presence of objects with constructors/destructors. However, we should not want to introduce the bloat that comes with EH frames, should we? (BTW, we are using setjmp/longjmp in the parser!) So, this current change with seemingly no big effect may be limiting our options in the future.

So, my vote is still against classes, exceptions, and C++ in general -- unless size is not that important anymore.

That being said, that's just my two cents. I have no formal LGTM right, so no formal protest right either. (But I do protest informally :) )

ruben-ayrapetyan · 2015-07-31T10:16:02Z

It's hard to object against the current patch because the measurements don't show big differences.
However:
The code size increase, however small in this case, only cca. 300 bytes, shows that we don't have >everything under control as much as we have (had) with C (i.e., inlining in this case). This can escalate the >more we start using C++ features.

@akiss77, I'm not sure that inlining is controlled less or more, depending on C or C++. Is it so?

I think that in the case inlining can be evidence of more efficient code generation: usually, inline algorithns use some heuristics, like size of function to inline, etc. So, in the case, maybe, code of some function became a little bit smaller after changing to classes, and so, the function was inlined. Can't it be the cause?

The other concern of mine is more subtle: should we go for classes, we will be left with no other option >but > try/catch exception handling -- setjmp/longjmp is a no-no in the presence of objects with >constructors/destructors.

It seems to me, that if we don't use classes with constructors / destructors, then we should emulate the constructors / desctructors mechanism manually, and calling them somehow upon longjmp, manually unwinding stack.
I think that in the case, language built-in feature could be more efficient than manual implementation of the same behaviour, achieved with same level of effort.
However, by switching to classes with constructors / destructors, we don't lose possibility to implement more efficient own exception handling mechanism - we just need to call appropriate constructors / destructors ourself upon exceptions, the same way as we would call another per-stack-frame handlers in the case. So, exception handling mechanism can be left under discussion for now.

So, my vote is still against classes, exceptions, and C++ in general -- unless size is not that important anymore.
I think that C++ can be used in a way, that allows achieving the same performance / binary size and memory consumption, more conveniently. On the other hand, any other language also can be used in a way, that disallows achieving of enough performance / binary size / memory consumption level.

What do you think about this?

akosthekiss · 2015-07-31T11:01:56Z

So, in the case, maybe, code of some function became a little bit smaller after changing to classes, and so, the function was inlined.

Well, we have release builds here, which implies -Os. Then, the goal of the compiler should be optimization for size, which almost always contradicts with function inlining. In this current case it certainly did not shrink the size, however. That's bad in my eyes.

we don't lose possibility to implement more efficient own exception handling mechanism

You mean implementing a totally self-invented exception handling solution? How could we make the compiler know about it and emit code that conforms to our mechanism?

I think that C++ can be used in a way, that allows achieving the same performance / binary size and memory consumption, more conveniently.

Eventhough that's picky, the above 300 bytes code increase does not underpin that completely. (Don't misunderstand me, I use C++ often in desktop environment, and I'm fairly content with its performance and convenience. But that's normally the -O2 and -O3 "world". I haven't been convinced yet that C++ is really the language to use in embedded, unless we restrict it so much that it becomes Clean C.)

On the other hand, any other language also can be used in a way, that disallows achieving of enough performance / binary size / memory consumption level.

That's true, but not the other way around, though.

ruben-ayrapetyan · 2015-07-31T12:02:19Z

So, in the case, maybe, code of some function became a little bit smaller after changing to classes, and so, the function was inlined.

Well, we have release builds here, which implies -Os. Then, the goal of the compiler should be optimization for size, which almost always contradicts with function inlining. In this current case it certainly did not shrink the size, however. That's bad in my eyes.

I agree that gcc / g++ with -Os doesn't actually output minimal possible code (I observed many cases of possible reduce of code size). However, I don't see any evidence that switching between C and C++ changes anything from this view point.

we don't lose possibility to implement more efficient own exception handling mechanism

You mean implementing a totally self-invented exception handling solution? How could we make the compiler know about it and emit code that conforms to our mechanism?

Actually, in the same way as we can do it in C. As C++ compiler has native mechanism of exception handling, and C compiler - doesn't have, I would prefer to use C++ compiler.

I think that C++ can be used in a way, that allows achieving the same performance / binary size and memory consumption, more conveniently.

Eventhough that's picky, the above 300 bytes code increase does not underpin that completely. (Don't misunderstand me, I use C++ often in desktop environment, and I'm fairly content with its performance and convenience. But that's normally the -O2 and -O3 "world". I haven't been convinced yet that C++ is really the language to use in embedded, unless we restrict it so much that it becomes Clean C.)

As I wrote above, I agree that gcc / g++ with -Os doesn't actually output minimal possible code, but I see no evidence that switching between C and C++ influences code size, in average.

@akiss77, @zherczeg, thank you very much for sharing your opinion.
Don't you mind landing this specific patch, considering provided measurements?

akosthekiss · 2015-07-31T12:20:02Z

Since we don't agree completely on all points, I humbly propagate the chance of giving a +1 LGTM to someone else.

zherczeg · 2015-08-03T12:47:42Z

jerry-core/parser/js/opcodes-dumper.h

+
+    return _data.idx_const;
+  } /* get_idx_const */
+private:


I would put a newline before private.

zherczeg · 2015-08-03T12:51:12Z

Looks good to me (we have our concerns about C++ but we don't want to slow you down). We should introduce style guides for C++ code.

egavrin · 2015-08-03T20:37:26Z

@zherczeg thank you!

@ruben-ayrapetyan rebase to master and make push

ruben-ayrapetyan · 2015-08-04T12:03:32Z

jerry-core/parser/js/opcodes-dumper.h

+   * @return constructed operand
+   */
+  static jsp_operand_t
+  make_idx_const_operand (vm_idx_t cnst) /**< integer in vm_idx_t range */


cnst -> idx_const

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

… types to enum defined in the class. JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

…ALUE -> VM_IDX_EMPTY / VM_IDX_REWRITE_GENERAL_CASE, LITERAL_TO_REWRITE -> VM_IDX_REWRITE_LITERAL_UID. JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

…e dumper, by extracting them to several separate routines that can be used for most cases, remove getop_* routines from vm. JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

ruben-ayrapetyan added enhancement An improvement parser Related to the JavaScript parser development Feature implementation labels Jul 28, 2015

ruben-ayrapetyan assigned egavrin Jul 28, 2015

ruben-ayrapetyan added this to the Engine optimization & enhancement milestone Jul 28, 2015

ruben-ayrapetyan changed the title ~~Enhancement of parser operand handling and compaction of instruction dumper~~ Enhancement of parser operand handling, compaction of instruction dumper Jul 28, 2015

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from cb8bec7 to b7f9faa Compare July 28, 2015 20:41

akosthekiss reviewed Jul 29, 2015
View reviewed changes

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from b7f9faa to 438036a Compare July 29, 2015 21:04

egavrin assigned zherczeg and unassigned egavrin Jul 30, 2015

ruben-ayrapetyan assigned egavrin and unassigned zherczeg Jul 31, 2015

zherczeg reviewed Aug 3, 2015
View reviewed changes

jerry-core/parser/js/opcodes-dumper.h

return _data.idx_const;

} /* get_idx_const */

private:

Copy link

Member

zherczeg Aug 3, 2015

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a newline before private.

egavrin assigned ruben-ayrapetyan and unassigned egavrin Aug 4, 2015

ruben-ayrapetyan reviewed Aug 4, 2015
View reviewed changes

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch 2 times, most recently from c4fbe97 to f54547a Compare August 12, 2015 13:29

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch 4 times, most recently from 8af9143 to a416184 Compare August 19, 2015 11:53

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch 2 times, most recently from e1116a1 to c80704c Compare August 27, 2015 18:24

ruben-ayrapetyan assigned egavrin and unassigned ruben-ayrapetyan Aug 27, 2015

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from c80704c to 9f2c803 Compare August 28, 2015 11:44

ruben-ayrapetyan added 2 commits August 28, 2015 15:35

Fix print of a 'number' literal.

b0276b3

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

Convert parser operand structure to jsp_operand_t class, move operand…

0111a73

… types to enum defined in the class. JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from 9f2c803 to 28ffadd Compare August 28, 2015 12:35

ruben-ayrapetyan added 3 commits August 28, 2015 16:18

Rename idx_t -> vm_idx_t, opcode_special_reg_t -> vm_reg_t, INVALID_V…

173becc

…ALUE -> VM_IDX_EMPTY / VM_IDX_REWRITE_GENERAL_CASE, LITERAL_TO_REWRITE -> VM_IDX_REWRITE_LITERAL_UID. JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

Remove raw_instr.

cbdc48a

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

Reduce number of operand type handling conditional blocks in byte-cod…

a00079e

…e dumper, by extracting them to several separate routines that can be used for most cases, remove getop_* routines from vm. JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]

ruben-ayrapetyan force-pushed the Ruben-compaction-of-opcodes-dumper branch from 28ffadd to a00079e Compare August 28, 2015 13:22

ruben-ayrapetyan merged commit a00079e into master Aug 28, 2015

ruben-ayrapetyan deleted the Ruben-compaction-of-opcodes-dumper branch August 28, 2015 13:57

zherczeg mentioned this pull request Apr 7, 2016

Nominating Akos Kiss (akiss77) for JerryScript Maintainer status #992

Closed

somang-park unassigned egavrin Nov 25, 2016

Enhancement of parser operand handling, compaction of instruction dumper #488

Enhancement of parser operand handling, compaction of instruction dumper #488

Uh oh!

Conversation

ruben-ayrapetyan commented Jul 28, 2015

Uh oh!

egavrin commented Jul 29, 2015

Uh oh!

akosthekiss Jul 29, 2015

Choose a reason for hiding this comment

Uh oh!

ruben-ayrapetyan Jul 29, 2015

Choose a reason for hiding this comment

Uh oh!

ruben-ayrapetyan Jul 29, 2015

Choose a reason for hiding this comment

Uh oh!

ruben-ayrapetyan commented Jul 29, 2015

Uh oh!

egavrin commented Jul 30, 2015

Uh oh!

egavrin commented Jul 31, 2015

Uh oh!

akosthekiss commented Jul 31, 2015

Uh oh!

ruben-ayrapetyan commented Jul 31, 2015

Uh oh!

akosthekiss commented Jul 31, 2015

Uh oh!

ruben-ayrapetyan commented Jul 31, 2015

Uh oh!

akosthekiss commented Jul 31, 2015

Uh oh!

zherczeg Aug 3, 2015

Choose a reason for hiding this comment

Uh oh!

zherczeg commented Aug 3, 2015

Uh oh!

egavrin commented Aug 3, 2015

Uh oh!

ruben-ayrapetyan Aug 4, 2015

Choose a reason for hiding this comment

Uh oh!

Uh oh!