fix bug in vm call_stack_size calculation #1345

jiangzidong · 2016-09-13T02:41:47Z

call_stack_size = args_p->register_end + args_p->stack_limit

While in parser_post_processing,
it already has args_p->stack_limit = (uint8_t) (context_p->register_count + context_p->stack_limit);

So context_p->register_count is added twice in call_stack_size. I think it is a bug (leads vm easily choose vm_run_with_alloca )

zherczeg · 2016-09-13T03:52:39Z

Wow, nice catch. I overlooked this. Thank you.

LGTM

zherczeg · 2016-09-13T03:55:14Z

It is a good question what would we gain if we would not add them together in the parser, since we could cover more functions in the "byte" range. Some statistics would be useful for that.

jiangzidong · 2016-09-13T05:12:10Z

@zherczeg Yes, it will save the size of snapshot headers. I didn't notice that before. I will update the patch and do some memory test.
BTW, is there any convenient tools to create the table in #1271 (which has the heap info)

LaszloLango · 2016-09-13T06:21:46Z

@jiangzidong, we have an internal performance measurement tool with a dedicated device. Just update the PR and write a comment here or ping me on IRC and I will do it.

jiangzidong · 2016-09-13T07:35:28Z

@LaszloLango @zherczeg I updated the patch, and choose to add in the vm instead of parser.
The result of run-perf-test seems same.

                                     Benchmark |          RSS(+ is better) |                   Perf(+ is better)
                                    3d-cube.js |   36k ->   36k :  +0.000% |   0.080s ->   0.080s :  +0.000%  (+-5.824%) : [≈]
                        access-binary-trees.js |   36k ->   36k :  +0.000% |   0.050s ->   0.051s :  -2.000% (+-11.689%) : [≈]
                            access-fannkuch.js |   24k ->   24k :  +0.000% |   0.174s ->   0.175s :  -0.574%  (+-5.383%) : [≈]
                               access-nbody.js |   28k ->   28k :  +0.000% |   0.091s ->   0.092s :  -1.838% (+-15.061%) : [≈]
                   bitops-3bit-bits-in-byte.js |   20k ->   20k :  +0.000% |   0.052s ->   0.054s :  -3.822%  (+-9.112%) : [≈]
                        bitops-bits-in-byte.js |   20k ->   20k :  +0.000% |   0.082s ->   0.081s :  +1.225% (+-11.335%) : [≈]
                         bitops-bitwise-and.js |   20k ->   20k :  +0.000% |   0.070s ->   0.071s :  -1.914% (+-24.035%) : [≈]
                      controlflow-recursive.js |   48k ->   48k :  +0.000% |   0.035s ->   0.035s :  -1.923% (+-13.626%) : [≈]
                                math-cordic.js |   20k ->   20k :  +0.000% |   0.097s ->   0.096s :  +1.370% (+-10.645%) : [≈]
                          math-partial-sums.js |   20k ->   20k :  +0.000% |   0.056s ->   0.055s :  +1.775%  (+-8.232%) : [≈]
                         math-spectral-norm.js |   20k ->   20k :  +0.000% |   0.047s ->   0.046s :  +1.429%  (+-6.988%) : [≈]
                               string-fasta.js |   32k ->   32k :  +0.000% |   0.112s ->   0.110s :  +1.195% (+-12.181%) : [≈]
                               Geometric mean: |     RSS reduction: 0.000% |                Speed up: -0.408% (+-3.483%) : [≈]

jiangzidong · 2016-09-13T07:41:28Z

I constructed the following case to show the benefits


function func2() {
  print(arguments.length);
}

function func1() {
//60 registers
  var a1,a2,a3,a4,a5,a6,a7,a8,a9,a0;
  var b1,b2,b3,b4,b5,b6,b7,b8,b9,b0;
  var c1,c2,c3,c4,c5,c6,c7,c8,c9,c0;
  var d1,d2,d3,d4,d5,d6,d7,d8,d9,d0;
  var e1,e2,e3,e4,e5,e6,e7,e8,e9,e0;
  var f1,f2,f3,f4,f5,f6,f7,f8,f9,f0;

//200 arguments (stack depth)
func2(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
}

func1()

The func1 context has 60 registers and 200 stacklimit, and the snapshot size (by save-snapshot) reduced from 924 to 916, after applying the patch

zherczeg · 2016-09-13T08:50:51Z

It seems the benefit is very little. We need quite a big function to have a gain, but the saving on the header is negligible on large functions.

This proof makes me prefer the original patch. But I would like to hear others opinion (including you, @jiangzidong). What would you prefer?

LaszloLango · 2016-09-13T09:14:10Z

FYI:

Benchmark	Stack (bytes)	RSS (bytes)
3d-cube.js	18016 -> 17912 : +0.577%	57344 -> 57344 : 0.000%
3d-raytrace.js	3716 -> 3412 : +8.181%	217088 -> 217088 : 0.000%
access-binary-trees.js	4404 -> 4404 : 0.000%	53248 -> 53248 : 0.000%
access-fannkuch.js	2140 -> 2068 : +3.364%	24576 -> 24576 : 0.000%
access-nbody.js	2252 -> 2228 : +1.066%	28672 -> 28672 : 0.000%
bitops-3bit-bits-in-byte.js	1992 -> 1992 : 0.000%	20480 -> 16384 : +20.000%
bitops-bits-in-byte.js	1992 -> 1992 : 0.000%	20480 -> 16384 : +20.000%
bitops-bitwise-and.js	1768 -> 1768 : 0.000%	20480 -> 16384 : +20.000%
bitops-nsieve-bits.js	1992 -> 1992 : 0.000%	155648 -> 155648 : 0.000%
controlflow-recursive.js	78584 -> 78584 : 0.000%	98304 -> 94208 : +4.167%
crypto-aes.js	3496 -> 3424 : +2.059%	94208 -> 86016 : +8.696%
crypto-md5.js	2740 -> 2652 : +3.212%	167936 -> 163840 : +2.439%
crypto-sha1.js	2240 -> 2224 : +0.714%	118784 -> 110592 : +6.897%
date-format-tofte.js	2840 -> 2832 : +0.282%	45056 -> 40960 : +9.091%
date-format-xparb.js	3120 -> 3120 : 0.000%	49152 -> 40960 : +16.667%
math-cordic.js	2020 -> 2004 : +0.792%	24576 -> 20480 : +16.667%
math-partial-sums.js	1944 -> 1944 : 0.000%	20480 -> 16384 : +20.000%
math-spectral-norm.js	2324 -> 2308 : +0.688%	24576 -> 24576 : 0.000%
string-base64.js	2152 -> 2120 : +1.487%	147456 -> 143360 : +2.778%
string-fasta.js	1976 -> 1976 : 0.000%	32768 -> 32768 : 0.000%
Geometric mean:	+1.140%	+7.736%

Binary sizes (bytes)
baeb83d:154364
a599e5b:156756

Runtime is not changed by this PR.

jiangzidong · 2016-09-13T10:37:05Z

@zherczeg I prefer the current one.

Original patch: fix the bug
Current patch: fix the bug + optimize the size of snapshot header

Though the benefits of size is negligible in my case, it didn't bring any side effects.

LaszloLango · 2016-09-13T10:46:38Z

Both solution is good to me. If we choose snapshot header optimization, then the snapshot version should be increased.

jiangzidong · 2016-09-13T11:10:39Z

Oh, I forget the issue of snapshot version and compatibility.

zherczeg · 2016-09-13T17:32:33Z

OK, lets use the current one. LGTM

zherczeg · 2016-09-15T04:21:14Z

jerry-core/parser/js/js-parser.c

@@ -1450,7 +1450,7 @@ parser_post_processing (parser_context_t *context_p) /**< context */
  needs_uint16_arguments = false;
  total_size = sizeof (cbc_uint8_arguments_t);

-  if ((context_p->register_count + context_p->stack_limit) > CBC_MAXIMUM_BYTE_VALUE
+  if (context_p->stack_limit > CBC_MAXIMUM_BYTE_VALUE


I just realized we should check the register_count as well:

|| context_p->register_count > CBC_MAXIMUM_BYTE_VALUE

It is true that PARSER_MAXIMUM_NUMBER_OF_REGISTERS is 256 at this moment, but it can be changed.

zherczeg · 2016-09-15T04:21:43Z

And please also add the snapshot version number increase.

jiangzidong · 2016-09-15T06:33:37Z

updated, thanks

call_stack_size should be register_count + maximum stack depth *We don't add in the parser to save the size of snapshot header *jerry_snapshot_version 5 -> 6 JerryScript-DCO-1.0-Signed-off-by: Zidong Jiang [email protected] JerryScript-DCO-1.0-Signed-off-by: Zidong Jiang [email protected]

LaszloLango · 2016-09-15T07:09:58Z

LGTM

call_stack_size should be register_count + maximum stack depth * We don't add in the parser to save the size of snapshot header * jerry_snapshot_version 5 -> 6 JerryScript-DCO-1.0-Signed-off-by: Zidong Jiang [email protected]

LaszloLango added the bug Undesired behaviour label Sep 13, 2016

jiangzidong force-pushed the stacklimit branch from 4e227d1 to a599e5b Compare September 13, 2016 07:32

zherczeg reviewed Sep 15, 2016

View reviewed changes

jiangzidong force-pushed the stacklimit branch from a599e5b to 70bc512 Compare September 15, 2016 06:31

LaszloLango merged commit 86ac644 into jerryscript-project:master Sep 15, 2016

zherczeg mentioned this pull request Jan 19, 2017

Nominating Zidong Jiang (@jiangzidong) to become a JerryScript committer #1535

Closed

fix bug in vm call_stack_size calculation #1345

fix bug in vm call_stack_size calculation #1345

Uh oh!

Conversation

jiangzidong commented Sep 13, 2016

Uh oh!

zherczeg commented Sep 13, 2016

Uh oh!

zherczeg commented Sep 13, 2016

Uh oh!

jiangzidong commented Sep 13, 2016

Uh oh!

LaszloLango commented Sep 13, 2016

Uh oh!

jiangzidong commented Sep 13, 2016

Uh oh!

jiangzidong commented Sep 13, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zherczeg commented Sep 13, 2016

Uh oh!

LaszloLango commented Sep 13, 2016

Uh oh!

jiangzidong commented Sep 13, 2016

Uh oh!

LaszloLango commented Sep 13, 2016

Uh oh!

jiangzidong commented Sep 13, 2016

Uh oh!

zherczeg commented Sep 13, 2016

Uh oh!

zherczeg Sep 15, 2016

Choose a reason for hiding this comment

Uh oh!

zherczeg commented Sep 15, 2016

Uh oh!

jiangzidong commented Sep 15, 2016

Uh oh!

LaszloLango commented Sep 15, 2016

Uh oh!

Uh oh!

jiangzidong commented Sep 13, 2016 •

edited

Loading