Skip to content

fix bug in vm call_stack_size calculation #1345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 15, 2016

Conversation

jiangzidong
Copy link
Contributor

call_stack_size = args_p->register_end + args_p->stack_limit

While in parser_post_processing,
it already has args_p->stack_limit = (uint8_t) (context_p->register_count + context_p->stack_limit);

So context_p->register_count is added twice in call_stack_size. I think it is a bug (leads vm easily choose vm_run_with_alloca )

@zherczeg
Copy link
Member

Wow, nice catch. I overlooked this. Thank you.

LGTM

@zherczeg
Copy link
Member

It is a good question what would we gain if we would not add them together in the parser, since we could cover more functions in the "byte" range. Some statistics would be useful for that.

@jiangzidong
Copy link
Contributor Author

@zherczeg Yes, it will save the size of snapshot headers. I didn't notice that before. I will update the patch and do some memory test.
BTW, is there any convenient tools to create the table in #1271 (which has the heap info)

@LaszloLango
Copy link
Contributor

@jiangzidong, we have an internal performance measurement tool with a dedicated device. Just update the PR and write a comment here or ping me on IRC and I will do it.

@LaszloLango LaszloLango added the bug Undesired behaviour label Sep 13, 2016
@jiangzidong
Copy link
Contributor Author

@LaszloLango @zherczeg I updated the patch, and choose to add in the vm instead of parser.
The result of run-perf-test seems same.

                                     Benchmark |          RSS(+ is better) |                   Perf(+ is better)
                                    3d-cube.js |   36k ->   36k :  +0.000% |   0.080s ->   0.080s :  +0.000%  (+-5.824%) : [≈]
                        access-binary-trees.js |   36k ->   36k :  +0.000% |   0.050s ->   0.051s :  -2.000% (+-11.689%) : [≈]
                            access-fannkuch.js |   24k ->   24k :  +0.000% |   0.174s ->   0.175s :  -0.574%  (+-5.383%) : [≈]
                               access-nbody.js |   28k ->   28k :  +0.000% |   0.091s ->   0.092s :  -1.838% (+-15.061%) : [≈]
                   bitops-3bit-bits-in-byte.js |   20k ->   20k :  +0.000% |   0.052s ->   0.054s :  -3.822%  (+-9.112%) : [≈]
                        bitops-bits-in-byte.js |   20k ->   20k :  +0.000% |   0.082s ->   0.081s :  +1.225% (+-11.335%) : [≈]
                         bitops-bitwise-and.js |   20k ->   20k :  +0.000% |   0.070s ->   0.071s :  -1.914% (+-24.035%) : [≈]
                      controlflow-recursive.js |   48k ->   48k :  +0.000% |   0.035s ->   0.035s :  -1.923% (+-13.626%) : [≈]
                                math-cordic.js |   20k ->   20k :  +0.000% |   0.097s ->   0.096s :  +1.370% (+-10.645%) : [≈]
                          math-partial-sums.js |   20k ->   20k :  +0.000% |   0.056s ->   0.055s :  +1.775%  (+-8.232%) : [≈]
                         math-spectral-norm.js |   20k ->   20k :  +0.000% |   0.047s ->   0.046s :  +1.429%  (+-6.988%) : [≈]
                               string-fasta.js |   32k ->   32k :  +0.000% |   0.112s ->   0.110s :  +1.195% (+-12.181%) : [≈]
                               Geometric mean: |     RSS reduction: 0.000% |                Speed up: -0.408% (+-3.483%) : [≈]

@jiangzidong
Copy link
Contributor Author

jiangzidong commented Sep 13, 2016

I constructed the following case to show the benefits


function func2() {
  print(arguments.length);
}

function func1() {
//60 registers
  var a1,a2,a3,a4,a5,a6,a7,a8,a9,a0;
  var b1,b2,b3,b4,b5,b6,b7,b8,b9,b0;
  var c1,c2,c3,c4,c5,c6,c7,c8,c9,c0;
  var d1,d2,d3,d4,d5,d6,d7,d8,d9,d0;
  var e1,e2,e3,e4,e5,e6,e7,e8,e9,e0;
  var f1,f2,f3,f4,f5,f6,f7,f8,f9,f0;

//200 arguments (stack depth)
func2(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
}

func1()

The func1 context has 60 registers and 200 stacklimit, and the snapshot size (by save-snapshot) reduced from 924 to 916, after applying the patch

@zherczeg
Copy link
Member

It seems the benefit is very little. We need quite a big function to have a gain, but the saving on the header is negligible on large functions.

This proof makes me prefer the original patch. But I would like to hear others opinion (including you, @jiangzidong). What would you prefer?

@LaszloLango
Copy link
Contributor

FYI:

Benchmark Stack (bytes) RSS (bytes)
3d-cube.js 18016 -> 17912 : +0.577% 57344 -> 57344 : 0.000%
3d-raytrace.js 3716 -> 3412 : +8.181% 217088 -> 217088 : 0.000%
access-binary-trees.js 4404 -> 4404 : 0.000% 53248 -> 53248 : 0.000%
access-fannkuch.js 2140 -> 2068 : +3.364% 24576 -> 24576 : 0.000%
access-nbody.js 2252 -> 2228 : +1.066% 28672 -> 28672 : 0.000%
bitops-3bit-bits-in-byte.js 1992 -> 1992 : 0.000% 20480 -> 16384 : +20.000%
bitops-bits-in-byte.js 1992 -> 1992 : 0.000% 20480 -> 16384 : +20.000%
bitops-bitwise-and.js 1768 -> 1768 : 0.000% 20480 -> 16384 : +20.000%
bitops-nsieve-bits.js 1992 -> 1992 : 0.000% 155648 -> 155648 : 0.000%
controlflow-recursive.js 78584 -> 78584 : 0.000% 98304 -> 94208 : +4.167%
crypto-aes.js 3496 -> 3424 : +2.059% 94208 -> 86016 : +8.696%
crypto-md5.js 2740 -> 2652 : +3.212% 167936 -> 163840 : +2.439%
crypto-sha1.js 2240 -> 2224 : +0.714% 118784 -> 110592 : +6.897%
date-format-tofte.js 2840 -> 2832 : +0.282% 45056 -> 40960 : +9.091%
date-format-xparb.js 3120 -> 3120 : 0.000% 49152 -> 40960 : +16.667%
math-cordic.js 2020 -> 2004 : +0.792% 24576 -> 20480 : +16.667%
math-partial-sums.js 1944 -> 1944 : 0.000% 20480 -> 16384 : +20.000%
math-spectral-norm.js 2324 -> 2308 : +0.688% 24576 -> 24576 : 0.000%
string-base64.js 2152 -> 2120 : +1.487% 147456 -> 143360 : +2.778%
string-fasta.js 1976 -> 1976 : 0.000% 32768 -> 32768 : 0.000%
Geometric mean: +1.140% +7.736%

Binary sizes (bytes)
baeb83d:154364
a599e5b:156756

Runtime is not changed by this PR.

@jiangzidong
Copy link
Contributor Author

@zherczeg I prefer the current one.

Original patch: fix the bug
Current patch: fix the bug + optimize the size of snapshot header

Though the benefits of size is negligible in my case, it didn't bring any side effects.

@LaszloLango
Copy link
Contributor

Both solution is good to me. If we choose snapshot header optimization, then the snapshot version should be increased.

@jiangzidong
Copy link
Contributor Author

Oh, I forget the issue of snapshot version and compatibility.

@zherczeg
Copy link
Member

OK, lets use the current one. LGTM

@@ -1450,7 +1450,7 @@ parser_post_processing (parser_context_t *context_p) /**< context */
needs_uint16_arguments = false;
total_size = sizeof (cbc_uint8_arguments_t);

if ((context_p->register_count + context_p->stack_limit) > CBC_MAXIMUM_BYTE_VALUE
if (context_p->stack_limit > CBC_MAXIMUM_BYTE_VALUE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we should check the register_count as well:

|| context_p->register_count > CBC_MAXIMUM_BYTE_VALUE

It is true that PARSER_MAXIMUM_NUMBER_OF_REGISTERS is 256 at this moment, but it can be changed.

@zherczeg
Copy link
Member

And please also add the snapshot version number increase.

@jiangzidong
Copy link
Contributor Author

updated, thanks

call_stack_size should be register_count + maximum stack depth

*We don't add in the parser to save the size of snapshot header
*jerry_snapshot_version 5 -> 6

JerryScript-DCO-1.0-Signed-off-by: Zidong Jiang [email protected]

JerryScript-DCO-1.0-Signed-off-by: Zidong Jiang [email protected]
@LaszloLango
Copy link
Contributor

LGTM

@LaszloLango LaszloLango merged commit 86ac644 into jerryscript-project:master Sep 15, 2016
bsdelf pushed a commit to bsdelf/jerryscript that referenced this pull request Sep 18, 2016
call_stack_size should be register_count + maximum stack depth

* We don't add in the parser to save the size of snapshot header
* jerry_snapshot_version 5 -> 6

JerryScript-DCO-1.0-Signed-off-by: Zidong Jiang [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Undesired behaviour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants