-
Notifications
You must be signed in to change notification settings - Fork 684
[API] Introduce jerry_parse_and_save_literals() #1500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return str_size; | ||
} | ||
|
||
/* Move the pointer behind the buffer to prevent further write */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to prevent further writes.
{ | ||
lit_utf8_size_t bytes_copied = 0; | ||
|
||
JMEM_DEFINE_LOCAL_ARRAY (str_buffer_p, str_size, lit_utf8_byte_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ECMA_STRING_TO_UTF8_STRING here?
lit_utf8_size_t str_size = ecma_string_get_size (string_p); | ||
bool result = false; | ||
|
||
JMEM_DEFINE_LOCAL_ARRAY (str_buffer_p, str_size, lit_utf8_byte_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto?
jerry_init (JERRY_INIT_EMPTY); | ||
|
||
static jerry_char_t literal_buffer_c[256]; | ||
static const char *code_for_c_format_p = "var object = { aa:'min', Bb:'max' };"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a
aaa:'xzy0'
true); | ||
TEST_ASSERT (literal_sizes_c_format == 153); | ||
|
||
static const jerry_char_t expected_c_format[256] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we align it after the = sign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had aligned in the mentioned format, but the Vera complains about, but it pass when enclosed into parentheses, thanks.
56860c5
to
d12afe3
Compare
I've updated the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good patch, but some extra changes are needed.
|
||
- `source_p` - script source, it must be a valid utf8 string. | ||
- `source_size` - script source size, in bytes. | ||
- `is_strict` - strict mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All other sentences stops with dot. Please add it here
|
||
/** | ||
* ====================== Functions for literal saving ========================== | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are other sections have these ====================== ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at jerry.c:1867.
{ | ||
jerry_save_literals_sort_helper (src, dest, begin, mid); | ||
jerry_save_literals_sort_helper (src, dest, mid + 1, end); | ||
jerry_save_literals_merging_helper (src, dest, begin, mid, end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question: is there a way to avoid recursion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw heapsort does not need recuresive implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in case of merge sort is not possible. So you advice to use a heapsort algorithm instead of the current merge sort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a complicated algorithm for an array of data:
* @return number of bytes, actually copied to the buffer. | ||
*/ | ||
static lit_utf8_size_t | ||
append_chars_to_buffer (uint8_t *buffer_p, /**< buffer */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these functions have no jerry_ prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this file there are other 'helper' functions without jerry_
prefix (e.g. snapshot_add_compiled_code
), but ok, I'll prefix them.
total_bytes, | ||
(const char *) uint32_to_str_buffer, | ||
utf8_str_size); | ||
return bytes_copied; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to create a variable, just return with the value.
} | ||
else | ||
{ | ||
result = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result is initialized to false above
* which are valid identifiers and none of them are magic string. | ||
* | ||
* @return size of the literal-list in bytes, at most equal to the buffer size, | ||
* if the source parsed successfully and the list of the literals isn't empty, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space in alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At jerry_parse_and_save_snapshot
there are the same style, which one is nicer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No extra space. The other is probably a style error.
} | ||
|
||
ecma_string_t *literal_array[literal_count]; | ||
ecma_string_t *literal_array_sorted[literal_count]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change these to real allocation, since stack consumption might be too big.
(const char *) "jerry_length_t literal_count = ", | ||
0); | ||
|
||
buffer_p += append_number_to_buffer (buffer_p, buffer_end_p, &total_bytes, literal_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think total_bytes is a redundant argument since it can be computed as buffer_p-buffer_start_p. Passing less arguments is generally faster. Just save the buffer_p into a buffer_start_p.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function could also return with the new buffer_p instead of the size, which would also reduce the code size, since there is no preserving the original value and an extra addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the buffer is insufficient, we cannot determine the exact amount of the used bytes just from the pointers. But if we don't want to save anything in the mentioned case, we can omit this variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, meanwhile I've removed this argument.
@@ -354,6 +356,10 @@ main (int argc, | |||
bool is_save_snapshot_mode_for_global_or_eval = false; | |||
const char *save_snapshot_file_name_p = NULL; | |||
|
|||
bool is_save_literals_mode = false; | |||
bool is_save_literals_mode_in_c_format_or_list = false; | |||
const char *save_literals_file_name_p = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: can snapshot and save literals be done in one step? It is great if it is possible. Othoerwise only one file name parameter is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can be done in one step.
After this patch, we have to provide external strings ordered by size and lexicographically. We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500). JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
After this patch, we have to provide external strings ordered by size and lexicographically. We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500). JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
5bb89d9
to
53480a2
Compare
return buffer_p; | ||
} | ||
|
||
const lit_utf8_size_t str_size = (string_size == 0) ? (lit_utf8_size_t) strlen (chars) : string_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jerry_append_chars_to_buffer is good now. Just one more comment. What about:
if (string_size == 0)
{
string_size = (lit_utf8_size_t) strlen (chars);
}
This way we don't need another variable.
{ | ||
return ecma_compare_ecma_strings_relational (literal1, literal2); | ||
} | ||
else if (lit1_size < lit2_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
return (lit1_size < lit2_size);
We don't need several ifs.
break; | ||
} | ||
|
||
ecma_string_t *swap = literals[node_idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swap_p or tmp_p
{ | ||
const lit_utf8_size_t last_idx = num_of_literals - lit_idx - 1; | ||
|
||
ecma_string_t *swap = literals[last_idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
} /* jerry_save_literals_down_heap */ | ||
|
||
/** | ||
* Helper function for a heapsort algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplicity of this algorithm. No need any large stack.
|
||
if (buffer_p + str_size <= buffer_end_p) | ||
{ | ||
strncpy ((char *) buffer_p, chars, str_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use memcpy. The strncpy stops if it encounters a 0, which isn't possible at this moment (string is identifier), but we might add other rules in the future, and this copy will be a difficult to debug.
|
||
/* Move the pointer behind the buffer to prevent further writes. */ | ||
/* Distance from the buffer represents the unused bytes. */ | ||
return (buffer_end_p + (buffer_end_p - buffer_p)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??? Distance from the buffer represents the unused bytes ??? - I don't understand this comment.
Isn't it represent the needed bytes of the last append?
What about simply return with:
return buffer_end_p + 1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the buffer_end_p - buffer_p
represents the remaining bytes in the buffer.
When we encounter a string which doesn't fit into the available space
we move the buffer_p
after the buffer_end_p
.
For the saving process, we need to know the exact amount of used bytes in the buffer and
we can compute it easily when we shifted the buffer_p
according to the unused bytes,
see jerry-snapshot.c:1049
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we move the buffer_p after the buffer_end_p
return buffer_end_p + 1;
This exactly do this. I don't really understand the last sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the effect is the same, but how many bytes would you save when the buffer is insufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the purpose of the computation, please explain it to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shift the buffer pointer by the number of the unused bytes in the buffer instead of 1,
thus we can determine the number of the used bytes at the end of jerry_parse_and_save_literals()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, but that is why I suggested returning with buffer_end_p + 1, and not buffer_p + 1. So we get a pointer which is right after the end of the buffer. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we return with 0 if the buffer is too small, we have no plans to return with the required number of bytes.
jerry_append_chars_to_buffer (uint8_t *buffer_p, /**< buffer */ | ||
uint8_t *buffer_end_p, /**< the end of the buffer */ | ||
const char *chars, /**< string */ | ||
lit_utf8_size_t string_size) /**< string size */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function looks better without the extra argument :)
53480a2
to
25df93b
Compare
|
||
**Summary** | ||
|
||
Collect and save the used literals from the specified source code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how it works. Please add more description here.
* false - otherwise. | ||
*/ | ||
static bool | ||
ecma_string_is_valid_identifier (ecma_string_t *string_p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const ecma_string_t *string_p
jerry_parse_and_save_literals (const jerry_char_t *source_p, /**< script source */ | ||
size_t source_size, /**< script source size */ | ||
bool is_strict, /**< strict mode */ | ||
uint8_t *buffer_p, /**< buffer to save literals to */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint8_t *buffer_p, /**< [out] buffer to save literals to */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think out_buffer_p or result_buffer_p would be better names (Only my personal preference)
JERRY_BUFFER_SIZE); | ||
if (snapshot_size == 0) | ||
{ | ||
ret_value = jerry_create_error (JERRY_ERROR_COMMON, (jerry_char_t *) ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't leave the message empty. (I know it was empty before.)
is_save_literals_mode_in_c_format_or_list); | ||
if (literal_buffer_size == 0) | ||
{ | ||
ret_value = jerry_create_error (JERRY_ERROR_COMMON, (jerry_char_t *) ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
{ | ||
FILE *literal_file_p = fopen (save_literals_file_name_p, "w"); | ||
fwrite (snapshot_save_buffer, sizeof (uint8_t), literal_buffer_size, literal_file_p); | ||
fclose (literal_file_p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add this file saving code the example source code of the jerry_parse_and_save_literal
function in API docs.
literal_buffer_c, | ||
sizeof (literal_buffer_c), | ||
true); | ||
TEST_ASSERT (literal_sizes_c_format == 203); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
203? why is it so big for such a small example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the size of the C-style format, please look 3 lines below for the expected result.
After this patch, we have to provide external strings ordered by size and lexicographically. We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500). JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
After this patch, we have to provide external strings ordered by size and lexicographically. We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500). JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
After this patch, we have to provide external strings ordered by size and lexicographically. We can do this with jerry_parse_and_save_literals() (#1500). JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent patch. A couple of small fixes, and you can land it.
*/ | ||
|
||
/** | ||
* Compare two ecma_string by size, then lexicographically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ecma_strings
} /* jerry_save_literals_heap_max */ | ||
|
||
/** | ||
* Helper function for a heapsort algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the heapsort
} /* jerry_save_literals_compare */ | ||
|
||
/** | ||
* Helper function for a heapsort algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the heapsort
|
||
ECMA_STRING_TO_UTF8_STRING (string_p, str_buffer_p, str_buffer_size); | ||
|
||
/* Append the string to the buffer */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dot after sentence.
ecma_lit_storage_item_t *string_list_p = JERRY_CONTEXT (string_list_first_p); | ||
lit_utf8_size_t literal_count = 0; | ||
|
||
/* Count the valid and non-magic identifiers in the list */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dot after sentence.
ecma_string_t *literal_p = JMEM_CP_GET_NON_NULL_POINTER (ecma_string_t, | ||
string_list_p->values[i]); | ||
/* We don't save a literal which isn't a valid identifier | ||
or it's a magic string */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dot after sentence.
/* We don't save a literal which isn't a valid identifier | ||
or it's a magic string */ | ||
if (ecma_string_is_valid_identifier (literal_p) | ||
&& ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would check ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT first.
string_list_p->values[i]); | ||
|
||
if (ecma_string_is_valid_identifier (literal_p) | ||
&& ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
string_list_p = JMEM_CP_GET_POINTER (ecma_lit_storage_item_t, string_list_p->next_cp); | ||
} | ||
|
||
/* Sort the strings by size at first, then lexicographically */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add dots after each sentence.
if (jerry_is_feature_enabled (JERRY_FEATURE_SNAPSHOT_SAVE)) | ||
{ | ||
is_save_literals_mode = true; | ||
is_save_literals_mode_in_c_format_or_list = !strcmp ("--save-literals-c-format", argv[i - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simply check the length, since the two string lengths are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the PR according to your review, but there is another occurence the mentioned comparison at main-unix.c:445 in save-snapshot section. It could be modified in a follow-up patch.
if (jerry_is_feature_enabled (JERRY_FEATURE_SNAPSHOT_SAVE)) | ||
{ | ||
is_save_literals_mode = true; | ||
is_save_literals_mode_in_c_format_or_list = strlen ("--save-literals-c-format") == strlen (argv[i - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is looks less nicely. Please revert to the original, this is not time critical. Sorry for the extra work.
This function can be used to save literals into a specific file in a list or C format. These literals are valid identifiers, and doesn't match to any magic-string. The '--save-literals-list-format FILE' and '--save-literals-c-format FILE' options are used to save into the given file, when the snapshot-save is enabled. The saved literals are sorted by size and lexicographically. The C-format is useful for jerry_register_magic_strings() to generate the array of external magic strings. JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
This function can be used to save literals into a specific file in a list or C format.
These literals are valid identifiers, and doesn't match to any magic-string.
The '--save-literals-list-format FILE' and '--save-literals-c-format FILE'
options are used to save into the given file, when the snapshot-save is enabled.
The saved literals are sorted by size and lexicographically.
The C-format is useful for jerry_register_magic_strings() to generate the array
of external magic strings.
JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]