Skip to content

[API] Introduce jerry_parse_and_save_literals() #1500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 11, 2017

Conversation

bzsolt
Copy link
Member

@bzsolt bzsolt commented Dec 21, 2016

This function can be used to save literals into a specific file in a list or C format.
These literals are valid identifiers, and doesn't match to any magic-string.
The '--save-literals-list-format FILE' and '--save-literals-c-format FILE'
options are used to save into the given file, when the snapshot-save is enabled.
The saved literals are sorted by size and lexicographically.
The C-format is useful for jerry_register_magic_strings() to generate the array
of external magic strings.

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]

@bzsolt bzsolt added api Related to the public API feature request Requested feature snapshot Related to the snapshot feature labels Dec 21, 2016
return str_size;
}

/* Move the pointer behind the buffer to prevent further write */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to prevent further writes.

{
lit_utf8_size_t bytes_copied = 0;

JMEM_DEFINE_LOCAL_ARRAY (str_buffer_p, str_size, lit_utf8_byte_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ECMA_STRING_TO_UTF8_STRING here?

lit_utf8_size_t str_size = ecma_string_get_size (string_p);
bool result = false;

JMEM_DEFINE_LOCAL_ARRAY (str_buffer_p, str_size, lit_utf8_byte_t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto?

jerry_init (JERRY_INIT_EMPTY);

static jerry_char_t literal_buffer_c[256];
static const char *code_for_c_format_p = "var object = { aa:'min', Bb:'max' };";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a

aaa:'xzy0'

true);
TEST_ASSERT (literal_sizes_c_format == 153);

static const jerry_char_t expected_c_format[256] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we align it after the = sign?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had aligned in the mentioned format, but the Vera complains about, but it pass when enclosed into parentheses, thanks.

@bzsolt
Copy link
Member Author

bzsolt commented Dec 21, 2016

I've updated the PR.

Copy link
Member

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good patch, but some extra changes are needed.


- `source_p` - script source, it must be a valid utf8 string.
- `source_size` - script source size, in bytes.
- `is_strict` - strict mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other sentences stops with dot. Please add it here


/**
* ====================== Functions for literal saving ==========================
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are other sections have these ====================== ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at jerry.c:1867.

{
jerry_save_literals_sort_helper (src, dest, begin, mid);
jerry_save_literals_sort_helper (src, dest, mid + 1, end);
jerry_save_literals_merging_helper (src, dest, begin, mid, end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question: is there a way to avoid recursion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw heapsort does not need recuresive implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in case of merge sort is not possible. So you advice to use a heapsort algorithm instead of the current merge sort?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a complicated algorithm for an array of data:

https://rosettacode.org/wiki/Sorting_algorithms/Heapsort#C

* @return number of bytes, actually copied to the buffer.
*/
static lit_utf8_size_t
append_chars_to_buffer (uint8_t *buffer_p, /**< buffer */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these functions have no jerry_ prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file there are other 'helper' functions without jerry_ prefix (e.g. snapshot_add_compiled_code), but ok, I'll prefix them.

total_bytes,
(const char *) uint32_to_str_buffer,
utf8_str_size);
return bytes_copied;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to create a variable, just return with the value.

}
else
{
result = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is initialized to false above

* which are valid identifiers and none of them are magic string.
*
* @return size of the literal-list in bytes, at most equal to the buffer size,
* if the source parsed successfully and the list of the literals isn't empty,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space in alignment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At jerry_parse_and_save_snapshot there are the same style, which one is nicer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No extra space. The other is probably a style error.

}

ecma_string_t *literal_array[literal_count];
ecma_string_t *literal_array_sorted[literal_count];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change these to real allocation, since stack consumption might be too big.

(const char *) "jerry_length_t literal_count = ",
0);

buffer_p += append_number_to_buffer (buffer_p, buffer_end_p, &total_bytes, literal_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think total_bytes is a redundant argument since it can be computed as buffer_p-buffer_start_p. Passing less arguments is generally faster. Just save the buffer_p into a buffer_start_p.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function could also return with the new buffer_p instead of the size, which would also reduce the code size, since there is no preserving the original value and an extra addition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the buffer is insufficient, we cannot determine the exact amount of the used bytes just from the pointers. But if we don't want to save anything in the mentioned case, we can omit this variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, meanwhile I've removed this argument.

@@ -354,6 +356,10 @@ main (int argc,
bool is_save_snapshot_mode_for_global_or_eval = false;
const char *save_snapshot_file_name_p = NULL;

bool is_save_literals_mode = false;
bool is_save_literals_mode_in_c_format_or_list = false;
const char *save_literals_file_name_p = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: can snapshot and save literals be done in one step? It is great if it is possible. Othoerwise only one file name parameter is enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be done in one step.

bzsolt pushed a commit to bzsolt/jerryscript that referenced this pull request Dec 22, 2016
After this patch, we have to provide external strings ordered by size and lexicographically.
We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500).

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
bzsolt pushed a commit to bzsolt/jerryscript that referenced this pull request Dec 22, 2016
After this patch, we have to provide external strings ordered by size and lexicographically.
We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500).

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
@bzsolt bzsolt force-pushed the save-literals branch 2 times, most recently from 5bb89d9 to 53480a2 Compare December 30, 2016 00:35
return buffer_p;
}

const lit_utf8_size_t str_size = (string_size == 0) ? (lit_utf8_size_t) strlen (chars) : string_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jerry_append_chars_to_buffer is good now. Just one more comment. What about:

if (string_size == 0)
{
  string_size = (lit_utf8_size_t) strlen (chars);
}

This way we don't need another variable.

{
return ecma_compare_ecma_strings_relational (literal1, literal2);
}
else if (lit1_size < lit2_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

return (lit1_size < lit2_size);

We don't need several ifs.

break;
}

ecma_string_t *swap = literals[node_idx];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swap_p or tmp_p

{
const lit_utf8_size_t last_idx = num_of_literals - lit_idx - 1;

ecma_string_t *swap = literals[last_idx];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

} /* jerry_save_literals_down_heap */

/**
* Helper function for a heapsort algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of this algorithm. No need any large stack.


if (buffer_p + str_size <= buffer_end_p)
{
strncpy ((char *) buffer_p, chars, str_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use memcpy. The strncpy stops if it encounters a 0, which isn't possible at this moment (string is identifier), but we might add other rules in the future, and this copy will be a difficult to debug.


/* Move the pointer behind the buffer to prevent further writes. */
/* Distance from the buffer represents the unused bytes. */
return (buffer_end_p + (buffer_end_p - buffer_p));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??? Distance from the buffer represents the unused bytes ??? - I don't understand this comment.
Isn't it represent the needed bytes of the last append?

What about simply return with:

return buffer_end_p + 1;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the buffer_end_p - buffer_p represents the remaining bytes in the buffer.
When we encounter a string which doesn't fit into the available space
we move the buffer_p after the buffer_end_p.
For the saving process, we need to know the exact amount of used bytes in the buffer and
we can compute it easily when we shifted the buffer_p according to the unused bytes,
see jerry-snapshot.c:1049

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we move the buffer_p after the buffer_end_p

return buffer_end_p + 1;

This exactly do this. I don't really understand the last sentence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the effect is the same, but how many bytes would you save when the buffer is insufficient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the purpose of the computation, please explain it to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shift the buffer pointer by the number of the unused bytes in the buffer instead of 1,
thus we can determine the number of the used bytes at the end of jerry_parse_and_save_literals().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, but that is why I suggested returning with buffer_end_p + 1, and not buffer_p + 1. So we get a pointer which is right after the end of the buffer. Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we return with 0 if the buffer is too small, we have no plans to return with the required number of bytes.

jerry_append_chars_to_buffer (uint8_t *buffer_p, /**< buffer */
uint8_t *buffer_end_p, /**< the end of the buffer */
const char *chars, /**< string */
lit_utf8_size_t string_size) /**< string size */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks better without the extra argument :)


**Summary**

Collect and save the used literals from the specified source code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how it works. Please add more description here.

* false - otherwise.
*/
static bool
ecma_string_is_valid_identifier (ecma_string_t *string_p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const ecma_string_t *string_p

jerry_parse_and_save_literals (const jerry_char_t *source_p, /**< script source */
size_t source_size, /**< script source size */
bool is_strict, /**< strict mode */
uint8_t *buffer_p, /**< buffer to save literals to */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint8_t *buffer_p, /**< [out] buffer to save literals to */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think out_buffer_p or result_buffer_p would be better names (Only my personal preference)

JERRY_BUFFER_SIZE);
if (snapshot_size == 0)
{
ret_value = jerry_create_error (JERRY_ERROR_COMMON, (jerry_char_t *) "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't leave the message empty. (I know it was empty before.)

is_save_literals_mode_in_c_format_or_list);
if (literal_buffer_size == 0)
{
ret_value = jerry_create_error (JERRY_ERROR_COMMON, (jerry_char_t *) "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

{
FILE *literal_file_p = fopen (save_literals_file_name_p, "w");
fwrite (snapshot_save_buffer, sizeof (uint8_t), literal_buffer_size, literal_file_p);
fclose (literal_file_p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add this file saving code the example source code of the jerry_parse_and_save_literal function in API docs.

literal_buffer_c,
sizeof (literal_buffer_c),
true);
TEST_ASSERT (literal_sizes_c_format == 203);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

203? why is it so big for such a small example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the size of the C-style format, please look 3 lines below for the expected result.

bzsolt pushed a commit to bzsolt/jerryscript that referenced this pull request Jan 3, 2017
After this patch, we have to provide external strings ordered by size and lexicographically.
We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500).

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
bzsolt pushed a commit to bzsolt/jerryscript that referenced this pull request Jan 4, 2017
After this patch, we have to provide external strings ordered by size and lexicographically.
We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500).

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
LaszloLango pushed a commit that referenced this pull request Jan 9, 2017
After this patch, we have to provide external strings ordered by size and lexicographically.
We can do this with jerry_parse_and_save_literals() (#1500).

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
Copy link
Contributor

@LaszloLango LaszloLango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent patch. A couple of small fixes, and you can land it.

*/

/**
* Compare two ecma_string by size, then lexicographically.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ecma_strings

} /* jerry_save_literals_heap_max */

/**
* Helper function for a heapsort algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the heapsort

} /* jerry_save_literals_compare */

/**
* Helper function for a heapsort algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the heapsort


ECMA_STRING_TO_UTF8_STRING (string_p, str_buffer_p, str_buffer_size);

/* Append the string to the buffer */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dot after sentence.

ecma_lit_storage_item_t *string_list_p = JERRY_CONTEXT (string_list_first_p);
lit_utf8_size_t literal_count = 0;

/* Count the valid and non-magic identifiers in the list */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dot after sentence.

ecma_string_t *literal_p = JMEM_CP_GET_NON_NULL_POINTER (ecma_string_t,
string_list_p->values[i]);
/* We don't save a literal which isn't a valid identifier
or it's a magic string */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dot after sentence.

/* We don't save a literal which isn't a valid identifier
or it's a magic string */
if (ecma_string_is_valid_identifier (literal_p)
&& ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would check ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT first.

string_list_p->values[i]);

if (ecma_string_is_valid_identifier (literal_p)
&& ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

string_list_p = JMEM_CP_GET_POINTER (ecma_lit_storage_item_t, string_list_p->next_cp);
}

/* Sort the strings by size at first, then lexicographically */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add dots after each sentence.

if (jerry_is_feature_enabled (JERRY_FEATURE_SNAPSHOT_SAVE))
{
is_save_literals_mode = true;
is_save_literals_mode_in_c_format_or_list = !strcmp ("--save-literals-c-format", argv[i - 1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simply check the length, since the two string lengths are different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR according to your review, but there is another occurence the mentioned comparison at main-unix.c:445 in save-snapshot section. It could be modified in a follow-up patch.

if (jerry_is_feature_enabled (JERRY_FEATURE_SNAPSHOT_SAVE))
{
is_save_literals_mode = true;
is_save_literals_mode_in_c_format_or_list = strlen ("--save-literals-c-format") == strlen (argv[i - 1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is looks less nicely. Please revert to the original, this is not time critical. Sorry for the extra work.

This function can be used to save literals into a specific file in a list or C format.
These literals are valid identifiers, and doesn't match to any magic-string.
The '--save-literals-list-format FILE' and '--save-literals-c-format FILE'
options are used to save into the given file, when the snapshot-save is enabled.
The saved literals are sorted by size and lexicographically.
The C-format is useful for jerry_register_magic_strings() to generate the array
of external magic strings.

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]
@bzsolt bzsolt merged commit f1ed571 into jerryscript-project:master Jan 11, 2017
@bzsolt bzsolt deleted the save-literals branch January 12, 2017 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the public API feature request Requested feature snapshot Related to the snapshot feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants