[API] Introduce jerry_parse_and_save_literals() #1500

bzsolt · 2016-12-21T09:37:57Z

This function can be used to save literals into a specific file in list or C format.
These literals are valid identifiers and do not match any magic string.
The '--save-literals-list-format FILE' and '--save-literals-c-format FILE'
options save the literals into the given file when snapshot-save is enabled.
The saved literals are sorted by size, then lexicographically.
The C format is useful for generating the array of external magic strings
passed to jerry_register_magic_strings().

JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]

zherczeg · 2016-12-21T10:51:41Z

jerry-core/jerry-snapshot.c

+    return str_size;
+  }
+
+  /* Move the pointer behind the buffer to prevent further write */


to prevent further writes.

zherczeg · 2016-12-21T10:53:14Z

jerry-core/jerry-snapshot.c

+{
+  lit_utf8_size_t bytes_copied = 0;
+
+  JMEM_DEFINE_LOCAL_ARRAY (str_buffer_p, str_size, lit_utf8_byte_t);


Can we use ECMA_STRING_TO_UTF8_STRING here?

zherczeg · 2016-12-21T10:53:36Z

jerry-core/jerry-snapshot.c

+  lit_utf8_size_t str_size = ecma_string_get_size (string_p);
+  bool result = false;
+
+  JMEM_DEFINE_LOCAL_ARRAY (str_buffer_p, str_size, lit_utf8_byte_t);


zherczeg · 2016-12-21T10:54:57Z

tests/unit/test-api.c

+    jerry_init (JERRY_INIT_EMPTY);
+
+    static jerry_char_t literal_buffer_c[256];
+    static const char *code_for_c_format_p = "var object = { aa:'min', Bb:'max' };";


Would be good to have a

aaa:'xzy0'

zherczeg · 2016-12-21T10:56:17Z

tests/unit/test-api.c

+                                                                   true);
+    TEST_ASSERT (literal_sizes_c_format == 153);
+
+    static const jerry_char_t expected_c_format[256] =


Can we align it after the = sign?

Yes, I had originally aligned it that way, but Vera complained about it. It passes when the initializer is enclosed in parentheses, thanks.

bzsolt · 2016-12-21T13:10:21Z

I've updated the PR.

zherczeg

Good patch, but some extra changes are needed.

zherczeg · 2016-12-22T12:28:58Z

docs/02.API-REFERENCE.md

+
+- `source_p` - script source, it must be a valid utf8 string.
+- `source_size` - script source size, in bytes.
+- `is_strict` - strict mode


All other sentences end with a dot. Please add one here.

zherczeg · 2016-12-22T12:30:25Z

jerry-core/jerry-snapshot.c

+
+/**
+ * ====================== Functions for literal saving ==========================
+ */


Do other sections have these ====================== ?

Yes, at jerry.c:1867.

zherczeg · 2016-12-22T12:32:40Z

jerry-core/jerry-snapshot.c

+  {
+    jerry_save_literals_sort_helper (src, dest, begin, mid);
+    jerry_save_literals_sort_helper (src, dest, mid + 1, end);
+    jerry_save_literals_merging_helper (src, dest, begin, mid, end);


Just a question: is there a way to avoid recursion?

Btw heapsort does not need a recursive implementation.

I don't think that is possible with merge sort. So you advise using heapsort instead of the current merge sort?

Not a complicated algorithm for an array of data:

https://rosettacode.org/wiki/Sorting_algorithms/Heapsort#C
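To illustrate the suggestion, here is a minimal sketch of a loop-based heapsort over an array of C strings, ordered by size first and then lexicographically. It is not the code from the patch: the `sketch_` names are hypothetical, and plain `const char *` strings with `strlen`/`strcmp` stand in for `ecma_string_t` and the engine's own comparison helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ordering: shorter strings first, ties broken by strcmp. */
static bool
sketch_less (const char *a, const char *b)
{
  size_t a_len = strlen (a);
  size_t b_len = strlen (b);

  if (a_len == b_len)
  {
    return strcmp (a, b) < 0;
  }

  return (a_len < b_len);
}

/* Restore the max-heap property below node_idx with a loop, not recursion. */
static void
sketch_down_heap (const char **items, size_t node_idx, size_t count)
{
  while (true)
  {
    size_t largest = node_idx;
    size_t left = 2 * node_idx + 1;
    size_t right = left + 1;

    if (left < count && sketch_less (items[largest], items[left]))
    {
      largest = left;
    }

    if (right < count && sketch_less (items[largest], items[right]))
    {
      largest = right;
    }

    if (largest == node_idx)
    {
      break;
    }

    const char *tmp_p = items[node_idx];
    items[node_idx] = items[largest];
    items[largest] = tmp_p;
    node_idx = largest;
  }
}

/* Sort in place; only a constant amount of stack is used. */
static void
sketch_heapsort (const char **items, size_t count)
{
  assert (items != NULL || count == 0);

  /* Build the heap bottom-up, starting from the last parent node. */
  for (size_t i = count / 2; i > 0; i--)
  {
    sketch_down_heap (items, i - 1, count);
  }

  /* Move the current maximum to the end of the shrinking unsorted range. */
  for (size_t end = count; end > 1; end--)
  {
    const char *tmp_p = items[0];
    items[0] = items[end - 1];
    items[end - 1] = tmp_p;
    sketch_down_heap (items, 0, end - 1);
  }
}
```

Unlike the recursive merge sort in the hunk above, this variant needs no second destination array and no call-stack growth proportional to the input.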

zherczeg · 2016-12-22T12:34:34Z

jerry-core/jerry-snapshot.c

+ * @return number of bytes, actually copied to the buffer.
+ */
+static lit_utf8_size_t
+append_chars_to_buffer (uint8_t *buffer_p, /**< buffer */


Why do these functions have no jerry_ prefix?

In this file there are other 'helper' functions without jerry_ prefix (e.g. snapshot_add_compiled_code), but ok, I'll prefix them.

zherczeg · 2016-12-22T12:35:32Z

jerry-core/jerry-snapshot.c

+                                                         total_bytes,
+                                                         (const char *) uint32_to_str_buffer,
+                                                         utf8_str_size);
+  return bytes_copied;


We don't need to create a variable, just return with the value.

zherczeg · 2016-12-22T12:36:46Z

jerry-core/jerry-snapshot.c

+  }
+  else
+  {
+    result = false;


result is initialized to false above

zherczeg · 2016-12-22T12:37:29Z

jerry-core/jerry-snapshot.c

+ * which are valid identifiers and none of them are magic string.
+ *
+ * @return size of the literal-list in bytes, at most equal to the buffer size,
+ *          if the source parsed successfully and the list of the literals isn't empty,


extra space in alignment

jerry_parse_and_save_snapshot uses the same style. Which one is nicer?

No extra space. The other is probably a style error.

zherczeg · 2016-12-22T12:40:02Z

jerry-core/jerry-snapshot.c

+  }
+
+  ecma_string_t *literal_array[literal_count];
+  ecma_string_t *literal_array_sorted[literal_count];


I would change these to real allocation, since stack consumption might be too big.
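A minimal sketch of what replacing the variable-length arrays with a checked allocation could look like. It uses `malloc`/`free` purely for illustration; inside the engine the allocation would presumably go through JerryScript's own heap allocator instead, and the `sketch_` helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for literal_count string pointers, or return NULL on failure. */
static const char **
sketch_alloc_literal_array (size_t literal_count)
{
  /* Reject empty requests and guard the size computation against overflow. */
  if (literal_count == 0 || literal_count > SIZE_MAX / sizeof (const char *))
  {
    return NULL;
  }

  return (const char **) malloc (literal_count * sizeof (const char *));
}

/* Release an array obtained from sketch_alloc_literal_array. */
static void
sketch_free_literal_array (const char **array_p, size_t literal_count)
{
  assert (array_p != NULL || literal_count == 0);
  free ((void *) array_p);
}
```

The caller can then report an allocation failure instead of overflowing the stack when the literal count is large.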

zherczeg · 2016-12-22T12:43:36Z

jerry-core/jerry-snapshot.c

+                                        (const char *) "jerry_length_t literal_count = ",
+                                        0);
+
+    buffer_p += append_number_to_buffer (buffer_p, buffer_end_p, &total_bytes, literal_count);


I think total_bytes is a redundant argument since it can be computed as buffer_p - buffer_start_p. Passing fewer arguments is generally faster. Just save the buffer_p into a buffer_start_p.

The function could also return the new buffer_p instead of the size, which would also reduce the code size, since there is no need to preserve the original value or do an extra addition.

When the buffer is insufficient, we cannot determine the exact number of used bytes just from the pointers. But if we don't want to save anything in that case, we can omit this variable.

Ok, meanwhile I've removed this argument.

zherczeg · 2016-12-22T12:47:15Z

jerry-main/main-unix.c

@@ -354,6 +356,10 @@ main (int argc,
  bool is_save_snapshot_mode_for_global_or_eval = false;
  const char *save_snapshot_file_name_p = NULL;

+  bool is_save_literals_mode = false;
+  bool is_save_literals_mode_in_c_format_or_list = false;
+  const char *save_literals_file_name_p = NULL;


Question: can snapshot and save literals be done in one step? It is great if it is possible. Otherwise only one file name parameter is enough.

Yes, it can be done in one step.

After this patch, we have to provide external strings ordered by size and lexicographically. We can do this with jerry_parse_and_save_literals() (jerryscript-project#1500). JerryScript-DCO-1.0-Signed-off-by: Zsolt Borbély [email protected]

zherczeg · 2016-12-30T03:54:52Z

jerry-core/jerry-snapshot.c

+    return buffer_p;
+  }
+
+  const lit_utf8_size_t str_size = (string_size == 0) ? (lit_utf8_size_t) strlen (chars) : string_size;


The jerry_append_chars_to_buffer is good now. Just one more comment. What about:

if (string_size == 0) { string_size = (lit_utf8_size_t) strlen (chars); }

This way we don't need another variable.

zherczeg · 2016-12-30T03:57:01Z

jerry-core/jerry-snapshot.c

+  {
+    return ecma_compare_ecma_strings_relational (literal1, literal2);
+  }
+  else if (lit1_size < lit2_size)


What about:

return (lit1_size < lit2_size);

We don't need several ifs.

zherczeg · 2016-12-30T04:00:03Z

jerry-core/jerry-snapshot.c

+      break;
+    }
+
+    ecma_string_t *swap  = literals[node_idx];


swap_p or tmp_p

zherczeg · 2016-12-30T04:00:37Z

jerry-core/jerry-snapshot.c

+  {
+    const lit_utf8_size_t last_idx = num_of_literals - lit_idx - 1;
+
+    ecma_string_t *swap = literals[last_idx];


zherczeg · 2016-12-30T04:01:15Z

jerry-core/jerry-snapshot.c

+} /* jerry_save_literals_down_heap */
+
+/**
+ * Helper function for a heapsort algorithm.


I like the simplicity of this algorithm. No need for a large stack.

zherczeg · 2016-12-30T04:03:36Z

jerry-core/jerry-snapshot.c

+
+  if (buffer_p + str_size <= buffer_end_p)
+  {
+    strncpy ((char *) buffer_p, chars, str_size);


You should use memcpy. strncpy stops copying when it encounters a 0 byte, which isn't possible at the moment (the string is an identifier), but we might add other rules in the future, and then this copy would be difficult to debug.
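The difference can be seen in a small sketch that copies the same bytes both ways and checks whether the copy matches the source. The helper names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy size bytes with strncpy and report whether every byte survived.
 * strncpy stops reading src at the first 0 byte and pads the rest with 0. */
static bool
sketch_strncpy_is_lossless (char *scratch, const char *src, size_t size)
{
  assert (scratch != NULL && src != NULL);

  strncpy (scratch, src, size);
  return memcmp (scratch, src, size) == 0;
}

/* Copy size bytes with memcpy, which treats 0 like any other byte. */
static bool
sketch_memcpy_is_lossless (char *scratch, const char *src, size_t size)
{
  assert (scratch != NULL && src != NULL);

  memcpy (scratch, src, size);
  return memcmp (scratch, src, size) == 0;
}
```

For identifier-only input both calls behave the same; they diverge only once a string with an embedded 0 byte is allowed through.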

zherczeg · 2016-12-30T04:04:09Z

jerry-core/jerry-snapshot.c

+
+  /* Move the pointer behind the buffer to prevent further writes. */
+  /* Distance from the buffer represents the unused bytes. */
+  return (buffer_end_p + (buffer_end_p - buffer_p));


??? Distance from the buffer represents the unused bytes ??? - I don't understand this comment.
Doesn't it represent the bytes needed by the last append?

What about simply return with:

return buffer_end_p + 1;

So, buffer_end_p - buffer_p represents the remaining bytes in the buffer.
When we encounter a string which doesn't fit into the available space,
we move the buffer_p after the buffer_end_p.
For the saving process, we need to know the exact number of used bytes in the buffer, and
we can compute it easily when buffer_p has been shifted by the number of unused bytes,
see jerry-snapshot.c:1049

we move the buffer_p after the buffer_end_p

return buffer_end_p + 1;

This does exactly that. I don't really understand the last sentence.

Yes, the effect is the same, but how many bytes would you save when the buffer is insufficient?

I still don't understand the purpose of the computation, please explain it to me.

We shift the buffer pointer by the number of the unused bytes in the buffer instead of 1,
thus we can determine the number of the used bytes at the end of jerry_parse_and_save_literals().

Hey, but that is why I suggested returning with buffer_end_p + 1, and not buffer_p + 1. So we get a pointer which is right after the end of the buffer. Am I missing something?

I think we return with 0 if the buffer is too small, we have no plans to return with the required number of bytes.
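The `buffer_end_p + 1` convention discussed above can be sketched as follows. This is an illustrative reconstruction, not the merged code: the `sketch_` names are hypothetical, and the sketch assumes `buffer_end_p` marks the end of the usable region inside a slightly larger array, so that `buffer_end_p + 1` is still a valid pointer value in ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append size bytes and return the new write position.
 * On overflow, return buffer_end_p + 1 as a sticky "out of space" marker. */
static uint8_t *
sketch_append (uint8_t *buffer_p, uint8_t *buffer_end_p, const char *chars, size_t size)
{
  assert (buffer_p != NULL && buffer_end_p != NULL && chars != NULL);

  /* An earlier append already overflowed: stay in the overflow state. */
  if (buffer_p > buffer_end_p)
  {
    return buffer_p;
  }

  if (size <= (size_t) (buffer_end_p - buffer_p))
  {
    memcpy (buffer_p, chars, size);
    return buffer_p + size;
  }

  return buffer_end_p + 1;
}

/* Used bytes are derived from the pointers alone; 0 signals overflow. */
static size_t
sketch_used_bytes (const uint8_t *buffer_start_p, const uint8_t *buffer_p, const uint8_t *buffer_end_p)
{
  if (buffer_p > buffer_end_p)
  {
    return 0;
  }

  return (size_t) (buffer_p - buffer_start_p);
}
```

With this shape no separate byte counter is passed around: a caller chains `buffer_p = sketch_append (buffer_p, ...)` and makes a single check at the end.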

zherczeg · 2016-12-30T04:04:37Z

jerry-core/jerry-snapshot.c

+jerry_append_chars_to_buffer (uint8_t *buffer_p, /**< buffer */
+                              uint8_t *buffer_end_p, /**< the end of the buffer */
+                              const char *chars, /**< string */
+                              lit_utf8_size_t string_size) /**< string size */


This function looks better without the extra argument :)

LaszloLango · 2017-01-02T09:28:54Z

docs/02.API-REFERENCE.md

+
+**Summary**
+
+Collect and save the used literals from the specified source code.


I don't understand how it works. Please add more description here.

LaszloLango · 2017-01-02T09:32:47Z

jerry-core/jerry-snapshot.c

+ *         false - otherwise.
+ */
+static bool
+ecma_string_is_valid_identifier (ecma_string_t *string_p)


const ecma_string_t *string_p

LaszloLango · 2017-01-02T09:33:35Z

jerry-core/jerry-snapshot.c

+jerry_parse_and_save_literals (const jerry_char_t *source_p, /**< script source */
+                               size_t source_size, /**< script source size */
+                               bool is_strict, /**< strict mode */
+                               uint8_t *buffer_p, /**< buffer to save literals to */


uint8_t *buffer_p, /**< [out] buffer to save literals to */

I think out_buffer_p or result_buffer_p would be better names (Only my personal preference)

LaszloLango · 2017-01-02T09:43:03Z

jerry-main/main-unix.c

+                                                                JERRY_BUFFER_SIZE);
+          if (snapshot_size == 0)
+          {
+            ret_value = jerry_create_error (JERRY_ERROR_COMMON, (jerry_char_t *) "");


Please don't leave the message empty. (I know it was empty before.)

LaszloLango · 2017-01-02T09:43:17Z

jerry-main/main-unix.c

+                                                                            is_save_literals_mode_in_c_format_or_list);
+          if (literal_buffer_size == 0)
+          {
+            ret_value = jerry_create_error (JERRY_ERROR_COMMON, (jerry_char_t *) "");


LaszloLango · 2017-01-02T09:44:53Z

jerry-main/main-unix.c

+          {
+            FILE *literal_file_p = fopen (save_literals_file_name_p, "w");
+            fwrite (snapshot_save_buffer, sizeof (uint8_t), literal_buffer_size, literal_file_p);
+            fclose (literal_file_p);


I'd add this file-saving code to the example source code of the jerry_parse_and_save_literals function in the API docs.
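A documentation example along these lines might wrap the three calls from the hunk in a helper that also checks for errors. The sketch below is an assumption about how that could look, not the text that ended up in the API reference; `sketch_save_buffer_to_file` is a hypothetical name, and the buffer is expected to have been filled beforehand (for example by jerry_parse_and_save_literals).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write size bytes from buffer_p to file_name_p; return true on success. */
static bool
sketch_save_buffer_to_file (const char *file_name_p, const uint8_t *buffer_p, size_t size)
{
  assert (file_name_p != NULL && buffer_p != NULL);

  FILE *file_p = fopen (file_name_p, "w");

  if (file_p == NULL)
  {
    return false;
  }

  size_t written = fwrite (buffer_p, sizeof (uint8_t), size, file_p);

  /* fclose flushes buffered data, so its result matters as well. */
  bool is_closed = (fclose (file_p) == 0);

  return (written == size) && is_closed;
}
```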

LaszloLango · 2017-01-02T09:46:00Z

tests/unit/test-api.c

+                                                                   literal_buffer_c,
+                                                                   sizeof (literal_buffer_c),
+                                                                   true);
+    TEST_ASSERT (literal_sizes_c_format == 203);


203? Why is it so big for such a small example?

That is the size of the C-style format; please see the expected result three lines below.

LaszloLango

LGTM

zherczeg

Excellent patch. A couple of small fixes, and you can land it.

zherczeg · 2017-01-10T12:29:21Z

jerry-core/jerry-snapshot.c

+ */
+
+/**
+ * Compare two ecma_string by size, then lexicographically.


ecma_strings

zherczeg · 2017-01-10T12:31:40Z

jerry-core/jerry-snapshot.c

+} /* jerry_save_literals_heap_max */
+
+/**
+ * Helper function for a heapsort algorithm.


for the heapsort

zherczeg · 2017-01-10T12:32:06Z

jerry-core/jerry-snapshot.c

+} /* jerry_save_literals_compare */
+
+/**
+ * Helper function for a heapsort algorithm.


for the heapsort

zherczeg · 2017-01-10T12:35:42Z

jerry-core/jerry-snapshot.c

+
+  ECMA_STRING_TO_UTF8_STRING (string_p, str_buffer_p, str_buffer_size);
+
+  /* Append the string to the buffer */


Dot after sentence.

zherczeg · 2017-01-10T12:37:53Z

jerry-core/jerry-snapshot.c

+  ecma_lit_storage_item_t *string_list_p = JERRY_CONTEXT (string_list_first_p);
+  lit_utf8_size_t literal_count = 0;
+
+  /* Count the valid and non-magic identifiers in the list */


Dot after sentence.

zherczeg · 2017-01-10T12:39:45Z

jerry-core/jerry-snapshot.c

+        ecma_string_t *literal_p = JMEM_CP_GET_NON_NULL_POINTER (ecma_string_t,
+                                                                 string_list_p->values[i]);
+        /* We don't save a literal which isn't a valid identifier
+           or it's a magic string */


Dot after sentence.

zherczeg · 2017-01-10T12:41:40Z

jerry-core/jerry-snapshot.c

+        /* We don't save a literal which isn't a valid identifier
+           or it's a magic string */
+        if (ecma_string_is_valid_identifier (literal_p)
+            && ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT)


I would check ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT first.

zherczeg · 2017-01-10T12:41:50Z

jerry-core/jerry-snapshot.c

+                                                                 string_list_p->values[i]);
+
+        if (ecma_string_is_valid_identifier (literal_p)
+            && ecma_get_string_magic (literal_p) == LIT_MAGIC_STRING__COUNT)


zherczeg · 2017-01-10T12:42:14Z

jerry-core/jerry-snapshot.c

+    string_list_p = JMEM_CP_GET_POINTER (ecma_lit_storage_item_t, string_list_p->next_cp);
+  }
+
+  /* Sort the strings by size at first, then lexicographically */


Please add dots after each sentence.

zherczeg · 2017-01-10T12:44:18Z

jerry-main/main-unix.c

+      if (jerry_is_feature_enabled (JERRY_FEATURE_SNAPSHOT_SAVE))
+      {
+        is_save_literals_mode = true;
+        is_save_literals_mode_in_c_format_or_list = !strcmp ("--save-literals-c-format", argv[i - 1]);


I think we can simply check the length, since the two string lengths are different.

I've updated the PR according to your review, but there is another occurrence of the same comparison at main-unix.c:445 in the save-snapshot section. It could be modified in a follow-up patch.

zherczeg · 2017-01-11T10:59:36Z

jerry-main/main-unix.c

+      if (jerry_is_feature_enabled (JERRY_FEATURE_SNAPSHOT_SAVE))
+      {
+        is_save_literals_mode = true;
+        is_save_literals_mode_in_c_format_or_list = strlen ("--save-literals-c-format") == strlen (argv[i - 1]);


Unfortunately this looks less nice. Please revert to the original; this is not time critical. Sorry for the extra work.
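For reference, the two variants of the option check can be compared in a small sketch. The helper names are hypothetical; only the option strings come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Content-based check, as in the original version of the patch. */
static bool
sketch_is_c_format_by_strcmp (const char *option_p)
{
  assert (option_p != NULL);

  return strcmp ("--save-literals-c-format", option_p) == 0;
}

/* Length-based check: shorter to evaluate, but it accepts any string
 * that happens to have the same length as the C-format option. */
static bool
sketch_is_c_format_by_length (const char *option_p)
{
  assert (option_p != NULL);

  return strlen ("--save-literals-c-format") == strlen (option_p);
}
```

In main-unix.c the argument has already been matched against the two known options, so both checks give the same answer there; the `strcmp` form simply states the intent more directly.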

bzsolt added the api (Related to the public API), feature request (Requested feature), and snapshot (Related to the snapshot feature) labels Dec 21, 2016

zherczeg reviewed Dec 21, 2016

zherczeg requested changes Dec 21, 2016

bzsolt force-pushed the save-literals branch from 56860c5 to d12afe3 December 21, 2016 12:58

zherczeg requested changes Dec 22, 2016

bzsolt mentioned this pull request Dec 22, 2016

[API] Improve the performance of the external magic id search #1506

Merged

bzsolt force-pushed the save-literals branch 2 times, most recently from 5bb89d9 to 53480a2 December 30, 2016 00:35

zherczeg requested changes Dec 30, 2016

bzsolt force-pushed the save-literals branch from 53480a2 to 25df93b December 30, 2016 22:21

LaszloLango requested changes Jan 2, 2017

bzsolt force-pushed the save-literals branch from 25df93b to 3491665 January 2, 2017 23:10

bzsolt force-pushed the save-literals branch from 3491665 to 25d2d9d January 9, 2017 14:33

LaszloLango approved these changes Jan 9, 2017

bzsolt force-pushed the save-literals branch from 25d2d9d to 5848a69 January 9, 2017 22:45

zherczeg approved these changes Jan 10, 2017

bzsolt force-pushed the save-literals branch from 5848a69 to 5bd8cb6 January 11, 2017 10:42

zherczeg approved these changes Jan 11, 2017

bzsolt force-pushed the save-literals branch from 5bd8cb6 to 916b94c January 11, 2017 11:06

bzsolt merged commit f1ed571 into jerryscript-project:master Jan 11, 2017

bzsolt deleted the save-literals branch January 12, 2017 10:09

