Skip to content

Change '{' regex parsing to work similarly to other engines (#2134) #2169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2018

Conversation

szledan
Copy link
Contributor

@szledan szledan commented Jan 22, 2018

Changed regex parsing to be able to handle opening braces
as other engines do, and also added a compile flag which
can disable this behaviour.

JerryScript-DCO-1.0-Signed-off-by: Szilard Ledan [email protected]

…ipt-project#2134)

Changed regex parsing to be able to handle opening braces
as other engines do, and also added a compile flag which
can disable this behaviour.

JerryScript-DCO-1.0-Signed-off-by: Szilard Ledan [email protected]
@szledan szledan force-pushed the regex-brace-parsing branch from ad7702a to 02cf035 Compare January 22, 2018 14:03
case LIT_CHAR_LEFT_BRACE:
{
#ifdef ENABLE_REGEXP_STRICT_MODE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we switch around the logic and do something like this?

    case LIT_CHAR_PLUS:
    case LIT_CHAR_LEFT_BRACE:
    {
#ifndef ENABLE_REGEXP_STRICT_MODE
      const lit_utf8_byte_t *input_curr_p = parser_ctx_p->input_curr_p;

      lit_utf8_decr (&parser_ctx_p->input_curr_p);
      if (!ecma_is_value_empty (re_parse_iterator (parser_ctx_p, out_token_p)))
      {
          parser_ctx_p->input_curr_p = input_curr_p;

          out_token_p->type = RE_TOK_CHAR;
          out_token_p->value = ch;
          ret_value = re_parse_iterator (parser_ctx_p, out_token_p);

          if (!ecma_is_value_empty (ret_value))
          {
            parser_ctx_p->input_curr_p = input_curr_p;
            ret_value = ECMA_VALUE_EMPTY;
          }
          break;
      }
#else /* ENABLE_REGEXP_STRICT_MODE */
      return ecma_raise_syntax_error (ECMA_ERR_MSG ("Invalid RegExp token."));
#endif /* !ENABLE_REGEXP_STRICT_MODE */

so we can remove the 884-886 line additions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't remove these lines, because this patch handles only the { braces issue and doesn't catch other iterator characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh yeah! I was to eager to simplify.

set(FEATURE_MEM_STRESS_TEST OFF CACHE BOOL "Enable mem-stress test?")
set(FEATURE_PARSER_DUMP OFF CACHE BOOL "Enable parser byte-code dumps?")
set(FEATURE_PROFILE "es5.1" CACHE STRING "Use default or other profile?")
set(FEATURE_REGEXP_STRICT_MODE OFF CACHE BOOL "Enable regexp strict mode?")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we change the current behaviour here? Personally I would prefer FEATURE_REGEXP_COMPATIBILITY_MODE, and keep it OFF by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, now the compatibility mode is the default, so the regexp engine accepts this /{}/ regex pattern while the strict mode does not.
Shall I set the strict mode as default?
(Personally I would prefer the extended behaviour as default, like other engines.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to hear others opinion (kind of voting). I usually prefer to keep the current behaviour but I am not that strict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong preferences. I think the average user won't notice the compatibility features only if they dig into the build system. So enable compatibility mode by default sounds reasonable for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @szledan and @LaszloLango, the compatibility features should be enabled by default, but I would change the name also. So my vote is to set the FEATURE_REGEXP_COMPATIBILITY_MODE to ON instead of setting the FEATURE_REGEXP_STRICT_MODE to OFF.

Copy link
Contributor

@robertsipka robertsipka Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I rethought it, the FEATURE_REGEXP_STRICT_MODE with OFF is fine. The other version may cause problems/different behaviors if somebody want to compile without CMake and forget to define the appropriate macro. LGTM (informally)

Copy link
Contributor

@LaszloLango LaszloLango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zherczeg zherczeg merged commit a4afde2 into jerryscript-project:master Jan 24, 2018
LaszloLango added a commit to LaszloLango/jerryscript that referenced this pull request Feb 16, 2018
It is a followup fix after jerryscript-project#2169. Fixes jerryscript-project#2204. It also fixes a memory leak.

JerryScript-DCO-1.0-Signed-off-by: László Langó [email protected]
LaszloLango added a commit to LaszloLango/jerryscript that referenced this pull request Feb 16, 2018
It is a followup fix after jerryscript-project#2169. Fixes jerryscript-project#2204. It also fixes a memory leak.

JerryScript-DCO-1.0-Signed-off-by: László Langó [email protected]
LaszloLango added a commit to LaszloLango/jerryscript that referenced this pull request Feb 16, 2018
It is a followup fix after jerryscript-project#2169. It also fixes a memory leak.
This fixes jerryscript-project#2198 and fixes jerryscript-project#2204

JerryScript-DCO-1.0-Signed-off-by: László Langó [email protected]
yichoi pushed a commit that referenced this pull request Feb 19, 2018
It is a followup fix after #2169. It also fixes a memory leak.
This fixes #2198 and fixes #2204

JerryScript-DCO-1.0-Signed-off-by: László Langó [email protected]
grgustaf pushed a commit to grgustaf/jerryscript that referenced this pull request Mar 19, 2018
It is a followup fix after jerryscript-project#2169. It also fixes a memory leak.
This fixes jerryscript-project#2198 and fixes jerryscript-project#2204

JerryScript-DCO-1.0-Signed-off-by: László Langó [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants