Support MySQL Character Set Introducers #788

mskrzypkows · 2023-01-04T12:25:43Z

https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html

Character set introducers: _latin1, _binary, _utf8, _utf8mb4

coveralls · 2023-01-04T12:30:37Z

Pull Request Test Coverage Report for Build 4206535525

45 of 53 (84.91%) changed or added relevant lines in 4 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.008%) to 86.045%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/parser.rs	10	18	55.56%

Files with Coverage Reduction	New Missed Lines	%
src/parser.rs	1	83.17%

Totals
Change from base Build 4206494817:	-0.008%
Covered Lines:	13238
Relevant Lines:	15385

💛 - Coveralls

AugustoFKL · 2023-01-05T02:29:12Z

src/ast/mod.rs

@@ -696,6 +698,7 @@ impl fmt::Display for Expr {
            Expr::Collate { expr, collation } => write!(f, "{} COLLATE {}", expr, collation),
            Expr::Nested(ast) => write!(f, "({})", ast),
            Expr::Value(v) => write!(f, "{}", v),
+            Expr::IntroducedString { introducer, value } => write!(f, "{} {}", introducer, value),


@mskrzypkows is this supposed to always be spaced?

In simple strings, it seems that there's no space:

SELECT _latin1'abc';

While a hex string literal does have a space:

SELECT _latin1 X'4D7953514C';

Am I missing something?

Note: both examples are from your reference: https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html

The space doesn't matter; the literal can be written with or without it.

Both versions are tested here: https://github.com/sqlparser-rs/sqlparser-rs/pull/788/files#diff-c8a9c2f5aa6b5c0f71fde332e59342930b2d67697d6fa11b538a7fadbbe1fe78R1302

Oh, I see, sorry for the confusion.
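To make the outcome of this thread concrete, here is a minimal standalone sketch of the formatting behaviour discussed above. The `Literal` and `IntroducedString` types and the `render` helper are hypothetical stand-ins, not the sqlparser-rs AST; only the `"{} {}"` format string is taken from the diff. The point is that the output always carries a single space, which MySQL accepts for both quoted and hex literals.

```rust
use std::fmt;

/// Hypothetical stand-in for the literal kinds an introducer can precede.
enum Literal {
    SingleQuoted(String),
    Hex(String),
}

impl fmt::Display for Literal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Literal::SingleQuoted(s) => write!(f, "'{}'", s),
            Literal::Hex(h) => write!(f, "X'{}'", h),
        }
    }
}

/// Hypothetical mirror of the `Expr::IntroducedString` variant from the diff.
struct IntroducedString {
    introducer: String,
    value: Literal,
}

impl fmt::Display for IntroducedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always emit exactly one space between introducer and literal.
        write!(f, "{} {}", self.introducer, self.value)
    }
}

/// Convenience helper (an assumption for this sketch) that renders a pair to SQL text.
fn render(introducer: &str, value: Literal) -> String {
    IntroducedString {
        introducer: introducer.to_string(),
        value,
    }
    .to_string()
}
```

With this sketch, `render("_latin1", Literal::SingleQuoted("abc".into()))` yields `_latin1 'abc'`, so an input written without the space is normalized to the spaced form on output.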

AugustoFKL

LGTM

@alamb would it be possible to access the coverage reports for this and other PRs?

It feels really confusing sometimes, since we have only the percentage and files, but not exactly what is not covered. I know the report contains the exact uncovered changed lines.

alamb · 2023-01-09T22:00:21Z

@alamb would it be possible to access the coverage reports for this and other PRs?

It would be fine with me -- how can I do it? Can you go to https://coveralls.io/builds/55632033 ?

alamb · 2023-01-09T22:01:39Z

I am sorry I am a bit behind on reviews / merging in sqlparser-rs. I hope to be able to catch up later this week

AugustoFKL · 2023-01-11T12:45:28Z

@alamb I'm not entirely sure. For some reason, the coverage looks like this for me:

Is it the same for you?

alamb · 2023-01-13T09:02:16Z

@AugustoFKL here is what https://coveralls.io/builds/55632033 looks like to me:

I wonder if it just took a while to update? Can you check again?

The code coverage job / infrastructure predates my involvement in sqlparser-rs

AugustoFKL · 2023-01-13T16:44:58Z

@alamb try to open one of the files, please.

Here's an example of the coverage view from one of my projects

alamb · 2023-01-15T10:33:56Z

https://coveralls.io/builds/55632033/source?filename=src%2Fparser.rs

Appears to say "source not available"

alamb

Thank you for the contribution -- this is looking close. My major question is about the hard coded list of introducers.

alamb · 2023-01-15T10:37:12Z

src/tokenizer.rs

-                    Ok(Some(Token::make_word(&s, None)))
+
+                    // https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
+                    if ch == '_' && dialect_of!(self is MySqlDialect) {


In general, I think it would be nice to support this also in the GenericDialect for consistency with other vendor specific features, but I don't think it is required.

alamb · 2023-01-15T10:37:50Z

src/tokenizer.rs

+
+                    // https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
+                    if ch == '_' && dialect_of!(self is MySqlDialect) {
+                        const INTRODUCERS: [&str; 4] = ["_latin1", "_binary", "_utf8", "_utf8mb4"];


My reading of https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html suggests the charset introducer can name any supported character set. Why are these four hard-coded?

That's a good point! I'll fix it.

mskrzypkows · 2023-01-16T12:31:46Z

src/parser.rs

+                    Token::SingleQuotedString(_)
+                    | Token::DoubleQuotedString(_)
+                    | Token::HexStringLiteral(_)
+                        if w.value.starts_with('_') =>


@alamb I moved the detection of introduced strings here, so the check now starts from Token::Word. This seems to be the most generic solution. What do you think?

Unfortunately, it turned out to be the wrong solution. In MySQL, it's possible to run a query like this one:

select _col1 'str' from test_table;

In this case, 'str' is just a column alias...
So I need to check whether the string introducer is one of the character sets listed at https://dev.mysql.com/doc/refman/8.0/en/charset-charsets.html, with a _ prefix.

The question is where to check it. I could bring back Token::StringIntroducer, or check it here, only when a string token follows the introducer. Personally, I would stick with the second option.

@alamb What do you think, does a check like this for MySQL string introducers fit sqlparser-rs?
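A rough sketch of the second option, to show how the alias case and the introducer case can be told apart. Everything here is hypothetical: `Tok` is a simplified stand-in rather than the sqlparser-rs `Token` enum, `CHARSETS` is only an illustrative subset of the MySQL character set list, and the comparison is case-sensitive purely for simplicity.

```rust
/// Simplified token kinds; hypothetical, not the sqlparser-rs `Token` enum.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    SingleQuotedString(String),
    DoubleQuotedString(String),
    HexStringLiteral(String),
}

/// Illustrative subset of MySQL character set names; the real list is longer.
const CHARSETS: &[&str] = &["binary", "latin1", "utf8", "utf8mb4"];

/// True if `word` is `_` followed by a known character set name.
fn is_charset_introducer(word: &str) -> bool {
    match word.strip_prefix('_') {
        Some(name) => CHARSETS.contains(&name),
        None => false,
    }
}

/// True only when a word that names a charset introducer is directly
/// followed by a string-like token. `_col1 'str'` fails the charset check,
/// so it can still be parsed as a column with an alias.
fn is_introduced_string(current: &Tok, next: Option<&Tok>) -> bool {
    let word = match current {
        Tok::Word(w) => w,
        _ => return false,
    };
    let followed_by_string = matches!(
        next,
        Some(Tok::SingleQuotedString(_) | Tok::DoubleQuotedString(_) | Tok::HexStringLiteral(_))
    );
    followed_by_string && is_charset_introducer(word)
}
```

Doing the charset lookup only after seeing a following string token keeps the tokenizer unchanged and limits the extra check to the one position where the ambiguity can arise.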

I am sorry -- I am behind on sqlparser-rs reviews. I hope to catch up later this week

alamb

LGTM -- thank you @mskrzypkows and @AugustoFKL

alamb · 2023-02-17T18:31:18Z

I took the liberty of merging up from main to resolve a conflict and fixing the clippy failure

alamb · 2023-02-17T18:38:39Z

Thanks again @mskrzypkows and sorry for the delay

Maciej Skrzypkowski added 2 commits January 4, 2023 13:22

MySQL Character Set Introducers

a9adb32

Merge branch 'main' of github.com:sqlparser-rs/sqlparser-rs into mysq…

2dcd320

…l_characters_introducer

Documentation fix

30bf873

AugustoFKL reviewed Jan 5, 2023

AugustoFKL approved these changes Jan 7, 2023

alamb reviewed Jan 15, 2023

Maciej Skrzypkowski added 2 commits January 16, 2023 13:24

Parsing string introducer from Token::word

84cdffc

Fixed lint

befac8d

mskrzypkows commented Jan 16, 2023

alamb changed the title from "MySQL Character Set Introducers" to "Support MySQL Character Set Introducers" Feb 17, 2023

alamb approved these changes Feb 17, 2023

alamb added 2 commits February 17, 2023 13:29

Merge remote-tracking branch 'sqlparser-rs/main' into mysql_character…

95eec07

…s_introducer

fix clippy

a12acc7

alamb merged commit 488e8a8 into apache:main Feb 17, 2023
