Support MySQL Character Set Introducers #788

Merged (7 commits) on Feb 17, 2023

mskrzypkows
Contributor

https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html

Character set introducers: _latin1, _binary, _utf8, _utf8mb4

coveralls commented Jan 4, 2023

Pull Request Test Coverage Report for Build 4206535525

  • 45 of 53 (84.91%) changed or added relevant lines in 4 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.008%) to 86.045%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| src/parser.rs | 10 | 18 | 55.56% |

| Files with Coverage Reduction | New Missed Lines | % |
| src/parser.rs | 1 | 83.17% |

Totals
Change from base Build 4206494817: -0.008%
Covered Lines: 13238
Relevant Lines: 15385

💛 - Coveralls

src/ast/mod.rs Outdated
@@ -696,6 +698,7 @@ impl fmt::Display for Expr {
Expr::Collate { expr, collation } => write!(f, "{} COLLATE {}", expr, collation),
Expr::Nested(ast) => write!(f, "({})", ast),
Expr::Value(v) => write!(f, "{}", v),
Expr::IntroducedString { introducer, value } => write!(f, "{} {}", introducer, value),
Contributor

@mskrzypkows is this supposed to always be spaced?

In simple strings, it seems that there's no space:

SELECT _latin1'abc';

While in something like a binary string, there is a space:

SELECT _latin1 X'4D7953514C';

Am I missing something?

OBS: both examples are from your reference: https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html

Contributor Author

The space doesn't matter; the literal is valid both with and without it.

Contributor

Oh, I see, sorry for the confusion.
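
To make the formatting question concrete, here is a minimal, self-contained sketch of a `Display` implementation that always emits a single space between the introducer and the literal. The `Expr` enum below is a simplified stand-in with plain `String` fields, and `render` is a hypothetical helper added for illustration; neither is the actual sqlparser-rs definition.

```rust
use std::fmt;

// Simplified stand-in for the AST node in the diff above.
// Assumption: the real sqlparser-rs variant holds richer types than String.
#[derive(Debug, PartialEq)]
enum Expr {
    IntroducedString { introducer: String, value: String },
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Always write one space between the introducer and the literal.
            Expr::IntroducedString { introducer, value } => {
                write!(f, "{} {}", introducer, value)
            }
        }
    }
}

// Hypothetical helper: build the node and format it back to SQL text.
fn render(introducer: &str, literal: &str) -> String {
    Expr::IntroducedString {
        introducer: introducer.to_string(),
        value: literal.to_string(),
    }
    .to_string()
}
```

With this sketch, an input written as `_latin1'abc'` would be printed back as `_latin1 'abc'`. The output is normalized rather than byte-identical to the input, which is acceptable as long as both spellings parse to the same node.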

Contributor

AugustoFKL left a comment

LGTM

@alamb would it be possible to access the reports of coverage for this and other PR's?

It feels really confusing sometimes, since we have only the percentage and files, but not exactly what is not covered. I know the report contains the exact uncovered changed lines.

Contributor

alamb commented Jan 9, 2023

@alamb would it be possible to access the reports of coverage for this and other PR's?

It would be fine with me -- how can I do it? Can you go to https://coveralls.io/builds/55632033 ?

Contributor

alamb commented Jan 9, 2023

I am sorry I am a bit behind on reviews / merging in sqlparser-rs. I hope to be able to catch up later this week

@AugustoFKL
Contributor

@alamb I'm not entirely sure. For some reason, the coverage looks like this for me:

image

Is it the same for you?

Contributor

alamb commented Jan 13, 2023

@AugustoFKL here is what https://coveralls.io/builds/55632033 looks like to me:

Screenshot 2023-01-13 at 10 01 10 AM

I wonder if it just took a while to update? Can you check again?

The code coverage job / infrastructure predates my involvement in sqlparser-rs

@AugustoFKL
Contributor

@alamb try to open one of the files, please.

Here's an example of the coverage report for one of my projects:
image

Contributor

alamb commented Jan 15, 2023

https://coveralls.io/builds/55632033/source?filename=src%2Fparser.rs

Appears to say "source not available"

Screenshot 2023-01-15 at 5 33 41 AM

Contributor

alamb left a comment

Thank you for the contribution -- this is looking close. My major question is about the hard coded list of introducers.

src/tokenizer.rs Outdated
Ok(Some(Token::make_word(&s, None)))

// https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
if ch == '_' && dialect_of!(self is MySqlDialect) {
Contributor

In general, I think it would be nice to support this also in the GenericDialect for consistency with other vendor specific features, but I don't think it is required.

src/tokenizer.rs Outdated

// https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
if ch == '_' && dialect_of!(self is MySqlDialect) {
const INTRODUCERS: [&str; 4] = ["_latin1", "_binary", "_utf8", "_utf8mb4"];
Contributor

My reading of https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html suggests the value for the charset introducer can be any supported character set. Why are these 4 hard-coded?

Contributor Author

That's a good point! I'll fix it.

Token::SingleQuotedString(_)
| Token::DoubleQuotedString(_)
| Token::HexStringLiteral(_)
if w.value.starts_with('_') =>
Contributor Author

@alamb I moved the detection of the introduced string here, so it is now checked from Token::Word. This seems to be the most generic solution. What do you think?

Contributor Author

mskrzypkows Jan 17, 2023

Unfortunately, it turned out to be the wrong solution. In MySQL, it's possible to run a query like this one:

select _col1 'str' from test_table;

In this case, 'str' is just a column alias...
So I need to check whether the string introducer is one of the character set names listed at https://dev.mysql.com/doc/refman/8.0/en/charset-charsets.html, with a _ prefix.

The question is where to check it. I could bring back Token::StringIntroducer, or check it here, only when a string token directly follows the introducer. Personally, I would stick with the second option.
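
A rough sketch of the second option, under stated assumptions: the `Token` enum is a hypothetical, trimmed-down stand-in for the sqlparser-rs token type, `CHARSETS` is a small sample of MySQL character set names rather than the full list from the page linked above, and the case-insensitive comparison is an assumption. It is not necessarily the code that ended up in this PR.

```rust
// Hypothetical, trimmed-down token type; the real sqlparser-rs `Token`
// has many more variants and a structured `Word`.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    SingleQuotedString(String),
    DoubleQuotedString(String),
    HexStringLiteral(String),
    Other,
}

// Sample subset of MySQL character set names (assumption: incomplete on purpose).
const CHARSETS: &[&str] = &["ascii", "binary", "latin1", "utf8", "utf8mb3", "utf8mb4"];

// True if `word` is `_` followed by a known character set name.
// Assumption: names are compared case-insensitively.
fn is_charset_introducer(word: &str) -> bool {
    match word.strip_prefix('_') {
        Some(name) => CHARSETS.iter().any(|c| c.eq_ignore_ascii_case(name)),
        None => false,
    }
}

// True only when a word token is a charset introducer AND the token that
// follows it is a string-like literal. Otherwise the word stays an ordinary
// identifier, so `_col1 'str'` is still a column with an alias.
fn is_introduced_string(word: &Token, next: &Token) -> bool {
    match (word, next) {
        (
            Token::Word(w),
            Token::SingleQuotedString(_)
            | Token::DoubleQuotedString(_)
            | Token::HexStringLiteral(_),
        ) => is_charset_introducer(w),
        _ => false,
    }
}
```

Doing the check at this point keeps the tokenizer unaware of introducers; the trade-off is that a list of character set names has to be maintained somewhere in the parser.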

Contributor Author

@alamb What do you think: does a check like this for MySQL string introducers fit sqlparser-rs?

Contributor

I am sorry -- I am behind on sqlparser-rs reviews. I hope to catch up later this week

alamb changed the title from "MySQL Character Set Introducers" to "Support MySQL Character Set Introducers" on Feb 17, 2023
Contributor

alamb left a comment

LGTM -- thank you @mskrzypkows and @AugustoFKL

Contributor

alamb commented Feb 17, 2023

I took the liberty of merging up from main to resolve a conflict and fixing the clippy failure

Contributor

alamb commented Feb 17, 2023

Thanks again @mskrzypkows and sorry for the delay

alamb merged commit 488e8a8 into apache:main on Feb 17, 2023