-
Notifications
You must be signed in to change notification settings - Fork 630
Support MySQL Character Set Introducers #788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 4206535525
💛 - Coveralls |
src/ast/mod.rs
Outdated
@@ -696,6 +698,7 @@ impl fmt::Display for Expr { | |||
Expr::Collate { expr, collation } => write!(f, "{} COLLATE {}", expr, collation), | |||
Expr::Nested(ast) => write!(f, "({})", ast), | |||
Expr::Value(v) => write!(f, "{}", v), | |||
Expr::IntroducedString { introducer, value } => write!(f, "{} {}", introducer, value), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mskrzypkows is this supposed to always be spaced?
In simple strings, it seems that there's no space:
SELECT _latin1'abc';
While in strings like a binary string does have spaces:
SELECT _latin1 X'4D7953514C';
Am I missing something?
OBS: both examples are from your reference: https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The space doesn't matter, it may be both with or without.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, sorry the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@alamb would it be possible to access the reports of coverage for this and other PR's?
It feels really confusing sometimes, since we have only the percentage and files, but not exactly what is not covered. I know the report contains the exact uncovered changed lines.
It would be fine with me -- how can I do it? Can you go to https://coveralls.io/builds/55632033 ? |
I am sorry I am a bit behind on reviews / merging in sqlparser-rs. I hope to be able to catch up later this week |
1 similar comment
I am sorry I am a bit behind on reviews / merging in sqlparser-rs. I hope to be able to catch up later this week |
@alamb not perfectly sure. For some reason for me, the coverage is like this: Is it the same for you? |
@AugustoFKL here is what https://coveralls.io/builds/55632033 looks like to me: I wonder if it just took a while to update? Can you check again? The code coverage job / infrastructure predates my involvement in sqlparser-rs |
@alamb try to open one of the files, please. |
https://coveralls.io/builds/55632033/source?filename=src%2Fparser.rs Appears to say "source not available" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution -- this is looking close. My major question is about the hard coded list of introducers.
src/tokenizer.rs
Outdated
Ok(Some(Token::make_word(&s, None))) | ||
|
||
// https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html | ||
if ch == '_' && dialect_of!(self is MySqlDialect) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think it would be nice to support this also in the GenericDialect for consistency with other vendor specific features, but I don't think it is required.
src/tokenizer.rs
Outdated
|
||
// https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html | ||
if ch == '_' && dialect_of!(self is MySqlDialect) { | ||
const INTRODUCERS: [&str; 4] = ["_latin1", "_binary", "_utf8", "_utf8mb4"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reading of https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html suggests the value for the charset introducer can be any character set supported; Why are these 4 hard coded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good point! I'll fix it.
Token::SingleQuotedString(_) | ||
| Token::DoubleQuotedString(_) | ||
| Token::HexStringLiteral(_) | ||
if w.value.starts_with('_') => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Moved detection of Introduced String here. So I check it form Token::Word
this seems to be the most generic solution. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it turned out to be the wrong solution. In MySQL, it's possible to run a query like this one:
select _col1 'str' from test_table;
In this case, 'str'
is just a column alias...
So, I need to check if the string introducer is one of the following stings: https://dev.mysql.com/doc/refman/8.0/en/charset-charsets.html with _
prefix.
The question is where to check it. I may bring back Token::StringIntroducer
or check it here, only when we have string token in front of the introducer. Personally, I would stick with the second option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb What do you think, does it fit to the sqlparser-rs, to have such a check for mysql string introducers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry -- I am behind on sqlparser-rs reviews. I hope to catch up later this week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thank you @mskrzypkows and @AugustoFKL
I took the liberty of merging up from main to resolve a conflict and fixing the clippy failure |
Thanks again @mskrzypkows and sorry for the delay |
https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
Characters set introducers: _latin1, _binary, _utf8, _utf8mb4