RFC: Dedented String Literals #3830
Co-authored-by: Jacob Lifshay <[email protected]>
In order to strip the first level of indentation, the ending quote is aligned to the `println!` call.

```rust
fn main() {
    println!(d"
    create table student(
        id int primary key,
        name text
    )
    ");
^^^^ // common leading whitespace (will be removed)
```
While I generally like this proposal, I am not happy with this part of it. Specifically, I think people (including me) are going to want to write the closing quotation mark one level leftward of the amount of indentation that they want stripped, to make it line up with the outer construct.
I think it would be better to have it work like this:
- Normally, the first graphic character on the first line of the indented string determines the amount of leading whitespace to be stripped. So, in the marked example, the `c` in `create` would be made flush left, as would the closing parenthesis.

  In addition to letting people align the close quote with an outer construct, this rule also has the advantage that the parser knows how much whitespace to discard as soon as it scans that first graphic character; it doesn't have to wait until it scans the close quote.
- It is an error if any subsequent line has a graphic character to the left of the first graphic character on the first line. (Empty lines and lines that contain only whitespace characters are allowed.) This rule applies before conversion of `\`-escapes, so e.g. you cannot begin the first line with a hard tab and the second line with `\t`.
- If you want to preserve some but not all of the common whitespace, you can put a backslash at the position of the first horizontal space you want preserved:
fn main() {
    println!(d"
    \   create table student(
            id int primary key,
            name text
        )
    ");
}
If the character immediately after the `\` is U+0020 (space), the backslash is counted as a space -- so in the example above, four spaces of indentation are preserved. If it is any other whitespace character, the backslash is not counted as a space -- so for example if all the indentation is being done with hard tabs
fn main() {
␉println!(d"
␉\␉create table student(
␉␉␉id int primary key,
␉␉␉name text
␉␉)
␉");
}
then the preserved indentation is a single hard tab, not a space followed by a tab.
It is allowed, but not required, to put matching backslashes on every line.
not directly related to your comment, but it is about formatting
Note, the example I gave is not formatted correctly.
Specifically, if you have this:
fn main() {
println!("
create table student(
id int primary key,
name text
)
"
);
}
`rustfmt` changes that to:
fn main() {
println!(
"
create table student(
id int primary key,
name text
)
"
);
}
Meanwhile, the example with dedented strings would be better formatted like this:
fn main() {
    println!(
        d"
        create table student(
            id int primary key,
            name text
        )
        "
^^^^^^^^ // common leading whitespace (will be removed)
    );
}
I'll go ahead and fix the formatting for all examples in the RFC.
@zackw I like that proposal as well, and I think it'd be likely to produce more readable code on balance. That would allow using the closing quote as a delimiter, and stacking it with the closing delimiters of the function, matching the opening indentation of the function.
@zackw How does this work if it's a raw string literal? Then the backslash cannot be used.
I prefer the indentation to be set by the closing quote - it eliminates the need for any escaping and is thus easier to work with IMO. One option to make the brackets line up would be:
fn main() {
println!(
d"
create table student(
id int primary key,
name text
)
"
);
}
> How does this work if it's a raw string literal?
Rawness takes precedence and the backslash is considered the first graphic character on the line. There isn't any way to make a raw dedented string be partially dedented; this is sufficiently exotic that I don't mind not supporting it.
I feel pretty strongly that having the indentation be set by the closing quote is bad, because it means that both human readers and the compiler cannot know how much indentation is gonna be stripped until they get to the end of the string. Additionally I think making this code layout do the least surprising thing -- which I think is "the word 'create' and the close paren are flush left" -- is very important.
fn main() {
    println!(d"
        create table student(
            id int primary key,
            name text
        )
    ");
}
> I feel pretty strongly that having the indentation be set by the closing quote is bad, because it means that both human readers and the compiler cannot know how much indentation is gonna be stripped until they get to the end of the string.
That's the case with your proposal too but actually worse, since you don't know where in the string the least indented line will be, so you have to check every line.
> Additionally I think making this code layout do the least surprising thing -- which I think is "the word 'create' and the close paren are flush left" -- is very important.
One option (if we assume this layout continues to be the "blessed" rustfmt layout) would be to include the extra level of indentation as part of the spec, ie. always remove one level of indentation from the string.
> There isn't any way to make a raw dedented string be partially dedented; this is sufficiently exotic that I don't mind not supporting it.
I find it problematic that a raw string literal is incapable of representing some strings, since that's generally why you'd use a raw string literal in the first place...
With my proposal you do not have to check every line. The prefix to be stripped is always set by the first non-blank line of the d-string. If any subsequent (non-blank) line fails to begin with the same prefix it is an error.
Personally I'd be fine with dropping the entire partial-dedent mechanism, but it seems to be really important to @nik-rev.
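A minimal sketch of the first-line rule described here, in case it helps the comparison. The function name `dedent_first_line` is hypothetical, and the sketch makes some assumptions: it ignores the backslash partial-dedent mechanism, treats whitespace-only lines as empty, and does not preserve a trailing newline.

```rust
/// Hypothetical helper: the whitespace prefix of the first non-blank line
/// is stripped from every line; a later non-blank line that does not start
/// with that exact prefix is an error.
fn dedent_first_line(body: &str) -> Result<String, String> {
    // The prefix is fixed as soon as the first non-blank line is seen.
    let prefix: &str = match body.lines().find(|l| !l.trim().is_empty()) {
        Some(first) => {
            let len = first.len() - first.trim_start().len();
            &first[..len]
        }
        None => "",
    };

    let mut out = Vec::new();
    for (i, line) in body.lines().enumerate() {
        if line.trim().is_empty() {
            // Blank and whitespace-only lines are always allowed.
            out.push("");
        } else if let Some(rest) = line.strip_prefix(prefix) {
            out.push(rest);
        } else {
            return Err(format!(
                "line {} does not begin with the first line's indentation",
                i + 1
            ));
        }
    }
    Ok(out.join("\n"))
}
```

Because the prefix is known after the first non-blank line, every later line can be checked and stripped in a single pass.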
It's worth looking at what indoc does for guidance here (115M downloads). It does what @zackw suggests:
https://docs.rs/indoc/latest/indoc/#explanation
The following rules characterize the behavior of the indoc!() macro:
- Count the leading spaces of each line, ignoring the first line and any lines that are empty or contain spaces only.
- Take the minimum.
- If the first line is empty i.e. the string begins with a newline, remove the first line.
- Remove the computed number of spaces from the beginning of each line.
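The four quoted rules can be sketched as follows. This is an illustration written from the rules as quoted, not indoc's actual source; the name `dedent_min` is hypothetical, and the sketch assumes the first line is left untouched because it is excluded from the count.

```rust
/// Hypothetical sketch of the quoted rules (spaces only, as in the rules).
fn dedent_min(s: &str) -> String {
    let lines: Vec<&str> = s.split('\n').collect();

    // Rules 1 and 2: minimum count of leading spaces, ignoring the first
    // line and any line that is empty or contains spaces only.
    let min = lines
        .iter()
        .skip(1)
        .filter(|l| !l.trim_matches(' ').is_empty())
        .map(|l| l.len() - l.trim_start_matches(' ').len())
        .min()
        .unwrap_or(0);

    // Rule 3: if the string begins with a newline, drop the empty first line.
    let skip_first = lines.first().map_or(false, |l| l.is_empty());

    // Rule 4: remove `min` spaces from each remaining line. Lines shorter
    // than `min` (blank lines) become empty.
    lines
        .iter()
        .enumerate()
        .skip(if skip_first { 1 } else { 0 })
        .map(|(i, l)| if i == 0 { *l } else { l.get(min..).unwrap_or("") })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Under these rules the least-indented line always ends up flush left, so the result can never be uniformly indented.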
The downside of this approach is that it means that one cannot use `d""` to print indented content.
As an example, consider code generation, be it a SQL query, or other language. You may have a piece of code which conditionally adds a clause/block in a larger piece, and that clause/block should be indented.
Using the indoc approach that's fundamentally impossible, unfortunately. Pity.
Co-authored-by: Sabrina Jewson <[email protected]>
Co-authored-by: Sabrina Jewson <[email protected]>
…at!` Co-authored-by: SabrinaJewson <[email protected]>
The choice for `d` to come before all other modifiers is not arbitrary.

Consider `dbr` and all possible alternatives:

1. `dbr`: dedented byte raw string
1. `bdr`: byte dedented raw string
1. `brd`: byte raw dedented string

The first example reads in the most natural manner. The other two don't.
`d` and `r` change how you process a string while `b` and `c` change the type. I feel like `d` and `r` should be grouped with each other.
Thinking over https://github.com/rust-lang/rfcs/pull/3830/files#r2136431112, I think a natural way of modeling this is as operations being performed:
- `d` or `r` should come before `b` or `c` (is "before" on the left or right? I lean towards right, like function calls / binding most tightly), as that is processed before things related to the type, like adding a null terminator
- Unsure of implications for having `d` or `r` coming first

So I think there is a serious reason for us to consider `bdr` or `brd`. I lean towards `brd`, as my mind for some reason feels like describing it as dedenting before treating it as raw.
Another argument in favor of `[bc]?d?r?` (type, content_modification, string_delimiter_and_escapes):
The output type seems way more important to the one reviewing the code than the indentation mode/level.
> Unsure of implications for having `d` or `r` coming first

One argument against `brd`:
br###d"Hello World"###
bdr###"Hello World"###
It is visually harder to see what is part of the string content. It's less noticeable with `rd"Hello World"` because it doesn't have the `#` delimiter, which currently behaves like brackets. At least in my mind I see `r` more closely linked to when the string ends than to whether escape sequences are parsed, but that's probably just because I used it for that more often.
Granted: due to the requirement of a newline after `"` this is less relevant, but I think it's still better to keep all the modifier characters right next to each other as long as it's possible.
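To make the `[bc]?d?r?` ordering from this comment concrete, here is a small sketch of a check for it. The function name `is_valid_prefix` is hypothetical, the ordering is the one argued for in this comment rather than the `dbr` order in the RFC text, and the `#` delimiters of raw strings are not handled.

```rust
/// Hypothetical check: does `prefix` match `[bc]?d?r?`, that is, an
/// optional type modifier, then an optional `d`, then an optional `r`?
fn is_valid_prefix(prefix: &str) -> bool {
    let rest = prefix;
    // Each group is optional, but the groups must appear in this order.
    let rest = rest.strip_prefix(&['b', 'c'][..]).unwrap_or(rest);
    let rest = rest.strip_prefix('d').unwrap_or(rest);
    let rest = rest.strip_prefix('r').unwrap_or(rest);
    // Anything left over is a repeated or out-of-order modifier.
    rest.is_empty()
}
```

With this ordering `bdr` and `cd` are accepted, while `dbr` and `brd` are rejected.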
First,
Really? Is it user-friendly? let xxx = d"••
••••••••••••my new line
" As for me it is quite OK, but with your rule this is an error! Second,Right now So, maybe an additional 0th rule is needed
With such rule |
Could you help me understand why someone would use The main reason I could see for having content on the first line is along with why the last line can be over-indented, to turn an error into a If it were allowed though, you'd then have to decide how it interacts with dedenting. Is it ignored for counting what the dedented whitespace is? Also, will the fact the first newline cause confusion for people if they can optionally not have it?
I think that's a good idea.
I do not think that's a good idea, since imo it would make it more confusing for no benefit. if you want a string literal to fit on one line, then don't use a dedented string. the compiler error message can say that. |
Co-authored-by: DragonDev1906 <[email protected]>
Co-authored-by: DragonDev1906 <[email protected]>
Thank you!
Ok, I agree - minimum benefits |
edit: this is not a problem. Hence I have added it to the RFC |
I really don't see how whitespaces only at quote line will complicate anything in future. Meta-data or some other info are not whitespaces only. |
@matthieu-m While I normally have a great deal of appreciation for your takes on things...
Were it to be implemented, my PR guidelines would require that regular strings and @zackw I find it easy to not forget the help string interword space if I just make it a policy that ends of lines must have them (i.e.
Same reason Rust allows various "weird" or "ugly" things that macros might emit... to minimize the number of special cases that the programmer or macro author needs to keep in mind. In this case, I could see myself flipping back and forth between the two forms during RAD/prototyping and find it frustrating that I have to add or remove a Think of it as akin to allowing a trailing |
@VitWW and @programmerjake Thanks for the feedback, I've modified the RFC to allow trailing whitespace between the opening quote and the first EOL character |
Co-authored-by: Jacob Lifshay <[email protected]>
Co-authored-by: Jacob Lifshay <[email protected]>
*As long as the level of each indentation is consistent

- Increases complexity of the language. While it builds upon existing concepts, it is yet another thing for people to learn.

# Rationale and alternatives
I think that this should include an alternative of using some kind of delimiter on each line to indicate where indentation happens. That could be something like #3830 (comment), or maybe something like:
something(d"
|CREATE TABLE
| something(
| a INT,
| b INT)
|" ) // ENDS with EOL. Put quote on line above to exclude eol
I agree with both the advantages, and disadvantages previously discussed with this, and I'm not sure whether the advantages outweigh the disadvantages, or vice versa. But I think it is worth mentioning in the RFC. I also feel pretty strongly that if we do go with the delimiter approach, it should be a single character, not two.
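For reference, the per-line delimiter idea can be sketched as a plain function, similar in spirit to Scala's `stripMargin`. The name `strip_margin` and the choice to pass blank lines through unchanged are assumptions made for illustration, not part of the proposal above.

```rust
/// Hypothetical sketch: on every line, drop the leading whitespace and the
/// marker character, keeping whatever follows the marker.
fn strip_margin(body: &str, marker: char) -> Result<String, String> {
    let mut out = Vec::new();
    for (i, line) in body.lines().enumerate() {
        let trimmed = line.trim_start();
        if let Some(rest) = trimmed.strip_prefix(marker) {
            // Everything after the marker is content, including indentation.
            out.push(rest);
        } else if trimmed.is_empty() {
            // Assumption: blank lines are allowed and stay empty.
            out.push("");
        } else {
            return Err(format!("line {} does not start with '{}'", i + 1, marker));
        }
    }
    Ok(out.join("\n"))
}
```

The preserved indentation is exactly what follows the marker, so no line other than the current one is needed to decide how much to strip.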
I don't think a single character should be a requirement for the delimiter. Today you can emulate Zig's multiline string literals (whose line prefix in Zig is 2 characters, `\\`):
macro_rules! text {
    (#[doc=$first_line:literal] $(#[doc=$lines:literal])*) => {
        concat!($first_line, $("\n", $lines),*)
    };
}
let sql = text! {
    ///create table student(
    /// id int primary key,
    /// name text
    ///);
};
println!("{}", sql);
Which, thanks to the editor support inside doc comments, feels perfectly convenient even without multi-cursor or visual-block editing. And that is with three prefix characters.

It is immediately more obvious which string belongs to which scope.

## Closing quote controls the removed indentation
What about using the opening "quote", instead of, or perhaps even better in addition to the closing quote?
(in addition being a "belts & suspenders" approach)
I note that the example above actually already aligns opening and closing quotes, even though it's not necessary:
println!(
    d"
    create table student(
        id int primary key,
        name text
    )
    "
);
Where "opening quote" is taken to mean `d"` here, and would be `dr###"` if it came to it, no matter the order in which the qualifiers come in.
Having the indentation obvious at the top, in addition to at the bottom, makes it more obvious what the desired indentation level is, and, as part of the belts & suspenders approach, makes it easier to catch subtle misalignments: one is much more likely to mis-align one character (the closing quote) than two.
The more we talk about it, the more I'm sympathetic to just prefixing all the lines (with `|` maybe, or `>`, or perhaps letting the prefix character be chosen somehow). Then the opening quote can live in the most natural place (i.e., on the line before, in this example), and it becomes clear how to handle that trailing newline or lack thereof.
The downside is that when using an underpowered editor, it could be annoying to prefix the lines. But that's balanced out by the upside, that it becomes much more clear that we're in a string context and what the preserved indentation is, particularly outside of an editor or in an underpowered one.
That we tend to write doc comments by prefixing all lines with `///` is pretty good precedent here.
That this prefixing could allow for interspersing comments with the string lines, just as comments can be interspersed with doc comment lines, is also an interesting angle to it.
> What about using the opening "quote", instead of, or perhaps even better in addition to the closing quote?

Unacceptable to people like me who manually ran `rustfmt` and cherry-picked proposed changes until `use_small_heuristics = "Max"` landed in stable, along with several tweaks to how things like `assert*!` were formatted.
Same reasoning as why I'd rather manually format than accept `{` getting put alone on its own line.
(I usually sum it up as "my monitors are not portrait-oriented".)
> The downside is that when using an underpowered editor, it could be annoying to prefix the lines.
That alone should be enough to consider rejecting it, given Rust's design philosophy has historically been "Design for editors with syntax highlighting and nothing else. IDEs exist to improve upon an already good thing, not to clean up our messes."
...which is one of the reasons I've loved Rust's syntax since v1.0 when, until January 2024, I was running a relatively skeletal gVim on an Athlon II X2 270 from 2011.
...I still haven't really got the best loadout for editing assists, to be honest, Ryzen 5 7600 or not. It's mostly stuff like using pear-tree to keep the syntax highlights from flickering wildly whenever I type an opener for a token that changes parsing state, like a quotation mark, or using things like coc.nvim explicitly in "only update displayed checks on save" mode to reduce distraction.
(Part of gVim's draw for me is as a coding analogue to a distraction-free writing tool like FocusWriter. Very minimal UI, and configured along the principles behind CSS's `@media (prefers-reduced-motion)`. Hell, the main reason I'm using gVim instead of plain Vim is because wavy underlines are less "loud" than the options available in a terminal.)
As mentioned, though, it goes the other way too. Humans using underpowered editors or non-editor environments are also the most helped, when reading, by adding the prefixes. And since we spend more time reading code than writing it, and as we tend to have less control of our environment when reading as compared to writing, it's not so easy as you suggest to discard the option on this basis.
As someone who was programming in Python for nearly 15 years prior to picking up Rust, and who finds the whitespace sensitivity of CoffeeScript problematic... as well as finding all lightweight markup I've tried full of footguns when doing outline lists (Markdown, reStructuredText, AsciiDoc), I don't find the lack of a prefix particularly problematic.
As I mentioned, if this comes in with prefixes, I'll just stick to using a crate which replicates Python's prefixless implementation and disallow the Rust-native syntax in my rules for accepting PRs.
I was calling, in general, the editors of Github, Q&A websites and forums half-hearted not for their lack of features, but because they're not meant to be full-featured editors in the first place.
But the point is that people write rust in situations where you don't have a full featured editor. And Github, Q&A, wikis, forums, etc. are examples of that.
Maybe the benefits of a prefix are worth the difficulty of editing with an editor that doesn't have multiple cursors and/or block selection, but IMO the existence of those features in a significant subset of editors is sufficient to dismiss those difficulties.
Yes, reiterating my stance here:
- if you need a smart editor that is capable of multi-prefixing lines,
- just make the editor slightly smarter so it simply surrounds the text with `"...\n"` and you're done. No need for a d-string in the first place.
let body_begin = ( " <body>\n" );
(possibly `concat!` needed)
The argument for a prefix seems to be: everyone has a smart editor. To that I say: then use its smarts.
If we're going for readability / copy-pastability, then we cannot have any tokens in there that need to be stripped/added before use. (And especially not single double quotes that confuse syntax-highlighting editors.)
The following without prefixes makes sense (*) for readability and copy-pastability:
let query = d"
    SELECT abc FROM def WHERE id IN (
        1, 2, 3
    );
    ";
When you `cat` the source file, you can copy it straight to your SQL prompt. And vice versa.
If you need readability for your many partial-dedents (the keep_whitespace case), I think maybe you're better off with something like this:
if 1 {
if 1 {
let body_begin = partial_dedent!(prefix_char='|', "
| <body>
");
let p_begin = partial_dedent!(prefix_char='|', "
| <p>
");
But if we arrive at this point:
(*) Shouldn't we just use a (standardized and well thought out) macro in all cases?
If your argument then is: but macros cannot use auto-scope-interpolation to be used in `format!`; maybe it's that that should be addressed?
(Edit)
Emoji poll-suggestion:
❤️ = macros all the way
🚀 = d-strings are nice, but forget about the partial-dedent
😒 = no, I want d-strings WITH partial-dedent, just need to figure out the syntax
👀 = whatever happens, I don't want a mandatory prefix on every line (especially not an unpaired quote)
> Emoji poll-suggestion:
I'm not sure I follow.
The existence of a "d-strings are nice, but forget about the partial-dedent" option suggests that I don't understand what "partial-dedent" is intended to mean, given that, by my understanding of the term, d-strings minus partial dedent already exists in forms like `r#" ... "#`.
Full dedent:
·······ABC\n············DEF\n
-> ABC\n····DEF\n
Partial dedent:
·······ABC\n············DEF\n
-> ····ABC\n········DEF\n
(how many leading blanks do we keep?)
The `r#"..."#` does no such dedenting.
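The difference between the two behaviours can be sketched with one function that removes the common indentation and then re-adds a chosen number of spaces. The name `dedent_keep` and its `keep` parameter are hypothetical; `keep = 0` corresponds to a full dedent and any larger value to a partial one. Only spaces are counted, as an assumption for brevity.

```rust
/// Hypothetical sketch: strip the common leading spaces of all non-blank
/// lines, then indent each non-blank line by `keep` spaces.
fn dedent_keep(body: &str, keep: usize) -> String {
    // Common indentation = minimum leading-space count over non-blank lines.
    let common = body
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.len() - l.trim_start_matches(' ').len())
        .min()
        .unwrap_or(0);
    let pad = " ".repeat(keep);

    body.lines()
        .map(|l| {
            if l.trim().is_empty() {
                String::new()
            } else {
                format!("{}{}", pad, &l[common..])
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The open question in the thread is then how the source text should express the value that this sketch calls `keep`.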
Ahh. To me, "partial dedent" is vague enough that, context forgotten, it can refer to either of those behaviours.
I'm honestly not sure what I'd vote for. I've never needed more than full dedent as long as it's implemented in the Python `textwrap.dedent` way ("common leading whitespace", not "whitespace of the first line" as in `············ABC\n·······DEF\n` → `····ABC\nDEF\n`), and I've said that, if a marker character is mandatory, it's a deal-breaker for me, but this is Rust and not stopping at "good enough" for permanent additions to `std` is part of what I love about Rust.
I want to be clear that I did *not* suggest taking the minimum. I suggested using the first line's indentation as the amount to strip, instead. (If some later non-blank line has less indentation than the first line, it should be an error.)
|
Same issue really. Sometimes you do want the first line to be indented. |
How about using the first line to indicate the indentation so let sql = d" \
create table student(
id int primary key,
name text
)
"; would prefix each line with two spaces. |
@ssendev I also prefer explicit indentation, but I prefer that the last line, with the closing quote, indicate the indentation: let sql = d"
create table student(
id int primary key,
name text
)
••••"; // 4 spaces This is simpler than searching all non-empty lines and calculating the minimum amount of whitespace.
@ssendev I think this has more downsides than using the ending quote:
And the biggest advantages?
To me that makes the "whitespaces on ending quote" approach far more appealing. |
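A minimal sketch of the "whitespace on the ending quote" rule, for comparison with the other approaches in this thread. The function name `dedent_by_closing_quote` is hypothetical. The sketch assumes the input is everything between `d"` and the closing `"`, requires a newline directly after the opening quote, and drops the newline before the closing-quote line; none of these details should be read as the RFC's final wording.

```rust
/// Hypothetical sketch: the whitespace before the closing quote is the
/// prefix that gets stripped from every content line.
fn dedent_by_closing_quote(literal_body: &str) -> Result<String, String> {
    let body = literal_body
        .strip_prefix('\n')
        .ok_or("expected a newline right after the opening quote")?;

    // The last line holds only the indentation of the closing quote.
    let (content, indent) = match body.rsplit_once('\n') {
        Some(parts) => parts,
        None => ("", body),
    };
    if !indent.chars().all(|c| c == ' ' || c == '\t') {
        return Err("closing quote must be preceded only by whitespace".to_string());
    }

    let mut out = Vec::new();
    for (i, line) in content.split('\n').enumerate() {
        if line.trim().is_empty() {
            out.push("");
        } else if let Some(rest) = line.strip_prefix(indent) {
            out.push(rest);
        } else {
            return Err(format!(
                "content line {} is indented less than the closing quote",
                i + 1
            ));
        }
    }
    Ok(out.join("\n"))
}
```

Placing the closing quote further left than the content keeps the difference as indentation in the result, which is what allows uniformly indented output without any escaping.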
Proposal to add dedented string literals to Rust of the form `d"string"`.
With the following:
Being equivalent to:
Rendered