diff --git a/src/tokens.md b/src/tokens.md
index a66357ea5..2ad2a9dd8 100644
--- a/src/tokens.md
+++ b/src/tokens.md
@@ -21,13 +21,24 @@ evaluated (primarily) at compile time.
 
 | | Example | `#` sets | Characters | Escapes |
 |----------------------------------------------|-----------------|------------|-------------|---------------------|
-| [Character](#character-literals) | `'H'` | `N/A` | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
-| [String](#string-literals) | `"hello"` | `N/A` | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
+| [Character](#character-literals) | `'H'` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
+| [String](#string-literals) | `"hello"` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
 | [Raw](#raw-string-literals) | `r#"hello"#` | `0...` | All Unicode | `N/A` |
 | [Byte](#byte-literals) | `b'H'` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
 | [Byte string](#byte-string-literals) | `b"hello"` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
 | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | `0...` | All ASCII | `N/A` |
 
+#### ASCII escapes
+
+| | Name |
+|---|------|
+| `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
+| `\n` | Newline |
+| `\r` | Carriage return |
+| `\t` | Tab |
+| `\\` | Backslash |
+| `\0` | Null |
+
 #### Byte escapes
 
 | | Name |
@@ -74,12 +85,39 @@ evaluated (primarily) at compile time.
 
 #### Character literals
 
+> **Lexer**
+> CHAR_LITERAL :
+>    `'` ( ~[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
+>
+> QUOTE_ESCAPE :
+>    `\'` | `\"`
+>
+> ASCII_ESCAPE :
+>       `\x` OCT_DIGIT HEX_DIGIT
+>    | `\n` | `\r` | `\t` | `\\` | `\0`
+>
+> UNICODE_ESCAPE :
+>    `\u{` ( HEX_DIGIT `_`\* )1..6 `}`
+
 A _character literal_ is a single Unicode character enclosed within two
 `U+0027` (single-quote) characters, with the exception of `U+0027` itself,
 which must be _escaped_ by a preceding `U+005C` character (`\`).
 
 #### String literals
 
+> **Lexer**
+> STRING_LITERAL :
+>    `"` (
+>       ~[`"` `\` _IsolatedCR_]
+>       | QUOTE_ESCAPE
+>       | ASCII_ESCAPE
+>       | UNICODE_ESCAPE
+>       | STRING_CONTINUE
+>    )\* `"`
+>
+> STRING_CONTINUE :
+>    `\` _followed by_ \\n
+
 A _string literal_ is a sequence of any Unicode characters enclosed within two
 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
 which must be _escaped_ by a preceding `U+005C` character (`\`).
@@ -120,6 +158,14 @@ following forms:
 
 #### Raw string literals
 
+> **Lexer**
+> RAW_STRING_LITERAL :
+>    `r` RAW_STRING_CONTENT
+>
+> RAW_STRING_CONTENT :
+>       `"` ( ~ _IsolatedCR_ )* (non-greedy) `"`
+>    | `#` RAW_STRING_CONTENT `#`
+
 Raw string literals do not process any escapes. They start with the character
 `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
 `U+0022` (double-quote) character. The _raw string body_ can contain any sequence
@@ -149,6 +195,17 @@ r##"foo #"# bar"##; // foo #"# bar
 
 #### Byte literals
 
+> **Lexer**
+> BYTE_LITERAL :
+>    `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
+>
+> ASCII_FOR_CHAR :
+>    _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
+>
+> BYTE_ESCAPE :
+>       `\x` HEX_DIGIT HEX_DIGIT
+>    | `\n` | `\r` | `\t` | `\\` | `\0`
+
 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
 range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
 `U+0027` (single-quote), and followed by the character `U+0027`. If the character
@@ -158,6 +215,13 @@ _number literal_.
 
 #### Byte string literals
 
+> **Lexer**
+> BYTE_STRING_LITERAL :
+>    `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )\* `"`
+>
+> ASCII_FOR_STRING :
+>    _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
+
 A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
 preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
 followed by the character `U+0022`. If the character `U+0022` is present within
@@ -183,6 +247,17 @@ following forms:
 
 #### Raw byte string literals
 
+> **Lexer**
+> RAW_BYTE_STRING_LITERAL :
+>    `br` RAW_BYTE_STRING_CONTENT
+>
+> RAW_BYTE_STRING_CONTENT :
+>       `"` ASCII* (non-greedy) `"`
+>    | `#` RAW_BYTE_STRING_CONTENT `#`
+>
+> ASCII :
+>    _any ASCII (i.e. 0x00 to 0x7F)_
+
 Raw byte string literals do not process any escapes. They start with the
 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
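One way to spot-check the lexer productions in this patch against the compiler is a small Rust program, sketched here under the assumption of a recent stable `rustc`. The helper function names are hypothetical and only label which production each literal exercises; each assertion compares a literal with an equivalent spelling of the value it denotes. Literals the grammar rejects, such as `'\x80'` (the first digit of an ASCII_ESCAPE must be an OCT_DIGIT) or `b'\'` (a bare backslash is not an ASCII_FOR_CHAR), appear only in comments because they would stop the file from compiling.

```rust
// Each hypothetical helper returns the value one literal form denotes,
// so `main` can compare it with an equivalent spelling.

/// CHAR_LITERAL with ASCII_ESCAPE: `\x` OCT_DIGIT HEX_DIGIT, so at most `\x7F`.
/// (`'\x80'` would be rejected by rustc.)
fn ascii_escape_char() -> char {
    '\x41'
}

/// CHAR_LITERAL with UNICODE_ESCAPE: up to six hex digits, `_` allowed after a digit.
fn unicode_escape_char() -> char {
    '\u{1F_600}'
}

/// BYTE_ESCAPE: `\x` HEX_DIGIT HEX_DIGIT, so the full 0x00..=0xFF range is allowed.
fn byte_escape() -> u8 {
    b'\xFF'
}

/// STRING_CONTINUE: `\` followed by a newline drops the newline and the
/// leading whitespace of the next line.
fn continued_string() -> &'static str {
    "hello \
     world"
}

/// RAW_STRING_LITERAL: no escapes are processed, and the `##` delimiters let
/// the body contain `"#`.
fn raw_string() -> &'static str {
    r##"foo #"# bar \n"##
}

/// BYTE_STRING_LITERAL mixing plain ASCII, a hex BYTE_ESCAPE and `\\`.
fn byte_string() -> &'static [u8] {
    b"A\xFF\\"
}

/// RAW_BYTE_STRING_LITERAL: `\x41` stays four literal bytes.
fn raw_byte_string() -> &'static [u8] {
    br#"\x41"#
}

fn main() {
    assert_eq!(ascii_escape_char(), 'A');
    assert_eq!(unicode_escape_char() as u32, 0x1F600);
    assert_eq!(byte_escape(), 255);
    // A backslash byte needs the `\\` escape; `/` is an ordinary ASCII_FOR_CHAR.
    assert_eq!(b'\\', 0x5C);
    assert_eq!(b'/', 0x2F);
    assert_eq!(continued_string(), "hello world");
    assert_eq!(raw_string(), "foo #\"# bar \\n");
    assert_eq!(byte_string(), &[0x41_u8, 0xFF, 0x5C][..]);
    assert_eq!(raw_byte_string(), &b"\\x41"[..]);
    println!("all literal forms lexed as expected");
}
```

If every assertion holds, the program prints its final line; a failing assertion points at the production whose description disagrees with `rustc`.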