Description
Following up on #19194 and discussion with @aturon, I took a look at how things in the `std::ascii` module are used in the Rust repository and in Servo.
The `std::ascii::Ascii` type is a newtype of `u8` that enforces (unless `unsafe` code is used) that the value is in the ASCII range, similar to `char` with `u32` and the range of Unicode scalar values. `[Ascii]` is naturally a string of bytes entirely in the ASCII range.
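To make the invariant concrete, here is a minimal sketch of such a newtype. `AsciiByte`, `new`, `to_byte`, and `to_ascii_vec` are hypothetical names chosen for illustration; this is not the actual `std::ascii::Ascii` definition.

```rust
/// Hypothetical stand-in for an ASCII-only newtype over `u8`.
/// The field is private, so safe code can only build one through `new`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AsciiByte(u8);

impl AsciiByte {
    /// Returns `None` for any byte outside the ASCII range (0x00..=0x7F).
    fn new(b: u8) -> Option<AsciiByte> {
        if b < 0x80 {
            Some(AsciiByte(b))
        } else {
            None
        }
    }

    /// Converts back to the underlying byte.
    fn to_byte(self) -> u8 {
        self.0
    }
}

/// Converts a whole byte string, failing if any byte is non-ASCII.
fn to_ascii_vec(bytes: &[u8]) -> Option<Vec<AsciiByte>> {
    bytes.iter().map(|&b| AsciiByte::new(b)).collect()
}
```

Note that a single non-ASCII byte makes the whole conversion fail, which is the friction described next.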
Using the type system like this to enforce data invariants is interesting, but in practice `[Ascii]` is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII-only, nor is it desirable to remove or replace non-ASCII bytes, even when only ASCII-range operations are used. (E.g. “ASCII case-insensitivity” is common in HTML and CSS.)
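As an illustration of an ASCII-range-only operation on data that may contain non-ASCII bytes, here is a rough sketch of ASCII case-insensitive comparison directly on `&[u8]`. The function names `ascii_lower` and `bytes_eq_ignore_ascii_case` are made up for this example.

```rust
/// Lowercases a byte only if it is an ASCII uppercase letter.
/// Every other byte, including non-ASCII bytes (>= 0x80), is returned as-is.
fn ascii_lower(b: u8) -> u8 {
    match b {
        b'A'..=b'Z' => b + 32,
        _ => b,
    }
}

/// Compares two byte strings, ignoring case for ASCII letters only.
/// Non-ASCII bytes must match exactly.
fn bytes_eq_ignore_ascii_case(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b.iter())
            .all(|(x, y)| ascii_lower(*x) == ascii_lower(*y))
}
```

Nothing here requires the input to be ASCII-only, so no conversion to and from a checked type is needed.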
Every single use of the `Ascii` type that I’ve found was only to use the `to_lowercase` or `to_uppercase` method, then immediately convert back to `u8` or `char`.
Therefore, I suggest:
- Moving the `Ascii` type as well as the `AsciiCast`, `OwnedAsciiCast`, `AsciiStr`, and `IntoBytes` traits into a new `ascii` Cargo package on crates.io
- Marking them as deprecated in `std::ascii`, and removing them at some point before 1.0
- Reworking the rest of the module to provide the functionality on `u8`, `char`, `[u8]` and `str`. Specifically:
  - Keep the `AsciiExt` and `OwnedAsciiExt` traits. (Maybe rename them?)
  - Implement `AsciiExt` on `char` and `u8` (in addition to the existing impls for `str` and `[u8]`)
  - Add `is_ascii() -> bool`. Maybe on `AsciiExt`? It’s mostly used on `u8` and `char`, but it also makes sense on `str` and `[u8]`.
  - Maybe `is_ascii_lowercase`, `is_ascii_uppercase`, `is_ascii_alphabetic`, or `is_ascii_alphanumeric` could be useful, but I’d be fine with dropping them and reconsidering if someone asks for them. The same result can be achieved with `.is_ascii() &&` and the corresponding `UnicodeChar` method, which in most cases has an ASCII fast path.
  - I don’t think the remaining `Ascii` methods are valuable. `is_digit` and `is_hex` are identical to `Char::is_digit(10)` and `Char::is_digit(16)`. `is_blank`, `is_control`, `is_graph`, `is_print`, and `is_punctuation` are never used.
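A rough sketch of what the reworked module could look like, under some assumptions: the trait names `AsciiOps` and `AsciiCase` and the method names `is_ascii_only`, `ascii_lower`, and `ascii_upper` are hypothetical placeholders, not the existing `AsciiExt` API. They are deliberately distinct so the sketch does not collide with any methods the standard library may already provide on these types.

```rust
/// Hypothetical trait standing in for the proposed `is_ascii() -> bool`,
/// implemented on `u8`, `char`, `[u8]`, and `str`.
trait AsciiOps {
    fn is_ascii_only(&self) -> bool;
}

impl AsciiOps for u8 {
    fn is_ascii_only(&self) -> bool {
        *self < 0x80
    }
}

impl AsciiOps for char {
    fn is_ascii_only(&self) -> bool {
        (*self as u32) < 0x80
    }
}

impl AsciiOps for [u8] {
    fn is_ascii_only(&self) -> bool {
        self.iter().all(|b| b.is_ascii_only())
    }
}

impl AsciiOps for str {
    fn is_ascii_only(&self) -> bool {
        self.as_bytes().is_ascii_only()
    }
}

/// Hypothetical trait for ASCII-only case conversion on single values.
/// Values outside the ASCII letter ranges are returned unchanged.
trait AsciiCase: Sized {
    fn ascii_lower(self) -> Self;
    fn ascii_upper(self) -> Self;
}

impl AsciiCase for u8 {
    fn ascii_lower(self) -> u8 {
        match self {
            b'A'..=b'Z' => self + 32,
            _ => self,
        }
    }
    fn ascii_upper(self) -> u8 {
        match self {
            b'a'..=b'z' => self - 32,
            _ => self,
        }
    }
}

impl AsciiCase for char {
    fn ascii_lower(self) -> char {
        if self.is_ascii_only() { (self as u8).ascii_lower() as char } else { self }
    }
    fn ascii_upper(self) -> char {
        if self.is_ascii_only() { (self as u8).ascii_upper() as char } else { self }
    }
}

/// The "`.is_ascii() &&` plus the Unicode method" combination mentioned above,
/// here using `char::is_alphabetic` as the Unicode-aware check.
fn is_ascii_alphabetic_via_unicode(c: char) -> bool {
    c.is_ascii_only() && c.is_alphabetic()
}
```

With something like this, callers work on `u8` and `char` directly, with no round trip through a separate `Ascii` type.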
How does this sound? I can help with the implementation work. Should this go through the RFC process?