Today, if you try to compile a regex with an empty alternation, e.g., a||b, then you'll get this error message:
alternations cannot currently contain empty sub-expressions
When I initially built the regex crate, I don't think I was clear on what an empty alternation meant, so I simply made them illegal. However, an empty alternation should have the same match semantics as an empty regex. That is, a||b should match a, b or the empty string.
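To make the intended semantics concrete, here is a toy model in Rust. It does not use the regex crate at all: it assumes the pattern consists only of literal branches separated by |, and the helper names branches and matches_fully are hypothetical, introduced purely for illustration.

```rust
/// Toy model (an assumption, not the regex crate's parser): split a pattern
/// made only of literal branches separated by `|`.
/// `str::split` keeps empty pieces, so "a||b" yields ["a", "", "b"].
fn branches(pattern: &str) -> Vec<&str> {
    pattern.split('|').collect()
}

/// Returns true if `input` is matched in full by one of the branches.
/// An empty branch behaves like an empty regex: it matches only "".
fn matches_fully(pattern: &str, input: &str) -> bool {
    branches(pattern).iter().any(|branch| *branch == input)
}
```

Under this model, matches_fully("a||b", input) is true for "a", "b" and "", and false for anything else, which is the behavior described above for a whole-input match.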
When I rewrote the regex-syntax crate, I specifically made sure to support empty alternations, which I believe were forbidden in the previous version of regex-syntax. The intent was to propagate that support through the regex compiler. However, when I tried, I discovered that the compiler did not implement the correct match semantics for them. Fixing it did not seem easy, so I simply made the compiler return an error if it found an empty alternate:
Lines 491 to 500 in 488fe56
    if prev_entry == self.insts.len() {
        // TODO(burntsushi): It is kind of silly that we don't support
        // empty-subexpressions in alternates, but it is supremely
        // awkward to support them in the existing compiler
        // infrastructure. This entire compiler needs to be thrown out
        // anyway, so don't feel too bad.
        return Err(Error::Syntax(
            "alternations cannot currently contain \
             empty sub-expressions".to_string()));
    }
Part of my plan for the future is to rethink a lot of the regex internals, and the compiler itself is at the top of that list. So I plan to tackle this problem when I rework the compiler.