-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-133546: Make re.Match
a well-rounded Sequence
type
#133549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1378,6 +1378,27 @@ when there is no match, you can test whether there was a match with a simple | |||
if match: | |||
process(match) | |||
|
|||
Match objects are proper :class:`~collections.abc.Sequence` types. You can access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true with this PR, Sequence has a number of other requirements (e.g. an index
and count
method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, thanks! I added index
and count
by adapting the implementation of tuple.index
and tuple.count
. I also updated the unit test to cover all Sequence
mixin methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test coverage is sometimes insufficient and sometimes redundant with existing ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional feedback. For the test failures, let's add some assertWarns
. I don't think we should make some additional checks in C for the type of the operands (the issue is when you compare the input (which is a bytes) with the matched item (which are strings), hence a BytesWarning). So, instead, let's just expect the warning.
By the way, could you avoid using @picnixz
in your commits as I'm getting pinged? TiA.
Oh, sorry for the ping. Is it okay if I just leave out the cases with bytes as input? I can't seem to trigger the warning locally, even with |
I'm not sure consensus exists for this on either the issue or the linked Discourse thread... A |
It's a bit quiet, but I guess there's not much to talk about. Serhiy Storchaka brought up a valuable point about the fact that back in the day there was some debate over the possible semantics of This PR is nothing more than painting by numbers. The API was just waiting for someone to spend a couple of hours to fill in the blanks. So it's not particularly exciting, and I doubt the topic will attract a lot of attention on Discourse, but personally I think this is the kind of polish and attention to detail that makes Python feel like home. And it looks like some other people and a core dev are on board with it too. That's enough for me to get started. Plus, working code can help ground discussions. That said, I'm still new to CPython's contributing process, so let me know if I should've done something differently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After staring at this PR once again, I think we don't need the full-fledged sequence API but we could have some of it:
- It's useful to have
case
support. - It's useful to have deconstruction via
_, a, b = match
- It may be useful to have
len
as otherwise we need to extract the groups and dolen(m.groups())
. - I can't find use cases of
.index
. While it could give back the index of the group that matches what we pass, it could be more useful if we had the name of the group when possible. IOW,.index
is a bit ambiguous as we can access groups by name (if they exist) or by index.
So, maybe for a first iteration, we can drop .index
and .count
? they don't seem that useful to me. We wouldn't have a real sequence, but at least we would have a more useful interface.
Finally, the docs for .group()
(which is meant to be equivalent to __getitem__
) say:
If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised
IOW, we don't support negative groups but we support negative access and this is inconsistent with the sequence contract I think.
What I find useful though is the possibility to use match m:
and deconstruction.
I agree I don't think that But in generic contexts, a lot of code out there will assert that the object is a proper I often wonder if Python could have originally introduced As you're suggesting, the alternative could be to only add Note that for Regarding |
In this case, I think it's better not change it and abandon the proposal. Isn't it possible to just implement the destructuring & match support without the sequence? I still don't think it's useful to have more methods than necessary.
The question we should ask ourselves is rather: why would they use match as a regular sequence? it's not really a sequence IMO. the current expected semantics are "not a sequence" or am I wrong? I would hold this until some other core dev chimes in to agree with me that we shouldn't add useless methods, reject what I said, or propose something else. I don't know if it's possible to only support destructing and/or matches. |
Yes, adding
It's been teetering on the edge of being a sequence for a decade with the introduction of the When it comes to mixin methods, they're a pain point that's recurrent throughout all Python code. Even objects like And even then, while I don't see So at its core it's a perfectly legitimate > let m = "abc".match(/(a)(b)(c)/)
> console.log(m instanceof Array)
true
> m.concat(['hello'])
[ 'abc', 'a', 'b', 'c', 'hello' ]
This is a trade-off, we add new features and tackle some long overdue API polish, but this comes at the cost of a slight inconsistency between |
Actually, |
all observable behaviors of your system / will be depended on by somebody I think with the current opposition already noted and the compatability break here, better to close this PR and consider a new approach of just adding pattern matching support. Before you spend time working on an implementation, though, please check in the issue or on Discourse that there is support for this idea. A |
re.Match
a well-roundedSequence
type #133546📚 Documentation preview 📚: https://cpython-previews--133549.org.readthedocs.build/