Skip to content

gh-135768: fix allowed/blocked IPv6 domains in http.cookiejar #135771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

LamentXU123
Copy link
Contributor

@LamentXU123 LamentXU123 commented Jun 20, 2025

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests.

@bedevere-app
Copy link

bedevere-app bot commented Jun 20, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

LamentXU123 and others added 5 commits June 21, 2025 01:19
@picnixz picnixz changed the title gh-135768: Support IPv6 in http.cookiejar gh-135768: fix allowed/blocked IPv6 domains in http.cookiejar Jun 20, 2025
@LamentXU123
Copy link
Contributor Author

Please add tests.

Done.

@LamentXU123 LamentXU123 requested a review from picnixz June 20, 2025 18:18
@LamentXU123
Copy link
Contributor Author

In a follow-up PR I think we can refactor the tests because there's lot of duplicated code but not in this one yet.

I'll work on this since this PR is done.

@LamentXU123 LamentXU123 requested a review from picnixz June 21, 2025 12:01
@picnixz
Copy link
Member

picnixz commented Jun 21, 2025

I'll work on this since this PR is done.

No, wait until this is merged. We may need a new issue but this is not pressing right now. Focus on this one first.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, be more careful with your choices. First, answer my question about .local (why are we adding it in the first place?)

Second, detecting whether something is an IP or not needs to be more robust. For instance, we could block domains without using [] notation and only ::. Indeed, an IPv6 is not enclosed by [].

Finally, it would be good if we had more RFC-backed choices.

Comment on lines +546 to +552
if text.startswith('[') and text.endswith(']'):
try:
IPv6Address(text.removeprefix('[').removesuffix(']'))
except ValueError:
return False
else:
return False # not a IPv6 address in []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are IPv6 addressed not surrounded [] rejected? when making a request, we would use http://[::1] for instance, but we could decide to reject the domain ::1 itself. Please be more careful. Instead, we should make it allow both valid IPv6 addresses as well as IPv6 addresses enclosed in '[]'. But then, we need to be careful with the matching. Should [::1] be matched against ::1 or [::1] only?

Try to look at the RFC as well to see if there are more information about IPv6 domains.

self.assertFalse(is_HDN(""))
self.assertFalse(is_HDN("."))
self.assertFalse(is_HDN(".foo.bar.com"))
self.assertFalse(is_HDN("..foo"))
self.assertFalse(is_HDN("foo."))

def test_is_ip(self):
self.assertTrue(is_ip('[::1]'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, it's not an IPv6. It's a netloc with an IPv6 inside it. We should perhaps name is_ip_like instead and also treat ::1 as a valid IP (for blocked domains). I don't have an answer to that question so I'd prefer to know if there is some recommendations done by RFCs.

@bedevere-app
Copy link

bedevere-app bot commented Jun 21, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@LamentXU123
Copy link
Contributor Author

Please, answer to this. Why should we avoid adding '.local'? until this is answered, I won't review it more.

because using .local in IPv6 doesn't make any sense. It will change the input IPv6 to something like [::1].local.

The additional .local is designed for HDNs not IP addresses. Before this PR. We assumed that there are only two types of input: IPv4 addr and HDN. IPv4 indeed contains dots. Some abnormal HDN do not contain dot, than we are adding a .local at the end of it. This is the expected case.

But, IPv6 is a edge case. It do not contains dot. And it is not a HDN. Adding .local after a IPv6 address is meaningless and, the most important thing, will cause error in logic when we are blocking hosts. Since the IPv6 is [::1].local and the input is [::1], every IPv6 is not blocked.

It is the .local at the end of the address that cause the bug.

@picnixz
Copy link
Member

picnixz commented Jun 21, 2025

My question was more: why did we add .local for IPv4 in the first place. There is a bunch of logic involving local domains in set_ok_domain and I believe this also needs an upgrade if we're handling IPv6 now. FQDN and IPv4 look the same but IPv6 is entirely different. Therefore, we should be also careful with the changes that could affect the logic of set_ok_domain().

@LamentXU123
Copy link
Contributor Author

My question was more: why did we add .local for IPv4 in the first place. There is a bunch of logic involving local domains in set_ok_domain and I believe this also needs an upgrade if we're handling IPv6 now. FQDN and IPv4 look the same but IPv6 is entirely different. Therefore, we should be also careful with the changes that could affect the logic of set_ok_domain().

I might be stupid here, but I think we are actually not adding .local for IPv4. RFC 2965 quotes:

If a host name
contains no dots, the effective host name is that name with the
string .local appended to it.

And the implementation is correct. IPv4 contains dots so .local will not be added. The .local will only be added to host name without dots. like http://acme, the requested host name will be changed to acme.local, but http://127.0.0.1 will not

so sorry in advanced if I misunderstand the code

@LamentXU123
Copy link
Contributor Author

you can locally test this by adding breakpoints in func eff_request_host to see what is returned of these two code:

import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy
class FakeResponse:
    def __init__(self, headers=[], url=None):
        """
        headers: list of RFC822-style 'Key: value' strings
        """
        import email
        self._headers = email.message_from_string("\n".join(headers))
        self._url = url
    def info(self): return self._headers
pol = DefaultCookiePolicy(
            rfc2965=True, blocked_domains=[])
c = CookieJar(policy=pol)
headers = ["Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/"]
pol.set_blocked_domains(['acme'])
req = urllib.request.Request("http://acme")
res = FakeResponse(headers, "http://acme")
c.extract_cookies(res, req)

# And

c.clear()
pol.set_blocked_domains(['127.0.0.1'])
req = urllib.request.Request("http://127.0.0.1")
res = FakeResponse(headers, "http://127.0.0.1")
c.extract_cookies(res, req)

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 21, 2025

And for the issue of [::1] and ::1. I quote this from RFC 2396 3.2.2

The host is a domain name of a network host, or its IPv4 address as a
set of four decimal digit groups separated by ".". Literal IPv6
addresses are not supported.

I chose to use only [::1] because of this statement, hum..... I'll find some clearer statements now

@picnixz
Copy link
Member

picnixz commented Jun 21, 2025

I might be stupid here, but I think we are actually not adding .local for IPv4. RFC 2965 quotes:

Oups my bad. Ok, so this is an RFC thing. Now, is there a similar RFC for IPv6?
What I meant with set_ok_domain is that we have some 'strict' policy and under the non-strict policy parsing, I don't know whether [::1] should be treated the same as ::1 (and whether strict policy parsing means that [::1] can only be blocked if we blocked it explicitly, not just ::1) (see https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.DefaultCookiePolicy.DomainStrictNonDomain).

(It appears that there is a lot of code that is IPv-4 only but if we're adding support to IPv6, we need to make it work everywhere actually).

so sorry in advanced if I misunderstand the code

no worries!

you can locally test this by adding breakpoints in func eff_request_host to see what is returned of these two code:

Sorry but I don't have time for that.

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 21, 2025

(It appears that there is a lot of code that is IPv-4 only but if we're adding support to IPv6, we need to make it work everywhere actually).

I think we could treat them in seperate issues and PRs, I think we could just focus on fixing allowed/blocked IPv6 domains in http.cookiejar now, and in the future we need to change a lot on that(I think I can handle this), including tests, docs, and logic everywhere. But yeah, I think we could just make sure the blocking and allowing work for IPv6 in this PR. I will make some changes based on what you've commented now.

Sorry but I don't have time for that.

I really thank you so much for your time. My coding skills and experiences have improved by all this.

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 21, 2025

And for the [::1] issue. Well IMO RFC states that Literal IPv6 addresses are not supported as host name. So to be strict, only [::1] is a legel hostname while ::1 is a legel IPv6 address. I think we could just support [::1] in any case.

But hey, users are likely to use ::1 if they don't read the docs very carefully, I actually don't know how to deal with this problem very properly. I need your suggestions tho @picnixz. Thanks!

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 21, 2025

No worries of set_ok_domain. This PR do not change any logic of it. We can change the logic in the future PRs. I think we could just focus on allowed/blocked issues in this PR now.

The only related change of other funcs is eff_request_host, which is used in set_ok_domain. But the only thing it did is just to treat IPv6 as a IP address and not adding .local on it. Since the original one don't support IPv6, this PR indeed also don't supprt IPv6 in the logic of set_ok_domain. We could take care of it later.

(There are indeed a hole bunch of things to change if we are going to support IPv6 entirely)

What I meant with set_ok_domain is that we have some 'strict' policy and under the non-strict policy parsing, I don't know whether [::1] should be treated the same as ::1 (and whether strict policy parsing means that [::1] can only be blocked if we blocked it explicitly, not just ::1) (see https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.DefaultCookiePolicy.DomainStrictNonDomain).

I'll take a look at this after this PR is merged.

@LamentXU123 LamentXU123 requested a review from picnixz June 21, 2025 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants