
Pipe symbol ("|") is not percent encoded #80

Closed
@odinplus

The pipe symbol ("|") is in the reserved symbols list in url.py (https://github.com/scrapy/w3lib/blob/master/w3lib/url.py#L67) and is not percent-encoded by safe_url_string, which is used by Scrapy to download URLs.
The RFC mentioned in url.py, https://www.ietf.org/rfc/rfc3986.txt, doesn't contain "|" in its reserved symbols:

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

I've also found a site (in the Alexa top 20, possibly using the Play Framework) that has such links with pipes, and it answers with HTTP code 400 (Bad Request) if "|" is not percent-encoded in the URL.
Is this a bug? How can I avoid it properly? For now I have just removed "|" from url.py itself.
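
To illustrate the behaviour I would expect, here is a minimal sketch that uses only the standard library, not w3lib. The helper name `quote_outside_rfc3986` and the example URL are made up for illustration. It leaves the RFC 3986 reserved set quoted above (plus "%", so existing escapes are not double-encoded) untouched and percent-encodes everything else, including "|":

```python
from urllib.parse import quote

# Reserved characters exactly as listed in RFC 3986: gen-delims + sub-delims.
# Note that "|" is not among them.
RFC3986_RESERVED = ":/?#[]@" + "!$&'()*+,;="


def quote_outside_rfc3986(url):
    """Hypothetical helper, not part of w3lib.

    Percent-encodes every character that is neither unreserved (letters,
    digits, "-", ".", "_", "~", which quote() never encodes) nor in the
    RFC 3986 reserved set. "%" is kept as-is so that escapes already
    present in the URL are not encoded a second time.
    """
    return quote(url, safe=RFC3986_RESERVED + "%")


print(quote_outside_rfc3986("http://example.com/a|b?q=x|y"))
# prints: http://example.com/a%7Cb?q=x%7Cy
```

Pre-encoding URLs like this before handing them to Scrapy might work as a workaround without patching url.py, assuming safe_url_string leaves existing percent escapes untouched. I have not verified that assumption.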
