Description
The pipe symbol ("|") is in the reserved symbols list in url.py https://github.com/scrapy/w3lib/blob/master/w3lib/url.py#L67 and so is not percent-encoded by safe_url_string, which Scrapy uses when downloading URLs.
The RFC mentioned in url.py, https://www.ietf.org/rfc/rfc3986.txt, doesn't list "|" among the reserved characters:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
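The grammar above can be checked mechanically. This is a minimal sketch using only the characters quoted from the RFC; the helper name is_rfc3986_reserved is made up for illustration and is not part of w3lib:

```python
# Reserved characters exactly as given in the RFC 3986 grammar quoted above.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = set(GEN_DELIMS + SUB_DELIMS)


def is_rfc3986_reserved(char: str) -> bool:
    """Hypothetical helper: True if char is reserved per RFC 3986."""
    return char in RESERVED


print(is_rfc3986_reserved("|"))  # False
print(is_rfc3986_reserved("?"))  # True
```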
I've also found a site (in the Alexa top 20, possibly using the Play framework) that has links containing pipes, and it responds with HTTP 400 (Bad Request) if "|" is not percent-encoded in the URL.
Is this a bug? How can I work around it properly? For now I have just removed "|" from the list in url.py itself.
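One possible stopgap that avoids editing url.py is to percent-encode the pipes before the URL is handed to Scrapy. This is a rough sketch using only the standard library; escape_pipes is a hypothetical helper name, and it assumes the target server accepts "%7C" wherever it expects "|":

```python
def escape_pipes(url: str) -> str:
    """Hypothetical helper: percent-encode every "|" in a URL.

    "%7C" is the percent-encoding of "|" (ASCII 0x7C). Everything else
    in the URL is left untouched, so existing escapes are not re-encoded.
    """
    return url.replace("|", "%7C")


print(escape_pipes("http://example.com/a|b?q=x|y"))
# http://example.com/a%7Cb?q=x%7Cy
```

Since "%" is normally left alone by URL-escaping routines so that existing escapes are not double-encoded, the pre-encoded "%7C" should survive later processing, though I have not verified this against every w3lib version.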