Commit 93a90c5

Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases.
1 parent 03c28d2 commit 93a90c5

1 file changed: +1 -0 lines changed

w3lib/html.py

Lines changed: 1 addition & 0 deletions
@@ -281,6 +281,7 @@ def get_base_url(text, baseurl='', encoding='utf-8'):
     """

     text = to_unicode(text, encoding)
+    text = remove_comments(text)
     m = _baseurl_re.search(text)
     if m:
         return moves.urllib.parse.urljoin(
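
Illustration (not part of the commit): a minimal, self-contained sketch of the kind of case this change helps with, where a commented-out <base> tag appears before the real one. The regexes and the names strip_comments and sketch_get_base_url are simplified, hypothetical stand-ins written for this example. They are not w3lib's actual _baseurl_re or remove_comments, and the example URLs are made up.

```python
import re
from urllib.parse import urljoin

# Simplified stand-in patterns (assumptions for this sketch, not w3lib's own).
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_BASE_RE = re.compile(
    r"<base\s[^>]*href\s*=\s*[\"']\s*([^\"'\s]+)\s*[\"']", re.IGNORECASE
)


def strip_comments(text):
    """Remove HTML comments, including any markup inside them."""
    return _COMMENT_RE.sub("", text)


def sketch_get_base_url(text, baseurl="", strip=True):
    """Return the first <base href> found in text, resolved against baseurl.

    With strip=False the comments are left in place, which mimics the
    behaviour before the commit.
    """
    if strip:
        text = strip_comments(text)
    m = _BASE_RE.search(text)
    if m:
        return urljoin(baseurl, m.group(1))
    return baseurl


html = """<html><head>
<!-- <base href="http://old.example.com/"> -->
<base href="http://example.com/current/">
</head></html>"""

page_url = "http://example.com/page.html"
before = sketch_get_base_url(html, page_url, strip=False)
after = sketch_get_base_url(html, page_url, strip=True)

print(before)  # the commented-out base tag is picked up
print(after)   # the live base tag is used
```

Without the stripping step, the search matches the <base> inside the comment first. With it, only the live tag remains. As the commit message says, this helps in some cases and is not a fix for #70.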
