Description
Let's say I have this extremely minimal bit of JSON-LD to be expanded with pyld:
>>> import pyld
>>> d = {
... "@context": "https://schema.org",
... "@type":"Dataset",
... "@id":"http://localhost:5000/collections/obs",
... "url":"http://localhost:5000/collections/obs"
... }
>>> pyld.expand(d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'pyld' has no attribute 'expand'
>>> pyld.jsonld.expand(d)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pyld/documentloader/requests.py", line 72, in loader
'document': response.json()
File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pyld/context_resolver.py", line 143, in _fetch_context
remote_doc = jsonld.load_document(url,
File "/usr/local/lib/python3.8/dist-packages/pyld/jsonld.py", line 6583, in load_document
remote_doc = options['documentLoader'](url, options)
File "/usr/local/lib/python3.8/dist-packages/pyld/documentloader/requests.py", line 100, in loader
raise JsonLdError(
pyld.jsonld.JsonLdError: ('Could not retrieve a JSON-LD document from the URL.',)
Type: jsonld.LoadDocumentError
Code: loading document failed
Cause: Expecting value: line 1 column 1 (char 0) File "/usr/local/lib/python3.8/dist-packages/pyld/documentloader/requests.py", line 72, in loader
'document': response.json()
File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/pyld/jsonld.py", line 163, in expand
return JsonLdProcessor().expand(input_, options)
File "/usr/local/lib/python3.8/dist-packages/pyld/jsonld.py", line 870, in expand
expanded = self._expand(active_ctx, None, document, options,
File "/usr/local/lib/python3.8/dist-packages/pyld/jsonld.py", line 2302, in _expand
active_ctx = self._process_context(
File "/usr/local/lib/python3.8/dist-packages/pyld/jsonld.py", line 3049, in _process_context
resolved = options['contextResolver'].resolve(active_ctx, local_ctx, options.get('base', ''))
File "/usr/local/lib/python3.8/dist-packages/pyld/context_resolver.py", line 58, in resolve
resolved = self._resolve_remote_context(
File "/usr/local/lib/python3.8/dist-packages/pyld/context_resolver.py", line 108, in _resolve_remote_context
context, remote_doc = self._fetch_context(active_ctx, url, cycles)
File "/usr/local/lib/python3.8/dist-packages/pyld/context_resolver.py", line 148, in _fetch_context
raise jsonld.JsonLdError(
pyld.jsonld.JsonLdError: ('Dereferencing a URL did not result in a valid JSON-LD object. Possible causes are an inaccessible URL perhaps due to a same-origin policy (ensure the server uses CORS if you are using client-side JavaScript), too many redirects, a non-JSON response, or more than one HTTP Link Header was provided for a remote context.',)
Type: jsonld.InvalidUrl
Code: loading remote context failed
Details: {'url': 'https://schema.org', 'cause': JsonLdError('Could not retrieve a JSON-LD document from the URL.')}
If I substitute "https://schema.org" with "https://schema.org/docs/jsonldcontext.jsonld", leaving the code otherwise unchanged, it correctly prints (as I expected):
>>> [{'@id': 'http://localhost:5000/collections/obs', '@type': ['http://schema.org/Dataset'], 'http://schema.org/url': [{'@id': 'http://localhost:5000/collections/obs'}]}]
However, that then seems to mess up other parsers, including the Google Structured Data Testing Tool.
The root issue seems to be with pyld's remote fetching of contexts: "https://schema.org/" no longer responds with an application/ld+json content-type, and instead advertises the context through a Link header with rel=alternate and type=application/ld+json. It seems that pyld needs to be updated to handle that case:
$ curl -I https://schema.org/
HTTP/2 200
access-control-allow-credentials: true
access-control-allow-headers: Accept
access-control-allow-methods: GET
access-control-allow-origin: *
access-control-expose-headers: Link
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
date: Fri, 19 Jun 2020 03:17:19 GMT
expires: Fri, 19 Jun 2020 03:27:19 GMT
etag: "G8zMyg"
x-cloud-trace-context: d2d5c536d73ce1590813f8e1018a2ad6
content-type: text/html
server: Google Frontend
content-length: 5100
age: 73
cache-control: public, max-age=600
alt-svc: h3-28=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
If you do curl https://schema.org/ -H "Accept: application/ld+json" you will still get back an HTML response.
Perhaps the cleanest way to implement this would be to check whether a non-JSON-LD response is received, and if so, to look for an appropriate Link header and then make a request to the URL it points to.
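To make the suggestion concrete, here is a rough sketch of that fallback as a custom document loader. It is not pyld's implementation and is untested against pyld itself: the names find_jsonld_alternate and link_following_loader are made up for illustration, the Link parsing is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values), and the returned dict only mirrors the shape that the requests loader in the traceback above appears to produce. It uses only the standard library.

```python
import json
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

JSONLD_TYPE = "application/ld+json"


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the first rel="alternate" link whose type
    is application/ld+json, or None if the header has no such link.

    Hypothetical helper; simplified parsing, not a full RFC 8288 parser.
    """
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header or ""):
        target, raw_params = match.groups()
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                # Drop surrounding whitespace, quotes and the comma that
                # separates one link from the next.
                params[key.strip().lower()] = value.strip(' \t,"')
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type", "").lower() == JSONLD_TYPE:
            return urljoin(base_url, target)
    return None


def link_following_loader(url, options=None, _followed=False):
    """Hypothetical loader: fetch url and, if the response is not JSON(-LD),
    follow an alternate JSON-LD Link header at most once."""
    request = Request(url, headers={"Accept": "application/ld+json, application/json"})
    with urlopen(request) as response:
        content_type = response.headers.get_content_type()
        if content_type not in (JSONLD_TYPE, "application/json") and not _followed:
            alternate = find_jsonld_alternate(
                response.headers.get("Link"), response.geturl()
            )
            if alternate:
                return link_following_loader(alternate, options, _followed=True)
        # If there was no usable alternate, this raises on a non-JSON body,
        # much like the JSONDecodeError in the traceback above.
        document = json.loads(response.read().decode("utf-8"))
        return {
            "contentType": content_type,
            "contextUrl": None,
            "documentUrl": response.geturl(),
            "document": document,
        }
```

As a stopgap, something like this could presumably be installed with jsonld.set_document_loader(link_following_loader) before calling jsonld.expand(d), which would let "@context": "https://schema.org" stay in the document. A proper fix inside pyld's own loaders would still be preferable.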