Description
Original use-case sourced from this PR: #451
The current caching capability significantly improves runtimes for remote schemas when there is a single remote file to download, but does nothing to improve the case where there are refs to resolve. Refs are cached in-memory by the referencing library, but discarded between runs.
For faster runs, check-jsonschema should cache resolved refs on disk as well.
Some basic requirements:
- this must respect the --no-cache setting
  - probably the same object which is used for fetching remote schemas should be passed to the ref resolver
- filenames must be chosen such that there are no conflicts between different schemas (users won't be able to control filenames)
- if the new file-and-dir layout for these data conflicts with the existing cache dir layout, that needs resolution
  - ideal: design a strategy to migrate cache data for the next 1-2 calendar years
  - acceptable: ignore old cache data, provide a changelog note on how to clean it up
- the behavior here needs to be tested
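To make the first requirement concrete, here is a minimal sketch of a disk-backed wrapper around whatever callable fetches a ref, with a flag that would be driven by --no-cache. Every name in it (DiskCachedRetriever, fetch, path_for, disabled) is hypothetical and chosen for illustration; it is not check-jsonschema's actual API, and how it would be wired into the ref resolver is left open.

```python
import json
from pathlib import Path
from typing import Any, Callable


class DiskCachedRetriever:
    """Hypothetical wrapper that caches fetched JSON documents on disk."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        path_for: Callable[[str], Path],
        disabled: bool = False,
    ) -> None:
        self._fetch = fetch          # callable that actually retrieves a URI
        self._path_for = path_for    # maps a URI to its cache file path
        self._disabled = disabled    # would be set from --no-cache

    def __call__(self, uri: str) -> Any:
        # With caching disabled, never read from or write to disk.
        if self._disabled:
            return self._fetch(uri)

        path = self._path_for(uri)
        if path.is_file():
            return json.loads(path.read_text(encoding="utf-8"))

        data = self._fetch(uri)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data), encoding="utf-8")
        return data
```

Because the fetch callable and the naming function are injected, the caching behavior can be tested with a fake fetcher and a temporary directory, without any network access.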
Note
A friend of mine suggested putting cache data into a DB (e.g. sqlite) when we talked about this, so that it could be annotated with richer metadata and structure. Although that might be a good idea longer term, I don't want to reach for that quite yet -- I think this can be solved with a good dir structure for now.
Here's one initial idea, for evaluation:
- each $ref is canonically named {md5 of the absolute URI}.json
- in the ~/.cache/check_jsonschema/ dir, add a dir named refs/ (the schemas are in a dir named downloads/, which now seems like a suboptimal name but will suffice)
- ref resolution stores resolved refs in the refs/ dir
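The naming idea above can be sketched roughly as follows. The function name ref_cache_path and the CACHE_ROOT constant are hypothetical, and the cache root is hard-coded here only for illustration; the real cache dir may be resolved differently per platform.

```python
import hashlib
from pathlib import Path

# Assumed cache root for illustration, taken from the path mentioned above.
CACHE_ROOT = Path.home() / ".cache" / "check_jsonschema"


def ref_cache_path(absolute_uri: str, cache_root: Path = CACHE_ROOT) -> Path:
    """Return the cache file path for a ref: refs/{md5 of the absolute URI}.json"""
    digest = hashlib.md5(absolute_uri.encode("utf-8")).hexdigest()
    return cache_root / "refs" / f"{digest}.json"
```

Since the filename depends only on the absolute URI, two schemas that reference the same URI would share one cache entry, and distinct URIs map to distinct filenames barring an md5 collision.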