Skip to content

Take TORCH_HOME env variable into account while setting the cache dir #1741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 26, 2022

Conversation

parmeet
Copy link
Contributor

@parmeet parmeet commented May 25, 2022

Addresses comment #1740 (comment)

_TORCH_HOME = os.getenv("TORCH_HOME")
if _TORCH_HOME is None:
_TORCH_HOME = "~/.cache/torch" #default
_CACHE_DIR = os.path.expanduser(os.path.join(_TORCH_HOME, "text"))
Copy link
Contributor

@mthrok mthrok May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be as simple as

_CACHE_DIR = os.path.normpath(
    os.path.join(torch.hub.get_dir(), "..", "text"))

torch.hub.get_dir will do the check for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I wonder if someone use the set_dir from hub, then wouldn't that path would deviate from $TORCH_HOME? In other words do we want to give preference to torch.hub root directly over $TORCH_HOME for domains?

Another question for us now is if TORCH_HOME is not set do we want to default to torch.hub root :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the beginning I thought the proposal was to piggy-back on PyTorch's default directory. Which has somewhat elaborated, but reasonable way of resolving the default. My view here is that I would avoid re-inventing the wheel so my suggestion was to reuse the implementation from PyTorch.

However, I get the sense that your view is that only TORCH_HOME should be considered and not necessarily how Torch Hub is resolving default path. I see that is another valid view and I don't have basis to say which is better.

But fundamentally, this kind of thinking is what happens when trying to incorporate yourself into something larger. I think simply using ~/.cache/torchtext would avoid this kind of unnecessary design thinking, and allow users have complete and precise control over where it should be, without worrying how these configurations mix with PyTorch's configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that, let me stamp on the PR to unblock you.

Copy link
Member

@NicolasHug NicolasHug May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_torch_home() might be what you need for this.

https://github.com/pytorch/pytorch/blob/089203f8bc71c64688def521284e32aee3fb5989/torch/hub.py#L113-L118

It's private, but since we also maintain torchhub, it's not a problem. The way it's currently done in this PR is fine too, but wouldn't respect the XDG_CACHE_HOME env variable.

@parmeet parmeet merged commit f2fdae2 into pytorch:main May 26, 2022
@parmeet parmeet deleted the torch_home branch May 26, 2022 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants