Replacing lambda functions with regular functions in all datasets #1718
Conversation
Thanks @NivekT. LGTM! Feel free to land once the CI passes w.r.t. the datasets.
I am sorry that I didn't have a chance to look at this PR before it landed. We need to replace the local functions with global (module-level) functions.
```python
def _filepath_fn(_=None):
    return os.path.join(root, split + ".csv")

def _modify_res(t):
    return int(t[0]), " ".join(t[1:])
```
It's better to use a global function rather than a local function here. Local functions are still not picklable by the pickle module.
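A sketch of the kind of change being asked for here: hoist the helpers to module level and pass `root`/`split` in explicitly, for example via `functools.partial`. The signatures and the bound values below are illustrative, not necessarily what the follow-up fix used.

```python
import os
from functools import partial


# Module-level (and therefore picklable) versions of the local helpers above:
# `root` and `split` become explicit parameters instead of captured variables.
def _filepath_fn(root, split, _=None):
    return os.path.join(root, split + ".csv")


def _modify_res(t):
    return int(t[0]), " ".join(t[1:])


# Inside the dataset builder, the fixed arguments are bound once, and the
# resulting partial object is what gets attached to the datapipe graph.
filepath_fn = partial(_filepath_fn, "/data/ag_news", "train")
```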
```python
def _filepath_fn(_=None):
    return os.path.join(root, _PATH)

def _extracted_filepath_fn(_=None):
    return os.path.join(root, _EXTRACTED_FILES[split])

def _filter_fn(x):
    return _EXTRACTED_FILES[split] in x[0]

def _modify_res(t):
    return int(t[0]), " ".join(t[1:])
```
Ditto
Let me fix this with another PR. Thanks for catching it.
I meant to add tests to make sure we can dump and load dataset objects using pickle. So, for instance, the code below gives an error:

```python
from torchtext.datasets import AG_NEWS
import pickle

dp = AG_NEWS(split='train')
pickle.dump(dp, open("temp.pkl", 'wb'))
```

Error:

```
AttributeError: Can't pickle local object 'AG_NEWS.<locals>._filepath_fn'
```

And in our tests we could just add another unit test that would dump and load the datapipe.
Gotcha, thanks for clarifying. Let me create a PR to add these tests to our datasets!
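A minimal sketch of such a unit test for a single dataset might look like the following (the test name and assertion are illustrative, and constructing the datapipe is assumed to be lazy, i.e. not to trigger a download):

```python
import pickle

from torchtext.datasets import AG_NEWS


def test_ag_news_datapipe_is_picklable():
    dp = AG_NEWS(split="train")
    # Round-trip through pickle; this fails if any lambda or locally defined
    # function is still attached to the datapipe graph.
    loaded = pickle.loads(pickle.dumps(dp))
    assert loaded is not None
```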
Thanks @Nayef211. One suggestion here would be to add a common test parameterized over all the datasets in the repo instead of adding it to each individual test suite. :)
Thanks for the advice, I'll try to add a parameterized test for this!
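That common test could be parameterized along these lines (a sketch; the explicit dataset list here is an assumption, and the real suite might discover constructors automatically instead):

```python
import pickle

import pytest
from torchtext.datasets import AG_NEWS, IMDB, SST2  # illustrative subset


@pytest.mark.parametrize("builder", [AG_NEWS, IMDB, SST2])
def test_datasets_are_picklable(builder):
    dp = builder(split="train")
    # Every dataset's datapipe graph must survive a pickle round-trip so that
    # DataLoader multiprocessing can serialize it.
    assert pickle.loads(pickle.dumps(dp)) is not None
```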
This PR replaces lambda functions with regular functions in all datasets, because lambda functions are not serializable by pickle, and serialization is necessary for DataLoader's multiprocessing feature. If the changes are correct, they should not break any CI.
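For context, the limitation is easy to reproduce outside of torchtext; the exact exception type and message below may vary slightly across Python versions:

```python
import pickle


def add_one(x):          # module-level function: picklable
    return x + 1


add_one_lambda = lambda x: x + 1  # lambda: not picklable by the pickle module

pickle.dumps(add_one)             # succeeds
try:
    pickle.dumps(add_one_lambda)
except (pickle.PicklingError, AttributeError) as exc:
    print(f"lambda could not be pickled: {exc}")
```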
Fixes part of pytorch/data#397 (TorchData team plans to improve DataLoader as well, see the comments within the issue)
Fixes #1716