Commit 31b8aaa


* include pytorch 1.5.0-rc1 for CI test
* bump up the version
* Set up ShipIt
fbshipit-source-id: bb7d2eb52240c7223b57c3c9624e61d116e77e39
* Re-sync with internal repository (#749)
* 20200429 pytorch/text import
Summary: [20:45:34: cpuhrsch@devvm3140 pytorch]$ ./fb_build/import_text.sh
Reviewed By: pbelevich
Differential Revision: D21320577
fbshipit-source-id: ac2148b9f0d58e5538443c879845bfb4f6ca7202
* 20200430 torchtext import script to include additional meta files
Summary: ./fb_build/import_text.sh
Reviewed By: zhangguanheng66
Differential Revision: D21343124
fbshipit-source-id: c08ecad2cc6f439fa40130aeaf91383be9403fe8
* torchtext flake8, github, travis metafiles
Summary: See title
Reviewed By: pbelevich
Differential Revision: D21344211
fbshipit-source-id: a8bcf7f3ab9bb2c2853e27f612e82caa341d3651
* Import torchtext 20200520 and update build
Summary: Import torchtext up to #786
Reviewed By: cpuhrsch
Differential Revision: D21483116
fbshipit-source-id: bc8ab38db9dc9ce4a8734ca8ea991c20e4ef0882
* Import torchtext 20200528
Summary:
Import up to #798
Addresses T67599333
Reviewed By: zhangguanheng66
Differential Revision: D21764935
fbshipit-source-id: f44d1db637799f2e95f420a8099fbf19545c7cbd
* 20200604 torchtext github import
Summary: Import from github master
Reviewed By: zhangguanheng66
Differential Revision: D21886238
fbshipit-source-id: a8f098e299466dd1701fe7ceb6a97c2a2fc54b9d
* Import torchtext 20200605
Summary: Import from github master
Reviewed By: zhangguanheng66
Differential Revision: D21907519
fbshipit-source-id: f22370d97796da5f2cb9f76f506c80f18fefea7f
* Back out "Import torchtext 20200605"
Summary: Original commit changeset: f22370d97796
Reviewed By: zhangguanheng66
Differential Revision: D21964222
fbshipit-source-id: c316836596fc3e232e63abc59e172f237b551cc5
* Import torchtext 2020/06/22
Summary: Import from github torchtext/master
Reviewed By: zhangguanheng66, cpuhrsch
Differential Revision: D22168183
fbshipit-source-id: 7d96ade64f18942d9bd19437011be2f65f0b2a5e
* Fix torch.testing._internal module not found
Reviewed By: Nayef211
Differential Revision: D22315715
fbshipit-source-id: 6b8b8544b0aa458cf5e7e9ca380d0dc85c98189f
* Import torchtext 2020/07/07
Summary: Import from github torchtext/master
Reviewed By: cpuhrsch
Differential Revision: D22420576
fbshipit-source-id: 4d2c19d7f1db8f698894ca406c1c44b2ad8e0506
* remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
* remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
* Import torchtext 2020/07/21
Summary: Import from github torchtext/master
Reviewed By: zhangguanheng66
Differential Revision: D22641140
fbshipit-source-id: 8190692d059a937e25c5f93506581086f389c291
* Remove .python3 markers
Reviewed By: ashwinp-fb
Differential Revision: D22955630
fbshipit-source-id: f00ef17a905e4c7cd9196c8924db39f9cdfe8cfa
* Import torchtext 2020/08/06
Summary: Import from github torchtext/master
Reviewed By: zhangguanheng66
Differential Revision: D22989210
fbshipit-source-id: 083464e188b758a8746123f4dd2197cc7edc4bc4
* Import torchtext 2020/08/18
Summary: Import from github torchtext/master
Reviewed By: cpuhrsch
Differential Revision: D23190596
fbshipit-source-id: 1568a25a5bd6431bcef3c6539f64a3ab1f5bccd7
* Import torchtext from 8aecbb9
Reviewed By: hudeven
Differential Revision: D23451795
fbshipit-source-id: 73e6130c16716919c77862cef4ca4c8048428670
* Import torchtext 9/4/2020
Reviewed By: Nayef211
Differential Revision: D23539397
fbshipit-source-id: 88dce59418a3071cbc9e944cf0a4cf2117d7d9f7
* Import github torchtext on 9/9/2020
Reviewed By: cpuhrsch
Differential Revision: D23616189
fbshipit-source-id: 365debc987326145eead7456ed48517fe55cac96
* Add property support for ScriptModules (#42390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390
**Summary**
This commit extends support for properties to include
ScriptModules.
**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.
`python test/test_jit_py3.py TestScriptPy3.test_module_properties`
Test Plan: Imported from OSS
Reviewed By: eellison, mannatsingh
Differential Revision: D22880298
Pulled By: SplitInfinity
fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
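A minimal sketch of the new capability (module and property names here are hypothetical, and this assumes a torch build that includes this commit):

```python
import torch

class Doubler(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.base = 2

    @property
    def doubled(self) -> int:
        # user-defined property; with this commit, TorchScript compiles it
        return 2 * self.base

    def forward(self, x: int) -> int:
        # forward reads the property through the compiled getter
        return x + self.doubled

scripted = torch.jit.script(Doubler())
```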
* sync with OSS torchtext 9/15/20
Reviewed By: cpuhrsch
Differential Revision: D23721167
fbshipit-source-id: 13b32091c422a3ed0ae299595d69a7afa7136638
* Import Github torchtext on 9/28/2020
Reviewed By: cpuhrsch
Differential Revision: D23962265
fbshipit-source-id: 0d042878fe9119aa725e982ab7d5e96e7c885a59
* Enable @unused syntax for ignoring properties (#45261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45261
**Summary**
This commit enables `unused` syntax for ignoring
properties, which makes ignoring them more intuitive.
`ignore` is not supported because class type properties cannot be
executed in Python like an `ignored` function (they exist only as
TorchScript types), and module properties that cannot be scripted
are not added to the `ScriptModule` wrapper, so that they
may still execute in Python.
**Test Plan**
This commit updates the existing unit tests for class type and module
properties to test properties ignored using `unused`.
Test Plan: Imported from OSS
Reviewed By: navahgar, Krovatkin, mannatsingh
Differential Revision: D23971881
Pulled By: SplitInfinity
fbshipit-source-id: 8d3cc1bbede7753d6b6f416619e4660c56311d33
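Under my reading of this feature, decorating a property's getter with `torch.jit.unused` makes the compiler skip it rather than fail on an unscriptable body; a minimal sketch (class and property names are hypothetical):

```python
import torch

class WithIgnoredProp(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.val = 1

    @property
    @torch.jit.unused
    def python_only(self) -> str:
        # uses Python-only introspection, so it cannot be compiled;
        # `unused` tells the compiler to skip this getter instead of erroring
        return type(self).__name__

    def forward(self, x: int) -> int:
        return x + self.val

# scripting succeeds because the unscriptable property is marked unused
scripted = torch.jit.script(WithIgnoredProp())
```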
* Import Github torchtext on 10/11/2020
Reviewed By: cpuhrsch
Differential Revision: D24242037
fbshipit-source-id: 605d81412c320373f1158c51dbb120e7d70d624d
* make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API (#47322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47322
Updating all call-sites of the legacy dispatcher registration API in fbcode to the new API.
I migrated all call sites that used the legacy dispatcher registration API (RegisterOperators()) to use the new API (TORCH_LIBRARY...). I found all call-sites by running `fbgs RegisterOperators()`. This touched several places, including other OSS code (nestedtensor, torchtext, torchvision). A few things to call out:
For simple ops that only had one registered kernel without a dispatch key, I replaced them with:
```
TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName", fn_name);
}
```
For ops that registered to a specific dispatch key / had multiple kernels registered, I registered the common kernel (math/cpu) directly inside a `TORCH_LIBRARY_FRAGMENT` block, and registered any additional kernels from other files (e.g. cuda) in a separate `TORCH_LIBRARY_IMPL` block.
```
// cpu file
TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName(schema_inputs) -> schema_outputs");
  m.impl("opName", torch::dispatch(c10::DispatchKey::CPU, TORCH_FN(cpu_kernel)));
}
// cuda file
TORCH_LIBRARY_IMPL(ns, CUDA, m) {
  m.impl("opName", torch::dispatch(c10::DispatchKey::CUDA, TORCH_FN(cuda_kernel)));
}
```
Special cases:
I found a few ops that used a (legacy) `CPUTensorId`/`CUDATensorId` dispatch key. Updated those to use CPU/CUDA; this seems safe because the keys are aliased to one another in `DispatchKey.h`
There were a handful of ops that registered a functor (function class) to the legacy API. As far as I could tell we don't allow this case in the new API, mainly because you can accomplish the same thing more cleanly with lambdas. Rather than delete the class I wrote a wrapper function on top of the class, which I passed to the new API.
There were a handful of ops that were registered only to a CUDA dispatch key. I put them inside a TORCH_LIBRARY_FRAGMENT block, and used a `def()` and `impl()` call like in case two above.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D24714803
Pulled By: bdhirsh
fbshipit-source-id: c809aad8a698db3fd0d832f117f833e997b159e1
* Revert D24714803: make duplicate def() calls an error in the dispatcher. Updating all fb operators to use the new dispatcher registration API
Differential Revision: D24714803
Original commit changeset: c809aad8a698
fbshipit-source-id: fb2ada65f9fc00d965708d202bd9d050f13ef467
* Import torchtext on Nov 20, 2020
Summary:
Import torchtext on the commit of 633548a1bdf0bac1e38f98da375a537ce0c2994b
allow-large-files
Reviewed By: cpuhrsch
Differential Revision: D25127691
fbshipit-source-id: 3a617f5f4849df452f8a102a77ce11a1bce5af1f
* Updating all call-sites of the legacy dispatcher registration API in fbcode to the new API. (#48178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48178
I migrated all call sites that used the legacy dispatcher registration API (RegisterOperators()) to use the new API (TORCH_LIBRARY...). I found all call-sites by running `fbgs RegisterOperators()`. This touched several places, including other OSS code (nestedtensor, torchtext, torchvision). A few things to call out:
For simple ops that only had one registered kernel without a dispatch key, I replaced them with:
```
TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName", fn_name);
}
```
For ops that registered to a specific dispatch key / had multiple kernels registered, I registered the common kernel (math/cpu) directly inside a `TORCH_LIBRARY_FRAGMENT` block, and registered any additional kernels from other files (e.g. cuda) in a separate `TORCH_LIBRARY_IMPL` block.
```
// cpu file
TORCH_LIBRARY_FRAGMENT(ns, m) {
  m.def("opName(schema_inputs) -> schema_outputs");
  m.impl("opName", torch::dispatch(c10::DispatchKey::CPU, TORCH_FN(cpu_kernel)));
}
// cuda file
TORCH_LIBRARY_IMPL(ns, CUDA, m) {
  m.impl("opName", torch::dispatch(c10::DispatchKey::CUDA, TORCH_FN(cuda_kernel)));
}
```
Special cases:
I found a few ops that used a (legacy) `CPUTensorId`/`CUDATensorId` dispatch key. Updated those to use CPU/CUDA; this seems safe because the keys are aliased to one another in `DispatchKey.h`
There were a handful of ops that registered a functor (function class) to the legacy API. As far as I could tell we don't allow this case in the new API, mainly because you can accomplish the same thing more cleanly with lambdas. Rather than delete the class I wrote a wrapper function on top of the class, which I passed to the new API.
There were a handful of ops that were registered only to a CUDA dispatch key. I put them inside a TORCH_LIBRARY_FRAGMENT block, and used a `def()` and `impl()` call like in case two above.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25056090
Pulled By: bdhirsh
fbshipit-source-id: 8f868b45f545e5da2f21924046e786850eba70d9
* Import torchtext from github into fbcode on 1/11/2021
Reviewed By: cpuhrsch
Differential Revision: D25873762
fbshipit-source-id: 0d34d36aeb8e7e2ce72fcf345c5e7e713ef3663c
* Import torchtext from github #1121 d56fffe
Summary: Import torchtext from github #1121 d56fffe
Reviewed By: zhangguanheng66
Differential Revision: D25976268
fbshipit-source-id: 81589f8988a54cc12f17f0a6f298a915e829a830
* Import the hidden files in torchtext github repo
Reviewed By: mthrok
Differential Revision: D26001386
fbshipit-source-id: f822f0f32232d3006ef629937520dee6c0faf414
* add a newline mark to config.yml file (#1128)
Reviewed By: zhangguanheng66
Differential Revision: D26369003
fbshipit-source-id: 09ca48f9705d8663b06e6a329a6b64b24f9c148e
* Replace model with full name when spacy load is used (#1140)
Reviewed By: zhangguanheng66
Differential Revision: D26369005
fbshipit-source-id: b1e6b5d77810bb8f67d14b8a1c7ec0a9f4831cab
* Fix the num_lines argument of the setup_iter func in RawTextIterableDataset (#1142)
Reviewed By: zhangguanheng66
Differential Revision: D26368999
fbshipit-source-id: 4b50e5d9e5fbdf633e8b3f0072223eed050af793
* Fix broken CI tests due to spacy 3.0 release (#1138)
Reviewed By: zhangguanheng66
Differential Revision: D26368998
fbshipit-source-id: 84e883562a9a3d0fe47b54823b22f7b2cd82fca4
* Switch data_select in dataset signature to split (#1143)
Reviewed By: zhangguanheng66
Differential Revision: D26369006
fbshipit-source-id: 608f42fa180db9ebcfaaeadc6b8cdd29393262af
* Add offset arg in the raw text dataset (#1145)
Reviewed By: zhangguanheng66
Differential Revision: D26368996
fbshipit-source-id: 52741015139c302b7b0ddf8c8f50ab45a609fd2f
* switch to_ivalue to __prepare_scriptable__ (#1080)
Reviewed By: zhangguanheng66
Differential Revision: D26368995
fbshipit-source-id: 0352c04e422c835350bd42df35d4054d543fee36
* Pass an embedding layer to the constructor of the BertModel class (#1135)
Reviewed By: zhangguanheng66
Differential Revision: D26369001
fbshipit-source-id: f5a67a2a812d568073505ec4d181f6e418eb4a3f
* add __next__ method to RawTextIterableDataset (#1141)
Reviewed By: zhangguanheng66
Differential Revision: D26368997
fbshipit-source-id: f5ef78f5f4a224db497f47f774eaddedd0498b4b
* Add func to count the total number of parameters in a model (#1134)
Reviewed By: zhangguanheng66
Differential Revision: D26369000
fbshipit-source-id: c687c0f0c2697dbd9c17a79a1291a2e279bbd1b8
* Retire the legacy code in torchtext library and fix the dependency of the downstream libraries
Summary: This diff: 1) moves the legacy code in torchtext to the legacy folder; 2) adds "legacy" to the import path for the downstream libraries in fbcode that use the legacy code.
Reviewed By: cpuhrsch
Differential Revision: D23718437
fbshipit-source-id: 1660868aaa95ac6555ad6793dda5ce02a9acdc08
* Sync torchtext GH<->fbcode until GH commit 1197514eb8cc33ccff10f588534f405b43908660
Summary: Import recent torchtext changes up until GH commit 1197514eb8cc33ccff10f588534f405b43908660
Reviewed By: zhangguanheng66
Differential Revision: D26824967
fbshipit-source-id: fc4be4f94a8f748ce2ed5e776e30a42422cbcab9
* 20210304[2] Sync torchtext GH<->fbcode until GH commit 2764143865678c41e69ad3b993556fe90c1e6391
Summary: Sync up until commit in title
Reviewed By: zhangguanheng66
Differential Revision: D26829429
fbshipit-source-id: a059a36d83b3803dfed9198d0e474e0e75f94f17
* 20210308 Sync torchtext GH <-> fbcode
Summary: Import latest GH changes
Reviewed By: zhangguanheng66
Differential Revision: D26888371
fbshipit-source-id: cc27f51fd89ad86b8bcfb8f286ad874ab01b1fd6
* Re-name raw_datasets.json file with jsonl extension
Reviewed By: cpuhrsch
Differential Revision: D26923978
fbshipit-source-id: c87c7776445e05d452f6b38244bf4cdaba45bdec
* 20210329 Sync torchtext up to GH commit eb5e39d3d40525c0064c8e7b7c976755e7341a8b
Summary: Sync torchtext up to GH commit eb5e39d3d40525c0064c8e7b7c976755e7341a8b
Reviewed By: parmeet
Differential Revision: D27400885
fbshipit-source-id: 1f8f92ca42ba36d070db6740b3bb4c148f69586b
* Import torchtext #1267 93b03e4
Summary:
Imported latest from github Master
PR#1267
Reviewed By: cpuhrsch
Differential Revision: D27503970
fbshipit-source-id: 853ff895ba42b1feb7442abe1c87478e43d62e5b
* Import torchtext #1266 ba0bf52
Summary: Import torchtext from github
Reviewed By: parmeet
Differential Revision: D27803909
fbshipit-source-id: 9cb0f15858b1417cb5868d5651513eb2df998fbe
* Import torchtext #1287 fab63ed
Reviewed By: parmeet
Differential Revision: D27922562
fbshipit-source-id: 3c18cd9e2583e03471461ad8a22ac6b0ceb596a2
* Import torchtext #1293 d2a0776
Summary: Importing torchtext from github for regular sync.
Reviewed By: cpuhrsch
Differential Revision: D27983819
fbshipit-source-id: 5806421d788afaa872f5320b5f4cbcd913e103ea
* Import torchtext #1291 0790ce6
Reviewed By: parmeet
Differential Revision: D28101664
fbshipit-source-id: a8643b3ecf85de2cb815dcfa5789a4a5d246d80f
* adding __contains__ method to experimental vocab (#1297)
Reviewed By: cpuhrsch
Differential Revision: D28111696
fbshipit-source-id: fef195941492493a399adb37339cfa64795e22a0
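A `__contains__` on a vocab plausibly just delegates membership to the string-to-index table; a minimal plain-Python sketch (this is a hypothetical class, not the torchtext implementation):

```python
class Vocab:
    """Minimal vocab-like sketch: tokens map to contiguous indices."""

    def __init__(self, tokens):
        self._stoi = {tok: i for i, tok in enumerate(tokens)}

    def __getitem__(self, token):
        return self._stoi[token]

    def __contains__(self, token):
        # membership test delegates to the string-to-index table,
        # enabling the idiomatic `token in vocab` check
        return token in self._stoi

v = Vocab(["<unk>", "hello", "world"])
```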
* Import torchtext #1292 ede6ce65eb5405ff1f8801ff6b354bb1cd242108
Summary: This diff syncs torchtext GH with fbcode
Reviewed By: cpuhrsch
Differential Revision: D28321356
fbshipit-source-id: 7736f0d100941627b58424911a1329b1ce66c123
* Added APIs for default index and removed unk token (#1302)
Reviewed By: parmeet
Differential Revision: D28478153
fbshipit-source-id: bfcaffe8fe48e96d8df454f7df0d25ec39d5d4a6
* Swapping experimental Vocab and retiring current Vocab into legacy (#1289)
Summary: allow-large-files to commit wikitext103_vocab.pt
Reviewed By: cpuhrsch
Differential Revision: D28478152
fbshipit-source-id: c2a871439f054024b95c05f7664a84028aacaca3
* Import torchtext #1313 36e33e2
Summary: Importing from Github
Reviewed By: cpuhrsch
Differential Revision: D28572929
fbshipit-source-id: 2e7b00aadeda6ab0596ef23295f41c5b0fa246e7
* Adding API usage logging
Summary: Adding API usage logging for Vocab module
Reviewed By: colin2328
Differential Revision: D28585537
fbshipit-source-id: 38975b523fb597412fbcb18ef831bfb4834cb420
* Import torchtext #1314 99557efd98dd0e74346975d75183dd8aa32eb37e
Reviewed By: parmeet
Differential Revision: D28683381
fbshipit-source-id: 7bfbf445dd512f0ce21c34096cf3f08332d90138
* Import torchtext #1325 57a1df3
Reviewed By: NicolasHug
Differential Revision: D28994054
fbshipit-source-id: 4c679f56ef37b18f6d2acaaaed8518facbeaa41c
* Import torchtext #1328 ca514f6
Summary: Import torchtext #1328 ca514f6
Reviewed By: NicolasHug
Differential Revision: D29120370
fbshipit-source-id: 229586f3470bd61bfb2f6a390d79e45d4eae3b4d
* up the priority of numpy array comparisons in self.assertEqual (#59067) (#1340)
* Re-sync with internal repository (#1343)
* up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067
Reviewed By: jbschlosser
Differential Revision: D28986642
Pulled By: heitorschueroff
fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
* Import torchtext #1300 0435df13924fd4582d67e5b17bc09f6ded18be8b
Summary: Import torchtext #1300 0435df13924fd4582d67e5b17bc09f6ded18be8b
Reviewed By: parmeet
Differential Revision: D29371832
fbshipit-source-id: 624280ddfa787a4e7628e60fa673cb9df0a66641
* Import torchtext #1345 8cf471c
Summary: Import from github
Reviewed By: hudeven
Differential Revision: D29441995
fbshipit-source-id: 27731ce2714c16180d11bfb26af5d5a2dba408b1
* Import torchtext #1352 7ab50af
Summary: Import from github
Reviewed By: NicolasHug
Differential Revision: D29537684
fbshipit-source-id: 25b1fc1e6d9f930e83f5f2939788b90b083aeaa2
* Enabling torchtext datasets access via manifold and iopath
Summary:
We would like to add and access torchtext datasets on manifold. This diff unifies dataset downloads from external links and from manifold for internal access. This is enabled via the iopath package.
The main idea is to plug the download hooks into the download_from_url function. The download hooks delegate the download to the appropriate path handler. In OSS we have enabled downloads via https and Google Drive; internally, we replace the download hook to download data from manifold.
We have created a _download_hooks.py file under the /fb/ folder which replaces the corresponding file in OSS. The file under /fb/ converts http/https URL paths into the corresponding manifold paths and downloads the data from there.
Reviewed By: hudeven
Differential Revision: D28892389
fbshipit-source-id: 3b66544dd2345075e2e7c524f344db04aa2a24e3
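The hook mechanism described above can be sketched as a small registry where internal builds swap in a different handler for the same URL scheme (names and handlers here are illustrative, not the actual torchtext API):

```python
_DOWNLOAD_HOOKS = {}

def register_hook(scheme, fn):
    # internal builds re-register the same scheme with a manifold handler
    _DOWNLOAD_HOOKS[scheme] = fn

def download_from_url(url):
    # delegate the download to whichever handler owns the URL scheme
    scheme = url.split("://", 1)[0]
    return _DOWNLOAD_HOOKS[scheme](url)

register_hook("https", lambda url: "https-handler:" + url)
oss_result = download_from_url("https://example.com/data.zip")

# swapping the hook re-routes the same call without touching call sites
register_hook("https", lambda url: "manifold-handler:" + url)
internal_result = download_from_url("https://example.com/data.zip")
```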
* Import torchtext #1361 05cb992
Summary: Import from github
Reviewed By: hudeven
Differential Revision: D29856211
fbshipit-source-id: 6332f9bdf3cf4eef572c5423db15101ea904d825
* Import torchtext #1365 c57b1fb
Summary: Import torchtext #1365 c57b1fb
Reviewed By: parmeet
Differential Revision: D29940816
fbshipit-source-id: 6b2495b550a7e6b6110b0df12de51a87b0d31c1c
* Moving Roberta building blocks to torchtext
Summary: This is the first step in moving Roberta Model from pytext_lib into PyTorch Text Library. Here we moved the Roberta building blocks into pytorch/text/fb/nn/modules. The code-base is organized according to WIP document https://docs.google.com/document/d/1c0Fs-v97pndLrT3bdfGRGeUeEC38UcDpibvgOXkbS-g/edit#heading=h.3ybcf0ic42yp
Reviewed By: hudeven
Differential Revision: D29671800
fbshipit-source-id: d01daa99e0a5463716660722381db9a0eeb083f8
* Enabling torchtext availability in @mode/opt
Summary:
More details on context and solution: D29973934
Note that in this implementation, we rely on overriding the behavior of the _init_extension() function. This is in a similar spirit to how we override the behavior of the download hooks to accommodate the changes needed to enable this functionality in fbcode.
Reviewed By: mthrok
Differential Revision: D30494836
fbshipit-source-id: b2b015263fa1bca2ef4d4214909e469df3fbe327
* Import torchtext #1382 aa12e9a
Summary: Import torchtext #1382 aa12e9a
Reviewed By: parmeet
Differential Revision: D30584905
fbshipit-source-id: fba23cd19f31fc7826114dd2eb402c8f7b0553df
* Simplify cpp extension initialization process
Summary: Simplifying the cpp extension initialization process by following torchaudio's implementation in D30633316
Reviewed By: mthrok
Differential Revision: D30652618
fbshipit-source-id: f80ac150fa50b1edc22419b21412f64e77064c5d
* fixed bug with incorrect variable name in dataset_utils.py
Summary:
- ValueError was outputting `fn` instead of `func`
- Similar fix done in torchdata https://github.com/facebookexternal/torchdata/pull/167
Reviewed By: ejguan
Differential Revision: D31149667
fbshipit-source-id: 2c1228287d513895f8359cb97935252f0087d738
* Import torchtext #1410 0930843
Summary: Import latest from github
Reviewed By: Nayef211
Differential Revision: D31745899
fbshipit-source-id: e4ac5c337bcbd1a8809544add7679dd3da242999
* Import torchtext #1406 1fb2aed
Summary: Import latest from github
Reviewed By: Nayef211
Differential Revision: D31762288
fbshipit-source-id: f439e04f903d640027660cb969d6d9e00e7ed4a0
* Import from github 10/18/21
Summary: Syncing torchtext github main branch to fbcode
Reviewed By: parmeet
Differential Revision: D31841825
fbshipit-source-id: 9c1a05295e6557ff411e56eb719cb439d5c424ba
* Import torchtext #1420 0153ead
Summary: Import latest from github
Reviewed By: Nayef211
Differential Revision: D31871772
fbshipit-source-id: 989f5a453ef7680592df27e4174f465d11a2fbf8
* Import torchtext #1421 bcc1455
Summary: Syncing torchtext github main branch to fbcode
Reviewed By: parmeet
Differential Revision: D31873514
fbshipit-source-id: 1a964a67ce7ee73f5acf3a1e3f8118028c2dd46e
* Enable OSS torchtext XLMR Base/Large model on fbcode
Summary:
Enable access to open-source torchtext XLMR base/large implementation by:
1) Uploading models/transform weights on manifold
2) Patching public URL with manifold URL (similar to what we have for datasets)
Note that we didn't enable model tests since it takes relatively long to download the huge model weights from manifold. We will rely on open-source signals when making changes to the model implementation, and we need to ensure that any update to the weights on the AWS cloud is also replicated on manifold.
Reviewed By: hudeven
Differential Revision: D31844166
fbshipit-source-id: 62a4e9a3a8580ab93c3beb3af69be7361f1cc937
* enabling SST2 dataset usage in fbcode
Summary:
Enable access to open-source torchtext SST2 dataset by:
- Uploading SST2 dataset on manifold
- Swapping public URL with manifold URL in fbcode by implementing a dummy `HTTPReader` wrapper class
- The wrapper class does URL mapping and calls `IoPathFileLoaderDataPipe` on the manifold URL
- Enabled SST2Dataset unit tests within fbcode
Reviewed By: parmeet
Differential Revision: D31876606
fbshipit-source-id: fdde14a67cce835da216b296e1a0024e1d1fc7a9
* Import torchtext #1426 4be2792
Summary: Import from github
Reviewed By: Nayef211
Differential Revision: D31962042
fbshipit-source-id: 0308ae0cfe402e8c3eb133cb5a205b65f98ad1df
* Import torchtext #1428 b962c51
Summary: Import latest from github
Reviewed By: Nayef211
Differential Revision: D32006262
fbshipit-source-id: 2d7766104e1116f14f20fa1031178c2143b5e78b
* Import torchtext #1430 4cf19ed
Summary: Import latest from github
Reviewed By: Nayef211
Differential Revision: D32140599
fbshipit-source-id: 3a2902febd5e5024d833699e05e0256b1ae0cae2
* Allow inferred scaling in MultiheadSelfAttention for head_dim != 64
Summary:
Rather than raise an exception whenever head_dim != 64, we can just infer the scaling value, emit a warning, and continue.
Also add an assertion in case embed_dim is not a multiple of num_heads (in which case forward will break).
Reviewed By: parmeet
Differential Revision: D32193989
fbshipit-source-id: 30f68c55f3ec37932252c77c355ae55b8bf34ded
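The inferred-scaling logic amounts to the standard scaled-dot-product factor `head_dim ** -0.5`, plus the divisibility assertion; a minimal sketch (function name and warning text are illustrative, not the actual torchtext code):

```python
import warnings

def infer_scaling(embed_dim: int, num_heads: int, expected_head_dim: int = 64) -> float:
    # forward breaks when embed_dim is not a multiple of num_heads,
    # so assert instead of silently producing a wrong head_dim
    assert embed_dim % num_heads == 0, "embed_dim must be a multiple of num_heads"
    head_dim = embed_dim // num_heads
    if head_dim != expected_head_dim:
        # warn rather than raise, and infer the scaling anyway
        warnings.warn(f"head_dim is {head_dim}, not {expected_head_dim}; inferring scaling")
    # standard scaled dot-product attention scaling factor
    return head_dim ** -0.5
```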
* Updated sst2 dataset to accept `validate_hash` parameter
Summary:
## Description
- Updated sst2 dataset to accept a `validate_hash` parameter
- This allows for testing using partial datasets since downloading the entire dataset takes much longer
Reviewed By: parmeet
Differential Revision: D32250435
fbshipit-source-id: 9b5e7183f62df69638e1a3af2107273daa6f4ac5
* Import torchtext #1431 ba20fc5
Summary: Import latest from github
Reviewed By: Nayef211
Differential Revision: D32282533
fbshipit-source-id: 8318cd8b8360dec1febdde0bc48388e6b2f2d768
* Fixed file filtering bug in SST2 dataset
Summary:
- Removed copying partial SST2 asset file to a temp dir and instead directly working with the file from the asset folder
- Fixed bug with path names affecting how files were filtered out from the zip file
- For example, if the value of `split` is "test", the following snippet of code `filter(lambda x: split in x[0])` might match all of the "train", "test", and "dev" files depending on the location of the dataset asset file
- When testing with buck, the location of the extracted files could look something like `/data/users/nayef211/fbsource/fbcode/buck-out/dev/gen/pytorch/text/test/experimental_test_datasets#binary,link-tree/test/asset/SST2/SST-2.zip/train.tsv`. Since the word "test" is contained in this path string, the filtering logic would incorrectly select the "train" file even though what we want is the "test" file
- To resolve this we append the file extension (in this case ".tsv") to the `split` variable in the filtering logic
Reviewed By: parmeet
Differential Revision: D32329831
fbshipit-source-id: dbb4803a04f6cd50fab3f7ce5530d3258b2db012
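The filtering bug and its fix can be reproduced with a toy example (helper names are hypothetical; only the filter expressions mirror the description above):

```python
def select_member(names, split):
    # buggy: bare substring match over the full path can pick the wrong
    # file when the path itself contains the split name (e.g. ".../test/asset/...")
    return next(n for n in names if split in n)

def select_member_fixed(names, split, ext=".tsv"):
    # fixed: match the file name "<split>.tsv" so path components cannot collide
    return next(n for n in names if n.endswith(split + ext))

names = [
    "/buck-out/test/asset/SST2/SST-2.zip/train.tsv",
    "/buck-out/test/asset/SST2/SST-2.zip/test.tsv",
]
```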
* Squashed commits (9314b44 to e691934)
Summary:
Trying out a new way to import changes :). The main reason for this deviation is to find a way to skip commit ID(s) that are currently blocking another important PR from landing on fbcode.
Used following command to sync changes from github to fbcode:
```python pytorch/import.py --github_username parmeet --project_name text --commit_from ba20fc525a8a46d3056eeb421a44b9bdb1a90182 --commit_to e691934d2779be40ab425056836565840f49d565 --skip_commit_ids 2cebac34ab26577ee02b7295dbe01dccfdb1a88f daf0f6c71d7b764aafd2f1a2a3e7aa37dcc36e53 --squash```
Notes:
- Skipped commit 2cebac3 as it is about removing legacy code; resolving the legacy use-sites is still a work in progress on the internal side (by abhinavarora)
- Skipped commit daf0f6c as it corresponds to syncing changes from fbsync to the main branch
- We have used --squash, but this option can be skipped to get a 1:1 correspondence from PR to Diff ID, like we have in vision
The text below here is auto-generated because I think we used --squash
====
Subject:
Update doc and fix CircleCI doc build issue (#1434)
Body:
commit e691934d2779be40ab425056836565840f49d565
====
Subject:
[CircleCI Windows Failure] Fix the way we join URL pieces to download XLM-R components (#1441)
Body:
commit d4a27a05a85d331d84d3ac527ca5f18ca64d326f
====
Subject:
correct the `_compute_ngram_counter` docstring (#1440)
Body:
commit a26a8ef7f7ad22f9f2ae7af0e52e4c9760ab439d
====
Subject:
fix attention mask testing (#1439)
Body:
commit 778b3e62770c24c4ecde06a6aaba1dee38c07e2e
====
Subject:
[Vocab] Refactor vocab factory method to accept special tokens as a keyword argument (#1436)
Body:
* [Vocab] Refactor vocab factory method to accept special tokens as a keyword argument
commit f298494ad90495e4ad442928665ce6d8e9f9c3c0
====
Subject:
add attention mask to transformer encoder modules (#1435)
Body:
commit 9314b44d2a6cb6f4129e1ac3ac57f92eb054f15d
Reviewed By: Nayef211
Differential Revision: D32431346
fbshipit-source-id: 985e242ce5a733c130e9d5b9549a4a330e948dc7
* Refactor OnDiskCache (#61)
Summary:
Pull Request resolved: https://github.com/pytorch/data/pull/61
Fixes https://github.com/facebookexternal/torchdata/issues/114 and https://github.com/facebookexternal/torchdata/issues/140
* #59
Test Plan: Imported from OSS
Reviewed By: wenleix
Differential Revision: D31734382
Pulled By: ejguan
fbshipit-source-id: 16d10bace2a473e3878ac8dd5f7b6885bd924105
* Add a class method in Model Bundler to facilitate model creation with user-defined configuration and checkpoint (#1442)
Summary:
Import from github
Command used:
`python pytorch/import.py --project_name text --commit_ids 2040d8da87394ab5ecf6ac2bbcd5a00beb940cf4`
Note that we are still not importing the whole repo using import_text.sh. Using import.py will be the workflow we rely on until we merge the [legacy code removal commit](https://github.com/pytorch/text/commit/2cebac34ab26577ee02b7295dbe01dccfdb1a88f) into fbcode.
Reviewed By: Nayef211
Differential Revision: D32603181
fbshipit-source-id: 1f583e5ac96e693b583ae42d5841bf387cf3727a
* Import torchtext from github aea6ad6,#1449 to 9f2fb3f,#1452
Summary:
command:
`python pytorch/import.py --project_name text --commit_ids aea6ad6bf9a6292af3d5051b4862b966871bdcce 9f2fb3f00cd9a4cc8d41d2e9cbfa5e9bf9533224 --squash`
Reviewed By: abhinavarora
Differential Revision: D32690771
fbshipit-source-id: cde616182ecfe643ab48d727b66bbf0194480d3e
* Fix SST2Dataset test iterator
Summary:
## Summary
- Modified SST2 dataset implementation to only return text for test split (since label_ids are not available)
- Updated doc classification datamodule to temporarily use `val_dataset` instead of `test_dataset`
- Updated first line md5 hash for SST2 test split
## Followup Items
- Update doc classification module to work with test splits with and without labels
Reviewed By: parmeet
Differential Revision: D32661112
fbshipit-source-id: ef86aea0ce587c5d5282f2caa943b4b0cdf6f54a
* Fix issue in label Transform
Summary: In the construction of the Vocab within the label transform, the default index is set to 0, and that index is returned when an OOV token is queried. For this transform, the default index should never be set; otherwise it returns the default index (which is 0) for any unknown label that gets passed. Ideally it should throw an error in this case, because we do not know what to do when a wrong label is passed as a query.
Reviewed By: hudeven
Differential Revision: D32610834
fbshipit-source-id: e49385fb313929627c41fc515b6d900a6bfc3591
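The failure mode is easy to see in a plain-Python sketch (a hypothetical vocab-like class, not the real one): with a default index set, a mistyped label silently maps to label 0; without it, the lookup fails loudly.

```python
class LabelVocab:
    """Hypothetical sketch of the lookup behavior described above."""

    def __init__(self, labels, default_index=None):
        self._stoi = {lbl: i for i, lbl in enumerate(labels)}
        self._default_index = default_index

    def __getitem__(self, label):
        if label in self._stoi:
            return self._stoi[label]
        if self._default_index is not None:
            # silently maps any unknown label to the default index
            return self._default_index
        raise KeyError(f"unknown label: {label!r}")

buggy = LabelVocab(["neg", "pos"], default_index=0)   # typo becomes "neg"
fixed = LabelVocab(["neg", "pos"])                    # typo raises KeyError
```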
* Import torchtext #1437 2cebac3
Summary: Imports [#1437](https://github.com/pytorch/text/pull/1437) from OSS Torchtext that removes the legacy folder.
Reviewed By: parmeet
Differential Revision: D32923084
fbshipit-source-id: 83411efd62cd527c518e36279bdbf586435ac9e5
* Import torchtext #1457 d801e99
Summary: Import from github
Reviewed By: abhinavarora, ebsmothers
Differential Revision: D32962989
fbshipit-source-id: 4de93cbc0ebe29034a505c56d03bb8d4b698891c
* Import torchtext #1459 8ef1b15
Summary: Imports torchtext to fbcode
Reviewed By: parmeet
Differential Revision: D33001763
fbshipit-source-id: 0525982a1aadcfed65172c22734a46fdf2bd7bde
* Fixing typing issues DataSet -> DataType
Summary: Forward fix of D31344867
Reviewed By: Nayef211, ejguan
Differential Revision: D33069330
fbshipit-source-id: 1649049a6caf1178a78a25baf21e1b4ecdc44d77
* Import torchtext #1470 52d38e8
Summary: Import from github
Reviewed By: ebsmothers
Differential Revision: D33291837
fbshipit-source-id: 86f8675f13190425617937dcbdd5b698da0bba0f
* Import torchtext #1486 4908d3c
Summary: As title
Reviewed By: Nayef211
Differential Revision: D33434571
fbshipit-source-id: 3cb1d43583fd1e2f28dfd27109a8bf5f1b255d1d
* Import torchtext #1488 2c98927
Summary:
====
Subject:
Switching to use FileOpener from FileLoader (#1488)
Body:
commit 2c989273a6a99eef12d2e3fe25258b27881cb0bf
====
Subject:
add scriptable sequential transform (#1481)
Body:
commit 3849f4648a5021514b6b91fa721b43b63fad8378
Reviewed By: abhinavarora
Differential Revision: D33485781
fbshipit-source-id: 3a7ca597cb2f2be98be29a639ef05a65a3f7b6be
* Update load_state_dict_from_url method to skip download if file is cached
Summary:
- Update load_state_dict_from_url method to skip download if file is cached in the `model_dir` folder
- The `model_dir` parameter was previously unused
- New logic is similar to the [OSS implementation in torchhub](https://pytorch.org/docs/stable/_modules/torch/hub.html#load_state_dict_from_url)
- Update unit test to test skipping download when file is already cached
Reviewed By: abhinavarora
Differential Revision: D33512850
fbshipit-source-id: 2350c6dcad7e5725cf670c99405bcc7d0fb05e42
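A minimal sketch of the caching logic described above, mirroring the OSS torch.hub behavior rather than the exact internal code: download only when the target file is absent from `model_dir`.

```python
import os
import urllib.request

def load_state_dict_from_url(url, model_dir):
    """Return a local path for url, downloading only on a cache miss.

    Illustrative sketch; the real function would torch.load() the file.
    """
    os.makedirs(model_dir, exist_ok=True)
    filename = os.path.basename(url)
    cached_file = os.path.join(model_dir, filename)
    if not os.path.exists(cached_file):  # skip download when already cached
        urllib.request.urlretrieve(url, cached_file)
    return cached_file
```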
* Import torchtext #1482 0e7cf21
Summary:
- Remove xlmr transform class and instead use sequential for model transforms composition
- Modified doc classification recipe to use sequential transform instead of `XLMRobertaModelTransform`
Reviewed By: parmeet
Differential Revision: D33485834
fbshipit-source-id: 01a914112219838620f3ce81cf621665d072ae69
* Import torchtext #1509 1a2fc00
Summary:
====
Subject:
migrate YelpReviewPolarity to datapipes. (#1509)
Body:
* add initial pass at migrating YelpReviewPolarity to datapipes.
* fix flake.
commit 1a2fc00266eb71c202802803c390e00b4082085e
====
Subject:
migrate YelpReviewFull to datapipes. (#1507)
Body:
* add initial pass at migrating YelpReviewFull to datapipes.
* fix flake.
commit 0fb50b45a6b40d63ef8fabc5766a15c78fce6c8e
====
Subject:
migrate YahooAnswers to datapipes. (#1508)
Body:
* add initial pass at migrating YahooAnswers to datapipes.
* fix flake.
commit f99609d7c56d8b742f6b0f281fcd726d05aa4923
====
Subject:
migrate DBPedia to datapipes. (#1500)
Body:
* add initial pass at migrating DBPedia to datapipes.
* add _EXTRACTED_FILES for consistency.
commit 1881705aec892efd45803006fd8b6c845be9965f
====
Subject:
replace funny os.sep joins with os.path.join for consistency. (#1506)
Body:
commit ce4ab8b5c1f22cf533e04f73696b0816f63a4ae5
====
Subject:
migrate AG_NEWS to datapipes. (#1498)
Body:
commit d9fdbc62a7c9b6ed27b47d92edc33d1cf8e9cf9d
====
Subject:
migrate SogouNews to datapipes. (#1503)
Body:
* add initial pass at migrating SogouNews to datapipes.
* make filter for specific split more consistent.
commit e6065a9217a95e71ba47ca0184953627b21ab7ef
====
Subject:
Fix filter logic (#1505)
Body:
commit a415684661ff9d7fb9e2b7f438cc8e70c09781bf
====
Subject:
fix per https://github.com/pytorch/vision/issues/4832#issuecomment-957695788 (#1504)
Body:
commit 8215832272e8d05f27dc5372a5e4382ce6942819
====
Subject:
add initial pass at migrating Amazon Review Full to datapipes. (#1499)
Body:
commit df0ec14a802bb7b85f06c97f564959f988212f80
====
Subject:
Parameterize jit and non-jit model integration tests (#1502)
Body:
* Updated max seq length for truncate in xlmr base. Updated xlmr docs. Moved xlmr tests to integration tests
* Removing changes to truncate transform
* Remove documentation changes from PR
* Parameterized model tests
* Added nested_params helper method. Updated model integration test to parameterize a single method covering jit and non-jit tests
* Added docstring for unit tests
commit d896135e4f5060bbaeb2cc5c3ed43eb15bc8a4c0
====
Subject:
Remove redundant get asset functions from parameterized_utils (#1501)
Body:
commit 0bcab91246b7ca17db048d0ab97a3199b94c05ab
====
Subject:
Parameterized XLMR and Roberta model integration tests (#1496)
Body:
* Updated max seq length for truncate in xlmr base. Updated xlmr docs. Moved xlmr tests to integration tests
* Removing changes to truncate transform
* Remove documentation changes from PR
* Parameterized model tests
commit 2cb80a23412993b7eb9ded082d084eb39c1f0c4e
====
Subject:
Migrating AmazonReviewPolarity to datapipes (#1490)
Body:
commit 826a051dfd9f62731f3b0dee854d0aa687f4da72
====
Subject:
Updated XLMR docs (#1497)
Body:
commit 1a052693509ce32b6fb91302f6ca62546b0afe0d
====
Subject:
fix max sequence length for xlmr transform (#1495)
Body:
commit 776a15daed49f4046b46a2501ea8b63e85bc9da2
====
Subject:
Add pre-trained Roberta encoder for base and large architecture (#1491)
Body:
* Added new roberta encoders and tests
* Added docs for roberta encoder
* Updated truncate length. Added info on how model was trained along with license info
* Added datasets that roberta was trained on
* Removing unnecessary new line
commit 6d9e6df7dee068d99d74355d14c7cb897b199d60
====
Subject:
remove optionality of dtype in `ToTensor` (#1492)
Body:
commit c0e1c38b34ebabf0f12859ee2594194f9c65957a
Reviewed By: abhinavarora
Differential Revision: D33555196
fbshipit-source-id: eca8e38ea61c72a626ec20096f18827cebae4ef7
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: nayef211 <[email protected]>
* Import torchtext #1532 ce1ce9
Summary:
====
Subject:
Add AmazonReviewPolarity Mocked Unit Test (#1532)
Body:
* First attempt at adding test for amazon review polarity
* Updated dataset to take validate_hash param. Finalized tests
* Created non empty tar file
* Remove formatting. Patch _hash_check method from torchdata during testing
* Added super().setUpClass()
* Remove commented import
commit ce1ce99583795207153e13e9bc35a388d368a49d
====
Subject:
migrate Multi30k to datapipes. (#1536)
Body:
commit 627c71f837f6acf7db34b3d96a696624cb4a7087
====
Subject:
add initial pass at migrating UDPOS to datapipes. (#1535)
Body:
commit f685c55e02a43b6489d096f1dd2c05e8be13df63
====
Subject:
Migrate WikiText103 to datapipes (#1518)
Body:
commit 042f12f1be9701fc85129c9be380aec72ed3bc2e
====
Subject:
add double caching for yelp full to speed up extracted reading. (#1529)
Body:
commit d19a77eb69a11a3c9feb74b391e288ed70277bb4
====
Subject:
Migrate WikiText2 to datapipes (#1519)
Body:
* Migrate WikiText2 to datapipes
* Address code review comments and add double caching
commit 437eea8f841fc5efe7dc0f116bbfef781cb88b84
====
Subject:
add double caching for yahoo to speed up extracted reading. (#1528)
Body:
* add double caching for yahoo to speed up extracted reading.
* simplify filepath_fn
* rename dps for consistency.
* add FileOpener within caching block for more consistency.
commit ff78e999f6edb866c33a1464c8288cb90f15c9e4
====
Subject:
add max_tokens kwarg to vocab factory. (#1525)
Body:
commit e1d66cf8ccd2b29378d5f3352b01e4310a36b557
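The `max_tokens` kwarg above can be sketched as follows (an illustrative pure-Python reduction, assuming it caps the vocab at the most frequent tokens after reserving slots for specials; not the actual factory code):

```python
from collections import Counter

def build_vocab(token_iter, specials=("<unk>",), max_tokens=None):
    # Count tokens across all token lists, keep the most frequent ones.
    counts = Counter(t for toks in token_iter for t in toks)
    ordered = [t for t, _ in counts.most_common()]
    if max_tokens is not None:
        ordered = ordered[: max_tokens - len(specials)]
    return {tok: i for i, tok in enumerate(list(specials) + ordered)}

vocab = build_vocab([["a", "b", "a"], ["c", "a", "b"]], max_tokens=3)
# keeps "<unk>" plus the 2 most frequent tokens "a" and "b"; "c" is dropped
```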
====
Subject:
Migrate IMDB to datapipes (#1531)
Body:
* Migrate IMDB to datapipes
* add double cache for extracted reading
* update cache name
commit 03afb7e1e6b6821eb7b479aa11b9c449c251de7a
====
Subject:
add double caching for yelp polarity to speed up extracted reading. (#1530)
Body:
* add double caching for yelp polarity to speed up extracted reading.
* rename dps for consistency and simplify filepath_fn
* add FileOpener within caching block for more consistency.
commit 83aebf495a92761e9683b4af4461ad28ae5c96a7
====
Subject:
Migrating EnWik9 to datapipes #1511 (#1512)
Body:
* Migrating enwik9 dataset to use torchdata
* Added typing to params
* Fixed PR comments. Updated to data_dp
* Added caching for extracted files
* Moved FileOpener after ondiskcache datapipe
commit 12317098cef5822846125e579cd197b217c9e30e
====
Subject:
Migrating PennTreebank to datapipes (#1511)
Body:
* Migrating penntreebank dataset to use torchdata
* Update FileLoader to FileOpener
* Resolved comments about return_path
* Using strip() to remove leading/trailing spaces
commit eb3994567830aeeccfcc1d7053ac6c29400cb593
====
Subject:
Cache extraction for AmazonReviewPolarity (#1527)
Body:
commit 0f7f859e412fba4a31852c1a84801a182e636fde
====
Subject:
migrate CONLL 2000 to datapipes. (#1515)
Body:
commit b52746546c0648122231e4d73bf24175ef949df3
====
Subject:
add initial pass at migrating SQUAD2 (https://github.com/pytorch/text/commit/4be2792101565ddf6dd79d1b7fffb7d55d63bf06) to datapipes. (#1514)
Body:
commit a2ab9741415b2cff026d158a5a54b62b993571d9
====
Subject:
migrate SQUAD1 to datapipes. (#1513)
Body:
commit a5ca19407b844e49679d87c94003e08c5efd6d78
====
Subject:
Attempting to fix version conflict in CI (#1520)
Body:
* since we no longer support python 3.6, we get dataclasses in stdlib for free.
* replace pip-install of packages with conda-install where applicable for better version management of native code.
* make cpuonly a constraint instead of feature
commit a6ae5946e49db2afb2eb8ca5435afaea036077f3
====
Subject:
fixing cache logic to work with datapipes (#1522)
Body:
* fixing cache logic to work with datapipes
* committing temporary change to build cache
* reverting temp change
commit cf668aabf869ae9bdbc5c1259e011f36a1411a2b
====
Subject:
3.6 is EOL (#1521)
Body:
commit 7467bb5971b8ed59a716ba05b82bb1030ed4fbe2
====
Subject:
Fixing dataset test failures due to incorrect caching mode (#1517)
Body:
commit 38ec295c1970776a43b42712b4156d2635ae85c3
====
Subject:
IterDataPipes do not have __next__ (#1516)
Body:
commit 8f153f692ed85229db8e43b14398adae5f58d646
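The #1516 fix above reflects that an IterDataPipe is an *iterable* (it defines `__iter__`), not an iterator (no `__next__`). A minimal analogy, with a hypothetical `Pipe` class standing in for a datapipe:

```python
class Pipe:
    """Iterable, like an IterDataPipe: has __iter__ but no __next__."""
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return iter(self.data)

dp = Pipe([1, 2, 3])
it = iter(dp)      # correct: obtain an iterator first
first = next(it)   # 1
# next(dp) would raise TypeError: 'Pipe' object is not an iterator
```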
Reviewed By: abhinavarora
Differential Revision: D33850546
fbshipit-source-id: 2235caac646eb0fcc14fb638cbbfd4b15f966035
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: nayef211 <[email protected]>
* Import torchtext #1538 d72124c
Summary:
- Import d72124c commit which migrates the SST2 dataset away from experimental
- Modify doc classification recipe to work with new functional dataset implementations
- Make label transform optional since some datasets return integer labels
- Added a `num_labels` field to `DocClassificationTransform` class which will be used to determine `num_classes` for metrics computation
- Update the `SST.zip` testing asset with the correct folder structure
Reviewed By: parmeet
Differential Revision: D33792100
fbshipit-source-id: 4480ef0ba8dabb495f0a2adc45f588413aea5f4d
* Import torchtext from commits e0c5528 to 8808e7e
Summary:
Command used:
`python pytorch/import.py --github_username parmeet --project_name text --commit_from d72124cb710574087d0bce87062ee521e1584167 --commit_to 8808e7eee5a2df79b9566a4a348889dc2722fcfb --skip_commit_ids 7f3ed4b183eb451b439740a59bb849771c707f0c --squash`
Followed by: arc lint to fix new line linter issues
Reviewed By: VirgileHlav
Differential Revision: D34717890
fbshipit-source-id: 7aa0f22421b3f3bfb9684c6e24f7dc606052da5c
* Import torchtext #1635 69f67f3
Summary:
Import latest from github using import_text.sh script
Changed `RobertaModelBundle` to `RobertaBundle`
Reviewed By: Nayef211
Differential Revision: D34718778
fbshipit-source-id: f68fc827c5956ffedc4f5a98175d0724ca431c9d
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D34753031
fbshipit-source-id: 6d8a92b4c2f4b5b85b90edb5b8329e3061411620
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D34815193
fbshipit-source-id: 1865de76d8b133f56e4060961c1173097efac575
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D34857180
fbshipit-source-id: 1c483f2277c902271a2ff75f0c36bf5de8bbba34
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D34920364
fbshipit-source-id: c03dfd98da4b66dc63e5e4dfd11af449bc95ce85
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D35074232
fbshipit-source-id: 1772fbf171665894ab8945d967011873aa7f626e
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D35309109
fbshipit-source-id: be74f1c2739fbfb6e43cc6649839e647c37de4c8
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D35392538
fbshipit-source-id: 02ca5e81ec7ca2d607c1eef3b32ddf4a51c279c8
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D35425316
fbshipit-source-id: 815c3d048440211d2107b8605830530db609efe0
* torchx integration
Summary: Integrate with torchx to run the training locally or in a flow
Reviewed By: Nayef211
Differential Revision: D35412165
fbshipit-source-id: 297bea540ace67d93965e0982d6c8f8ff5d03208
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D35773883
fbshipit-source-id: ab1787498d2169b4345f5981c21eb6b898fa8f2e
* BetterTransformer support for torchtext (#1690)
Summary:
Pull Request resolved: https://github.com/pytorch/text/pull/1690
This diff creates a fast path using BetterTransformer (torch.nn.TransformerEncoderLayer), with a converter from the existing torchtext transformer encoder layer to BetterTransformer.
The related tests are added in the following diff.
Reviewed By: parmeet
Differential Revision: D35948440
fbshipit-source-id: e69e12f2dd28edfea3176a10ee3d7d321d50c897
* Kill to_better by having native load_from_state_dict and init
Summary: Fully remove the to_better method by rebuilding torchtext TransformerEncoderLayer's load_from_state_dict and init. No more redundant params.
Reviewed By: parmeet
Differential Revision: D36020184
fbshipit-source-id: ccdd6da853a86034762b235cd7d5f793876d16c6
* Remove unneeded modules after using nn.Module for BetterTransformer (#1693)
Summary:
Pull Request resolved: https://github.com/pytorch/text/pull/1693
Remove unneeded modules after using nn.Module for BetterTransformer
Reviewed By: zrphercule
Differential Revision: D36038830
fbshipit-source-id: 1e0f5c7cf81096cf66cc1afcf15b5e0645c3da03
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D36034077
fbshipit-source-id: 40c12ec37992d71c4857f92bc5e2ed939e2d6030
* Replace TransformerEncoder in torchtext with better transformer (#34)
Summary:
X-link: https://github.com/facebookresearch/multimodal/pull/34
Pull Request resolved: https://github.com/pytorch/text/pull/1700
Replace the usage of TransformerEncoder by BetterTransformerEncoder
In theory we should be able to remove torchtext.TransformerEncoderLayer after this diff.
Reviewed By: parmeet
Differential Revision: D36084653
fbshipit-source-id: 64ed3810e809fc1db840e75e2e05783089ff31d2
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D36162313
fbshipit-source-id: ff366f585b4783e903f8388654e71ce635b2a556
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D36307982
fbshipit-source-id: faf90f12012bd962fc5decfd3cf9e117f4b9160a
* Enable model testing in FBCode
Summary:
This diff enables Model testing in FB code
Notes:
1. it only tests XLM-R models (base and large) in integration tests. We need to do a follow-up diff to enable RoBERTa testing since corresponding assets are missing in FBcode.
Edit: Addressed the Roberta model testing in this diff itself
2. parameterized was giving weird, long names to the tests, which was creating an unknown issue when running them in Sandcastle. Removed it for now to get proper names for the tests.
Edit: refactored the test suite since nested_params was creating long names (400+ characters) for test methods due to RobertaBundle objects
Reviewed By: mikekgfb
Differential Revision: D35973306
fbshipit-source-id: 8a50d03466f60c8a4a0fbd5857611e68c92ebf08
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D36340622
fbshipit-source-id: ed6f1994916d5d469198e6d0876387a6363db1ea
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D36448402
fbshipit-source-id: bee15f955a21a730653d72d4aedff7b6122f6ef0
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D36510904
fbshipit-source-id: 1b9b27e62af007e88f76414e936fa08ae1ce7d59
* Import torchtext #1794 a54be1f3a7ac534509ac9c066a1b35127936dd77
Summary:
Manually importing TorchText from github using `./fbcode/pytorch/fb_build/import_text.sh`
In addition to the manual import, this diff also updates the libtorchtext TARGET dependency on utf8proc
Reviewed By: VirgileHlav
Differential Revision: D37250868
fbshipit-source-id: 369d67aa02492f620350eb8b28c00b59dc84f081
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D37171614
fbshipit-source-id: 56fa981bc709f78ac3371a5346b9278730895b82
* Import TorchText from Github
Summary:
Meta:
Import latest TorchText from Github to fbcode. Check fb/LAST_SYNCED_COMMIT_FROM_GITHUB_MAIN for the synced commit hash.
Rules run:
- CodemodTransformerSimpleShell
Config Oncall: [pytorch_text](https://our.intern.facebook.com/intern/oncall3/?shortname=pytorch_text)
CodemodConfig: [CodemodConfigPyTorchTextGithubSync](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/pytorch_text/github_sync/CodemodConfigPyTorchTextGithubSync.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/31525198098541494/
This diff was automatically created with CodemodService.
To learn more about CodemodService, check out the [CodemodService wiki](https://fburl.com/CodemodService).
_____
## Questions / Comments / Feedback?
**[Click here to give feedback about this diff](https://www.internalfb.com/codemod_service/feedback?sandcastle_job_id=31525198098541494).**
* Returning back to author or abandoning this diff will only cause the diff to be regenerated in the future.
* Do **NOT** post in the CodemodService Feedback group about this specific diff.
Reviewed By: Nayef211
Differential Revision: D37374922
fbshipit-source-id: d2cfb5e58fc35b653f00b0d81330fe2337e6e347
* Import TorchText from Github
Summary:
Meta:
Import latest TorchText from Github to fbcode. Check fb/LAST_SYNCED_COMMIT_FROM_GITHUB_MAIN for the synced commit hash.
Rules run:
- CodemodTransformerSimpleShell
Config Oncall: [pytorch_text](https://our.intern.facebook.com/intern/oncall3/?shortname=pytorch_text)
CodemodConfig: [CodemodConfigPyTorchTextGithubSync](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/pytorch_text/github_sync/CodemodConfigPyTorchTextGithubSync.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/709158032/
Reviewed By: parmeet
Differential Revision: D37411197
fbshipit-source-id: 8eeb460843eacfd0f3d970062b3e0e393d5eef6f
* Import TorchText from Github
Summary:
Meta:
Import latest TorchText from Github to fbcode. Check fb/LAST_SYNCED_COMMIT_FROM_GITHUB_MAIN for the synced commit hash.
Rules run:
- CodemodTransformerSimpleShell
Config Oncall: [pytorch_text](https://our.intern.facebook.com/intern/oncall3/?shortname=pytorch_text)
CodemodConfig: [CodemodConfigPyTorchTextGithubSync](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/pytorch_text/github_sync/CodemodConfigPyTorchTextGithubSync.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/711752278/
Reviewed By: Nayef211
Differential Revision: D37483835
fbshipit-source-id: b4ad3c43ece7c83c57617e6a5851fff3ecdf8e51
* Adding TARGETS file for torchtext benchmarks
Summary:
### Summary
- Enable benchmarking of torcharrow ops within torchtext
### Benchmark Results
- Benchmarking in fbcode devserver
```
torchtext GPT2BPE tokenizer: 65.811
torchtext vocab: 2.226
torchtext add tokens operation (string): 0.722
torchtext add tokens operation (int): 0.598
torcharrow GPT2BPE tokenizer: 65.739
torcharrow vocab: 1.253
torcharrow add tokens operation (string): 14.335
torcharrow add tokens operation (int): 0.229
```
Benchmarking on Apple MBP (results can also be found in [text#1801](https://github.com/pytorch/text/pull/1801) and [text#1807](https://github.com/pytorch/text/pull/1807))
```
torchtext GPT2BPE tokenizer: 3.13
torchtext vocab: 0.32
torchtext add tokens operation (string): 0.382
torchtext add tokens operation (int): 0.431
torcharrow GPT2BPE tokenizer: 59.13
torcharrow vocab: 0.03
torcharrow add tokens operation (string): 3.652
torcharrow add tokens operation (int): 0.075
```
### Takeaways
- GPT2BPE for torchtext is significantly faster on MBP than devserver
- AddTokens (str) for torcharrow is still significantly slower on both MBP and devserver than the torchtext counterpart
Reviewed By: parmeet
Differential Revision: D37463862
fbshipit-source-id: 1fb538338367bac2b002c1a4b8f128b0b2847bf5
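Per-op timings like the ones above could be collected with a harness along these lines (an assumption-laden sketch: the units and workload of the internal benchmark are not stated, and this is not the actual TARGETS benchmark code):

```python
import time

def benchmark(fn, *args, repeat=3):
    """Return the best wall-clock time over `repeat` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Example with a stdlib function standing in for a tokenizer op.
elapsed = benchmark(sorted, list(range(1000)))
```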
* Import TorchText from Github
Summary:
Meta:
Import latest TorchText from Github to fbcode. Check fb/LAST_SYNCED_COMMIT_FROM_GITHUB_MAIN for the synced commit hash.
Rules run:
- CodemodTransformerSimpleShell
Config Oncall: [pytorch_text](https://our.intern.facebook.com/intern/oncall3/?shortname=pytorch_text)
CodemodConfig: [CodemodConfigPyTorchTextGithubSync](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/pytorch_text/github_sync/CodemodConfigPyTorchTextGithubSync.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/13510799591955868/
Reviewed By: abhinavarora
Differential Revision: D37514618
fbshipit-source-id: efc3b56b6da2afdc601b3dc706c58d0222d0daf6
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D37642224
fbshipit-source-id: 674d2fdfa57bc2131bed136986d385194416f0bb
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D37680190
fbshipit-source-id: b06341b9989bdcb0859ad84838860f05ef2e501f
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D37711064
fbshipit-source-id: 3646b536af2359b776e6a49b9c86f6657c0f1a4c
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D37879352
fbshipit-source-id: 53b04c4b41a3c7e8077842c39a331144eab76208
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D37952995
fbshipit-source-id: 09c492ac8d1333283bb4366c9ae0c6b95b98a87c
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D38110070
fbshipit-source-id: 824a1a2d7a4cb97a69b3bcfd39167ac039edd1b5
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D38146055
fbshipit-source-id: 1b232be8ce396189a123139ac8456433d12d2316
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D38269840
fbshipit-source-id: 901e5279e8e0265fabd48aca861a43d2e4c45dee
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D38351452
fbshipit-source-id: 2439d74bc9ab3f477876f35f549caec9117711bd
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D38381535
fbshipit-source-id: ba50c1a33fda33c4ccc8157702f32b94d415197f
* Import TorchText from Github
Reviewed By: parmeet
Differential Revision: D38419656
fbshipit-source-id: 871439658ed673910c68c025be471501b9b4670a
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D38534440
fbshipit-source-id: 3bf1a7d5cc2daa8d14e424d16509b2df998549b8
* Import TorchText from Github
Reviewed By: Nayef211
Differential Revision: D38655164
fbshipit-source-id: 0b9364fb759520c6fb60147fd0ab1044c362d588
* Import torchtext #1879 72966f0
Summary: ran the `import_text.sh` command to manually update the internal fbcode to match the Github torchtext repo
Reviewed By: Nayef211
Differential Revision: D38796445
fbshipit-source-id: 904143c404141bb016a5f83fbc53906b1c6e1246
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D38907288
fbshipit-source-id: f82ad8121bce924ad6068767845e5ea29dd24bef
* Remove dependency on the torch::jit::script::Module for mobile builds
Summary: Resolves linkage errors: when the vocab is built for the "mobile" version, symbols for torch::jit::script::Module cannot be resolved.
Reviewed By: Nayef211
Differential Revision: D38771271
fbshipit-source-id: 693b656f2a17af9fa5a7a1904742557f902edb55
* Replace `pytext_lib`'s `MaskTransform` with new one from `torchtext`
Summary: Replace instances of `pytext_lib`'s `MaskTransform` with new one from `torchtext` that was merged in https://github.com/pytorch/text/pull/1882
Reviewed By: Nayef211
Differential Revision: D39058074
fbshipit-source-id: f61499d88eec7eccda659279786528bac7edf9d0
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D39095295
fbshipit-source-id: 2e447db46b71fc152f2f53b281585650682cb696
* move PATH_MANAGER to OSS
Summary:
## Problem:
pytext got "No module named 'pytorch'" in issue https://github.com/facebookresearch/pytext/issues/1706
This is because `from pytorch.text.fb.utils import PATH_MANAGER` is internal-only but imported in pytext. Actually, `pytorch/text/fb/utils/__init__.py` should be open sourced.
## Solution:
This diff moved it to OSS as `from torchtext.utils import PATH_MANAGER` and updated all the references
Reviewed By: Nayef211
Differential Revision: D39292896
fbshipit-source-id: c0046d62e64145b60ad9a5298b366f0f1a348369
* Turn off mask checking for torchtext which is known to have a legal mask (#1896)
Summary:
Pull Request resolved: https://github.com/pytorch/text/pull/1896
Turn off mask checking for torchtext which is known to have a legal mask
Reviewed By: zrphercule
Differential Revision: D39445703
fbshipit-source-id: 3f0cacfd39ea11a16c7a06f339872554333b5e97
* Back out "move PATH_MANAGER to OSS" (#1724)
Summary:
X-link: https://github.com/facebookresearch/pytext/pull/1724
Original commit changeset: c0046d62e641
Original Phabricator Diff: D39292896
torchtext can't depend on iopath as raised in https://github.com/pytorch/text/pull/1905
Reviewed By: Nayef211
Differential Revision: D39639475
fbshipit-source-id: 69a48eb3820d0642b0a56712e160a0af589e4c7c
* Import TorchText from Github
Summary: Manually import latest changes from github to fbcode
Reviewed By: joecummings
Differential Revision: D39770284
fbshipit-source-id: 1e442f222d582c43a2ca9280d93eca4135d2df09
* Import TorchText from Github
Reviewed By: rshraga
Differential Revision: D39811057
fbshipit-source-id: 33cce346ac3d226a2fff6c162c39164837f34d87
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D40225047
fbshipit-source-id: 7abff009d65d713a6ce134fc88cd1955f62e3e3d
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D40294258
fbshipit-source-id: b3e14d9e78e346c294f1bc65ba3045b92251e034
* Add Character Level BPE Tokenizer (#1936)
Summary:
Pull Request resolved: https://github.com/pytorch/text/pull/1936
This change adds a character-level BPE tokenizer to the set of available transforms. It takes a pre-trained encoder dict (i.e., a vocab dict) and a merge list as input. It is not using C++ for encoding / decoding at this time.
Reviewed By: langong347
Differential Revision: D40186470
fbshipit-source-id: 48bacc631f537e941a495e39ef9ccb17d3ef7896
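The merge-list mechanics behind a character-level BPE encoder can be sketched as follows (illustrative only; the actual transform consumes a pre-trained encoder dict and merge list, and its API may differ):

```python
def bpe_encode(word, merges):
    """Greedily apply BPE merges to a word's characters.

    merges: list of symbol pairs in priority order,
    e.g. [("l", "o"), ("lo", "w")].
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule left
        i = pairs.index(best)
        symbols[i : i + 2] = [best[0] + best[1]]
    return symbols

print(bpe_encode("low", [("l", "o"), ("lo", "w")]))  # ['low']
```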
* Add padding_masks and tests for T5Model (#1935)
Summary:
Pull Request resolved: https://github.com/pytorch/text/pull/1935
Added the following parameters to the `forward` method of the T5Model:
* `encoder_padding_mask`
* `decoder_padding_mask`
These allow users to specifically mask out the padding of input sequences. This matches the implementation of Transformers in PyTorch core.
Reviewed By: Nayef211
Differential Revision: D40252794
fbshipit-source-id: 0e0a17fdc97ae0bbcaa1aef91e9914fd6225456b
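Building such a padding mask from token ids can be sketched as below (pure Python for illustration; the model takes tensors, and the pad id is an assumption). Following the key_padding_mask convention of PyTorch core Transformers, True marks padding positions to be excluded from attention.

```python
PAD_ID = 0  # assumed pad token id for this sketch

def padding_mask(batch_ids):
    """True where a position holds padding, per sequence in the batch."""
    return [[tok == PAD_ID for tok in seq] for seq in batch_ids]

batch = [[101, 7, 42, 0, 0],
         [101, 5, 0, 0, 0]]
mask = padding_mask(batch)
# mask[0] -> [False, False, False, True, True]
```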
* Import TorchText from Github
Reviewed By: abhinavarora
Differential Revision: D40425553
fbshipit-source-id: 268b94d65cff771028c2e2fdf21caa9855d07cef
Co-authored-by: Guanheng Zhang <[email protected]>
Co-authored-by: Christian Puhrsch <[email protected]>
Co-authored-by: cpuhrsch <[email protected]>
Co-authored-by: Moto Hira <[email protected]>
Co-authored-by: George Guanheng Zhang <[email protected]>
Co-authored-by: Stanislau Hlebik <[email protected]>
Co-authored-by: Andres Suarez <[email protected]>
Co-authored-by: Meghan Lele <[email protected]>
Co-authored-by: Brian Hirsh <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>
Co-authored-by: Jeff Hwang <[email protected]>
Co-authored-by: Parmeet Singh Bhatia <[email protected]>
Co-authored-by: Artyom Astafurov <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Heitor Schueroff <[email protected]>
Co-authored-by: Facebook Community Bot <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Vincent Quenneville-Belair <[email protected]>
Co-authored-by: Yao-Yuan Yang <[email protected]>
Co-authored-by: Evan Smothers <[email protected]>
Co-authored-by: Erjia Guan <[email protected]>
Co-authored-by: Abhinav Arora <[email protected]>
Co-authored-by: Vitaly Fedyunin <[email protected]>
Co-authored-by: nayef211 <[email protected]>
Co-authored-by: CodemodService Bot <>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: Rui Zhu <[email protected]>
Co-authored-by: Michael Gschwind <[email protected]>
Co-authored-by:…
1 parent 5eb33ce, commit 31b8aaa
0 files changed (+0, -0)
0 commit comments