Add support for Scriptable BERT tokenizer #1707

Merged on May 25, 2022 (39 commits)
Commits
873fc43
add classes
parmeet Apr 18, 2022
914cf71
Merge branch 'main' of github.com:pytorch/text into bert_tokenizer
parmeet Apr 24, 2022
82ac4d2
Merge branch 'main' of github.com:pytorch/text into bert_tokenizer
parmeet Apr 27, 2022
f17cc21
add basic functions
parmeet Apr 27, 2022
06ec41b
Merge branch 'main' of github.com:pytorch/text into bert_tokenizer
parmeet May 4, 2022
4276030
minor updates
parmeet May 4, 2022
e39826e
minor update
parmeet May 4, 2022
aaad788
added submodule
parmeet May 5, 2022
4f892e1
initial run
parmeet May 6, 2022
ef1b1f7
added pybinded transform and test structure
parmeet May 6, 2022
9414cda
fixed few bugs
parmeet May 8, 2022
ae6206b
fixed _is_control
parmeet May 9, 2022
aa543af
using python strip and removing it from C++
parmeet May 9, 2022
67c452f
fix UString type
parmeet May 10, 2022
d442d86
partially add code for scripting
parmeet May 10, 2022
434c002
add support for scripting
parmeet May 10, 2022
74f1231
minor edit
parmeet May 10, 2022
e67b513
minor edit
parmeet May 10, 2022
70d0fc8
remove chinese punctuation
parmeet May 11, 2022
3a27236
fix lint
parmeet May 11, 2022
1a77b8b
fix lint
parmeet May 11, 2022
a0caeb1
adding to_lower option
parmeet May 11, 2022
5cf80a1
Revert "adding to_lower option"
parmeet May 11, 2022
d0b4e7d
add to_lower option, need to fix unit test
parmeet May 11, 2022
653515c
update to_lower
parmeet May 12, 2022
806d67e
fix upper case tests
parmeet May 12, 2022
fc0a608
modify test suit
parmeet May 12, 2022
c26fb4b
Merge branch 'main' of github.com:pytorch/text into bert_tokenizer
parmeet May 16, 2022
2dd48ac
minor edits
parmeet May 16, 2022
9e4c098
fixed linter
parmeet May 16, 2022
ab177ea
undo changes in clip test
parmeet May 16, 2022
fa498e5
add fix for 3332 code point
parmeet May 17, 2022
6de32f1
fix lint
parmeet May 17, 2022
9b2038b
Merge branch 'main' of github.com:pytorch/text into bert_tokenizer
parmeet May 23, 2022
0a64f89
fix doc strings and C++ contructor initializer list
parmeet May 25, 2022
2387924
fix lint
parmeet May 25, 2022
5900bdf
Merge branch 'main' of github.com:pytorch/text into bert_tokenizer
parmeet May 25, 2022
abd81fc
Revert "fix doc strings and C++ contructor initializer list"
parmeet May 25, 2022
d940ecb
re-address comments w.r.t revert
parmeet May 25, 2022
3 changes: 3 additions & 0 deletions .gitmodules
@@ -10,3 +10,6 @@
 	path = third_party/double-conversion
 	url = https://github.com/google/double-conversion
 	ignore = dirty
+[submodule "third_party/utf8proc"]
+	path = third_party/utf8proc
+	url = https://github.com/JuliaStrings/utf8proc