[llm] Support different shape of input_pos #11869

Open

larryliu0820 wants to merge 9 commits into gh/larryliu0820/67/base from gh/larryliu0820/67/head

+354 −59

Contributor

larryliu0820 commented Jun 24, 2025 (edited)

Stack from ghstack (oldest at bottom):

For Hugging Face models, `forward()` takes `tokens` as well as `cache_positions`, a list of cache indices. This differs from the .pte files produced by `export_llama`, whose `forward()` takes `tokens` and `input_pos`, where `input_pos` is a scalar tensor.
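To make the shape difference concrete, here is a small C++ sketch for a chunk of `seq_len` tokens starting at cache position `start_pos`. `make_input_pos` and `make_cache_positions` are hypothetical helpers, not ExecuTorch APIs. The sketch assumes that an `export_llama` model derives the per-token positions from the single starting index, while a Hugging Face model expects one cache index per token.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not ExecuTorch APIs.

// export_llama-style position input: a single value holding the cache
// position of the first token in the chunk (assumption: the model derives
// the remaining positions itself).
std::vector<int64_t> make_input_pos(int64_t start_pos) {
  return {start_pos};
}

// Hugging Face-style cache_positions: one cache index per token in the
// chunk, i.e. the values start_pos, start_pos + 1, ..., start_pos + seq_len - 1.
std::vector<int64_t> make_cache_positions(int64_t start_pos, int64_t seq_len) {
  assert(start_pos >= 0 && seq_len > 0);
  std::vector<int64_t> positions;
  positions.reserve(static_cast<std::size_t>(seq_len));
  for (int64_t i = 0; i < seq_len; ++i) {
    positions.push_back(start_pos + i);
  }
  return positions;
}
```

For example, prefilling 3 tokens at position 5 gives `{5}` for `input_pos` but `{5, 6, 7}` for `cache_positions`. During single-token decoding the two forms coincide.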

This PR adds support in `text_decoder_runner.cpp` for both shapes of `input_pos`/`cache_positions`.

To keep the logic generic without relying on extra metadata, the runner inspects the method meta and input tensor info to decide whether to feed `input_pos` or `cache_position`.
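The PR describes choosing between the two forms by inspecting method meta and input tensor info instead of extra metadata. The sketch that follows shows one plausible rule under stated assumptions: if the position input's declared size has exactly one element, feed a scalar start position; otherwise feed one index per token. `build_position_input`, `numel`, and `meta_sizes` are hypothetical names standing in for the sizes reported by the input tensor metadata; the actual logic in `text_decoder_runner.cpp` may differ (for example, it might also check rank or dtype).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical sketch, not the code in text_decoder_runner.cpp.
// `meta_sizes` stands in for the sizes the method's input tensor metadata
// reports for the position input (for a dynamic dimension, its upper bound).

// Number of elements implied by a list of sizes; an empty list (0-D tensor)
// has one element.
int64_t numel(const std::vector<int64_t>& sizes) {
  return std::accumulate(sizes.begin(), sizes.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

std::vector<int64_t> build_position_input(const std::vector<int64_t>& meta_sizes,
                                          int64_t start_pos, int64_t seq_len) {
  assert(start_pos >= 0 && seq_len > 0);
  if (numel(meta_sizes) == 1) {
    // One-element input: export_llama-style scalar input_pos.
    return {start_pos};
  }
  // Multi-element input: Hugging Face-style cache positions, one per token.
  assert(seq_len <= numel(meta_sizes));
  std::vector<int64_t> positions(static_cast<std::size_t>(seq_len));
  std::iota(positions.begin(), positions.end(), start_pos);
  return positions;
}
```

With this rule, a model whose position input is declared with size `{1}` receives `{start_pos}` regardless of chunk length, while a model whose `cache_position` input has an upper bound of, say, 128 receives one index per token. Because the decision comes from the exported method itself, no extra metadata has to be written into the .pte file.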

Differential Revision: D77203700


          [llm] Support different shape of input_pos

58e2792

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 requested review from kirklandsign, JacobSzwejbka, lucylq, swolchok, jackzhxng and mergennachin as code owners

June 24, 2025 05:07

larryliu0820 mentioned this pull request

[llm] Add arange() tensor maker API #11861

Open

pytorch-bot bot commented Jun 24, 2025 (edited)
Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11869

✅ No Failures

As of commit 9481d79 with merge base 222d9e3 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

84ef77a

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

ghstack-source-id: 292265667
Pull Request resolved: #11869

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Jun 24, 2025

This pull request was exported from Phabricator. Differential Revision: D77203700

facebook-github-bot added the fb-exported label

larryliu0820 added the release notes: llm label


          Update on "[llm] Support different shape of input_pos"

1ac0a14

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 requested a review from manuelcandales as a code owner

June 24, 2025 22:35

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

db8a437

Pull Request resolved: #11869

ghstack-source-id: 292466463
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


          Update on "[llm] Support different shape of input_pos"

e1744fe

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

2189e3d

Pull Request resolved: #11869

ghstack-source-id: 292469061
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

kimishpatel approved these changes

guangy10 approved these changes


          Update on "[llm] Support different shape of input_pos"

2272fc5

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

bb6bddb

Pull Request resolved: #11869

ghstack-source-id: 292517675
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


          Update on "[llm] Support different shape of input_pos"

9a698d7

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

cfba98c

Pull Request resolved: #11869

ghstack-source-id: 292523573
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


          Update on "[llm] Support different shape of input_pos"

6f07be3

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

0a46fd5

Pull Request resolved: #11869

ghstack-source-id: 292529628
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


          Update on "[llm] Support different shape of input_pos"

3b95ef7

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

b792fc9

Pull Request resolved: #11869

ghstack-source-id: 292529864
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


          Update on "[llm] Support different shape of input_pos"

64aab38

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

174bf10

Pull Request resolved: #11869

ghstack-source-id: 292546578
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


          Update on "[llm] Support different shape of input_pos"

9481d79

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]

larryliu0820 added a commit that referenced this pull request


          [llm] Support different shape of input_pos

6ae4290

Pull Request resolved: #11869

ghstack-source-id: 292560636
@exported-using-ghexport

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)


Reviewers

kimishpatel approved these changes

guangy10 approved these changes

kirklandsign: awaiting requested review (code owner)

JacobSzwejbka: awaiting requested review

lucylq: awaiting requested review (code owner)

swolchok: awaiting requested review (code owner)

jackzhxng: awaiting requested review (code owner)

mergennachin: awaiting requested review (code owner)

manuelcandales: awaiting requested review (code owner)

Labels

CLA Signed, fb-exported, release notes: llm