Add support for batched decoding api #795

Merged: 11 commits into main from add-support-for-llama-batch on Nov 3, 2023

Conversation

abetlen (Owner) commented Oct 5, 2023

llama.cpp recently moved to a new API that supports batching (both single sequences with multiple outputs and multiple separate streams) as well as streaming. This new API, based on llama_decode, supersedes the now-deprecated llama_eval API. This means the current API should be migrated anyway, regardless of the new features, but we'll see how easy it is to implement along the way.
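
For context, here is a rough sketch of what the new path looks like at the llama.cpp C level (the layer these bindings wrap). This is illustrative only: the batch struct and `llama_batch_init` signature changed more than once around this time, so field names may not match the exact llama.cpp revision used here, and `decode_prompt`, `ctx`, `tokens`, and `n_tokens` are placeholder names, not part of this PR.

```c
#include "llama.h"

// Feed one prompt (a single sequence) through the batched decoder and
// return the logits for its final token. With the old API this would
// have been a llama_eval() call; the batch struct is what allows several
// independent sequences to share a single llama_decode() call instead.
static float * decode_prompt(struct llama_context * ctx,
                             const llama_token * tokens, int n_tokens) {
    // one sequence here; a batched server would pass n_seq_max > 1
    struct llama_batch batch = llama_batch_init(n_tokens, /*embd=*/0, /*n_seq_max=*/1);

    for (int i = 0; i < n_tokens; i++) {
        batch.token[i]     = tokens[i]; // token id
        batch.pos[i]       = i;         // position within this sequence
        batch.n_seq_id[i]  = 1;         // token belongs to one sequence...
        batch.seq_id[i][0] = 0;         // ...sequence id 0
        batch.logits[i]    = 0;         // skip logits we don't need
    }
    batch.logits[n_tokens - 1] = 1;     // request logits for the last token only
    batch.n_tokens = n_tokens;

    float * logits = NULL;
    if (llama_decode(ctx, batch) == 0) {
        logits = llama_get_logits_ith(ctx, n_tokens - 1);
    }

    llama_batch_free(batch);
    return logits; // points into the context, still valid after freeing the batch
}
```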

flexorRegev commented

Is there a way to help with this one?

zpzheng commented Oct 26, 2023

Is this feature live yet? Why can't I run batch tasks locally?

abetlen marked this pull request as ready for review on November 3, 2023 at 00:12
abetlen merged commit ab028cb into main on Nov 3, 2023
abetlen deleted the add-support-for-llama-batch branch on November 14, 2023