Import feature-extraction inference type from TEI #781
Conversation
looks good from a quick glance
@tomaarsen (or @osanseviero since I know you're working on feature-extraction lately) could I get a review on this PR please? 🙏
I'm not very familiar with this codebase nor TEI, but support for `normalize` and `prompt_name` in particular should be quite useful. I can't speak much on e.g. the keys/formats in the spec files.
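For context, a minimal sketch of what a request using these two parameters could look like against a TEI `/embed` endpoint; the endpoint URL and payload shape below are assumptions pieced together from the parameters discussed in this PR, not taken from the actual client code:

```ts
// Hypothetical request against a locally running TEI server; field names
// (inputs, normalize, prompt_name, truncate) follow the spec discussed in this PR.
const response = await fetch("http://localhost:8080/embed", {
	method: "POST",
	headers: { "Content-Type": "application/json" },
	body: JSON.stringify({
		inputs: "What is the capital of France?",
		normalize: true, // normalize the returned embedding vector
		prompt_name: "query", // prepend the "query" prompt from the model's sentence-transformers config
		truncate: false, // reject inputs that are too long instead of truncating them
	}),
});
const embeddings: number[][] = await response.json(); // one embedding vector per input
```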
"default": "false", | ||
"example": "false", |
This refers to the truncation of inputs, correct? As in, whether inputs get truncated to e.g. 512 tokens? If so, should this not be "true" by default? In Sentence Transformer models, you can often get reduced performance when you exceed the recommended maximum sequence length.
This refers to the truncation of inputs, correct? As in, whether inputs get truncated to e.g. 512 tokens?
Yes
If so, should this not be "true" by default? In Sentence Transformer models, you can often get reduced performance when you exceed the recommended maximum sequence length.
Ping @OlivierDehaene here since it's more of a design choice in TEI
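To illustrate the trade-off being discussed, a hypothetical payload (this is not the spec default itself, just an example of the flag's effect as described above):

```ts
// Hypothetical payload: with truncate set to true, an input longer than the model's
// maximum sequence length (e.g. 512 tokens) is cut down to fit instead of being rejected;
// with truncate set to false the server would return an error for such an input.
const veryLongDocument = "some long text ".repeat(2000); // assume this exceeds the max sequence length
const payload = {
	inputs: veryLongDocument,
	truncate: true,
};
```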
"example": "false", | ||
"nullable": true | ||
}, | ||
"truncation_direction": { |
I understand that this is implemented in TEI, but I don't think I've ever seen left truncation on embedding models (which are normally bidirectional and non-causal). It's fine to be prepared for a "more causal" future, though.
ping @OlivierDehaene on that if you have more context
(I'll keep it here anyway since it's defined by TEI)
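For readers unfamiliar with the parameter, a tiny hypothetical illustration of what the two directions mean (the token values and maximum length are made up):

```ts
// Hypothetical illustration of truncation_direction on a tokenized input
// that exceeds an assumed maximum length of 4 tokens.
const tokens = ["What", "is", "the", "capital", "of", "France"];
const maxLength = 4;

const truncatedRight = tokens.slice(0, maxLength); // ["What", "is", "the", "capital"]
const truncatedLeft = tokens.slice(tokens.length - maxLength); // ["the", "capital", "of", "France"]
```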
@@ -1,26 +1,47 @@
{
"$id": "/inference/schemas/feature-extraction/input.json",
"$schema": "http://json-schema.org/draft-06/schema#",
"description": "Inputs for Text Embedding inference",
"description": "Feature Extraction Input.\n\nAuto-generated from TEI specs.\nFor more details, check out https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tei-import.ts.",
More generally, should we have a way to mark if an input is batchable?
That's a broader discussion for other inference types, I think. At first we took the decision to avoid batched inputs in specs for simplicity. We can revisit if we see more demand for it, but that's not the case yet from what I've seen (or only marginally).
},
"prompt_name": {
"type": "string",
"description": "The name of the prompt that should be used by for encoding. If not set, no prompt\nwill be applied.\n\nMust be a key in the `Sentence Transformers` configuration `prompts` dictionary.\n\nFor example if ``prompt_name`` is \"query\" and the ``prompts`` is {\"query\": \"query: \", ...},\nthen the sentence \"What is the capital of France?\" will be encoded as\n\"query: What is the capital of France?\" because the prompt text will be prepended before\nany text to encode.",
To be updated in TEI really
"description": "The name of the prompt that should be used by for encoding. If not set, no prompt\nwill be applied.\n\nMust be a key in the `Sentence Transformers` configuration `prompts` dictionary.\n\nFor example if ``prompt_name`` is \"query\" and the ``prompts`` is {\"query\": \"query: \", ...},\nthen the sentence \"What is the capital of France?\" will be encoded as\n\"query: What is the capital of France?\" because the prompt text will be prepended before\nany text to encode.", | |
"description": "The name of the prompt that should be used by for encoding. If not set, no prompt\nwill be applied.\n\nMust be a key in the `sentence-transformers` configuration `prompts` dictionary.\n\nFor example if `prompt_name` is \"query\" and the `prompts` is {\"query\": \"query: \", ...},\nthen the sentence \"What is the capital of France?\" will be encoded as\n\"query: What is the capital of France?\" because the prompt text will be prepended before\nany text to encode.", |
Opened a PR: huggingface/text-embeddings-inference#342
*/
parameters?: { [key: string]: unknown };
prompt_name?: string;
I guess this is ok, although I'm a bit worried about the inconsistencies between TEI and inference API
Yeah, I would prefer not to change this for other tasks as well (we have exceptions for text-generation / chat-completion / feature-extraction now 😕). Next time we start building a new framework for a task, let's think about specs straight away.
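For reference, a rough sketch of what the generated TypeScript input type could look like once the TEI fields land; the field names are taken from the parameters discussed in this PR, but the actual type is auto-generated from the imported JSON schema and may differ:

```ts
// Hypothetical shape of the generated feature-extraction input type.
interface FeatureExtractionInput {
	/** The text to embed. */
	inputs: string;
	/** Normalize the returned embedding vector. */
	normalize?: boolean;
	/** Name of a prompt from the model's sentence-transformers `prompts` dictionary. */
	prompt_name?: string;
	/** Whether inputs longer than the model's maximum sequence length are truncated. */
	truncate?: boolean;
	/** Which side of the input to truncate from (assumed enum values). */
	truncation_direction?: "Left" | "Right";
}
```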
Thanks for the reviews! Most comments are about TEI (which was expected ^^). I addressed/replied where I could. Are there any blockers before merging this?
Go ahead 🔥
This PR adds a script to import the `feature-extraction` inference type from text-embeddings-inference. The JSON schema is pulled from https://huggingface.github.io/text-embeddings-inference/openapi.json and converted into the JSONSchema format from which we generate types for the JS and Python clients. This script is heavily inspired by the TGI importer script.

This PR also adds the `prompt_name` input parameter that has been newly added to TEI (see huggingface/text-embeddings-inference#312).

Decisions taken:
- Only `string` is accepted as input. In theory TEI is capable of handling much more complex inputs (`Union[List[Union[List[int], int, str]], str]`) but let's keep it simple for now. Other inference tasks are also currently defined without arrays even when InferenceAPI/Endpoints is capable of it.
- The spec is imported from the `/embed` route, which is the closest one to the `feature-extraction` task.

Note: in a follow-up PR it would be really nice to put this in a CI workflow that could be triggered manually to open a PR when new arguments are added to TGI / TEI.