Hi HF folks,
We noticed in our deployment verification pipeline that the deployment success rate of the recent top trending TEI models with the HF text-embeddings-inference container is very low: only 23 out of 500 deployments were successful.
Example deployment:

```bash
IMAGE=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204
MODEL_ID=ncbi/MedCPT-Article-Encoder
docker run --gpus all -p 7080:80 -e MODEL_ID=${MODEL_ID} --pull always $IMAGE
```
We looked into the root cause of the failures and found the following two common failure modes:
- ```
  Error: The --pooling arg is not set and we could not find a pooling configuration (1_Pooling/config.json) for this model.
  ```
My understanding is that the container expects a config file `1_Pooling/config.json` in the repo, but the model owner did not provide one. In this case, what do you suggest we do here?
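For illustration, here is a minimal sketch of such a pre-deployment check, assuming only the `huggingface_hub` client (the helper name is ours):

```python
from huggingface_hub import HfApi

def has_pooling_config(repo_id: str) -> bool:
    """Return True if the repo ships the 1_Pooling/config.json file TEI looks for."""
    files = HfApi().list_repo_files(repo_id)
    return "1_Pooling/config.json" in files

# e.g. the example model from the deployment above
print(has_pooling_config("ncbi/MedCPT-Article-Encoder"))
```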
We tried adding a default value of `pooling=mean` as an environment variable on the container, and we saw a significant improvement in the deployment success rate: 23/500 -> 243/500.
We know the pooling parameter is required for TEI models. However, we are not sure that setting a default value for all TEI models is the correct approach, and we are concerned it could negatively impact model quality. Can you advise how we should proceed here?
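To make the quality concern concrete, here is a short sketch (ours, using `transformers`; the model is just the example from above, and we are not asserting which strategy it was trained with) showing that mean pooling and CLS pooling generally produce different embeddings from the same encoder outputs, so a blanket default can silently change a model's behavior:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ncbi/MedCPT-Article-Encoder"  # example model from above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer(["a sample sentence"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, dim)

# CLS pooling: take the hidden state of the first token.
cls_emb = hidden[:, 0]

# Mean pooling: average token states, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()
mean_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# The two strategies generally yield different vectors for the same input.
print(torch.allclose(cls_emb, mean_emb))  # typically False
```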
- ```
  Caused by:
      0: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/facebook/MEXMA/resolve/main/tokenizer.json)
      1: HTTP status client error (404 Not Found) for url (https://huggingface.co/facebook/MEXMA/resolve/main/tokenizer.json)
  ```
My understanding is that the container expects a `tokenizer.json` file in the repo, but the model owner did not provide one.
Looking at the model card, the model is an XLM-RoBERTa model; I guess that is why the repo does not have a `tokenizer.json` file? To use the model, the user also needs to load the tokenizer first:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
```
Many models fail with the same error. Can you advise whether there is anything we can do here? We are also concerned that providing a default tokenizer could negatively impact model quality.
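As a possible workaround (a sketch under the assumption that the base tokenizer named on the model card is the right one for the checkpoint), the fast-tokenizer conversion in `transformers` can produce the missing file, since saving a fast tokenizer writes `tokenizer.json`:

```python
from transformers import AutoTokenizer

# Load the base tokenizer the model card points to; transformers converts
# the SentencePiece model to a fast (Rust) tokenizer on the fly.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
assert tokenizer.is_fast  # only fast tokenizers serialize to tokenizer.json

# save_pretrained on a fast tokenizer writes tokenizer.json; that file
# could then be added to the model repo (e.g. facebook/MEXMA).
tokenizer.save_pretrained("./mexma-tokenizer")
```

Whether a converted tokenizer like this preserves the model's original tokenization is exactly the quality question we are asking about.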
Thanks!