System Info
Official example
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Method 1: I deploy the service with the following command
model=BAAI/bge-m3
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
When I request localhost:8080/embed, I get this result:
[[-0.03707749,0.0060151797,-0.06545135,......]]
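For reference, a minimal sketch of how such a request can be made from Python, assuming the container started above is listening on localhost:8080 and using the same input string as in Method 2 (the /embed endpoint of text-embeddings-inference accepts a JSON body with an "inputs" field):
import requests

# Query the running text-embeddings-inference container.
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "你好"},  # same sentence as in Method 2 below
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()

embedding = resp.json()[0]  # the endpoint returns one embedding vector per input
print(embedding[:3])        # e.g. [-0.03707749, 0.0060151797, -0.06545135]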
Method 2: use Python code
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel('/workspace/bge-m3',use_fp16=True,device='cuda:0') # Setting use_fp16 to True speeds up computation with a slight performance degradation
sentences_1 = ["你好"]
embeddings_1 = model.encode(
    sentences_1,
    batch_size=12,
    max_length=8192,  # If you don't need such a long length, you can set a smaller value to speed up the encoding process.
)['dense_vecs']
print(embeddings_1.tolist())
I get this result:
[[-0.03717041015625, 0.00618743896484375, -0.06524658203125,............]]
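To make the difference concrete, the two vectors can be compared numerically. This is a hypothetical sketch (variable names are mine) using only the first three components reported above; the values diverge around the third or fourth decimal place:
import numpy as np

# First three components of each embedding, copied from the outputs above.
tei_vec  = np.array([-0.03707749, 0.0060151797, -0.06545135])                      # from /embed
flag_vec = np.array([-0.03717041015625, 0.00618743896484375, -0.06524658203125])   # from BGEM3FlagModel

print("max abs diff:", np.max(np.abs(tei_vec - flag_vec)))

# Cosine similarity over these components (illustrative only; the full
# vectors would be needed for a meaningful similarity score).
cos = np.dot(tei_vec, flag_vec) / (np.linalg.norm(tei_vec) * np.linalg.norm(flag_vec))
print("cosine similarity:", cos)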
Why are the embeddings obtained by the two methods different? Can I get the second embedding using text-embeddings-inference?
Expected behavior
When using text-embeddings-inference, I want to get the same result as the Python code, i.e. [[-0.03717041015625, 0.00618743896484375, -0.06524658203125, -0.02508544921875,..........]]