Closed
Description
System Info
System details:
System: Linux
AWS Instance: m5.xlarge
CPU: 4
memory: 16GB
I ran the BAAI/bge-base-en-v1.5 embedding model with text-embeddings-inference (TEI), hosted as a Docker container, using the commands below:
model=BAAI/bge-base-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-0.6 --model-id $model
Then I used the code below to process a PDF and generate its embeddings:
import requests
import fitz
import time
import traceback
import json
import boto3
import numpy as np

url = "http://127.0.0.1:8080/embed"
headers = {"Content-Type": "application/json"}
s3 = boto3.client('s3')

def process_pdf(buf):
    try:
        start_time = time.time()
        pdf_document = fitz.open(stream=buf, filetype="pdf")
        page_count = pdf_document.page_count
        for page_num in range(page_count):
            page = pdf_document[page_num]
            text = page.get_text()
            words = text.split()
            unique_words = list(set(words))
            unique_text = ' '.join(unique_words)
            data = {"inputs": unique_text}
            # One HTTP request per page
            response = requests.post(url, json=data, headers=headers)
            response.raise_for_status()  # Surface TEI errors instead of timing failed requests
            # Parse the JSON body; the Response object itself is not an embedding
            embeddings = np.array(response.json())
        pdf_document.close()
        embed_time = time.time()
        print(embed_time - start_time)
    except Exception as e:
        # s3_path is not in scope inside this function, so it is not printed here
        print(f"An error occurred while processing the PDF: {str(e)}")
        traceback.print_exc()  # Print the traceback information
        return

def process_data_sources(s3_path):
    try:
        if s3_path.endswith('.pdf'):
            # Get the PDF file content from S3
            print("Processing:", s3_path)
            response = s3.get_object(Bucket="state-medicaid-public-pending", Key=s3_path)
            buf = response['Body'].read()
            # Process the PDF file
            process_pdf(buf)
    except Exception as e:
        print(f"Outer exception occurred: {e}")
        traceback.print_exc()  # Print traceback information

def main():
    s3_path = 'crawldata/2024013001/KY/content/KY-1-RR%20318.pdf'
    process_data_sources(s3_path)

if __name__ == "__main__":
    main()
It took around 19 seconds.
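In the script above, every page is sent to TEI as its own HTTP request. TEI's /embed endpoint also accepts a list of strings for `inputs`, so several pages could go out in one request. The following is only a sketch of that idea, not a measured fix: the helper names (`chunked`, `post_batch`, `embed_texts`) and the batch size of 8 are assumptions made for illustration, and the server enforces its own limit on how many inputs one request may carry, which should be checked for the TEI version in use. It uses the standard library for HTTP only to stay dependency-free; `requests` would work equally well.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Sequence

TEI_URL = "http://127.0.0.1:8080/embed"  # same endpoint as in the script above


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def post_batch(texts: List[str]) -> List[List[float]]:
    """Send one batch of texts to TEI and return one embedding per text."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    request = urllib.request.Request(
        TEI_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # urlopen raises HTTPError on 4xx/5xx, so failed requests are not silently timed
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())


def embed_texts(
    texts: Sequence[str],
    send: Callable[[List[str]], List[List[float]]] = post_batch,
    batch_size: int = 8,  # assumed value; must not exceed the server's per-request limit
) -> List[List[float]]:
    """Embed `texts` in batches, preserving input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(send(batch))
    return vectors
```

With this shape, `process_pdf` would first collect the per-page strings into a list and then call `embed_texts(page_texts)` once, which reduces the number of HTTP round trips from one per page to one per batch.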
Then I processed the same document without TEI:
import fitz
import time
import os
import traceback
import json
import boto3
import pandas as pd
import numpy as np
from llama_index.embeddings import HuggingFaceEmbedding

s3 = boto3.client('s3')
embedding_model = 'BAAI/bge-base-en-v1.5'

# Define all embeddings and rerankers
EMBEDDINGS = {
    "embed-model": HuggingFaceEmbedding(model_name=embedding_model, pooling='cls', trust_remote_code=True),
}
embed_model = EMBEDDINGS['embed-model']

def process_pdf(buf):
    try:
        start_time = time.time()
        pdf_document = fitz.open(stream=buf, filetype="pdf")
        page_count = pdf_document.page_count
        for page_num in range(page_count):
            page = pdf_document[page_num]
            text = page.get_text()
            words = text.split()
            unique_words = list(set(words))
            unique_text = ' '.join(unique_words)
            # One in-process model call per page
            embeddings = embed_model.get_text_embedding(unique_text)
            embeddings = np.array([embeddings])
        pdf_document.close()
        embed_time = time.time()
        print(embed_time - start_time)
    except Exception as e:
        # s3_path is not in scope inside this function, so it is not printed here
        print(f"An error occurred while processing the PDF: {str(e)}")
        traceback.print_exc()  # Print the traceback information
        return

def process_data_sources(s3_path):
    try:
        if s3_path.endswith('.pdf'):
            # Get the PDF file content from S3
            print("Processing:", s3_path)
            response = s3.get_object(Bucket="state-medicaid-public-pending", Key=s3_path)
            buf = response['Body'].read()
            # Process the PDF file
            process_pdf(buf)
    except Exception as e:
        print(f"Outer exception occurred: {e}")
        traceback.print_exc()  # Print traceback information

def main():
    s3_path = 'crawldata/2024013001/KY/content/KY-1-RR%20318.pdf'
    process_data_sources(s3_path)

if __name__ == "__main__":
    main()
It took around 15 seconds.
I'm seeing slower inference with the TEI engine than without it. Am I missing something, or is TEI actually slower in this setup?
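Both timings above start before `fitz.open` and stop after the last page, so each number includes PDF text extraction as well as embedding (and, in the TEI case, HTTP overhead). A minimal sketch of how the two stages could be timed separately is shown below. The function name `time_stages` and its parameters are hypothetical and introduced only for illustration; `extract` and `embed` stand for whatever callables the two scripts use.

```python
import time
from typing import Any, Callable, Dict, Iterable


def time_stages(
    pages: Iterable[Any],
    extract: Callable[[Any], str],
    embed: Callable[[str], Any],
) -> Dict[str, float]:
    """Accumulate wall-clock time spent in text extraction and in embedding."""
    extract_s = 0.0
    embed_s = 0.0
    count = 0
    for page in pages:
        t0 = time.perf_counter()
        text = extract(page)   # e.g. PyMuPDF text extraction
        t1 = time.perf_counter()
        embed(text)            # e.g. a TEI request or a local model call
        t2 = time.perf_counter()
        extract_s += t1 - t0
        embed_s += t2 - t1
        count += 1
    return {"pages": count, "extract_s": extract_s, "embed_s": embed_s}
```

For the scripts above, `extract` could be `lambda page: page.get_text()`, and `embed` could be either `embed_model.get_text_embedding` or a small function that posts the text to the TEI endpoint. Comparing only the `embed_s` values of the two runs would show how much of the 19 s versus 15 s difference comes from embedding rather than from PDF parsing.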
Dependencies:
boto3
pymupdf
transformers
llama_index
torch
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Updated above
Expected behavior
Inference through TEI should be faster than running the same model through llama_index.