Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
I found and tried to diagnose an error that prevented around 50% of my documents from being added to a vector store. Specifically, certain plaintext and Markdown files fail with the following error when added to a vector store: “The file type is not supported”.
However, simple (often one-character) changes to the file content can circumvent this error. Below is a working example that reproduces the error, followed by an example that differs by just one character and does not trigger it.
Note: I have reported this issue here as well.
To Reproduce
Below are two MWEs with a difference of just one character. MWE1 fails whereas MWE2 succeeds.
MWE1 (produces error)
from time import sleep  # needed for the sleep(1) call below
from openai import OpenAI
client = OpenAI(...)
content = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ac porttitor eros. Etiam quis neque nisi. Proin in turpis augue. Vivamus ullamcorper lobortis enim, a bibendum urna mollis feugiat. Phasellus tortor justo, laoreet non elementum et, blandit faucibus lacus. Etiam luctus convallis massa vel facilisis. Sed congue in nibh bibendum sodales. Cras in diam vel ligula molestie imperdiet gravida eleifend urna. Proin eu lectus et erat lacinia mollis in at turpis. Nam sodales orci neque, vitae feugiat risus rutrum quis. Donec porttitor egestas eros, sed pulvinar eros. Aenean lacinia orci lorem. Curabitur sem augue, interdum a cursus ut, blandit nec felis. Ut bibendum eros tempus tellus imperdiet interdum. Nunc commodo sodales mattis.
Here is some data:
17
0.32
0.26"""
# Upload file
file_id = client.files.create(
    file=("mymarkdown.md", content.encode("utf-8"), "text/markdown"),
    purpose="assistants",  # purpose="user_data"
).id
# Create vector store
vector_store = client.vector_stores.create(file_ids=[file_id])
# Wait for file to be processed
sleep(1)
# List files in vector store
files = client.vector_stores.files.list(vector_store_id=vector_store.id)
if files.data[0].last_error:
    print("Error:", files.data[0].last_error)
else:
    print("Success")
Output:
Error: LastError(code='unsupported_file', message='The file type is not supported.')
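A side note on the fixed `sleep(1)` in the reproduction: a one-second wait may not always be long enough for processing to finish, so a polling loop can make the repro less timing-dependent. The following is only a minimal sketch. `wait_until_processed` and `get_status` are hypothetical names, not part of the library, and the fake status sequence stands in for real API calls. It assumes the file status is reported as `"in_progress"` until processing ends.

```python
import time
from typing import Callable


def wait_until_processed(
    get_status: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> str:
    """Call get_status() until it no longer returns "in_progress".

    Returns the final status, or raises TimeoutError if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "in_progress":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("file was still in progress after the timeout")
        time.sleep(interval)


# Stand-in for the real API: two "in_progress" responses, then a failure.
# In the repro above, get_status would instead wrap a call that fetches the
# vector store file and returns its status (assumption, not shown here).
fake_statuses = iter(["in_progress", "in_progress", "failed"])
result = wait_until_processed(lambda: next(fake_statuses), timeout=5.0, interval=0.01)
print(result)
```

With a loop like this, `last_error` is inspected only after the status has left `"in_progress"`, which rules out timing as the explanation for the difference between the two examples.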
MWE2 (does not produce error)
content = content.replace("0.32", "0.3")
... # remaining code same as above
Output:
Success
Note that the only difference between MWE1 and MWE2 is that MWE2 removes a single character (the 2 in 0.32).
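To double-check that the two payloads really differ by one character, the contents can be compared directly before uploading. This is a small sketch using the standard-library `difflib`. For brevity it uses only the tail of the content from the examples above, and `content_fail` and `content_ok` are illustrative names.

```python
from difflib import SequenceMatcher

# Tail of the content used in MWE1 and MWE2 (the Lorem ipsum paragraph is omitted).
content_fail = "Here is some data:\n17\n0.32\n0.26"
content_ok = content_fail.replace("0.32", "0.3")

# Collect every non-matching region between the two strings.
matcher = SequenceMatcher(None, content_fail, content_ok, autojunk=False)
changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]

for tag, i1, i2, j1, j2 in changes:
    print(tag, repr(content_fail[i1:i2]), "->", repr(content_ok[j1:j2]))

# The encoded payloads also differ by exactly one byte in length.
size_diff = len(content_fail.encode("utf-8")) - len(content_ok.encode("utf-8"))
print("byte length difference:", size_diff)
```

The same comparison can be run on the full `content` strings to confirm that nothing else (such as line endings or encoding) changes between the failing and the succeeding upload.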
Code snippets
# see above
OS
macOS
Python version
Python v3.12
Library version
openai v1.88.0