"unsupported_file" error when creating vector store with certain plain-text/markdown files #2439

Open
@georg-wolflein

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

I have found and tried to diagnose an error that prevented around 50% of my documents from being added to a vector store. Specifically, certain plain-text and markdown files fail with the following error when added to a vector store: “The file type is not supported”.

However, simple (often one-character) changes to the file content can circumvent this error. Below is a working example that reproduces the error, followed by an example that differs by just one character and does not trigger it.

Note: I have reported this issue here as well.

To Reproduce

Below are two MWEs with a difference of just one character. MWE1 fails whereas MWE2 succeeds.

MWE1 (produces error)

from time import sleep

from openai import OpenAI

client = OpenAI(...)

content = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ac porttitor eros. Etiam quis neque nisi. Proin in turpis augue. Vivamus ullamcorper lobortis enim, a bibendum urna mollis feugiat. Phasellus tortor justo, laoreet non elementum et, blandit faucibus lacus. Etiam luctus convallis massa vel facilisis. Sed congue in nibh bibendum sodales. Cras in diam vel ligula molestie imperdiet gravida eleifend urna. Proin eu lectus et erat lacinia mollis in at turpis. Nam sodales orci neque, vitae feugiat risus rutrum quis. Donec porttitor egestas eros, sed pulvinar eros. Aenean lacinia orci lorem. Curabitur sem augue, interdum a cursus ut, blandit nec felis. Ut bibendum eros tempus tellus imperdiet interdum. Nunc commodo sodales mattis.
Here is some data:
17
0.32
0.26"""

# Upload file
file_id = client.files.create(
    file=("mymarkdown.md", content.encode("utf-8"), "text/markdown"),
    purpose="assistants",  # purpose="user_data"
).id

# Create vector store
vector_store = client.vector_stores.create(file_ids=[file_id])

# Wait for file to be processed
sleep(1)

# List files in vector store
files = client.vector_stores.files.list(vector_store_id=vector_store.id)

if files.data[0].last_error:
    print("Error:", files.data[0].last_error)
else:
    print("Success")

Output:

Error: LastError(code='unsupported_file', message='The file type is not supported.')
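The fixed `sleep(1)` in the MWE may return before processing has finished. A minimal polling sketch is below, assuming each vector store file object exposes a `status` attribute that reads `"in_progress"` while processing is pending. The helper name `wait_for_files` and its parameters are hypothetical, not part of the library.

```python
import time
from typing import Any, Callable, Iterable, List


def wait_for_files(
    list_files: Callable[[], Iterable[Any]],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> List[Any]:
    """Poll until no listed file is still "in_progress", or raise TimeoutError.

    `list_files` is any zero-argument callable returning the current file
    objects. Assumption: each object has a `status` attribute.
    """
    deadline = time.monotonic() + timeout
    while True:
        files = list(list_files())
        # Done once at least one file is listed and none is still processing.
        if files and all(getattr(f, "status", None) != "in_progress" for f in files):
            return files
        if time.monotonic() >= deadline:
            raise TimeoutError("vector store files still processing after timeout")
        time.sleep(interval)


# Possible usage with the MWE above (requires a configured client):
# files = wait_for_files(
#     lambda: client.vector_stores.files.list(vector_store_id=vector_store.id).data
# )
# print(files[0].last_error or "Success")
```

With this in place, `last_error` is only inspected after processing has reached a final state, which rules out timing as the cause of the difference between the two MWEs.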

MWE2 (does not produce error)

content = content.replace("0.32", "0.3")
... # remaining code same as above

Output:

Success

Note that the only difference between MWE1 and MWE2 is that MWE2 removes a single character (the 2 in 0.32).
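To narrow down which characters matter, the one-character experiment can be automated. The sketch below enumerates single-character deletions of the content and records which ones are accepted. The function names and the `is_accepted` callback are hypothetical; in practice the callback would wrap the upload and vector store steps from MWE1 and return `True` when `last_error` is empty.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def single_char_deletions(
    text: str, start: int = 0, end: Optional[int] = None
) -> Iterator[Tuple[int, str]]:
    """Yield (index, variant) pairs, each variant missing one character."""
    stop = len(text) if end is None else end
    for i in range(start, stop):
        yield i, text[:i] + text[i + 1:]


def find_fixing_deletions(
    text: str,
    is_accepted: Callable[[str], bool],
    start: int = 0,
    end: Optional[int] = None,
) -> List[int]:
    """Return the indices whose deletion makes `is_accepted` return True."""
    return [
        i
        for i, variant in single_char_deletions(text, start, end)
        if is_accepted(variant)
    ]


# Possible usage (each check costs one upload, so restrict the range,
# for example to the numeric lines at the end of the content):
# data_start = content.index("17")
# print(find_fixing_deletions(content, upload_succeeds, start=data_start))
```

Here `upload_succeeds` is a placeholder for a function that uploads the variant and reports whether it was processed without error.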

Code snippets

# see above

OS

macOS

Python version

Python v3.12

Library version

openai v1.88.0

Metadata

Labels

bug (Something isn't working)
