How do I stream chat completions with OpenAI’s Python API? #2462
I'm using the official openai Python package. I tried this:

```python
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)
```

But it waits until the full completion is returned. How can I make it stream the tokens one by one as they're generated?
Great question! To stream chat completions with the openai Python package, set stream=True and then iterate over the returned events. Here's how you can do it:

```python
import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # ✅ this enables streaming
)

# Each chunk carries a partial "delta" of the message as tokens arrive
for chunk in response:
    if "choices" in chunk:
        content = chunk["choices"][0]["delta"].get("content", "")
        print(content, end="", flush=True)
```

This will print the generated message token by token in real time. Let me know if that works, and feel free to mark this as the answer if it helps! ✅
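If you also need the complete reply once the stream finishes (for example, to append it to your conversation history), accumulate the deltas as you print them. A small variation of the 1.x loop above that keeps the full text:

```python
full_reply = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
        full_reply.append(delta.content)

message = "".join(full_reply)  # the assembled assistant message
```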