Skip to content

output_type=List[str] silently disables “thinking” in Qwen-3 (and similar models), cutting run-time by ~50 % compared with output_type=str #902

Open
@1nterstellar-JD

Description

@1nterstellar-JD

TL;DR

When I set output_type=List[str] on an agents.Agent, Qwen-3 (and other “thinking”-capable models) return almost instantly and never include their internal reasoning.
With output_type=str the same prompt yields visible “thinking” content and a much longer latency.
It feels like the SDK is implicitly turning off the model’s reasoning mode whenever the output is structured.

Steps to Reproduce

When using output_type=List[str]

agent = Agent(
    name="Task Decomposition",
    instructions=(
        "You are a professional task decomposition assistant, skilled at breaking down a user's question or goal into clear, specific, and actionable subtasks.\n"
        "Please follow these rules:\n"
        "1. Each subtask should be as independent and concise as possible, suitable for execution on its own.\n"
        "2. Subtasks must be explicit action or query commands — avoid vague or abstract descriptions.\n"
        "3. The output format must be a list of strings, with each item representing one subtask.\n"
        "Example input: 'What are the temperature and coordinates of Beijing?'\n"
        "Example output: ['Query the current temperature in Beijing', 'Query the coordinates of Beijing']"
    ),
    model=model,
    output_type=List[str],
)

Output: (Execution time: 4.6 seconds.)

['Determine which type of nonimmigrant visa is appropriate for the visit to the United States',
 'Check the specific requirements for the chosen visa type including necessary documents and fees',
 'Schedule an interview at the nearest U.S. embassy or consulate',
 'Complete the DS-160 form online and print the confirmation page',
 'Pay the non-refundable visa application fee',
 'Prepare all required documents for the visa interview',
 'Attend the visa interview at the embassy or consulate and answer questions truthfully',
 'Wait for the visa decision and follow any additional instructions if required']

When using output_type=str

agent = Agent(
    name="Task Decomposition",
    instructions=(
        "You are a professional task decomposition assistant, skilled at breaking down a user's question or goal into clear, specific, and actionable subtasks.\n"
        "Please follow these rules:\n"
        "1. Each subtask should be as independent and concise as possible, suitable for execution on its own.\n"
        "2. Subtasks must be explicit action or query commands — avoid vague or abstract descriptions.\n"
        "3. The output format must be a list of strings, with each item representing one subtask.\n"
        "Example input: 'What are the temperature and coordinates of Beijing?'\n"
        "Example output: ['Query the current temperature in Beijing', 'Query the coordinates of Beijing']"
    ),
    model=model,
    output_type=str,
)

Output: (Execution time: 10.4 seconds.)

"<think>\nOkay, the user is asking how to apply for a U.S. tourist visa. Let me break this down into steps. 
First, they need to check if they're eligible. That's a common starting point. Then, they have to determine the correct visa type, which is usually B1/B2 for tourists. 
Next, they need to complete the DS-160 form online. After that, pay the non-refundable fee. Scheduling an interview is next, so they need to book an appointment. Preparing the required documents is cruciallike passport, DS-160 confirmation, fee receipt, and others. 
Then, attending the interview at the embassy or consulate. After the interview, they might have to wait for a decision. If approved, they receive the visa and then travel. 
Each step is a specific action they can take, so I'll list them as subtasks.\n</think>
\n\n
['Check eligibility for a U.S. tourist visa based on nationality and purpose of visit', 
'Determine the correct visa classification (e.g., B1/B2 for tourism)', 
'Complete the DS-160 online visa application form and print confirmation page', 
'Pay the non-refundable visa application fee at the designated bank', 
'Schedule a visa interview appointment at the nearest U.S. embassy or consulate', 
'Gather required documents (passport, DS-160 confirmation, fee receipt, financial proof, travel itinerary, etc.)', 
'Attend the visa interview and answer questions about travel plans and ties to home country', 
'Wait for visa approval decision (processing time varies by location)', 
'Receive visa in passport if approved', 
'Prepare for travel by reviewing visa conditions and entry requirements']"

Expected Behaviour

• output_type should only change the format of the final answer, not whether the model goes through (or suppresses) its reasoning steps.
• Latency shouldn’t change by an order of magnitude solely because the return type changes.

Additional Context / Hypothesis

• Docs say that setting output_type puts the model into structured-output (JSON) mode.
• Qwen-3’s docs show an enable_thinking flag that can be hard-disabled by the caller.
• Maybe the Agents SDK’s structured-output path is automatically sending a “no-thinking” directive (or equivalent chat-template) under the hood?
Would love confirmation or guidance on how to keep reasoning visible while still using structured outputs. ❓

openai-agents
0.0.16
Python
3.12.9
Model(s)
qwen3-8b, glm-z1-32b (same effect)
Hardware
RTX A6000 (vLLM 0.8.5.post1)

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionQuestion about using the SDK

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions