
Bogda01m/OpenAI batch mode #2367


Open · wants to merge 11 commits into main

Conversation

bogdan01m commented Jul 29, 2025

Add OpenAI batch processing support

Implements batch processing for OpenAI models to help reduce costs and handle rate limits when processing large volumes of requests.

Summary

  • Add pydantic_ai.batches.openai module with batch processing capabilities (a usage sketch follows the Key Features list below)
  • Support for creating, submitting, and retrieving batch jobs
  • Integration with existing Agent API for seamless batch operations
  • Comprehensive documentation and examples

Key Features

  • Batch job creation and management
  • Cost reduction through batch pricing
  • Rate limit mitigation for high-volume processing
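To make the intended workflow concrete, here is a rough usage sketch. The PR adds pydantic_ai.batches.openai, but the class and method names below (OpenAIBatch, submit, wait, results) are illustrative assumptions, not necessarily the module's actual API:

```python
# Illustrative sketch only: OpenAIBatch and its methods are assumed names,
# not necessarily the API this PR introduces.
from pydantic_ai.batches.openai import OpenAIBatch  # module added by this PR

# Collect prompts to run as one batch job (batch pricing, and no rate-limit
# pressure from firing each request individually).
prompts = [f'Summarize document {i}' for i in range(100)]

batch = OpenAIBatch(model='gpt-4o')
job = batch.submit(prompts)   # assumed: uploads a JSONL file, calls batches.create
job.wait()                    # assumed: polls batches.retrieve until completion

for result in job.results():  # assumed: downloads and parses the output file
    print(result.output)
```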

Test Plan

  • Unit tests for batch request creation and data models
  • Mock-based tests for the OpenAI batch API workflow (create, status, retrieve); a sketch of the mock shape follows this list
  • Tests cover basic batches, tool usage, and structured output scenarios
  • Documentation examples validated against the live API; they work out of the box with an API token
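For reference, the mocks described above might be shaped like this (a sketch using unittest.mock; the fake response objects are assumptions about what the tests stub out, though the attribute paths mirror the real OpenAI SDK client):

```python
from types import SimpleNamespace
from unittest.mock import AsyncMock

# Fake objects standing in for the OpenAI SDK's Batch API responses.
created = SimpleNamespace(id='batch_123', status='validating')
completed = SimpleNamespace(
    id='batch_123', status='completed', output_file_id='file_456'
)

client = AsyncMock()
client.batches.create.return_value = created      # create the batch job
client.batches.retrieve.return_value = completed  # poll its status
# files.content returns the JSONL results body in the real SDK.
client.files.content.return_value = SimpleNamespace(
    text='{"custom_id": "req-1", "response": {"status_code": 200}}\n'
)
```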

Closes #1771

Addresses a requested feature for high-volume request processing.
Planning to add Anthropic batch support next.

hyperlint-ai bot (Contributor) commented Jul 29, 2025

PR Change Summary

Implemented batch processing support for OpenAI models to enhance cost efficiency and manage high-volume requests.

  • Added a new module for OpenAI batch processing capabilities
  • Integrated batch job management with the existing Agent API
  • Included comprehensive documentation and usage examples
  • Introduced cost-saving features and rate limit mitigation for bulk requests

Modified Files

  • docs/models/openai.md

Added Files

  • docs/batches/index.md
  • docs/batches/openai.md


bogdan01m and others added 8 commits on July 30, 2025:
  - Add comprehensive mocks for OpenAI batch API in test_examples.py
  - Mock batches.create, batches.retrieve, files.create, and files.content methods
  - Add realistic batch result data in JSONL format
  - Add expected output comments to batch examples in documentation
  - Resolves ModelHTTPError and PytestUnraisableExceptionWarning issues
  - All batch-related tests now pass successfully
DouweM (Contributor) commented Jul 30, 2025

@bogdan01m Thanks for working on this!

I think we can make this feel a bit more Pydantic AI-native, though, and reuse more of the existing model request building and parsing logic from OpenAIModel.

Since a batch request by definition cannot be part of an agentic loop, the closest parallel we have right now is the Direct API: https://ai.pydantic.dev/direct/. Those methods make a single request to the model API: they take the model object, a list of messages (ModelRequest and ModelResponse), plus additional parameters for things like tool calls (ModelRequestParameters and ModelSettings), and return a ModelResponse.
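For context, the Direct API shape being referenced looks roughly like this (adapted from the pydantic-ai docs; check the current docs for exact signatures):

```python
from pydantic_ai.direct import model_request_sync
from pydantic_ai.messages import ModelRequest

# One-shot, non-agentic call: model + messages in, ModelResponse out.
model_response = model_request_sync(
    'openai:gpt-4o',
    [ModelRequest.user_text_prompt('What is the capital of France?')],
)
print(model_response.parts[0].content)
```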

I would expect a Pydantic AI batch API to similarly take a model object and messages + parameters for each batch item as input, and then once the batch items complete, return ModelResponses for each as output.

OpenAIModel already contains the logic for turning those objects into the API request body (those arguments correspond directly to the API parameters per the SDK source), and for parsing the response body.

We should be able to separate that logic from actually performing the chat-completions request, and also use it for building a batch request and parsing batch responses.

That also suggests the method to create a new batch could live on the existing OpenAIModel instead of a new class. It could be defined on Model with a default implementation raising NotImplementedError, so that subclasses can choose to implement it using the same function and return signature. (Of course each model can return its own subclass of some generic Batch object that knows how to check the status etc.)
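A minimal sketch of what that could look like (the Batch class and create_batch method are assumptions about the proposed design, not existing pydantic-ai APIs; the real Model base class is simplified to a stand-in here):

```python
# Illustrative sketch only: Batch and create_batch are assumed names for the
# design proposed above, not existing pydantic-ai APIs.
from dataclasses import dataclass


@dataclass
class Batch:
    """Generic handle; each model returns its own subclass, e.g. an
    OpenAIBatch that knows how to poll batches.retrieve and parse results."""
    id: str

    async def status(self) -> str:
        raise NotImplementedError

    async def results(self) -> list:  # one ModelResponse per batch item
        raise NotImplementedError


class Model:
    """Stand-in for pydantic_ai.models.Model, showing only the proposed hook."""

    async def create_batch(self, items) -> Batch:
        # items: one (messages, model_settings, model_request_parameters)
        # tuple per batch entry, mirroring the Direct API inputs.
        raise NotImplementedError('This model does not support batch requests.')


class OpenAIModel(Model):
    async def create_batch(self, items) -> Batch:
        # Reuse the existing request-building logic to serialize each item to
        # a JSONL line, upload it via files.create, then call batches.create.
        ...
```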

That approach wouldn't immediately support output_type, as that happens at a higher level than the model messages and request params, but we can provide new convenience functions (in a separate PR) that pull that logic out of Agent.

Let me know what you think!
