Fix excessive token usage with Unicode text in realtime event serialization #2444


Open: wants to merge 1 commit into main

Conversation


@josharsh josharsh commented Jul 4, 2025

Non-ASCII characters in realtime event data (such as Cyrillic, Chinese, or Arabic text) were being unnecessarily escaped during JSON serialization, causing significant token overhead.

This fix adds ensure_ascii=False to the json.dumps() calls used when sending realtime WebSocket events, preserving Unicode characters in their original form.

Token savings:

  • 54-60% size reduction for Unicode-heavy schemas
  • ~116+ tokens saved per typical function schema with Cyrillic descriptions
  • Backward compatible: outputs valid JSON that parses identically

Fixes issue #2428 where Pydantic schema descriptions with Cyrillic text caused 3.6x token overhead.

The fix updates both the sync and async realtime connection send() methods to use ensure_ascii=False, which is the modern standard for JSON serialization of Unicode content.
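The effect of the change can be illustrated with a minimal sketch. The payload below is illustrative only, not the SDK's actual event schema; the point is the size difference between the default escaped output and ensure_ascii=False:

```python
import json

# Illustrative realtime-style event with a Cyrillic instruction string
# (hypothetical payload, not the SDK's actual schema).
event = {
    "type": "session.update",
    "session": {"instructions": "Ты полезный ассистент."},
}

# Default behavior: every non-ASCII character becomes a 6-character
# \uXXXX escape sequence, inflating the serialized payload.
escaped = json.dumps(event)

# With ensure_ascii=False, Unicode characters are kept as-is.
preserved = json.dumps(event, ensure_ascii=False)

print(len(escaped), len(preserved))
# Both forms parse back to the identical object, so the change is
# backward compatible for any spec-conforming JSON consumer.
```

For Unicode-heavy schemas the escaped form can be several times larger, which is where the reported token savings come from.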

  • I understand that this repository is auto-generated and my pull request may not be merged

Changes being requested

Additional context & links

@josharsh josharsh requested a review from a team as a code owner July 4, 2025 21:11