Fix excessive token usage with Unicode text in realtime event serialization #2444
+175
−2
Non-ASCII characters in realtime event data (for example Cyrillic, Chinese, or Arabic text) were being escaped to \uXXXX sequences during JSON serialisation, causing significant token overhead.
This fix adds ensure_ascii=False to the json.dumps() calls that send realtime WebSocket events, so Unicode characters are preserved in their original form.
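The difference can be seen with the standard library alone. The sketch below is only an illustration: the event dictionary and its field names are made up for the example and are not taken from the library's event types.

```python
import json

# Hypothetical event payload; the keys and values are illustrative only.
event = {"type": "session.update", "description": "Привет"}

# Default behaviour (ensure_ascii=True): every non-ASCII character
# is written as a six-character \uXXXX escape sequence.
escaped = json.dumps(event)

# With ensure_ascii=False the characters are emitted unchanged.
literal = json.dumps(event, ensure_ascii=False)

print(escaped)
print(literal)
print(len(escaped), len(literal))

# Both strings decode back to the same object, so no information is lost.
assert json.loads(escaped) == json.loads(literal) == event
```

Character counts are only a rough proxy for token counts, which depend on the tokenizer in use, but the escaped form is clearly longer for the same content.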
Token savings:
Fixes issue #2428 where Pydantic schema descriptions with Cyrillic text caused 3.6x token overhead.
The fix updates the send() methods of both the sync and async realtime connections to use ensure_ascii=False. The output is still valid JSON: RFC 8259 permits unescaped non-ASCII characters in strings and requires UTF-8 for JSON exchanged between systems.
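A minimal sketch of the pattern, assuming a shared serialisation helper used by both connection types. Every class and function name here (serialize_event, FakeSocket, FakeAsyncSocket, SyncConnection, AsyncConnection) is hypothetical and stands in for the library's real code, which is not shown in this description.

```python
import asyncio
import json
from typing import Any, Dict, List


def serialize_event(event: Dict[str, Any]) -> str:
    """Serialise an event without escaping non-ASCII characters."""
    return json.dumps(event, ensure_ascii=False)


class FakeSocket:
    """Hypothetical stand-in for a sync WebSocket; records sent frames."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, data: str) -> None:
        self.sent.append(data)


class FakeAsyncSocket:
    """Hypothetical stand-in for an async WebSocket; records sent frames."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    async def send(self, data: str) -> None:
        self.sent.append(data)


class SyncConnection:
    """Hypothetical sync connection wrapper."""

    def __init__(self, socket: FakeSocket) -> None:
        self._socket = socket

    def send(self, event: Dict[str, Any]) -> None:
        self._socket.send(serialize_event(event))


class AsyncConnection:
    """Hypothetical async connection wrapper."""

    def __init__(self, socket: FakeAsyncSocket) -> None:
        self._socket = socket

    async def send(self, event: Dict[str, Any]) -> None:
        await self._socket.send(serialize_event(event))


# Illustrative payload with a Cyrillic schema description.
event = {"type": "session.update", "description": "описание схемы"}

sync_socket = FakeSocket()
SyncConnection(sync_socket).send(event)

async_socket = FakeAsyncSocket()
asyncio.run(AsyncConnection(async_socket).send(event))

print(sync_socket.sent[0])
```

Routing both paths through one helper keeps the sync and async serialisation identical, so the two cannot drift apart if the json.dumps() arguments change later.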