Add text-to-speech documentation for Genkit #55


Merged · 10 commits into main from document-audio-support-clean · Jul 15, 2025

Conversation

chrisraygill (Collaborator)

Add documentation for audio generation capabilities in Genkit, including basic TTS usage and multi-speaker configurations. Update model documentation and provider-specific guides for Google GenAI and Vertex AI.

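For context, a minimal sketch of what the documented basic TTS usage might look like with the Google GenAI plugin (the model id, voice name, and config shape below are assumptions for illustration, not copied from this PR):

// Sketch only — model id, voice name, and config fields are assumed.
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'), // assumed TTS model id
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: 'Algenib' }, // assumed voice name
      },
    },
  },
  prompt: 'Say hello from Genkit',
});

// The generated audio comes back on response.media?.url as a data URL.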
chrisraygill requested review from mbleigh and pavelgj on June 5, 2025 at 01:15
Update Imagen2 to Imagen, change example prompt text, and remove multi-speaker audio generation section to simplify documentation.
Enhances readability by formatting the provider-specific configuration note as a proper note block in the models documentation.
Update the models.mdx documentation to enhance the media generation section with better formatting, clearer explanations, and improved code examples for both image and audio generation.
// Handle the audio data (returned as a data URL)
if (response.media?.url) {
  // Extract the base64-encoded payload from the data URL
  const audioBuffer = Buffer.from(
    response.media.url.substring(response.media.url.indexOf(',') + 1),
    'base64'
  );
}
pavelgj (Contributor) commented on the excerpt above, Jun 5, 2025:

Did you test this? I don't think this would work, because Gemini returns audio in PCM format; it needs to be converted to WAV. For example:

// npm i wav && npm i --save-dev @types/wav
import wav from 'wav';

async function saveWaveFile(
  filename: string,
  pcmData: Buffer,
  channels = 1,
  rate = 24000,
  sampleWidth = 2
) {
  return new Promise((resolve, reject) => {
    const writer = new wav.FileWriter(filename, {
      channels,
      sampleRate: rate,
      bitDepth: sampleWidth * 8,
    });

    writer.on('finish', resolve);
    writer.on('error', reject);

    writer.write(pcmData);
    writer.end();
  });
}
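A hedged usage sketch tying this back to the data-URL handling in the docs excerpt above (assuming response.media.url is a base64 data URL wrapping raw PCM samples):

// Usage sketch: decode the base64 payload from the data URL, then write it out as WAV.
if (response.media?.url) {
  const pcmBuffer = Buffer.from(
    response.media.url.substring(response.media.url.indexOf(',') + 1),
    'base64'
  );
  await saveWaveFile('output.wav', pcmBuffer);
}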


This matches my experience testing audio with the GenAI SDK: you need to do the conversion with code in your app.

chrisraygill (Collaborator, Author) replied:

Yeah, I forgot to include the converter function from the sample...

Ugh, that's ugly.

pavelgj merged commit 9f85c0d into main on Jul 15, 2025
1 check passed
pavelgj deleted the document-audio-support-clean branch on Jul 15, 2025 at 02:40