Add text-to-speech documentation for Genkit #55


Merged · 10 commits into main from document-audio-support-clean · Jul 15, 2025

Conversation

chrisraygill (Collaborator)

Add documentation for audio generation capabilities in Genkit, including basic TTS usage and multi-speaker configurations. Update model documentation and provider-specific guides for Google GenAI and Vertex AI.

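For context, a minimal sketch of what the documented basic TTS usage might look like with the Google GenAI plugin (the model id, voice name, and config shape below are assumptions for illustration, not copied from this PR):

// Sketch only — model id, voice name, and config fields are assumed.
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';

const ai = genkit({ plugins: [googleAI()] });

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'), // assumed TTS model id
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: 'Algenib' }, // assumed voice name
      },
    },
  },
  prompt: 'Say hello from Genkit',
});

// The generated audio comes back on response.media?.url as a data URL.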
chrisraygill requested review from mbleigh and pavelgj on June 5, 2025 at 01:15
Update Imagen2 to Imagen, change example prompt text, and remove multi-speaker audio generation section to simplify documentation.
Enhances readability by formatting the provider-specific configuration note as a proper note block in the models documentation.
Update the models.mdx documentation to enhance the media generation section with better formatting, clearer explanations, and improved code examples for both image and audio generation.
// Handle the audio data (returned as a data URL)
if (response.media?.url) {
  // Extract the base64-encoded payload from the data URL
  const audioBuffer = Buffer.from(
    response.media.url.substring(response.media.url.indexOf(',') + 1),
    'base64'
  );
}
pavelgj (Contributor) commented on the excerpt above, Jun 5, 2025:

Did you test this? I don't think this would work, because Gemini returns audio in PCM format; it needs to be converted to WAV. For example:

// npm i wav && npm i --save-dev @types/wav
import wav from 'wav';

async function saveWaveFile(
  filename: string,
  pcmData: Buffer,
  channels = 1,
  rate = 24000,
  sampleWidth = 2
) {
  return new Promise((resolve, reject) => {
    const writer = new wav.FileWriter(filename, {
      channels,
      sampleRate: rate,
      bitDepth: sampleWidth * 8,
    });

    writer.on('finish', resolve);
    writer.on('error', reject);

    writer.write(pcmData);
    writer.end();
  });
}
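A hedged usage sketch tying this back to the data-URL handling in the docs excerpt above (assuming response.media.url is a base64 data URL wrapping raw PCM samples):

// Usage sketch: decode the base64 payload from the data URL, then write it out as WAV.
if (response.media?.url) {
  const pcmBuffer = Buffer.from(
    response.media.url.substring(response.media.url.indexOf(',') + 1),
    'base64'
  );
  await saveWaveFile('output.wav', pcmBuffer);
}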


This matches my experience testing audio with the GenAI SDK: you need to do the conversion with code in your app.

chrisraygill (Collaborator, Author) replied:

Yeah, I forgot to include the converter function from the sample...

Ugh, that's ugly.

pavelgj merged commit 9f85c0d into main on Jul 15, 2025
1 check passed
pavelgj deleted the document-audio-support-clean branch on Jul 15, 2025 at 02:40