Add text-to-speech documentation for Genkit #55
Add documentation for audio generation capabilities in Genkit, including basic TTS usage and multi-speaker configurations. Update model documentation and provider-specific guides for Google GenAI and Vertex AI.
Update Imagen2 to Imagen, change example prompt text, and remove multi-speaker audio generation section to simplify documentation.
Enhances readability by formatting the provider-specific configuration note as a proper note block in the models documentation.
Update the models.mdx documentation to enhance the media generation section with better formatting, clearer explanations, and improved code examples for both image and audio generation.
Co-authored-by: Pavel Jbanov <[email protected]>
// Handle the audio data (returned as a data URL)
if (response.media?.url) {
  // Extract base64 data from the data URL
  const audioBuffer = Buffer.from(
Did you test this? I don't think it would work, because Gemini returns audio in PCM format and it needs to be converted to WAV. For example:
// npm i wav && npm i --save-dev @types/wav
import wav from 'wav';

// Wrap raw PCM samples in a WAV container and write them to disk.
async function saveWaveFile(
  filename: string,
  pcmData: Buffer,
  channels = 1,
  rate = 24000,
  sampleWidth = 2 // bytes per sample
) {
  return new Promise<void>((resolve, reject) => {
    const writer = new wav.FileWriter(filename, {
      channels,
      sampleRate: rate,
      bitDepth: sampleWidth * 8,
    });
    writer.on('finish', () => resolve());
    writer.on('error', reject);
    writer.write(pcmData);
    writer.end();
  });
}
This matches my experience testing audio with the GenAI SDK: you need to convert it with code in your app.
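For readers who would rather avoid the extra dependency, here is a minimal sketch of the same conversion using only Node's `Buffer`. It prepends a standard 44-byte RIFF/WAVE header to the raw PCM bytes. The helper names `pcmToWav` and `dataUrlToBuffer` are hypothetical, and the defaults (mono, 24000 Hz, 16-bit) are assumptions carried over from the snippet above; it also assumes the data URL is base64-encoded.

```typescript
// Hypothetical helper: wrap raw little-endian PCM in a 44-byte WAV header.
// Defaults (mono, 24 kHz, 2 bytes per sample) are assumptions taken from
// the reviewer's snippet, not confirmed model output parameters.
function pcmToWav(
  pcm: Buffer,
  channels = 1,
  sampleRate = 24000,
  bytesPerSample = 2
): Buffer {
  const blockAlign = channels * bytesPerSample;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bytesPerSample * 8, 34); // bits per sample
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the sample data
  return Buffer.concat([header, pcm]);
}

// Hypothetical helper: decode the payload of a base64 data URL.
function dataUrlToBuffer(url: string): Buffer {
  const comma = url.indexOf(",");
  if (!url.startsWith("data:") || comma === -1) {
    throw new Error("expected a data URL");
  }
  return Buffer.from(url.slice(comma + 1), "base64");
}
```

Under those assumptions, something like `fs.writeFileSync("output.wav", pcmToWav(dataUrlToBuffer(response.media.url)))` should produce a playable file. The `wav` package version above is probably the better choice if the audio is streamed rather than held in memory.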
Yeah, I forgot to include the converter function from the sample...
Ugh, that's ugly.