Generates audio from text using the specified provider and voice.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
Provider-specific voice ID for the speech.
Provider-specific model ID (e.g., ‘sonic-2’ for Cartesia, ‘eleven_multilingual_v2’ for ElevenLabs, ‘gpt-4o-mini-tts’ for OpenAI).
Desired audio output format. This will determine the Content-Type of the response.
cartesia_language
string | null
default:"en"
Language code for Cartesia TTS (e.g., ‘en’). If null, Cartesia attempts to auto-detect the language.
Optional sample rate for Cartesia (e.g., 24000, 44100). If null, the provider default is used.
Optional bit rate for Cartesia lossy formats such as MP3 (e.g., 128000). If null, the provider default is used. Does not apply to PCM.
elevenlabs_optimize_streaming_latency
Streaming latency optimization level for ElevenLabs, from 0 to 4.
elevenlabs_voice_settings_stability
Stability for ElevenLabs voice settings.
elevenlabs_voice_settings_similarity_boost
Similarity boost for ElevenLabs voice settings.
curl --request POST \
--url 'https://api.unify.ai/v0/assistant/voice/generate' \
--header "Authorization: Bearer $UNIFY_KEY" \
--header 'Content-Type: application/json' \
--data '{}'
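For reference, the same request can be sketched in Python using only the standard library. The function names `build_request` and `generate_audio` are hypothetical, and the sketch assumes the response body is the raw audio, which the page implies by tying the response Content-Type to the requested output format but does not state outright.

```python
import json
import urllib.request

API_URL = "https://api.unify.ai/v0/assistant/voice/generate"


def build_request(body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_audio(body: dict, token: str, out_path: str) -> str:
    """Send the request, write the response bytes to out_path,
    and return the response Content-Type header.

    Assumption: the response body is the audio itself.
    """
    with urllib.request.urlopen(build_request(body, token)) as resp:
        content_type = resp.headers.get("Content-Type", "")
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return content_type
```

A call such as `generate_audio(body, os.environ["UNIFY_KEY"], "speech.out")` would mirror the curl command, with `body` holding the fields described under Body. Checking the returned Content-Type before choosing a file extension avoids guessing the audio format.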