Generate speech from text - Unify Documentation

Generates audio from text using the specified provider and voice.

Authorizations

Authorization

string

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

text

string

required

Text to synthesize.

provider

string

required

TTS provider.

voice_id

string

required

Provider-specific voice ID for the speech.

model_id

string | null

Provider-specific model ID (e.g., ‘sonic-2’ for Cartesia, ‘eleven_multilingual_v2’ for ElevenLabs, ‘gpt-4o-mini-tts’ for OpenAI).

output_format

string

default:"mp3"

Desired audio output format. This will determine the Content-Type of the response.

cartesia_language

string | null

default:"en"

Language code for Cartesia TTS (e.g., ‘en’). If null, Cartesia attempts auto-detection.

cartesia_sample_rate

integer | null

Optional sample rate for Cartesia (e.g., 24000, 44100). Provider defaults are used if null.

cartesia_bit_rate

integer | null

Optional bit rate for Cartesia lossy formats like MP3 (e.g., 128000). Provider defaults are used if null. Not applicable to PCM.

elevenlabs_optimize_streaming_latency

integer | null

Streaming latency optimization level for ElevenLabs, from 0 to 4.

elevenlabs_voice_settings_stability

number | null

Stability for ElevenLabs voice settings.

elevenlabs_voice_settings_similarity_boost

number | null

Similarity boost for ElevenLabs voice settings.
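The body fields above can be assembled in code. The following is a minimal Python sketch, not an official client: the helper name `build_tts_payload` is hypothetical, and it assumes that leaving a nullable field out of the JSON body is equivalent to sending null. The provider string and values in the usage note are illustrative assumptions, since this page does not list the accepted values.

```python
from typing import Any, Optional


def build_tts_payload(
    text: str,
    provider: str,
    voice_id: str,
    *,
    model_id: Optional[str] = None,
    output_format: str = "mp3",
    **provider_options: Any,
) -> dict:
    """Build the JSON body for the generate-speech endpoint (hypothetical helper)."""
    # text, provider and voice_id are the three required fields.
    for name, value in (("text", text), ("provider", provider), ("voice_id", voice_id)):
        if not value:
            raise ValueError(f"{name} is required")

    # The documented range for this ElevenLabs option is 0 to 4.
    latency = provider_options.get("elevenlabs_optimize_streaming_latency")
    if latency is not None and not 0 <= latency <= 4:
        raise ValueError("elevenlabs_optimize_streaming_latency must be between 0 and 4")

    payload = {
        "text": text,
        "provider": provider,
        "voice_id": voice_id,
        "output_format": output_format,
    }
    if model_id is not None:
        payload["model_id"] = model_id

    # Assumption: omitting an unset optional field behaves like sending null.
    for key, value in provider_options.items():
        if value is not None:
            payload[key] = value
    return payload
```

For example, `build_tts_payload("Hello", "elevenlabs", "<voice_id>", model_id="eleven_multilingual_v2", elevenlabs_voice_settings_stability=0.5)` returns a dict containing the required fields, the default `output_format`, and only the optional fields that were set.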

curl --request POST \
  --url 'https://api.unify.ai/v0/assistant/voice/generate' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "<text>",
    "provider": "<provider>",
    "voice_id": "<voice_id>"
  }'
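The same request can be issued from Python using only the standard library. This is a sketch under stated assumptions: the function names `make_tts_request` and `synthesize_to_file` are hypothetical, and the response body is assumed to be the raw audio bytes, since the page only says that `output_format` determines the response Content-Type.

```python
import json
import urllib.request

API_URL = "https://api.unify.ai/v0/assistant/voice/generate"


def make_tts_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(token: str, payload: dict, path: str) -> str:
    """Send the request, write the response body to path, return its Content-Type."""
    request = make_tts_request(token, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type", "")
        audio = response.read()  # assumption: the body is the audio itself
    with open(path, "wb") as audio_file:
        audio_file.write(audio)
    return content_type
```

A call might look like `synthesize_to_file(os.environ["UNIFY_KEY"], {"text": "Hello", "provider": "<provider>", "voice_id": "<voice_id>"}, "hello.mp3")`, with the placeholders replaced by real values and `os` imported by the caller.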
