Generate speech from text
Generates audio from text using the specified provider and voice.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
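As a minimal sketch of the header format described above, the helper below builds the header in Python. The function name `auth_headers` is hypothetical, and the header name `Authorization` is assumed from the standard Bearer scheme, since this page does not name it explicitly.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Return request headers carrying a Bearer token.

    `auth_headers` is a hypothetical helper for illustration; the
    `Authorization` header name follows the standard Bearer scheme.
    """
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.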
Body
Text to synthesize.
TTS provider.
Provider-specific voice ID for the speech.
Provider-specific model ID (e.g., ‘sonic-2’ for Cartesia, ‘eleven_multilingual_v2’ for ElevenLabs).
Desired audio output format. This will determine the Content-Type of the response.
Language code for Cartesia TTS (e.g., ‘en’). If None, Cartesia attempts auto-detection.
Optional sample rate in Hz for Cartesia (e.g., 24000, 44100). If None, the provider default is used.
Optional bit rate in bits per second for Cartesia lossy formats like MP3 (e.g., 128000). If None, the provider default is used. Does not apply to PCM.
Streaming latency optimization level for ElevenLabs, a value from 0 to 4.
Stability for ElevenLabs voice settings.
Similarity boost for ElevenLabs voice settings.
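The body fields above could be assembled along the following lines. This is only a sketch: the field names in this page were not preserved, so every key used here (`text`, `provider`, `voice_id`, `model_id`, `output_format`, `language`, `sample_rate`, `bit_rate`, `optimize_streaming_latency`, `stability`, `similarity_boost`) is a hypothetical stand-in, as are the provider strings and the `"pcm"` format value. The two checks mirror constraints stated in the descriptions: the latency setting ranges from 0 to 4, and bit rate does not apply to PCM.

```python
from typing import Any, Optional


def build_tts_body(
    text: str,
    provider: str,
    voice_id: str,
    model_id: Optional[str] = None,
    output_format: Optional[str] = None,
    language: Optional[str] = None,
    sample_rate: Optional[int] = None,
    bit_rate: Optional[int] = None,
    optimize_streaming_latency: Optional[int] = None,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble a request body, omitting options left as None.

    All key names are hypothetical; check them against the actual schema.
    """
    # The latency optimization level is documented as 0-4.
    if optimize_streaming_latency is not None:
        if not 0 <= optimize_streaming_latency <= 4:
            raise ValueError("streaming latency optimization must be 0-4")

    # Bit rate is documented for lossy formats such as MP3, not PCM.
    # The "pcm" prefix test assumes PCM formats are named that way.
    is_pcm = output_format is not None and output_format.lower().startswith("pcm")
    if bit_rate is not None and is_pcm:
        raise ValueError("bit rate does not apply to PCM output")

    fields = {
        "text": text,
        "provider": provider,
        "voice_id": voice_id,
        "model_id": model_id,
        "output_format": output_format,
        "language": language,
        "sample_rate": sample_rate,
        "bit_rate": bit_rate,
        "optimize_streaming_latency": optimize_streaming_latency,
        "stability": stability,
        "similarity_boost": similarity_boost,
    }
    # Drop unset options so provider defaults apply.
    return {key: value for key, value in fields.items() if value is not None}


body = build_tts_body(
    "Hello", "cartesia", "voice-123",
    model_id="sonic-2", language="en", sample_rate=24000,
)
```

Omitting unset options, rather than sending explicit nulls, is a design choice made here so that the provider defaults mentioned above take effect; whether the endpoint also accepts nulls is not stated on this page.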