Intermediate | Updated Dec 14, 2025

Text-to-Speech & Voice Cloning

Create natural-sounding speech and clone voices from audio samples for personalized voice synthesis.

Jordan Lee
Platform Engineer
8 min read

Introduction

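Text-to-speech (TTS) has improved dramatically: modern AI voices can be hard to distinguish from human speech, and voice cloning lets you create a custom voice from a short audio sample. This tutorial covers basic synthesis, voice selection, speed and SSML control, voice cloning, batch generation, and output formats.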

Basic Text-to-Speech

python
from abstrakt import AbstraktClient

client = AbstraktClient()

result = client.run("fal-ai/text-to-speech", {
    "input": {
        "text": "Hello! Welcome to Abstrakt, the AI platform for developers.",
        "voice": "alloy"
    }
})

print(f"Audio URL: {result.audio.url}")

Available Voices

Voice   | Style      | Best For
alloy   | Neutral    | General purpose
echo    | Male       | Narration
fable   | Expressive | Storytelling
onyx    | Deep       | Announcements
nova    | Female     | Assistants
shimmer | Warm       | Conversations

Voice Selection

python
# Each dict below is the "input" object of a text-to-speech request

# Professional narration
narration = {"voice": "onyx", "text": "Welcome to our quarterly report..."}

# Friendly assistant
assistant = {"voice": "nova", "text": "Hi there! How can I help you today?"}

# Storytelling
story = {"voice": "fable", "text": "Once upon a time, in a land far away..."}

Controlling Speech

Speed Control

python
{
    "input": {
        "text": "This is spoken at normal speed",
        "speed": 1.0  # 0.5 to 2.0
    }
}

# The fragments below show only the "speed" key, which belongs inside "input"

# Slower for clarity
{"speed": 0.8}

# Faster for efficiency
{"speed": 1.2}
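The 0.5 to 2.0 range noted above can be checked before a request is sent, so that an out-of-range value fails early on the client side. The following is a minimal sketch, not part of the Abstrakt SDK: `build_tts_input` is a hypothetical helper, and only the "text", "voice", and "speed" keys are taken from the examples in this tutorial.

```python
def build_tts_input(text, voice="alloy", speed=1.0):
    """Build a request payload, rejecting speeds outside the documented range.

    Hypothetical helper: the key names mirror the tutorial's examples,
    but the function itself is an illustration, not an SDK call.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    return {"input": {"text": text, "voice": voice, "speed": speed}}


payload = build_tts_input("Please read this slowly.", voice="nova", speed=0.8)
```

The resulting `payload` has the same shape as the dictionaries passed to `client.run` earlier in the tutorial.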

SSML Support

python
{
    "input": {
        "text": '''
            <speak>
                <p>This is a paragraph with a <break time="500ms"/> pause.</p>
                <p>This word is <emphasis level="strong">emphasized</emphasis>.</p>
            </speak>
        ''',
        "format": "ssml"
    }
}
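When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping or the markup becomes invalid. A minimal sketch of one way to do this with the standard library follows; `to_ssml` is a hypothetical helper, and the assumption that the service accepts exactly this tag layout comes only from the example above.

```python
from xml.sax.saxutils import escape


def to_ssml(paragraphs, pause_ms=500):
    """Wrap plain-text paragraphs in <speak>/<p> tags with a pause after each."""
    parts = ["<speak>"]
    for paragraph in paragraphs:
        # escape() converts &, <, and > so user text cannot break the markup
        parts.append(f'<p>{escape(paragraph)}<break time="{pause_ms}ms"/></p>')
    parts.append("</speak>")
    return "".join(parts)


ssml = to_ssml(["Profit & loss rose 5%.", "Margins were < 10%."])
payload = {"input": {"text": ssml, "format": "ssml"}}
```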

Voice Cloning

Clone a voice from an audio sample:

python
# Step 1: Create a cloned voice from a reference audio URL
clone_result = client.run("fal-ai/voice-clone", {
    "input": {
        "reference_audio_url": "https://example.com/voice-sample.mp3",
        "name": "custom_voice"
    }
})

voice_id = clone_result.voice_id

# Step 2: Synthesize speech with the cloned voice (pass voice_id instead of voice)
speech = client.run("fal-ai/text-to-speech", {
    "input": {
        "text": "This is my cloned voice speaking!",
        "voice_id": voice_id
    }
})

Voice Cloning Requirements

  • Audio length: 10-30 seconds ideal
  • Quality: Clear, minimal background noise
  • Format: MP3, WAV, M4A
  • Content: Natural speech, not singing
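Some of the requirements above can be checked locally before a sample is submitted. The sketch below is an illustration under stated assumptions: `check_sample` is a hypothetical helper, the 10 to 30 second window is treated as a warning threshold rather than a hard limit, and duration is only measured for WAV files because the Python standard library cannot read MP3 or M4A durations.

```python
import wave
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_sample(path, min_seconds=10.0, max_seconds=30.0):
    """Return a list of warnings about a reference sample (empty if none)."""
    warnings = []
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        warnings.append(f"unsupported format: {suffix or 'no extension'}")
    elif suffix == ".wav":
        # Duration = number of frames / frames per second
        with wave.open(str(path), "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not min_seconds <= seconds <= max_seconds:
            warnings.append(
                f"duration {seconds:.1f}s is outside {min_seconds:.0f}-{max_seconds:.0f}s"
            )
    return warnings
```

Background noise and speech content still need a manual listen; this check only covers format and length.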

Batch Audio Generation

python
async def generate_audiobook(chapters):
    """Generate one audio file per chapter, in order, and return their URLs."""
    audio_urls = []

    for chapter in chapters:
        result = await client.run_async("fal-ai/text-to-speech", {
            "input": {
                "text": chapter["content"],
                "voice": "fable"
            }
        })
        audio_urls.append({
            "title": chapter["title"],
            "url": result.audio.url
        })
    
    return audio_urls
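The loop above awaits each chapter before starting the next. If the service allows several requests in flight at once (an assumption, not something this tutorial confirms), the chapters can be generated concurrently with a cap on parallelism. In this sketch `FakeClient` is a hypothetical stand-in so the code runs offline; only the `run_async` call shape and the `result.audio.url` attribute come from the example above.

```python
import asyncio
from types import SimpleNamespace


class FakeClient:
    """Offline stand-in for AbstraktClient (hypothetical, for illustration)."""

    async def run_async(self, model, payload):
        await asyncio.sleep(0.01)  # simulate network latency
        slug = payload["input"]["text"][:8].replace(" ", "-")
        audio = SimpleNamespace(url=f"https://example.com/{slug}.mp3")
        return SimpleNamespace(audio=audio)


async def generate_audiobook_concurrently(client, chapters, max_concurrent=3):
    """Generate all chapters, with at most max_concurrent requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def generate(chapter):
        async with semaphore:
            result = await client.run_async("fal-ai/text-to-speech", {
                "input": {"text": chapter["content"], "voice": "fable"}
            })
        return {"title": chapter["title"], "url": result.audio.url}

    # gather() returns results in the same order as the input chapters
    return await asyncio.gather(*(generate(c) for c in chapters))


chapters = [
    {"title": "Chapter 1", "content": "It began at dawn."},
    {"title": "Chapter 2", "content": "The road was long."},
]
audio = asyncio.run(generate_audiobook_concurrently(FakeClient(), chapters))
```

The semaphore keeps the request rate bounded; the value 3 is arbitrary and would need tuning against whatever rate limits apply.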

Output Formats

python
# MP3 (default, compressed, smaller files)
{"output_format": "mp3"}

# WAV (uncompressed, higher quality, larger files)
{"output_format": "wav"}

# OGG (compressed, suited to web playback)
{"output_format": "ogg"}

Best Practices

  1. Keep segments short: break long text into paragraphs
  2. Use punctuation: it helps with natural pacing
  3. Test voices: different voices suit different content
  4. Use quality samples: better input gives a better clone
  5. Review output: AI can mispronounce unusual words
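The first practice above, breaking long text into paragraphs, can be automated before the text is sent for synthesis. This is a minimal sketch: `split_into_segments` is a hypothetical helper, and the 500-character default is an arbitrary assumption rather than a documented limit.

```python
def split_into_segments(text, max_chars=500):
    """Split text on blank lines, packing paragraphs into segments up to max_chars."""
    segments, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                segments.append(current)
            # A single paragraph longer than max_chars is kept whole
            current = paragraph
    if current:
        segments.append(current)
    return segments


text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
segments = split_into_segments(text, max_chars=40)
```

Each segment can then be sent as its own request, in the same way the batch example sends one request per chapter.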
