Tutorials - Abstrakt

Introduction

Text-to-speech (TTS) has evolved dramatically. Modern AI voices can be hard to distinguish from human speakers, and voice cloning lets you create a custom voice from a short audio sample.

Basic Text-to-Speech

python

from abstrakt import AbstraktClient

client = AbstraktClient()

result = client.run("fal-ai/text-to-speech", {
    "input": {
        "text": "Hello! Welcome to Abstrakt, the AI platform for developers.",
        "voice": "alloy"
    }
})

print(f"Audio URL: {result.audio.url}")

Available Voices

Voice	Style	Best For
alloy	Neutral	General purpose
echo	Male	Narration
fable	Expressive	Storytelling
onyx	Deep	Announcements
nova	Female	Assistants
shimmer	Warm	Conversations

Voice Selection

python

# Each dict below holds the fields that go inside "input" when calling client.run().

# Professional narration
narration = {"voice": "onyx", "text": "Welcome to our quarterly report..."}

# Friendly assistant
assistant = {"voice": "nova", "text": "Hi there! How can I help you today?"}

# Storytelling
story = {"voice": "fable", "text": "Once upon a time, in a land far away..."}
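The voice table can also be expressed as a small lookup, so that calling code chooses a voice by use case instead of hard-coding a name. This is a minimal sketch: `VOICE_FOR_USE_CASE`, `pick_voice`, and `build_input` are hypothetical helpers, not part of the Abstrakt SDK. Only the voice names and the "Best For" labels come from the table.

```python
# Hypothetical lookup built from the "Best For" column of the voice table.
VOICE_FOR_USE_CASE = {
    "general purpose": "alloy",
    "narration": "echo",
    "storytelling": "fable",
    "announcements": "onyx",
    "assistants": "nova",
    "conversations": "shimmer",
}


def pick_voice(use_case: str, default: str = "alloy") -> str:
    """Return the voice listed for a use case, or the default if none matches."""
    return VOICE_FOR_USE_CASE.get(use_case.strip().lower(), default)


def build_input(text: str, use_case: str) -> dict:
    """Build a payload in the same {"input": {...}} shape the examples use."""
    return {"input": {"text": text, "voice": pick_voice(use_case)}}
```

The result of `build_input(...)` can be passed as the second argument to `client.run("fal-ai/text-to-speech", ...)`. Unknown use cases fall back to `alloy`, the general-purpose voice.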

Controlling Speech

Speed Control

python

{
    "input": {
        "text": "This is spoken at normal speed",
        "speed": 1.0  # 0.5 to 2.0
    }
}

# Slower for clarity
{"speed": 0.8}

# Faster for efficiency
{"speed": 1.2}
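If the speed value comes from user input, it may be worth keeping it inside the documented 0.5 to 2.0 range before sending the request. The sketch below assumes that clamping locally is acceptable. How the service itself treats out-of-range values is not stated here, and `clamp_speed` and `speech_input` are hypothetical helper names.

```python
MIN_SPEED = 0.5  # lower bound from the "0.5 to 2.0" comment in the example
MAX_SPEED = 2.0  # upper bound from the same comment


def clamp_speed(speed: float) -> float:
    """Limit a requested speed to the documented range."""
    return max(MIN_SPEED, min(MAX_SPEED, float(speed)))


def speech_input(text: str, speed: float = 1.0) -> dict:
    """Build an {"input": {...}} payload with a range-checked speed."""
    return {"input": {"text": text, "speed": clamp_speed(speed)}}
```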

SSML Support

python

{
    "input": {
        "text": '''
            <speak>
                <p>This is a paragraph with a <break time="500ms"/> pause.</p>
                <p>This word is <emphasis level="strong">emphasized</emphasis>.</p>
            </speak>
        ''',
        "format": "ssml"
    }
}
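When SSML is assembled from arbitrary text, characters such as `&` and `<` need escaping or the markup stops being well-formed. Here is a minimal sketch using the standard library's `xml.sax.saxutils.escape`. The helper names `ssml_paragraphs` and `ssml_input` are hypothetical, and the payload shape simply mirrors the SSML example.

```python
from xml.sax.saxutils import escape


def ssml_paragraphs(paragraphs: list[str]) -> str:
    """Wrap each paragraph in <p> inside a single <speak> root, escaping text."""
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<speak>{body}</speak>"


def ssml_input(paragraphs: list[str]) -> dict:
    """Build an {"input": {...}} payload with "format" set to "ssml"."""
    return {"input": {"text": ssml_paragraphs(paragraphs), "format": "ssml"}}
```

Because the text is escaped, this helper produces plain paragraphs only. Tags such as `<break>` or `<emphasis>` would have to be added outside the escaped text.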

Voice Cloning

Clone a voice from an audio sample:

python

# Step 1: Create a cloned voice from a reference audio URL
clone_result = client.run("fal-ai/voice-clone", {
    "input": {
        "reference_audio_url": "https://example.com/voice-sample.mp3",
        "name": "custom_voice"
    }
})

voice_id = clone_result.voice_id

# Step 2: Use cloned voice
speech = client.run("fal-ai/text-to-speech", {
    "input": {
        "text": "This is my cloned voice speaking!",
        "voice_id": voice_id
    }
})

Voice Cloning Requirements

Audio length: 10-30 seconds ideal
Quality: Clear, minimal background noise
Format: MP3, WAV, M4A
Content: Natural speech, not singing
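Some of these requirements can be checked locally before a sample is submitted. The sketch below checks the file extension against the listed formats and, for WAV files only, measures the duration with the standard library's `wave` module. MP3 and M4A durations would need a third-party library, so they are not measured here. `check_sample` and `wav_duration_seconds` are hypothetical helpers, and audio quality and content cannot be judged this way.

```python
import wave
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # formats listed in the requirements
IDEAL_SECONDS = (10.0, 30.0)                   # ideal length listed in the requirements


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str) -> list[str]:
    """Return a list of warnings; an empty list means no problem was detected."""
    warnings = []
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        warnings.append(f"unsupported format: {suffix or 'none'}")
    elif suffix == ".wav":
        seconds = wav_duration_seconds(path)
        low, high = IDEAL_SECONDS
        if not low <= seconds <= high:
            warnings.append(
                f"duration {seconds:.1f}s is outside the ideal {low:.0f}-{high:.0f}s range"
            )
    return warnings
```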

Batch Audio Generation

python

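async def generate_audiobook(chapters):
    """Generate audio for each chapter in order, one request at a time."""
    audio_urls = []

    for chapter in chapters:
        # Uses the client created in the Basic Text-to-Speech example
        result = await client.run_async("fal-ai/text-to-speech", {
            "input": {
                "text": chapter["content"],
                "voice": "fable"
            }
        })
        # Keep the chapter title next to its audio URL
        audio_urls.append({
            "title": chapter["title"],
            "url": result.audio.url
        })

    return audio_urls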
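The loop above awaits each chapter before starting the next. If concurrent requests are acceptable for your account, several chapters can be in flight at once with `asyncio.gather`, using a semaphore to cap concurrency. This is a sketch under assumptions: `fake_synthesize` is a stand-in so the example runs without network access, and the limit of 3 is arbitrary. In real use you would pass a coroutine that wraps `client.run_async` and returns the audio URL.

```python
import asyncio


async def fake_synthesize(text: str, voice: str) -> str:
    """Stand-in for a real TTS request; returns a placeholder URL."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"https://example.com/audio/{voice}-{len(text)}.mp3"


async def generate_audiobook_concurrently(chapters, synthesize=fake_synthesize,
                                          max_concurrent=3):
    """Generate audio for all chapters, with at most max_concurrent in flight."""
    limit = asyncio.Semaphore(max_concurrent)

    async def one(chapter):
        async with limit:  # wait for a free slot before sending the request
            url = await synthesize(chapter["content"], "fable")
        return {"title": chapter["title"], "url": url}

    # gather returns results in the same order as the input chapters
    return await asyncio.gather(*(one(chapter) for chapter in chapters))
```

Run it with `asyncio.run(generate_audiobook_concurrently(chapters))`. Results come back in chapter order even when requests finish out of order.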

Output Formats

python

# MP3 (default, smaller files)
{"output_format": "mp3"}

# WAV (higher quality)
{"output_format": "wav"}

# OGG (web optimized)
{"output_format": "ogg"}

Best Practices

Keep segments short - Break long text into paragraphs
Use punctuation - Helps with natural pacing
Test voices - Different voices suit different content
Use quality samples - Cleaner reference audio produces a better clone
Review output - AI voices can mispronounce unusual words
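The first practice, keeping segments short, can be automated with a small splitter. The sketch below splits on blank lines, then packs whole sentences into segments up to a character limit. The 500-character default is an arbitrary assumption, not a documented limit, and `split_into_segments` is a hypothetical helper. A single sentence longer than the limit is kept whole.

```python
import re


def split_into_segments(text: str, max_chars: int = 500) -> list[str]:
    """Split text into paragraph-bounded segments of at most max_chars where possible."""
    segments = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        current = ""
        # Split after sentence-ending punctuation followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                segments.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            segments.append(current)
    return segments
```

Each returned segment can then be sent as its own text-to-speech request.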

Next Steps

Create AI music
Set up webhooks for long audio
Learn batch processing
