Tutorials - Abstrakt

Introduction

Voice cloning enables consistent, scalable audio narration. Whether you're producing a podcast, audiobook, or educational content, AI voices can dramatically reduce production time and costs.

:::warning Ethical Considerations
Always obtain consent before cloning someone's voice. Never use voice cloning for deception or impersonation.
:::

Voice Cloning Fundamentals

How It Works

1. Voice Sample - Provide at least 30-60 seconds of clean audio (1-3 minutes is recommended under Recording Tips for Voice Samples)
2. Voice Profile - AI creates a voice model from the sample
3. Text-to-Speech - Generate new speech using the cloned voice

Available Models

Model	Quality	Speed	Best For
PlayHT 3.0	Excellent	Medium	Audiobooks
ElevenLabs	Excellent	Fast	Real-time
F5 TTS	Good	Very Fast	Drafts

Setting Up Voice Cloning

Create a Voice Profile

javascript

// lib/voice-cloning.js
const ABSTRAKT_API_URL = 'https://api.abstrakt.one/v1';

export async function createVoiceProfile(name, audioUrl) {
  const response = await fetch(`${ABSTRAKT_API_URL}/voices/create`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.ABSTRAKT_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      name,
      sample_url: audioUrl,
      description: 'Custom voice for audiobook narration',
    }),
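  });

  // fetch() only rejects on network failure, so surface HTTP errors explicitly
  if (!response.ok) {
    throw new Error(`Voice profile creation failed with HTTP ${response.status}`);
  }

  return response.json(); // { voice_id: 'voice_xxxxx' }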
}

Recording Tips for Voice Samples

Environment: Quiet room, no echo
Equipment: Good microphone, pop filter
Content: Read varied content (questions, statements, emotions)
Duration: 1-3 minutes of clean audio
Quality: 44.1kHz, 16-bit minimum
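
The duration and quality tips above can be checked automatically before a sample is uploaded. The following is a minimal sketch, assuming the sample is an uncompressed WAV file; the function name `checkWavSample` and its return shape are illustrative and not part of the Abstrakt API. It reads the `fmt ` and `data` chunks of the RIFF header and compares them with the thresholds from the list.

```javascript
// Sketch: validate a WAV sample against the recording tips (44.1 kHz, 16-bit, 1-3 minutes).
// `bytes` is a Uint8Array holding at least the file header (a Node Buffer also works).
function checkWavSample(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);

  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  let sampleRate;
  let byteRate;
  let bitsPerSample;
  let dataBytes;

  // Walk the chunk list: 4-byte id, 4-byte little-endian size, then the payload
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === 'fmt ') {
      sampleRate = view.getUint32(offset + 12, true);
      byteRate = view.getUint32(offset + 16, true);
      bitsPerSample = view.getUint16(offset + 22, true);
    } else if (id === 'data') {
      dataBytes = size;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }

  if (sampleRate === undefined || dataBytes === undefined || !byteRate) {
    throw new Error('Missing fmt or data chunk');
  }

  const durationSeconds = dataBytes / byteRate;
  const issues = [];
  if (sampleRate < 44100) issues.push('sample rate below 44.1 kHz');
  if (bitsPerSample < 16) issues.push('bit depth below 16-bit');
  if (durationSeconds < 60) issues.push('shorter than 1 minute');
  if (durationSeconds > 180) issues.push('longer than 3 minutes');

  return { sampleRate, bitsPerSample, durationSeconds, issues };
}
```

In Node, passing the result of `readFile('sample.wav')` works because a Buffer is a Uint8Array. An empty `issues` array only means the header matches the tips; it says nothing about background noise or echo, which still need a listening check.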

Sample Recording Script

text

[Neutral tone]
Welcome to our audiobook. Today we'll explore the fascinating world of technology.

[Excited tone]  
This is incredible! The results exceeded all our expectations.

[Thoughtful tone]
But we must ask ourselves, what does this mean for the future?

[Question]
Have you ever wondered why the sky is blue?

[Emphasis]
The MOST important thing to remember is this.

Generating Audio Content

Basic Text-to-Speech

javascript

export async function generateSpeech(text, voiceId, options = {}) {
  const {
    speed = 1.0,
    stability = 0.5,
    style = 0.5,
  } = options;
  
  const response = await fetch(`${ABSTRAKT_API_URL}/models/playht-v3/run`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.ABSTRAKT_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: {
        text,
        voice_id: voiceId,
        speed,
        stability,
        style_exaggeration: style,
        output_format: 'mp3',
      },
    }),
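  });

  // fetch() only rejects on network failure, so surface HTTP errors explicitly
  if (!response.ok) {
    throw new Error(`Speech generation failed with HTTP ${response.status}`);
  }

  return response.json();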
}

Audiobook Production Pipeline

Chapter Processing

javascript

// lib/audiobook-generator.js
import { generateSpeech } from './voice-cloning.js';

export class AudiobookGenerator {
  constructor(voiceId, options = {}) {
    this.voiceId = voiceId;
    this.options = options;
    this.chapters = [];
  }
  
  async processChapter(chapter) {
    const { title, content } = chapter;
    const segments = this.splitIntoSegments(content);
    
    console.log(`Processing chapter: ${title} (${segments.length} segments)`);
    
    const audioSegments = [];
    
    for (let i = 0; i < segments.length; i++) {
      const result = await generateSpeech(
        segments[i],
        this.voiceId,
        this.options
      );
      
      audioSegments.push(result.audio_url);
      console.log(`  Segment ${i + 1}/${segments.length} complete`);
      
      // Rate limiting
      await this.delay(500);
    }
    
    return {
      title,
      audioSegments,
      totalSegments: segments.length,
    };
  }
  
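  splitIntoSegments(text, maxLength = 1000) {
    // Pack whole paragraphs into segments of at most maxLength characters.
    // A single paragraph longer than maxLength is kept intact as its own segment.
    const paragraphs = text.split(/\n\n+/).map(p => p.trim()).filter(Boolean);
    const segments = [];
    let currentSegment = '';

    for (const para of paragraphs) {
      // +2 accounts for the '\n\n' separator added when joining paragraphs
      const joinedLength = currentSegment
        ? currentSegment.length + 2 + para.length
        : para.length;

      if (currentSegment && joinedLength > maxLength) {
        segments.push(currentSegment);
        currentSegment = para;
      } else {
        currentSegment += (currentSegment ? '\n\n' : '') + para;
      }
    }

    if (currentSegment) segments.push(currentSegment);
    return segments;
  }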
  
  delay(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
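
The fixed 500 ms delay in `processChapter` spaces requests out, but one failed request still aborts the whole chapter. One way to make the loop more tolerant is to retry each call with exponential backoff. The sketch below is an assumption layered on top of the tutorial code: `withRetry`, `attempts`, and `baseDelayMs` are illustrative names and defaults, not documented API limits.

```javascript
// Sketch: retry an async call, doubling the wait after each failed attempt.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Waits baseDelayMs, then 2x, then 4x, ... between attempts
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller
  throw lastError;
}
```

Inside the loop, the call would become `await withRetry(() => generateSpeech(segments[i], this.voiceId, this.options))`. This only helps if the wrapped function rejects on failure, so it should throw when the HTTP response is not OK.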

SSML for Advanced Control

javascript

// Add pauses, emphasis, and pronunciation control
function addSSMLMarkup(text) {
  return text
    // Emphasis on quoted text. This runs before any tags are inserted so the
    // quote pattern cannot match the double quotes inside tag attributes.
    .replace(/"([^"]+)"/g, '<emphasis level="moderate">"$1"</emphasis>')
    // Slow down for important terms marked with **double asterisks**
    .replace(/\*\*([^*]+)\*\*/g, '<prosody rate="slow">$1</prosody>')
    // Add pauses after sentences
    .replace(/\.\s/g, '.<break time="500ms"/> ')
    // Add pauses for commas
    .replace(/,\s/g, ',<break time="200ms"/> ');
}

// Usage
const ssmlText = addSSMLMarkup(
  'The experiment was a success. "We did it!" she exclaimed. **This changes everything.**'
);
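
SSML is XML, so source text that contains `&`, `<`, or `>` has to be escaped before tags are added, and a complete SSML document is wrapped in a `<speak>` root element. The sketch below shows one way to do both. The helper names `escapeXml` and `toSSMLDocument` are illustrative, and whether a given model accepts a full `<speak>` document or only inline tags is an assumption to check against the provider's documentation.

```javascript
// Sketch: escape XML-reserved characters, then wrap marked-up text in <speak>.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// `markup` is any function that adds SSML tags to already-escaped text.
// It defaults to the identity function, which adds no tags.
function toSSMLDocument(text, markup = (escaped) => escaped) {
  return `<speak>${markup(escapeXml(text))}</speak>`;
}

console.log(toSSMLDocument('Q&A: is 3 < 5?'));
// prints "<speak>Q&amp;A: is 3 &lt; 5?</speak>"
```

Passing `addSSMLMarkup` as the second argument applies the pause and emphasis rules to the escaped text. Quotes are left unescaped on purpose, because they are legal in XML text content and the emphasis rule relies on them.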

Multi-Voice Podcasts

Character Voice Management

javascript

// lib/podcast-generator.js
import { generateSpeech } from './voice-cloning.js';

export class PodcastGenerator {
  constructor() {
    this.voices = new Map();
  }

  addVoice(characterName, voiceId, settings = {}) {
    this.voices.set(characterName, {
      voiceId,
      // Defaults first; any values passed in settings override them
      settings: { speed: 1.0, stability: 0.5, ...settings },
    });
  }
  
  async generateDialogue(script) {
    // Script format: [{speaker: 'Host', text: '...'}, ...]
    const audioClips = [];
    
    for (const line of script) {
      const voice = this.voices.get(line.speaker);
      if (!voice) {
        throw new Error(`Unknown speaker: ${line.speaker}`);
      }
      
      const audio = await generateSpeech(
        line.text,
        voice.voiceId,
        voice.settings
      );
      
      audioClips.push({
        speaker: line.speaker,
        text: line.text,
        audio: audio.audio_url,
      });
    }
    
    return audioClips;
  }
}

// Usage
const podcast = new PodcastGenerator();
podcast.addVoice('Host', 'voice_host_123', { speed: 1.0 });
podcast.addVoice('Guest', 'voice_guest_456', { speed: 0.95 });

const script = [
  { speaker: 'Host', text: 'Welcome to Tech Talk. Today we have a special guest.' },
  { speaker: 'Guest', text: 'Thanks for having me! Excited to be here.' },
  { speaker: 'Host', text: 'Let\'s dive right in. Tell us about your latest project.' },
];

const audio = await podcast.generateDialogue(script);
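
Scripts are usually written as plain text rather than as arrays of objects. The sketch below converts `Speaker: text` lines into the script format that `generateDialogue` expects. The line format and the name `parseScript` are assumptions made for illustration.

```javascript
// Sketch: turn "Speaker: text" lines into [{ speaker, text }, ...].
function parseScript(source) {
  const script = [];
  for (const rawLine of source.split('\n')) {
    const line = rawLine.trim();
    if (!line) continue; // skip blank lines between turns

    // Everything before the first colon is the speaker; the rest is the line
    const match = line.match(/^([^:]+):\s*(.+)$/);
    if (!match) {
      throw new Error(`Expected "Speaker: text", got: ${line}`);
    }
    script.push({ speaker: match[1].trim(), text: match[2] });
  }
  return script;
}
```

Only the first colon separates speaker from text, so a line such as `Guest: Thanks: glad to be here.` keeps its second colon in the spoken text. Speaker names still have to match the names registered with `addVoice`, otherwise `generateDialogue` throws.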

Quality Control

Audio Post-Processing

javascript

// lib/audio-processing.js
import ffmpeg from 'fluent-ffmpeg';

export function normalizeAudio(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
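      .audioFilters([
        // Filters run in order: clean up the spectrum first so that
        // loudness is measured on the filtered signal
        'highpass=f=80',                    // Remove low rumble
        'lowpass=f=12000',                  // Remove high hiss
        'loudnorm=I=-16:TP=-1.5:LRA=11',    // Loudness normalization
      ])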
      .output(outputPath)
      .on('end', resolve)
      .on('error', reject)
      .run();
  });
}

export function concatenateAudio(inputFiles, outputPath) {
  return new Promise((resolve, reject) => {
    const command = ffmpeg();
    
    inputFiles.forEach(file => command.input(file));
    
    command
      .on('end', resolve)
      .on('error', reject)
      .mergeToFile(outputPath);
  });
}
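
When every segment shares the same codec and settings, ffmpeg's concat demuxer can join the files without re-encoding. It reads a text file that lists one input per line. The sketch below only builds that list; `buildConcatList` is an illustrative helper, not part of fluent-ffmpeg.

```javascript
// Sketch: build the input list for ffmpeg's concat demuxer.
// Each line has the form: file 'path'
function buildConcatList(files) {
  return files
    // Inside single quotes, a literal quote is written as '\''
    .map(file => `file '${file.replace(/'/g, "'\\''")}'`)
    .join('\n') + '\n';
}
```

After writing the result to a file such as `list.txt`, the segments can be joined with `ffmpeg -f concat -safe 0 -i list.txt -c copy chapter.mp3`. Relative paths in the list are resolved against the directory of the list file.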

Quality Checklist

[ ] No audio artifacts or glitches
[ ] Consistent volume levels
[ ] Natural pacing and pauses
[ ] Correct pronunciation
[ ] Appropriate emotion/tone

Production Workflow

text

1. Script Preparation
   └── Format text, add SSML markup
   
2. Voice Generation
   └── Generate segments with rate limiting
   
3. Quality Review
   └── Listen, flag issues, regenerate if needed
   
4. Post-Processing
   └── Normalize, concatenate, add music/effects
   
5. Export
   └── Multiple formats (MP3, WAV, M4A)

Cost Estimation

Content Type	Length	Segments	Est. Cost
Podcast Episode	30 min	~60	~$6
Audiobook Chapter	45 min	~90	~$9
Full Audiobook	10 hours	~1200	~$120
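
The rows above follow one pattern: about two segments per minute of audio and about $0.10 per segment. The sketch below turns that into a small estimator. Both constants are read off the table rather than taken from a published price list, so treat them as assumptions.

```javascript
// Rates implied by the table (assumptions, not published pricing)
const SEGMENTS_PER_MINUTE = 2;     // 30 min is listed as about 60 segments
const COST_PER_SEGMENT_CENTS = 10; // 60 segments are listed as about $6

function estimateCost(minutes) {
  const segments = Math.ceil(minutes * SEGMENTS_PER_MINUTE);
  // Work in whole cents to avoid floating-point drift, then convert to dollars
  const costUsd = (segments * COST_PER_SEGMENT_CENTS) / 100;
  return { segments, costUsd };
}

console.log(estimateCost(45));  // { segments: 90, costUsd: 9 }
console.log(estimateCost(600)); // { segments: 1200, costUsd: 120 }
```

Actual segment counts depend on how `splitIntoSegments` divides the text, so the estimate is a planning figure only.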

Complete Example

javascript

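// generate-audiobook.js
import 'dotenv/config';
import { mkdir, writeFile } from 'node:fs/promises';
import { AudiobookGenerator } from './lib/audiobook-generator.js';
import { concatenateAudio, normalizeAudio } from './lib/audio-processing.js';

// Assumed helper (not part of the library code above): saves each segment URL
// into dir and returns the local file paths in order.
async function downloadAll(urls, dir) {
  await mkdir(dir, { recursive: true });
  const files = [];
  for (const [i, url] of urls.entries()) {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`Download failed (${response.status}): ${url}`);
    const file = `${dir}/segment-${String(i + 1).padStart(4, '0')}.mp3`;
    await writeFile(file, Buffer.from(await response.arrayBuffer()));
    files.push(file);
  }
  return files;
}

async function main() {
  const generator = new AudiobookGenerator('voice_narrator_123', {
    speed: 0.95,
    stability: 0.7,
  });

  const chapters = [
    { title: 'Chapter 1: The Beginning', content: '...' },
    { title: 'Chapter 2: The Journey', content: '...' },
  ];

  for (const chapter of chapters) {
    const result = await generator.processChapter(chapter);

    // Build a filesystem-safe name (titles contain ':' and spaces)
    const safeTitle = chapter.title.replace(/[^a-z0-9]+/gi, '-').replace(/^-|-$/g, '');

    // Download and concatenate segments
    const audioFiles = await downloadAll(result.audioSegments, `output/segments/${safeTitle}`);
    const chapterFile = `output/${safeTitle}.mp3`;

    await concatenateAudio(audioFiles, chapterFile);
    await normalizeAudio(chapterFile, chapterFile.replace('.mp3', '_normalized.mp3'));

    console.log(`Completed: ${chapter.title}`);
  }
}

main().catch(console.error);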

Next Steps

Explore emotion and style controls
Add background music integration
Build a review/approval workflow
Create multi-language versions
