Create Long-Form AI Voiceovers: Integrate ElevenLabs API for

1️⃣ Introduction

AI-powered Text-to-Speech (TTS) makes it easy to create natural voiceovers from text. The ElevenLabs API provides high-quality voices, supports multiple languages, and can handle long text when it is sent in parts, making it a good fit for videos or audiobooks lasting hours.

2️⃣ Python Environment Setup

Install the required libraries:

!pip install elevenlabs

!pip install pydub

elevenlabs: to access the ElevenLabs TTS API.

pydub: for audio processing, merging, and exporting MP3 files.

3️⃣ Preparing the Text

Save your content in a text file (1.txt).
It can be thousands of words long.
Split the text into manageable parts for TTS:

Your Config:

INPUT_FILE = "./sample_data/1.txt"

API_KEY="YOUR_ELEVENLABS_API_KEY" # paste your own key here and never publish it

VOICE_ID="21m00Tcm4TlvDq8ikWAM" # premade voice (Rachel)

MODEL_ID="eleven_multilingual_v2" # multilingual model

MAX_SENTENCE_LENGTH=50 # maximum characters per part

START_PART=0

BATCH_SIZE=3
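Keeping the key directly in the config is convenient on Colab, but it is easy to leak when the notebook is shared. One possible alternative is to read it from an environment variable. The sketch below is only an illustration: the helper name load_api_key and the variable name ELEVENLABS_API_KEY are assumptions, not something the ElevenLabs library requires.

```python
import os


def load_api_key(env_var="ELEVENLABS_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the config line could become API_KEY = load_api_key().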

Split the text into parts:

def split_text(text, max_length=MAX_SENTENCE_LENGTH):

    # Cut the text into consecutive slices of at most max_length characters.

    return [text[i:i + max_length] for i in range(0, len(text), max_length)]

with open(INPUT_FILE, "r", encoding="utf-8") as file:

    text = file.read()

# Stop early if the input file is empty.

if not text.strip():

    raise SystemExit("Data is empty.")

parts = split_text(text)

print(f"Divided into {len(parts)} parts.")
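The split above cuts at a fixed character count, so a part can end in the middle of a word, which may sound unnatural when read aloud. A possible alternative is to group whole sentences into each part. The following is a minimal sketch under stated assumptions: the function name split_text_by_sentence and the default of 200 characters are made up for illustration, and the sentence detection is a rough regular-expression heuristic.

```python
import re


def split_text_by_sentence(text, max_length=200):
    """Group whole sentences into chunks of at most max_length characters."""
    # Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_length:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_length is cut at fixed offsets.
        while len(sentence) > max_length:
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

It can be swapped in for split_text, for example parts = split_text_by_sentence(text).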

4️⃣ Convert Text to Speech with ElevenLabs API

Set up your ElevenLabs client and generate audio for each part:

import os

from elevenlabs import ElevenLabs, save

from pydub import AudioSegment

client = ElevenLabs(api_key=API_KEY)

output_files = []

for i in range(len(parts)):

    # Only process the parts that belong to the current batch.

    if START_PART <= i < START_PART + BATCH_SIZE:

        filename = f"output_part_{i}.mp3"

        if os.path.exists(filename):  # reuse audio generated by an earlier run

            print(f"Already exists: {filename}, skipping...")

            output_files.append(filename)

        else:

            print(f"Processing part {i + 1}/{len(parts)}...")

            audio = client.text_to_speech.convert(

                voice_id=VOICE_ID,

                model_id=MODEL_ID,

                text=parts[i],  # send only this part, not the whole text

            )

            save(audio, filename)

            output_files.append(filename)
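Network calls can fail partway through a long job. Because existing files are skipped, rerunning the script already resumes where it stopped, but a small retry wrapper can save some manual reruns. The sketch below is an assumption-laden illustration: with_retries is a hypothetical helper, it retries on any exception, and the delays are arbitrary.

```python
import time


def with_retries(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func() and retry on any exception, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.0f}s...")
            sleep(delay)
```

To use it, put the convert call and the save call for one part into a small function and pass that function to with_retries. Wrapping both steps together is the safer choice, since the audio may only be downloaded while it is being saved.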

5️⃣ Merge Audio Parts

Use pydub to combine all parts into a single long MP3:

# Merge the generated parts with pydub.

combined_audio = AudioSegment.empty()

for file in output_files:

    audio = AudioSegment.from_file(file)

    combined_audio += audio

# Export everything as a single file.

final_filename = "output_combined30.mp3"

combined_audio.export(final_filename, format="mp3")

print(f"Audio file saved: {final_filename}")
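Note that output_files only lists the parts of the current batch, so the merge above covers just that batch. If the parts were generated over several runs, one option is to collect every part file from disk in numeric order before merging. This is a sketch, assuming the output_part_<n>.mp3 naming used earlier; find_part_files is a hypothetical helper.

```python
import re
from pathlib import Path

PART_PATTERN = re.compile(r"output_part_(\d+)\.mp3$")


def find_part_files(directory="."):
    """Return output_part_<n>.mp3 paths in the directory, sorted by n."""
    found = []
    for path in Path(directory).iterdir():
        match = PART_PATTERN.match(path.name)
        if match:
            # Sort by the integer index so part 10 comes after part 2.
            found.append((int(match.group(1)), str(path)))
    return [name for _, name in sorted(found)]
```

Setting output_files = find_part_files() before the merge loop would then combine all parts found in the working directory.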

6️⃣ Notes & Tips

Long Form Support: By splitting text into chunks, you can generate TTS for content lasting hours.
Custom Voices: ElevenLabs supports multiple voices; change VOICE_ID to try different ones.
Multilingual: MODEL_ID="eleven_multilingual_v2" allows you to read text in various languages.
File Size: For very long content, consider generating in batches to avoid memory issues.
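To make the batching tip concrete, the following sketch lists the index ranges that successive runs would cover. The helper batch_ranges is hypothetical and only illustrates how START_PART could advance by BATCH_SIZE from one run to the next.

```python
def batch_ranges(total_parts, batch_size):
    """Yield (start, end) index pairs covering total_parts in batch_size steps."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total_parts, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, total_parts)
```

For example, with 7 parts and a batch size of 3, the runs would use START_PART values 0, 3, and 6.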

7️⃣ Conclusion

By following this method, you can automatically generate long audio content from text using the ElevenLabs API. This is ideal for YouTube videos, audiobooks, podcasts, or any project requiring high-quality AI voiceovers.

Copy your API key into the config and run the code on Google Colab.

If you want to try Text-to-Speech for free, visit ttsforfree.com and start generating your own AI voice today!