Create Long-Form AI Voiceovers: Integrate ElevenLabs API for Text-to-Speech

2025-10-17 09:01 · 12 min read · 160 Views



1️⃣ Introduction

AI-powered Text-to-Speech (TTS) makes it easy to create natural voiceovers from text. The ElevenLabs API provides high-quality voices and supports multiple languages. Once the text is split into smaller parts, it can also handle very long scripts, which makes it a good fit for videos or audiobooks that run for hours.

2️⃣ Python Environment Setup

Install the required libraries:

!pip install elevenlabs
!pip install pydub

elevenlabs: to access the ElevenLabs TTS API.

pydub: for audio processing, merging, and exporting MP3 files (pydub relies on ffmpeg for MP3 support).


3️⃣ Preparing the Text

  1. Save your content in a text file (1.txt).
  2. It can be thousands of words long.
  3. Split the text into manageable parts for TTS:

Your Config:

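INPUT_FILE = "./sample_data/1.txt"
API_KEY = "YOUR_ELEVENLABS_API_KEY"  # paste your own key here; never publish a real key
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # a default ElevenLabs voice (Rachel)
MODEL_ID = "eleven_multilingual_v2"  # multilingual model
MAX_SENTENCE_LENGTH = 50  # maximum number of characters per part
START_PART = 0  # index of the first part to generate in this run
BATCH_SIZE = 3  # number of parts to generate in this run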
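Keeping the key out of the notebook avoids leaking it when the file is shared. Here is a minimal sketch of reading it from an environment variable instead. The helper `load_api_key` and the variable name `ELEVENLABS_API_KEY` are assumptions for illustration, not part of the original script.

```python
import os


def load_api_key(env=os.environ, name="ELEVENLABS_API_KEY"):
    """Return the API key stored under `name` in the given environment mapping.

    Both the helper and the variable name are hypothetical choices for this sketch.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key
```

With this in place, the config line could become `API_KEY = load_api_key()`, provided the variable has been set before the notebook runs.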


Split the file into smaller parts:

def split_text(text, max_length=MAX_SENTENCE_LENGTH):
    # Slice the text into consecutive pieces of at most max_length characters
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]

with open(INPUT_FILE, "r", encoding="utf-8") as file:
    text = file.read()

# Stop early if the input file has no content
if not text.strip():
    raise SystemExit("Data is empty.")

parts = split_text(text)
print(f"Divided into {len(parts)} parts.")
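Fixed-size slicing can cut a word or sentence in half, which may be audible at the joins. As an alternative, here is a rough sketch that packs whole sentences into each part. The function `split_on_sentences` is a hypothetical replacement for `split_text`, and its simple punctuation rule is an assumption that will not handle every language or abbreviation.

```python
import re


def split_on_sentences(text, max_length=50):
    """Greedily pack whole sentences into chunks of at most max_length characters."""
    # Split after '.', '!' or '?' when followed by whitespace (a simplification).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_length:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit falls back to fixed-size slices.
        while len(sentence) > max_length:
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

It can be swapped in with `parts = split_on_sentences(text, MAX_SENTENCE_LENGTH)`. A larger limit than 50 characters would usually give the sentence packing more room to work.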


4️⃣ Convert Text to Speech with ElevenLabs API

Set up your ElevenLabs client and generate audio for each part:

import os

from elevenlabs import ElevenLabs, save
from pydub import AudioSegment

client = ElevenLabs(api_key=API_KEY)


output_files = []
for i in range(len(parts)):
    # Only generate the parts that belong to the current batch
    if i >= START_PART and i < START_PART + BATCH_SIZE:
        filename = f"output_part_{i}.mp3"
        if os.path.exists(filename):  # reuse audio generated in an earlier run
            print(f"Already exists: {filename}, skipping...")
            output_files.append(filename)
        else:
            print(f"Processing part {i + 1}/{len(parts)}...")
            audio = client.text_to_speech.convert(
                voice_id=VOICE_ID,
                model_id=MODEL_ID,
                text=parts[i],  # send this part only, not the whole text
            )
            save(audio, filename)
            output_files.append(filename)
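Long jobs make many requests, so a temporary network or rate-limit error can interrupt a run. Here is a small sketch of retrying a call with exponential backoff. `with_retries` is a hypothetical helper, not part of the ElevenLabs SDK, and catching every exception is a simplification for illustration.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry on any exception, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

One way to use it is to wrap the request and the file write together, for example `with_retries(lambda: save(client.text_to_speech.convert(voice_id=VOICE_ID, model_id=MODEL_ID, text=parts[i]), filename))`, so that a failed part is attempted again as a whole.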


5️⃣ Merge Audio Parts

Use pydub to combine all parts into a single long MP3:


# Merge the generated parts with pydub
combined_audio = AudioSegment.empty()
for file in output_files:
    audio = AudioSegment.from_file(file)
    combined_audio += audio

# Export everything as a single file
final_filename = "output_combined30.mp3"
combined_audio.export(final_filename, format="mp3")
print(f"Audio file saved: {final_filename}")
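Note that `output_files` only contains the parts from the current batch. If the parts were generated over several runs, the list could be rebuilt from the files on disk before merging. The sketch below orders them by their numeric index, since plain alphabetical sorting would place `output_part_10.mp3` before `output_part_2.mp3`. The helper `sort_part_files` is hypothetical and assumes the `output_part_<n>.mp3` naming used above.

```python
import re


def sort_part_files(filenames):
    """Return names like output_part_<n>.mp3 ordered by their numeric index."""
    pattern = re.compile(r"output_part_(\d+)\.mp3$")
    indexed = []
    for name in filenames:
        match = pattern.search(name)
        if match:  # ignore files that do not follow the naming scheme
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]
```

For example, `output_files = sort_part_files(os.listdir("."))` would collect every part in the working directory in playback order.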

6️⃣ Notes & Tips

  1. Long Form Support: By splitting text into chunks, you can generate TTS for content lasting hours.
  2. Custom Voices: ElevenLabs supports multiple voices; change VOICE_ID to try different ones.
  3. Multilingual: MODEL_ID="eleven_multilingual_v2" allows you to read text in various languages.
  4. File Size: For very long content, generate in batches by changing START_PART and BATCH_SIZE between runs to avoid memory issues.
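To plan the batches from tip 4, it can help to list the START_PART value for each run up front. This is a minimal sketch. `batch_starts` is a hypothetical helper that only does the arithmetic and does not call the API.

```python
def batch_starts(total_parts, batch_size):
    """List the START_PART value to use for each successive run."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return list(range(0, total_parts, batch_size))
```

With 10 parts and `BATCH_SIZE = 3`, the runs would start at parts 0, 3, 6, and 9, and the last run would generate a single part.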

7️⃣ Conclusion

By following this method, you can automatically generate long audio content from text using the ElevenLabs API. This is ideal for YouTube videos, audiobooks, podcasts, or any project requiring high-quality AI voiceovers.


COPY YOUR API KEY AND RUN IT ON GOOGLE COLAB


If you want to try Text-to-Speech for free, visit ttsforfree.com and start generating your own AI voice today!



Frequently Asked Questions

Q: Does ElevenLabs support long text?

A: Yes — when divided into smaller chunks.

Q: Can I use custom voices?

A: Yes — just change the VOICE_ID.

Q: Can I use this for YouTube monetized videos?

A: Yes, if you follow the licensing terms of ElevenLabs.
