Added Gemini TTS to My App and Lost Money in 24 Hours – A Painful AI Billing Lesson

2026-02-02 13:00 | 8 min read | 232 views | Author: Thai Nguyen (Software Engineer)

Added Gemini TTS to My App and Lost Money in 24 Hours 🤡

(A small but painful lesson about AI billing)

Before this happened, my small TTS app was doing fine.

Not many users, but a few people were paying.

Nothing impressive, but profitable.

Then I thought:

“What if I add Gemini TTS to make it more premium?”

And that was when everything went wrong.


Background

My app is a simple web-based text-to-speech service. People use it for:

  1. reading documents aloud
  2. reading study materials aloud
  3. pasting text and getting audio back

Initially, I was using:

  1. OpenAI TTS
  2. Azure TTS
  3. Google Cloud TTS

They all share the same characteristics:

  1. pricing based on character count
  2. easy to estimate costs
  3. retries or errors are not financially dangerous
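
To make "easy to estimate" concrete, here is a minimal sketch of a per-character cost estimate. The function name and the rate are hypothetical placeholders for illustration, not any provider's real price; the point is only that the cost is known from the text length before any audio is generated.

```python
def estimate_char_cost(text: str, price_per_million_chars: float) -> float:
    """Estimate the cost of one request under per-character pricing."""
    return len(text) * price_per_million_chars / 1_000_000


# Hypothetical rate for illustration only, not any provider's real price.
EXAMPLE_PRICE_PER_MILLION_CHARS = 15.0

document = "x" * 2_000  # stand-in for a 2,000-character paste
cost = estimate_char_cost(document, EXAMPLE_PRICE_PER_MILLION_CHARS)
print(f"{len(document)} characters -> {cost:.4f} currency units")
```

Because the only input is `len(text)`, the same calculation can run before the request is sent, which is what makes this pricing model easy to budget for.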

The app was simple, but it worked.

A few customers, no losses.


Then Gemini TTS Came In

To be fair, Gemini TTS is very good:

  1. natural voice
  2. smart pauses
  3. good English and Vietnamese pronunciation

The demo sounded amazing.

So I pushed Gemini TTS into production.


What Happened?

After just one day:

  1. Revenue: ~200,000 VND
  2. Gemini TTS cost: over 400,000 VND

👉 Instant loss.

The most frustrating part:

  1. user count didn’t increase
  2. request volume wasn’t high
  3. dashboard looked normal

But billing told a very different story 💀


The Problem Isn’t That Gemini Is Expensive

The Problem Is That You Can’t Estimate the Cost

After digging into it, I realized why.


1. Gemini TTS Does NOT Charge by Characters

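The cost tracks the length of the generated audio (billed as audio output tokens), so it is driven by:

  1. audio duration
  2. pauses / silence, which add to the duration
  3. prosody (slower, more expressive delivery produces longer audio)
  4. retries / reconnections, which generate the audio again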

👉 Slow reading + pauses = much longer audio

👉 Longer audio = much higher cost
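
The effect of reading speed and pauses can be sketched with some rough arithmetic. Every number below (characters per second, pause overhead, audio tokens per second, price per million tokens) is an assumed placeholder, not a published Gemini figure; the sketch only shows how the same text can produce very different bills once cost follows audio length.

```python
def estimate_audio_cost(
    char_count: int,
    chars_per_second: float,
    pause_overhead: float,
    tokens_per_second: float,
    price_per_million_tokens: float,
) -> float:
    """Estimate cost when billing follows the length of the generated audio."""
    speech_seconds = char_count / chars_per_second
    total_seconds = speech_seconds * (1 + pause_overhead)  # silence is billed too
    audio_tokens = total_seconds * tokens_per_second
    return audio_tokens * price_per_million_tokens / 1_000_000


# All values are hypothetical and chosen only to illustrate the spread.
TOKENS_PER_SECOND = 25.0
PRICE_PER_MILLION_TOKENS = 10.0

# The same 3,000-character text, read briskly vs. slowly with long pauses.
fast = estimate_audio_cost(3_000, 18.0, 0.05, TOKENS_PER_SECOND, PRICE_PER_MILLION_TOKENS)
slow = estimate_audio_cost(3_000, 10.0, 0.40, TOKENS_PER_SECOND, PRICE_PER_MILLION_TOKENS)
print(f"fast: {fast:.5f}, slow: {slow:.5f}, ratio: {slow / fast:.1f}x")
```

With these assumed inputs the slow reading costs more than twice the fast one for identical text, and neither figure can be known until the audio exists.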


2. Dashboard ≠ Billing

The dashboard mostly shows:

  1. request count
  2. input tokens

But the real cost driver is:

  1. audio output tokens

And that part is not clearly visible.


3. Retry Is Not Always Labeled as “Retry”

Cases like:

  1. stream failure
  2. connection drop
  3. re-rendered audio

👉 Billing still counts them

👉 Dashboard doesn’t clearly say “this was retried”
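
One way to make retries visible on your own side is to count every attempt yourself and cap how many are allowed. This is a minimal sketch, not the Gemini SDK: `synthesize` is a hypothetical callable standing in for whatever client function the app uses, and the sketch assumes a failed stream surfaces as `ConnectionError`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttemptLedger:
    """Counts every synthesis attempt, including the ones that failed."""
    attempts: int = 0
    failures: int = 0


def synthesize_with_cap(
    synthesize: Callable[[str], bytes],  # hypothetical TTS client call
    text: str,
    ledger: AttemptLedger,
    max_attempts: int = 2,
) -> bytes:
    """Call the TTS backend, recording each attempt and stopping at a hard cap."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        ledger.attempts += 1  # count before the call: a failed attempt may still be billed
        try:
            return synthesize(text)
        except ConnectionError as exc:
            ledger.failures += 1
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Demo with a fake backend that drops the first connection.
calls = {"n": 0}

def fake_synthesize(text: str) -> bytes:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("stream dropped")
    return b"audio-bytes"

ledger = AttemptLedger()
audio = synthesize_with_cap(fake_synthesize, "hello", ledger)
print(ledger)
```

Comparing `ledger.attempts` with the number of user-facing requests gives a rough view of how much audio was generated more than once, which the dashboard does not spell out.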


Quick Comparison

OpenAI / Azure / Google Cloud TTS

  1. priced by characters
  2. 1,000 characters ≈ predictable cost
  3. easy to estimate
  4. very suitable for SaaS


Gemini TTS

  1. priced by audio duration
  2. audio length is unpredictable
  3. silence and pauses cost money
  4. very hard to control spending

👉 Amazing for demos

👉 Dangerous for production


The Irony

Before Gemini:

  1. simple app
  2. few features
  3. still profitable

After Gemini:

  1. no new users yet
  2. already losing money

It really felt like:

“Adding a premium feature and accidentally shooting myself in the foot.” 😅


Lessons Learned

After this experience, here’s what I learned:

  1. A powerful tool doesn’t always mean the right tool
  2. For small SaaS products, predictable cost > wow factor
  3. Billing model matters as much as model quality
  4. New features don’t automatically create new revenue


Conclusion

The Gemini bill came to more than 400k VND against roughly 200k VND of revenue, but luckily:

  1. user base was still small
  2. the issue was caught early
  3. the app hadn’t scaled yet

If this had happened at higher traffic, it could easily have burned millions of VND in a single day.

Now I’ve:

  1. removed Gemini TTS from production
  2. switched back to OpenAI / Azure / Google Cloud TTS
  3. limited character length per request
  4. stabilized infrastructure costs
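
A per-request character limit like the one in the list above could look roughly like this. It is a sketch under assumptions, not the app's actual code: the limit, the rate, and the names are hypothetical, and the daily budget check is an extra safeguard added here for illustration.

```python
MAX_CHARS_PER_REQUEST = 5_000  # hypothetical limit


class BudgetExceeded(Exception):
    """Raised when a request would push spending past the daily limit."""


class DailyBudget:
    """Tracks estimated spend for the day and refuses requests past the limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        if self.spent + cost > self.limit:
            raise BudgetExceeded(f"would exceed daily limit of {self.limit}")
        self.spent += cost


def admit_request(
    text: str,
    budget: DailyBudget,
    price_per_million_chars: float,
    max_chars: int = MAX_CHARS_PER_REQUEST,
) -> float:
    """Reject oversized text, then reserve its estimated cost from the budget."""
    if len(text) > max_chars:
        raise ValueError(f"text has {len(text)} characters, limit is {max_chars}")
    cost = len(text) * price_per_million_chars / 1_000_000
    budget.charge(cost)
    return cost


# Hypothetical rate and budget, for illustration only.
budget = DailyBudget(limit=1.0)
reserved = admit_request("x" * 4_000, budget, price_per_million_chars=15.0)
print(f"reserved {reserved:.4f}, spent so far {budget.spent:.4f}")
```

Both checks run before any TTS call is made, so a rejected request costs nothing. This only works cleanly with character-based pricing, where the cost is known up front.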


Gemini TTS is impressive,

but it truly is a double-edged sword.

And for my app, I chose to put that sword away.

Frequently Asked Questions

Q: Why did adding Gemini TTS cause a sudden cost spike?

A: Because Gemini TTS charges based on audio duration instead of text length. Pauses, silence, and slower prosody lengthen the audio, and retries generate audio again, so all of them raise the cost.

Q: Is Gemini TTS more expensive than OpenAI or Azure TTS?

A: Not necessarily per unit, but it is far less predictable. Gemini’s billing model makes it difficult to estimate real production costs in advance.

Q: Why doesn’t the Gemini dashboard match the billing amount?

A: The dashboard focuses on requests and input tokens, while most costs come from audio output tokens, which are not clearly displayed.

Q: Do retries affect Gemini TTS billing?

A: Yes. Stream failures, reconnects, or re-rendered audio may still be billed, even if the dashboard does not clearly indicate retries.

Q: Is Gemini TTS suitable for small SaaS or public apps?

A: It is better suited for demos, showcases, or internal tools. Small SaaS products often require predictable costs, which Gemini TTS does not provide.

Q: Why are OpenAI, Azure, and Google Cloud TTS easier to manage?

A: They charge by character count, allowing developers to estimate costs in advance and avoid billing surprises.

Q: Should Gemini TTS be used for long documents?

A: No. Long content dramatically increases audio duration and cost. Character-based TTS engines are safer for long-form usage.

Q: What is the biggest lesson from this experience?

A: Predictable billing matters more than impressive demos. For small SaaS products, cost control is as important as model quality.
