AWS Polly Pricing: Text-to-Speech Costs

AWS Polly pricing from $4/million characters for Standard voices to $16 for Neural. Compare voice types, free tier, and cost optimization tips.

Wring Team
March 15, 2026
8 min read
Tags: AWS Polly, Polly pricing, text-to-speech costs, voice synthesis
Audio waveform visualization representing text-to-speech technology

Amazon Polly turns text into lifelike speech using deep learning. Polly supports dozens of languages and offers multiple voice engines at different price points. The cost difference between voice types is significant: Neural voices cost 4x the Standard rate, and Long-Form voices cost 25x the Standard rate. Choosing the voice engine your application actually needs is the single biggest lever for controlling Polly costs.

TL;DR: Standard voices cost $4.00 per million characters, Neural voices cost $16.00 per million, and Long-Form voices cost $100.00 per million. The free tier includes 5M Standard characters and 1M Neural characters monthly for 12 months. Use Standard voices for IVR and notifications, and reserve Neural for customer-facing audio.


Voice Engine Pricing

Amazon Polly offers four distinct voice engine tiers, each targeting different use cases and quality levels.

| Voice Engine | Cost per 1M Characters | Best For | Quality Level |
|---|---|---|---|
| Standard | $4.00 | IVR, alerts, internal tools | Good |
| Neural | $16.00 | Podcasts, e-learning, apps | Near-human |
| Long-Form | $100.00 | Audiobooks, long articles | Highest naturalness |
| Generative | $100.00 | Conversational, expressive | Most expressive |
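
To make the per-engine rates concrete, here is a minimal Python sketch that turns a monthly character volume into an estimated charge. The rates are copied from the table above; the names `RATES_PER_MILLION` and `monthly_cost` are illustrative, and the free tier is deliberately ignored.

```python
# Per-million-character rates in USD, copied from the pricing table above.
RATES_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
    "long-form": 100.00,
    "generative": 100.00,
}


def monthly_cost(engine: str, characters: int) -> float:
    """Estimate the monthly Polly charge in USD, ignoring any free tier."""
    rate = RATES_PER_MILLION[engine]
    return round(characters / 1_000_000 * rate, 2)


if __name__ == "__main__":
    # Compare what the same 2M-character workload costs on each engine.
    for engine in RATES_PER_MILLION:
        print(f"{engine:>10}: ${monthly_cost(engine, 2_000_000):,.2f}")
```

Treat the dictionary as a snapshot of this article's figures rather than a live price list; check the current AWS pricing page before budgeting.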

Speech Marks Pricing

| Feature | Cost per 1M Characters |
|---|---|
| Speech Marks (Standard) | $4.00 |
| Speech Marks (Neural) | $16.00 |

Speech Marks provide metadata about speech timing — word boundaries, sentence boundaries, viseme data for lip-sync, and SSML marks. They are billed separately from audio generation, so requesting both audio and speech marks for the same text doubles your character count.


Free Tier Details

AWS Polly includes a generous free tier for the first 12 months after account creation.

| Voice Engine | Free Characters per Month | Duration |
|---|---|---|
| Standard | 5 million | 12 months |
| Neural | 1 million | 12 months |
| Long-Form | Not included | N/A |
| Generative | Not included | N/A |

At average speaking rates, 5 million Standard characters translate to roughly 80-100 hours of generated audio per month. That is more than enough for prototyping and low-volume production use.
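
The hours estimate can be sanity-checked with a short calculation. This is a rough sketch: the speaking rate of 150 words per minute and the average of 6 characters per word (including the trailing space) are assumptions chosen for illustration, not figures from AWS. The second helper subtracts the free-tier allowance listed above to show how many characters would actually be billed.

```python
# Monthly free-tier allowances from the table above (first 12 months only).
FREE_TIER_CHARACTERS = {"standard": 5_000_000, "neural": 1_000_000}


def estimated_audio_hours(characters, words_per_minute=150, chars_per_word=6):
    """Rough audio duration in hours; both defaults are illustrative assumptions."""
    chars_per_minute = words_per_minute * chars_per_word
    return characters / chars_per_minute / 60


def billable_after_free_tier(engine, characters):
    """Characters left to pay for once the monthly free allowance is used."""
    allowance = FREE_TIER_CHARACTERS.get(engine, 0)  # 0 for Long-Form and Generative
    return max(0, characters - allowance)


if __name__ == "__main__":
    print(f"{estimated_audio_hours(5_000_000):.1f} hours")
    print(billable_after_free_tier("neural", 5_000_000))
```

With these assumptions, 5 million characters come out near the middle of the 80-100 hour range; a slower or faster narration style shifts the result accordingly.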


How Characters Are Counted

Polly bills based on the number of characters processed, including SSML tags.

  • Plain text: Every character counts, including spaces and punctuation
  • SSML input: The SSML markup tags themselves are counted as characters
  • Minimum charge: Each API request has a minimum charge of 100 characters
  • Billing granularity: Billed in increments of individual characters (no rounding up to blocks)
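
The counting rules listed above can be sketched as a small estimator. It follows the rules exactly as this article states them (every character counts, with a 100-character floor per request), so treat it as an approximation and confirm the details against your own AWS bill. The function name and constant are illustrative.

```python
MIN_BILLED_PER_REQUEST = 100  # per-request minimum as described in this article


def billed_characters(requests):
    """Sum the billed characters for a list of request texts.

    Each request is billed for its full length (spaces and punctuation
    included), but never for fewer than MIN_BILLED_PER_REQUEST characters.
    """
    return sum(max(len(text), MIN_BILLED_PER_REQUEST) for text in requests)


if __name__ == "__main__":
    words = ["Hi"] * 10
    # Ten tiny requests each hit the 100-character floor.
    print(billed_characters(words))
    # One batched request containing the same words hits the floor only once.
    print(billed_characters([" ".join(words)]))
```

Under these rules, sending ten two-character requests is billed as 1,000 characters, while one combined request is billed as 100.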

When using SSML for pronunciation control, emphasis, or pauses, your effective character count increases. A sentence with SSML tags can be 2-3x longer than the plain text equivalent.

SSML Impact Example

| Input Type | Raw Text Length | Billed Characters |
|---|---|---|
| Plain text | 1,000 chars | 1,000 |
| SSML with pauses | 1,000 chars | ~1,400 |
| SSML with phonemes | 1,000 chars | ~2,200 |
| Heavy SSML formatting | 1,000 chars | ~3,000 |

Real-World Cost Examples

| Use Case | Voice Engine | Monthly Volume | Monthly Cost |
|---|---|---|---|
| IVR phone system | Standard | 2M characters | $8.00 |
| E-learning platform | Neural | 5M characters | $80.00 |
| Podcast generation | Neural | 10M characters | $160.00 |
| Audiobook production | Long-Form | 3M characters | $300.00 |
| Mobile app narration | Neural | 500K characters | $8.00 |
| Accessibility (screen reader) | Standard | 20M characters | $80.00 |
| News article reader | Generative | 8M characters | $800.00 |

Full Cost Breakdown: E-Learning Platform

A typical e-learning platform generating course narration with Neural voices:

| Component | Monthly Volume | Cost |
|---|---|---|
| Neural voice generation | 5M characters | $80.00 |
| Speech Marks (for captions) | 5M characters | $80.00 |
| S3 storage for audio | 50 GB | $1.15 |
| CloudFront delivery | 200 GB | $17.00 |
| Total | | $178.15 |
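
The breakdown above can be reproduced with a few lines of arithmetic. In this sketch the S3 and CloudFront per-GB rates are assumptions back-calculated from the table ($1.15 / 50 GB and $17.00 / 200 GB); actual rates vary by region and usage tier. The function name is illustrative.

```python
NEURAL_RATE = 16.00             # USD per 1M characters, audio and Neural speech marks
S3_RATE_PER_GB = 0.023          # assumption: implied by $1.15 for 50 GB
CLOUDFRONT_RATE_PER_GB = 0.085  # assumption: implied by $17.00 for 200 GB


def elearning_monthly_cost(characters, storage_gb, transfer_gb):
    """Return a per-component cost breakdown in USD, plus the total."""
    millions = characters / 1_000_000
    items = {
        "neural_audio": millions * NEURAL_RATE,
        # Speech marks are billed separately for the same characters.
        "speech_marks": millions * NEURAL_RATE,
        "s3_storage": storage_gb * S3_RATE_PER_GB,
        "cloudfront": transfer_gb * CLOUDFRONT_RATE_PER_GB,
    }
    items["total"] = sum(items.values())
    return {name: round(value, 2) for name, value in items.items()}


if __name__ == "__main__":
    for name, value in elearning_monthly_cost(5_000_000, 50, 200).items():
        print(f"{name:>12}: ${value:,.2f}")
```

The speech marks line equals the audio line, which is why requesting captions doubles the Polly portion of the bill.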

Brand Voices

Brand Voices let you create a custom Neural voice exclusive to your organization. Pricing has two components:

| Component | Cost |
|---|---|
| Voice creation | Custom pricing (contact AWS) |
| Per-character usage | $100.00 per 1M characters |

Brand Voice creation requires working with the AWS team and providing voice training data. The per-character usage rate matches Long-Form pricing. This option makes sense only for large enterprises with millions of characters of monthly output where brand consistency matters.


Supported Output Formats

Polly supports multiple audio formats at no pricing difference:

| Format | Use Case | File Size (relative) |
|---|---|---|
| MP3 | Web, mobile apps | 1x (baseline) |
| OGG (Vorbis) | Web streaming | 0.8x |
| PCM | Telephony, processing | 10x |
| JSON (Speech Marks) | Lip-sync, captions | Metadata only |

Choosing compressed formats like MP3 or OGG reduces your storage and data transfer costs without affecting Polly billing.


Cost Optimization Tips

  1. Use Standard voices where Neural is not required: Standard voices cost 75% less and work well for alerts, IVR menus, and internal tools where top-tier naturalness is unnecessary.

  2. Cache generated audio aggressively: If the same text is spoken repeatedly (greetings, menu options, common responses), generate once and store in S3. A single cached phrase eliminates all future Polly charges for that content.

  3. Minimize SSML markup: SSML tags count as billed characters. Use SSML only when pronunciation or timing control adds measurable value. Plain text with proper punctuation often produces acceptable results.

  4. Batch short requests to reach the 100-character minimum: Every API call bills at least 100 characters. Avoid sending single words or very short phrases as separate requests.

  5. Match the API to the text length: Use the SynthesizeSpeech API for text under 3,000 characters and the StartSpeechSynthesisTask API for longer content. The async task API stores results directly in S3, avoiding the need to handle large responses in your application.

  6. Evaluate Long-Form vs Neural carefully: At $100 vs $16 per million characters, Long-Form voices cost 6.25x as much. Run A/B tests to determine whether your users notice the quality difference for your specific content type.

  7. Pre-generate content ahead of demand: Polly has no time-based pricing differences, so generating during off-peak hours does not lower the per-character rate. The benefit is that audio is synthesized once and served from storage instead of being re-synthesized on every real-time request.
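
Tips 2 and 5 can be combined in one small wrapper, sketched below. It assumes `client` is a boto3 Polly client (`boto3.client("polly")`) and calls only its `synthesize_speech` and `start_speech_synthesis_task` methods. The in-memory `cache` dictionary is a stand-in for a real store such as S3, and the function names, the 3,000-character threshold taken from tip 5, and the `bucket` parameter are illustrative choices rather than a prescribed design.

```python
import hashlib

SYNC_CHAR_LIMIT = 3000  # threshold from tip 5; shorter text uses the sync API


def cache_key(text, voice_id, engine):
    """Derive a stable key so identical requests map to the same cached audio."""
    raw = f"{engine}|{voice_id}|{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def synthesize(client, cache, text, voice_id, engine, bucket):
    """Return MP3 bytes for short text, or the S3 output URI for long text.

    `client` is assumed to be a boto3 Polly client. `cache` is any dict-like
    object; a production system would persist entries (for example in S3).
    """
    key = cache_key(text, voice_id, engine)
    if key in cache:
        return cache[key]  # cache hit: no Polly call, no new charge

    if len(text) < SYNC_CHAR_LIMIT:
        response = client.synthesize_speech(
            Text=text, VoiceId=voice_id, Engine=engine, OutputFormat="mp3"
        )
        result = response["AudioStream"].read()
    else:
        # The async task writes its output to the given S3 bucket.
        response = client.start_speech_synthesis_task(
            Text=text,
            VoiceId=voice_id,
            Engine=engine,
            OutputFormat="mp3",
            OutputS3BucketName=bucket,
        )
        result = response["SynthesisTask"]["OutputUri"]

    cache[key] = result
    return result
```

A call might look like `synthesize(boto3.client("polly"), {}, "Welcome back", "Joanna", "standard", "my-audio-bucket")`, where the bucket name is a placeholder. Keying the cache on engine and voice as well as text avoids serving a Standard recording when Neural audio was requested.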


Polly vs Alternatives

| Service | Cost per 1M Characters | Voice Quality | Languages |
|---|---|---|---|
| AWS Polly (Standard) | $4.00 | Good | 60+ |
| AWS Polly (Neural) | $16.00 | Excellent | 30+ |
| Google Cloud TTS (Standard) | $4.00 | Good | 50+ |
| Google Cloud TTS (Neural) | $16.00 | Excellent | 40+ |
| Azure Speech (Neural) | $16.00 | Excellent | 100+ |
| ElevenLabs | ~$24.00 | Superior | 29 |

Polly's competitive advantage is deep AWS integration — direct S3 output, IAM authentication, CloudFront distribution, and Lambda triggers for event-driven generation.


FAQ

How does Polly count characters for billing?

Polly counts every character in your request, including spaces, punctuation, and SSML tags. Each API request has a 100-character minimum, so a 50-character request is billed as 100. With plain text, the billed count is simply the length of the text, spaces included. With SSML, the full XML markup is included in the character count.

Can I use the free tier for production workloads?

Yes. The Polly free tier (5M Standard characters, 1M Neural characters per month) applies to production use with no restrictions. After 12 months, you pay standard rates. Many small applications stay within the free tier during their first year.

When should I use Long-Form instead of Neural voices?

Long-Form voices are optimized for narrating content longer than a paragraph — they maintain natural prosody, pacing, and expression across extended passages. Use them for audiobooks, long articles, and educational content where listeners will hear minutes of continuous speech. For short-form content like alerts, UI feedback, or brief responses, Neural voices are indistinguishable in quality at 84% lower cost.


Lower Your Polly Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your Polly text-to-speech costs. Through group buying power, Wring negotiates better rates so you pay less per million characters.
