
AWS Bedrock Pricing: Every Model and Feature Cost

AWS Bedrock pricing for Claude, Llama, Mistral, and Titan models. Compare on-demand tokens, batch (50% off), and Provisioned Throughput costs.

Wring Team
March 14, 2026
7 min read
Tags: AWS Bedrock, Bedrock pricing, LLM pricing, foundation model costs, AI API pricing, Claude pricing AWS
[Image: AI foundation models and cloud pricing visualization]

AWS Bedrock gives you access to foundation models from Anthropic, Meta, Mistral, Stability AI, Amazon, and others through a single API. The pricing model is straightforward: pay per token for text models, per image for image models. The total cost, however, depends heavily on which model you choose, how you structure prompts, and which Bedrock features you layer on top. Based on the price tables below, there is roughly a 75x gap in input price between the cheapest text model (Titan Express) and the most expensive (Claude Opus) for the same task.

TL;DR: Bedrock text models range from $0.0002/1K input tokens (Titan Express) to $0.015/1K (Claude Opus). Most production workloads should use Claude Sonnet ($0.003/1K input) or Llama 3.1 70B ($0.00099/1K). Batch inference gives 50% off, Provisioned Throughput saves roughly 15-35% for sustained loads. Knowledge Bases, Agents, and Guardrails each add per-query fees on top of model costs.


Text Model Pricing

Anthropic Claude Models

| Model | Input / 1K tokens | Output / 1K tokens | Context window |
|---|---|---|---|
| Claude 4 Opus | $0.015 | $0.075 | 200K |
| Claude 4 Sonnet | $0.003 | $0.015 | 200K |
| Claude 4 Haiku | $0.0008 | $0.004 | 200K |

Meta Llama Models

| Model | Input / 1K tokens | Output / 1K tokens | Context window |
|---|---|---|---|
| Llama 3.1 405B | $0.00532 | $0.016 | 128K |
| Llama 3.1 70B | $0.00099 | $0.00099 | 128K |
| Llama 3.1 8B | $0.00022 | $0.00022 | 128K |

Mistral Models

| Model | Input / 1K tokens | Output / 1K tokens | Context window |
|---|---|---|---|
| Mistral Large | $0.004 | $0.012 | 128K |
| Mistral Small | $0.001 | $0.003 | 128K |
| Mixtral 8x7B | $0.00045 | $0.0007 | 32K |

Amazon Titan Models

| Model | Input / 1K tokens | Output / 1K tokens | Context window |
|---|---|---|---|
| Titan Text Premier | $0.0005 | $0.0015 | 32K |
| Titan Text Express | $0.0002 | $0.0006 | 8K |
| Titan Text Lite | $0.00015 | $0.0002 | 4K |
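To see what these per-1K rates mean for a single request, here is a small cost calculator using the on-demand prices from the tables above (a representative subset; always confirm against the current Bedrock pricing page before budgeting):

```python
# USD per 1K tokens (input, output), taken from the price tables in this guide.
PRICES = {
    "Claude 4 Opus":      (0.015,   0.075),
    "Claude 4 Sonnet":    (0.003,   0.015),
    "Claude 4 Haiku":     (0.0008,  0.004),
    "Llama 3.1 70B":      (0.00099, 0.00099),
    "Titan Text Express": (0.0002,  0.0006),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request: tokens / 1000 * per-1K price."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

# Example: a 1,500-token prompt producing a 500-token answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 1500, 500):.6f}")
```

For this request shape, Opus costs about 5x Sonnet and roughly 19x Haiku, which is why model choice dominates every other optimization.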

Image Model Pricing

| Model | Per image | Resolution |
|---|---|---|
| Stable Diffusion XL | $0.04 (512x512) to $0.08 (1024x1024) | Up to 1024x1024 |
| Amazon Titan Image Generator | $0.01 (512x512) to $0.02 (1024x1024) | Up to 1024x1024 |
| Stability AI SD3 | $0.04-$0.08 | Up to 1024x1024 |

Embedding Model Pricing

| Model | Cost per 1K tokens |
|---|---|
| Titan Embeddings V2 | $0.00002 |
| Cohere Embed English | $0.0001 |
| Cohere Embed Multilingual | $0.0001 |

Embeddings are extremely cheap — processing 1 million documents (1K tokens each) costs $20 with Titan Embeddings.
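The arithmetic behind that $20 figure:

```python
# 1M documents x 1K tokens each, at Titan Embeddings V2's $0.00002 per 1K tokens.
docs, tokens_per_doc, price_per_1k = 1_000_000, 1_000, 0.00002
total = docs * tokens_per_doc / 1000 * price_per_1k
print(f"${total:.2f}")  # one billion tokens embedded for $20
```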


Pricing Modes

On-Demand (Default)

Pay per token/image with no commitment. Best for variable or low-volume workloads.

Batch Inference

| Feature | Detail |
|---|---|
| Batch processing | 50% off on-demand pricing |
| Turnaround time | Up to 24 hours |
| Minimum batch size | None |

Batch inference is ideal for document processing, content generation pipelines, and any workload that doesn't need real-time responses.
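A batch job takes a JSONL file of records in S3 and writes results back to S3, billed at the discounted rate. The sketch below builds the request parameters for boto3's `create_model_invocation_job` call on the `bedrock` client; the bucket, IAM role ARN, and model ID are placeholders you would substitute with your own:

```python
# Sketch of a Bedrock batch inference job submission. All ARNs, S3 URIs,
# and the model ID below are placeholders -- substitute your own values.
def batch_job_params(job_name: str, model_id: str, role_arn: str,
                     input_s3: str, output_s3: str) -> dict:
    """Build the request dict for bedrock.create_model_invocation_job()."""
    return {
        "jobName": job_name,
        "modelId": model_id,
        "roleArn": role_arn,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3}},
    }

params = batch_job_params(
    "nightly-summaries",
    "anthropic.claude-3-haiku-20240307-v1:0",        # placeholder model ID
    "arn:aws:iam::123456789012:role/bedrock-batch",  # placeholder role
    "s3://my-bucket/batch/input.jsonl",
    "s3://my-bucket/batch/output/",
)

# To actually submit (requires AWS credentials and permissions):
#   import boto3
#   boto3.client("bedrock").create_model_invocation_job(**params)
```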

Provisioned Throughput

| Commitment | Discount |
|---|---|
| No commitment | ~15% off on-demand |
| 1-month | ~20% off |
| 6-month | ~35% off |

Provisioned Throughput guarantees a specific number of model units dedicated to your account. See the Bedrock user guide for configuration details. Best for sustained high-volume production workloads.

Prompt Caching

| Component | Cost |
|---|---|
| Cache write | 25% premium over input price |
| Cache read | 90% discount on input price |
| Cache TTL | 5 minutes (auto-extended) |

For applications with large, repeated system prompts, caching reduces input costs dramatically.
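A sketch of the savings math, using the multipliers from the table above (writes at 1.25x input price, reads at 0.10x). It assumes every request after the first lands within the cache TTL, which is the best case:

```python
# Effective input cost with prompt caching: one cache write, then
# (requests - 1) cache reads of the same cached prompt. Best-case math:
# assumes all requests arrive within the 5-minute TTL window.
def cached_input_cost(prompt_tokens: int, requests: int,
                      price_per_1k: float) -> float:
    write = prompt_tokens / 1000 * price_per_1k * 1.25   # 25% write premium
    reads = (requests - 1) * prompt_tokens / 1000 * price_per_1k * 0.10
    return write + reads

# 10K-token system prompt, 1,000 requests, Claude Sonnet input ($0.003/1K):
with_cache = cached_input_cost(10_000, 1_000, 0.003)
without = 1_000 * 10_000 / 1000 * 0.003
print(f"${with_cache:.2f} cached vs ${without:.2f} uncached")
```

In this scenario the cached input cost is about a tenth of the uncached cost, approaching the 90% read discount as request volume grows.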


Feature Pricing

Knowledge Bases (RAG)

| Component | Cost |
|---|---|
| Data ingestion | Embedding model cost only |
| Storage | OpenSearch Serverless ($0.24/OCU-hour, min 4 OCUs) or Aurora pricing |
| Retrieval queries | Model token cost for query + retrieved context |

The vector store is often the largest Knowledge Bases cost: the OpenSearch Serverless minimum of 4 OCUs at $0.24/OCU-hour works out to roughly $701/month before you process a single query.
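Where that floor comes from:

```python
# OpenSearch Serverless bills per OCU-hour with a 4-OCU minimum,
# running 24/7 (~730 hours/month).
ocus, price_per_ocu_hour, hours_per_month = 4, 0.24, 730
monthly_floor = ocus * price_per_ocu_hour * hours_per_month
print(f"${monthly_floor:.2f}")  # ~$700.80/month before any query traffic
```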

Agents

| Component | Cost |
|---|---|
| Agent invocations | No additional charge beyond model tokens |
| Orchestration tokens | Standard model pricing (agent reasoning uses tokens) |
| Action group Lambda calls | Standard Lambda pricing |

Agents consume additional tokens for reasoning, tool selection, and orchestration — typically 2-5x the tokens of a direct model call.

Guardrails

| Component | Cost |
|---|---|
| Text guardrails | $0.75 per 1,000 text units (1 unit = 1,000 characters) |
| Image guardrails | $1.00 per image |
| Contextual grounding | $0.10 per 1,000 text units |

Guardrails are applied in addition to model costs. For high-volume applications, guardrail costs can be significant.
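A rough estimator for the text guardrail line item. It assumes text units round up per request; check the Bedrock documentation for the exact unit accounting before relying on this:

```python
import math

# Text guardrails: $0.75 per 1,000 text units, 1 unit = 1,000 characters.
# Assumption: units round up per request (verify against Bedrock docs).
def guardrail_text_cost(chars_per_request: int, requests: int) -> float:
    units = math.ceil(chars_per_request / 1000) * requests
    return units / 1000 * 0.75

# 1M requests/month averaging 2,500 characters each -> 3 units per request.
print(f"${guardrail_text_cost(2500, 1_000_000):,.2f}")
```

At that volume the guardrail fee alone is in the low thousands of dollars per month, comparable to the model bill for a Haiku-class workload.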

Model Evaluation

| Component | Cost |
|---|---|
| Automatic evaluation | Model token costs only |
| Human evaluation | $24.60 per completed task (via SageMaker Ground Truth) |

Real-World Cost Examples

| Use case | Model | Monthly volume | Monthly cost |
|---|---|---|---|
| Customer support chatbot | Claude Haiku | 1M conversations (500 tokens each) | $2,400 |
| Document summarization | Claude Sonnet | 100K documents (2K tokens each) | $3,600 |
| Code generation | Claude Sonnet | 500K requests (1K tokens each) | $9,000 |
| Content classification | Llama 8B | 10M items (200 tokens each) | $880 |
| RAG search | Titan Embed + Claude Haiku | 500K queries | $800 + vector store |
| Image generation | Titan Image | 50K images | $500-1,000 |

Cost Optimization Quick Wins

  1. Use the smallest model that meets quality requirements — Haiku costs 95% less than Opus
  2. Enable batch inference for non-real-time workloads — instant 50% savings
  3. Implement prompt caching for repeated system prompts — up to 90% input savings
  4. Set max_tokens appropriately — don't allocate 4,096 tokens for tasks that need 200
  5. Route requests by complexity — simple tasks to Haiku, complex to Sonnet
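Quick win #5 can be as simple as a dispatch function in front of your Bedrock calls. This is an illustrative sketch, not a production routing policy: the length threshold is arbitrary and the model IDs are placeholders (check the Bedrock console for current IDs):

```python
# Minimal complexity router: cheap model for short/simple requests,
# capable model otherwise. Threshold and model IDs are placeholders.
SIMPLE_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"      # placeholder ID
COMPLEX_MODEL = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder ID

def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Route long prompts or reasoning-heavy tasks to the larger model."""
    if needs_reasoning or len(prompt) > 4000:
        return COMPLEX_MODEL
    return SIMPLE_MODEL

# Classification -> cheap model; long analysis -> capable model.
print(pick_model("Label this ticket: 'refund request'"))
print(pick_model("x" * 5000))
```

In practice teams often route on task type (classification vs. generation) or use a small classifier model as the router itself; the payoff is the roughly 4x input-price gap between Haiku and Sonnet on every routed request.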

FAQ

Which Bedrock model offers the best price-to-performance ratio?

Claude Sonnet for general tasks, Llama 3.1 70B for cost-sensitive workloads, and Claude Haiku for high-volume simple tasks. Titan Text Express is cheapest overall but less capable.

Is Bedrock cheaper than calling model APIs directly?

Bedrock pricing matches or slightly exceeds direct API pricing (e.g., Anthropic API). The value is in unified billing, VPC integration, Guardrails, Knowledge Bases, and not managing API keys across multiple providers.

How do I estimate my monthly Bedrock costs?

Estimate: [(average input tokens per request x input price) + (average output tokens per request x output price)] x monthly request volume, with prices quoted per 1K tokens (so divide token counts by 1,000). Add 20-30% for agent orchestration overhead if using Agents. Add vector store costs if using Knowledge Bases.
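That formula as code:

```python
# Monthly Bedrock cost estimator. Prices are per 1K tokens;
# agent_overhead adds the suggested 20-30% for agent orchestration.
def monthly_cost(in_tokens: float, out_tokens: float,
                 in_price: float, out_price: float,
                 requests: int, agent_overhead: float = 0.0) -> float:
    per_request = in_tokens / 1000 * in_price + out_tokens / 1000 * out_price
    return per_request * requests * (1 + agent_overhead)

# 100K requests/month on Claude Sonnet, 1K input / 500 output tokens each:
print(f"${monthly_cost(1000, 500, 0.003, 0.015, 100_000):,.2f}")
```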


Lower Your Bedrock Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your Bedrock costs. Through group buying power, Wring negotiates better rates so you pay less per model inference.

Start saving on Bedrock →