AWS Bedrock gives you access to foundation models from Anthropic, Meta, Mistral, Stability AI, Amazon, and others through a single API. The pricing model is straightforward — pay per token for text models, per image for image models — but the total cost depends heavily on which model you choose, how you structure prompts, and which Bedrock features you layer on top. Input pricing alone spans roughly 75x between the cheapest and most expensive text models ($0.0002 vs. $0.015 per 1K tokens).
TL;DR: Bedrock text models range from $0.0002/1K input tokens (Titan Express) to $0.015/1K (Claude Opus). Most production workloads should use Claude Sonnet ($0.003/1K input) or Llama 3.1 70B ($0.00099/1K). Batch inference gives 50% off, Provisioned Throughput saves 15-35% for sustained loads. Knowledge Bases, Agents, and Guardrails each add per-query fees on top of model costs.
Text Model Pricing
Anthropic Claude Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Claude 4 Opus | $0.015 | $0.075 | 200K |
| Claude 4 Sonnet | $0.003 | $0.015 | 200K |
| Claude 3.5 Haiku | $0.0008 | $0.004 | 200K |
Meta Llama Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Llama 3.1 405B | $0.00532 | $0.016 | 128K |
| Llama 3.1 70B | $0.00099 | $0.00099 | 128K |
| Llama 3.1 8B | $0.00022 | $0.00022 | 128K |
Mistral Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Mistral Large | $0.004 | $0.012 | 128K |
| Mistral Small | $0.001 | $0.003 | 128K |
| Mixtral 8x7B | $0.00045 | $0.0007 | 32K |
Amazon Titan Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Titan Text Premier | $0.0005 | $0.0015 | 32K |
| Titan Text Express | $0.0002 | $0.0006 | 8K |
| Titan Text Lite | $0.00015 | $0.0002 | 4K |
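The spread across these tables is easier to see per request. A quick sketch using the on-demand prices above and a hypothetical 1,000-token-input, 500-token-output request:

```python
# Per-request cost for a 1,000-in / 500-out token request, using the
# on-demand prices from the tables above (USD per 1K tokens: input, output).
PRICES = {
    "Claude 4 Opus": (0.015, 0.075),
    "Claude 4 Sonnet": (0.003, 0.015),
    "Llama 3.1 70B": (0.00099, 0.00099),
    "Titan Text Express": (0.0002, 0.0006),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

for model in PRICES:
    print(f"{model}: ${request_cost(model, 1000, 500):.5f}")
```

For this request shape, Opus works out to $0.0525 and Titan Express to $0.0005, a roughly 100x gap.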
Image Model Pricing
| Model | Per Image | Resolution |
|---|---|---|
| Stable Diffusion XL | $0.04 (512x512) - $0.08 (1024x1024) | Up to 1024x1024 |
| Amazon Titan Image Generator | $0.01 (512x512) - $0.02 (1024x1024) | Up to 1024x1024 |
| Stability AI SD3 | $0.04-$0.08 | Up to 1024x1024 |
Embedding Model Pricing
| Model | Cost per 1K tokens |
|---|---|
| Titan Embeddings V2 | $0.00002 |
| Cohere Embed English | $0.0001 |
| Cohere Embed Multilingual | $0.0001 |
Embeddings are extremely cheap — processing 1 million documents (1K tokens each) costs $20 with Titan Embeddings.
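That figure is straightforward to check:

```python
# Cost to embed 1M documents of ~1K tokens each with Titan Embeddings V2
# at $0.00002 per 1K tokens (from the table above).
docs = 1_000_000
tokens_per_doc = 1_000
price_per_1k_tokens = 0.00002

total = docs * tokens_per_doc / 1_000 * price_per_1k_tokens
print(f"${total:.2f}")  # $20.00
```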
Pricing Modes
On-Demand (Default)
Pay per token/image with no commitment. Best for variable or low-volume workloads.
Batch Inference
| Feature | Discount |
|---|---|
| Batch processing | 50% off on-demand pricing |
| Turnaround time | Up to 24 hours |
| Minimum batch size | None |
Batch inference is ideal for document processing, content generation pipelines, and any workload that doesn't need real-time responses.
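Batch jobs are submitted through the `CreateModelInvocationJob` API, which reads JSONL records from S3 and writes results back at the discounted rate. A minimal sketch of the request parameters; the bucket names, role ARN, and model ID here are placeholders, not real resources:

```python
# Request parameters for create_model_invocation_job on the boto3 "bedrock"
# client. All ARNs, S3 URIs, and the model ID are hypothetical placeholders.
job_params = {
    "jobName": "nightly-summaries",
    "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
    "roleArn": "arn:aws:iam::123456789012:role/bedrock-batch-role",
    "inputDataConfig": {
        "s3InputDataConfig": {"s3Uri": "s3://example-bucket/batch-in/records.jsonl"}
    },
    "outputDataConfig": {
        "s3OutputDataConfig": {"s3Uri": "s3://example-bucket/batch-out/"}
    },
}

# With boto3 installed and credentials configured:
#   import boto3
#   bedrock = boto3.client("bedrock")
#   bedrock.create_model_invocation_job(**job_params)
```

Each input record pairs a `recordId` with a `modelInput` in the model's native invoke format; results land in the output prefix when the job completes, up to 24 hours later.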
Provisioned Throughput
| Commitment | Discount |
|---|---|
| No commitment | ~15% off on-demand |
| 1-month | ~20% off |
| 6-month | ~35% off |
Provisioned Throughput guarantees a specific number of model units dedicated to your account. See the Bedrock user guide for configuration details. Best for sustained high-volume production workloads.
Prompt Caching
| Component | Cost |
|---|---|
| Cache write | 25% premium over input price |
| Cache read | 90% discount on input price |
| Cache TTL | 5 minutes (auto-extended) |
For applications with large, repeated system prompts, caching reduces input costs dramatically.
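The 25% write premium is recovered almost immediately. A sketch of cached vs. uncached input cost for a shared system prompt, using the Claude Sonnet input price from the tables above (the prompt size and request count are made-up numbers):

```python
# Input cost for N requests that share a 5K-token system prompt, with and
# without caching. Cache write = 1.25x input price, cache read = 0.10x.
IN_PRICE = 0.003   # Claude Sonnet, $ per 1K input tokens
prompt_k = 5       # shared prefix size, in thousands of tokens
n = 100            # requests arriving within the cache TTL

uncached = n * prompt_k * IN_PRICE
cached = prompt_k * IN_PRICE * 1.25 + (n - 1) * prompt_k * IN_PRICE * 0.10
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
```

Here the shared-prefix cost drops from $1.50 to about $0.17, roughly 89% less, and caching is already cheaper by the second request (1.25 + 0.10 = 1.35 units vs. 2.0).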
Feature Pricing
Knowledge Bases (RAG)
| Component | Cost |
|---|---|
| Data ingestion | Embedding model cost only |
| Storage | OpenSearch Serverless ($0.24/OCU-hour, min 4 OCUs) or Aurora pricing |
| Retrieval queries | Model token cost for query + retrieved context |
The vector store is often the largest Knowledge Base cost. OpenSearch Serverless minimum is $701/month.
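The monthly floor follows directly from the OCU math:

```python
# OpenSearch Serverless floor for a Knowledge Base vector store:
# 4 OCUs minimum x $0.24 per OCU-hour, running continuously
# (~730 hours in an average month).
min_ocus = 4
price_per_ocu_hour = 0.24
hours_per_month = 730

floor = min_ocus * price_per_ocu_hour * hours_per_month
print(f"${floor:.2f}/month")  # $700.80/month
```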
Agents
| Component | Cost |
|---|---|
| Agent invocations | No additional charge beyond model tokens |
| Orchestration tokens | Standard model pricing (agent reasoning uses tokens) |
| Action group Lambda calls | Standard Lambda pricing |
Agents consume additional tokens for reasoning, tool selection, and orchestration — typically 2-5x the tokens of a direct model call.
Guardrails
| Component | Cost |
|---|---|
| Text guardrails | $0.75 per 1,000 text units (1 unit = 1,000 characters) |
| Image guardrails | $1.00 per image |
| Contextual grounding | $0.10 per 1,000 text units |
Guardrails are applied in addition to model costs. For high-volume applications, guardrail costs can be significant.
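A text unit rounds up per 1,000 characters, so cost scales with the combined length of prompt and response. A sketch assuming a single policy at the $0.75 rate above:

```python
import math

# Guardrails text-unit cost: 1 text unit = 1,000 characters, billed at
# $0.75 per 1,000 text units (single-policy rate from the table above).
RATE_PER_1K_UNITS = 0.75

def guardrail_cost(text: str) -> float:
    units = math.ceil(len(text) / 1000)       # round up to whole units
    return units * RATE_PER_1K_UNITS / 1000   # rate is per 1,000 units

# A 2,500-character prompt + response pair is 3 text units.
print(f"${guardrail_cost('x' * 2500):.5f} per request")
```

At three units per request, 1M guarded requests a month is $2,250 on top of model costs, which is why the high-volume caveat above matters.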
Model Evaluation
| Component | Cost |
|---|---|
| Automatic evaluation | Model token costs only |
| Human evaluation | $24.60 per completed task (via SageMaker Ground Truth) |
Real-World Cost Examples
| Use Case | Model | Monthly Volume | Monthly Cost |
|---|---|---|---|
| Customer support chatbot | Claude Haiku | 1M conversations (500 tokens each) | $2,400 |
| Document summarization | Claude Sonnet | 100K documents (2K tokens each) | $3,600 |
| Code generation | Claude Sonnet | 500K requests (1K tokens each) | $9,000 |
| Content classification | Llama 8B | 10M items (200 tokens each) | $880 |
| RAG search | Titan Embed + Claude Haiku | 500K queries | $800 + vector store |
| Image generation | Titan Image | 50K images | $500-1,000 |

Token-based examples assume the stated per-item token count applies to both input and output.
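Worked out for the first row, assuming 500 input and 500 output tokens per conversation at the Claude Haiku prices above:

```python
# Customer support chatbot: 1M conversations/month at ~500 input + 500
# output tokens each, priced at Claude Haiku's $0.0008 in / $0.004 out
# per 1K tokens (from the tables above).
conversations = 1_000_000
in_tokens, out_tokens = 500, 500
in_price, out_price = 0.0008, 0.004  # $ per 1K tokens

monthly = conversations * (in_tokens * in_price + out_tokens * out_price) / 1000
print(f"${monthly:,.0f}/month")  # $2,400/month
```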
Cost Optimization Quick Wins
- Use the smallest model that meets quality requirements — Haiku costs 95% less than Opus
- Enable batch inference for non-real-time workloads — instant 50% savings
- Implement prompt caching for repeated system prompts — up to 90% input savings
- Set max_tokens appropriately — don't allocate 4,096 tokens for tasks that need 200
- Route requests by complexity — simple tasks to Haiku, complex to Sonnet
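The last item, complexity routing, can start as a simple heuristic in front of the model call. A naive sketch; the model IDs are assumed examples, and a production router would use a classifier or request metadata rather than string checks:

```python
# Naive complexity router: short, simple prompts go to Haiku, long or
# code-heavy prompts go to Sonnet. Model IDs are assumed placeholders.
HAIKU = "anthropic.claude-3-5-haiku-20241022-v1:0"
SONNET = "anthropic.claude-sonnet-4-20250514-v1:0"

def route(prompt: str) -> str:
    looks_complex = (
        len(prompt) > 2000
        or "def " in prompt                  # crude signal for code tasks
        or "step by step" in prompt.lower()  # crude signal for reasoning tasks
    )
    return SONNET if looks_complex else HAIKU

print(route("What are your support hours?"))                        # Haiku
print(route("Explain step by step how to refactor this module."))   # Sonnet
```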
Related Guides
- AWS Bedrock Cost Optimization Guide
- AWS Bedrock vs OpenAI Pricing
- AWS Bedrock LLM Models Guide
- AWS Bedrock Batch Inference Guide
- AWS Bedrock vs SageMaker
FAQ
Which Bedrock model offers the best price-to-performance ratio?
Claude Sonnet for general tasks, Llama 3.1 70B for cost-sensitive workloads, and Claude Haiku for high-volume simple tasks. Titan Text Express is cheapest overall but less capable.
Is Bedrock cheaper than calling model APIs directly?
Bedrock pricing matches or slightly exceeds direct API pricing (e.g., Anthropic API). The value is in unified billing, VPC integration, Guardrails, Knowledge Bases, and not managing API keys across multiple providers.
How do I estimate my monthly Bedrock costs?
Multiply per-request cost by monthly volume: [(average input tokens ÷ 1,000 × input price) + (average output tokens ÷ 1,000 × output price)] × monthly request volume. If you use Agents, budget extra for orchestration overhead, since agent reasoning can consume 2-5x the tokens of a direct call. Add vector store costs if using Knowledge Bases.
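That formula as a function; the overhead multiplier is a knob, with this guide's 2-5x agent figure as a reference point:

```python
# Monthly Bedrock cost estimate: per-request token cost times volume,
# with an optional multiplier for agent orchestration overhead.
def estimate_monthly_cost(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
    overhead: float = 1.0,  # e.g. 2.0-5.0 when using Agents
) -> float:
    per_request = (
        avg_input_tokens / 1000 * input_price_per_1k
        + avg_output_tokens / 1000 * output_price_per_1k
    )
    return requests_per_month * per_request * overhead

# 200K requests/month on Claude Sonnet, 1.5K in / 400 out tokens:
print(f"${estimate_monthly_cost(200_000, 1500, 400, 0.003, 0.015):,.0f}")  # $2,100
```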
Lower Your Bedrock Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your Bedrock costs. Through group buying power, Wring negotiates better rates so you pay less per model inference.
