AWS Bedrock gives you access to foundation models from Anthropic, Meta, Mistral, Stability AI, Amazon, and others through a single API. The pricing model is straightforward — pay per token for text models, per image for image models — but the total cost depends heavily on which model you choose, how you structure prompts, and which Bedrock features you layer on top. Input pricing alone spans roughly 75x between the cheapest and most expensive text models ($0.0002 vs. $0.015 per 1K tokens).
TL;DR: Bedrock text models range from $0.0002/1K input tokens (Titan Express) to $0.015/1K (Claude Opus). Most production workloads should use Claude Sonnet ($0.003/1K input) or Llama 3.1 70B ($0.00099/1K). Batch inference gives 50% off, Provisioned Throughput saves 15-35% for sustained loads. Knowledge Bases, Agents, and Guardrails each add per-query fees on top of model costs.
Text Model Pricing
Anthropic Claude Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Claude 4 Opus | $0.015 | $0.075 | 200K |
| Claude 4 Sonnet | $0.003 | $0.015 | 200K |
| Claude 3.5 Haiku | $0.0008 | $0.004 | 200K |
Meta Llama Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Llama 3.1 405B | $0.00532 | $0.016 | 128K |
| Llama 3.1 70B | $0.00099 | $0.00099 | 128K |
| Llama 3.1 8B | $0.00022 | $0.00022 | 128K |
Mistral Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Mistral Large | $0.004 | $0.012 | 128K |
| Mistral Small | $0.001 | $0.003 | 128K |
| Mixtral 8x7B | $0.00045 | $0.0007 | 32K |
Amazon Titan Models
| Model | Input/1K tokens | Output/1K tokens | Context Window |
|---|---|---|---|
| Titan Text Premier | $0.0005 | $0.0015 | 32K |
| Titan Text Express | $0.0002 | $0.0006 | 8K |
| Titan Text Lite | $0.00015 | $0.0002 | 4K |
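The spread across these tables is easier to see per request. A quick sketch using the on-demand prices above and a hypothetical 1,000-token-input, 500-token-output request:

```python
# Per-request cost for a 1,000-in / 500-out token request, using the
# on-demand prices from the tables above (USD per 1K tokens: input, output).
PRICES = {
    "Claude 4 Opus": (0.015, 0.075),
    "Claude 4 Sonnet": (0.003, 0.015),
    "Llama 3.1 70B": (0.00099, 0.00099),
    "Titan Text Express": (0.0002, 0.0006),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

for model in PRICES:
    print(f"{model}: ${request_cost(model, 1000, 500):.5f}")
```

For this request shape, Opus works out to $0.0525 and Titan Express to $0.0005, a roughly 100x gap.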
Image Model Pricing
| Model | Per Image | Resolution |
|---|---|---|
| Stable Diffusion XL | $0.04 (512x512) - $0.08 (1024x1024) | Up to 1024x1024 |
| Amazon Titan Image Generator | $0.01 (512x512) - $0.02 (1024x1024) | Up to 1024x1024 |
| Stability AI SD3 | $0.04-$0.08 | Up to 1024x1024 |
Embedding Model Pricing
| Model | Cost per 1K tokens |
|---|---|
| Titan Embeddings V2 | $0.00002 |
| Cohere Embed English | $0.0001 |
| Cohere Embed Multilingual | $0.0001 |
Embeddings are extremely cheap — processing 1 million documents (1K tokens each) costs $20 with Titan Embeddings.
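That figure is straightforward to check:

```python
# Cost to embed 1M documents of ~1K tokens each with Titan Embeddings V2
# at $0.00002 per 1K tokens (from the table above).
docs = 1_000_000
tokens_per_doc = 1_000
price_per_1k_tokens = 0.00002

total = docs * tokens_per_doc / 1_000 * price_per_1k_tokens
print(f"${total:.2f}")  # $20.00
```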
Pricing Modes
On-Demand (Default)
Pay per token/image with no commitment. Best for variable or low-volume workloads.
Batch Inference
| Feature | Discount |
|---|---|
| Batch processing | 50% off on-demand pricing |
| Turnaround time | Up to 24 hours |
| Minimum batch size | None |
Batch inference is ideal for document processing, content generation pipelines, and any workload that doesn't need real-time responses.
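Batch jobs are submitted through the `CreateModelInvocationJob` API, which reads JSONL records from S3 and writes results back at the discounted rate. A minimal sketch of the request parameters; the bucket names, role ARN, and model ID here are placeholders, not real resources:

```python
# Request parameters for create_model_invocation_job on the boto3 "bedrock"
# client. All ARNs, S3 URIs, and the model ID are hypothetical placeholders.
job_params = {
    "jobName": "nightly-summaries",
    "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
    "roleArn": "arn:aws:iam::123456789012:role/bedrock-batch-role",
    "inputDataConfig": {
        "s3InputDataConfig": {"s3Uri": "s3://example-bucket/batch-in/records.jsonl"}
    },
    "outputDataConfig": {
        "s3OutputDataConfig": {"s3Uri": "s3://example-bucket/batch-out/"}
    },
}

# With boto3 installed and credentials configured:
#   import boto3
#   bedrock = boto3.client("bedrock")
#   bedrock.create_model_invocation_job(**job_params)
```

Each input record pairs a `recordId` with a `modelInput` in the model's native invoke format; results land in the output prefix when the job completes, up to 24 hours later.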
Provisioned Throughput
| Commitment | Discount |
|---|---|
| No commitment | ~15% off on-demand |
| 1-month | ~20% off |
| 6-month | ~35% off |
Provisioned Throughput guarantees a specific number of model units dedicated to your account. See the Bedrock user guide for configuration details. Best for sustained high-volume production workloads.
Prompt Caching
| Component | Cost |
|---|---|
| Cache write | 25% premium over input price |
| Cache read | 90% discount on input price |
| Cache TTL | 5 minutes (auto-extended) |
For applications with large, repeated system prompts, caching reduces input costs dramatically.
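The 25% write premium is recovered almost immediately. A sketch of cached vs. uncached input cost for a shared system prompt, using the Claude Sonnet input price from the tables above (the prompt size and request count are made-up numbers):

```python
# Input cost for N requests that share a 5K-token system prompt, with and
# without caching. Cache write = 1.25x input price, cache read = 0.10x.
IN_PRICE = 0.003   # Claude Sonnet, $ per 1K input tokens
prompt_k = 5       # shared prefix size, in thousands of tokens
n = 100            # requests arriving within the cache TTL

uncached = n * prompt_k * IN_PRICE
cached = prompt_k * IN_PRICE * 1.25 + (n - 1) * prompt_k * IN_PRICE * 0.10
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
```

Here the shared-prefix cost drops from $1.50 to about $0.17, roughly 89% less, and caching is already cheaper by the second request (1.25 + 0.10 = 1.35 units vs. 2.0).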
Feature Pricing
Knowledge Bases (RAG)
| Component | Cost |
|---|---|
| Data ingestion | Embedding model cost only |
| Storage | OpenSearch Serverless ($0.24/OCU-hour, min 4 OCUs) or Aurora pricing |
| Retrieval queries | Model token cost for query + retrieved context |
The vector store is often the largest Knowledge Base cost. OpenSearch Serverless minimum is $701/month.
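The monthly floor follows directly from the OCU math:

```python
# OpenSearch Serverless floor for a Knowledge Base vector store:
# 4 OCUs minimum x $0.24 per OCU-hour, running continuously
# (~730 hours in an average month).
min_ocus = 4
price_per_ocu_hour = 0.24
hours_per_month = 730

floor = min_ocus * price_per_ocu_hour * hours_per_month
print(f"${floor:.2f}/month")  # $700.80/month
```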
Agents
| Component | Cost |
|---|---|
| Agent invocations | No additional charge beyond model tokens |
| Orchestration tokens | Standard model pricing (agent reasoning uses tokens) |
| Action group Lambda calls | Standard Lambda pricing |
Agents consume additional tokens for reasoning, tool selection, and orchestration — typically 2-5x the tokens of a direct model call.
Guardrails
| Component | Cost |
|---|---|
| Text guardrails | $0.75 per 1,000 text units (1 unit = 1,000 characters) |
| Image guardrails | $1.00 per image |
| Contextual grounding | $0.10 per 1,000 text units |
Guardrails are applied in addition to model costs. For high-volume applications, guardrail costs can be significant.
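A text unit rounds up per 1,000 characters, so cost scales with the combined length of prompt and response. A sketch assuming a single policy at the $0.75 rate above:

```python
import math

# Guardrails text-unit cost: 1 text unit = 1,000 characters, billed at
# $0.75 per 1,000 text units (single-policy rate from the table above).
RATE_PER_1K_UNITS = 0.75

def guardrail_cost(text: str) -> float:
    units = math.ceil(len(text) / 1000)       # round up to whole units
    return units * RATE_PER_1K_UNITS / 1000   # rate is per 1,000 units

# A 2,500-character prompt + response pair is 3 text units.
print(f"${guardrail_cost('x' * 2500):.5f} per request")
```

At three units per request, 1M guarded requests a month is $2,250 on top of model costs, which is why the high-volume caveat above matters.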
Model Evaluation
| Component | Cost |
|---|---|
| Automatic evaluation | Model token costs only |
| Human evaluation | $24.60 per completed task (via SageMaker Ground Truth) |
Real-World Cost Examples
| Use Case | Model | Monthly Volume | Monthly Cost |
|---|---|---|---|
| Customer support chatbot | Claude Haiku | 1M conversations (500 tokens each) | $2,400 |
| Document summarization | Claude Sonnet | 100K documents (2K tokens each) | $3,600 |
| Code generation | Claude Sonnet | 500K requests (1K tokens each) | $9,000 |
| Content classification | Llama 8B | 10M items (200 tokens each) | $880 |
| RAG search | Titan Embed + Claude Haiku | 500K queries | $800 + vector store |
| Image generation | Titan Image | 50K images | $500-1,000 |

Token-based examples assume the stated per-item token count applies to both input and output.
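Worked out for the first row, assuming 500 input and 500 output tokens per conversation at the Claude Haiku prices above:

```python
# Customer support chatbot: 1M conversations/month at ~500 input + 500
# output tokens each, priced at Claude Haiku's $0.0008 in / $0.004 out
# per 1K tokens (from the tables above).
conversations = 1_000_000
in_tokens, out_tokens = 500, 500
in_price, out_price = 0.0008, 0.004  # $ per 1K tokens

monthly = conversations * (in_tokens * in_price + out_tokens * out_price) / 1000
print(f"${monthly:,.0f}/month")  # $2,400/month
```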
Cost Optimization Quick Wins
- Use the smallest model that meets quality requirements — Haiku costs 95% less than Opus
- Enable batch inference for non-real-time workloads — instant 50% savings
- Implement prompt caching for repeated system prompts — up to 90% input savings
- Set max_tokens appropriately — don't allocate 4,096 tokens for tasks that need 200
- Route requests by complexity — simple tasks to Haiku, complex to Sonnet
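The last item, complexity routing, can start as a simple heuristic in front of the model call. A naive sketch; the model IDs are assumed examples, and a production router would use a classifier or request metadata rather than string checks:

```python
# Naive complexity router: short, simple prompts go to Haiku, long or
# code-heavy prompts go to Sonnet. Model IDs are assumed placeholders.
HAIKU = "anthropic.claude-3-5-haiku-20241022-v1:0"
SONNET = "anthropic.claude-sonnet-4-20250514-v1:0"

def route(prompt: str) -> str:
    looks_complex = (
        len(prompt) > 2000
        or "def " in prompt                  # crude signal for code tasks
        or "step by step" in prompt.lower()  # crude signal for reasoning tasks
    )
    return SONNET if looks_complex else HAIKU

print(route("What are your support hours?"))                        # Haiku
print(route("Explain step by step how to refactor this module."))   # Sonnet
```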
Related Guides
- AWS Bedrock Cost Optimization Guide
- AWS Bedrock vs OpenAI Pricing
- AWS Bedrock LLM Models Guide
- AWS Bedrock Batch Inference Guide
- AWS Bedrock vs SageMaker
FAQ
Which Bedrock model offers the best price-to-performance ratio?
Claude Sonnet for general tasks, Llama 3.1 70B for cost-sensitive workloads, and Claude Haiku for high-volume simple tasks. Titan Text Express is cheapest overall but less capable.
Is Bedrock cheaper than calling model APIs directly?
Bedrock pricing matches or slightly exceeds direct API pricing (e.g., Anthropic API). The value is in unified billing, VPC integration, Guardrails, Knowledge Bases, and not managing API keys across multiple providers.
How do I estimate my monthly Bedrock costs?
Multiply per-request cost by monthly volume: [(average input tokens ÷ 1,000 × input price) + (average output tokens ÷ 1,000 × output price)] × monthly request volume. If you use Agents, budget extra for orchestration overhead, since agent reasoning can consume 2-5x the tokens of a direct call. Add vector store costs if using Knowledge Bases.
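That formula as a function; the overhead multiplier is a knob, with this guide's 2-5x agent figure as a reference point:

```python
# Monthly Bedrock cost estimate: per-request token cost times volume,
# with an optional multiplier for agent orchestration overhead.
def estimate_monthly_cost(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
    overhead: float = 1.0,  # e.g. 2.0-5.0 when using Agents
) -> float:
    per_request = (
        avg_input_tokens / 1000 * input_price_per_1k
        + avg_output_tokens / 1000 * output_price_per_1k
    )
    return requests_per_month * per_request * overhead

# 200K requests/month on Claude Sonnet, 1.5K in / 400 out tokens:
print(f"${estimate_monthly_cost(200_000, 1500, 400, 0.003, 0.015):,.0f}")  # $2,100
```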
Lower Your Bedrock Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your Bedrock costs. Through group buying power, Wring negotiates better rates so you pay less per model inference.
