Embeddings convert text into numerical vectors that capture semantic meaning. Two sentences with similar meaning produce similar vectors, enabling semantic search, document clustering, recommendation systems, and the retrieval step in RAG (Retrieval-Augmented Generation). Bedrock offers embedding models that are cheap, fast, and production-ready — you can embed millions of documents for under $50.
TL;DR: Titan Embeddings V2 costs $0.00002 per 1,000 tokens and produces 1024-dimension vectors. Cohere Embed costs $0.0001 per 1,000 tokens with multilingual support. For English-only workloads, Titan is 5x cheaper. For multilingual needs, Cohere is the better choice. Store embeddings in OpenSearch, Aurora pgvector, or Pinecone. The embedding step is typically the cheapest part of any RAG or search pipeline.
Available Embedding Models
| Model | Dimensions | Max Tokens | Cost/1K Tokens | Languages |
|---|---|---|---|---|
| Titan Embeddings V2 | 256, 512, or 1024 | 8,192 | $0.00002 | English + limited multilingual |
| Titan Multimodal Embeddings | 256, 512, or 1024 | 128 (text) + image | $0.0008/image, $0.00002/text | English |
| Cohere Embed English | 1024 | 512 | $0.0001 | English |
| Cohere Embed Multilingual | 1024 | 512 | $0.0001 | 100+ languages |
Cost Analysis
Embedding is remarkably cheap compared to text generation:
| Scenario | Titan V2 Cost | Cohere Cost |
|---|---|---|
| 1,000 documents (1K tokens each) | $0.02 | $0.10 |
| 100,000 documents | $2.00 | $10.00 |
| 1 million documents | $20.00 | $100.00 |
| 10 million documents | $200.00 | $1,000.00 |
For comparison, generating a 500-token response with Claude Haiku for 1 million queries costs $2,000 — embedding 1 million documents costs just $20 with Titan.
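The table above follows from simple arithmetic: total tokens ÷ 1,000 × price per 1K tokens. A quick sanity check in Python, using the prices listed above:

```python
# Embedding cost = (docs * tokens per doc) / 1000 * price per 1K tokens.
def embedding_cost(num_docs: int, tokens_per_doc: int, price_per_1k: float) -> float:
    return num_docs * tokens_per_doc / 1000 * price_per_1k

TITAN_V2_PRICE = 0.00002  # $ per 1K tokens
COHERE_PRICE = 0.0001     # $ per 1K tokens

print(embedding_cost(1_000_000, 1_000, TITAN_V2_PRICE))  # about $20
print(embedding_cost(1_000_000, 1_000, COHERE_PRICE))    # about $100
```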
Choosing the Right Model
Titan Embeddings V2
Pros:
- 5x cheaper than Cohere
- Variable dimension output (256/512/1024) — lower dimensions = faster search, less storage
- 8,192 token context window — can embed entire documents
- Native AWS integration
Cons:
- Weaker multilingual performance
- Smaller training corpus than Cohere for some domains
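A minimal sketch of calling Titan Embeddings V2 through the Bedrock Runtime API with boto3. The helper names and region here are illustrative; the request body fields (`inputText`, `dimensions`, `normalize`) are the model's documented parameters:

```python
import json

# Build the request body for Titan Embeddings V2.
# "dimensions" may be 256, 512, or 1024; "normalize" asks the model to
# return a unit-length vector (useful for cosine similarity search).
def titan_v2_body(text: str, dimensions: int = 1024, normalize: bool = True) -> str:
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": normalize,
    })

def embed_with_titan(text: str, dimensions: int = 1024) -> list[float]:
    import boto3  # lazy import: the body builder above needs only stdlib
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=titan_v2_body(text, dimensions),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]  # list of `dimensions` floats
```

Choosing `dimensions` at embed time matters: if you later decide you want 512-dim vectors, you must re-embed, so it is worth benchmarking the lower settings early.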
Cohere Embed
Pros:
- Superior multilingual support (100+ languages)
- Specialized search and classification input types
- Strong performance on domain-specific text
Cons:
- 5x more expensive than Titan
- 512 token max input — requires chunking for longer documents
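For comparison, a Cohere Embed call looks like this (a sketch; model ID `cohere.embed-multilingual-v3`, or `cohere.embed-english-v3` for English-only). Note the `input_type` field — this is the specialized input types feature mentioned above: embed your corpus with `search_document` and your queries with `search_query`:

```python
import json

# Build the request body for Cohere Embed on Bedrock. Cohere distinguishes
# input types so document and query embeddings align well at search time.
def cohere_body(texts: list[str], input_type: str = "search_document") -> str:
    return json.dumps({"texts": texts, "input_type": input_type})

def embed_with_cohere(texts: list[str],
                      input_type: str = "search_document",
                      model_id: str = "cohere.embed-multilingual-v3") -> list[list[float]]:
    import boto3  # lazy import: the body builder above needs only stdlib
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId=model_id,
        body=cohere_body(texts, input_type),
    )
    return json.loads(response["body"].read())["embeddings"]
```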
Decision Matrix
| Requirement | Recommended Model |
|---|---|
| English-only, cost-sensitive | Titan V2 (1024-dim) |
| English-only, long documents | Titan V2 (8K context) |
| Multilingual content | Cohere Multilingual |
| Multimodal (text + images) | Titan Multimodal |
| Fastest search speed | Titan V2 (256-dim) |
Vector Storage Options
After generating embeddings, you need a vector database to store and search them:
| Storage | Cost | Best For |
|---|---|---|
| OpenSearch Serverless | $701/mo minimum | Large-scale production, hybrid search |
| Aurora pgvector | ~$30/mo | Cost-sensitive, existing PostgreSQL |
| OpenSearch Managed | From ~$50/mo | Medium-scale, full control |
| Pinecone | $70/mo (Starter) | Managed vector DB, simple setup |
| ElastiCache (Valkey) | From ~$14/mo | Low-latency, smaller datasets |
Storage Size Estimation
| Vectors | Dimensions | Approximate Storage |
|---|---|---|
| 100K | 1024 | ~400 MB |
| 1M | 1024 | ~4 GB |
| 10M | 1024 | ~40 GB |
| 100K | 256 | ~100 MB |
Lower dimensions (256) reduce storage 4x and improve search speed, with a small quality trade-off.
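These figures come from one 4-byte float32 per dimension. A rough estimator (ignoring index overhead, which HNSW-style indexes add on top of the raw vectors):

```python
# Raw vector storage: num_vectors * dimensions * 4 bytes (float32).
def storage_bytes(num_vectors: int, dimensions: int) -> int:
    return num_vectors * dimensions * 4

print(storage_bytes(1_000_000, 1024) / 1e9)  # ~4.1 GB
print(storage_bytes(100_000, 256) / 1e6)     # ~102 MB
```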
Common Use Cases
1. Semantic Search
Replace keyword search with meaning-based search:
Query: "How do I reduce my cloud bill?"
- Keyword match: finds only documents containing those exact words
- Semantic match: also finds "cost optimization strategies", "lower AWS spending", "save money on infrastructure"
2. RAG (Retrieval-Augmented Generation)
The most common Bedrock embedding use case:
User question → Embed question → Search vector DB → Retrieve top-K documents → Pass to LLM → Generate grounded answer
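The retrieval half of that pipeline can be sketched in a few lines. Here `embed` is a toy bag-of-words stand-in for a real Bedrock embedding call, so the example runs anywhere:

```python
import math

# Toy embed(): word-count vector. In production, replace with a call to
# Titan V2 or Cohere Embed via the Bedrock Runtime API.
def embed(text: str) -> dict[str, float]:
    words = text.lower().split()
    return {w: float(words.count(w)) for w in set(words)}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]  # these top-K documents go into the LLM prompt

docs = ["reduce aws costs with savings plans",
        "kubernetes networking basics",
        "cost optimization for cloud workloads"]
print(retrieve("how to reduce cloud costs", docs, k=2))
```

In a real RAG system the corpus embeddings are precomputed and stored in a vector database, and only the query is embedded at request time.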
3. Document Classification
Embed documents and known category examples. Classify new documents by finding the nearest category embedding using cosine similarity.
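A nearest-category sketch. The category vectors here are toy 3-dimensional placeholders; in practice they would be Bedrock embeddings, often averaged over several labeled examples per category:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Assign the label whose category embedding is most similar to the document.
def classify(doc_vec: list[float], category_vecs: dict[str, list[float]]) -> str:
    return max(category_vecs, key=lambda label: cosine(doc_vec, category_vecs[label]))

categories = {"billing": [0.9, 0.1, 0.0], "outage": [0.0, 0.2, 0.95]}
print(classify([0.8, 0.2, 0.1], categories))  # billing
```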
4. Duplicate Detection
Embed all documents and find pairs with cosine similarity above 0.95. Useful for deduplicating support tickets, content, or data entries.
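A brute-force sketch — O(n²) pairwise comparison, fine for thousands of items; at larger scale, use your vector database's similarity search instead:

```python
import math
from itertools import combinations

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Flag pairs whose embeddings are nearly identical. 0.95 is a common
# starting threshold; tune it against a labeled sample of your data.
def find_duplicates(vectors: dict[str, list[float]], threshold: float = 0.95):
    return [(i, j) for i, j in combinations(vectors, 2)
            if cosine(vectors[i], vectors[j]) >= threshold]

vecs = {"t1": [1.0, 0.0], "t2": [0.99, 0.05], "t3": [0.0, 1.0]}
print(find_duplicates(vecs))  # [('t1', 't2')]
```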
5. Recommendation Systems
Embed user preferences and item descriptions. Recommend items whose embeddings are closest to the user's preference vector.
Best Practices
1. Choose Dimensions Based on Use Case
| Dimension | Search Quality | Speed | Storage |
|---|---|---|---|
| 1024 | Highest | Baseline | 4x |
| 512 | High | 2x faster | 2x |
| 256 | Good | 4x faster | 1x |
Start with 1024 for maximum quality. If search latency or storage is a concern, test with 512 — quality drop is often negligible.
2. Chunk Documents Appropriately
Cohere's 512-token limit requires chunking. Even for Titan's 8K window, chunking often improves retrieval quality:
| Chunk Size | Best For |
|---|---|
| 100-200 tokens | FAQ, short answers |
| 300-500 tokens | General documents, articles |
| 500-1,000 tokens | Technical documentation, reports |
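A naive chunker that approximates one token per word; a real pipeline should split on sentence or paragraph boundaries and count tokens with the model's tokenizer. The overlap parameter repeats a few words between adjacent chunks so content cut at a boundary is still retrievable:

```python
# Sliding-window chunker (1 token ~ 1 word approximation).
def chunk(text: str, max_tokens: int = 400, overlap: int = 50) -> list[str]:
    assert 0 <= overlap < max_tokens  # guard against an infinite loop
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        start += max_tokens - overlap
    return chunks
```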
3. Normalize Embeddings
Most vector databases expect normalized vectors (unit length) for cosine similarity search. Bedrock models output normalized vectors by default — verify this with your specific vector store.
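If your vectors are not already unit length, normalizing is one line of math; after normalization, a plain dot product equals cosine similarity:

```python
import math

# Scale a vector to unit length (L2 norm of 1).
def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```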
4. Use Batch Processing for Large Datasets
For initial ingestion of millions of documents, invoke the embedding model from multiple parallel workers, or use Bedrock batch inference for asynchronous bulk jobs. Titan V2 handles high throughput efficiently.
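Embedding calls are network-bound, so a thread pool parallelizes them well. A sketch with `embed_one` as a placeholder for the real Bedrock call — in production, add retries with backoff and stay within your account's requests-per-minute quota:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder: swap in a real Bedrock invoke_model call here.
def embed_one(text: str) -> list[float]:
    return [float(len(text))]  # toy one-dimensional "embedding"

# Fan out embedding calls across threads; map() preserves input order.
def embed_corpus(texts: list[str], max_workers: int = 8) -> list[list[float]]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_one, texts))

print(embed_corpus(["a", "bb", "ccc"]))  # [[1.0], [2.0], [3.0]]
```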
5. Re-Embed When Changing Models
Embeddings from different models are not compatible. If you switch from Cohere to Titan (or vice versa), you must re-embed your entire corpus. Plan for this cost and downtime.
Related Guides
- AWS Bedrock Knowledge Bases Guide
- AWS Bedrock Pricing Guide
- AWS Bedrock Batch Inference Guide
- AI Cost Optimization Guide
FAQ
Are Bedrock embeddings the same quality as OpenAI embeddings?
Titan Embeddings V2 and Cohere Embed both perform comparably to OpenAI's text-embedding-3-small on standard benchmarks (MTEB). For most production use cases, the quality difference is negligible. The main advantage of Bedrock embeddings is AWS-native integration and competitive pricing.
How do I measure embedding quality for my use case?
Create a test set of 50-100 query-document pairs where you know the correct match. Generate embeddings, run similarity search, and measure recall@K (what percentage of correct documents appear in the top K results). Compare across models and dimension settings.
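recall@K takes only a few lines to compute once you have retrieval results and known-correct answers:

```python
# recall@K: fraction of queries whose correct document appears in the
# top K retrieved results.
def recall_at_k(results: dict[str, list[str]], gold: dict[str, str], k: int) -> float:
    hits = sum(1 for q, docs in results.items() if gold[q] in docs[:k])
    return hits / len(results)

results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
gold = {"q1": "d1", "q2": "d4"}
print(recall_at_k(results, gold, 2))  # 0.5
print(recall_at_k(results, gold, 3))  # 1.0
```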
Can I use the same embedding model for different languages?
Cohere Multilingual is designed for this — it maps text in 100+ languages into the same vector space. A query in English will find relevant documents in German or Japanese. Titan V2 has limited multilingual capability — use Cohere for cross-language search.
Lower Your Bedrock Embeddings Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your Bedrock embeddings costs. Through group buying power, Wring negotiates better rates so you pay less per embedding request.
