Embeddings convert text into numerical vectors that capture semantic meaning. Two sentences with similar meaning produce similar vectors, enabling semantic search, document clustering, recommendation systems, and the retrieval step in RAG (Retrieval-Augmented Generation). Bedrock offers embedding models that are cheap, fast, and production-ready — you can embed millions of documents for under $50.
TL;DR: Titan Embeddings V2 costs $0.00002 per 1,000 tokens and produces 1024-dimension vectors. Cohere Embed costs $0.0001 per 1,000 tokens with multilingual support. For English-only workloads, Titan is 5x cheaper. For multilingual needs, Cohere is the better choice. Store embeddings in OpenSearch, Aurora pgvector, or Pinecone. The embedding step is typically the cheapest part of any RAG or search pipeline.
Available Embedding Models
| Model | Dimensions | Max Tokens | Cost/1K Tokens | Languages |
|---|---|---|---|---|
| Titan Embeddings V2 | 256, 512, or 1024 | 8,192 | $0.00002 | English + limited multilingual |
| Titan Multimodal Embeddings | 256, 512, or 1024 | 128 (text) + image | $0.0008/image, $0.00002/text | English |
| Cohere Embed English | 1024 | 512 | $0.0001 | English |
| Cohere Embed Multilingual | 1024 | 512 | $0.0001 | 100+ languages |
Cost Analysis
Embedding is remarkably cheap compared to text generation:
| Scenario | Titan V2 Cost | Cohere Cost |
|---|---|---|
| 1,000 documents (1K tokens each) | $0.02 | $0.10 |
| 100,000 documents | $2.00 | $10.00 |
| 1 million documents | $20.00 | $100.00 |
| 10 million documents | $200.00 | $1,000.00 |
For comparison, generating a 500-token response with Claude Haiku for 1 million queries costs $2,000 — embedding 1 million documents costs just $20 with Titan.
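The table above follows from simple arithmetic: total tokens ÷ 1,000 × price per 1K tokens. A quick sanity check in Python, using the prices listed above:

```python
# Embedding cost = (docs * tokens per doc) / 1000 * price per 1K tokens.
def embedding_cost(num_docs: int, tokens_per_doc: int, price_per_1k: float) -> float:
    return num_docs * tokens_per_doc / 1000 * price_per_1k

TITAN_V2_PRICE = 0.00002  # $ per 1K tokens
COHERE_PRICE = 0.0001     # $ per 1K tokens

print(embedding_cost(1_000_000, 1_000, TITAN_V2_PRICE))  # about $20
print(embedding_cost(1_000_000, 1_000, COHERE_PRICE))    # about $100
```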
Choosing the Right Model
Titan Embeddings V2
Pros:
- 5x cheaper than Cohere
- Variable dimension output (256/512/1024) — lower dimensions = faster search, less storage
- 8,192 token context window — can embed entire documents
- Native AWS integration
Cons:
- Weaker multilingual performance
- Smaller training corpus than Cohere for some domains
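A minimal sketch of calling Titan Embeddings V2 through the Bedrock Runtime API with boto3. The helper names and region here are illustrative; the request body fields (`inputText`, `dimensions`, `normalize`) are the model's documented parameters:

```python
import json

# Build the request body for Titan Embeddings V2.
# "dimensions" may be 256, 512, or 1024; "normalize" asks the model to
# return a unit-length vector (useful for cosine similarity search).
def titan_v2_body(text: str, dimensions: int = 1024, normalize: bool = True) -> str:
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": normalize,
    })

def embed_with_titan(text: str, dimensions: int = 1024) -> list[float]:
    import boto3  # lazy import: the body builder above needs only stdlib
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=titan_v2_body(text, dimensions),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]  # list of `dimensions` floats
```

Choosing `dimensions` at embed time matters: if you later decide you want 512-dim vectors, you must re-embed, so it is worth benchmarking the lower settings early.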
Cohere Embed
Pros:
- Superior multilingual support (100+ languages)
- Specialized search and classification input types
- Strong performance on domain-specific text
Cons:
- 5x more expensive than Titan
- 512 token max input — requires chunking for longer documents
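For comparison, a Cohere Embed call looks like this (a sketch; model ID `cohere.embed-multilingual-v3`, or `cohere.embed-english-v3` for English-only). Note the `input_type` field — this is the specialized input types feature mentioned above: embed your corpus with `search_document` and your queries with `search_query`:

```python
import json

# Build the request body for Cohere Embed on Bedrock. Cohere distinguishes
# input types so document and query embeddings align well at search time.
def cohere_body(texts: list[str], input_type: str = "search_document") -> str:
    return json.dumps({"texts": texts, "input_type": input_type})

def embed_with_cohere(texts: list[str],
                      input_type: str = "search_document",
                      model_id: str = "cohere.embed-multilingual-v3") -> list[list[float]]:
    import boto3  # lazy import: the body builder above needs only stdlib
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId=model_id,
        body=cohere_body(texts, input_type),
    )
    return json.loads(response["body"].read())["embeddings"]
```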
Decision Matrix
| Requirement | Recommended Model |
|---|---|
| English-only, cost-sensitive | Titan V2 (1024-dim) |
| English-only, long documents | Titan V2 (8K context) |
| Multilingual content | Cohere Multilingual |
| Multimodal (text + images) | Titan Multimodal |
| Fastest search speed | Titan V2 (256-dim) |
Vector Storage Options
After generating embeddings, you need a vector database to store and search them:
| Storage | Cost | Best For |
|---|---|---|
| OpenSearch Serverless | $701/mo minimum | Large-scale production, hybrid search |
| Aurora pgvector | ~$30/mo | Cost-sensitive, existing PostgreSQL |
| OpenSearch Managed | From ~$50/mo | Medium-scale, full control |
| Pinecone | $70/mo (Starter) | Managed vector DB, simple setup |
| ElastiCache (Valkey) | From ~$14/mo | Low-latency, smaller datasets |
Storage Size Estimation
| Vectors | Dimensions | Approximate Storage |
|---|---|---|
| 100K | 1024 | ~400 MB |
| 1M | 1024 | ~4 GB |
| 10M | 1024 | ~40 GB |
| 100K | 256 | ~100 MB |
Lower dimensions (256) reduce storage 4x and improve search speed, with a small quality trade-off.
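These figures come from one 4-byte float32 per dimension. A rough estimator (ignoring index overhead, which HNSW-style indexes add on top of the raw vectors):

```python
# Raw vector storage: num_vectors * dimensions * 4 bytes (float32).
def storage_bytes(num_vectors: int, dimensions: int) -> int:
    return num_vectors * dimensions * 4

print(storage_bytes(1_000_000, 1024) / 1e9)  # ~4.1 GB
print(storage_bytes(100_000, 256) / 1e6)     # ~102 MB
```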
Common Use Cases
1. Semantic Search
Replace keyword search with meaning-based search:
Query: "How do I reduce my cloud bill?"
- Keyword match: finds only documents containing those exact words
- Semantic match: also finds "cost optimization strategies", "lower AWS spending", "save money on infrastructure"
2. RAG (Retrieval-Augmented Generation)
The most common Bedrock embedding use case:
User question → Embed question → Search vector DB → Retrieve top-K documents → Pass to LLM → Generate grounded answer
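The retrieval half of that pipeline can be sketched in a few lines. Here `embed` is a toy bag-of-words stand-in for a real Bedrock embedding call, so the example runs anywhere:

```python
import math

# Toy embed(): word-count vector. In production, replace with a call to
# Titan V2 or Cohere Embed via the Bedrock Runtime API.
def embed(text: str) -> dict[str, float]:
    words = text.lower().split()
    return {w: float(words.count(w)) for w in set(words)}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]  # these top-K documents go into the LLM prompt

docs = ["reduce aws costs with savings plans",
        "kubernetes networking basics",
        "cost optimization for cloud workloads"]
print(retrieve("how to reduce cloud costs", docs, k=2))
```

In a real RAG system the corpus embeddings are precomputed and stored in a vector database, and only the query is embedded at request time.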
3. Document Classification
Embed documents and known category examples. Classify new documents by finding the nearest category embedding using cosine similarity.
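A nearest-category sketch. The category vectors here are toy 3-dimensional placeholders; in practice they would be Bedrock embeddings, often averaged over several labeled examples per category:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Assign the label whose category embedding is most similar to the document.
def classify(doc_vec: list[float], category_vecs: dict[str, list[float]]) -> str:
    return max(category_vecs, key=lambda label: cosine(doc_vec, category_vecs[label]))

categories = {"billing": [0.9, 0.1, 0.0], "outage": [0.0, 0.2, 0.95]}
print(classify([0.8, 0.2, 0.1], categories))  # billing
```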
4. Duplicate Detection
Embed all documents and find pairs with cosine similarity above 0.95. Useful for deduplicating support tickets, content, or data entries.
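A brute-force sketch — O(n²) pairwise comparison, fine for thousands of items; at larger scale, use your vector database's similarity search instead:

```python
import math
from itertools import combinations

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Flag pairs whose embeddings are nearly identical. 0.95 is a common
# starting threshold; tune it against a labeled sample of your data.
def find_duplicates(vectors: dict[str, list[float]], threshold: float = 0.95):
    return [(i, j) for i, j in combinations(vectors, 2)
            if cosine(vectors[i], vectors[j]) >= threshold]

vecs = {"t1": [1.0, 0.0], "t2": [0.99, 0.05], "t3": [0.0, 1.0]}
print(find_duplicates(vecs))  # [('t1', 't2')]
```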
5. Recommendation Systems
Embed user preferences and item descriptions. Recommend items whose embeddings are closest to the user's preference vector.
Best Practices
1. Choose Dimensions Based on Use Case
| Dimension | Search Quality | Speed | Storage |
|---|---|---|---|
| 1024 | Highest | Baseline | 4x |
| 512 | High | 2x faster | 2x |
| 256 | Good | 4x faster | 1x |
Start with 1024 for maximum quality. If search latency or storage is a concern, test with 512 — quality drop is often negligible.
2. Chunk Documents Appropriately
Cohere's 512-token limit requires chunking. Even for Titan's 8K window, chunking often improves retrieval quality:
| Chunk Size | Best For |
|---|---|
| 100-200 tokens | FAQ, short answers |
| 300-500 tokens | General documents, articles |
| 500-1,000 tokens | Technical documentation, reports |
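A naive chunker that approximates one token per word; a real pipeline should split on sentence or paragraph boundaries and count tokens with the model's tokenizer. The overlap parameter repeats a few words between adjacent chunks so content cut at a boundary is still retrievable:

```python
# Sliding-window chunker (1 token ~ 1 word approximation).
def chunk(text: str, max_tokens: int = 400, overlap: int = 50) -> list[str]:
    assert 0 <= overlap < max_tokens  # guard against an infinite loop
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        start += max_tokens - overlap
    return chunks
```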
3. Normalize Embeddings
Most vector databases expect normalized vectors (unit length) for cosine similarity search. Bedrock models output normalized vectors by default — verify this with your specific vector store.
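If your vectors are not already unit length, normalizing is one line of math; after normalization, a plain dot product equals cosine similarity:

```python
import math

# Scale a vector to unit length (L2 norm of 1).
def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```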
4. Use Batch Processing for Large Datasets
For initial ingestion of millions of documents, invoke the embedding model from multiple parallel workers, or use Bedrock batch inference for asynchronous bulk jobs. Titan V2 handles high throughput efficiently.
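Embedding calls are network-bound, so a thread pool parallelizes them well. A sketch with `embed_one` as a placeholder for the real Bedrock call — in production, add retries with backoff and stay within your account's requests-per-minute quota:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder: swap in a real Bedrock invoke_model call here.
def embed_one(text: str) -> list[float]:
    return [float(len(text))]  # toy one-dimensional "embedding"

# Fan out embedding calls across threads; map() preserves input order.
def embed_corpus(texts: list[str], max_workers: int = 8) -> list[list[float]]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_one, texts))

print(embed_corpus(["a", "bb", "ccc"]))  # [[1.0], [2.0], [3.0]]
```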
5. Re-Embed When Changing Models
Embeddings from different models are not compatible. If you switch from Cohere to Titan (or vice versa), you must re-embed your entire corpus. Plan for this cost and downtime.
Related Guides
- AWS Bedrock Knowledge Bases Guide
- AWS Bedrock Pricing Guide
- AWS Bedrock Batch Inference Guide
- AI Cost Optimization Guide
FAQ
Are Bedrock embeddings the same quality as OpenAI embeddings?
Titan Embeddings V2 and Cohere Embed both perform comparably to OpenAI's text-embedding-3-small on standard benchmarks (MTEB). For most production use cases, the quality difference is negligible. The main advantage of Bedrock embeddings is AWS-native integration and competitive pricing.
How do I measure embedding quality for my use case?
Create a test set of 50-100 query-document pairs where you know the correct match. Generate embeddings, run similarity search, and measure recall@K (what percentage of correct documents appear in the top K results). Compare across models and dimension settings.
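recall@K takes only a few lines to compute once you have retrieval results and known-correct answers:

```python
# recall@K: fraction of queries whose correct document appears in the
# top K retrieved results.
def recall_at_k(results: dict[str, list[str]], gold: dict[str, str], k: int) -> float:
    hits = sum(1 for q, docs in results.items() if gold[q] in docs[:k])
    return hits / len(results)

results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
gold = {"q1": "d1", "q2": "d4"}
print(recall_at_k(results, gold, 2))  # 0.5
print(recall_at_k(results, gold, 3))  # 1.0
```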
Can I use the same embedding model for different languages?
Cohere Multilingual is designed for this — it maps text in 100+ languages into the same vector space. A query in English will find relevant documents in German or Japanese. Titan V2 has limited multilingual capability — use Cohere for cross-language search.
Lower Your Bedrock Embeddings Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your Bedrock embeddings costs. Through group buying power, Wring negotiates better rates so you pay less per embedding request.
