
AWS Bedrock Embeddings: Vector Search Guide

AWS Bedrock embedding models for vector search and RAG. Titan V2 costs $0.00002/1K tokens — embed 1M documents for $20. Compare models and storage options.

Wring Team
March 14, 2026
7 min read
AWS Bedrock, embeddings, vector search, semantic similarity, Titan Embeddings, RAG embeddings
Vector embeddings and semantic search technology

Embeddings convert text into numerical vectors that capture semantic meaning. Two sentences with similar meaning produce similar vectors, enabling semantic search, document clustering, recommendation systems, and the retrieval step in RAG (Retrieval-Augmented Generation). Bedrock offers embedding models that are cheap, fast, and production-ready — processing millions of documents for under $50.

TL;DR: Titan Embeddings V2 costs $0.00002 per 1,000 tokens and produces 1024-dimension vectors. Cohere Embed costs $0.0001 per 1,000 tokens with multilingual support. For English-only workloads, Titan is 5x cheaper. For multilingual needs, Cohere is the better choice. Store embeddings in OpenSearch, Aurora pgvector, or Pinecone. The embedding step is typically the cheapest part of any RAG or search pipeline.


Available Embedding Models

| Model | Dimensions | Max Tokens | Cost/1K Tokens | Languages |
|---|---|---|---|---|
| Titan Embeddings V2 | 256, 512, or 1024 | 8,192 | $0.00002 | English + limited multilingual |
| Titan Multimodal Embeddings | 256, 512, or 1024 | 128 (text) + image | $0.0008/image, $0.00002/text | English |
| Cohere Embed English | 1024 | 512 | $0.0001 | English |
| Cohere Embed Multilingual | 1024 | 512 | $0.0001 | 100+ languages |
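The request shapes for the two model families differ: Titan V2 takes a single `inputText` string (with optional `dimensions` and `normalize` fields), while Cohere Embed takes a `texts` batch plus an `input_type` hint. A minimal sketch of building these bodies for Bedrock's `InvokeModel` API (the actual network call is shown only as a comment; verify the model IDs in your region's Bedrock console):

```python
import json

# Public Bedrock model identifiers at the time of writing.
TITAN_V2_ID = "amazon.titan-embed-text-v2:0"
COHERE_EN_ID = "cohere.embed-english-v3"

def titan_v2_body(text: str, dimensions: int = 1024) -> str:
    """Titan V2: one string per call, with selectable output dimensions."""
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})

def cohere_body(texts: list[str], input_type: str = "search_document") -> str:
    """Cohere Embed: a batch of texts, plus an input_type hint
    ('search_document' for corpus text, 'search_query' for queries)."""
    return json.dumps({"texts": texts, "input_type": input_type})

# With AWS credentials configured, the call looks like (not executed here):
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.invoke_model(modelId=TITAN_V2_ID, body=titan_v2_body("hello"))
#   vector = json.loads(resp["body"].read())["embedding"]
```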

Cost Analysis

Embedding is remarkably cheap compared to text generation:

| Scenario | Titan V2 Cost | Cohere Cost |
|---|---|---|
| 1,000 documents (1K tokens each) | $0.02 | $0.10 |
| 100,000 documents | $2.00 | $10.00 |
| 1 million documents | $20.00 | $100.00 |
| 10 million documents | $200.00 | $1,000.00 |

For comparison, generating a 500-token response with Claude Haiku for 1 million queries costs $2,000 — embedding 1 million documents costs just $20 with Titan.
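The arithmetic behind the table is simple enough to keep as a one-line estimator — documents times tokens per document, divided by 1,000, times the per-1K rate:

```python
def embedding_cost(num_docs: int, tokens_per_doc: int, price_per_1k: float) -> float:
    """Total embedding cost in dollars for a corpus."""
    return num_docs * tokens_per_doc / 1000 * price_per_1k

TITAN_V2 = 0.00002  # $/1K tokens
COHERE = 0.0001     # $/1K tokens

# 1M documents at 1K tokens each: $20 with Titan V2, $100 with Cohere.
```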


Choosing the Right Model

Titan Embeddings V2

Pros:

  • 5x cheaper than Cohere
  • Variable dimension output (256/512/1024) — lower dimensions = faster search, less storage
  • 8,192 token context window — can embed entire documents
  • Native AWS integration

Cons:

  • Weaker multilingual performance
  • Smaller training corpus than Cohere for some domains

Cohere Embed

Pros:

  • Superior multilingual support (100+ languages)
  • Specialized search and classification input types
  • Strong performance on domain-specific text

Cons:

  • 5x more expensive than Titan
  • 512 token max input — requires chunking for longer documents

Decision Matrix

| Requirement | Recommended Model |
|---|---|
| English-only, cost-sensitive | Titan V2 (1024-dim) |
| English-only, long documents | Titan V2 (8K context) |
| Multilingual content | Cohere Multilingual |
| Multimodal (text + images) | Titan Multimodal |
| Fastest search speed | Titan V2 (256-dim) |

Vector Storage Options

After generating embeddings, you need a vector database to store and search them:

| Storage | Cost | Best For |
|---|---|---|
| OpenSearch Serverless | $701/mo minimum | Large-scale production, hybrid search |
| Aurora pgvector | ~$30/mo | Cost-sensitive, existing PostgreSQL |
| OpenSearch Managed | From ~$50/mo | Medium-scale, full control |
| Pinecone | $70/mo (Starter) | Managed vector DB, simple setup |
| ElastiCache (Valkey) | From ~$14/mo | Low-latency, smaller datasets |

Storage Size Estimation

| Vectors | Dimensions | Approximate Storage |
|---|---|---|
| 100K | 1024 | ~400 MB |
| 1M | 1024 | ~4 GB |
| 10M | 1024 | ~40 GB |
| 100K | 256 | ~100 MB |

Lower dimensions (256) reduce storage 4x and improve search speed, with a small quality trade-off.
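These estimates follow from storing each dimension as a 4-byte float32; note that a vector index (HNSW and similar) typically adds another 1.5-2x on top of the raw figure:

```python
def vector_storage_bytes(num_vectors: int, dimensions: int, bytes_per_float: int = 4) -> int:
    """Raw storage for float32 vectors, excluding index overhead."""
    return num_vectors * dimensions * bytes_per_float

# 1M vectors at 1024 dims -> ~4.1 GB raw, matching the ~4 GB row above.
```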


Common Use Cases

1. Semantic Search

Replace keyword search with meaning-based search:

Query: "How do I reduce my cloud bill?"
Keyword match: Only finds documents containing those exact words
Semantic match: Also finds "cost optimization strategies", "lower AWS spending", "save money on infrastructure"
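At its core, semantic search is just ranking document vectors by cosine similarity to the query vector. A minimal dependency-free sketch (production systems use an approximate-nearest-neighbor index instead of this linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search(query_vec, doc_vecs, k=3):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```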

2. RAG (Retrieval-Augmented Generation)

The most common Bedrock embedding use case:

User question → Embed question → Search vector DB → Retrieve top-K documents → Pass to LLM → Generate grounded answer
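The retrieval loop above can be sketched in a few lines. `embed` and `generate` are hypothetical injected callables wrapping your Bedrock embedding and LLM calls; since Bedrock embeddings are unit-length, the dot product equals cosine similarity:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rag_answer(question, docs, embed, generate, k=3):
    """Minimal RAG loop. docs: list of (text, unit-length vector) pairs.
    embed/generate are caller-supplied model wrappers (hypothetical interface)."""
    qv = embed(question)
    top = sorted(docs, key=lambda d: dot(qv, d[1]), reverse=True)[:k]
    context = "\n\n".join(text for text, _ in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```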

3. Document Classification

Embed documents and known category examples. Classify new documents by finding the nearest category embedding using cosine similarity.

4. Duplicate Detection

Embed all documents and find pairs with cosine similarity above 0.95. Useful for deduplicating support tickets, content, or data entries.
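A brute-force sketch of that pairwise check — fine for thousands of items; beyond that, use an approximate-nearest-neighbor index rather than comparing every pair:

```python
import itertools
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def near_duplicates(vectors, threshold=0.95):
    """All index pairs whose cosine similarity exceeds the threshold. O(n^2)."""
    return [(i, j) for i, j in itertools.combinations(range(len(vectors)), 2)
            if cosine(vectors[i], vectors[j]) > threshold]
```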

5. Recommendation Systems

Embed user preferences and item descriptions. Recommend items whose embeddings are closest to the user's preference vector.


Best Practices

1. Choose Dimensions Based on Use Case

| Dimension | Search Quality | Speed | Storage |
|---|---|---|---|
| 1024 | Highest | Baseline | 4x |
| 512 | High | 2x faster | 2x |
| 256 | Good | 4x faster | 1x |

Start with 1024 for maximum quality. If search latency or storage is a concern, test with 512 — quality drop is often negligible.

2. Chunk Documents Appropriately

Cohere's 512-token limit requires chunking. Even with Titan's 8K window, chunking often improves retrieval quality:

| Chunk Size | Best For |
|---|---|
| 100-200 tokens | FAQ, short answers |
| 300-500 tokens | General documents, articles |
| 500-1,000 tokens | Technical documentation, reports |
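A simple sliding-window chunker covers most cases. This sketch splits on whitespace as a rough token proxy (real tokenizers count differently, often 25-30% higher); the overlap preserves context across chunk boundaries:

```python
def chunk_tokens(text, max_tokens=400, overlap=50):
    """Split text into overlapping chunks of roughly max_tokens words.
    Requires overlap < max_tokens."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        start += max_tokens - overlap
    return chunks
```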

3. Normalize Embeddings

Most vector databases expect normalized vectors (unit length) for cosine similarity search. Bedrock models output normalized vectors by default — verify this with your specific vector store.
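Verification is cheap: a normalized vector's squared components sum to 1. A sketch of normalizing and checking:

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def is_normalized(vec, tol=1e-6):
    """True if the vector already has unit length (within tolerance)."""
    return abs(sum(x * x for x in vec) - 1.0) < tol
```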

4. Use Batch Processing for Large Datasets

For initial ingestion of millions of documents, call the embedding model in parallel or use batch API calls. Titan V2 handles high throughput efficiently.
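Because each embedding call is I/O-bound, a thread pool parallelizes ingestion well. A sketch where `embed_one` is a hypothetical wrapper around your `InvokeModel` call (it should also handle retries and throttling in practice):

```python
from concurrent.futures import ThreadPoolExecutor

def embed_corpus(texts, embed_one, max_workers=8):
    """Fan embedding calls out across threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_one, texts))
```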

5. Re-Embed When Changing Models

Embeddings from different models are not compatible. If you switch from Cohere to Titan (or vice versa), you must re-embed your entire corpus. Plan for this cost and downtime.




FAQ

Are Bedrock embeddings the same quality as OpenAI embeddings?

Titan Embeddings V2 and Cohere Embed both perform comparably to OpenAI's text-embedding-3-small on standard benchmarks (MTEB). For most production use cases, the quality difference is negligible. The main advantage of Bedrock embeddings is AWS-native integration and competitive pricing.

How do I measure embedding quality for my use case?

Create a test set of 50-100 query-document pairs where you know the correct match. Generate embeddings, run similarity search, and measure recall@K (what percentage of correct documents appear in the top K results). Compare across models and dimension settings.
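The recall@K metric itself is a few lines — for each query, check whether the known-correct document id appears in the top K retrieved ids:

```python
def recall_at_k(results, expected, k=5):
    """Fraction of queries whose correct document appears in the top-k results.
    results: one ranked list of doc ids per query; expected: the correct id per query."""
    hits = sum(1 for ranked, gold in zip(results, expected) if gold in ranked[:k])
    return hits / len(expected)
```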

Can I use the same embedding model for different languages?

Cohere Multilingual is designed for this — it maps text in 100+ languages into the same vector space. A query in English will find relevant documents in German or Japanese. Titan V2 has limited multilingual capability — use Cohere for cross-language search.


Lower Your Bedrock Embeddings Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your Bedrock embeddings costs. Through group buying power, Wring negotiates better rates so you pay less per embedding request.

Start saving on Bedrock embeddings →