
AWS Bedrock Knowledge Bases: Build RAG Systems

Build RAG systems with AWS Bedrock Knowledge Bases. Automate document ingestion, chunking, and retrieval — vector stores from $30/mo with Aurora pgvector.

Wring Team
March 14, 2026
6 min read
AWS Bedrock · Knowledge Bases · RAG · retrieval augmented generation · vector search · document Q&A
Knowledge retrieval and document search AI system

Bedrock Knowledge Bases let you build Retrieval-Augmented Generation (RAG) systems without managing vector databases, embedding pipelines, or retrieval logic yourself. You point Bedrock at your documents in S3; it automatically chunks, embeds, and indexes them, then retrieves relevant context for every query. The result: LLM responses grounded in your actual data instead of hallucinated answers.

TL;DR: Knowledge Bases automate the RAG pipeline: document ingestion, chunking, embedding, vector storage, and retrieval. You pay for embedding model tokens, vector store infrastructure (OpenSearch Serverless minimum $701/month, or Aurora PostgreSQL from ~$30/month), and the LLM model tokens for query + retrieved context. For production RAG, Knowledge Bases save weeks of custom development. For cost-sensitive setups, use Aurora PostgreSQL with pgvector instead of OpenSearch Serverless.


How Knowledge Bases Work

Documents (S3) → Chunking → Embedding (Titan/Cohere) → Vector Store → Query → Retrieve → LLM Response
  1. Ingest: Upload documents to S3 (PDF, TXT, HTML, CSV, DOCX, MD)
  2. Chunk: Bedrock splits documents into chunks (configurable size)
  3. Embed: Each chunk is converted to a vector using an embedding model
  4. Store: Vectors are stored in a vector database
  5. Query: User question is embedded and matched against stored vectors
  6. Retrieve: Top-K most relevant chunks are returned
  7. Generate: LLM generates an answer using retrieved chunks as context
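In code, steps 5-7 collapse into a single RetrieveAndGenerate call. A minimal sketch with boto3; the knowledge base ID and model ARN below are placeholders you would replace with your own:

```python
# Placeholders -- substitute your own knowledge base ID and model ARN.
KB_ID = "YOUR_KB_ID"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"

def build_rag_request(question, kb_id=KB_ID, model_arn=MODEL_ARN):
    """Request body for RetrieveAndGenerate: query, retrieve, and generate in one call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def ask(question):
    import boto3  # imported here so the request builder works without the AWS SDK
    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(**build_rag_request(question))
    return response["output"]["text"]

# ask("What is our refund policy?")  # needs AWS credentials and a synced knowledge base
```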

Vector Store Options

| Vector Store | Minimum Cost | Best For |
| --- | --- | --- |
| OpenSearch Serverless | $701/month (4 OCUs) | Large-scale, production |
| Aurora PostgreSQL (pgvector) | ~$30/month (db.t4g.medium) | Cost-sensitive, existing Aurora |
| Pinecone | $70/month (Starter) | Third-party managed |
| MongoDB Atlas | $57/month | Existing MongoDB users |
| Redis Enterprise | Varies | Low-latency requirements |

Cost warning: OpenSearch Serverless has a hard minimum of 4 OCUs (2 indexing + 2 search) at $0.24/OCU-hour, roughly $701/month. For small knowledge bases, this is often the largest cost component.

Cost-effective alternative: Aurora PostgreSQL with pgvector extension starts at ~$30/month for db.t4g.medium and handles millions of vectors efficiently.


Pricing Breakdown

Document Ingestion (One-Time)

| Component | Cost |
| --- | --- |
| Titan Embeddings V2 | $0.00002 per 1K tokens |
| Cohere Embed | $0.0001 per 1K tokens |

Embedding 10,000 documents (1,000 tokens average) with Titan: 10M tokens x $0.00002/1K = $0.20. Embedding is extremely cheap.
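That arithmetic is easy to script when sizing a corpus. A quick estimator using the rates quoted above (verify current pricing before budgeting):

```python
# One-time ingestion cost estimator, using the per-1K-token rates from the table above.
PRICE_PER_1K = {"titan-v2": 0.00002, "cohere": 0.0001}

def ingestion_cost(num_docs, avg_tokens, model="titan-v2"):
    """Total embedding cost in dollars for a corpus of num_docs documents."""
    total_tokens = num_docs * avg_tokens
    return total_tokens / 1000 * PRICE_PER_1K[model]

print(ingestion_cost(10_000, 1_000))            # 10M tokens with Titan -> ~$0.20
print(ingestion_cost(10_000, 1_000, "cohere"))  # same corpus with Cohere -> ~$1.00
```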

Vector Store (Ongoing)

| Store | 100K vectors | 1M vectors | 10M vectors |
| --- | --- | --- | --- |
| OpenSearch Serverless | $701/mo | $701/mo | $1,000+/mo |
| Aurora pgvector | ~$30/mo | ~$60/mo | ~$200/mo |

Query Costs

Each query incurs:

  1. Embedding the question: ~$0.00001 (negligible)
  2. Vector search: Included in vector store cost
  3. LLM generation: Model token cost for question + retrieved chunks
| Model | Cost per Query (500-token question + 2,000-token context + 500-token response) |
| --- | --- |
| Claude Haiku | $0.004 |
| Claude Sonnet | $0.016 |
| Llama 70B | $0.003 |

At 100,000 queries/month with Claude Haiku: $400/month in model costs + vector store.
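To budget for other token mixes, the same math generalizes. A sketch assuming Claude Haiku rates of $0.0008 per 1K input tokens and $0.004 per 1K output tokens (check the current Bedrock pricing page before relying on these numbers):

```python
# Back-of-envelope per-query cost; rates are assumptions, verify against current pricing.
def query_cost(in_tokens, out_tokens, in_rate_per_1k, out_rate_per_1k):
    """Dollar cost of one query given input/output token counts and per-1K rates."""
    return in_tokens / 1000 * in_rate_per_1k + out_tokens / 1000 * out_rate_per_1k

# 500-token question + 2,000-token retrieved context in, 500-token answer out:
per_query = query_cost(2500, 500, 0.0008, 0.004)  # ~$0.004
monthly = per_query * 100_000                     # ~$400, plus the vector store
```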


Chunking Strategies

The chunking configuration directly affects retrieval quality and cost.

| Strategy | Chunk Size | Overlap | Best For |
| --- | --- | --- | --- |
| Fixed size (default) | 300 tokens | 10% | General purpose |
| Semantic | Variable | N/A | High-quality retrieval |
| Sentence-based | 1-3 sentences | 1 sentence | FAQ, short documents |
| Hierarchical | Parent + child chunks | N/A | Long documents, reports |

Recommendations:

  • Start with 300-500 token chunks for general documents
  • Use hierarchical chunking for long technical documents
  • Set overlap to 10-20% to preserve context at boundaries
  • Larger chunks increase LLM token costs but improve context quality
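Chunking is configured per data source at ingestion time. A sketch of a fixed-size configuration passed to the bedrock-agent CreateDataSource API; the knowledge base ID and bucket ARN are placeholders:

```python
def fixed_size_chunking(max_tokens=300, overlap_pct=10):
    """Vector ingestion settings for fixed-size chunking with percentage overlap."""
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_pct,
            },
        }
    }

def create_s3_data_source(kb_id, bucket_arn):
    import boto3  # local import; the config builder above needs no AWS SDK
    client = boto3.client("bedrock-agent")
    return client.create_data_source(
        knowledgeBaseId=kb_id,
        name="docs",
        dataSourceConfiguration={
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        # 400-token chunks with 15% overlap, per the recommendations above
        vectorIngestionConfiguration=fixed_size_chunking(max_tokens=400, overlap_pct=15),
    )
```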

Architecture Patterns

Pattern 1: Simple Document Q&A

User → Bedrock KB API → Vector Search → Claude Haiku → Response

Best for: Internal knowledge bases, documentation search, FAQ systems. Cost: $400-800/month for moderate usage.

Pattern 2: Multi-Source RAG

User → Bedrock Agent → KB1 (Product docs) + KB2 (Policies) + KB3 (FAQ) → Claude Sonnet → Response

Each knowledge base can have its own data source and vector store. The Agent orchestrates which KB to query based on the question.

Pattern 3: RAG with Guardrails

User → Guardrails (input) → Bedrock KB → Claude → Guardrails (output) → Response

Guardrails filter harmful inputs and ensure outputs don't contain sensitive information from your knowledge base.


Performance Optimization

Improve Retrieval Quality

  • Increase Top-K: Retrieve more chunks (5-10 instead of default 3) for complex questions
  • Use metadata filters: Tag documents with categories and filter at query time
  • Hybrid search: Combine vector similarity with keyword matching (OpenSearch supports this natively)
  • Reranking: Enable the built-in reranker to improve relevance ordering
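Top-K and metadata filters are both set through the retrieval configuration. A sketch using the Retrieve API; the `category` metadata key is an assumed example attribute you would define when tagging documents at ingestion:

```python
def retrieval_config(top_k=5, category=None):
    """Vector search settings: raise top_k for complex questions, filter by metadata."""
    vs = {"numberOfResults": top_k}
    if category is not None:
        # Matches documents ingested with a 'category' metadata attribute (assumed key)
        vs["filter"] = {"equals": {"key": "category", "value": category}}
    return {"vectorSearchConfiguration": vs}

def retrieve_chunks(kb_id, question, **kwargs):
    import boto3  # local import; the config builder above needs no AWS SDK
    client = boto3.client("bedrock-agent-runtime")
    resp = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration=retrieval_config(**kwargs),
    )
    return [r["content"]["text"] for r in resp["retrievalResults"]]
```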

Reduce Costs

  • Use Aurora pgvector instead of OpenSearch Serverless for small-medium knowledge bases
  • Choose Claude Haiku for straightforward Q&A tasks
  • Cache frequent queries using DynamoDB or ElastiCache to avoid repeated LLM calls
  • Limit retrieved chunks — fewer chunks = fewer input tokens to the LLM
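The caching idea above can be sketched with an in-memory stand-in; in production the same keying scheme would back a DynamoDB or ElastiCache lookup:

```python
import hashlib

class QueryCache:
    """In-memory stand-in for a DynamoDB/ElastiCache cache keyed on the normalized query."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(question):
        # Normalize case and whitespace so trivial variants share one cache entry
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, question):
        return self._store.get(self.key(question))

    def put(self, question, answer):
        self._store[self.key(question)] = answer

cache = QueryCache()
cache.put("What is the refund policy?", "30 days, full refund.")
# Whitespace/case variants hit the same entry, skipping the LLM call:
assert cache.get("  what is the REFUND policy? ") == "30 days, full refund."
```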

Common Pitfalls

  1. Choosing OpenSearch Serverless for a small KB — $701/month minimum for a knowledge base that could run on a $30/month Aurora instance
  2. Chunks too large — 1,000+ token chunks waste LLM tokens on irrelevant content
  3. Chunks too small — 50-token chunks lose context and reduce answer quality
  4. No metadata filtering — forcing the model to process irrelevant documents
  5. Ignoring ingestion sync — forgetting to re-sync after adding new documents to S3

FAQ

How does Bedrock Knowledge Bases compare to building custom RAG?

Knowledge Bases save 2-4 weeks of development for the ingestion, chunking, embedding, and retrieval pipeline. The trade-off is less flexibility in chunking strategies, embedding models, and retrieval logic. For most production use cases, Knowledge Bases are sufficient.

Can I use Knowledge Bases with non-Bedrock models?

Knowledge Bases retrieval (the vector search part) can be used standalone via the Retrieve API. You can then pass the retrieved chunks to any model — including models hosted on SageMaker or external APIs.
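A sketch of that pattern: call Retrieve, then format the returned chunks into a prompt for whatever model you like. The sample below mimics the shape of the Retrieve API's `retrievalResults` field:

```python
def to_prompt(question, retrieval_results):
    """Format Retrieve API results as grounding context for any model (SageMaker, external API, ...)."""
    context = "\n\n".join(r["content"]["text"] for r in retrieval_results)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Sample mimicking the 'retrievalResults' field of a Retrieve response:
sample = [{"content": {"text": "Refunds are issued within 30 days."}, "score": 0.91}]
prompt = to_prompt("What is the refund window?", sample)
# 'prompt' can now go to any completion endpoint, Bedrock or otherwise
```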

How often should I re-sync my knowledge base?

Set up automatic sync using S3 event notifications or schedule syncs based on how frequently your documents change. For static documentation, weekly syncs are sufficient. For dynamic content, consider event-driven syncs.
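An event-driven sync can be a small Lambda, triggered by S3 notifications, that starts an ingestion job. A sketch; the knowledge base and data source IDs are placeholders (in practice, read them from environment variables):

```python
def handler(event, context, client=None):
    """Lambda handler for S3 event notifications: re-sync the KB when documents change.

    The IDs below are placeholders; the client parameter exists so the
    handler can be exercised with a stub in tests.
    """
    if client is None:
        import boto3  # only needed when running inside Lambda
        client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(
        knowledgeBaseId="YOUR_KB_ID",
        dataSourceId="YOUR_DATA_SOURCE_ID",
    )
    return job["ingestionJob"]["ingestionJobId"]
```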


Lower Your Bedrock Knowledge Bases Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your Bedrock Knowledge Bases costs. Through group buying power, Wring negotiates better rates so you pay less per knowledge base query.

Start saving on Bedrock Knowledge Bases →