
AWS Bedrock Fine-Tuning: Customize Models

Fine-tune models on AWS Bedrock at $8/model unit-hour. Learn when fine-tuning beats prompt engineering and how to prepare training data.

Wring Team
March 14, 2026
7 min read
Tags: AWS Bedrock, fine-tuning, model customization, custom models, transfer learning, AI training

Fine-tuning adapts a foundation model to your specific domain, style, or task by training on your own examples. Instead of relying on lengthy system prompts and few-shot examples, a fine-tuned model internalizes your requirements and produces consistent outputs with shorter prompts — saving tokens and improving quality for specialized use cases.

TL;DR: Bedrock supports fine-tuning for select models (Titan, Llama, Cohere). Fine-tuning costs $8 per model unit-hour for training. You need 50-10,000 training examples in JSONL format. Fine-tuning is worth it when: (1) prompt engineering has plateaued in quality, (2) you need consistent output format/style, or (3) token costs from large prompts exceed training costs. For most use cases, start with prompt engineering and Knowledge Bases before investing in fine-tuning.


When to Fine-Tune vs Other Approaches

| Approach | Best For | Cost | Setup Time |
|---|---|---|---|
| Prompt engineering | Quick iteration, general tasks | Token costs only | Minutes |
| Few-shot prompting | Format consistency with examples | Higher token costs | Minutes |
| Knowledge Bases (RAG) | Grounding in your data | Vector store + tokens | Hours |
| Fine-tuning | Domain specialization, style adaptation | Training + inference | Days |
| Continued pre-training | Teaching domain knowledge | Highest | Days-weeks |

Fine-Tune When:

  • Prompt engineering quality has plateaued and you need better performance
  • Your system prompt exceeds 2,000 tokens (fine-tuning eliminates the need for lengthy instructions)
  • You need consistent output format across thousands of requests
  • Domain-specific language or terminology must be used precisely
  • You have 100+ high-quality labeled examples

Don't Fine-Tune When:

  • You need the model to access current or changing information (use RAG instead)
  • You have fewer than 50 training examples
  • The task is general-purpose (summarization, translation)
  • Prompt engineering produces acceptable quality

Supported Models

| Model | Fine-Tuning | Continued Pre-Training |
|---|---|---|
| Amazon Titan Text Express | Yes | Yes |
| Amazon Titan Text Lite | Yes | Yes |
| Meta Llama 3.1 8B | Yes | No |
| Meta Llama 3.1 70B | Yes | No |
| Cohere Command | Yes | No |

Note: Claude models are not available for fine-tuning on Bedrock. For Claude customization, use prompt engineering, Knowledge Bases, or Anthropic's direct fine-tuning program. For self-hosted alternatives, see SageMaker.


Training Data Format

Training data must be in JSONL format, stored in S3:

{"prompt": "Classify this support ticket: My order hasn't arrived", "completion": "category: shipping, priority: medium, sentiment: frustrated"}
{"prompt": "Classify this support ticket: The app keeps crashing", "completion": "category: technical, priority: high, sentiment: frustrated"}

Data Requirements

| Requirement | Details |
|---|---|
| Format | JSONL (one example per line) |
| Minimum examples | 50 (recommended: 500-5,000) |
| Maximum examples | Model-dependent (typically 10,000-100,000) |
| Validation split | 20% holdout recommended |
| Quality | Consistent, correct, representative of desired behavior |

Data Preparation Tips

  • Quality over quantity: 500 perfect examples outperform 5,000 noisy ones
  • Cover edge cases: Include examples of tricky inputs and expected handling
  • Consistent format: All completions should follow the exact same output structure
  • Balanced classes: For classification, balance examples across categories
  • Remove duplicates: Duplicate examples cause overfitting without improving quality
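
The deduplicate-and-split steps above can be sketched in a few lines (names are illustrative):

```python
import random

def prepare_dataset(records: list[dict], holdout: float = 0.2, seed: int = 42):
    """Deduplicate examples, shuffle, and split into train/validation sets."""
    # Deduplicate on the full (prompt, completion) pair.
    seen, unique = set(), []
    for r in records:
        key = (r["prompt"], r["completion"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(unique)
    split = int(len(unique) * (1 - holdout))
    return unique[:split], unique[split:]
```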

Pricing

Fine-Tuning Training

| Component | Cost |
|---|---|
| Training | $8.00 per model unit-hour |
| Typical job | 2-8 hours depending on data size and model |

A typical fine-tuning job with 1,000 examples on Titan Text Express takes approximately 2-4 hours, costing $16-32.
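
The arithmetic is simply model unit-hours times the $8 training rate; a tiny helper makes the estimate explicit:

```python
def estimate_training_cost(model_unit_hours: float, rate_per_hour: float = 8.00) -> float:
    """Estimate Bedrock fine-tuning cost at $8 per model unit-hour."""
    return round(model_unit_hours * rate_per_hour, 2)
```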

Custom Model Inference

| Component | Cost |
|---|---|
| Provisioned Throughput (required) | Model-dependent, commitment required |
| No on-demand option | You must purchase Provisioned Throughput to use a fine-tuned model |

Important: Fine-tuned models on Bedrock require Provisioned Throughput for inference — there's no pay-per-token option. This means you need sustained usage to justify the fixed cost.

Continued Pre-Training

| Component | Cost |
|---|---|
| Training | $6.00 per model unit-hour |
| Typical job | 4-24 hours depending on corpus size |

Fine-Tuning Process

Step 1: Prepare Training Data

Create a JSONL file of prompt-completion pairs, upload it to S3, and set aside a validation set (20% holdout).

Step 2: Configure Training Job

Set in the Bedrock console or API:

  • Base model: The foundation model to fine-tune
  • Training data: S3 location of JSONL file
  • Validation data: S3 location of holdout set
  • Hyperparameters: Epochs, learning rate, batch size
  • Output: S3 location for model artifacts
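
With boto3, these settings map onto the `create_model_customization_job` call on the `bedrock` client. A hedged sketch: the helper below only assembles the request dictionary, and the field names should be verified against the current boto3 Bedrock reference:

```python
def build_fine_tuning_job(job_name, model_name, role_arn, base_model_id,
                          train_s3, validation_s3, output_s3, hyperparameters):
    """Assemble a request dict for bedrock.create_model_customization_job."""
    return {
        "jobName": job_name,
        "customModelName": model_name,
        "roleArn": role_arn,                    # IAM role with access to the S3 buckets
        "baseModelIdentifier": base_model_id,   # e.g. a Titan Text Express model ID
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {"s3Uri": train_s3},
        "validationDataConfig": {"validators": [{"s3Uri": validation_s3}]},
        "outputDataConfig": {"s3Uri": output_s3},
        "hyperParameters": hyperparameters,     # e.g. {"epochCount": "3"}
    }

# Usage (assumes AWS credentials with Bedrock permissions):
# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(**build_fine_tuning_job(...))
```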

Step 3: Monitor Training

Track training metrics:

  • Training loss: Should decrease consistently
  • Validation loss: Should decrease and then plateau (not increase)
  • If validation loss increases: Training is overfitting — reduce epochs or add more data
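
Job status can also be polled programmatically. A minimal sketch, assuming a boto3 `bedrock` client whose `get_model_customization_job` response includes a `status` field:

```python
import time

def wait_for_customization_job(client, job_id, poll_seconds=60):
    """Poll a Bedrock model-customization job until it reaches a terminal state.

    `client` is assumed to expose get_model_customization_job(jobIdentifier=...)
    returning a dict with a "status" key (InProgress/Completed/Failed/Stopped).
    """
    while True:
        status = client.get_model_customization_job(jobIdentifier=job_id)["status"]
        if status in ("Completed", "Failed", "Stopped"):
            return status
        time.sleep(poll_seconds)
```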

Step 4: Evaluate

Compare fine-tuned model against the base model on a held-out test set:

  • Run identical prompts through both models
  • Score outputs on accuracy, format compliance, and quality
  • Verify that fine-tuning improved the target metric
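
Format compliance is easy to score mechanically. A sketch using the ticket-classifier format from earlier (the regex is an assumption about your output schema):

```python
import re

def format_compliance(outputs: list[str], pattern: str) -> float:
    """Return the fraction of model outputs matching the expected format regex."""
    if not outputs:
        return 0.0
    matches = sum(1 for o in outputs if re.fullmatch(pattern, o.strip()))
    return matches / len(outputs)
```

Run the same test prompts through the base and fine-tuned models, then compare their compliance scores side by side.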

Step 5: Deploy

Purchase Provisioned Throughput for the fine-tuned model and integrate into your application using the custom model ID.
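
Invocation goes through the `bedrock-runtime` client, using the Provisioned Throughput ARN in place of a base model ID. A sketch assuming the Amazon Titan Text request schema (verify the body format for your base model):

```python
import json

def build_titan_invoke_request(provisioned_model_arn: str, prompt: str,
                               max_tokens: int = 256, temperature: float = 0.0):
    """Assemble invoke_model arguments for a fine-tuned Titan Text model."""
    return {
        # Use the Provisioned Throughput ARN, not the base model ID.
        "modelId": provisioned_model_arn,
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "temperature": temperature,
            },
        }),
    }

# Usage (assumes AWS credentials):
# import boto3
# runtime = boto3.client("bedrock-runtime")
# response = runtime.invoke_model(**build_titan_invoke_request(arn, "Classify this support ticket: ..."))
```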


Hyperparameter Guidance

| Parameter | Default | Recommendation |
|---|---|---|
| Epochs | 5 | Start with 3-5; increase if validation loss is still decreasing |
| Learning rate | Model-dependent | Use default unless you see instability |
| Batch size | Model-dependent | Larger = faster training, potentially less stable |

Key rule: Monitor validation loss. If it starts increasing while training loss decreases, you're overfitting. Stop and use the checkpoint from the lowest validation loss.
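
Picking the checkpoint with the lowest validation loss is a one-liner once you have the per-epoch loss values:

```python
def best_checkpoint(validation_losses: list[float]) -> int:
    """Return the 0-based epoch index with the lowest validation loss,
    i.e. the checkpoint to keep when later epochs start overfitting."""
    return min(range(len(validation_losses)), key=validation_losses.__getitem__)
```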



FAQ

Is fine-tuning worth the Provisioned Throughput requirement?

Only if your inference volume is high enough to justify reserved capacity. For sporadic usage, prompt engineering with few-shot examples is more cost-effective. Fine-tuning makes financial sense when daily inference costs already exceed the Provisioned Throughput minimum.

How many examples do I need for good results?

For classification tasks: 50-200 examples per class. For generation tasks: 500-2,000 examples. For complex reasoning: 2,000-5,000 examples. More data generally improves quality, but diminishing returns set in above 5,000 examples for most tasks.

Can I fine-tune a fine-tuned model (iterative fine-tuning)?

Not directly on Bedrock. Each fine-tuning job starts from the base foundation model. To iterate, add new examples to your training data and run a new fine-tuning job from the base model.


Lower Your Bedrock Fine-Tuning Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your Bedrock fine-tuning costs. Through group buying power, Wring negotiates better rates so you pay less per training hour.

Start saving on Bedrock fine-tuning →