Fine-tuning adapts a foundation model to your specific domain, style, or task by training on your own examples. Instead of relying on lengthy system prompts and few-shot examples, a fine-tuned model internalizes your requirements and produces consistent outputs with shorter prompts — saving tokens and improving quality for specialized use cases.
TL;DR: Bedrock supports fine-tuning for select models (Titan, Llama, Cohere). Fine-tuning costs $8 per model unit-hour for training. You need at least 50 training examples in JSONL format (500-5,000 recommended). Fine-tuning is worth it when: (1) prompt engineering has plateaued in quality, (2) you need consistent output format/style, or (3) token costs from large prompts exceed training costs. For most use cases, start with prompt engineering and Knowledge Bases before investing in fine-tuning.
When to Fine-Tune vs Other Approaches
| Approach | Best For | Cost | Setup Time |
|---|---|---|---|
| Prompt engineering | Quick iteration, general tasks | Token costs only | Minutes |
| Few-shot prompting | Format consistency with examples | Higher token costs | Minutes |
| Knowledge Bases (RAG) | Grounding in your data | Vector store + tokens | Hours |
| Fine-tuning | Domain specialization, style adaptation | Training + inference | Days |
| Continued pre-training | Teaching domain knowledge | Highest | Days-weeks |
Fine-Tune When:
- Prompt engineering quality has plateaued and you need better performance
- Your system prompt exceeds 2,000 tokens (a fine-tuned model internalizes those instructions, so every request carries far fewer tokens)
- You need consistent output format across thousands of requests
- Domain-specific language or terminology must be used precisely
- You have 100+ high-quality labeled examples
Don't Fine-Tune When:
- You need the model to access current or changing information (use RAG instead)
- You have fewer than 50 training examples
- The task is general-purpose and base models already handle it well (e.g., summarization, translation)
- Prompt engineering produces acceptable quality
Supported Models
| Model | Fine-Tuning | Continued Pre-Training |
|---|---|---|
| Amazon Titan Text Express | Yes | Yes |
| Amazon Titan Text Lite | Yes | Yes |
| Meta Llama 3.1 8B | Yes | No |
| Meta Llama 3.1 70B | Yes | No |
| Cohere Command | Yes | No |
Note: Claude models are not available for fine-tuning on Bedrock. For Claude customization, use prompt engineering, Knowledge Bases, or Anthropic's direct fine-tuning program. For self-hosted alternatives, see SageMaker.
Training Data Format
Training data must be in JSONL format, stored in S3:
```jsonl
{"prompt": "Classify this support ticket: My order hasn't arrived", "completion": "category: shipping, priority: medium, sentiment: frustrated"}
{"prompt": "Classify this support ticket: The app keeps crashing", "completion": "category: technical, priority: high, sentiment: frustrated"}
```
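Before uploading, it's worth validating the file locally so a malformed line doesn't fail the training job. A minimal check, assuming the prompt/completion schema shown above:

```python
import json

REQUIRED_KEYS = {"prompt", "completion"}

def validate_jsonl(lines):
    """Check that each line is a JSON object with exactly the
    prompt/completion keys. Returns (valid_examples, errors)."""
    examples, errors = [], []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append(f"line {i}: invalid JSON ({e.msg})")
            continue
        if not isinstance(obj, dict) or set(obj) != REQUIRED_KEYS:
            errors.append(f"line {i}: expected keys {sorted(REQUIRED_KEYS)}")
            continue
        examples.append(obj)
    return examples, errors
```

Run it over the file's lines before the S3 upload; any entry in `errors` should be fixed first.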
Data Requirements
| Requirement | Details |
|---|---|
| Format | JSONL (one example per line) |
| Minimum examples | 50 (recommended: 500-5,000) |
| Maximum examples | Model-dependent (typically 10,000-100,000) |
| Validation split | 20% holdout recommended |
| Quality | Consistent, correct, representative of desired behavior |
Data Preparation Tips
- Quality over quantity: 500 perfect examples outperform 5,000 noisy ones
- Cover edge cases: Include examples of tricky inputs and expected handling
- Consistent format: All completions should follow the exact same output structure
- Balanced classes: For classification, balance examples across categories
- Remove duplicates: Duplicate examples cause overfitting without improving quality
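Two of these tips, deduplication and class balance, can be enforced with a quick pass over the data. A sketch, assuming completions begin with a `category: <name>` field as in the training-data example above:

```python
from collections import Counter

def dedupe_and_count(examples):
    """Drop exact duplicate prompt/completion pairs and count examples
    per category, assuming completions start with "category: <name>"."""
    seen, unique = set(), []
    for ex in examples:
        key = (ex["prompt"], ex["completion"])
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    counts = Counter(
        ex["completion"].split(",")[0].removeprefix("category:").strip()
        for ex in unique
    )
    return unique, counts
```

Heavily skewed counts are a signal to collect more examples for the underrepresented categories before training.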
Pricing
Fine-Tuning Training
| Component | Cost |
|---|---|
| Training | $8.00 per model unit-hour |
| Typical job | 2-8 hours depending on data size and model |
A typical fine-tuning job with 1,000 examples on Titan Text Express takes approximately 2-4 hours, costing $16-32.
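That estimate is simple arithmetic on the rate above, easy to adapt to your own job length and model unit count:

```python
def training_cost(hours, model_units=1, rate_per_unit_hour=8.00):
    """Estimated Bedrock fine-tuning cost at $8.00 per model unit-hour."""
    return hours * model_units * rate_per_unit_hour

# Low and high ends of the 2-4 hour range for a single model unit:
print(training_cost(2), training_cost(4))
```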
Custom Model Inference
| Component | Cost |
|---|---|
| Provisioned Throughput (required) | Model-dependent; commitment required |
| On-demand inference | Not available: you must purchase Provisioned Throughput to use a fine-tuned model |
Important: Fine-tuned models on Bedrock require Provisioned Throughput for inference — there's no pay-per-token option. This means you need sustained usage to justify the fixed cost.
Continued Pre-Training
| Component | Cost |
|---|---|
| Training | $6.00 per model unit-hour |
| Typical job | 4-24 hours depending on corpus size |
Fine-Tuning Process
Step 1: Prepare Training Data
Create JSONL file with prompt-completion pairs. Upload to S3. Create a validation set (20% holdout).
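The split and file writing can be scripted; the 20% holdout matches the recommendation above, and the file names and bucket path are illustrative:

```python
import json
import random

def split_examples(examples, validation_frac=0.2, seed=42):
    """Shuffle and split into (training, validation) sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_frac)
    return shuffled[n_val:], shuffled[:n_val]

def write_jsonl(path, examples):
    """One JSON object per line, the format Bedrock expects."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

# After writing the files, upload them to your bucket, e.g.:
#   aws s3 cp train.jsonl s3://my-training-bucket/fine-tuning/train.jsonl
```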
Step 2: Configure Training Job
Set in the Bedrock console or API:
- Base model: The foundation model to fine-tune
- Training data: S3 location of JSONL file
- Validation data: S3 location of holdout set
- Hyperparameters: Epochs, learning rate, batch size
- Output: S3 location for model artifacts
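Via the API, these settings map onto boto3's `create_model_customization_job` call. A sketch of the request: the bucket paths, role ARN, and job name are placeholders, and hyperparameter names vary by base model (the Titan-style names and values here are illustrative, not tuned recommendations):

```python
def customization_job_params(job_name, base_model_id, role_arn, bucket):
    """Assemble the request for bedrock.create_model_customization_job."""
    return {
        "jobName": job_name,
        "customModelName": f"{job_name}-model",
        "roleArn": role_arn,  # IAM role Bedrock assumes to read/write S3
        "baseModelIdentifier": base_model_id,
        "trainingDataConfig": {"s3Uri": f"s3://{bucket}/fine-tuning/train.jsonl"},
        "validationDataConfig": {
            "validators": [{"s3Uri": f"s3://{bucket}/fine-tuning/validation.jsonl"}]
        },
        "outputDataConfig": {"s3Uri": f"s3://{bucket}/fine-tuning/output/"},
        "hyperParameters": {"epochCount": "3", "learningRate": "0.00001", "batchSize": "1"},
    }

# With boto3 (placeholder account ID, role, and bucket):
# bedrock = boto3.client("bedrock")
# job = bedrock.create_model_customization_job(
#     **customization_job_params(
#         "ticket-classifier-v1",
#         "amazon.titan-text-express-v1",
#         "arn:aws:iam::123456789012:role/BedrockFineTuneRole",
#         "my-training-bucket",
#     )
# )
```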
Step 3: Monitor Training
Track training metrics:
- Training loss: Should decrease consistently
- Validation loss: Should decrease and then plateau (not increase)
- If validation loss increases: Training is overfitting — reduce epochs or add more data
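The overfitting check can be automated once you have the per-epoch validation losses (Bedrock writes metrics files to the job's S3 output location). A small helper, where `patience` is how many non-improving epochs to tolerate:

```python
def best_checkpoint(validation_losses, patience=1):
    """Return the index of the lowest validation loss if at least
    `patience` later epochs failed to improve (overfitting has begun),
    or None if the loss is still improving."""
    best = min(range(len(validation_losses)), key=validation_losses.__getitem__)
    return best if len(validation_losses) - 1 - best >= patience else None

# Overall job status is polled with the control-plane client:
# status = bedrock.get_model_customization_job(jobIdentifier=job_arn)["status"]
# (e.g. "InProgress", "Completed", "Failed")
```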
Step 4: Evaluate
Compare fine-tuned model against the base model on a held-out test set:
- Run identical prompts through both models
- Score outputs on accuracy, format compliance, and quality
- Verify that fine-tuning improved the target metric
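Format compliance is the easiest of these metrics to script. A sketch that scores outputs against the ticket-classification structure used in the training examples earlier (the regex is specific to that assumed format):

```python
import re

# Expected completion structure from the training data shown earlier:
#   "category: <x>, priority: <x>, sentiment: <x>"
PATTERN = re.compile(r"^category: \w+, priority: \w+, sentiment: \w+$")

def format_compliance(outputs):
    """Fraction of outputs matching the expected structure. Run the same
    test prompts through the base and fine-tuned models and compare."""
    if not outputs:
        return 0.0
    return sum(bool(PATTERN.match(o.strip())) for o in outputs) / len(outputs)
```

A well-tuned model should score at or near 1.0; base models with few-shot prompts typically drift more often.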
Step 5: Deploy
Purchase Provisioned Throughput for the fine-tuned model and integrate into your application using the custom model ID.
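In boto3 this is a `create_provisioned_model_throughput` call followed by invoking through the provisioned model ARN. A sketch with placeholder names; the request body shape shown for invocation is Titan-style and varies by model family:

```python
def provisioned_throughput_params(name, custom_model_arn, model_units=1):
    """Assemble the request for bedrock.create_provisioned_model_throughput.
    commitmentDuration (e.g. "OneMonth") can be added for committed terms."""
    return {
        "provisionedModelName": name,
        "modelId": custom_model_arn,
        "modelUnits": model_units,
    }

# bedrock = boto3.client("bedrock")
# pt = bedrock.create_provisioned_model_throughput(
#     **provisioned_throughput_params("ticket-classifier-pt", custom_model_arn)
# )
# runtime = boto3.client("bedrock-runtime")
# out = runtime.invoke_model(
#     modelId=pt["provisionedModelArn"],
#     body=json.dumps({"inputText": "Classify this support ticket: My order is late"}),
# )
```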
Hyperparameter Guidance
| Parameter | Default | Recommendation |
|---|---|---|
| Epochs | 5 | Start with 3-5, increase if validation loss still decreasing |
| Learning rate | Model-dependent | Use default unless you see instability |
| Batch size | Model-dependent | Larger = faster training, potentially less stable |
Key rule: Monitor validation loss. If it starts increasing while training loss decreases, you're overfitting. Stop and use the checkpoint from the lowest validation loss.
Related Guides
- AWS Bedrock Pricing Guide
- AWS Bedrock LLM Models Guide
- AWS SageMaker Cost Optimization Guide
- AWS Bedrock vs SageMaker
FAQ
Is fine-tuning worth the Provisioned Throughput requirement?
Only if your inference volume is high enough to justify reserved capacity. For sporadic usage, prompt engineering with few-shot examples is more cost-effective. Fine-tuning makes financial sense when daily inference costs already exceed the Provisioned Throughput minimum.
How many examples do I need for good results?
For classification tasks: 50-200 examples per class. For generation tasks: 500-2,000 examples. For complex reasoning: 2,000-5,000 examples. More data generally improves quality, but diminishing returns set in above 5,000 examples for most tasks.
Can I fine-tune a fine-tuned model (iterative fine-tuning)?
Not directly on Bedrock. Each fine-tuning job starts from the base foundation model. To iterate, add new examples to your training data and run a new fine-tuning job from the base model.
Lower Your Bedrock Fine-Tuning Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your Bedrock fine-tuning costs. Through group buying power, Wring negotiates better rates so you pay less per training hour.
