AWS GPU instances span a wide range of price points and capabilities, from $0.758/hr for inference-optimized Inf2 instances up to $98.32/hr for the top-tier p5.48xlarge with eight NVIDIA H100 GPUs. Choosing the right instance family is the highest-impact decision for your ML infrastructure budget. The wrong choice can cost 10-40x more than necessary for the same workload. This guide covers every GPU and accelerator instance family available on EC2 with current on-demand, Spot, and Reserved pricing.
TL;DR: For training, use P5 (H100) or P4d (A100) for large models, Trn1 (Trainium) for cost-efficient training at $1.34/hr. For inference, Inf2 starts at $0.758/hr and beats GPUs on price-performance for supported models. G5 (A10G) at $1.006/hr is the best general-purpose GPU. Spot instances save 60-90% across all families.
GPU Instance Family Overview
| Family | GPU/Accelerator | GPUs per Instance | Primary Use Case | On-Demand Start |
|---|---|---|---|---|
| P5 | NVIDIA H100 | 8 | Large model training | $98.32/hr |
| P4d | NVIDIA A100 | 8 | Training and inference | $32.77/hr |
| P3 | NVIDIA V100 | 1-8 | Legacy training | $3.06/hr |
| G6 | NVIDIA L4 | 1-8 | Inference, graphics | $0.978/hr |
| G5 | NVIDIA A10G | 1-8 | Inference, graphics | $1.006/hr |
| G4dn | NVIDIA T4 | 1-4 | Budget inference | $0.526/hr |
| Inf2 | AWS Inferentia2 | 1-12 | High-throughput inference | $0.758/hr |
| Trn1 | AWS Trainium | 1-16 | Cost-efficient training | $1.34/hr |
P5 Instances (NVIDIA H100)
The P5 family delivers the highest GPU performance on AWS, powered by NVIDIA H100 Tensor Core GPUs with 80 GB HBM3 memory each.
| Instance | GPUs | GPU Memory | vCPUs | RAM | Network | On-Demand/hr |
|---|---|---|---|---|---|---|
| p5.48xlarge | 8x H100 | 640 GB HBM3 | 192 | 2,048 GB | 3,200 Gbps EFAv2 | $98.32 |
P5 instances are designed for the largest training jobs — foundation models, LLM pre-training, and large-scale distributed training. The 3,200 Gbps EFA networking enables efficient multi-node training. At $98.32/hr on-demand, a single month of continuous P5 usage costs $71,770. Reserved pricing and Spot instances are essential for managing costs.
| Pricing Option | p5.48xlarge/hr | Monthly (continuous) | Annual Savings |
|---|---|---|---|
| On-Demand | $98.32 | $71,770 | Baseline |
| 1-Year Reserved (No Upfront) | ~$62.24 | $45,435 | 37% |
| 3-Year Reserved (All Upfront) | ~$37.39 | $27,295 | 62% |
| Spot (varies) | ~$29.50-$49.16 | Variable | 50-70% |
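The trade-off between these pricing options comes down to utilization. A quick sketch, using the P5 rates from the table above (illustrative snapshots, not live AWS prices) and assuming 730 billable hours per month:

```python
# Sketch: compare P5 pricing options using the rates from the table above.
# Assumes 730 billable hours per month; rates are illustrative snapshots,
# not live AWS prices.

HOURS_PER_MONTH = 730

p5_rates = {
    "on_demand": 98.32,
    "reserved_1yr": 62.24,   # ~1-year Reserved, No Upfront
    "reserved_3yr": 37.39,   # ~3-year Reserved, All Upfront (amortized)
    "spot_low": 29.50,
}

def monthly_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Monthly cost at a given fraction of continuous usage."""
    return hourly_rate * HOURS_PER_MONTH * utilization

def breakeven_utilization(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilization above which a reserved commitment beats paying
    on-demand only for the hours you actually use."""
    return reserved_rate / on_demand_rate

for name, rate in p5_rates.items():
    print(f"{name:>12}: ${monthly_cost(rate):,.0f}/month")

# A 1-year reservation pays off once the instance runs more than ~63%
# of the time; below that, on-demand (or Spot) is cheaper.
print(f"1-yr breakeven utilization: {breakeven_utilization(62.24, 98.32):.0%}")
```

The same break-even logic applies to every family below: commit to Reserved only when utilization will stay above the reserved/on-demand price ratio.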
P4d Instances (NVIDIA A100)
The P4d family uses NVIDIA A100 GPUs with 40 GB of HBM2 memory each (the p4de variant doubles that to 80 GB of HBM2e). P4d remains the workhorse for many training and inference workloads.
| Instance | GPUs | GPU Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| p4d.24xlarge | 8x A100 | 320 GB HBM2 | 96 | 1,152 GB | $32.77 |
| p4de.24xlarge | 8x A100 (80 GB) | 640 GB HBM2e | 96 | 1,152 GB | $40.97 |
| Pricing Option | p4d.24xlarge/hr | Monthly (continuous) | Annual Savings |
|---|---|---|---|
| On-Demand | $32.77 | $23,922 | Baseline |
| 1-Year Reserved | ~$20.37 | $14,870 | 38% |
| 3-Year Reserved | ~$12.58 | $9,184 | 62% |
| Spot | ~$9.83-$16.38 | Variable | 50-70% |
G5 Instances (NVIDIA A10G)
The G5 family is a popular default for inference workloads and single-GPU training. Each A10G GPU provides 24 GB of GDDR6 memory.
| Instance | GPUs | GPU Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| g5.xlarge | 1x A10G | 24 GB | 4 | 16 GB | $1.006 |
| g5.2xlarge | 1x A10G | 24 GB | 8 | 32 GB | $1.212 |
| g5.4xlarge | 1x A10G | 24 GB | 16 | 64 GB | $1.624 |
| g5.8xlarge | 1x A10G | 24 GB | 32 | 128 GB | $2.448 |
| g5.12xlarge | 4x A10G | 96 GB | 48 | 192 GB | $5.672 |
| g5.24xlarge | 4x A10G | 96 GB | 96 | 384 GB | $8.144 |
| g5.48xlarge | 8x A10G | 192 GB | 192 | 768 GB | $16.288 |
Spot pricing for G5 instances typically provides 50-70% savings:
| Instance | On-Demand/hr | Spot/hr (typical) | Savings |
|---|---|---|---|
| g5.xlarge | $1.006 | $0.30-$0.50 | 50-70% |
| g5.2xlarge | $1.212 | $0.36-$0.60 | 50-70% |
| g5.12xlarge | $5.672 | $1.70-$2.83 | 50-70% |
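Spot discounts only pay off if interruptions don't eat the savings. A rough model (an assumption for illustration, not an AWS formula): each interruption loses, on average, half a checkpoint interval of work plus the time to relaunch and restore, which inflates the effective hourly rate.

```python
# Rough model (an assumption, not an AWS formula): each Spot interruption
# costs, on average, half a checkpoint interval of lost work plus the
# time to relaunch and restore. The effective hourly rate inflates by
# that overhead fraction.

def effective_spot_rate(
    spot_price: float,              # $/hr actually billed
    interruptions_per_hour: float,  # observed interruption frequency
    checkpoint_minutes: float,      # how often state is saved
    restart_minutes: float = 5.0,   # relaunch + restore time (assumed)
) -> float:
    lost_hours_per_interruption = (checkpoint_minutes / 2 + restart_minutes) / 60
    overhead = interruptions_per_hour * lost_hours_per_interruption
    # You pay for wall-clock time, but only (1 - overhead) of it is useful work.
    return spot_price / max(1e-9, 1 - overhead)

# g5.xlarge at a $0.40/hr Spot price, one interruption every 12 hours,
# checkpointing every 20 minutes:
rate = effective_spot_rate(0.40, 1 / 12, 20, 5)
# Still far below the $1.006/hr on-demand rate.
```

Under these assumptions even frequent interruptions barely dent the discount; the savings only evaporate if checkpoints are very far apart.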
G6 Instances (NVIDIA L4)
G6 instances use the newer NVIDIA L4 GPUs, offering better inference performance per dollar than G5 for many workloads.
| Instance | GPUs | GPU Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| g6.xlarge | 1x L4 | 24 GB | 4 | 16 GB | $0.978 |
| g6.2xlarge | 1x L4 | 24 GB | 8 | 32 GB | $1.168 |
| g6.4xlarge | 1x L4 | 24 GB | 16 | 64 GB | $1.548 |
| g6.12xlarge | 4x L4 | 96 GB | 48 | 192 GB | $5.016 |
| g6.48xlarge | 8x L4 | 192 GB | 192 | 768 GB | $13.35 |
Inf2 Instances (AWS Inferentia2)
Inf2 instances use AWS-designed Inferentia2 chips, offering the lowest cost per inference on AWS. They require the AWS Neuron SDK.
| Instance | Accelerators | Accelerator Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| inf2.xlarge | 1x Inferentia2 | 32 GB HBM2e | 4 | 16 GB | $0.758 |
| inf2.8xlarge | 1x Inferentia2 | 32 GB HBM2e | 32 | 128 GB | $1.968 |
| inf2.24xlarge | 6x Inferentia2 | 192 GB HBM2e | 96 | 384 GB | $6.49 |
| inf2.48xlarge | 12x Inferentia2 | 384 GB HBM2e | 192 | 768 GB | $12.981 |
Trn1 Instances (AWS Trainium)
Trn1 instances use AWS Trainium chips, purpose-built for deep learning training at lower cost than NVIDIA GPUs.
| Instance | Accelerators | Accelerator Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| trn1.2xlarge | 1x Trainium | 32 GB HBM2e | 8 | 32 GB | $1.34 |
| trn1.32xlarge | 16x Trainium | 512 GB HBM2e | 128 | 512 GB | $21.50 |
| trn1n.32xlarge | 16x Trainium | 512 GB HBM2e | 128 | 512 GB | $24.78 |
Trn1n includes enhanced networking (1,600 Gbps EFA) for multi-node distributed training.
Training vs Inference: Choosing the Right Instance
| Workload | Recommended Instance | Why |
|---|---|---|
| LLM pre-training (100B+ params) | p5.48xlarge | H100 with 3,200 Gbps networking |
| Fine-tuning 7B-70B models | p4d.24xlarge or trn1.32xlarge | Good performance, lower cost |
| Fine-tuning under 7B models | g5.xlarge or g5.2xlarge | Single A10G is sufficient |
| LLM inference (70B models) | inf2.48xlarge or g5.48xlarge | High memory, low per-token cost |
| LLM inference (7B-13B models) | inf2.xlarge or g5.xlarge | Single accelerator is enough |
| Image generation | g5.2xlarge or g6.2xlarge | Strong single-GPU price-performance |
| Cost-efficient training | trn1.32xlarge | Up to 50% cheaper than P4d |
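The recommendations above mostly follow from accelerator memory. A common rule of thumb (an assumption here, not AWS guidance): fp16 inference needs roughly 2 bytes per parameter for weights, while full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter (weights, gradients, and optimizer states), plus headroom for activations and KV cache. A sizing sketch using the memory figures from the tables in this guide:

```python
# Rule-of-thumb sizing (assumptions, not AWS guidance): fp16 inference
# needs ~2 bytes/parameter; full fine-tuning with Adam in mixed precision
# needs ~16 bytes/parameter (weights + gradients + optimizer states).
# Add ~20% headroom for activations and KV cache.

# (accelerator memory in GB, instance name) from the tables in this guide
INSTANCES = [
    (24, "g5.xlarge"),
    (32, "inf2.xlarge"),
    (96, "g5.12xlarge"),
    (320, "p4d.24xlarge"),
    (640, "p5.48xlarge"),
]

def required_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param * 1.2  # 20% headroom

def pick_instance(params_billion: float, training: bool = False) -> str:
    need = required_gb(params_billion, 16 if training else 2)
    for mem_gb, name in INSTANCES:
        if mem_gb >= need:
            return name
    return "multi-node (exceeds single-instance memory)"

# 7B fp16 inference -> ~16.8 GB -> fits a single g5.xlarge
# 7B full fine-tune -> ~134 GB -> needs p4d.24xlarge
```

The asymmetry is the key takeaway: a model that serves comfortably on a $1/hr instance can still demand an 8-GPU instance to fine-tune.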
Cost Optimization Tips
- Use Spot instances for training — Spot pricing saves 60-90% for GPU instances. Implement checkpointing every 15-30 minutes so you can resume from interruptions without losing progress.
- Match instance to model size — Running a 7B-parameter model on a p4d.24xlarge (8x A100) wastes 87% of your GPU capacity. A single g5.xlarge is often sufficient.
- Consider Inferentia2 for inference — inf2.xlarge at $0.758/hr provides better throughput per dollar than G5 instances for supported models (Llama, GPT-NeoX, BERT, and more via the Neuron SDK).
- Use Reserved Instances for steady-state workloads — 3-year Reserved pricing saves up to 62% on P4d and P5 instances. Combine with Spot for burst training.
- Shut down idle instances — GPU instances are the most expensive EC2 types. An idle p4d.24xlarge left running over a weekend costs $1,573. Implement auto-stop scripts or use SageMaker managed training with automatic termination.
- Use Trainium for compatible training jobs — Trn1 instances offer 30-50% cost savings over equivalent GPU instances for training workloads compatible with the Neuron SDK.
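The checkpointing advice above can be sketched framework-agnostically. This is a minimal stand-in, not a real training loop — an actual job would save model and optimizer state with `torch.save` or your framework's equivalent — but the resume-from-disk pattern is the same:

```python
# Minimal checkpoint/resume sketch for Spot training (a framework-agnostic
# stand-in; a real job would persist model/optimizer state with torch.save
# or similar). State is written atomically so a mid-write interruption
# never leaves a corrupt checkpoint.

import json
import os
import tempfile

CKPT = "checkpoint.json"

def save_checkpoint(state: dict, path: str = CKPT) -> None:
    # Write to a temp file, then rename: os.replace is atomic on POSIX.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: str = CKPT) -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss_sum": 0.0}

def train(total_steps: int, checkpoint_every: int = 100) -> dict:
    state = load_checkpoint()  # resumes automatically after an interruption
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # placeholder "training"
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state)
    save_checkpoint(state)
    return state
```

Because `train` always starts from the last checkpoint on disk, a Spot interruption costs at most one checkpoint interval of work; SageMaker Managed Spot Training wraps this same pattern for you.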
Related Guides
- AWS EC2 Pricing Guide
- AWS SageMaker Pricing Guide
- GPU Cost Optimization Playbook
- AWS Bedrock vs SageMaker
FAQ
Which GPU instance is best for fine-tuning LLMs?
For models under 13B parameters, a single g5.2xlarge (1x A10G, 24 GB) is typically sufficient and costs $1.212/hr. For 13B-70B models, use p4d.24xlarge (8x A100, 320 GB) at $32.77/hr or trn1.32xlarge at $21.50/hr. For 70B+ models, P5 instances with H100 GPUs provide the best training speed.
How much can I save with Spot instances for ML training?
Spot instances typically save 60-70% for G5 instances and 50-70% for P4d/P5 instances. The key requirement is implementing checkpointing so training can resume after a Spot interruption. SageMaker Managed Spot Training handles this automatically.
Should I use Inferentia2 or NVIDIA GPUs for inference?
Use Inferentia2 (Inf2) when your model is supported by the AWS Neuron SDK and you want the lowest cost per inference. Inf2.xlarge at $0.758/hr is 25% cheaper than the comparable g5.xlarge at $1.006/hr. Use NVIDIA GPUs when you need broader model compatibility, CUDA-specific features, or when running models not yet supported by Neuron.
Lower Your GPU Instance Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your GPU instance costs. Through group buying power, Wring negotiates better rates so you pay less per GPU hour.
