
AWS GPU Instance Pricing: P5, P4d, G5, Inf2

AWS GPU instance pricing compared: P5 at $32.77/hr, G5 from $1.006/hr, Inf2 from $0.758/hr. Spot, Reserved, and on-demand rates for ML workloads.

Wring Team
March 15, 2026
9 min read
AWS GPU pricing · P5 instances · GPU costs · ML training instances
[Image: Server hardware and GPU infrastructure for machine learning]

AWS GPU instances span a wide range of price points and capabilities, from $0.758/hr for inference-optimized Inf2 instances up to $98.32/hr for the top-tier p5.48xlarge with eight NVIDIA H100 GPUs. Choosing the right instance family is the highest-impact decision for your ML infrastructure budget. The wrong choice can cost 10-40x more than necessary for the same workload. This guide covers every GPU and accelerator instance family available on EC2 with current on-demand, Spot, and Reserved pricing.

TL;DR: For training, use P5 (H100) or P4d (A100) for large models, and Trn1 (Trainium) for cost-efficient training at $1.34/hr. For inference, Inf2 starts at $0.758/hr and beats GPUs on price-performance for supported models. G5 (A10G) at $1.006/hr is the best general-purpose GPU. Spot instances typically save 50-70% across all families.


GPU Instance Family Overview

| Family | GPU/Accelerator | GPUs per Instance | Primary Use Case | On-Demand From |
|---|---|---|---|---|
| P5 | NVIDIA H100 | 8 | Large model training | $98.32/hr |
| P4d | NVIDIA A100 | 8 | Training and inference | $32.77/hr |
| P3 | NVIDIA V100 | 1-8 | Legacy training | $3.06/hr |
| G6 | NVIDIA L4 | 1-8 | Inference, graphics | $0.978/hr |
| G5 | NVIDIA A10G | 1-8 | Inference, graphics | $1.006/hr |
| G4dn | NVIDIA T4 | 1-4 | Budget inference | $0.526/hr |
| Inf2 | AWS Inferentia2 | 1-12 | High-throughput inference | $0.758/hr |
| Trn1 | AWS Trainium | 1-16 | Cost-efficient training | $1.34/hr |

P5 Instances (NVIDIA H100)

The P5 family delivers the highest GPU performance on AWS, powered by NVIDIA H100 Tensor Core GPUs with 80 GB HBM3 memory each.

| Instance | GPUs | GPU Memory | vCPUs | RAM | Network | On-Demand/hr |
|---|---|---|---|---|---|---|
| p5.48xlarge | 8x H100 | 640 GB HBM3 | 192 | 2,048 GB | 3,200 Gbps EFAv2 | $98.32 |

P5 instances are designed for the largest training jobs — foundation models, LLM pre-training, and large-scale distributed training. The 3,200 Gbps EFA networking enables efficient multi-node training. At $98.32/hr on-demand, a single month of continuous P5 usage costs $71,770. Reserved pricing and Spot instances are essential for managing costs.

| Pricing Option | p5.48xlarge/hr | Monthly (continuous) | Savings vs On-Demand |
|---|---|---|---|
| On-Demand | $98.32 | $71,770 | Baseline |
| 1-Year Reserved (No Upfront) | ~$62.24 | $45,435 | 37% |
| 3-Year Reserved (All Upfront) | ~$37.39 | $27,295 | 62% |
| Spot (varies) | ~$29.50-$49.16 | Variable | 50-70% |
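
For reference, the monthly figures in these tables are just the hourly rate multiplied by AWS's 730-hour month convention. A minimal Python sketch that reproduces the P5 numbers above (rates hardcoded from the table; results match to within rounding):

```python
HOURS_PER_MONTH = 730  # AWS's convention for monthly cost estimates

# p5.48xlarge hourly rates from the table above (subject to change)
p5_rates = {
    "On-Demand": 98.32,
    "1-Year Reserved": 62.24,
    "3-Year Reserved": 37.39,
}

on_demand = p5_rates["On-Demand"]
for plan, rate in p5_rates.items():
    monthly = rate * HOURS_PER_MONTH
    savings = 1 - rate / on_demand
    print(f"{plan:<16} ${monthly:>9,.0f}/mo  ({savings:.0%} off on-demand)")
```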

P4d Instances (NVIDIA A100)

The P4d family uses NVIDIA A100 GPUs with 40 GB HBM2e memory. It remains the workhorse for many training and inference workloads.

| Instance | GPUs | GPU Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| p4d.24xlarge | 8x A100 | 320 GB HBM2e | 96 | 1,152 GB | $32.77 |
| p4de.24xlarge | 8x A100 (80 GB) | 640 GB HBM2e | 96 | 1,152 GB | $40.97 |

| Pricing Option | p4d.24xlarge/hr | Monthly (continuous) | Savings vs On-Demand |
|---|---|---|---|
| On-Demand | $32.77 | $23,922 | Baseline |
| 1-Year Reserved | ~$20.37 | $14,870 | 38% |
| 3-Year Reserved | ~$12.58 | $9,184 | 62% |
| Spot | ~$9.83-$16.38 | Variable | 50-70% |

G5 Instances (NVIDIA A10G)

The G5 family is the most popular choice for inference workloads and single-GPU training. Each A10G GPU provides 24 GB of GDDR6 memory.

| Instance | GPUs | GPU Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| g5.xlarge | 1x A10G | 24 GB | 4 | 16 GB | $1.006 |
| g5.2xlarge | 1x A10G | 24 GB | 8 | 32 GB | $1.212 |
| g5.4xlarge | 1x A10G | 24 GB | 16 | 64 GB | $1.624 |
| g5.8xlarge | 1x A10G | 24 GB | 32 | 128 GB | $2.448 |
| g5.12xlarge | 4x A10G | 96 GB | 48 | 192 GB | $5.672 |
| g5.24xlarge | 4x A10G | 96 GB | 96 | 384 GB | $8.144 |
| g5.48xlarge | 8x A10G | 192 GB | 192 | 768 GB | $16.288 |

Spot pricing for G5 instances typically provides 50-70% savings:

| Instance | On-Demand/hr | Spot/hr (typical) | Savings |
|---|---|---|---|
| g5.xlarge | $1.006 | $0.30-$0.50 | 50-70% |
| g5.2xlarge | $1.212 | $0.36-$0.60 | 50-70% |
| g5.12xlarge | $5.672 | $1.70-$2.83 | 50-70% |
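
Spot prices move with regional capacity, so check live rates before relying on a number in this range. A minimal boto3 sketch using the EC2 DescribeSpotPriceHistory API (region and instance types here are illustrative):

```python
from datetime import datetime, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Passing "now" as StartTime returns the most recent price per AZ
resp = ec2.describe_spot_price_history(
    InstanceTypes=["g5.xlarge", "g5.2xlarge", "g5.12xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc),
    MaxResults=30,
)

for price in resp["SpotPriceHistory"]:
    print(f'{price["InstanceType"]:<14} {price["AvailabilityZone"]:<12} '
          f'${float(price["SpotPrice"]):.3f}/hr')
```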

G6 Instances (NVIDIA L4)

G6 instances use the newer NVIDIA L4 GPUs, offering better inference performance per dollar than G5 for many workloads.

| Instance | GPUs | GPU Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| g6.xlarge | 1x L4 | 24 GB | 4 | 16 GB | $0.978 |
| g6.2xlarge | 1x L4 | 24 GB | 8 | 32 GB | $1.168 |
| g6.4xlarge | 1x L4 | 24 GB | 16 | 64 GB | $1.548 |
| g6.12xlarge | 4x L4 | 96 GB | 48 | 192 GB | $5.016 |
| g6.48xlarge | 8x L4 | 192 GB | 192 | 768 GB | $13.35 |

Inf2 Instances (AWS Inferentia2)

Inf2 instances use AWS-designed Inferentia2 chips, offering the lowest cost per inference on AWS. They require the AWS Neuron SDK.

| Instance | Accelerators | Accelerator Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| inf2.xlarge | 1x Inferentia2 | 32 GB HBM2e | 4 | 16 GB | $0.758 |
| inf2.8xlarge | 1x Inferentia2 | 32 GB HBM2e | 32 | 128 GB | $1.968 |
| inf2.24xlarge | 6x Inferentia2 | 192 GB HBM2e | 96 | 384 GB | $6.49 |
| inf2.48xlarge | 12x Inferentia2 | 384 GB HBM2e | 192 | 768 GB | $12.981 |
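
Because Inf2 runs models through Neuron rather than CUDA, models are compiled ahead of time. A minimal sketch of that workflow with torch-neuronx, assuming the Neuron SDK is installed; the two-layer model here is a stand-in for a real inference model:

```python
import torch
from torch import nn
import torch_neuronx  # PyTorch integration shipped with the AWS Neuron SDK

# Stand-in model; in practice this is your trained inference model
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
example = torch.randn(1, 128)  # tracing needs a representative input shape

# Ahead-of-time compile for Inferentia2; returns a TorchScript-like module
neuron_model = torch_neuronx.trace(model, example)
neuron_model.save("model_neuron.pt")

# On the Inf2 instance, load and call it like any TorchScript module
loaded = torch.jit.load("model_neuron.pt")
print(loaded(example).shape)  # torch.Size([1, 10])
```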

Trn1 Instances (AWS Trainium)

Trn1 instances use AWS Trainium chips, purpose-built for deep learning training at lower cost than NVIDIA GPUs.

| Instance | Accelerators | Accelerator Memory | vCPUs | RAM | On-Demand/hr |
|---|---|---|---|---|---|
| trn1.2xlarge | 1x Trainium | 32 GB HBM2e | 8 | 32 GB | $1.34 |
| trn1.32xlarge | 16x Trainium | 512 GB HBM2e | 128 | 512 GB | $21.50 |
| trn1n.32xlarge | 16x Trainium | 512 GB HBM2e | 128 | 512 GB | $24.78 |

Trn1n includes enhanced networking (1,600 Gbps EFA) for multi-node distributed training.


Training vs Inference: Choosing the Right Instance

| Workload | Recommended Instance | Why |
|---|---|---|
| LLM pre-training (100B+ params) | p5.48xlarge | H100 with 3,200 Gbps networking |
| Fine-tuning 7B-70B models | p4d.24xlarge or trn1.32xlarge | Good performance, lower cost |
| Fine-tuning under-7B models | g5.xlarge or g5.2xlarge | Single A10G is sufficient |
| LLM inference (70B models) | inf2.48xlarge or g5.48xlarge | High memory, low per-token cost |
| LLM inference (7B-13B models) | inf2.xlarge or g5.xlarge | Single accelerator is enough |
| Image generation | g5.2xlarge or g6.2xlarge | Optimized for single-GPU workloads |
| Cost-efficient training | trn1.32xlarge | Up to 50% cheaper than P4d |
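
A rough way to sanity-check these recommendations is to estimate GPU memory from parameter count: fp16/bf16 weights take 2 bytes per parameter, plus roughly 20% overhead for activations and KV cache at inference time. This is a common rule of thumb, not an AWS figure; a quick sketch:

```python
def inference_mem_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """fp16/bf16 weight footprint plus ~20% activation/KV-cache overhead."""
    return params_billions * bytes_per_param * 1.2

for size in (7, 13, 70):
    print(f"{size:>2}B params -> ~{inference_mem_gb(size):.0f} GB for fp16 inference")

# 7B  -> ~17 GB: fits a single 24 GB A10G (g5.xlarge)
# 13B -> ~31 GB: needs a 32 GB Inferentia2, or 8-bit quantization on a 24 GB GPU
# 70B -> ~168 GB: needs multi-accelerator instances (inf2.48xlarge, g5.48xlarge)
```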

Cost Optimization Tips

  1. Use Spot instances for training — Spot pricing typically saves 50-70% for GPU instances. Implement checkpointing every 15-30 minutes so you can resume from interruptions without losing progress (see the first sketch after this list).

  2. Match instance to model size — Running a 7B parameter model on a p4d.24xlarge (8x A100) wastes 87% of your GPU capacity. A single g5.xlarge is often sufficient.

  3. Consider Inferentia2 for inference — Inf2.xlarge at $0.758/hr provides better throughput per dollar than G5 instances for supported models (Llama, GPT-NeoX, BERT, and more via the Neuron SDK).

  4. Use Reserved Instances for steady-state workloads — 3-year Reserved pricing saves up to 62% on P4d and P5 instances. Combine with Spot for burst training.

  5. Shut down idle instances — GPU instances are the most expensive EC2 types. An idle p4d.24xlarge running over a weekend costs $1,573. Implement auto-stop scripts (see the second sketch after this list) or use SageMaker managed training with automatic termination.

  6. Use Trainium for compatible training jobs — Trn1 instances offer 30-50% cost savings over equivalent GPU instances for training workloads that are compatible with the Neuron SDK.
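
Tip 1 hinges on checkpointing. Below is a minimal PyTorch sketch of the save-and-resume pattern; the tiny linear model, step counts, and local checkpoint path are placeholders (a real Spot job would write to S3 or FSx):

```python
import os
import torch
from torch import nn

CKPT_PATH = "/tmp/checkpoints/latest.pt"  # placeholder; use S3/FSx-backed storage

model = nn.Linear(10, 1)                  # stand-in for your real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def save_checkpoint(step: int) -> None:
    os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
    tmp = CKPT_PATH + ".tmp"
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict(),
                "step": step}, tmp)
    os.replace(tmp, CKPT_PATH)            # atomic rename avoids torn checkpoints

def load_checkpoint() -> int:
    if not os.path.exists(CKPT_PATH):
        return 0                          # fresh run
    ckpt = torch.load(CKPT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])
    return ckpt["step"]

start = load_checkpoint()                 # resume after a Spot interruption
for step in range(start, 10_000):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    if step % 500 == 0:                   # tune cadence to land every 15-30 min
        save_checkpoint(step)
```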
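
For tip 5, a hedged boto3 sketch that stops GPU instances whose CPU has sat idle for two hours. CPUUtilization is only a proxy for GPU activity; a production version would publish GPU utilization as a custom CloudWatch metric and check that instead:

```python
from datetime import datetime, timedelta, timezone

import boto3

REGION = "us-east-1"
IDLE_CPU_PCT = 5.0            # below this average, treat the box as idle
LOOKBACK = timedelta(hours=2)

ec2 = boto3.client("ec2", region_name=REGION)
cw = boto3.client("cloudwatch", region_name=REGION)

# Running GPU/accelerator instances, matched by instance-type wildcard
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": "instance-type", "Values": ["p5.*", "p4d*", "g5.*", "g6.*"]},
    ]
)["Reservations"]

now = datetime.now(timezone.utc)
for res in reservations:
    for inst in res["Instances"]:
        stats = cw.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": inst["InstanceId"]}],
            StartTime=now - LOOKBACK,
            EndTime=now,
            Period=3600,                  # one datapoint per hour
            Statistics=["Average"],
        )
        points = stats["Datapoints"]
        if points and all(p["Average"] < IDLE_CPU_PCT for p in points):
            print(f"Stopping idle instance {inst['InstanceId']}")
            ec2.stop_instances(InstanceIds=[inst["InstanceId"]])
```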

FAQ

Which GPU instance is best for fine-tuning LLMs?

For models under 13B parameters, a single g5.2xlarge (1x A10G, 24 GB) is typically sufficient and costs $1.212/hr. For 13B-70B models, use p4d.24xlarge (8x A100, 320 GB) at $32.77/hr or trn1.32xlarge at $21.50/hr. For 70B+ models, P5 instances with H100 GPUs provide the best training speed.

How much can I save with Spot instances for ML training?

Spot instances typically save 60-70% for G5 instances and 50-70% for P4d/P5 instances. The key requirement is implementing checkpointing so training can resume after a Spot interruption. SageMaker Managed Spot Training handles this automatically.
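
A hedged sketch of enabling Managed Spot Training with the SageMaker Python SDK; the IAM role, bucket names, training script, and framework versions below are placeholders:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                       # your training script
    role="arn:aws:iam::123456789012:role/SMRole",  # placeholder IAM role
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    framework_version="2.1",                      # pick a version your script supports
    py_version="py310",
    use_spot_instances=True,                      # request Spot capacity
    max_run=8 * 3600,                             # cap on billed training seconds
    max_wait=12 * 3600,                           # cap including waiting for Spot
    checkpoint_s3_uri="s3://your-bucket/ckpts/",  # synced to /opt/ml/checkpoints
)
estimator.fit({"training": "s3://your-bucket/data/"})
```

Note that SageMaker instance types carry an ml. prefix and are billed from SageMaker's own price list, so the EC2 rates above are a guide rather than an exact match.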

Should I use Inferentia2 or NVIDIA GPUs for inference?

Use Inferentia2 (Inf2) when your model is supported by the AWS Neuron SDK and you want the lowest cost per inference. Inf2.xlarge at $0.758/hr is 25% cheaper than the comparable g5.xlarge at $1.006/hr. Use NVIDIA GPUs when you need broader model compatibility, CUDA-specific features, or when running models not yet supported by Neuron.


Lower Your GPU Instance Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your GPU instance costs. Through group buying power, Wring negotiates better rates so you pay less per GPU hour.

Start saving on AWS →