There are three ways to run LLM inference on AWS: call Bedrock's API and pay per token, deploy a model on a SageMaker endpoint and pay per hour, or run the model on bare EC2 instances with full control. Each approach has a different cost structure, and the cheapest option depends almost entirely on your monthly token volume. For a 7B-class model, Bedrock's pay-per-token pricing stays cheapest until roughly 2-5B tokens/month; beyond that, self-hosted EC2 with Reserved Instances wins, provided you have the engineering team to support it. SageMaker sits in between: it rarely wins on raw cost, but it is the managed middle ground when you need custom models that Bedrock does not offer.
TL;DR: For a 7B model, the Bedrock API costs $0.22 per 1M tokens (Llama 3.1 8B) and is the cheapest option at low and moderate volume. A SageMaker endpoint on ml.g5.xlarge costs ~$1,028/month (break-even vs Bedrock at ~4.7B tokens/month). Self-hosted EC2 g5.xlarge costs ~$724/month on-demand or ~$290/month with a 3-year Reserved Instance, becoming cheaper than Bedrock only above roughly 1-3B tokens/month.
Three Approaches Compared
| Feature | Bedrock API | SageMaker Endpoint | Self-Hosted EC2 |
|---|---|---|---|
| Pricing model | Per token | Per hour | Per hour |
| Minimum cost | $0 (pay-per-use) | ~$1,028/month (24/7 endpoint) | ~$724/month (on-demand 24/7) |
| Scaling | Automatic | Auto-scaling policies | Manual or custom |
| Model selection | Bedrock catalog only | Any model | Any model |
| Infrastructure management | None | Minimal | Full responsibility |
| Latency control | Limited | Full | Full |
| GPU utilization | Shared (AWS-managed) | Dedicated | Dedicated |
Bedrock API Pricing (Pay-Per-Token)
Bedrock charges per input and output token with no minimum commitment.
Comparable Model Pricing on Bedrock
| Model | Input/1K Tokens | Output/1K Tokens | Effective Cost per 1M Mixed Tokens |
|---|---|---|---|
| Llama 3.1 8B | $0.00022 | $0.00022 | $0.22 |
| Llama 3.1 70B | $0.00099 | $0.00099 | $0.99 |
| Mistral Small | $0.001 | $0.003 | $2.00 |
| Claude Haiku | $0.0008 | $0.004 | $2.40 |
| Claude Sonnet | $0.003 | $0.015 | $9.00 |
"Mixed tokens" assumes a 50/50 input/output split.
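The "effective cost per 1M mixed tokens" column is a simple blend of the input and output rates. The helper below is an illustrative sketch (the function name and configurable ratio are ours, not an AWS API):

```python
def mixed_cost_per_1m(input_per_1k: float, output_per_1k: float,
                      input_ratio: float = 0.5) -> float:
    """Blended cost per 1M tokens, given per-1K input/output rates."""
    per_1k = input_per_1k * input_ratio + output_per_1k * (1 - input_ratio)
    return per_1k * 1000  # 1M tokens = 1,000 blocks of 1K tokens

# Llama 3.1 8B: symmetric pricing, so the blend equals either rate
print(round(mixed_cost_per_1m(0.00022, 0.00022), 2))  # 0.22
# Claude Sonnet: the 5x-more-expensive output side dominates the blend
print(round(mixed_cost_per_1m(0.003, 0.015), 2))      # 9.0
```

Shifting `input_ratio` toward input-heavy workloads (e.g. RAG with long contexts) lowers the effective rate for models with asymmetric pricing.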
Bedrock Provisioned Throughput
For sustained high-volume workloads, Bedrock Provisioned Throughput offers dedicated capacity:
| Commitment | Discount vs On-Demand |
|---|---|
| No commitment | ~15% off |
| 1-month | ~20% off |
| 6-month | ~35% off |
Provisioned Throughput is priced per model unit, not per token. It makes sense when your token volume is high enough that the per-unit cost beats pay-per-token pricing.
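Whether Provisioned Throughput pays off reduces to a monthly comparison. The sketch below uses a hypothetical per-model-unit hourly rate — actual unit pricing varies by model and region and must be taken from the AWS console:

```python
def provisioned_beats_on_demand(tokens_per_month: float,
                                on_demand_per_1m: float,
                                unit_price_per_hour: float,
                                units: int = 1,
                                hours: float = 730) -> bool:
    """True when dedicated capacity is cheaper than pay-per-token."""
    on_demand_cost = tokens_per_month / 1e6 * on_demand_per_1m
    provisioned_cost = unit_price_per_hour * units * hours
    return provisioned_cost < on_demand_cost

# Hypothetical: $20/hr per model unit vs Llama 70B at $0.99 per 1M tokens.
# $14,600/month of dedicated capacity needs ~14.75B tokens/month to pay off.
print(provisioned_beats_on_demand(20e9, 0.99, 20.0))  # True
print(provisioned_beats_on_demand(5e9, 0.99, 20.0))   # False
```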
SageMaker Endpoint Pricing (Per-Hour)
SageMaker endpoints bill for the ML instance running your model, regardless of how many (or few) queries it serves.
Common SageMaker Inference Instances
| Instance | GPU | GPU Memory | SageMaker Price/hr | Monthly (24/7) |
|---|---|---|---|---|
| ml.g5.xlarge | 1x A10G | 24 GB | $1.408 | $1,028 |
| ml.g5.2xlarge | 1x A10G | 24 GB | $1.694 | $1,237 |
| ml.g5.12xlarge | 4x A10G | 96 GB | $7.941 | $5,797 |
| ml.inf2.xlarge | 1x Inferentia2 | 32 GB | $1.109 | $810 |
| ml.p4d.24xlarge | 8x A100 | 320 GB | $37.688 | $27,512 |
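The monthly column above is just the hourly rate multiplied by a 730-hour month (AWS's usual 24/7 approximation). A minimal sketch:

```python
HOURS_PER_MONTH = 730  # AWS's standard 24/7 monthly approximation

def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> int:
    """Convert an hourly instance rate to an approximate monthly figure."""
    return round(hourly_rate * hours)

print(monthly_cost(1.408))   # 1028  (ml.g5.xlarge)
print(monthly_cost(37.688))  # 27512 (ml.p4d.24xlarge)
```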
SageMaker Cost Advantages
- Auto-scaling: Serverless Inference scales to zero when idle; real-time endpoints scale up and down via auto-scaling policies
- Multi-model endpoints: Share one instance across multiple models
- Managed infrastructure: No patching, no CUDA driver management
Self-Hosted EC2 Pricing (Full Control)
Running your own inference server on EC2 gives you the lowest per-hour cost but the highest operational overhead.
EC2 Inference Instance Pricing
| Instance | GPU | On-Demand/hr | 1-Year Reserved/hr | 3-Year Reserved/hr | Monthly (On-Demand) |
|---|---|---|---|---|---|
| g5.xlarge | 1x A10G | $1.006 | $0.636 | $0.399 | $724 |
| g5.2xlarge | 1x A10G | $1.212 | $0.768 | $0.482 | $873 |
| g5.12xlarge | 4x A10G | $5.672 | $3.590 | $2.253 | $4,084 |
| inf2.xlarge | 1x Inferentia2 | $0.758 | $0.479 | $0.301 | $546 |
| p4d.24xlarge | 8x A100 | $32.77 | $20.37 | $12.58 | $23,594 |
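The Reserved Instance discounts in the table can be expressed as a percentage saved vs on-demand; this is where the "37-62%" range below comes from:

```python
def reserved_savings_pct(on_demand_hr: float, reserved_hr: float) -> float:
    """Percentage saved by a Reserved Instance vs the on-demand rate."""
    return round((1 - reserved_hr / on_demand_hr) * 100, 1)

# g5.xlarge: 1-year and 3-year Reserved vs on-demand
print(reserved_savings_pct(1.006, 0.636))  # 36.8
print(reserved_savings_pct(1.006, 0.399))  # 60.3
```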
Self-Hosted Cost Advantages
- No SageMaker surcharge: Save 15-40% vs SageMaker pricing
- Reserved Instances: 37-62% savings with commitments
- Spot Instances: 60-70% savings for fault-tolerant inference
- Full CUDA control: Custom kernels, vLLM, TensorRT-LLM
Break-Even Analysis
7B Model (Llama 3.1 8B equivalent)
| Monthly Tokens | Bedrock API Cost | SageMaker (ml.g5.xlarge) | EC2 On-Demand | EC2 Reserved (1-yr) |
|---|---|---|---|---|
| 1M | $0.22 | $1,028 | $724 | $464 |
| 5M | $1.10 | $1,028 | $724 | $464 |
| 10M | $2.20 | $1,028 | $724 | $464 |
| 50M | $11.00 | $1,028 | $724 | $464 |
| 100M | $22.00 | $1,028 | $724 | $464 |
| 500M | $110.00 | $1,028 | $724 | $464 |
| 5B | $1,100.00 | $1,028 | $724 | $464 |
| 10B | $2,200.00 | $1,028 | $724 | $464 |
Break-even points for 7B model:
- Bedrock vs SageMaker: ~4.7B tokens/month
- Bedrock vs EC2 On-Demand: ~3.3B tokens/month
- Bedrock vs EC2 Reserved: ~2.1B tokens/month
For small models like Llama 8B, Bedrock's per-token pricing is extremely competitive. You would need billions of tokens per month before self-hosting becomes cheaper.
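The break-even points above follow from dividing the fixed monthly cost by the per-token rate; a short sketch (function name is ours):

```python
def break_even_tokens(monthly_fixed_cost: float,
                      api_cost_per_1m: float) -> float:
    """Monthly token volume (in billions) where fixed-cost hosting
    matches pay-per-token API pricing."""
    return monthly_fixed_cost / api_cost_per_1m * 1e6 / 1e9

# 7B model: Llama 3.1 8B on Bedrock at $0.22 per 1M tokens
print(round(break_even_tokens(1028, 0.22), 1))  # 4.7  (SageMaker ml.g5.xlarge)
print(round(break_even_tokens(724, 0.22), 1))   # 3.3  (EC2 on-demand)
print(round(break_even_tokens(464, 0.22), 1))   # 2.1  (EC2 1-yr Reserved)
```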
70B Model (Llama 3.1 70B equivalent)
| Monthly Tokens | Bedrock API Cost | SageMaker (ml.g5.48xlarge) | EC2 On-Demand (g5.48xlarge) | EC2 Reserved (1-yr) |
|---|---|---|---|---|
| 1M | $0.99 | $22,770 | $11,730 | $7,416 |
| 10M | $9.90 | $22,770 | $11,730 | $7,416 |
| 100M | $99.00 | $22,770 | $11,730 | $7,416 |
| 1B | $990.00 | $22,770 | $11,730 | $7,416 |
| 10B | $9,900.00 | $22,770 | $11,730 | $7,416 |
| 50B | $49,500.00 | $22,770 | $11,730 | $7,416 |
Break-even points for 70B model:
- Bedrock vs SageMaker: ~23B tokens/month
- Bedrock vs EC2 On-Demand: ~11.8B tokens/month
- Bedrock vs EC2 Reserved: ~7.5B tokens/month
Operational Overhead Comparison
Cost is not just compute — engineering time matters too.
| Task | Bedrock | SageMaker | Self-Hosted EC2 |
|---|---|---|---|
| Initial setup | Minutes | Hours | Days |
| Model updates | Automatic | Container rebuild | Full redeployment |
| Scaling | Automatic | Auto-scaling config | Custom solution |
| Monitoring | CloudWatch built-in | CloudWatch + SageMaker metrics | Custom dashboards |
| GPU driver management | None | None | Manual |
| Security patching | None | Minimal | Full responsibility |
| Estimated DevOps hours/month | 0-2 | 2-8 | 10-40 |
At $150/hr for ML engineering time, 20 hours/month of additional self-hosted operations adds $3,000/month to your effective cost.
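That arithmetic generalizes to a simple effective-cost check (the $150/hr rate and hours are the document's illustrative assumptions):

```python
def effective_monthly_cost(infra_cost: float, devops_hours: float,
                           hourly_rate: float = 150.0) -> float:
    """Infrastructure cost plus the loaded cost of operations time."""
    return infra_cost + devops_hours * hourly_rate

# Self-hosted g5.xlarge plus 20 hrs/month of ML engineering time
print(effective_monthly_cost(724, 20))  # 3724.0
```

At that effective cost, the self-hosted break-even vs Bedrock moves out by several times — operations time often dominates the instance bill for small deployments.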
When Each Option Wins
Choose Bedrock API When:
- Token volume is under 5B/month for small models or under 10B/month for large models
- You need access to multiple model families (Claude, Llama, Mistral) through one API
- You want zero infrastructure management
- You need Guardrails, Knowledge Bases, or Agents built in
Choose SageMaker Endpoints When:
- You need to deploy custom or fine-tuned models not available on Bedrock
- You want managed infrastructure with more control than Bedrock
- You need multi-model endpoints to serve several models from one instance
- Token volume is moderate and predictable
Choose Self-Hosted EC2 When:
- Token volume exceeds 10B/month consistently
- You have an ML platform team to manage infrastructure
- You need custom inference optimizations (vLLM, TensorRT-LLM, custom kernels)
- You can commit to 1-3 year Reserved Instances for the deepest discounts
Cost Optimization Tips
- Start with Bedrock, graduate to self-hosted — Begin with Bedrock API for rapid iteration and to establish your actual token volume. Migrate to SageMaker or EC2 only when you have predictable high volume that justifies the fixed cost.
- Use SageMaker Serverless Inference for bursty workloads — Serverless endpoints scale to zero when idle, eliminating the 24/7 cost. You pay only for the compute time during active inference.
- Deploy with vLLM or TensorRT-LLM on self-hosted instances — These inference engines provide 2-4x throughput improvement over naive model serving, effectively cutting your per-token cost by 50-75%.
- Combine Bedrock with self-hosted — Route high-volume, cost-sensitive workloads to self-hosted instances while using Bedrock for low-volume, multi-model, or experimental workloads.
- Use Inf2 instances for supported models — Whether on SageMaker or EC2, Inferentia2 instances offer 25-40% lower cost per inference than equivalent GPU instances.
- Implement request batching — Batch multiple inference requests together to maximize GPU utilization. This is especially impactful for self-hosted deployments where you are paying per hour regardless of utilization.
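The throughput tips above translate directly into per-token cost: a fixed hourly price divided by sustained throughput. The token rates below are illustrative assumptions, not benchmarks:

```python
def self_hosted_cost_per_1m(hourly_rate: float,
                            tokens_per_sec: float) -> float:
    """Per-1M-token cost of a fixed-price instance at sustained throughput."""
    return round(hourly_rate / (tokens_per_sec * 3600) * 1e6, 2)

# g5.xlarge at an assumed 300 tok/s (naive serving) vs 1,000 tok/s
# (vLLM-class batched serving) -- a ~3.3x throughput gain
print(self_hosted_cost_per_1m(1.006, 300))   # 0.93
print(self_hosted_cost_per_1m(1.006, 1000))  # 0.28
```

Under these assumptions, an optimized self-hosted g5.xlarge approaches Bedrock's $0.22/1M rate — which is why the break-even sits in the billions of tokens, not millions.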
Related Guides
- AWS Bedrock Pricing Guide
- AWS SageMaker Pricing Guide
- AWS GPU Instance Pricing Guide
- AWS Inferentia vs GPU Pricing
FAQ
At what token volume should I switch from Bedrock to self-hosting?
For 7B-class models (Llama 8B), Bedrock's per-token pricing is so low ($0.22/1M tokens) that self-hosting rarely makes sense unless you are processing billions of tokens monthly. For 70B-class models, the break-even point is roughly 7.5-23B tokens/month depending on whether you use Reserved Instances.
Can I use Bedrock Provisioned Throughput as a middle ground?
Yes. Bedrock Provisioned Throughput gives you dedicated model capacity billed per model unit per hour — similar to the SageMaker pricing model but without managing infrastructure. It is 20-35% cheaper than on-demand Bedrock for sustained workloads and eliminates the throttling risk of on-demand.
How do I calculate total cost of ownership for self-hosted LLMs?
Add compute costs (instance hours), storage costs (EBS for model weights, S3 for logs), networking costs (load balancer, data transfer), and engineering costs (DevOps time for monitoring, scaling, patching, and model updates). Most teams underestimate the engineering component, which can add $3,000-6,000/month for a dedicated ML platform engineer.
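The components listed above can be rolled into one figure; the storage and networking values below are hypothetical placeholders, not AWS quotes:

```python
def self_hosted_tco(compute: float, storage: float, network: float,
                    eng_hours: float, eng_rate: float = 150.0) -> float:
    """Monthly total cost of ownership for a self-hosted endpoint:
    compute + storage + networking + loaded engineering time."""
    return compute + storage + network + eng_hours * eng_rate

# Hypothetical g5.xlarge deployment: ~$50 EBS/S3, ~$60 ALB + data transfer
print(self_hosted_tco(compute=724, storage=50, network=60, eng_hours=20))
# 3834.0
```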
Lower Your LLM Hosting Costs with Wring
Wring helps you access AWS credits and volume discounts to lower your LLM inference costs. Through group buying power, Wring negotiates better rates so you pay less per token across Bedrock, SageMaker, and EC2.
