
LLM Hosting vs API: AWS Cost Comparison

Compare Bedrock API, SageMaker endpoints, and self-hosted EC2 for LLM inference. Break-even analysis at 1M, 10M, and 100M tokens/month.

Wring Team
March 15, 2026
9 min read
Tags: LLM hosting costs, self-hosted LLM, Bedrock vs self-hosted, inference costs
Cloud infrastructure comparison for LLM hosting and API services

There are three ways to run LLM inference on AWS: call Bedrock's API and pay per token, deploy a model on a SageMaker endpoint and pay per hour, or run the model on bare EC2 instances with full control. Each approach has a different cost structure, and the cheapest option depends almost entirely on your monthly token volume. For small models, Bedrock's per-token pricing stays cheapest until you reach billions of tokens per month; SageMaker earns its premium when you need custom or fine-tuned models on managed infrastructure; and self-hosted EC2 wins at the highest volumes, provided you have the engineering team to support it.

TL;DR: For a 7B-class model, Bedrock API costs about $0.22 per 1M tokens. A SageMaker endpoint on ml.g5.xlarge costs ~$1,028/month (break-even vs Bedrock at ~4.7B tokens/month). Self-hosted EC2 g5.xlarge costs $724/month on-demand or $464/month with a 1-year Reserved Instance (break-even at ~3.3B and ~2.1B tokens/month). Self-hosting only pays off at sustained multi-billion-token volumes.


Three Approaches Compared

| Feature | Bedrock API | SageMaker Endpoint | Self-Hosted EC2 |
| --- | --- | --- | --- |
| Pricing model | Per token | Per hour | Per hour |
| Minimum cost | $0 (pay-per-use) | ~$1,028/month (24/7 endpoint) | ~$724/month (on-demand 24/7) |
| Scaling | Automatic | Auto-scaling policies | Manual or custom |
| Model selection | Bedrock catalog only | Any model | Any model |
| Infrastructure management | None | Minimal | Full responsibility |
| Latency control | Limited | Full | Full |
| GPU utilization | Shared (AWS-managed) | Dedicated | Dedicated |
[Chart: LLM hosting vs API cost comparison]

Bedrock API Pricing (Pay-Per-Token)

Bedrock charges per input and output token with no minimum commitment.

Comparable Model Pricing on Bedrock

| Model | Input/1K Tokens | Output/1K Tokens | Effective Cost per 1M Mixed Tokens |
| --- | --- | --- | --- |
| Llama 3.1 8B | $0.00022 | $0.00022 | $0.22 |
| Llama 3.1 70B | $0.00099 | $0.00099 | $0.99 |
| Mistral Small | $0.001 | $0.003 | $2.00 |
| Claude Haiku | $0.0008 | $0.004 | $2.40 |
| Claude Sonnet | $0.003 | $0.015 | $9.00 |

Mixed tokens assumes a 50/50 input/output ratio.
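The last column can be reproduced from the per-1K rates; a minimal sketch in Python (rates are the table values above):

```python
def cost_per_1m_mixed(input_per_1k: float, output_per_1k: float,
                      input_ratio: float = 0.5) -> float:
    """Effective cost per 1M tokens at a given input/output mix."""
    per_1k = input_per_1k * input_ratio + output_per_1k * (1 - input_ratio)
    return per_1k * 1000  # 1M tokens = 1,000 blocks of 1K tokens

# Reproduce the table rows at a 50/50 mix:
print(round(cost_per_1m_mixed(0.00022, 0.00022), 2))  # Llama 3.1 8B: 0.22
print(round(cost_per_1m_mixed(0.0008, 0.004), 2))     # Claude Haiku: 2.4
```

Shifting `input_ratio` toward input-heavy workloads (e.g. RAG with long contexts) lowers the blended cost for models that price output higher than input.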

Bedrock Provisioned Throughput

For sustained high-volume workloads, Bedrock Provisioned Throughput offers dedicated capacity:

| Commitment | Discount vs On-Demand |
| --- | --- |
| No commitment | ~15% off |
| 1-month | ~20% off |
| 6-month | ~35% off |

Provisioned Throughput is priced per model unit, not per token. It makes sense when your token volume is high enough that the per-unit cost beats pay-per-token pricing.


SageMaker Endpoint Pricing (Per-Hour)

SageMaker endpoints bill for the ML instance running your model, regardless of how many (or few) queries it serves.

Common SageMaker Inference Instances

| Instance | GPU | GPU Memory | SageMaker Price/hr | Monthly (24/7) |
| --- | --- | --- | --- | --- |
| ml.g5.xlarge | 1x A10G | 24 GB | $1.408 | $1,028 |
| ml.g5.2xlarge | 1x A10G | 24 GB | $1.694 | $1,237 |
| ml.g5.12xlarge | 4x A10G | 96 GB | $7.941 | $5,797 |
| ml.inf2.xlarge | 1x Inferentia2 | 32 GB | $1.109 | $810 |
| ml.p4d.24xlarge | 8x A100 | 320 GB | $37.688 | $27,512 |

SageMaker Cost Advantages

  • Auto-scaling: Scale to zero during off-hours (Serverless Inference or auto-scaling policies)
  • Multi-model endpoints: Share one instance across multiple models
  • Managed infrastructure: No patching, no CUDA driver management
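The auto-scaling point above is configured through Application Auto Scaling. A hedged boto3 sketch (endpoint and variant names are placeholders; note that real-time endpoints typically keep MinCapacity at 1 or above, and true scale-to-zero comes from Serverless Inference):

```python
def scaling_target_config(endpoint_name: str, variant: str = "AllTraffic",
                          min_capacity: int = 1, max_capacity: int = 4) -> dict:
    """Build the Application Auto Scaling target for a SageMaker variant."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }

def register_autoscaling(endpoint_name: str, variant: str = "AllTraffic") -> None:
    import boto3  # AWS SDK for Python; requires credentials at runtime
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scaling_target_config(endpoint_name, variant))
    # Target-tracking on invocations per instance keeps utilization high
    # while letting the endpoint shrink during quiet periods.
    client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations",
        ServiceNamespace="sagemaker",
        ResourceId=f"endpoint/{endpoint_name}/variant/{variant}",
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 100.0,  # invocations per instance; tune to your model
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
```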

Self-Hosted EC2 Pricing (Full Control)

Running your own inference server on EC2 gives you the lowest per-hour cost but the highest operational overhead.

EC2 Inference Instance Pricing

| Instance | GPU | On-Demand/hr | 1-Year Reserved/hr | 3-Year Reserved/hr | Monthly (On-Demand) |
| --- | --- | --- | --- | --- | --- |
| g5.xlarge | 1x A10G | $1.006 | $0.636 | $0.399 | $724 |
| g5.2xlarge | 1x A10G | $1.212 | $0.768 | $0.482 | $873 |
| g5.12xlarge | 4x A10G | $5.672 | $3.590 | $2.253 | $4,084 |
| inf2.xlarge | 1x Inferentia2 | $0.758 | $0.479 | $0.301 | $546 |
| p4d.24xlarge | 8x A100 | $32.77 | $20.37 | $12.58 | $23,594 |

Self-Hosted Cost Advantages

  • No SageMaker surcharge: Save 15-40% vs SageMaker pricing
  • Reserved Instances: 37-62% savings with commitments
  • Spot Instances: 60-70% savings for fault-tolerant inference
  • Full CUDA control: Custom kernels, vLLM, TensorRT-LLM
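The monthly and discount figures above can be sanity-checked from the hourly rates (this sketch assumes the 720-hour month the EC2 table implies):

```python
def monthly_cost(hourly: float, hours: int = 720) -> int:
    """Approximate monthly cost for a 24/7 instance."""
    return round(hourly * hours)

def reserved_savings_pct(on_demand: float, reserved: float) -> int:
    """Percent saved by a Reserved Instance rate vs on-demand."""
    return round((1 - reserved / on_demand) * 100)

print(monthly_cost(1.006))                 # g5.xlarge on-demand: 724
print(reserved_savings_pct(1.006, 0.636))  # g5.xlarge 1-yr: 37
print(reserved_savings_pct(32.77, 12.58))  # p4d.24xlarge 3-yr: 62
```

The 37% and 62% endpoints are where the "37-62% savings" range in the bullet list comes from.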
[Diagram: LLM hosting vs API cost decision flow]

Break-Even Analysis

7B Model (Llama 3.1 8B equivalent)

| Monthly Tokens | Bedrock API Cost | SageMaker (ml.g5.xlarge) | EC2 On-Demand | EC2 Reserved (1-yr) |
| --- | --- | --- | --- | --- |
| 1M | $0.22 | $1,028 | $724 | $464 |
| 5M | $1.10 | $1,028 | $724 | $464 |
| 10M | $2.20 | $1,028 | $724 | $464 |
| 50M | $11.00 | $1,028 | $724 | $464 |
| 100M | $22.00 | $1,028 | $724 | $464 |
| 500M | $110.00 | $1,028 | $724 | $464 |
| 5B | $1,100.00 | $1,028 | $724 | $464 |
| 10B | $2,200.00 | $1,028 | $724 | $464 |

Break-even points for 7B model:

  • Bedrock vs SageMaker: ~4.7B tokens/month
  • Bedrock vs EC2 On-Demand: ~3.3B tokens/month
  • Bedrock vs EC2 Reserved: ~2.1B tokens/month

For small models like Llama 8B, Bedrock's per-token pricing is extremely competitive. You would need billions of tokens per month before self-hosting becomes cheaper.
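These break-even points are just the fixed monthly cost divided by the per-1M-token rate; a small sketch using the figures from the tables above:

```python
def break_even_tokens_m(fixed_monthly: float, api_per_1m: float) -> float:
    """Monthly token volume (millions) where a fixed-cost host matches the API bill."""
    return fixed_monthly / api_per_1m

# 7B model at Bedrock's $0.22 per 1M tokens:
for label, monthly in [("SageMaker ml.g5.xlarge", 1028),
                       ("EC2 g5.xlarge on-demand", 724),
                       ("EC2 g5.xlarge 1-yr reserved", 464)]:
    billions = break_even_tokens_m(monthly, 0.22) / 1000
    print(f"{label}: ~{billions:.1f}B tokens/month")
```

Running the same function with the 70B figures ($0.99 per 1M against $22,770, $11,730, and $7,416) reproduces the 23B, 11.8B, and 7.5B break-evens below.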

70B Model (Llama 3.1 70B equivalent)

| Monthly Tokens | Bedrock API Cost | SageMaker (ml.g5.48xlarge) | EC2 On-Demand | EC2 Reserved (1-yr) |
| --- | --- | --- | --- | --- |
| 1M | $0.99 | $22,770 | $11,730 | $7,416 |
| 10M | $9.90 | $22,770 | $11,730 | $7,416 |
| 100M | $99.00 | $22,770 | $11,730 | $7,416 |
| 1B | $990.00 | $22,770 | $11,730 | $7,416 |
| 10B | $9,900.00 | $22,770 | $11,730 | $7,416 |
| 50B | $49,500.00 | $22,770 | $11,730 | $7,416 |

Break-even points for 70B model:

  • Bedrock vs SageMaker: ~23B tokens/month
  • Bedrock vs EC2 On-Demand: ~11.8B tokens/month
  • Bedrock vs EC2 Reserved: ~7.5B tokens/month

Operational Overhead Comparison

Cost is not just compute — engineering time matters too.

| Task | Bedrock | SageMaker | Self-Hosted EC2 |
| --- | --- | --- | --- |
| Initial setup | Minutes | Hours | Days |
| Model updates | Automatic | Container rebuild | Full redeployment |
| Scaling | Automatic | Auto-scaling config | Custom solution |
| Monitoring | CloudWatch built-in | CloudWatch + SageMaker metrics | Custom dashboards |
| GPU driver management | None | None | Manual |
| Security patching | None | Minimal | Full responsibility |
| Estimated DevOps hours/month | 0-2 | 2-8 | 10-40 |

At $150/hr for ML engineering time, 20 hours/month of additional self-hosted operations adds $3,000/month to your effective cost.


When Each Option Wins

Choose Bedrock API When:

  • Token volume is under 5B/month for small models or under 10B/month for large models
  • You need access to multiple model families (Claude, Llama, Mistral) through one API
  • You want zero infrastructure management
  • You need Guardrails, Knowledge Bases, or Agents built in

Choose SageMaker Endpoints When:

  • You need to deploy custom or fine-tuned models not available on Bedrock
  • You want managed infrastructure with more control than Bedrock
  • You need multi-model endpoints to serve several models from one instance
  • Token volume is moderate and predictable

Choose Self-Hosted EC2 When:

  • Token volume exceeds 10B/month consistently
  • You have an ML platform team to manage infrastructure
  • You need custom inference optimizations (vLLM, TensorRT-LLM, custom kernels)
  • You can commit to 1-3 year Reserved Instances for the deepest discounts

Cost Optimization Tips

  1. Start with Bedrock, graduate to self-hosted — Begin with Bedrock API for rapid iteration and to establish your actual token volume. Migrate to SageMaker or EC2 only when you have predictable high volume that justifies the fixed cost.

  2. Use SageMaker Serverless Inference for bursty workloads — Serverless endpoints scale to zero when idle, eliminating the 24/7 cost. You pay only for the compute time during active inference.

  3. Deploy with vLLM or TensorRT-LLM on self-hosted instances — These inference engines provide 2-4x throughput improvement over naive model serving, effectively cutting your per-token cost by 50-75%.

  4. Combine Bedrock with self-hosted — Route high-volume, cost-sensitive workloads to self-hosted instances while using Bedrock for low-volume, multi-model, or experimental workloads.

  5. Use Inf2 instances for supported models — Whether on SageMaker or EC2, Inferentia2 instances offer 25-40% lower cost per inference than equivalent GPU instances.

  6. Implement request batching — Batch multiple inference requests together to maximize GPU utilization. This is especially impactful for self-hosted deployments where you are paying per hour regardless of utilization.
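The throughput claim in tip 3 maps directly onto cost: on a fixed-price instance the hourly bill is constant, so serving N times more tokens per hour divides the effective per-token cost by N. A quick check:

```python
def per_token_cost_reduction(speedup: float) -> int:
    """Percent drop in effective per-token cost when throughput improves
    by `speedup` on a fixed-price instance (same bill, more tokens served)."""
    return round((1 - 1 / speedup) * 100)

print(per_token_cost_reduction(2))  # 2x throughput -> 50% cheaper per token
print(per_token_cost_reduction(4))  # 4x throughput -> 75% cheaper per token
```

The same arithmetic applies to tip 6: batching raises tokens served per GPU-hour, which is the only lever on per-token cost once the instance is running 24/7.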

[Checklist: LLM hosting vs API cost optimization]


FAQ

At what token volume should I switch from Bedrock to self-hosting?

For 7B-class models (Llama 8B), Bedrock's per-token pricing is so low ($0.22/1M tokens) that self-hosting rarely makes sense unless you are processing billions of tokens monthly. For 70B-class models, the break-even point is roughly 7-23B tokens/month, depending on whether you use Reserved Instances.

Can I use Bedrock Provisioned Throughput as a middle ground?

Yes. Bedrock Provisioned Throughput gives you dedicated model capacity billed per model unit per hour — similar to the SageMaker pricing model but without managing infrastructure. It is 20-35% cheaper than on-demand Bedrock for sustained workloads and eliminates the throttling risk of on-demand.

How do I calculate total cost of ownership for self-hosted LLMs?

Add compute costs (instance hours), storage costs (EBS for model weights, S3 for logs), networking costs (load balancer, data transfer), and engineering costs (DevOps time for monitoring, scaling, patching, and model updates). Most teams underestimate the engineering component, which can add $3,000-6,000/month for a dedicated ML platform engineer.
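As a sketch, the components listed above can be summed directly. The storage and networking figures below are illustrative placeholders, not AWS quotes; only the compute and engineering-rate numbers come from this article:

```python
def self_hosted_tco(compute: float, storage: float, networking: float,
                    devops_hours: float, hourly_rate: float = 150.0) -> float:
    """Monthly total cost of ownership for a self-hosted LLM deployment."""
    return compute + storage + networking + devops_hours * hourly_rate

# Hypothetical g5.xlarge deployment with 20 engineering hours/month:
print(self_hosted_tco(compute=724, storage=50, networking=75, devops_hours=20))
# -> 3849.0, of which $3,000 is engineering time
```

Note that the engineering line dominates: at 20 hours/month it is more than four times the compute bill for a single g5.xlarge.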

[Formula: LLM hosting vs API pricing]

Lower Your LLM Hosting Costs with Wring

Wring helps you access AWS credits and volume discounts to lower your LLM inference costs. Through group buying power, Wring negotiates better rates so you pay less per token across Bedrock, SageMaker, and EC2.

Start saving on AWS →