AI and LLM Cost Calculator

Compare GPT-4.1, Claude Sonnet 4.6, Gemini 2.5, DeepSeek, Mistral, and self-hosted Llama. Enter your usage to see the monthly cost for every major model and find the right fit for your budget and use case.

17 models · API + self-hosted · Updated April 2026

Quick start: pick a use case

Your Usage

1 token ≈ 4 characters · typical prompt: 100–2,000 tokens

short answer: 100–400 · paragraph: 400–800 · long: 800+

Monthly requests: 30,000
Monthly tokens: 19.5M
🏆 Cheapest: Llama 3.1 8B (Groq)

$1.33/month · $0.044 per 1K requests
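The headline figures follow from simple arithmetic: monthly tokens = requests × (average input + output tokens per request), and the API bill is tokens × the provider's per-million-token rates. A minimal sketch, assuming a 500-input / 150-output split per request (a hypothetical split that sums to the 650-token average implied above; the per-1M prices are illustrative placeholders, not quoted rates):

```python
def monthly_cost(requests, in_tok, out_tok, in_price, out_price):
    """Monthly API cost in dollars; prices are per 1M tokens."""
    return requests * (in_tok * in_price + out_tok * out_price) / 1_000_000

requests = 30_000           # monthly requests entered in the calculator
in_tok, out_tok = 500, 150  # hypothetical average tokens per request

tokens = requests * (in_tok + out_tok)
print(f"{tokens / 1e6:.1f}M monthly tokens")  # 19.5M, matching the figure above

# Illustrative per-1M-token prices (NOT quoted rates; check the provider's page)
cost = monthly_cost(requests, in_tok, out_tok, in_price=0.05, out_price=0.08)
print(f"${cost:.2f}/month · ${1000 * cost / requests:.3f} per 1K requests")
```

Shifting the same 650 tokens toward output raises the bill, since output rates are typically several times the input rate.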

API Models

Sorted by monthly cost · no infrastructure to manage · ~ = estimated price

| Model | Provider | Tier | Monthly | Per 1K reqs | Notes |
|---|---|---|---|---|---|
| Llama 3.1 8B | Groq | Budget | $1.33 | $0.044 | Lowest-cost API option. Ultra-fast inference via Groq. |
| Mistral Small 3 | Mistral | Budget | $4.35 | $0.145 | European provider. Excellent value for structured tasks. |
| DeepSeek V3 | DeepSeek | Balanced | $4.41 | $0.147 | GPT-4 class quality at a fraction of the cost. |
| Llama 3.3 70B | Groq | Balanced | $13.91 | $0.464 | Open-source via Groq. Extremely fast, great value. |
| GPT-4.1 mini | OpenAI | Budget | $22.20 | $0.740 | 1M token context. Practical replacement for GPT-4o mini. |
| Mistral Medium 3 | Mistral | Balanced | $27.00 | $0.900 | Strong GDPR compliance story. Good mid-tier option for EU teams. |
| DeepSeek R1 | DeepSeek | Reasoning | $30.41 | $1.01 | o1-level reasoning. Open weights, ~96% cheaper than o1. |
| Gemini 2.5 Flash | Google | Budget | $32.25 | $1.07 | 1M token context. Fast and cost-effective at scale. |
| o4-mini | OpenAI | Reasoning | $61.05 | $2.04 | Fast, affordable reasoning. Best value in the o-series. |
| Claude Haiku 4.5 | Anthropic | Budget | $67.50 | $2.25 | Fastest Claude. Great for high-volume, latency-sensitive tasks. |
| GPT-4.1 | OpenAI | Balanced | $111 | $3.70 | 1M token context window. Best for long document tasks. |
| o3 | OpenAI | Reasoning | $111 | $3.70 | Full o3 reasoning, significantly repriced from launch. |
| GPT-5 | OpenAI | Powerful | $129 | $4.31 | OpenAI's frontier model. Cheaper to prompt than GPT-4.1. |
| Gemini 2.5 Pro | Google | Balanced | $129 | $4.31 | Best-in-class for long context, up to 1M tokens. |
| Claude Sonnet 4.6 | Anthropic | Balanced | $203 | $6.75 | Top choice for coding, analysis, and agentic tasks. |
| Claude Opus 4.6 | Anthropic | Powerful | $338 | $11.25 | Anthropic's most capable model. 67% cheaper than Claude 3 Opus. |

Self-hosted (Open Source)

Fixed monthly GPU cost; per-token cost drops as volume grows. All figures below assume the 19.5M monthly tokens entered above.

| Model | Hardware | Fixed / month | Effective $/1M tokens | GPU utilization | Break-even vs APIs |
|---|---|---|---|---|---|
| Llama 3.1 8B | 1x A10G (24 GB VRAM) | $1,100 | $56.41 | 0.9% | 824x current volume |
| Llama 3.3 70B | 2x A100 80GB | $5,200 | $267 | 3.7% | 3,895x current volume |
| DeepSeek R1 | 8x H100 80GB | $32,000 | $1,641 | 24.7% | 23,970x current volume |
| Llama 3.1 405B | 8x A100 80GB | $21,000 | $1,077 | 14.8% | 15,730x current volume |

⚠ At this volume the GPUs would be mostly idle; self-hosting only makes sense at much higher volume.

Llama 3.1 8B: cheapest self-hosted option. Good for classification and simple Q&A.
Llama 3.3 70B: GPT-4 class quality. Becomes cost-effective at high volume.
DeepSeek R1: full reasoning model on your own infra. Only viable at massive scale.
Llama 3.1 405B: frontier-class open model. Only viable at very large scale.
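The self-hosted figures reduce to two ratios: effective $/1M tokens is the fixed GPU bill divided by monthly tokens in millions, and the break-even multiples shown are consistent with fixed cost divided by the cheapest API bill at current volume (an inference from the numbers on this page, not a documented formula). A sketch using the page's own figures:

```python
def effective_per_1m(fixed_monthly, monthly_tokens):
    """Cost per 1M tokens when the GPU bill is fixed regardless of usage."""
    return fixed_monthly / (monthly_tokens / 1_000_000)

def break_even_multiple(fixed_monthly, cheapest_api_monthly):
    """How many times current volume before the fixed GPU beats the cheapest API."""
    return fixed_monthly / cheapest_api_monthly

tokens = 19_500_000  # monthly tokens from the calculator above

print(effective_per_1m(1_100, tokens))   # 1x A10G:  ~$56.41 per 1M tokens
print(effective_per_1m(32_000, tokens))  # 8x H100: ~$1,641 per 1M tokens

# ~827x against the rounded $1.33 Llama-on-Groq bill; the page's 824x
# presumably divides by the unrounded API cost
print(break_even_multiple(1_100, 1.33))
```

The break-even multiple only falls below 1 when API spend at your volume exceeds the fixed GPU bill, which is why every card warns about idle GPUs at 19.5M tokens.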

Prices are published rates as of April 2026. Verify current rates on each provider's pricing page before committing spend. Self-hosted costs use AWS GPU instance pricing (on-demand, 24/7).

How it works

1. Describe your usage: enter how many requests your app makes per day, plus the average input and output token counts per request.

2. Pick a preset or enter custom values: use presets for common use cases (chatbot, RAG, code assistant) or enter your own numbers.

3. Compare models side by side: see monthly cost, cost per 1K requests, and context window for every major model, with self-hosted GPU costs included.
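The side-by-side step is the same cost formula mapped over a price table and sorted. A sketch with a tiny illustrative table (the model names and per-1M rates here are hypothetical placeholders, not the calculator's data):

```python
# Hypothetical (input, output) prices per 1M tokens; real rates change often
PRICES = {
    "model-a": (0.05, 0.08),
    "model-b": (0.27, 1.10),
    "model-c": (3.00, 15.00),
}

def rank_models(requests, in_tok, out_tok, prices):
    """Return (model, monthly_cost) pairs sorted cheapest first."""
    costs = {
        name: requests * (in_tok * p_in + out_tok * p_out) / 1_000_000
        for name, (p_in, p_out) in prices.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])

for name, cost in rank_models(30_000, 500, 150, PRICES):
    print(f"{name}: ${cost:.2f}/month")
```

Because the ranking depends on the input/output split as well as the rates, a chat app with long outputs can see a different ordering than a classification workload with the same total tokens.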

Models covered

OpenAI: GPT-4.1 mini, GPT-4.1, GPT-5, o4-mini, o3
Anthropic: Claude Haiku 4.5, Claude Sonnet 4.6, Claude Opus 4.6
Google: Gemini 2.5 Flash, Gemini 2.5 Pro
Mistral: Mistral Small 3, Mistral Medium 3
DeepSeek: DeepSeek V3, DeepSeek R1
Groq (open source): Llama 3.1 8B, Llama 3.3 70B via Groq inference

Also estimate your cloud infrastructure costs

LLM costs are one part of your bill. Compare AWS, GCP, and Azure for the rest.

Cloud Cost Estimator →