AI and LLM Cost Calculator

Compare GPT-4.1, Claude Sonnet 4.6, Gemini 2.5, DeepSeek, Mistral, and self-hosted Llama. Enter your usage to see the monthly cost for every major model and find the right fit for your budget and use case.

17 models · API + self-hosted · Updated April 2026

Quick start: pick a use case

Your Usage

1 token ≈ 4 characters · typical prompt: 100–2,000 tokens

short answer: 100–400 · paragraph: 400–800 · long: 800+

Monthly requests: 30,000
Monthly tokens: 19.5M
🏆 Cheapest: Llama 3.1 8B (Groq)

$1.33/month · $0.044 per 1K requests
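The headline figures follow from simple arithmetic: monthly tokens = requests × (average input + output tokens per request), and the API bill is tokens × the provider's per-million-token rates. A minimal sketch, assuming a 500-input / 150-output split per request (a hypothetical split that sums to the 650-token average implied above; the per-1M prices are illustrative placeholders, not quoted rates):

```python
def monthly_cost(requests, in_tok, out_tok, in_price, out_price):
    """Monthly API cost in dollars; prices are per 1M tokens."""
    return requests * (in_tok * in_price + out_tok * out_price) / 1_000_000

requests = 30_000           # monthly requests entered in the calculator
in_tok, out_tok = 500, 150  # hypothetical average tokens per request

tokens = requests * (in_tok + out_tok)
print(f"{tokens / 1e6:.1f}M monthly tokens")  # 19.5M, matching the figure above

# Illustrative per-1M-token prices (NOT quoted rates; check the provider's page)
cost = monthly_cost(requests, in_tok, out_tok, in_price=0.05, out_price=0.08)
print(f"${cost:.2f}/month · ${1000 * cost / requests:.3f} per 1K requests")
```

Shifting the same 650 tokens toward output raises the bill, since output rates are typically several times the input rate.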

API Models

Sorted by monthly cost · no infrastructure to manage · ~ = estimated price

| Model | Provider | Tier | Monthly | Per 1K reqs | Notes |
|---|---|---|---|---|---|
| Llama 3.1 8B | Groq | Budget | $1.33 | $0.044 | Lowest-cost API option. Ultra-fast inference via Groq. |
| Mistral Small 3 | Mistral | Budget | $4.35 | $0.145 | European provider. Excellent value for structured tasks. |
| DeepSeek V3 | DeepSeek | Balanced | $4.41 | $0.147 | GPT-4 class quality at a fraction of the cost. |
| Llama 3.3 70B | Groq | Balanced | $13.91 | $0.464 | Open-source via Groq. Extremely fast, great value. |
| GPT-4.1 mini | OpenAI | Budget | $22.20 | $0.740 | 1M token context. Practical replacement for GPT-4o mini. |
| Mistral Medium 3 | Mistral | Balanced | $27.00 | $0.900 | Strong GDPR compliance story. Good mid-tier option for EU teams. |
| DeepSeek R1 | DeepSeek | Reasoning | $30.41 | $1.01 | o1-level reasoning. Open weights, ~96% cheaper than o1. |
| Gemini 2.5 Flash | Google | Budget | $32.25 | $1.07 | 1M token context. Fast and cost-effective at scale. |
| o4-mini | OpenAI | Reasoning | $61.05 | $2.04 | Fast, affordable reasoning. Best value in the o-series. |
| Claude Haiku 4.5 | Anthropic | Budget | $67.50 | $2.25 | Fastest Claude. Great for high-volume, latency-sensitive tasks. |
| GPT-4.1 | OpenAI | Balanced | $111 | $3.70 | 1M token context window. Best for long document tasks. |
| o3 | OpenAI | Reasoning | $111 | $3.70 | Full o3 reasoning, significantly repriced from launch. |
| GPT-5 | OpenAI | Powerful | $129 | $4.31 | OpenAI's frontier model. Cheaper to prompt than GPT-4.1. |
| Gemini 2.5 Pro | Google | Balanced | $129 | $4.31 | Best-in-class for long context, up to 1M tokens. |
| Claude Sonnet 4.6 | Anthropic | Balanced | $203 | $6.75 | Top choice for coding, analysis, and agentic tasks. |
| Claude Opus 4.6 | Anthropic | Powerful | $338 | $11.25 | Anthropic's most capable model. 67% cheaper than Claude 3 Opus. |

Self-hosted (Open Source)

Fixed monthly GPU cost; per-token cost drops as volume grows. All figures below assume the 19.5M monthly tokens entered above.

| Model | Hardware | Fixed / month | Effective $/1M tokens | GPU utilization | Break-even vs APIs |
|---|---|---|---|---|---|
| Llama 3.1 8B | 1x A10G (24 GB VRAM) | $1,100 | $56.41 | 0.9% | 824x current volume |
| Llama 3.3 70B | 2x A100 80GB | $5,200 | $267 | 3.7% | 3,895x current volume |
| DeepSeek R1 | 8x H100 80GB | $32,000 | $1,641 | 24.7% | 23,970x current volume |
| Llama 3.1 405B | 8x A100 80GB | $21,000 | $1,077 | 14.8% | 15,730x current volume |

⚠ At this volume the GPUs would be mostly idle; self-hosting only makes sense at much higher volume.

Llama 3.1 8B: cheapest self-hosted option. Good for classification and simple Q&A.
Llama 3.3 70B: GPT-4 class quality. Becomes cost-effective at high volume.
DeepSeek R1: full reasoning model on your own infra. Only viable at massive scale.
Llama 3.1 405B: frontier-class open model. Only viable at very large scale.
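The self-hosted figures reduce to two ratios: effective $/1M tokens is the fixed GPU bill divided by monthly tokens in millions, and the break-even multiples shown are consistent with fixed cost divided by the cheapest API bill at current volume (an inference from the numbers on this page, not a documented formula). A sketch using the page's own figures:

```python
def effective_per_1m(fixed_monthly, monthly_tokens):
    """Cost per 1M tokens when the GPU bill is fixed regardless of usage."""
    return fixed_monthly / (monthly_tokens / 1_000_000)

def break_even_multiple(fixed_monthly, cheapest_api_monthly):
    """How many times current volume before the fixed GPU beats the cheapest API."""
    return fixed_monthly / cheapest_api_monthly

tokens = 19_500_000  # monthly tokens from the calculator above

print(effective_per_1m(1_100, tokens))   # 1x A10G:  ~$56.41 per 1M tokens
print(effective_per_1m(32_000, tokens))  # 8x H100: ~$1,641 per 1M tokens

# ~827x against the rounded $1.33 Llama-on-Groq bill; the page's 824x
# presumably divides by the unrounded API cost
print(break_even_multiple(1_100, 1.33))
```

The break-even multiple only falls below 1 when API spend at your volume exceeds the fixed GPU bill, which is why every card warns about idle GPUs at 19.5M tokens.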

Prices are published rates as of April 2026. Verify current rates on each provider's pricing page before committing spend. Self-hosted costs use AWS GPU instance pricing (on-demand, 24/7).

How it works

1. Describe your usage: enter how many requests your app makes per day, plus the average input and output token counts per request.

2. Pick a preset or enter custom values: use presets for common use cases (chatbot, RAG, code assistant) or enter your own numbers.

3. Compare models side by side: see monthly cost, cost per 1K requests, and context window for every major model, with self-hosted GPU costs included.
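The side-by-side step is the same cost formula mapped over a price table and sorted. A sketch with a tiny illustrative table (the model names and per-1M rates here are hypothetical placeholders, not the calculator's data):

```python
# Hypothetical (input, output) prices per 1M tokens; real rates change often
PRICES = {
    "model-a": (0.05, 0.08),
    "model-b": (0.27, 1.10),
    "model-c": (3.00, 15.00),
}

def rank_models(requests, in_tok, out_tok, prices):
    """Return (model, monthly_cost) pairs sorted cheapest first."""
    costs = {
        name: requests * (in_tok * p_in + out_tok * p_out) / 1_000_000
        for name, (p_in, p_out) in prices.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])

for name, cost in rank_models(30_000, 500, 150, PRICES):
    print(f"{name}: ${cost:.2f}/month")
```

Because the ranking depends on the input/output split as well as the rates, a chat app with long outputs can see a different ordering than a classification workload with the same total tokens.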

Models covered

OpenAI: GPT-4.1 mini, GPT-4.1, GPT-5, o4-mini, o3
Anthropic: Claude Haiku 4.5, Claude Sonnet 4.6, Claude Opus 4.6
Google: Gemini 2.5 Flash, Gemini 2.5 Pro
Mistral: Mistral Small 3, Mistral Medium 3
DeepSeek: DeepSeek V3, DeepSeek R1
Groq (open source): Llama 3.1 8B, Llama 3.3 70B via Groq inference

Also estimate your cloud infrastructure costs

LLM costs are one part of your bill. Compare AWS, GCP, and Azure for the rest.

Cloud Cost Estimator →