LLM API pricing in 2026 spans a 250x range: from Gemini 3 Flash at $0.075/$0.30 per million input/output tokens to Claude Opus 4 at $15/$75, with mid-range options like GPT-4o ($2.50/$10), GPT-5.4 ($2.50/$15), Claude Sonnet 4 ($3/$15), and DeepSeek V4 Flash ($0.14/$0.28) offering strong price-performance for most workloads.
This is the most comprehensive LLM API pricing reference for 2026. We cover every major model from OpenAI, Anthropic, Google, DeepSeek, Meta, and Mistral — with exact pricing, best use cases, and practical guidance on which model to use for which task. Bookmark this page; we update it as pricing changes.
For strategies on reducing these costs, see our guide on how to reduce LLM API costs. For routing between models automatically, learn about what an LLM router does.
Complete LLM API Pricing Table — March 2026
| Provider | Model | Input (/1M tokens) | Output (/1M tokens) | Context Window | Best Use Case | |----------|-------|--------------------|--------------------|----------------|---------------| | Anthropic | Claude Opus 4 | $15.00 | $75.00 | 200K | Complex reasoning, research, analysis | | Anthropic | Claude Sonnet 4 | $3.00 | $15.00 | 200K | General-purpose, coding, writing | | Anthropic | Claude Haiku 3.5 | $0.25 | $1.25 | 200K | Classification, extraction, simple Q&A | | OpenAI | GPT-5.5 | $5.00 | $30.00 | 256K | OpenAI flagship (April 2026) | | OpenAI | GPT-5.4 | $2.50 | $15.00 | 256K | OpenAI workhorse, multimodal | | OpenAI | GPT-4o | $2.50 | $10.00 | 128K | General-purpose, vision, coding | | OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K | Lightweight tasks, high-volume | | Google | Gemini 3 Pro | $1.25 | $5.00 | 1M | Long-context, multimodal | | Google | Gemini 3 Flash | $0.075 | $0.30 | 1M | High-speed, cost-sensitive workloads | | DeepSeek | DeepSeek V4 Pro | $1.74 | $3.48 | 128K | Premium coding, 1.6T MoE, 81% SWE-Bench Verified | | DeepSeek | DeepSeek V4 Flash | $0.14 | $0.28 | 128K | Coding, math, general tasks | | DeepSeek | DeepSeek V4 Flash (Thinking) | $0.14 | $0.28 | 128K | Reasoning, chain-of-thought, tool-use | | Moonshot | Kimi K2.6 | $0.60 | $4.00 | 256K | Long-context, 58.6% SWE-Bench Pro | | Z.ai | GLM-5.1 | $1.40 | $4.40 | 128K | Coding, 58.4% SWE-Bench Pro | | Meta | Llama 3.3 70B | $0.18 | $0.40 | 128K | Open-source, privacy-sensitive | | Mistral | Mistral Large | $2.00 | $6.00 | 128K | European compliance, multilingual | | Mistral | Mistral Small 3 | $0.10 | $0.30 | 128K | Fast inference, edge deployment |
Prices as of March 2026. All prices are per 1 million tokens. Actual costs may vary by provider plan and volume discounts.
Pricing by Provider: Detailed Breakdown
Anthropic (Claude) Pricing 2026
Anthropic offers three tiers targeting different price-performance needs:
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | Claude Opus 4 | $15.00 | $75.00 | 1:5 | Best reasoning, complex analysis, research-grade outputs | | Claude Sonnet 4 | $3.00 | $15.00 | 1:5 | Strong all-rounder, excellent for coding and writing | | Claude Haiku 3.5 | $0.25 | $1.25 | 1:5 | Fast, cheap, great for classification and extraction |
Key insight: Anthropic maintains a consistent 1:5 input-to-output price ratio across all models. Opus is 60x more expensive than Haiku for output tokens.
When to use each:
- Opus 4: Multi-step reasoning, academic research, complex code architecture, tasks where accuracy is critical and cost is secondary
- Sonnet 4: Day-to-day coding, content writing, customer support, most production workloads
- Haiku 3.5: Data extraction, text classification, simple Q&A, high-volume processing pipelines
Cost example — 1M tokens processed (500K in, 500K out):
- Opus 4: $7.50 + $37.50 = $45.00
- Sonnet 4: $1.50 + $7.50 = $9.00
- Haiku 3.5: $0.125 + $0.625 = $0.75
OpenAI (GPT) Pricing 2026
OpenAI's April 2026 lineup introduces the new GPT-5.5 flagship and GPT-5.4 workhorse alongside the proven GPT-4o family:
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | GPT-5.5 | $5.00 | $30.00 | 1:6 | New OpenAI flagship (April 2026), agentic workflows | | GPT-5.4 | $2.50 | $15.00 | 1:6 | Workhorse: multimodal, coding, general reasoning | | GPT-4o | $2.50 | $10.00 | 1:4 | Reliable general-purpose, strong vision capabilities | | GPT-4o-mini | $0.15 | $0.60 | 1:4 | Budget workhorse, excellent quality per dollar |
Key insight: GPT-5.5 is OpenAI's new flagship at $5/$30, positioned at about 40% of Opus output pricing. GPT-5.4 matches Claude Sonnet 4 on output ($15/M) with cheaper input, making it the default choice for teams migrating off GPT-4o to a newer-generation model without paying flagship prices.
When to use each:
- GPT-5.5: Complex multi-step reasoning, agentic workflows where frontier capability matters — at a lower price point than Opus
- GPT-5.4: General coding, writing, analysis — the default newer-generation model when you need quality but not flagship pricing
- GPT-4o: Proven workhorse for vision/multimodal tasks
- GPT-4o-mini: High-volume tasks, chatbots, autocomplete, any task where "good enough" quality at minimal cost is the goal
Cost example — 1M tokens processed (500K in, 500K out):
- GPT-5.5: $2.50 + $15.00 = $17.50
- GPT-5.4: $1.25 + $7.50 = $8.75
- GPT-4o: $1.25 + $5.00 = $6.25
- GPT-4o-mini: $0.075 + $0.30 = $0.375
Google (Gemini) Pricing 2026
Google's Gemini 3 lineup offers the widest price range and the largest context windows:
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | Gemini 3 Pro | $1.25 | $5.00 | 1:4 | 1M token context, excellent multimodal | | Gemini 3 Flash | $0.075 | $0.30 | 1:4 | Ultra-cheap, fast inference, 1M context |
Key insight: Gemini 3 Flash at $0.075/$0.30 is the cheapest model from a major provider. Combined with a 1M token context window, it's uniquely positioned for long-document processing at minimal cost.
When to use each:
- Gemini 3 Pro: Long-document analysis, multimodal tasks combining text/image/video, tasks requiring large context windows without breaking the bank
- Gemini 3 Flash: Bulk processing, classification, simple extraction, any high-volume task where speed and cost matter more than maximum quality
Cost example — 1M tokens processed (500K in, 500K out):
- Gemini 3 Pro: $0.625 + $2.50 = $3.125
- Gemini 3 Flash: $0.0375 + $0.15 = $0.1875
DeepSeek Pricing 2026
DeepSeek's April 2026 V4 lineup halves prices again and adds a premium tier:
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | DeepSeek V4 Pro | $1.74 | $3.48 | 1:2 | 1.6T MoE, 81% SWE-Bench Verified — premium coding | | DeepSeek V4 Flash | $0.14 | $0.28 | 1:2 | Strong coding, math, half the price of V3 | | DeepSeek V4 Flash (Thinking) | $0.14 | $0.28 | 1:2 | Chain-of-thought reasoning + tool-use (new in V4) |
Key insight: DeepSeek V4 Flash at $0.14/$0.28 delivers coding quality approaching Claude Sonnet 4 at one-fiftieth the price. V4 Pro ($1.74/$3.48) is the first premium DeepSeek tier — 1.6T parameter MoE that scores 81% on SWE-Bench Verified, rivaling frontier coding models.
When to use each:
- DeepSeek V4 Pro: Complex coding that used to demand Sonnet or Opus — premium quality at ~4% of Opus price
- DeepSeek V4 Flash: Code generation, debugging, mathematical tasks, general-purpose work where you want Sonnet-level quality at budget pricing
- DeepSeek V4 Flash (Thinking): Reasoning-heavy coding, chain-of-thought tasks, tool-using agents
Cost example — 1M tokens processed (500K in, 500K out):
- DeepSeek V4 Pro: $0.87 + $1.74 = $2.61
- DeepSeek V4 Flash: $0.07 + $0.14 = $0.21
- DeepSeek V4 Flash (Thinking): $0.07 + $0.14 = $0.21
Moonshot (Kimi) Pricing 2026
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | Kimi K2.6 | $0.60 | $4.00 | 1:6.7 | 256K context, 58.6% SWE-Bench Pro (matches GPT-5.5) |
Kimi K2.6 (released 2026-04-20) is Moonshot's strongest coding model yet. Supports OpenAI-style prompt caching (50% off cached input tokens).
Z.ai (formerly Zhipu) Pricing 2026
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | GLM-5.1 | $1.40 | $4.40 | 1:3.1 | Coding agent backend, 58.4% SWE-Bench Pro |
GLM-5.1 (released 2026-03-27) is Z.ai's new flagship after the Zhipu rebrand, with coding performance in the same tier as Kimi K2.6.
Meta (Llama) Pricing 2026
Llama models are open-source but typically accessed through hosting providers:
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | Llama 3.3 70B | $0.18 | $0.40 | 1:2.2 | Open-source, data privacy, customizable |
Key insight: Llama's pricing varies by hosting provider. The numbers above are typical for providers like Together AI and Fireworks. Self-hosting eliminates per-token costs but adds infrastructure costs.
When to use:
- Data sovereignty requirements where data can't leave your infrastructure
- Custom fine-tuning workflows
- Budget-conscious workloads where open-source licensing matters
Mistral Pricing 2026
Mistral offers models optimized for European markets and edge deployment:
| Model | Input | Output | Ratio | Strengths | |-------|-------|--------|-------|-----------| | Mistral Large | $2.00 | $6.00 | 1:3 | European compliance, strong multilingual | | Mistral Small 3 | $0.10 | $0.30 | 1:3 | Edge-ready, extremely fast inference |
Key insight: Mistral Small 3 at $0.10/$0.30 competes directly with Gemini 3 Flash for the "cheapest capable model" title, with particularly strong performance in European languages.
When to use each:
- Mistral Large: European data residency requirements, French/German/Spanish language tasks, tasks requiring GDPR-compliant infrastructure
- Mistral Small 3: Edge deployment, extremely latency-sensitive applications, budget high-volume processing
Price-Performance Rankings
Best Value Models (Quality per Dollar)
| Rank | Model | Output Cost | Quality Level | Value Score | |------|-------|------------|---------------|-------------| | 1 | DeepSeek V4 Flash | $0.28 | High | ★★★★★ | | 2 | DeepSeek V4 Pro | $3.48 | Very High | ★★★★★ | | 3 | GPT-4o-mini | $0.60 | Good | ★★★★★ | | 4 | Gemini 3 Flash | $0.30 | Good | ★★★★☆ | | 5 | Llama 3.3 70B | $0.40 | Good | ★★★★☆ |
Best for Specific Tasks
| Task | Best Budget Model | Best Quality Model | |------|------------------|-------------------| | Code generation | DeepSeek V4 Flash ($0.28) | Claude Opus 4 ($75) / DeepSeek V4 Pro ($3.48) | | Content writing | GPT-4o-mini ($0.60) | Claude Sonnet 4 ($15) | | Classification | Gemini 3 Flash ($0.30) | Claude Haiku 3.5 ($1.25) | | Data extraction | Gemini 3 Flash ($0.30) | GPT-4o ($10) | | Complex reasoning | DeepSeek V4 Flash (Thinking) ($0.28) | Claude Opus 4 ($75) | | Long documents | Gemini 3 Flash ($0.30) | Gemini 3 Pro ($5) / Kimi K2.6 ($4) | | Multilingual | Mistral Small 3 ($0.30) | Mistral Large ($6) | | Math/Science | DeepSeek V4 Flash ($0.28) | GPT-5.5 ($30) |
Cost Calculation Guide
Formula for Estimating Monthly Costs
Monthly Cost = (Daily Requests × Avg Input Tokens × Input Price/1M × 30)
+ (Daily Requests × Avg Output Tokens × Output Price/1M × 30)
Quick Cost Calculator
For common workloads using Claude Sonnet 4 ($3/$15):
| Workload | Daily Requests | Tokens (in/out) | Monthly Cost | |----------|---------------|-----------------|-------------| | Personal chatbot | 50 | 1K/500 | $16 | | Team chatbot | 500 | 1K/500 | $158 | | Coding assistant (solo) | 200 | 2K/800 | $108 | | Coding assistant (team of 10) | 2,000 | 2K/800 | $1,080 | | Customer support bot | 1,000 | 1.5K/600 | $405 | | Document processing | 5,000 | 3K/500 | $1,575 | | RAG pipeline | 2,000 | 4K/1K | $1,260 |
The same workloads with smart routing (mixed models):
| Workload | Single Model | Smart Routed | Savings | |----------|-------------|-------------|---------| | Personal chatbot | $16 | $4 | 75% | | Team chatbot | $158 | $40 | 75% | | Coding assistant (solo) | $108 | $28 | 74% | | Coding assistant (team of 10) | $1,080 | $280 | 74% | | Customer support bot | $405 | $81 | 80% | | Document processing | $1,575 | $189 | 88% | | RAG pipeline | $1,260 | $315 | 75% |
Smart routing through ClawRouters automatically achieves these savings by sending simple tasks to cheap models and complex tasks to premium models.
How to Reduce LLM API Costs
Strategy 1: Smart Routing (60-80% Savings)
The most impactful cost optimization. Use an LLM router to automatically classify requests and route them to the cheapest model that can handle each task:
import openai
# ClawRouters smart routing
client = openai.OpenAI(
base_url="https://api.clawrouters.com/v1",
api_key="your-clawrouters-key"
)
# "auto" model = smart routing picks the best model
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "What's 2+2?"}]
)
# → Routed to Gemini Flash ($0.30/M output) instead of Sonnet ($15/M)
Strategy 2: Prompt Optimization (20-40% Savings)
Reduce input tokens by writing concise prompts:
- Eliminate redundant system instructions
- Use abbreviated few-shot examples
- Remove unnecessary context from messages
Strategy 3: Output Length Control (10-30% Savings)
Set appropriate max_tokens:
response = client.chat.completions.create(
model="auto",
messages=[...],
max_tokens=100 # Don't let classification tasks generate essays
)
Strategy 4: Caching (30-50% Savings for Repetitive Workloads)
Use semantic caching for requests that repeat with minor variations. Some LLM gateways include built-in caching.
Strategy 5: Batch APIs (50% Savings for Non-Real-Time)
Most providers offer batch processing at 50% discount for workloads that don't need real-time responses.
Pricing Trends: What to Expect in Late 2026
Based on historical patterns and current announcements:
- Budget models will get cheaper: Expect Flash/mini-class models to drop below $0.20/M output by end of 2026
- Frontier prices will hold: Opus and GPT-5 class models will maintain premium pricing
- The gap will widen: The cost difference between frontier and budget will exceed 300x
- New entrants: More providers entering the budget model space will increase competition
- Volume discounts: Expect more aggressive tiered pricing for high-volume customers
The implication: smart routing becomes more valuable as the price gap widens. Using an LLM router that automatically takes advantage of cheap models for simple tasks will yield even greater savings.
Choosing the Right Model for Your Budget
| Monthly Budget | Recommended Approach | Expected Quality | |---------------|---------------------|-----------------| | Under $50 | Gemini Flash + GPT-4o-mini only | Good for most tasks | | $50-200 | Smart routing with budget models | Good to great | | $200-1,000 | Smart routing with Sonnet/GPT-4o for complex | Great | | $1,000-5,000 | Full smart routing including Opus for hardest tasks | Excellent | | $5,000+ | Enterprise routing with premium models available | Best possible |
For most teams, the $200-1,000 range with smart routing through ClawRouters delivers the best balance of quality and cost — you get Opus-quality answers on hard problems and Flash-speed responses on simple ones, all within a predictable budget.
For a broader look at cost optimization, see our guide to reducing LLM API costs.