TL;DR — Moonshot's current Kimi API pricing in 2026 is $0.60 per million input tokens and $4.00 per million output tokens for Kimi K2.6, the flagship open-weight model. Cached input drops to $0.30/M thanks to Moonshot's automatic 50% cache discount. That makes Kimi K2.6 roughly 10x cheaper than Claude Opus 4.8 on input while scoring 58.6% on SWE-Bench Pro — tying GPT-5.5. The catch: Kimi's $4.00 output price is high relative to its input, so output-heavy workloads erode the savings fast. Teams using ClawRouters to auto-route each request to the cheapest model that can actually handle it cut their total LLM bill 40-60% without locking into any single provider.
If you've been searching for moonshot kimi api pricing per million tokens 2026, this is the definitive breakdown. We cover every current Kimi model's exact per-token cost, the auto-cache discount most pricing pages skip, real monthly cost scenarios, and a head-to-head comparison against OpenAI, Anthropic, Google, and DeepSeek. For the broader market, see our full LLM API pricing guide for 2026.
Moonshot Kimi API Pricing Table (June 2026)
All prices are per million tokens (MTok). These reflect Moonshot AI's published platform rates.
| Model | Input (/1M) | Output (/1M) | Cached Input (/1M) | Context | Best For | |-------|------------|-------------|-------------------|---------|----------| | Kimi K2.6 | $0.60 | $4.00 | $0.30 | 256K | Agentic coding, long-horizon reasoning, Chinese | | Kimi K2.5 (legacy) | $0.60 | $2.50 | $0.30 | 256K | General chat, translation, back-compat |
Prices as of June 2026. Per 1 million tokens. Kimi K2.6 released 2026-04-20.
What makes Kimi pricing different
Three things set moonshot kimi api pricing apart from Western providers:
- Automatic cache discount. Moonshot caches your prompt prefixes automatically and bills cache reads at $0.30/M — a flat 50% off input. You don't manage cache breakpoints manually like you do with Anthropic. For repeated system prompts or RAG context, this is effectively free money.
- A wide input-to-output ratio. Kimi K2.6's output is 6.7x its input price ($4.00 vs $0.60). Most providers run a 4:1 or 5:1 ratio. This means Kimi is a bargain for read-heavy tasks (classification, extraction, long-context Q&A) but less dominant for generation-heavy ones (long-form writing, verbose agents).
- Open weights. Kimi K2.6 is an open-weight MoE (1T total parameters, 32B active). You can self-host if you have the GPUs — but for most teams the hosted API at this price is cheaper than running the hardware.
Kimi K2.6: What You're Paying For
Kimi K2.6 isn't a budget model that's cheap because it's weak. The kimi api cost is low because Moonshot prices aggressively, not because the model underperforms:
- SWE-Bench Pro: 58.6% — ties GPT-5.5 and edges out several frontier models on real-world software engineering tasks.
- HLE (Humanity's Last Exam) 54.0 with tools — beats Claude Opus 4.6 and GPT-5.4 on this reasoning benchmark.
- Purpose-built for agents — scales to 300 sub-agents and 4,000 steps in long-horizon agentic workflows.
- 256K context window — enough for large codebases, long documents, and multi-turn agent histories.
In short: you get near-frontier coding and reasoning at roughly one-tenth the input cost of Claude Opus. That's why the router prefers Kimi K2.6 for coding and reasoning tasks inside high-context agentic workloads.
Real-World Monthly Kimi API Cost Scenarios
Per-token numbers are abstract. Here's what the moonshot kimi api pricing translates to at real production volumes for Kimi K2.6:
| Daily Volume | Without cache /month | With 50% cache on input /month | |-------------|----------------------|--------------------------------| | 500K in + 500K out | $69 | $64.50 | | 2M in + 2M out | $276 | $258 | | 5M in + 5M out | $690 | $645 | | 10M in + 2M out (read-heavy) | $420 | $330 | | 2M in + 10M out (write-heavy) | $1,236 | $1,227 |
Two patterns jump out:
- Cache savings scale with input volume. The read-heavy row (10M in / 2M out) saves $90/month from caching alone. If your workload reuses system prompts or retrieval context, Kimi's auto-cache is a structural advantage.
- Output dominates write-heavy bills. The write-heavy row costs nearly $1,236/month because output is $4.00/M. Caching barely helps — it only discounts input. For verbose generation, a model with cheaper output may win even at a higher input price.
This is exactly the kind of tradeoff that makes single-provider lock-in expensive. The right model depends on your input/output mix, and that mix varies per request.
Moonshot Kimi vs. Other LLM Providers (2026)
Here's how moonshot ai pricing 2026 stacks up against the major providers, sorted by input cost:
| Provider | Model | Input (/1M) | Output (/1M) | Context | Notes | |----------|-------|------------|-------------|---------|-------| | Google | Gemini 3 Flash | $0.075 | $0.30 | 1M | Cheapest overall, great for high-volume simple tasks | | DeepSeek | DeepSeek V4 Flash | $0.14 | $0.28 | 128K | Best output value, strong coding/math | | Moonshot | Kimi K2.6 | $0.60 | $4.00 | 256K | Near-frontier coding, wide I/O ratio | | Google | Gemini 3 Pro | $1.25 | $5.00 | 1M | Long-context multimodal | | DeepSeek | DeepSeek V4 Pro | $1.74 | $3.48 | 128K | Premium coding, 81% SWE-Bench Verified | | OpenAI | GPT-4o | $2.50 | $10.00 | 128K | General-purpose, vision | | Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Strong all-rounder | | OpenAI | GPT-5.5 | $5.00 | $30.00 | 256K | OpenAI flagship | | Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | 1M | Best reasoning, agentic coding |
Prices as of June 2026, per 1 million tokens.
Kimi vs. DeepSeek: the closest comparison
The most direct competitor to Kimi is DeepSeek, another Chinese open-weight provider. On input, DeepSeek V4 Flash ($0.14) is over 4x cheaper than Kimi K2.6 ($0.60). On output, DeepSeek V4 Flash ($0.28) is a staggering 14x cheaper.
So why use Kimi at all? Because Kimi K2.6 is built for long-horizon agentic coding — the 256K context, the 300-sub-agent scaling, and the 58.6% SWE-Bench Pro score make it the better choice for complex, multi-step engineering agents. DeepSeek V4 Flash is unbeatable on pure cost for medium-complexity coding and math; Kimi K2.6 earns its premium on the hardest agentic tasks. For raw cost-per-task on coding, see our cheapest AI API for coding guide.
Kimi vs. Claude & GPT: the value gap
Against the Western frontier, the kimi api cost advantage is enormous. Kimi K2.6 input ($0.60) is roughly 8x cheaper than Claude Opus 4.8 or GPT-5.5 ($5.00). On output, Kimi ($4.00) undercuts Opus 4.8 ($25.00) by 6x and GPT-5.5 ($30.00) by 7.5x — while posting a SWE-Bench Pro score that ties GPT-5.5.
The honest caveat: Claude and GPT still lead on instruction-following reliability and tool-call consistency in production. That's why our router holds Chinese-provider instruction-following scores with a small reliability buffer rather than treating raw benchmarks as gospel. For mission-critical tool chains, a more expensive model sometimes pays for itself in fewer retries.
When Kimi K2.6 Is the Right Choice
Based on the pricing structure and benchmarks, Kimi K2.6 is the cost-optimal pick when:
- You run agentic coding workloads. Long-horizon agents with many sub-agents and steps are exactly what K2.6 was designed for, and the per-step cost is a fraction of Opus or GPT-5.5.
- Your prompts are input-heavy. Large codebases, long documents, RAG context — the $0.60 input price plus auto-cache makes context cheap.
- You need Chinese-language strength. Kimi is a Chinese-language specialist; for bilingual or Chinese-first products it outperforms Western models per dollar.
- You want near-frontier quality on a budget. When DeepSeek isn't quite strong enough but Opus is overkill, Kimi K2.6 fills the gap.
Kimi is the wrong choice when your workload is output-heavy (long-form content generation), where the $4.00 output price erases the input savings, or when you need maximum tool-call reliability for production agents — cases where Sonnet 4.6 or GPT-5.5 may be cheaper per successful completion.
How to Cut Your Kimi API Cost Further
Even at $0.60/$4.00, you can reduce your moonshot kimi api pricing bill:
- Lean on the auto-cache. Structure prompts so the static portion (system prompt, instructions, retrieval context) comes first. Moonshot caches prefixes automatically and bills reads at $0.30/M.
- Disable thinking mode when you don't need it. Kimi K2.6 defaults to thinking mode, which burns extra output tokens (the priciest part). For straightforward tasks, disabling it cuts output token consumption directly.
- Don't send everything to Kimi. Simple classification or extraction belongs on Gemini 3 Flash ($0.075/$0.30) or DeepSeek V4 Flash. Reserve Kimi K2.6 for the agentic coding and reasoning tasks where its quality justifies the price.
- Route by request, not by provider. The single biggest lever is matching each request to the cheapest model that can handle it — which no static configuration can do well, because complexity varies request to request.
Let ClawRouters Optimize Kimi Pricing Automatically
Here's the core problem with picking any single model — including Kimi: the optimal model changes per request. A simple extraction call wastes money on Kimi when Gemini Flash would do. A hard agentic coding task underperforms on a cheaper model and costs you in retries. And an output-heavy generation task is cheaper on a model with a tighter I/O ratio.
ClawRouters solves this by analyzing each incoming prompt and routing it to the optimal model across Moonshot, OpenAI, Anthropic, Google, DeepSeek, and other Chinese providers — based on task type, complexity, and your cost strategy. You keep an OpenAI-compatible API; you just change your base_url. Kimi K2.6 is already in the routing pool, automatically selected for the coding and high-context reasoning tasks where its kimi api pricing is unbeatable, and skipped where a cheaper or more reliable model wins.
The result: teams cut their total LLM spend 40-60% versus pinning everything to one model — Kimi included — with no quality loss and no provider lock-in. You get Kimi's prices where Kimi is best, and something cheaper or better everywhere else.
To see how this compares to pinning Kimi or using a static gateway, read why OpenRouter won't cut your AI bill and our LLM API pricing guide for 2026.
Frequently Asked Questions
What is Moonshot Kimi API pricing per million tokens in 2026? Kimi K2.6, Moonshot's flagship, costs $0.60 per million input tokens and $4.00 per million output tokens. Cached input is billed at $0.30/M (a 50% automatic discount). The 256K-context model released on 2026-04-20.
How much cheaper is Kimi than Claude or GPT? On input, Kimi K2.6 ($0.60/M) is roughly 8x cheaper than Claude Opus 4.8 or GPT-5.5 ($5.00/M). On output, it's 6-7.5x cheaper. It ties GPT-5.5's 58.6% SWE-Bench Pro score, so the value gap on coding is dramatic.
Is Kimi cheaper than DeepSeek? No. DeepSeek V4 Flash ($0.14/$0.28) is cheaper on both input and output. Kimi K2.6 costs more but is purpose-built for long-horizon agentic coding with a larger context window, making it the better pick for complex multi-step agents.
Does Kimi have a cache discount? Yes. Moonshot automatically caches prompt prefixes and bills cache reads at $0.30/M — a flat 50% off input. You don't need to manage cache breakpoints manually.
How do I reduce my Kimi API cost? Lean on the auto-cache by front-loading static context, disable thinking mode for simple tasks, and route only the right requests to Kimi. The biggest savings come from per-request routing across providers — which is exactly what ClawRouters automates.
Pricing reflects Moonshot AI's published rates as of June 2026 and may change. ClawRouters keeps its routing pool and cost data current as providers update pricing.