If you are pricing out the xAI Grok API for a build in 2026, the two things that matter are the model tier you pick and whether you opt into xAI's data-sharing program for free credits. This guide walks through how Grok API pricing is structured, where Grok 4 and Grok 4 Fast sit relative to each other, and how to keep your token bill low. Prices move, so treat every number here as a ballpark and check the live per-model rates in the rankings before you commit.
How xAI prices the Grok API
Like most LLM APIs, Grok bills per token, quoted per million tokens, with separate input and output rates. A few structural points are worth knowing up front:
- Output tokens cost more than input tokens, often several times more, so the shape of your workload (short prompts with long generations versus long prompts with short answers) drives the bill as much as the rate card does.
- Higher-capability models cost more per token than the fast or lightweight variants. Picking the right tier per task is the single biggest lever you have.
- Prompt caching can lower the effective input rate when you reuse a large, stable context across many calls.
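To make the per-token arithmetic concrete, here is a minimal sketch of a per-request cost estimate. The rates are hypothetical placeholders chosen only to show the input/output asymmetry; they are not xAI's actual prices, and the function name is illustrative.

```python
# Hypothetical rates in USD per million tokens.
# Placeholders for illustration only, not xAI's published prices.
INPUT_RATE = 2.00
OUTPUT_RATE = 10.00


def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD of one request, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Same total token count, opposite workload shapes.
generation_heavy = request_cost(500, 4_000, INPUT_RATE, OUTPUT_RATE)
prompt_heavy = request_cost(4_000, 500, INPUT_RATE, OUTPUT_RATE)

print(f"short prompt, long answer: ${generation_heavy:.3f}")
print(f"long prompt, short answer: ${prompt_heavy:.3f}")
```

With these placeholder rates, the generation-heavy request costs roughly three times as much as the prompt-heavy one even though both move 4,500 tokens, which is why workload shape matters as much as the rate card.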
Because xAI revises its lineup and rates regularly, the safest move is to read the rate off the live tracker rather than a static blog number. The Perkstack rankings show the cheapest verified endpoint per model and are re-pulled on a schedule, and the catalog tracks the credit offers.
Grok 4 vs Grok 4 Fast
xAI splits its Grok line into a flagship tier and faster, cheaper variants. The exact names and numbers shift across releases, but the trade-off is consistent:
Grok 4 (flagship tier)
- The most capable Grok models, aimed at hard reasoning, long context, and agentic work.
- The highest per-token rates in the lineup. On recent flagship models this has been in the low single digits of dollars per million input tokens and higher for output, but verify the current figure before budgeting.
- Best reserved for the requests that actually need the extra capability.
Grok 4 Fast (budget tier)
- A lower-latency, lower-cost variant tuned for high-volume and cost-sensitive workloads.
- Per-token rates that can be roughly an order of magnitude lower than the flagship's on both input and output, which makes it the default choice for routine calls.
- Often paired with a large context window, so you can feed it sizable inputs without jumping to the flagship.
The practical pattern most builders land on: route everything to the Fast tier by default, and escalate only the requests that visibly need the flagship. That keeps the average cost per request close to the cheap tier. For the same idea applied across providers, see the cheapest way to run LLMs.
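The effect of that routing pattern on the average bill can be sketched with a little arithmetic. This example assumes every request is tried on the Fast tier first and a fraction is then retried on the flagship, so escalated requests pay for both attempts. The per-request costs are hypothetical placeholders that only mirror the rough order-of-magnitude gap described above.

```python
# Hypothetical average cost per request in USD on each tier.
# Placeholders for illustration only, not measured or published figures.
FAST_COST = 0.001
FLAGSHIP_COST = 0.010


def blended_cost_per_request(fast_cost, flagship_cost, escalation_rate):
    """Average cost per request when every call tries the Fast tier first.

    escalation_rate is the fraction of requests (0.0 to 1.0) that are
    retried on the flagship after the Fast-tier attempt.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    # Every request pays the Fast cost; escalated ones also pay the flagship cost.
    return fast_cost + escalation_rate * flagship_cost


for rate in (0.0, 0.05, 0.25):
    avg = blended_cost_per_request(FAST_COST, FLAGSHIP_COST, rate)
    print(f"escalation {rate:.0%}: ${avg:.4f} per request")
```

Under these assumptions, escalating 5% of traffic raises the average from $0.0010 to about $0.0015 per request, still far below the flagship-only cost. The average climbs quickly as the escalation rate grows, so it is worth measuring that rate on real traffic.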
xAI API free credits and the data-sharing program
The most distinctive thing about xAI's economics is its free-credit structure, which has two parts:
- A modest one-time signup credit when you create an account, enough to run real test traffic.
- A recurring monthly credit allowance (reported in the low hundreds of dollars per month) that you unlock by opting your team into xAI's data-sharing program through the console.
There are conditions, and they matter:
- Data sharing means xAI may use your API prompts and responses to improve future models. For anything sensitive or proprietary, weigh that carefully before opting in.
- Reports indicate the program operates at the team level, can require a small minimum spend before you qualify, and may not be reversible once enabled. Confirm the current terms in the xAI console rather than relying on any summary, including this one.
If the data-sharing trade-off is acceptable for your use case, the recurring allowance is among the more generous free offers from a major LLM provider. If it is not, you fall back to paying the standard per-token rates. We track the live xAI offer alongside other providers in the catalog. For how other vendors handle this, compare OpenAI, Claude and Gemini.
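As a rough way to size the recurring allowance against your own usage, the sketch below subtracts a monthly credit from a month's usage. The credit amount is a hypothetical placeholder, and the code assumes unused credit does not roll over and that any qualifying conditions are already met; neither assumption is confirmed by xAI's terms.

```python
# Hypothetical monthly credit in USD; check the xAI console for the real figure.
MONTHLY_CREDIT_USD = 150.0


def net_monthly_bill(usage_usd, monthly_credit_usd):
    """Amount owed for one month after applying the credit.

    Assumes the credit cannot push the bill below zero and that unused
    credit does not carry over to the next month (an assumption).
    """
    return max(0.0, usage_usd - monthly_credit_usd)


for usage in (90.0, 150.0, 400.0):
    owed = net_monthly_bill(usage, MONTHLY_CREDIT_USD)
    print(f"usage ${usage:.2f} -> owed ${owed:.2f}")
```

The credit matters most for small and mid-sized workloads: usage below the allowance is fully covered, while at several times the allowance it becomes a modest discount that has to be weighed against the data-sharing terms.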
Keeping your Grok bill low
A few habits do most of the work:
- Default to Grok 4 Fast and escalate to the flagship only when a task fails on the cheaper model.
- Trim system prompts and avoid resending context you do not need, which cuts input tokens. Output tokens are the more expensive ones, so also cap response length where you can.
- Use prompt caching for large, repeated context so you pay the cached input rate.
- Batch background and non-interactive jobs rather than firing them one at a time.
- Before you lock in a provider, compare the live per-model rate against alternatives in the rankings. The same class of model can be cheaper elsewhere depending on your input-to-output ratio.
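The prompt-caching habit above can be quantified with a short sketch. It assumes the provider bills cached input tokens at a lower rate than fresh input tokens. Both rates below are hypothetical placeholders rather than xAI's actual prices, and how many tokens count as cached depends on the provider's caching rules.

```python
# Hypothetical rates in USD per million tokens; illustration only.
INPUT_RATE = 2.00         # fresh (uncached) input tokens
CACHED_INPUT_RATE = 0.50  # input tokens served from the prompt cache


def input_cost(cached_tokens, fresh_tokens, input_rate, cached_rate):
    """Input-side cost in USD of one request with part of the prompt cached."""
    return (cached_tokens * cached_rate + fresh_tokens * input_rate) / 1_000_000


# A 20,000-token stable context reused on every call, plus a 500-token question.
without_cache = input_cost(0, 20_500, INPUT_RATE, CACHED_INPUT_RATE)
with_cache = input_cost(20_000, 500, INPUT_RATE, CACHED_INPUT_RATE)

print(f"no cache:   ${without_cache:.3f} per request")
print(f"with cache: ${with_cache:.3f} per request")
```

Under these placeholder rates the input cost drops from about $0.041 to about $0.011 per request. The saving only applies to the stable prefix, so it pays to keep the reusable context at the front of the prompt and unchanged between calls.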
Bottom line
Grok API pricing in 2026 comes down to two decisions. First, pick Grok 4 Fast for the bulk of your traffic and reserve the flagship Grok 4 tier for the hard requests. Second, decide whether the data-sharing free credits are worth the trade-off for your data. Both choices can swing your bill by a large multiple. Because the exact rates change, confirm them live in the rankings and check the current free-credit terms in the catalog. To track xAI's offer alongside every other provider, create a free Perkstack account.