
GitHub Copilot goes metered

As of today, GitHub Copilot moves from Premium Request Units to token-metered GitHub AI Credits. The sticker prices didn't change; the meter did. Completions stay free, but every chat turn and agent step is now denominated in input, output and cached tokens against per-model rates. The configs, REST calls and policies you set this week decide what your July invoice looks like.

CLWD // DevEx & Microsoft Copilot

From premium requests to AI Credits

Until today, paid Copilot plans were billed in Premium Request Units — a fixed multiplier per model where one chat turn cost 1 × multiplier, regardless of whether you sent it 200 tokens or 200,000. Crude but easy to cap: you knew how many "premium requests" you had left, you knew when to slow down.

From June 1, that model is gone. Every Copilot plan now ships with a monthly allowance of GitHub AI Credits denominated in a fiat balance (we'll quote everything in euro), and every metered interaction is billed against published per-model token rates — input, output, and cached input each counted separately. Paid plans can buy overage; Free cannot. Allowances reset monthly and don't roll over.

| Plan | Base price | Monthly AI Credits | Overage | Notes |
| --- | --- | --- | --- | --- |
| Free | €0 | Small allowance, selected models | None | 2,000 completions/month cap. Stops on exhaustion. |
| Pro | €10 / month | ~€10 worth | Self-serve, capped | Unlimited completions remain free. |
| Pro+ | €39 / month | ~€20 worth | Self-serve, capped | Adds premium models (Opus 4.7/4.8, GPT-5.5). |
| Max | €100 / month | Highest individual allowance | Self-serve | From today, upgrade-only for existing Copilot users. |
| Business | €19 / seat / month | Pooled at org | Org-level budget | Centralised policy, audit logs, IP indemnity. |
| Enterprise | €39 / seat / month | Larger pooled allowance | Org-level budget | Org instructions, content exclusion, priority access. |
Annual plans

Existing annual Pro and Pro+ subscribers can stay on the legacy request-based billing model until renewal — but with bumped model multipliers from today. The details live under docs.github.com → Copilot billing → Request-based billing (legacy) → Model multipliers for annual plans. If you renew, you land on AI Credits like everyone else.

What costs credits, what doesn't

The first thing to internalise is that completions are still free. Inline suggestions, next-edit suggestions and "Review selection" code review do not consume AI Credits, so even for the heaviest tab-completers, Pro at €10 is effectively a flat-rate product. The meter only spins for chat, agent and full-PR review features.

| Feature | Billing | Notes |
| --- | --- | --- |
| Inline completions, next-edit suggestions | Unmetered | Free across all paid plans; 2,000/month on Free. |
| "Review selection" in VS Code | Unmetered | Lightweight code review only. |
| Copilot Chat (IDE, github.com, mobile) | Metered | Input + output tokens per turn. |
| Agent mode | Metered | Every tool round-trip is another model call. |
| Copilot cloud agent | Metered + GitHub Actions minutes | Repo ops burn Actions on top of credits. |
| Full-PR code review | Metered + Actions minutes | Same as above for repo-wide reviews. |
| MCP tool calls that hit a model | Metered | Pure tool execution is free; the LLM round-trip isn't. |

The published token rates that matter day-to-day, in euro per 1M tokens (GitHub publishes in USD; we've shown rounded EUR equivalents at roughly 1:1 — the meter behaviour is identical):

| Model | Input | Output | Typical use |
| --- | --- | --- | --- |
| GPT-5 mini | ~€0.25 | ~€2 | Routine chat, default for cost-sensitive workloads. |
| GPT-5.4 | €2.50 | €15 | General-purpose, balanced quality. |
| Claude Sonnet 4.6 | €3 | €15 | Agent mode default for most teams. |
| Gemini 3.1 Pro | €2 | €12 | Long-context tasks, large files. |
| Claude Opus 4.7 | ~€15 | ~€75 | Premium. Use for hard reasoning, not boilerplate. |

Cached input tokens — content the provider has already seen in the same thread — bill at roughly a tenth of the fresh input rate. That detail matters more than it sounds; we'll come back to it.

The mistake that drives every "Copilot got expensive" tweet is treating these rates as abstract. They aren't. Here is what a single, fairly typical chat turn actually costs:

Token math — one medium chat turn
# Scenario: medium chat turn on Claude Sonnet 4.6
# 18,000 input tokens (open files + chat history)
# + 1,200 output tokens (the answer)
input_cost = 18000 * 3 / 1_000_000    # = €0.054
output_cost = 1200 * 15 / 1_000_000   # = €0.018
turn_cost = input_cost + output_cost  # = €0.072 per turn
# Pro's €10 allowance ≈ 138 such turns / month
# Same turn on Opus 4.7 (~€15 in / €75 out):
# = 18000*15/1M + 1200*75/1M = €0.27 + €0.09 = €0.36
# = ~27 turns on the same €10 allowance
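To run the same arithmetic across every model in the rates table, a small helper is enough. This is a minimal sketch, not an official calculator: the `RATES` dictionary simply copies the approximate euro figures quoted above, the function names are our own, and cached-input discounts are deliberately ignored.

```python
# Euro per 1M tokens (input, output), copied from the rates table above.
# These are the article's rounded figures, not an authoritative price list.
RATES = {
    "GPT-5 mini": (0.25, 2.0),
    "GPT-5.4": (2.50, 15.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
    "Claude Opus 4.7": (15.0, 75.0),
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Euro cost of one turn at fresh-input rates (no cache discount)."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


def turns_per_allowance(model: str, allowance_eur: float,
                        input_tokens: int, output_tokens: int) -> int:
    """Whole turns of the given size that fit into a monthly allowance."""
    return int(allowance_eur / turn_cost(model, input_tokens, output_tokens))


if __name__ == "__main__":
    # The medium chat turn from the scenario: 18,000 in, 1,200 out.
    for model in RATES:
        cost = turn_cost(model, 18_000, 1_200)
        turns = turns_per_allowance(model, 10.0, 18_000, 1_200)
        print(f"{model:<18} €{cost:.4f}/turn  ~{turns} turns per €10")
```

For the scenario above this reproduces the 138 turns on Sonnet 4.6 and 27 turns on Opus 4.7. Swap in your own typical token counts before drawing conclusions.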
Where the bill shock comes from

A multi-hour agent session on Opus 4.7 with full-repo context will routinely burn €5–€20 of credits per run. Reports of ~9× cost jumps for power users moving from PRU multipliers to token billing are not exaggerations — they are the predictable consequence of letting the agent pick its own model on its own context.

Five habits that keep the bill predictable

None of what follows is exotic. It is the same FinOps loop we apply to Azure — observe, constrain, default, review — bolted onto the Copilot surface. The order matters: cap first, default second, observe third.

  1. Cap overage at €0 first, ask questions later.

    The single highest-leverage action you can take today is to set the overage budget to zero. Individuals: Settings → Billing & Plans → Copilot → Budgets. Org admins: Organization → Settings → Copilot → Budgets. Zero overage means the agent pauses when the included allowance runs out — no invoice surprise, just a notification. Re-enable deliberately, never by default.

    gh CLI — set €0 org overage with 80% alert
    # Replace {org} with your organisation login; gh api only expands {owner}, {repo} and {branch}.
    gh api -X POST /orgs/{org}/settings/billing/budgets \
      -f product=copilot \
      -F amount=0 \
      -f alert_threshold="80"
  2. Pin a default model per workload.

    Letting every developer pick Opus by reflex is how you torch a pooled allowance in week one. Set a sensible default in settings.json and reserve premium models for tasks that actually need them.

    VS Code — settings.json
    // User or workspace settings
    {
      "github.copilot.chat.defaultModel": "gpt-5-mini",
      "github.copilot.chat.agent.defaultModel": "claude-sonnet-4.6",
      "github.copilot.advanced": { "length": 500 }
    }

    Opus 4.7 and GPT-5.4 stay one model-picker click away — they should not be the floor.

  3. Pull usage weekly via the API, not monthly via the invoice.

    The new /copilot/usage and /copilot/billing/seats endpoints expose per-user, per-model, per-feature token counts. Pipe them to CSV or a Log Analytics workspace, chart them in Power BI, and you'll catch the runaway agent before the bill does.

    gh CLI — daily token burn by model
    gh api /orgs/{org}/copilot/usage \
      -q '.[] | {
        day: .day,
        chat_tokens: .chat_metrics.total_tokens,
        agent_tokens: .copilot_agent_metrics.total_tokens,
        models: .model_breakdown
      }' \
      > copilot-usage.json
  4. Shrink the context window before the LLM sees it.

    Input tokens dominate the bill for most agentic workloads. Prefer file-scoped chat (#file:src/api.ts) over @workspace, which on a 200k-file monorepo will happily ship 100k+ tokens of context with every turn. For repo-wide reasoning, swap @workspace for a scoped MCP server: a ripgrep- or semgrep-backed retriever returning 2k tokens of hits beats 200k tokens of file tree every time.

    .vscode/mcp.json — narrow retrieval server
    {
      "servers": {
        "internal-docs": {
          "command": "npx",
          "args": ["-y", "@your-org/mcp-docs", "--root", "./docs", "--max-hits", "8"]
        }
      }
    }
  5. Constrain agents with instructions and prompt files.

    .github/copilot-instructions.md is read on every chat in the repo — push stack, package manager, test command, banned patterns and "always run X first" rules there so agents don't burn turns rediscovering them. Per-task prompt files under .github/prompts/*.prompt.md let you ship repeatable jobs with a fixed model, fixed toolset and explicit scope.

    .github/prompts/refactor-controller.prompt.md
    ---
    mode: agent
    model: claude-sonnet-4.6
    tools: [editFiles, runCommands]
    ---
    Refactor the controller in #file:${input:path} to:
    - extract validation into a service class
    - preserve all existing public method signatures
    - run `npm test` after each edit; stop on failure
    Do not touch files outside the controller's folder.
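Coming back to habit 3 for a moment: the export is only useful if something reads it. Below is a minimal sketch of a weekly check, assuming the file contains the stream of objects produced by the jq filter in habit 3, with the `day`, `chat_tokens` and `agent_tokens` fields named there. The function names and the "three times the median day" rule are our own illustration, not anything GitHub ships.

```python
import json
import statistics


def iter_records(text: str):
    """Yield each JSON value from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        record, pos = decoder.raw_decode(text, pos)
        yield record


def daily_totals(records) -> dict:
    """Sum chat and agent tokens per day; missing or null counts are 0."""
    totals = {}
    for rec in records:
        tokens = (rec.get("chat_tokens") or 0) + (rec.get("agent_tokens") or 0)
        totals[rec["day"]] = totals.get(rec["day"], 0) + tokens
    return totals


def spike_days(totals: dict, factor: float = 3.0) -> list:
    """Days whose token total exceeds `factor` times the median day."""
    if not totals:
        return []
    median = statistics.median(totals.values())
    return sorted(day for day, total in totals.items() if total > factor * median)
```

Typical use would be to read `copilot-usage.json` as text, pass it through `iter_records` and `daily_totals`, and alert on whatever `spike_days` returns. The threshold is a starting point; tune it to how bursty your teams are.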
Pooled credits, per-user caps

On Business and Enterprise, pooled credits mean one heavy user no longer torches their own seat — but they can still drain the org pool. Pair the org-level €0 overage cap with per-user soft caps in Org → Copilot → Policies. A noisy team becomes a contained incident instead of a finance ticket.
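The interplay between the pool, the overage cap and per-user soft caps is easier to reason about as a few lines of logic. The sketch below models the behaviour described in this article (alert at a threshold, pause at the limit); it is our own illustration with hypothetical names, not GitHub's enforcement code.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    included_eur: float            # monthly allowance (pooled or individual)
    overage_cap_eur: float = 0.0   # 0 means: stop when the allowance is gone
    alert_threshold: float = 0.8   # fraction of the allowance that triggers an alert


def budget_state(spent_eur: float, budget: Budget) -> str:
    """Return 'ok', 'alert' or 'paused' for the current month-to-date spend."""
    limit = budget.included_eur + budget.overage_cap_eur
    if spent_eur >= limit:
        return "paused"
    if spent_eur >= budget.alert_threshold * budget.included_eur:
        return "alert"
    return "ok"


def users_over_soft_cap(spend_by_user: dict, soft_cap_eur: float) -> list:
    """Users whose individual spend exceeds the per-user soft cap."""
    return sorted(user for user, spent in spend_by_user.items() if spent > soft_cap_eur)
```

With a €0 overage cap the pool pauses exactly at the included allowance. The soft-cap list is what turns "the pool is nearly empty" into "these two people should switch models".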

Getting more out of fewer credits

Beyond the five habits, a handful of small choices noticeably bend the curve.

Lean on custom instructions. .github/copilot-instructions.md is cached as part of the system prompt. The short block below stops three turns of "what test runner do you use?" from ever happening:

.github/copilot-instructions.md
# Project context
- Stack: TypeScript 5.5, Node 22, Fastify, Postgres 16 via Drizzle ORM.
- Package manager: pnpm. Never suggest npm or yarn.
- Test runner: vitest. Run `pnpm test -- --run` after changes.
- Linting: biome. Run `pnpm lint --apply` before committing.
- Banned: axios (use built-in fetch), moment (use Temporal), lodash.
- Logging: pino, never console.log outside of scripts.
- Always type errors explicitly; no `any` in new code.

Stay in the thread. Cached input bills at roughly 10% of the fresh input rate. A 20-turn conversation in one thread is dramatically cheaper than 20 single-turn threads with the same context. Resist the urge to "start fresh" — start scoped instead.
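A rough model makes the difference concrete. The sketch below is idealised: it assumes everything already sent in a thread is billed at the cached rate from the second turn on, which real cache behaviour will only approximate. It takes the Sonnet 4.6 rates from the table and a cached rate of one tenth of input. Token counts and function names are illustrative.

```python
def one_thread_cost(turns: int, context: int, user: int, output: int,
                    rate_in: float, rate_cached: float, rate_out: float) -> float:
    """Euro cost of `turns` turns in a single thread (rates per 1M tokens)."""
    total = 0.0
    history = 0  # tokens added by earlier turns (user messages + answers)
    for turn in range(turns):
        if turn == 0:
            fresh, cached = context + user, 0
        else:
            # Assumption: context and prior history are cache hits.
            fresh, cached = user, context + history
        total += (fresh * rate_in + cached * rate_cached + output * rate_out) / 1_000_000
        history += user + output
    return total


def fresh_threads_cost(turns: int, context: int, user: int, output: int,
                       rate_in: float, rate_out: float) -> float:
    """Euro cost of the same turns as separate single-turn threads."""
    per_turn = ((context + user) * rate_in + output * rate_out) / 1_000_000
    return turns * per_turn


if __name__ == "__main__":
    args = dict(turns=20, context=18_000, user=500, output=1_200)
    same = one_thread_cost(**args, rate_in=3.0, rate_cached=0.3, rate_out=15.0)
    split = fresh_threads_cost(**args, rate_in=3.0, rate_out=15.0)
    print(f"one thread: €{same:.2f}   20 fresh threads: €{split:.2f}")
```

Under these assumptions the single thread comes to about €0.64 against €1.47 for twenty fresh threads. The gap widens as the shared context grows and narrows as answers get longer, since output is never cached.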

Use the CLI for one-shots. gh copilot suggest "rotate the KV secret used by the api app" and gh copilot explain "$(git diff HEAD~1)" are short, cheap, single-call interactions. Skip the chat-session overhead for questions that don't need a back-and-forth.

Use agent mode for things that would otherwise be 30 chat turns. A multi-file refactor with a clear acceptance criterion is genuinely cheaper as one agent run than as a manual ping-pong — fewer redundant context reloads, more cached tokens.

Watch what MCP servers do. A noisy MCP server that returns 50 KB of JSON on every call inflates every subsequent turn. Audit your registered servers, prefer ones that return structured, bounded results, and disable the rest at the repo level.
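If you maintain your own MCP servers, "bounded" can be enforced in a few lines before a result ever leaves the tool. The helper below is a hypothetical sketch, not part of any MCP SDK: it caps both the number of hits and the serialised size, and tells the model when it truncated so the agent can narrow its query instead of asking again.

```python
import json


def bound_result(hits: list, max_hits: int = 8, max_chars: int = 8_000) -> dict:
    """Keep at most `max_hits` hits whose JSON size fits within `max_chars`."""
    kept = []
    used = 0
    for hit in hits[:max_hits]:
        size = len(json.dumps(hit))
        if used + size > max_chars:
            break  # adding this hit would exceed the size budget
        kept.append(hit)
        used += size
    return {"hits": kept, "truncated": len(kept) < len(hits)}
```

The defaults mirror the `--max-hits 8` idea from the retrieval server config earlier. Character count is only a proxy for tokens, so treat `max_chars` as a rough budget rather than an exact one.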

Two Copilots, two billing models, two jobs

The unfortunate thing about Microsoft naming both of its flagship AI products "Copilot" is that, from today, they sit at opposite ends of the cost-predictability spectrum. They share a name and a vendor — almost nothing else.

| | GitHub Copilot | Microsoft 365 Copilot | M365 Copilot Chat |
| --- | --- | --- | --- |
| Built for | Writing and shipping code | Knowledge work in Office | Casual, web-grounded chat |
| Surface | VS Code, Visual Studio, JetBrains, Xcode, Eclipse, Vim, gh CLI, github.com | Word, Excel, Outlook, Teams, PowerPoint, Loop, SharePoint, OneNote | m365.cloud.microsoft, Edge sidebar, Teams |
| Data grounding | Repo + open files + MCP servers | Microsoft Graph (mail, files, chats, calendar) + tenant search | Public web; no Graph data |
| Pricing | From June 1: AI Credits (token-metered) + flat base | €31.80 / user / month, annual commit | Included with Entra ID, no extra licence |
| Admin centre | GitHub org settings, Copilot policies, content exclusion | Microsoft 365 admin centre + Copilot Control System + Purview | Microsoft 365 admin centre |
| Cost shape | Variable above the base, capped if you cap it | Flat per seat: predictable, painful to leave idle | €0 |

The technical kicker most teams miss: the data-protection story is dramatically deeper on the Microsoft 365 side. M365 Copilot honours Purview sensitivity labels in its grounding, respects DLP policies on its responses, surfaces in eDiscovery (prompts and answers), and can be scoped via sensitivity-label-driven SharePoint permissions. GitHub Copilot's enterprise controls are a different shape — content exclusion patterns, audit logs, IP indemnity, org-level model and tool policies. Different problem, different tooling, different admin centre.

From today, the practical consequence is unambiguous: Microsoft 365 Copilot is the flat-spend product, GitHub Copilot is the metered product. Running both without a unified governance plan means surprises in two dashboards instead of one — and they belong to two different teams.

Govern both, or pay for both blindly

Most CLWD customers run both: Microsoft 365 Copilot for the business, GitHub Copilot for engineering. Today's change means the FinOps loop we already apply to Azure — Inform → Optimize → Commit → Operate — now applies to Copilot as a first-class spend surface. Same loop, two products, one dashboard.

What "good" looks like in practice:

  • Inform. Weekly /copilot/usage export to a Log Analytics workspace or Microsoft Fabric. M365 Copilot adoption export via https://config.office.com. One Power BI report joining seat cost and token spend per cost centre.
  • Optimize. Org-level model allowlist (no Opus by default), per-user soft caps, prompt files for repeated workflows, MCP-narrowed context for monorepos, sensitivity-label-driven scoping for M365 Copilot grounding.
  • Commit. Right-size Pro / Pro+ / Max versus Business per persona. Reclaim idle M365 Copilot seats monthly via Reports → Microsoft 365 Copilot Usage. Pooled-credit orgs: review who actually consumed the pool, not just who has a seat.
  • Operate. Clear RACI between IT (M365 licensing, Purview labels, DLP, eDiscovery) and Engineering (GitHub org policies, repo instructions, MCP catalogue, model allowlist). One monthly review, both products on the same agenda.
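The "Inform" join is simple enough to prototype before anyone opens Power BI. The sketch below assumes you already have a seat list with a cost centre per user and a per-user token spend in euro; both input shapes and all names are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def spend_by_cost_centre(seats: list, token_spend: dict) -> dict:
    """Join seat cost and token spend per cost centre.

    seats: list of (user, cost_centre, seat_price_eur) tuples.
    token_spend: mapping of user -> metered spend in euro for the period.
    Users without a seat entry are ignored.
    """
    totals = defaultdict(lambda: {"seats": 0, "seat_eur": 0.0, "token_eur": 0.0})
    for user, centre, price in seats:
        row = totals[centre]
        row["seats"] += 1
        row["seat_eur"] += price
        row["token_eur"] += token_spend.get(user, 0.0)
    return dict(totals)
```

A cost centre with many seats and almost no token spend is a reclaim candidate. One with few seats and heavy token spend is where model defaults and soft caps deserve a second look.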

The teams that get burned this month aren't the ones using Copilot the most. They are the ones running both products with zero shared governance — one team picked the licences, another team picked the agents, and nobody owned the bill.

Key takeaways

01

The meter changed, not the sticker

From today, Copilot bills in input / output / cached tokens per model. Base subscription prices are unchanged; everything above the included allowance is metered.

02

Completions stay free

Inline suggestions and next-edit suggestions don't burn credits. For tab-completion-heavy workloads, Pro at €10 is effectively flat.

03

Cap overage at €0 first

Set the org and individual overage budget to zero today. Raise it deliberately, never by accident — paused agent beats surprise invoice.

04

Model choice is now a budget decision

Pin a sensible default (gpt-5-mini for chat, claude-sonnet-4.6 for agents). Reserve Opus 4.7 and GPT-5.4 for hard reasoning.

05

Shrink the context, not the model

File-scoped chat, custom instructions, prompt files and MCP retrievers beat @workspace on large repos — every time, by a lot.

06

Pull usage weekly via API

The /copilot/usage endpoint exposes per-user, per-model, per-feature tokens. Catch the runaway agent before the invoice does.

07

Two Copilots, two billing shapes

GitHub Copilot is now metered. Microsoft 365 Copilot is still flat per seat. Govern both with one loop, not two.

08

FinOps applies to AI now

Inform, Optimize, Commit, Operate — the loop that tames Azure spend works for Copilot spend. Same monthly review, same owners.

Plan your Copilot spend before the next invoice

We help teams set up the budgets, model policies, usage dashboards and governance loop for both GitHub Copilot and Microsoft 365 Copilot — so July looks like June, only cheaper.
