# Model Selection

SuperDocs offers four model tiers and three thinking depths. All tiers are available on every plan.
## Model tiers

Set model_tier in your request to choose a model:

| Tier | Best for | Speed | Thinking depth control |
|---|---|---|---|
| core | Everyday editing, quick tasks | Fast | Yes |
| turbo | Speed-critical workflows | Fastest | No — always optimized |
| pro | Complex analysis, multi-step edits | Moderate | No — always optimized |
| max | Challenging documents, nuanced tasks | Slower | Yes |

Default: core
## Thinking depth

Set thinking_depth to control how much reasoning the AI applies:

| Depth | Behavior |
|---|---|
| fast | Quick responses, minimal reasoning |
| balanced | AI decides when to reason deeply (default) |
| deep | Extended reasoning for complex problems |

Default: balanced
Thinking depth control is available on the core and max tiers. The turbo and pro tiers use their own optimized reasoning: the thinking_depth parameter is ignored for these models, and there is nothing to configure.
## Usage

```shell
# Core with custom thinking depth
curl -X POST https://api.superdocs.app/v1/chat \
  -H "Authorization: Bearer sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze this contract for potential risks",
    "session_id": "my-session",
    "document_html": "...",
    "model_tier": "core",
    "thinking_depth": "deep"
  }'
```
```shell
# Pro — no thinking_depth needed, reasoning is always optimized
curl -X POST https://api.superdocs.app/v1/chat \
  -H "Authorization: Bearer sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze this contract for potential risks",
    "session_id": "my-session",
    "document_html": "...",
    "model_tier": "pro"
  }'
```
## Recommendations

| Task | Suggested tier | Suggested depth |
|---|---|---|
| Fix a typo | core | fast |
| Rewrite a paragraph | core | balanced |
| Draft a multi-section document | core or max | deep |
| Analyze a complex contract | pro or max | deep (max only) |
| Batch processing many documents | turbo | — |
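The mapping in this table is easy to encode on the client side. A minimal sketch — the pick_model helper and the task keywords are hypothetical, not part of the SuperDocs API:

```shell
# Hypothetical helper: maps a task keyword to the tier/depth pairing
# suggested in the recommendations table. Unknown tasks fall back to
# the documented defaults (core + balanced).
pick_model() {
  case "$1" in
    typo)     echo "core fast" ;;
    rewrite)  echo "core balanced" ;;
    draft)    echo "max deep" ;;
    contract) echo "max deep" ;;
    batch)    echo "turbo" ;;
    *)        echo "core balanced" ;;
  esac
}

pick_model contract   # → max deep
```

Keeping this mapping in one place makes it easy to adjust tiers later without touching every call site.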
## Token cost expectations by operation type

Use this as a planning guide for how many output tokens — and roughly how much money — a given operation will spend.
### Typical per-operation output token usage

| Operation | Sections involved | Expected output tokens | Approx. cost (default tier) |
|---|---|---|---|
| Single-section edit (typo, reword, format) | 1 | 5,000 – 20,000 | $0.001 – $0.004 |
| Multi-section edit (3–5 sections) | 3–5 | 20,000 – 80,000 | $0.004 – $0.016 |
| Full-document operation (rewrite, restructure) | 10–20 | 80,000 – 250,000 | $0.016 – $0.050 |
Anything materially above these ranges (e.g. a single-section edit using 100,000+ output tokens) suggests a runaway plan or an unusually large section. Inspect the operation in your usage dashboard if you see this — and feel free to flag it to us, since it’s also useful as a signal for our own quality monitoring.
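A runaway check like the one described above can run client-side before you ever open the dashboard. A sketch — the flag_runaway helper and its 5× threshold are illustrative choices, not an API feature:

```shell
# Flags a single-section edit whose output token count lands far above
# the 20,000-token upper bound from the table. The 5x multiplier is an
# arbitrary example threshold; tune it to your own tolerance.
flag_runaway() {
  tokens="$1"
  ceiling=20000
  if [ "$tokens" -gt $((ceiling * 5)) ]; then
    echo "runaway: $tokens output tokens for a single-section edit"
  else
    echo "ok"
  fi
}

flag_runaway 135000   # → runaway: 135000 output tokens for a single-section edit
```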
### Reasoning tokens are additive

Each thinking depth adds reasoning tokens on top of the operation’s output:

| Thinking depth | Approx. reasoning tokens added | Latency added |
|---|---|---|
| fast | up to 2,000 | < 1 s |
| balanced (default) | dynamic — typically 4,000 – 8,000 | 1 – 3 s |
| deep | up to 16,000 | 3 – 8 s |
Reasoning tokens are billed at the model’s standard output rate, so on the default tier a deep reasoning addition contributes < $0.005 per request even at the upper end.
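The per-operation table implies a default-tier rate of roughly $0.20 per million output tokens (5,000 tokens → $0.001). That inferred rate — not an official price — is enough for a back-of-envelope estimate that adds reasoning tokens to the operation’s output:

```shell
# Back-of-envelope cost estimate for one operation on the default tier.
# The ~$0.20-per-million-output-token rate is inferred from the table
# above, not quoted from a price list.
rate_per_million=0.20
output_tokens=20000       # single-section edit, upper end
reasoning_tokens=16000    # deep-depth ceiling

awk -v r="$rate_per_million" -v n="$((output_tokens + reasoning_tokens))" \
    'BEGIN { printf "$%.4f\n", r * n / 1000000 }'
```

This prints $0.0072 — consistent with the claim that a deep reasoning addition stays under $0.005 on its own (16,000 tokens ≈ $0.0032).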
### Picking a tier for batch jobs

If you’re processing 100+ documents in a batch:

- Cost-first — model_tier: "turbo". Fastest, lowest cost, slight precision tradeoff. Good for analytics-style passes (extract, classify, summarize).
- Balanced — model_tier: "core" with thinking_depth: "fast". Solid precision, moderate speed and cost. Good default for most batch flows.
- High-stakes — model_tier: "pro" or "max". Use when the cost of a wrong edit (lawyer review hours, regulatory exposure, reputational damage) far exceeds the per-document token cost — i.e. always for legal, regulatory, medical, or financial documents.
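A cost-first batch pass is a loop over documents with a fixed turbo payload. A sketch, assuming your documents sit under a hypothetical docs/ directory and contain no characters that need JSON escaping (otherwise run them through a proper JSON encoder first):

```shell
# Cost-first batch pass: build one request body per document with
# model_tier "turbo". The docs/ path and message text are examples.
# Assumes file contents are JSON-safe (no quotes or backslashes).
mkdir -p docs && printf '<p>Q3 report</p>' > docs/report.html   # sample input

for f in docs/*.html; do
  payload=$(printf '{"message":"Classify this document","session_id":"batch","document_html":"%s","model_tier":"turbo"}' "$(cat "$f")")
  echo "$payload"
  # To send for real, replace echo with:
  # curl -X POST https://api.superdocs.app/v1/chat \
  #   -H "Authorization: Bearer sk_YOUR_API_KEY" \
  #   -H "Content-Type: application/json" -d "$payload"
done
```

Because turbo ignores thinking_depth, there is nothing else to tune per request.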
## Choosing for precision
If the AI’s output isn’t what you wanted, the right tier change is usually obvious once you name the symptom:
| Symptom | What to try |
|---|---|
| AI edited sections you didn’t ask about | Switch to model_tier: "pro" or "max". Also verify your prompt is single-section explicit (e.g. “edit Section 3” rather than “fix the document”). Vague prompts invite wide edits. |
| Edits miss nuance in legal / regulatory / medical language | model_tier: "max" with thinking_depth: "deep". Max is the most capable model and Deep gives it room to reason carefully about wording. |
| Too slow | model_tier: "turbo" (loses some precision but is the fastest). Or stay on core and pass thinking_depth: "fast" (smaller cost, smaller precision drop). |
| Want a balance — don’t know which to pick | Stay on the default: core + balanced. The model picks reasoning effort dynamically per request. Most edits won’t need deep reasoning; complex ones will. |
| Want maximum precision, cost is no object | model_tier: "max" + thinking_depth: "deep". Most expensive, most accurate. |
## What balanced actually does
thinking_depth: "balanced" is dynamic — the model decides how much reasoning to apply based on the prompt’s complexity. Most everyday edits trigger minimal reasoning (cheap and fast); complex multi-step edits trigger more (slower, more expensive). This is the recommended default for general editing, and it’s what core + balanced gives you out of the box.
fast (2,048-token reasoning ceiling) is faster but leaves the model little budget to reason carefully about scope, so it can misjudge how wide an edit should be and sometimes picks a wider scope than the prompt warranted. Use fast when you’ve verified your prompts are unambiguous and you want the speed.
deep (16,384-token reasoning ceiling) gives the model room to reason carefully. Use it for high-stakes edits where one bad output is more expensive than the extra tokens.
## When to use Deep — and what it costs

Deep reasoning uses roughly 6× the output tokens of Balanced for a typical 1,000-token edit. On core that translates to roughly $0.04 vs $0.007 per edit; on max it’s higher, roughly $0.05 vs $0.009. The added wall-clock latency is typically 3 – 8 seconds.
That cost is well worth it when:
- The document is high-stakes (a contract clause, a regulatory disclosure, a medical instruction)
- Output errors are expensive to fix downstream (executive review, lawyer hours)
- You’re running few-but-important edits, not high-volume batch work
It is NOT worth it when:
- You’re doing fast iterative editing (a writer’s draft, a Slack-formatting cleanup)
- You’re processing high volumes (use turbo instead)
- The user is sitting in front of the screen waiting (latency dominates UX)
For legal, regulatory, medical, financial, or compliance documents — default to Pro or Max. The cost of one wrong edit on a contract clause or a HIPAA-relevant medical instruction massively exceeds the cost of running a more capable tier. Don’t optimize for token cost on documents where the human cost of a mistake is in lawyer hours, regulatory exposure, or patient harm.
## Default model preference
You can set a default model tier in your account preferences (via the web app). Per-request model_tier overrides your default.