# Model Selection

SuperDocs offers four model tiers and three thinking depths. All tiers are available on every plan.
## Model tiers

Set model_tier in your request to choose a model:

| Tier | Best for | Speed | Thinking depth control |
|---|---|---|---|
| core | Everyday editing, quick tasks | Fast | Yes |
| turbo | Speed-critical workflows | Fastest | No — always optimized |
| pro | Complex analysis, multi-step edits | Moderate | No — always optimized |
| max | Challenging documents, nuanced tasks | Slower | Yes |

Default: core
## Thinking depth

Set thinking_depth to control how much reasoning the AI applies:

| Depth | Behavior |
|---|---|
| fast | Quick responses, minimal reasoning |
| balanced | AI decides when to reason deeply (default) |
| deep | Extended reasoning for complex problems |

Default: balanced
Thinking depth control is available on the core and max tiers. The turbo and pro tiers use their own optimized reasoning: the thinking_depth parameter is ignored for these models, and there is nothing to configure.
## Usage

```shell
# Core with custom thinking depth
curl -X POST https://api.superdocs.app/v1/chat \
  -H "Authorization: Bearer sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze this contract for potential risks",
    "session_id": "my-session",
    "document_html": "...",
    "model_tier": "core",
    "thinking_depth": "deep"
  }'
```
```shell
# Pro — no thinking_depth needed, reasoning is always optimized
curl -X POST https://api.superdocs.app/v1/chat \
  -H "Authorization: Bearer sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze this contract for potential risks",
    "session_id": "my-session",
    "document_html": "...",
    "model_tier": "pro"
  }'
```
## Recommendations

| Task | Suggested tier | Suggested depth |
|---|---|---|
| Fix a typo | core | fast |
| Rewrite a paragraph | core | balanced |
| Draft a multi-section document | core or max | deep |
| Analyze a complex contract | pro or max | deep (max only) |
| Batch processing many documents | turbo | — |
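The mapping in this table is easy to encode on the client side. A minimal sketch — the pick_model helper and the task keywords are hypothetical, not part of the SuperDocs API:

```shell
# Hypothetical helper: maps a task keyword to the tier/depth pairing
# suggested in the recommendations table. Unknown tasks fall back to
# the documented defaults (core + balanced).
pick_model() {
  case "$1" in
    typo)     echo "core fast" ;;
    rewrite)  echo "core balanced" ;;
    draft)    echo "max deep" ;;
    contract) echo "max deep" ;;
    batch)    echo "turbo" ;;
    *)        echo "core balanced" ;;
  esac
}

pick_model contract   # → max deep
```

Keeping this mapping in one place makes it easy to adjust tiers later without touching every call site.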
## Token cost expectations by operation type

Use this as a planning guide for how many output tokens — and roughly how much money — a given operation will spend.
### Typical per-operation output token usage

| Operation | Sections involved | Expected output tokens | Approx. cost (default tier) |
|---|---|---|---|
| Single-section edit (typo, reword, format) | 1 | 5,000 – 20,000 | $0.001 – $0.004 |
| Multi-section edit (3–5 sections) | 3–5 | 20,000 – 80,000 | $0.004 – $0.016 |
| Full-document operation (rewrite, restructure) | 10–20 | 80,000 – 250,000 | $0.016 – $0.050 |
Anything materially above these ranges (e.g. a single-section edit using 100,000+ output tokens) suggests a runaway plan or an unusually large section. Inspect the operation in your usage dashboard if you see this — and feel free to flag it to us, since it’s also useful as a signal for our own quality monitoring.
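A runaway check like the one described above can run client-side before you ever open the dashboard. A sketch — the flag_runaway helper and its 5× threshold are illustrative choices, not an API feature:

```shell
# Flags a single-section edit whose output token count lands far above
# the 20,000-token upper bound from the table. The 5x multiplier is an
# arbitrary example threshold; tune it to your own tolerance.
flag_runaway() {
  tokens="$1"
  ceiling=20000
  if [ "$tokens" -gt $((ceiling * 5)) ]; then
    echo "runaway: $tokens output tokens for a single-section edit"
  else
    echo "ok"
  fi
}

flag_runaway 135000   # → runaway: 135000 output tokens for a single-section edit
```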
### Reasoning tokens are additive

Each thinking depth adds reasoning tokens on top of the operation’s output:

| Thinking depth | Approx. reasoning tokens added | Latency added |
|---|---|---|
| fast | up to 2,000 | < 1 s |
| balanced (default) | dynamic — typically 4,000 – 8,000 | 1 – 3 s |
| deep | up to 16,000 | 3 – 8 s |
Reasoning tokens are billed at the model’s standard output rate, so on the default tier a deep reasoning addition contributes < $0.005 per request even at the upper end.
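The per-operation table implies a default-tier rate of roughly $0.20 per million output tokens (5,000 tokens → $0.001). That inferred rate — not an official price — is enough for a back-of-envelope estimate that adds reasoning tokens to the operation’s output:

```shell
# Back-of-envelope cost estimate for one operation on the default tier.
# The ~$0.20-per-million-output-token rate is inferred from the table
# above, not quoted from a price list.
rate_per_million=0.20
output_tokens=20000       # single-section edit, upper end
reasoning_tokens=16000    # deep-depth ceiling

awk -v r="$rate_per_million" -v n="$((output_tokens + reasoning_tokens))" \
    'BEGIN { printf "$%.4f\n", r * n / 1000000 }'
```

This prints $0.0072 — consistent with the claim that a deep reasoning addition stays under $0.005 on its own (16,000 tokens ≈ $0.0032).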
### Picking a tier for batch jobs

If you’re processing 100+ documents in a batch:

- Cost-first — model_tier: "turbo". Fastest, lowest cost, slight precision tradeoff. Good for analytics-style passes (extract, classify, summarize).
- Balanced — model_tier: "core" with thinking_depth: "fast". Solid precision, moderate speed and cost. Good default for most batch flows.
- High-stakes — model_tier: "pro" or "max". Use when the cost of a wrong edit (lawyer review hours, regulatory exposure, reputational damage) far exceeds the per-document token cost — i.e. always for legal, regulatory, medical, or financial documents.
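A cost-first batch pass is a loop over documents with a fixed turbo payload. A sketch, assuming your documents sit under a hypothetical docs/ directory and contain no characters that need JSON escaping (otherwise run them through a proper JSON encoder first):

```shell
# Cost-first batch pass: build one request body per document with
# model_tier "turbo". The docs/ path and message text are examples.
# Assumes file contents are JSON-safe (no quotes or backslashes).
mkdir -p docs && printf '<p>Q3 report</p>' > docs/report.html   # sample input

for f in docs/*.html; do
  payload=$(printf '{"message":"Classify this document","session_id":"batch","document_html":"%s","model_tier":"turbo"}' "$(cat "$f")")
  echo "$payload"
  # To send for real, replace echo with:
  # curl -X POST https://api.superdocs.app/v1/chat \
  #   -H "Authorization: Bearer sk_YOUR_API_KEY" \
  #   -H "Content-Type: application/json" -d "$payload"
done
```

Because turbo ignores thinking_depth, there is nothing else to tune per request.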
## Choosing for precision
If the AI’s output isn’t what you wanted, the right tier change is usually obvious once you name the symptom:
| Symptom | What to try |
|---|---|
| AI edited sections you didn’t ask about | Switch to model_tier: "pro" or "max". Also verify your prompt is single-section explicit (e.g. “edit Section 3” rather than “fix the document”). Vague prompts invite wide edits. |
| Edits miss nuance in legal / regulatory / medical language | model_tier: "max" with thinking_depth: "deep". Max is the most capable model and Deep gives it room to reason carefully about wording. |
| Too slow | model_tier: "turbo" (loses some precision but is the fastest). Or stay on core and pass thinking_depth: "fast" (smaller cost, smaller precision drop). |
| Want a balance — don’t know which to pick | Stay on the default: core + balanced. The model picks reasoning effort dynamically per request. Most edits won’t need deep reasoning; complex ones will. |
| Want maximum precision, cost is no object | model_tier: "max" + thinking_depth: "deep". Most expensive, most accurate. |
## What balanced actually does
thinking_depth: "balanced" is dynamic — the model decides how much reasoning to apply based on the prompt’s complexity. Most everyday edits trigger minimal reasoning (cheap and fast); complex multi-step edits trigger more (slower, more expensive). This is the recommended default for general editing, and it’s what core + balanced gives you out of the box.
fast (2,048-token reasoning ceiling) is faster but leaves the model little budget to reason carefully about scope, so it can misjudge how wide an edit should be and sometimes picks a wider scope than the prompt warranted. Use fast when you’ve verified your prompts are unambiguous and you want the speed.
deep (16,384-token reasoning ceiling) gives the model room to reason carefully. Use it for high-stakes edits where one bad output is more expensive than the extra tokens.
## When to use Deep — and what it costs

Deep reasoning uses roughly 6× the output tokens of Balanced for a typical 1,000-token edit. On core that translates to roughly $0.04 vs $0.007 per edit; on max it’s higher, roughly $0.05 vs $0.009. The added wall-clock latency is typically 3 – 8 seconds.
That cost is well worth it when:
- The document is high-stakes (a contract clause, a regulatory disclosure, a medical instruction)
- Output errors are expensive to fix downstream (executive review, lawyer hours)
- You’re running few-but-important edits, not high-volume batch work
It is NOT worth it when:
- You’re doing fast iterative editing (a writer’s draft, a Slack-formatting cleanup)
- You’re processing high volumes (use turbo instead)
- The user is sitting in front of the screen waiting (latency dominates UX)
For legal, regulatory, medical, financial, or compliance documents — default to Pro or Max. The cost of one wrong edit on a contract clause or a HIPAA-relevant medical instruction massively exceeds the cost of running a more capable tier. Don’t optimize for token cost on documents where the human cost of a mistake is in lawyer hours, regulatory exposure, or patient harm.
## Default model preference
You can set a default model tier in your account preferences (via the web app). Per-request model_tier overrides your default.