Features
SuperDocs is a universal AI document platform — one API for editing, drafting, searching, summarizing, and exporting styled documents (.docx, PDF, HTML, Markdown, RTF). Every capability below is live in production at https://api.superdocs.app and exposed identically over the REST API and the MCP server at https://api.superdocs.app/mcp (21 tools + 4 user-invocable workflow prompts on a single endpoint).
This page is exhaustive — it’s a reference, not a marketing summary. Skip to the group that matches what you’re trying to build.
Core editing intelligence
Section-precision editing
Documents load as structured HTML where every paragraph, heading, table, row, and cell carries a unique identifier internally. The AI can target a specific section without touching anything else — “remove row 3 of the pricing table”, “bold the second paragraph in section 4”, “replace the governing-law clause” all work as natural-language instructions. This survives across multi-turn editing, so a 100-page document edited 30 times still has the same coherent structure at the end as it did at the start. Plain-text rewrites lose this completely.
Best for: Long documents, contracts, SOPs, formal reports — anywhere targeted edits matter more than wholesale rewrites.
Style preservation on edit and export
Tables (with borders, alternating row shading, merged cells), fonts, font sizes, colors, inline styling, lists, indentation, and headers/footers all survive both AI edits AND the round-trip to .docx / PDF / HTML. Most general-purpose AI tools strip or mangle these — SuperDocs is built around preserving them.
Best for: Branded templates, legal documents, formal correspondence — anywhere format fidelity is part of the deliverable.
Document intelligence
Search within documents
Search isn’t just full-text matching — the AI understands semantic meaning. Ask “find all indemnity clauses” across a 100-page contract and get back the exact sections, even when they use different wording. Works against the active document or across all attachments at once. Results come back with surrounding context so you can verify the match without opening the file.
Best for: Contract review, compliance audits, extracting specific terms from long documents, finding all references to a concept across multiple files.
Summarize sections on demand
Ask “summarize the force majeure section” or “give me a 2-line summary of the payment terms” and the AI extracts just that section, then summarizes it at the level of detail you want. Pair with semantic search — “find and summarize all limitation-of-liability clauses” works as a single instruction. No need to read line-by-line.
Best for: Contract reviews, due diligence, quick understanding of specific parts of long documents, briefing materials.
Summarize entire documents at any length
Get a concise summary of an entire document at a length you specify. “Executive summary in three sentences”, “one-page overview”, “detailed bullet-point summary by section” — all work. The AI reads the full document and distills it efficiently. Summaries cover key sections, critical terms, and overall structure. Works on documents up to ~100 pages without context-window pressure.
Best for: Quick understanding of newly-received documents, briefing executives, preparing for negotiations, extracting key terms from regulatory filings.
Cross-document reference and synthesis
Upload multiple documents into a session and the AI references any of them while editing the active document. “Port the indemnity clause from the attached template into this contract”, “compare these two NDAs and align the payment terms”, “find clauses in the attached precedent that handle this case better than what’s currently in the draft” — all work as natural-language instructions. The AI searches across all attachments, surfaces matches, and adapts content to the current context.
Best for: Standardizing language across contracts, porting terms from templates, comparing versions, drafting new docs based on prior precedents, building from clause libraries.
Rich editing & formatting
Full rich-text formatting toolbar
Beyond basic editing — the AI applies bold, italic, underline, strikethrough, text color (10-color palette), text highlight, custom font selection, six preset font sizes (12px–30px), configurable line spacing (single, 1.15, 1.5, double), and text alignment (left, center, right, justify). All preserved on round-trip exports. The format painter lets you copy a style from one selection and apply it to another with a single click — useful for matching styling across a long document.
Best for: Visually distinctive documents, applying brand style guides, adapting documents for different audiences, preserving template styling on edit.
Heading and list hierarchies
Apply heading styles (H1, H2, H3) with hierarchy preserved on export. Create bulleted and numbered lists with automatic numbering. Nest lists up to 8 levels deep. Convert between bullet and numbered formats by natural-language instruction. Add blockquotes for citations or emphasis. Horizontal rules to divide sections. The AI respects these structures — “convert the first three paragraphs to a bulleted list” works without reformatting surrounding content.
Best for: SOPs, structured documentation, proposals, outlines, academic papers, anywhere hierarchy matters.
Table editing with cell-level control
Insert and delete tables and rows. Merge or split cells. Apply alternating row shading. Set border styles. All cell-level operations work via natural language: “merge the first three columns of row 2”, “add a row after row 4 with these values”, “split the merged cell in A1”. Tables survive round-trip to .docx and PDF with formatting intact. Useful for pricing tables, comparison matrices, and any data-heavy document.
Best for: Documents with pricing or data — proposals, contracts with pricing schedules, technical specifications, comparison charts.
Headers, footers, and hyperlinks
Add headers and footers that appear on every page (or specific pages) — page numbers, document titles, company names, dates. Edit existing headers/footers directly. All of it survives PDF and .docx export. Hyperlinks work the same way: “make this text link to our website” or “remove the broken link in section 4”. Both internal document references and external URLs are supported.
Best for: Formal business documents, branded templates, compliance documents requiring page numbering, documents with cross-references or call-to-action links.
Knowledge & attachments
Multimodal vision on attachments
Attach images (PNG, JPG, GIF, WebP), screenshots, scanned forms, diagrams, and charts — the AI interprets them visually while editing the active document. Transcribe a screenshot into structured text, extract numbers from a chart, reference a diagram while drafting documentation, identify entities in a scanned form.
Best for: Workflows that mix text documents with image references — design docs, compliance docs that reference scanned forms, technical writing that references architecture diagrams.
Build a knowledge base from attachments
Upload your organization’s template library, style guides, past contracts, or SOP documents as attachments. The AI references them automatically when editing the active document — “make this sound like our standard tone”, “follow the format of our usual NDAs”, “use the indemnity language from our gold-standard contract”. Builds an institutional memory that shapes AI behavior. Attachments are scoped per session, so a knowledge-base session can be reused as a starting point and copied for each new document.
Best for: B2B teams with house styles or templates, legal teams with clause libraries, marketing teams with brand voice, compliance teams with corporate policies.
Semantic search across attachments
Every attachment is semantically indexed when uploaded — the AI knows what’s in it, not just the words it contains. Ask “find the data processing clause from the attached regulations” and the search returns the matching section even when the attached doc uses entirely different wording. Useful for large attachments (50+ pages) where keyword search would miss relevant sections, and for reference documents you query repeatedly across sessions.
Best for: Reference docs (regulations, industry standards, policy manuals), large attachments queried multiple times, building searchable knowledge over time.
Conversation & robustness
Persistent conversation context
Every message in a session carries the full conversation history. Ask “make that more formal”, then “add three paragraphs of detail to what you just wrote”, then “change the tone back to friendly” — the AI remembers all prior context, edits, and stated preferences. History persists across server restarts and redeploys. Reload a session weeks later and the AI still has the full context (the document, the attachments, every turn of the conversation, every change made).
Best for: Iterative editing workflows, multi-turn refinements, long-running document projects, anything where the user comes back to continue work later.
Automatic error recovery with fallbacks
When an edit fails on the first approach (e.g., the section the AI tried to find didn’t exist with those exact keywords), it automatically tries broader strategies — semantic search instead of keyword match, alternative phrasings of the user intent, looking in adjacent sections. Retries happen without user intervention. If all approaches genuinely fail, the AI explains what it tried and asks for clarification with specifics rather than a generic error. Dramatically reduces “I didn’t understand your request” loops.
Best for: Complex documents with varied terminology, ambiguous instructions, long documents where sections may have been edited since last viewed.
Human control & approval
Human-in-the-loop approval
For sensitive edits (legal contracts, financial filings, anything user-facing), set approval_mode='ask_every_time' on chat_async and the agent surfaces each proposed change as a structured diff (chunk-level before/after HTML + an explanation) for the user to approve, deny, or send back with feedback. Approved changes apply atomically; denied changes leave the document untouched. State persists across server restarts so multi-step approvals survive autoscaling.
Best for: Multi-stakeholder workflows, regulated industries, anywhere the cost of a bad edit landing without review exceeds the cost of a confirmation click.
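A minimal sketch of the approval loop from the client side. Only approval_mode='ask_every_time' and the chunk-level before/after diff with an explanation are taken from the description above; the exact field names (chunk_id, before_html, decision, etc.) are assumptions for illustration, not the documented schema.

```python
import json

# Request body for a HITL edit turn (field names illustrative).
request_body = {
    "session_id": "contract-review-42",
    "message": "Replace the governing-law clause with Delaware law",
    "approval_mode": "ask_every_time",  # surface each change as a diff
}

# Shape of one proposed change as described above: chunk-level
# before/after HTML plus an explanation (field names hypothetical).
proposed_change = {
    "chunk_id": "sec-9-governing-law",
    "before_html": "<p>This Agreement is governed by the laws of New York.</p>",
    "after_html": "<p>This Agreement is governed by the laws of Delaware.</p>",
    "explanation": "Swapped governing law from New York to Delaware.",
}

def review(change: dict) -> dict:
    """Approve, deny, or send back a proposed change with feedback."""
    if "Delaware" in change["after_html"]:
        return {"chunk_id": change["chunk_id"], "decision": "approve"}
    return {
        "chunk_id": change["chunk_id"],
        "decision": "deny",
        "feedback": "Wrong jurisdiction — this should be Delaware.",
    }

decision = review(proposed_change)
print(json.dumps(decision))
```

In a real integration, the diff would be rendered for a human rather than decided by a string check; the point is that denied changes never touch the document.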
Compact response mode for long editing sessions
On documents larger than ~20 pages, the default chat response includes the entire updated HTML on every turn — that’s ~130K tokens for a 100-page styled doc. Set response_mode='compact' and the response includes only chunk_diffs (per-section before/after for sections that actually changed) — typically 1-3 chunks, ~500-2,000 tokens.
For a 5-turn editing session on a 100-page doc, compact mode reduces total response context from ~650K tokens to ~3K tokens. To read sections in compact mode, just ask in natural language — “show me the force majeure clause” returns the content in the chat reply text.
Best for: AI agents editing documents larger than ~20 pages where context window pressure matters.
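A sketch of consuming a compact response: the caller keeps the document locally as a map of chunk ID to HTML and applies only the returned chunk_diffs each turn, never re-receiving the full document. The field names inside chunk_diffs here are assumptions for illustration; only the per-section before/after concept is from the description above.

```python
# Local copy of the document, keyed by chunk ID (structure hypothetical).
document = {
    "h1-title": "<h1>Master Services Agreement</h1>",
    "p-payment": "<p>Payment due in 60 days.</p>",
    "p-notices": "<p>Notices go to the registered address.</p>",
}

# A compact-mode response touches only the sections that changed.
compact_response = {
    "response_mode": "compact",
    "chunk_diffs": [
        {
            "chunk_id": "p-payment",
            "before_html": "<p>Payment due in 60 days.</p>",
            "after_html": "<p>Payment due in 30 days.</p>",
        }
    ],
}

def apply_chunk_diffs(doc: dict, diffs: list) -> dict:
    """Apply per-section diffs instead of replacing the whole document."""
    updated = dict(doc)
    for diff in diffs:
        # Sanity-check that the local copy matches what the server diffed.
        assert updated[diff["chunk_id"]] == diff["before_html"]
        updated[diff["chunk_id"]] = diff["after_html"]
    return updated

document = apply_chunk_diffs(document, compact_response["chunk_diffs"])
print(document["p-payment"])  # <p>Payment due in 30 days.</p>
```

This is why the token math works: a 5-turn session transfers a handful of small diffs instead of five full copies of the document.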
Real-time progress via SSE
Subscribe to /v1/chat/{session_id}/stream to receive intermediate progress events while the agent works — intermediate (status updates), proposed_change (HITL diffs), document_sync (chunk-ID sync after upload), final (completed response), usage (operation count + tokens), error. Auto-reconnect on drop.
Best for: Frontends that show live progress, status bars, or partial results before the full answer is ready.
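The event names above are from the stream description; everything else in this sketch is illustrative. It parses the standard text/event-stream wire format from a sample string, so it runs offline — in production you would iterate over the live HTTP response from /v1/chat/{session_id}/stream instead, and the data payloads would carry the real schemas.

```python
import json

# Hypothetical sample of what a few frames on the stream might look like.
SAMPLE_STREAM = (
    "event: intermediate\n"
    'data: {"status": "Searching for the force majeure clause"}\n'
    "\n"
    "event: usage\n"
    'data: {"operations": 1, "tokens": 812}\n'
    "\n"
    "event: final\n"
    'data: {"reply": "Done - clause updated."}\n'
    "\n"
)

def parse_sse(stream: str):
    """Yield (event_name, payload) pairs from an SSE text stream."""
    event, data_lines = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:  # blank line terminates a frame
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []

events = list(parse_sse(SAMPLE_STREAM))
print([name for name, _ in events])  # ['intermediate', 'usage', 'final']
```

A frontend would dispatch on the event name: show `intermediate` payloads in a status bar, render `proposed_change` as an approval dialog, and treat `final` as the completed answer.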
Multi-format I/O
Multi-format input and output
| Input formats | Output formats |
|---|---|
| .docx, PDF, HTML, Markdown, RTF, plain text | .docx, PDF, HTML |
Any input format can be exported to any output format — useful for conversion pipelines (e.g., .docx, Markdown → PDF) or accepting user uploads in whatever format they have.
Pre-signed URL upload and download
For files larger than ~100KB, the agent gets a 5-minute pre-signed PUT URL plus a ready-to-run curl example. The agent uploads the file directly to cloud storage — bytes never pass through the agent’s context window. Same pattern in reverse for downloads (15-minute GET URL). For a 100-page styled .docx, this saves the agent ~70K tokens per upload (vs base64 inline). Five turns of editing on the same document drops from ~700K tokens of context overhead to ~3K tokens with the matching response_mode='compact' on chat.
Best for: Production AI agents working with real-world document sizes (multi-page contracts, manuals, regulatory filings). Max file size 100 MB.
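The two-step flow above can be sketched as follows. Both HTTP calls are stubbed so the example runs offline; in production, the first call is the real request_upload_url API and the PUT goes straight to cloud storage, which is exactly why the file bytes never enter the agent's context window. The stubbed response fields are assumptions for illustration.

```python
def request_upload_url_stub(filename: str) -> dict:
    """Stand-in for the request_upload_url call (5-minute PUT URL)."""
    return {
        "upload_url": f"https://storage.example.com/{filename}?sig=abc&expires=300",
        "expires_in_seconds": 300,
    }

def put_bytes_stub(url: str, data: bytes) -> int:
    """Stand-in for the direct PUT to cloud storage; returns an HTTP status."""
    return 200 if data else 400

def upload_document(filename: str, data: bytes) -> str:
    grant = request_upload_url_stub(filename)           # step 1: get signed URL
    status = put_bytes_stub(grant["upload_url"], data)  # step 2: PUT the bytes
    if status != 200:
        raise RuntimeError(f"upload failed with HTTP {status}")
    # Step 3 would be process_uploaded_document so the file gets indexed.
    return grant["upload_url"]

url = upload_document("contract.docx", b"PK\x03\x04...docx bytes...")
print(url.split("?")[0])  # https://storage.example.com/contract.docx
```

Downloads mirror this: request a 15-minute GET URL, then fetch the exported file directly from storage.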
Developer & integration
MCP server — 21 tools, every major client
The same capabilities are exposed as a Model Context Protocol server at https://api.superdocs.app/mcp (Streamable HTTP transport). Compatible clients render the 21 tools natively:
| Client | Status |
|---|---|
| Claude Code | Tools + Prompts ✓ |
| Claude Desktop | Tools + Prompts ✓ |
| Cursor | Tools + Prompts ✓ |
| VS Code (GitHub Copilot) | Tools + Prompts ✓ |
| Zed, Continue, Amazon Q CLI, and others | Tools + Prompts ✓ |
| Windsurf, Cline | Tools only |
The workflow prompts are user-invocable directly (/superdocs:edit_styled_docx, /superdocs:convert_format, etc. in clients that render MCP prompts as slash commands). One MCP server entry in your client config; both tools and prompts come together.
Best for: AI-coding-tool users who want SuperDocs available in their editor without writing API integration code.
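For reference, a client config entry might look like the sketch below. The `mcpServers`/`url` key shape follows the common Cursor/Claude-style config convention and is illustrative; check your client's MCP documentation for its exact format. Only the endpoint URL is from this page.

```json
{
  "mcpServers": {
    "superdocs": {
      "url": "https://api.superdocs.app/mcp"
    }
  }
}
```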
Three authentication paths
| Method | Best for |
|---|---|
| Web app login | The web app at use.superdocs.app (auto-issued via Google Sign-In or email/password) |
| User API key (sk_…) | Individual developers, MCP integrations, scripts |
| Organization API key (lce_…) | B2B integrations with shared usage limits |
REST API works with any programming language
One REST API (22 endpoints) works with any language that can make HTTP requests. No NPM packages to keep current, no language-specific SDKs to maintain, no framework dependencies to upgrade. Integrate with a few lines of code in whatever language your backend already speaks. Full OpenAPI specification published — generate your own typed client with openapi-generator, Stainless, or any other codegen tool of your choice if you want type hints in your editor.
Best for: Polyglot engineering teams, backend services across multiple languages, avoiding SDK lock-in, simple integrations that don’t warrant a full SDK.
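As a sketch of how small an SDK-free integration is, the snippet below builds an authenticated request with nothing but the Python standard library. The endpoint path and body fields are assumptions for illustration — substitute the real ones from the API Reference. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_KEY = "sk_live_example"  # user API key (sk_...) from the table above

body = json.dumps({
    "session_id": "quarterly-report",
    "message": "Bold the second paragraph in section 4",
}).encode()

req = urllib.request.Request(
    "https://api.superdocs.app/v1/chat",  # illustrative path
    data=body,
    method="POST",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; the same three lines of
# "build URL, set auth header, POST JSON" exist in every language.
print(req.get_method(), req.full_url)
```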
Real-time usage tracking and transparency
Every API response includes usage data — operation count consumed, tokens used. The SSE stream emits a usage event after each operation. Your dashboard shows remaining operations in your current tier and resets on your billing cycle. Promo allowances deplete before paid-tier operations so you always burn the cheapest credits first. Full transparency at every step — no hidden costs, no monthly surprises, no need to call sales for usage data.
Best for: Cost-conscious integrations, monitoring spend, forecasting overage, teams on a budget, self-service deployments at scale.
Scale & operations
Sessions and persistence
Every conversation is keyed by a session_id — a string the caller chooses. The full document state, conversation history, attachments, pending HITL changes, and AI working memory persist across calls and across server restarts. Reload an old session days later and the AI still has the full context.
Best for: Long-running document workflows, async editing where the user comes back hours later, multi-turn editing where state carries between turns.
Async jobs with HITL state
Long-running edits and HITL workflows return a job_id; the client polls or subscribes to SSE updates. State persists in a database, so any backend instance can pick up a job mid-flight (autoscaling, restarts, redeploys all safe). Approved changes resume automatically.
Best for: Any workflow that takes more than 30 seconds or needs human approval mid-flight.
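A sketch of the polling side. poll_job_stub stands in for a GET on the job-status endpoint so the example runs offline; the status values and field names are assumptions for illustration. In production you would sleep between polls or subscribe to the SSE stream instead of spinning.

```python
# Scripted sequence of statuses the stub will return, one per poll.
_STATES = iter(["queued", "running", "awaiting_approval", "running", "done"])

def poll_job_stub(job_id: str) -> dict:
    """Stand-in for fetching job status from the API."""
    return {"job_id": job_id, "status": next(_STATES)}

def wait_for_job(job_id: str, max_polls: int = 10) -> dict:
    for _ in range(max_polls):
        job = poll_job_stub(job_id)
        if job["status"] == "awaiting_approval":
            # A HITL change is pending; approving it resumes the job.
            continue
        if job["status"] in ("done", "failed"):
            return job
    raise TimeoutError(f"job {job_id} did not finish in {max_polls} polls")

result = wait_for_job("job-123")
print(result["status"])  # done
```

Because job state lives in a database rather than in-process, the instance answering each poll need not be the one that started the job.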
Per-organization feature flags
B2B deployments can toggle specific features on or off per organization. Offer one platform integration to all customers but let Enterprise org A use a custom branding skin while Startup org B sticks with the default. Different orgs can have different rate limits, feature sets, or experimental rollouts — all from the same codebase, all controlled by API. Useful for staged feature rollouts, customer-specific customization, and enterprise tier differentiation.
Best for: Multi-tenant B2B platforms serving different customer tiers, gradual feature rollouts to specific orgs, enterprise customization without per-customer deployments.
Promo codes and credit allowances
Issue promo codes that grant temporary operation allowances (“LAUNCH50” = 50 ops valid for 30 days, max 200 redemptions). Users redeem in Settings. Promo operations deplete before paid-tier operations so users get the most out of their allowance. Every redemption is tracked and auditable. Useful for go-to-market campaigns, partner enablement, customer pilots, and time-limited free trial extensions.
Best for: Product launches, partner programs, customer pilots, trials, conferences, hackathons.
Multi-language editing
Natural-language instructions and document content both work across many languages — production users have edited documents in English, Spanish, French, Hebrew, Korean, Mandarin, and others (16+ languages confirmed in real usage so far). Write your prompt in one language, edit a document in another, get the AI’s reply in whichever language you wrote the request. Multilingual documents (e.g., bilingual contracts) are handled correctly. Tone and formality conventions are adapted per language.
Best for: International teams, multilingual document workflows, organizations serving non-English markets, contract translation and adaptation, cross-border legal work.
Build vertical AI on SuperDocs
Combine attachments (your domain knowledge), sessions (long-running workflows), and chat instructions to build domain-specific applications on top of SuperDocs. Contract AI: attach your standard clause library + draft instructions, get an AI that writes contracts in your house style. Compliance AI: attach your regulations + policy templates, get an AI that audits documents for compliance gaps. Marketing AI: attach your brand voice guides + past collateral, get an AI that produces on-brand content. Same platform, different domain — all configurable per session or per organization.
Best for: Vertical SaaS platforms, agencies serving specific industries, organizations with strong domain languages, anyone building specialized document workflows.
On the roadmap
Time Travel & Version History (Coming Soon)
Restore documents to any prior state and browse a timeline of every edit. Useful for “oops, undo that” moments, compliance audits, and exploring alternate editing paths without losing the current version. Audit trail of who edited what, when. The state-snapshot infrastructure already runs internally on every chat turn; the user-facing API and UI to browse and restore versions ships next.
Best for: Multi-stakeholder workflows, regulatory compliance audits, exploring alternate edits without losing the current version, recovering from mistakes.
What ships when
A live timeline of major capabilities and when they shipped. Older capabilities don’t get less reliable over time — once shipped, they stay covered by the regression suite and the production monitoring stack.
| Date | Capability | Why it matters |
|---|---|---|
| 2026-04-25 | Expanded features documentation (this page) — 30+ capabilities grouped by use case | Developers and AI agents form a complete mental model of what SuperDocs can do |
| 2026-04-25 | MCP server unified — 21 tools + 4 user-invocable workflow prompts on a single /mcp endpoint | Single MCP config entry covers both; discoverable slash commands for Cursor/Claude Code/Claude Desktop users |
| 2026-04-25 | Pre-signed URL upload/download flow (request_upload_url, process_uploaded_document, request_download_url) | Bytes no longer pass through agent context window; viable for real-world file sizes |
| 2026-04-25 | Compact response mode (response_mode='compact' + chunk_diffs) | ~140× token reduction for editing sessions on large documents |
| 2026-04-25 | Capability-forward MCP tool descriptions across all 21 tools | Agents form correct mental models of when to use SuperDocs vs build from scratch |
| 2026-04-23 | MCP HTTP transport reliability fix | Eliminates 5-minute hang clients (Claude Code, Cursor, Bun) saw on first connect |
| 2026-04-22 | OAuth Protected Resource Metadata (RFC 9728) for MCP | Cursor 3.x / Claude Code 2.x / mcp-remote can now connect without 60s metadata-probe timeout |
| 2026-04-19 | Editing latency improvements for large documents | 4m55s → ~10s for typical edit operations on large documents |
| 2026-04-18 | Editing precision improvements for nuanced instructions | Eliminates over-broad edits when the user’s instruction was narrowly-scoped |
| Earlier 2026 | Async jobs + HITL durable state, SSE streaming, multimodal vision, multi-format export, MCP server, promo codes, billing | Foundation |
For schemas, parameters, and code examples, see the API Reference and the MCP Tools Reference. For workflow guides, see the Guides section.

