Features

SuperDocs is a universal AI document platform — one API for editing, drafting, searching, summarizing, generating visual content (images, diagrams, drawings, equations), and exporting styled documents (.docx, PDF, HTML, Markdown, plain text). Every capability below is live in production at https://api.superdocs.app and exposed identically over the REST API and the MCP server at https://api.superdocs.app/mcp (25 tools + 4 user-invocable workflow prompts on a single endpoint). This page is exhaustive — it’s a reference, not a marketing summary. Skip to the group that matches what you’re trying to build.

Core editing intelligence

Section-precision editing

Documents load as structured HTML where every paragraph, heading, table, row, and cell carries a unique identifier internally. The AI can target a specific section without touching anything else — “remove row 3 of the pricing table”, “bold the second paragraph in section 4”, “replace the governing-law clause” all work as natural-language instructions. This survives across multi-turn editing, so a 100-page document edited 30 times still has the same coherent structure at the end as it did at the start. Plain-text rewrites lose this completely. Best for: Long documents, contracts, SOPs, formal reports — anywhere targeted edits matter more than wholesale rewrites.

Style preservation on edit and export

Tables (with borders, alternating row shading, merged cells), fonts, font sizes, colors, inline styling, lists, indentation, and headers/footers all survive both AI edits AND the round-trip to .docx / PDF / HTML. Most general-purpose AI tools strip or mangle these — SuperDocs is built around preserving them. Best for: Branded templates, legal documents, formal correspondence — anywhere format fidelity is part of the deliverable.

Document intelligence

Search within documents

Search isn’t just full-text matching — the AI understands semantic meaning. Ask “find all indemnity clauses” across a 100-page contract and get back the exact sections, even when they use different wording. Works against the active document or across all attachments at once. Results come back with surrounding context so you can verify the match without opening the file. Best for: Contract review, compliance audits, extracting specific terms from long documents, finding all references to a concept across multiple files.

Summarize sections on demand

Ask “summarize the force majeure section” or “give me a 2-line summary of the payment terms” and the AI extracts just that section, then summarizes it at the level of detail you want. Pair with semantic search — “find and summarize all limitation-of-liability clauses” works as a single instruction. No need to read line-by-line. Best for: Contract reviews, due diligence, quick understanding of specific parts of long documents, briefing materials.

Summarize entire documents at any length

Get a concise summary of an entire document at a length you specify. “Executive summary in three sentences”, “one-page overview”, “detailed bullet-point summary by section” — all work. The AI reads the full document and distills it efficiently. Summaries cover key sections, critical terms, and overall structure. Works on documents up to ~100 pages without context-window pressure. Best for: Quick understanding of newly-received documents, briefing executives, preparing for negotiations, extracting key terms from regulatory filings.

Cross-document reference and synthesis

Upload multiple documents into a session and the AI references any of them while editing the active document. “Port the indemnity clause from the attached template into this contract”, “compare these two NDAs and align the payment terms”, “find clauses in the attached precedent that handle this case better than what’s currently in the draft” — all work as natural-language instructions. The AI searches across all attachments, surfaces matches, and adapts content to the current context. Best for: Standardizing language across contracts, porting terms from templates, comparing versions, drafting new docs based on prior precedents, building from clause libraries.

Rich editing & formatting

Full rich-text formatting toolbar

Beyond basic editing — the AI applies bold, italic, underline, strikethrough, text color (10-color palette), text highlight, custom font selection, six preset font sizes (12px–30px), configurable line spacing (single, 1.15, 1.5, double), and text alignment (left, center, right, justify). All preserved on round-trip exports. The format painter lets you copy a style from one selection and apply it to another with a single click — useful for matching styling across a long document. Best for: Visually distinctive documents, applying brand style guides, adapting documents for different audiences, preserving template styling on edit.

Heading and list hierarchies

Apply heading styles (H1, H2, H3) with hierarchy preserved on export. Create bulleted and numbered lists with automatic numbering. Nest lists up to 8 levels deep. Convert between bullet and numbered formats by natural-language instruction. Add blockquotes for citations or emphasis. Horizontal rules to divide sections. The AI respects these structures — “convert the first three paragraphs to a bulleted list” works without reformatting surrounding content. Best for: SOPs, structured documentation, proposals, outlines, academic papers, anywhere hierarchy matters.

Table editing with cell-level control

Insert and delete tables and rows. Merge or split cells. Apply alternating row shading. Set border styles. All cell-level operations work via natural language: “merge the first three columns of row 2”, “add a row after row 4 with these values”, “split the merged cell in A1”. Tables survive round-trip to .docx and PDF with formatting intact. Useful for pricing tables, comparison matrices, and any data-heavy document. Best for: Documents with pricing or data, such as proposals, contracts with pricing schedules, technical specifications, comparison charts.

Headers, footers, and hyperlinks

Add headers and footers that appear on every page (or specific pages): page numbers, document titles, company names, dates. Edit existing headers/footers directly. All of it survives PDF and .docx export. Hyperlinks work the same way: “make this text link to our website” or “remove the broken link in section 4”. Both internal document references and external URLs are supported. Best for: Formal business documents, branded templates, compliance documents requiring page numbering, documents with cross-references or call-to-action links.

Visual content & media

Generate, edit, and embed images, diagrams, drawings, and equations directly inside your document — all by chatting. Every operation rides the same chat interface; no separate tools or modes to learn.

Generate images from a description

Describe the image you want and the AI creates it and inserts it into the document. Photorealistic shots, illustrations, hero images, logos, infographics with text — all by natural-language instruction. The AI picks the right generation model automatically: a photorealistic image generator by default, and a text-rendering-optimized model for prompts that need legible text on the image (logos, infographics, slide hero shots, anything where letterforms matter more than photographic realism). Generated images live in the document like any other photo — they can be edited, replaced, or removed afterward by chat. Best for: Marketing pages, proposals, sales decks, blog posts, product specs, anything where finding stock imagery is friction.

Edit any image with natural language

Click any image in the editor and tell the AI what to change. Supported edits include brightness, darkness, contrast, recolor (e.g., “warm sepia”, “cooler blues”), circular mask with transparent corners, tight crop to the subject, semi-transparent fade, and full background removal — plus anything else you’d describe in plain English to a photo editor. The AI generates new bytes and swaps them in. Subject identity (faces, geometry, branding) is preserved across the edit. Each edit takes about 20 seconds. You can iterate (“a bit warmer”, “now crop tighter”) and use Reset to Original to revert to the original bytes at any time. Best for: Marketing assets that need to match a campaign, reports and contracts where logos need a quick cleanup, design docs where reference shots need treatment, anywhere a small photo edit would otherwise mean leaving the document.

Replace an image with an attachment

Attach a fresh image to chat (the new logo, the updated headshot, the corrected diagram screenshot) and ask the AI to swap it in for an existing image in the document. Pure URL swap — no AI generation cost, no waiting. Works for one-off updates (“replace the cover photo with this”) and for recurring patterns where users supply a fresh asset on every iteration of the document. Best for: Branded templates that need per-customer assets, document workflows where users supply their own imagery, content updates after a rebrand.

Auto-rendered diagrams (Mermaid)

Describe a flowchart, sequence diagram, org chart, ER diagram, mindmap, timeline, or Gantt and the AI emits the diagram code; the editor live-renders it as a clean SVG. Re-editing is instant and free of regeneration cost — say “add a Police Verification step after Background Check” and the AI rewrites the diagram in place rather than regenerating the whole image. Diagrams export correctly in PDF and .docx (rendered to image at export time). The full Mermaid syntax catalog is supported — flowchart, sequence, class, state, ER, user journey, gantt, pie, mindmap, timeline, xychart. Best for: Technical documentation, process docs, runbooks, architecture overviews, training materials — anywhere a diagram beats a paragraph.

Hand-drawn sketches with a built-in canvas

Open a freehand drawing canvas inside the editor — sketch quickly with shapes, arrows, and freehand strokes, save, and the drawing lands in the document as an image. Click it later, hit Redraw, and the canvas reopens with your original strokes so you can tweak them. The same drawing can also be edited as an image via natural language (“turn this rough sketch into a realistic logo”) — and the original strokes are still kept on the image, so you can redraw from scratch even after AI editing. Best for: Whiteboarding inside a doc, quick architecture sketches, annotating a screenshot, turning a hand-sketched logo into something polished without leaving the editor.

Math equations (LaTeX / KaTeX)

Insert math equations using familiar LaTeX syntax — inline ($x^2$) or block ($$x^2 + y^2 = z^2$$). The editor live-renders them with KaTeX. Equations survive .docx and PDF export. The toolbar has an equation button that prompts for LaTeX so you don’t have to remember the delimiters; in chat you can also just say “add the quadratic formula here” and the AI will insert the right LaTeX in place. Best for: Academic papers, technical reports, finance documents with formulas, engineering specs, anywhere math notation needs to be written correctly the first time.
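
As an illustration of the block form, a request like “add the quadratic formula here” would correspond to LaTeX along these lines. This is a sketch of the expected input, not a record of the exact markup the AI emits:

```latex
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
```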

Auto-generated table of contents

Drop a table-of-contents block into the document and it auto-populates from your H1/H2/H3 headings. Add or rename a heading later — the TOC updates itself. Click a TOC entry to jump to that section. Survives export to PDF and .docx (rendered as a list of section titles). Best for: Long documents, manuals, books, regulatory filings — anywhere 10+ pages where readers need to navigate.

Knowledge & attachments

Multimodal vision on attachments

Attach images (PNG, JPG, GIF, WebP), screenshots, scanned forms, diagrams, and charts — the AI interprets them visually while editing the active document. Transcribe a screenshot into structured text, extract numbers from a chart, reference a diagram while drafting documentation, identify entities in a scanned form. Best for: Workflows that mix text documents with image references — design docs, compliance docs that reference scanned forms, technical writing that references architecture diagrams.

Build a knowledge base from attachments

Upload your organization’s template library, style guides, past contracts, or SOP documents as attachments. The AI references them automatically when editing the active document — “make this sound like our standard tone”, “follow the format of our usual NDAs”, “use the indemnity language from our gold-standard contract”. Builds an institutional memory that shapes AI behavior. Attachments are scoped per session, so a knowledge-base session can be reused as a starting point and copied for each new document. Best for: B2B teams with house styles or templates, legal teams with clause libraries, marketing teams with brand voice, compliance teams with corporate policies.

Semantic search across attachments

Every attachment is semantically indexed when uploaded — the AI knows what’s in it, not just the words it contains. Ask “find the data processing clause from the attached regulations” and the search returns the matching section even when the attached doc uses entirely different wording. Useful for large attachments (50+ pages) where keyword search would miss relevant sections, and for reference documents you query repeatedly across sessions. Best for: Reference docs (regulations, industry standards, policy manuals), large attachments queried multiple times, building searchable knowledge over time.

Conversation & robustness

Multiple open documents in one session

A session can hold several open documents at once — like editor tabs. The AI sees the full roster, edits whichever document your request targets (explicit document_id, named in the message, or the focused one by default), reads or searches across all of them, moves content between them, and can open a brand-new document on request (“put the summary in a new document”). Uploads choose their behavior via open_mode (replace / new_focused / background), and three endpoints manage the roster (list / focus / close) — also exposed as MCP tools. See the Multi-Document Sessions guide. Best for: invoice + rate card workflows, contract + amendment pairs, splitting AI output into a separate deliverable, any flow where one conversation spans several files.
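
The targeting order described above (explicit document_id, then a document named in the message, then the focused document) can be pictured with a small client-side sketch. The function name, the roster shape, and the substring matching are assumptions made for illustration; the service resolves targets itself and its actual logic is not documented here.

```python
from typing import Optional


def resolve_target_document(
    roster: dict[str, str],
    focused_id: str,
    message: str,
    explicit_document_id: Optional[str] = None,
) -> str:
    """Return the id of the document a request would target.

    roster maps a document name to its id (hypothetical shape).
    """
    # 1. An explicit document_id always wins.
    if explicit_document_id is not None:
        return explicit_document_id
    # 2. Otherwise look for a document named in the message
    #    (case-insensitive substring match, an assumption).
    lowered = message.casefold()
    for name, doc_id in roster.items():
        if name.casefold() in lowered:
            return doc_id
    # 3. Fall back to the focused document.
    return focused_id
```

Passing document_id explicitly is the only one of the three paths that does not depend on how the message is worded, which is why integrations that need deterministic targeting would likely prefer it.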

Persistent conversation context

Every message in a session carries the full conversation history. Ask “make that more formal”, then “add three paragraphs of detail to what you just wrote”, then “change the tone back to friendly” — the AI remembers all prior context, edits, and stated preferences. History persists across server restarts and redeploys. Reload a session weeks later and the AI still has the full context (the document, the attachments, every turn of the conversation, every change made). Best for: Iterative editing workflows, multi-turn refinements, long-running document projects, anything where the user comes back to continue work later.

Automatic error recovery with fallbacks

When an edit fails on the first approach (e.g., the section the AI tried to find didn’t exist with those exact keywords), it automatically tries broader strategies — semantic search instead of keyword match, alternative phrasings of the user intent, looking in adjacent sections. Retries happen without user intervention. If all approaches genuinely fail, the AI explains what it tried and asks for clarification with specifics rather than a generic error. Dramatically reduces “I didn’t understand your request” loops. Best for: Complex documents with varied terminology, ambiguous instructions, long documents where sections may have been edited since last viewed.

Human control & approval

Human-in-the-loop approval

For sensitive edits (legal contracts, financial filings, anything user-facing), set approval_mode='ask_every_time' on chat_async and the agent surfaces each proposed change as a structured diff (chunk-level before/after HTML + an explanation) for the user to approve, deny, or send back with feedback. Approved changes apply atomically; denied changes leave the document untouched. State persists across server restarts so multi-step approvals survive autoscaling. Best for: Multi-stakeholder workflows, regulated industries, anywhere the cost of a bad edit landing without review exceeds the cost of a confirmation click.
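
The approve/deny semantics above can be modelled in a few lines. This is a minimal sketch of the behaviour from the client's point of view, not the service's implementation; the ProposedChange fields and the chunk dictionary are hypothetical names chosen to mirror the “chunk-level before/after HTML + an explanation” description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedChange:
    chunk_id: str       # hypothetical identifier for the edited section
    before_html: str
    after_html: str
    explanation: str


def apply_decision(
    chunks: dict[str, str],
    changes: list[ProposedChange],
    approved: bool,
) -> dict[str, str]:
    """Return the document chunks after a reviewer decision."""
    # Denied changes leave the document untouched.
    if not approved:
        return dict(chunks)
    # Work on a copy so a failure part-way through changes nothing (atomic apply).
    updated = dict(chunks)
    for change in changes:
        if updated.get(change.chunk_id) != change.before_html:
            raise ValueError(f"chunk {change.chunk_id} no longer matches the diff")
        updated[change.chunk_id] = change.after_html
    return updated
```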

Compact response mode for long editing sessions

On documents larger than ~20 pages, the default chat response includes the entire updated HTML on every turn — that’s ~130K tokens for a 100-page styled doc. Set response_mode='compact' and the response includes only chunk_diffs (per-section before/after for sections that actually changed) — typically 1-3 chunks, ~500-2,000 tokens. For a 5-turn editing session on a 100-page doc, compact mode reduces total response context from ~650K tokens to ~3K tokens. To read sections in compact mode, just ask in natural language — “show me the force majeure clause” returns the content in the chat reply text. Best for: AI agents editing documents larger than ~20 pages where context window pressure matters.
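
The 5-turn comparison above is simple multiplication. The sketch below reproduces it, assuming ~600 tokens per compact turn (near the low end of the quoted ~500-2,000 range, which is what the ~3K total implies); the constant names are illustrative.

```python
FULL_HTML_TOKENS_PER_TURN = 130_000  # ~100-page styled doc, from the text above
COMPACT_TOKENS_PER_TURN = 600        # assumed value within the ~500-2,000 range


def session_response_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total response tokens accumulated over an editing session."""
    return turns * tokens_per_turn


full_total = session_response_tokens(5, FULL_HTML_TOKENS_PER_TURN)    # 650_000
compact_total = session_response_tokens(5, COMPACT_TOKENS_PER_TURN)   # 3_000
```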

Real-time progress via SSE

Subscribe to /v1/chat/{session_id}/stream to receive intermediate progress events while the agent works — intermediate (status updates), proposed_change (HITL diff for an isolated change), proposed_change_batch (HITL diff for a multi-change turn delivered as one event), document_sync (chunk-ID sync after upload), final (completed response), usage (operation count + tokens), ui_pointer (highlight an export/download UI affordance the user asked about), error. Auto-reconnect on drop. Best for: Frontends that show live progress, status bars, or partial results before the full answer is ready.
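
A client consuming this stream needs to split it into named events and watch for both HITL event names. The following is a minimal sketch of a standard SSE line parser; it assumes each event's data field is a JSON object, which the text above does not state, and the payload shapes in the usage example are invented for illustration.

```python
import json
from typing import Iterable, Iterator

# Both HITL event names listed above; a client should handle each of them.
REVIEW_EVENTS = {"proposed_change", "proposed_change_batch"}


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space is dropped.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

For example, feeding it the lines of a response body and filtering on REVIEW_EVENTS gives the events that need an approval card, whether they arrive singly or batched.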

Multi-format I/O

Multi-format input and output

Input formats: .docx, PDF, HTML, Markdown, RTF, plain text
Output formats: .docx, PDF, HTML, Markdown, plain text
Same parsing pipeline regardless of input — once a document is loaded into a session, every editing tool works identically. Export from any session in any of the five output formats, with per-export customisation for paper size, orientation, margins, filename, image embedding, and optional PDF watermarks. Documents above 20 MB use a pre-signed upload pattern to bypass standard request body limits; documents above 100 MB are rendered asynchronously and delivered via email. Best for: Format conversion workflows (HTML → styled .docx, Markdown → PDF, .docx → Markdown), accepting user uploads in whatever format they have, or producing print-ready PDFs with custom paper sizes and watermarks.
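
The size thresholds above amount to a three-way decision a client can make before exporting. This sketch assumes binary megabytes and invents the flow labels; only the 20 MB and 100 MB boundaries come from the text.

```python
MB = 1024 * 1024  # assumption: the limits are measured in binary megabytes


def choose_export_flow(size_bytes: int) -> str:
    """Pick a delivery path from the document size (labels are illustrative)."""
    if size_bytes <= 20 * MB:
        return "direct"            # fits in a standard request body
    if size_bytes <= 100 * MB:
        return "presigned_upload"  # above 20 MB: pre-signed upload pattern
    return "email"                 # above 100 MB: rendered async, sent by email
```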

High-fidelity PDF import

PDFs upload with their full content preserved — page text, annotations (sticky notes, highlights, FreeText callouts, reviewer comments), and embedded images. Every annotation is visible to the AI with the original author and date attached, so editing instructions like “apply the corrections from the auditor’s notes” or “reflect the reviewer’s comments in section 3” work natively. Embedded images (logos, photos, diagrams) become full first-class images you can replace, edit, or reference visually — they don’t get silently dropped to text-only. Best for: Compliance and audit workflows where PDFs carry reviewer annotations; brand and marketing collateral PDFs where embedded photos must round-trip; any enterprise GRC review cycle that uses annotated PDFs.

Word track-changes import

Word documents uploaded with Track Changes enabled keep all of it on import — insertions, deletions, comments — each tagged with the reviewer’s name and date. The AI sees who edited what and can act on it: “apply all the proofreader’s insertions”, “reject the deletions in section 2”, “address the comments from the legal reviewer”. Editorial workflows that depend on review markup (academic supervisor edits, legal redlines, teacher-corrected student work) translate straight from Word into SuperDocs. Best for: Editorial review cycles, redline workflows, teacher-student document review, multi-stakeholder review where preserving the trail of who-changed-what matters.

Reliable multi-file attachments

Attach more than one file in the same message and reference each by filename: “compare the clauses in contract_v2.pdf against contract_v1.pdf” or “merge outline.txt into proposal.docx”. Lookup is by filename (case-insensitive, partial matches accepted), so the AI doesn’t have to guess from content similarity which file you mean. Best for: Document comparison and merge workflows, multi-file review, proposal assembly from supporting attachments.
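
Case-insensitive partial matching on filenames can be sketched as follows. How the service handles a query that matches several files is not stated above; raising an error in that case is an assumption of this example, as is the function name.

```python
def find_attachment(filenames: list[str], query: str) -> str:
    """Return the single filename matching query, ignoring case."""
    needle = query.casefold()
    # Prefer an exact (case-insensitive) filename match.
    for name in filenames:
        if name.casefold() == needle:
            return name
    # Otherwise accept a partial match, but only if it is unambiguous.
    matches = [name for name in filenames if needle in name.casefold()]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} attachments match {query!r}")
    return matches[0]
```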

Document length awareness

Ask “how many pages is this?” and get a real answer — exact for paginated formats (.docx, PDF) and a clearly-labelled approximation (“approximately N pages”) for unpaginated formats (HTML, Markdown, plain text). The page count is available as part of attachment metadata so any agent can read it without an extra round trip. Best for: Length-sensitive workflows (briefs with strict page limits, summaries that target a length, budgeting AI work against document size).

Pre-signed URL upload and download

For files larger than ~100KB, the agent gets a 5-minute pre-signed PUT URL plus a ready-to-run curl example. The agent uploads the file directly to cloud storage from the shell, so the bytes never pass through the agent’s context window. The same pattern works in reverse for downloads (15-minute GET URL). For a 100-page styled .docx, this saves the agent ~70K tokens per upload (vs base64 inline). Five turns of editing on the same document drop from ~700K tokens of context overhead to ~3K tokens when combined with response_mode='compact' on chat. Best for: Production AI agents working with real-world document sizes (multi-page contracts, manuals, regulatory filings). Max file size 100 MB.
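
For callers that prefer Python over curl, the upload step could look roughly like this. It is a sketch using only the standard library; the URL in the usage note is a placeholder, and any headers the real pre-signed URL requires are not covered here.

```python
import urllib.request


def build_presigned_put(upload_url: str, file_bytes: bytes) -> urllib.request.Request:
    """Prepare a PUT of the raw file bytes to a pre-signed URL."""
    # The bytes go straight to storage; nothing is base64-encoded into a chat message.
    return urllib.request.Request(upload_url, data=file_bytes, method="PUT")
```

Sending the prepared request is then a single call, for example urllib.request.urlopen(build_presigned_put(url, data)), made within the URL's 5-minute validity window.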

Developer & integration

MCP server — 25 tools, every major client

The same capabilities are exposed as a Model Context Protocol server at https://api.superdocs.app/mcp (Streamable HTTP transport). Compatible clients render the 25 tools natively:
  • Claude Code: Tools + Prompts ✓
  • Claude Desktop: Tools + Prompts ✓
  • Cursor: Tools + Prompts ✓
  • VS Code (GitHub Copilot): Tools + Prompts ✓
  • Zed, Continue, Amazon Q CLI, and others: Tools + Prompts ✓
  • Windsurf, Cline: Tools only
The same MCP server also exposes 4 user-invocable workflow templates (surfaced as /superdocs:edit_styled_docx, /superdocs:convert_format, etc. in clients that render MCP prompts as slash commands). One MCP server entry in your client config; both tools and prompts come together. Best for: AI-coding-tool users who want SuperDocs available in their editor without writing API integration code.

Three authentication paths

Each method and what it is best for:
  • Web app login: the web app at use.superdocs.app (auto-issued via Google Sign-In or email/password)
  • User API key (sk_…): individual developers, MCP integrations, scripts
  • Organization API key (lce_…): B2B integrations with shared usage limits
All three reach the same 25 MCP tools and 27 REST endpoints with identical scoping. Users own and manage their own keys; orgs manage theirs separately.
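
Because the two key types carry distinct prefixes, a client can tell them apart before making a call. A small sketch, assuming the prefixes shown above are the only ones in use; the function name and return labels are illustrative.

```python
def classify_api_key(key: str) -> str:
    """Classify a key by its prefix: 'user' for sk_, 'organization' for lce_."""
    if key.startswith("sk_"):
        return "user"
    if key.startswith("lce_"):
        return "organization"
    raise ValueError("unrecognized API key prefix")
```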

REST API works with any programming language

One REST API (27 endpoints) works with any language that can make HTTP requests. No NPM packages to keep current, no language-specific SDKs to maintain, no framework dependencies to upgrade. Integrate with a few lines of code in whatever language your backend already speaks. A full OpenAPI specification is published, so you can generate your own typed client with openapi-generator, Stainless, or any other codegen tool of your choice if you want type hints in your editor. Best for: Polyglot engineering teams, backend services across multiple languages, avoiding SDK lock-in, simple integrations that don’t warrant a full SDK.

Real-time usage tracking and transparency

Every API response includes usage data — operation count consumed, tokens used. The SSE stream emits a usage event after each operation. Your dashboard shows remaining operations in your current tier and resets on your billing cycle. Promo allowances deplete before paid-tier operations so you always burn the cheapest credits first. Full transparency at every step — no hidden costs, no monthly surprises, no need to call sales for usage data. Best for: Cost-conscious integrations, monitoring spend, forecasting overage, teams on a budget, self-service deployments at scale.

Scale & operations

Sessions and persistence

Every conversation is identified by a session_id, a string the caller chooses. The full document state, conversation history, attachments, pending HITL changes, and AI working memory persist across calls and across server restarts. Reload an old session days later and the AI still has the full context. Best for: Long-running document workflows, async editing where the user comes back hours later, multi-turn editing where state carries between turns.

Async jobs with HITL state

Long-running edits and HITL workflows return a job_id; the client polls or subscribes to SSE updates. State persists in a database, so any backend instance can pick up a job mid-flight (autoscaling, restarts, redeploys all safe). Approved changes resume automatically. Best for: Any workflow that takes more than 30 seconds or needs human approval mid-flight.
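
The polling side of this can be sketched as a loop around any function that fetches job state. The status names and the shape of the job dictionary are assumptions, since the text above only says a job_id is returned; the default timeout matches the 30-minute chat-turn cap listed in the timeline on this page.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # assumed status names


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval_s)
```

Because job state lives in a database rather than on one instance, a loop like this should keep working across restarts or redeploys on the server side; subscribing to SSE avoids the polling interval entirely.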

Per-organization feature flags

B2B deployments can toggle specific features on or off per organization. Offer one platform integration to all customers but let Enterprise org A use a custom branding skin while Startup org B sticks with the default. Different orgs can have different rate limits, feature sets, or experimental rollouts — all from the same codebase, all controlled by API. Useful for staged feature rollouts, customer-specific customization, and enterprise tier differentiation. Best for: Multi-tenant B2B platforms serving different customer tiers, gradual feature rollouts to specific orgs, enterprise customization without per-customer deployments.

Promo codes and credit allowances

Issue promo codes that grant temporary operation allowances (“LAUNCH50” = 50 ops valid for 30 days, max 200 redemptions). Users redeem in Settings. Promo operations deplete before paid-tier operations so users get the most out of their allowance. Every redemption is tracked and auditable. Useful for go-to-market campaigns, partner enablement, customer pilots, and time-limited free trial extensions. Best for: Product launches, partner programs, customer pilots, trials, conferences, hackathons.
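
The depletion order (promo operations first, paid-tier operations second) is easy to state in code. A minimal sketch; the function name and the choice to raise when the balance is insufficient are assumptions made for illustration.

```python
def consume_operations(promo_remaining: int, paid_remaining: int, ops: int) -> tuple[int, int]:
    """Deduct ops from the promo allowance first, then from the paid tier.

    Returns the new (promo_remaining, paid_remaining) balances.
    """
    if ops < 0:
        raise ValueError("ops must be non-negative")
    from_promo = min(promo_remaining, ops)
    from_paid = ops - from_promo
    if from_paid > paid_remaining:
        raise ValueError("not enough operations remaining")
    return promo_remaining - from_promo, paid_remaining - from_paid
```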

Multi-language editing

Natural-language instructions and document content both work across many languages — production users have edited documents in English, Spanish, French, Hebrew, Korean, Mandarin, and others (16+ languages confirmed in real usage so far). Write your prompt in one language, edit a document in another, get the AI’s reply in whichever language you wrote the request. Multilingual documents (e.g., bilingual contracts) handled correctly. Tone and formality conventions adapted per language. Best for: International teams, multilingual document workflows, organizations serving non-English markets, contract translation and adaptation, cross-border legal work.

Build vertical AI on SuperDocs

Combine attachments (your domain knowledge), sessions (long-running workflows), and chat instructions to build domain-specific applications on top of SuperDocs. Contract AI: attach your standard clause library + draft instructions, get an AI that writes contracts in your house style. Compliance AI: attach your regulations + policy templates, get an AI that audits documents for compliance gaps. Marketing AI: attach your brand voice guides + past collateral, get an AI that produces on-brand content. Same platform, different domain — all configurable per session or per organization. Best for: Vertical SaaS platforms, agencies serving specific industries, organizations with strong domain languages, anyone building specialized document workflows.

Per-message chat revert

Rewind a chat session to before any specific user message — both the conversation and the document state snap back together. The reverted message text is returned so your UI can pre-fill a compose box for editing. The original branch is kept on the server for audit; the new branch becomes the active timeline. Available three ways:
  • Web app: a ”↶ Revert” button under every user message bubble, with a confirmation dialog before the rewind.
  • REST: POST /v1/sessions/{session_id}/revert with {turn_index} — see Sessions.
  • MCP: the revert_session_to_message tool, callable by Claude Code, Cursor, Claude Desktop, and any MCP-compatible agent.
If a chat job is currently running for the session, revert returns 409 — wait for it to settle. Available on chats started after the feature shipped (older sessions don’t carry the marker the rewind needs). Best for: “oops, undo that AI change” moments, exploring an alternate prompt without starting from scratch, recovering from a misunderstood instruction without losing the rest of your work.
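
For the REST path, the request can be prepared as below. The endpoint and the turn_index body field come from the list above; the Bearer authorization scheme and the helper names are assumptions, so check the Sessions reference for the actual header.

```python
import json
import urllib.request

BASE_URL = "https://api.superdocs.app"


def build_revert_request(session_id: str, turn_index: int, api_key: str) -> urllib.request.Request:
    """Prepare POST /v1/sessions/{session_id}/revert with a turn_index body."""
    body = json.dumps({"turn_index": turn_index}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/sessions/{session_id}/revert",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def should_retry_revert(status_code: int) -> bool:
    """A 409 means a chat job is still running; wait and try again."""
    return status_code == 409
```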

On the roadmap

Branch switcher

Today the original branch is preserved on the server side after a revert, but it isn’t visible in the UI — your active timeline is always the new branch. A future update will surface a switcher so you can navigate between alternate conversation paths the way ChatGPT and Claude do.

What ships when

A live timeline of major capabilities and when they shipped. Older capabilities don’t get less reliable over time — once shipped, they stay covered by the regression suite and the production monitoring stack.
Each entry lists the date, the capability, and why it matters.

2026-06-11
  • Capability: Multi-document sessions. A session now holds multiple open documents: open_mode on upload (replace/new_focused/background) with the document roster in the response, three new endpoints + MCP tools to list/focus/close session documents (MCP total: 25), optional document_id targeting on /v1/chat + /v1/chat/async, cross-document reads/search/edits in one turn, and AI-created new documents from chat. SSE additions: documents_changed (multi-doc auto-apply badge signal) and model_fallback (automatic Pro-tier failover notice) events; every event now carries sequence and reconnects can pass last_sequence to replay only newer events. Thinking depth on every tier: thinking_depth (fast/balanced/deep) is now honored on all four model tiers. Page geometry: document payloads include a nullable page_setup (size/orientation/margins detected from DOCX/PDF). Billing clarity: a multi-section edit in one request bills exactly one operation; denied review-mode changes are never billed.
  • Why it matters: One conversation can orchestrate several files the way a person with multiple tabs would; integrations get deterministic per-document targeting and review grouping; streams resume cleanly after disconnects; reasoning control is uniform across tiers; true-size page rendering becomes possible; operation counting matches intuition.

2026-05-27
  • Capability: Export pipeline overhaul. Multi-format export (docx default, pdf, html, markdown, txt) with per-request options (paper size, orientation, margins, filename, image embedding, PDF watermark + opacity), structured X-Export-Warnings header for non-fatal issues, three-tier flow for large documents (direct POST up to 20 MB → pre-signed upload for 20-100 MB via upload_id → email fallback for >100 MB via POST /v1/documents/export/email-request), structured 413 detail body with error_code and suggested_action, and full round-trip of Word track-changes on .docx export.
  • Why it matters: Integrators get five output formats from one endpoint, predictable size handling without a 413 surprise, and clear error contracts. The default-format flip from doc to docx is a soft compatibility change: callers that omit format and switch on Content-Type now receive Open XML.

2026-05-26
  • Capability: Batched HITL approvals. Multi-change turns deliver every proposed change in a single proposed_change_batch SSE event instead of fanning out N individual proposed_change events. Single-change turns continue to use proposed_change.
  • Why it matters: Editors approving large changesets render one approval card instead of N. Wire load scales with turns, not changes. Register both proposed_change and proposed_change_batch listeners; clients that only listen to the original event will silently miss every batched turn.

2026-05-25
  • Capability: 30-minute wall-clock cap on chat turns (up from 10 minutes), parallel-edit speedup, latency improvements for very large documents.
  • Why it matters: Hundred-section edits in a single turn now finish inside the cap instead of timing out. Same 504 contract; the failure copy points users to splitting the work across smaller turns.

2026-05-17
  • Capability: Context-aware AI inserts. When you ask the AI to insert an image, diagram, or new section without naming a position (“insert an image of a circle”, “add a flowchart showing X”, “add a new section here”), it now lands right after the section the user’s cursor was in instead of always appending at end-of-document. Send the new optional cursor_context field on /v1/chat and /v1/chat/async to enable this; omit it and prior end-of-doc behaviour is preserved. The web app at use.superdocs.app sends it automatically. Vague-verb clarification: broad ambiguous requests like “improve this document” or “make this better” now trigger a clarifying question with concrete options instead of an unexpected wide rewrite.
  • Why it matters: Integrators get a single-param opt-in to a noticeable UX improvement; existing integrations behave exactly as before if they omit the field. Web-app users see new content land near where they were looking. Vague broad-edit prompts no longer produce surprise rewrites; the AI checks intent first.

2026-05-16
  • Capability: Multilingual + multi-cohort reliability. The AI reasons about user intent across all languages (not enumerated phrase lists); RTL emails render correctly in Hebrew / Arabic / Persian / Urdu / etc.; per-cohort response tone (fresh vs returning vs committed users); style-attribute search (“change all the blue writing”) matches inline HTML; ui_pointer SSE event for export-affordance hints.
  • Why it matters: Non-English and RTL-language users get parity quality; first-touch users see encouragement-shaped responses, returning users see direct-and-literal responses; visual-style queries resolve correctly across any HTML structure; safety-filtered AI responses degrade gracefully instead of crashing.
2026-05-06High-fidelity ingest — PDF annotations + embedded images + Word track-changes preserved on import; reliable multi-file attachment-by-filename; document length awareness (“how many pages?”)PDFs, annotated review documents, and Word redlines no longer get silently flattened on upload; multi-file workflows resolve files by name; users get real page-count answers
2026-05-06Web app first-paint upload affordances — empty editor renders a drop zone for drag-drop, click-to-pick file picker, and a paste-as-content prompt when long content is pasted into a blank editor; “Starter Templates” relabel on the templates surface so it’s clearly distinct from per-session attachmentsFirst-time visitors find the upload path in seconds without asking; users who paste content as their first action get explicit “load as document content vs. treat as a chat instruction” choice
2026-04-30Visual content & media — image generate / edit / replace by chat, auto-rendered Mermaid diagrams, in-editor drawing canvas, KaTeX equations, auto table of contentsDocuments are no longer text-only; images, diagrams, drawings, and equations live in the same chat-driven editing flow as the rest of the content
2026-04-29Per-message chat revert (web app + REST + MCP revert_session_to_message)“Oops, undo that” rewinds chat and document together; original branch preserved server-side
2026-04-25Expanded features documentation (this page) — 30+ capabilities grouped by use caseDevelopers and AI agents form a complete mental model of what SuperDocs can do
2026-04-25MCP server unified — 21 tools + 4 user-invocable workflow prompts on a single /mcp endpointSingle MCP config entry covers both; discoverable slash commands for Cursor/Claude Code/Claude Desktop users
2026-04-25Pre-signed URL upload/download flow (request_upload_url, process_uploaded_document, request_download_url)Bytes no longer pass through agent context window; viable for real-world file sizes
2026-04-25Compact response mode (response_mode='compact' + chunk_diffs)~140× token reduction for editing sessions on large documents
2026-04-25Capability-forward MCP tool descriptions across all 21 toolsAgents form correct mental models of when to use SuperDocs vs build from scratch
2026-04-23MCP HTTP transport reliability fixEliminates 5-minute hang clients (Claude Code, Cursor, Bun) saw on first connect
2026-04-22OAuth Protected Resource Metadata (RFC 9728) for MCPCursor 3.x / Claude Code 2.x / mcp-remote can now connect without 60s metadata-probe timeout
2026-04-19Editing latency improvements for large documents4m55s → ~10s for typical edit operations on large documents
2026-04-18Editing precision improvements for nuanced instructionsEliminates over-broad edits when the user’s instruction was narrowly-scoped
Earlier 2026Async jobs + HITL durable state, SSE streaming, multimodal vision, multi-format export, MCP server, promo codes, billingFoundation
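
The three-tier export flow in the 2026-05-27 entry can be sketched as a client-side size check. This is a minimal illustration, not SuperDocs client code: the names ExportTier and choose_export_tier are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the changelog does not say which definition the service uses.

```python
from enum import Enum

# Assumption: "MB" in the changelog is treated as 1024 * 1024 bytes here.
MB = 1024 * 1024


class ExportTier(Enum):
    """Hypothetical labels for the three documented size tiers."""
    DIRECT_POST = "direct_post"            # up to 20 MB
    PRESIGNED_UPLOAD = "presigned_upload"  # 20-100 MB, via upload_id
    EMAIL_REQUEST = "email_request"        # over 100 MB, email fallback


def choose_export_tier(size_bytes: int) -> ExportTier:
    """Pick the export path for a document of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= 20 * MB:
        return ExportTier.DIRECT_POST
    if size_bytes <= 100 * MB:
        return ExportTier.PRESIGNED_UPLOAD
    return ExportTier.EMAIL_REQUEST
```

Checking the size before sending lets a caller avoid the 413 response altogether; the structured 413 body remains the fallback signal if the estimate is off.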

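The 2026-05-26 entry asks clients to listen for both proposed_change and proposed_change_batch. One way to do that is to normalize both events into a list before handing them to the approval UI. The sketch below is an assumption-laden illustration: the event names come from the changelog, but the "changes" key on the batch payload and the make_change_dispatcher helper are hypothetical, so check the API Reference for the real payload shape.

```python
from typing import Any, Callable

Change = dict[str, Any]


def make_change_dispatcher(
    on_changes: Callable[[list[Change]], None],
) -> Callable[[str, dict[str, Any]], bool]:
    """Return a dispatcher that routes both HITL event types to one handler."""

    def dispatch(event_name: str, payload: dict[str, Any]) -> bool:
        if event_name == "proposed_change":
            # Single-change turn: wrap the payload so the handler always gets a list.
            on_changes([payload])
            return True
        if event_name == "proposed_change_batch":
            # Hypothetical payload shape: proposed changes listed under "changes".
            on_changes(list(payload.get("changes", [])))
            return True
        # Not a HITL approval event; leave it for other listeners.
        return False

    return dispatch
```

With this shape the approval UI renders one card per call, whether the turn proposed one change or many.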
For schemas, parameters, and code examples, see the API Reference and the MCP Tools Reference. For workflow guides, see the Guides section.
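
As a small illustration of the sequence and last_sequence behavior from the 2026-06-11 entry, a client can remember the highest sequence it has processed and offer it on reconnect. This is a sketch under assumptions: SequenceTracker is a hypothetical helper, events are modeled as dicts with an integer "sequence" key that only increases, and how last_sequence is actually transmitted on reconnect is not specified on this page.

```python
from typing import Any, Optional


class SequenceTracker:
    """Tracks the newest SSE event seen so a reconnect can skip older ones."""

    def __init__(self) -> None:
        self.last_sequence: Optional[int] = None

    def accept(self, event: dict[str, Any]) -> bool:
        """Return True if the event is new; False if it was already processed."""
        seq = event["sequence"]  # assumption: integer, strictly increasing
        if self.last_sequence is not None and seq <= self.last_sequence:
            return False
        self.last_sequence = seq
        return True

    def reconnect_params(self) -> dict[str, int]:
        """Values to send on reconnect; empty if no event has been seen yet."""
        if self.last_sequence is None:
            return {}
        return {"last_sequence": self.last_sequence}
```

Dropping events at or below the stored sequence keeps the client idempotent even if a replay overlaps with events it already handled.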