Compare commits

...

70 Commits

Author SHA1 Message Date
Enrico Ros 55bde68a4d Roll AIX 2026-05-05 04:17:39 -07:00
Enrico Ros 26ae3545a7 BlockOpUpstreamResume: full recovery. Fixes #1088 2026-05-05 04:14:00 -07:00
Enrico Ros 0001f7392b AIX: Gemini Interactions: relax 2026-05-05 03:32:13 -07:00
Enrico Ros d7e83e578b BlockOpUpstreamResume: remove cancel - unused? 2026-05-05 03:25:27 -07:00
Enrico Ros 901d93b5f0 LLMs/AIX: Gemini: Agentic models: recovery mode (non-streaming). Fixes #1088 2026-05-05 03:23:35 -07:00
Enrico Ros 6858b0b94a KB: LLMs: Gemini Interactions takeaways 2026-05-05 03:12:13 -07:00
Enrico Ros 9d88bf9b82 LLMs/AIX: Gemini: Agentic models: add option to disable visualizations. Fixes #1095 2026-05-05 03:06:30 -07:00
Enrico Ros 1bf1b744b9 llm-registry-sync: export models 2026-05-05 01:33:06 -07:00
Enrico Ros ee2d7114c7 llm-registry-sync: record/sync pub date
the next update won't have the spam (pub date not used for change detection)
2026-05-05 01:33:06 -07:00
Enrico Ros 3b1b54b3a3 KB: +llm-editorial 2026-05-05 01:33:06 -07:00
Enrico Ros 524029a882 Models List: show new (<30 days) models 2026-05-05 00:54:34 -07:00
Enrico Ros 69161d29a7 LLMs: Gemini typo 2026-05-05 00:29:13 -07:00
Enrico Ros 8a542c1af4 LLMs: display the pubDate 2026-05-05 00:16:01 -07:00
Enrico Ros fe16970624 LLMs: PubDates 2026-05-05 00:01:06 -07:00
Enrico Ros e21abdef45 LLMs: pubDate support 2026-05-04 13:48:29 -07:00
Enrico Ros acdbb2fbaf AIX: ContentReassembler: verbose post termination issues 2026-05-03 22:32:58 -07:00
Enrico Ros 14be134ef2 AIX: xAI: always request reasoning summaries. Fixes #1091 2026-05-03 14:40:48 -07:00
Enrico Ros f56f6eb3cd CLAUDE.md: branching hints 2026-05-03 14:27:59 -07:00
Enrico Ros d3a7b75d1c LLMs: Grok 4.3 support 2026-05-03 14:27:59 -07:00
Enrico Ros d5d7cf5a21 ContentFragments: do not display for empty 'ma' summaries or text. #1091 2026-05-03 14:27:59 -07:00
Enrico Ros 13b928d68b AIX: OpenAI Responses: non-fatal error if sealed
OpenAI sometimes emits a trailing 'error' event (e.g. rate-limit/TPM
advisory) AFTER 'response.completed'. The blanket error handler treated
it as fatal, calling setDialectTerminatingIssue which:
  - injected a red [Openai Issue] fragment into the finished message
  - overrode the prior setDialectEnded('done-dialect') with 'issue-dialect'
  - flipped the AIX outcome to 'failed', turning the Beam ray red

Track a #responseSealed flag set by the three terminal events
(response.completed/failed/incomplete) and short-circuit trailing 'error'
events with a server-log only - keeping mid-stream errors fatal as before.
2026-05-03 13:15:43 -07:00
Enrico Ros 31948a62f9 ChatDrawer: scroll active chat into view when filters clear 2026-05-03 13:15:43 -07:00
Enrico Ros bf2d00a936 AppChat: filter by open beams 2026-05-03 13:15:43 -07:00
Enrico Ros ed4edd7c0b AIX: Anthropic: disable sticky execution continuity from simple prior container presence. #1087 2026-04-28 19:25:08 -07:00
Enrico Ros e5de61d682 AIX: Anthropic: do not turn on code execution just for dynamic filtering. #1087 2026-04-28 18:24:00 -07:00
Enrico Ros ac69c62020 Sort LLM Categories by names 2026-04-28 17:49:00 -07:00
Enrico Ros a43b6a2cf5 AIX: Part xAI vs. OpenAI encrypted reasoning 2026-04-28 09:22:31 -07:00
Enrico Ros e8e3366fe2 AIX: XAI: enable encrypted reasoning (if disabled breaks subsequent turns) 2026-04-27 18:05:28 -07:00
Enrico Ros d813810a28 Anthropic: downgraded a throw to warn 2026-04-27 16:57:43 -07:00
Enrico Ros c400aa7543 Chat: hide expires while pending in BlockOpUpstreamResume 2026-04-27 01:13:13 -07:00
Enrico Ros 9fc0b39730 AIX: Transmit token stop errors, if provided 2026-04-24 17:08:40 -07:00
Enrico Ros 194bfe23a1 AIX: OpenAI: mark the need for roundtrip of hosted tool pairs 2026-04-24 17:08:40 -07:00
Enrico Ros 35110480ef Beam: Fix ghost columns. Fixes #1073 2026-04-24 16:04:29 -07:00
Enrico Ros 959595e33a Merge: smaller copy update 2026-04-24 16:04:29 -07:00
Enrico Ros a960424dfb Merge: copy update. Fixes #1083 2026-04-24 15:56:13 -07:00
Enrico Ros 0df6c7d08b Merge: copy. Fixes #1083 2026-04-24 15:48:56 -07:00
Enrico Ros 65c841e7a7 Roll AIX 2026-04-24 15:23:30 -07:00
Enrico Ros b21b8cc982 AIX: Anthropic: show refusal details, if present, as inline text 2026-04-24 15:20:10 -07:00
Enrico Ros aa2c4f06b7 AI Inspector: compress intermediate large string fields 2026-04-24 15:19:35 -07:00
Enrico Ros b8d7b4ec10 AIX: OpenAI: fix svs on !ma for NS 2026-04-24 15:19:35 -07:00
Enrico Ros c48520255a AIX: OpenAI: fix tool reparsing for NS 2026-04-24 15:19:34 -07:00
Enrico Ros 0790da989d Don't truncate the Beam Title on Edit. Fix #1085 part 1. 2026-04-24 15:19:34 -07:00
Enrico Ros 506d24d2fd AIX: OpenAI Response: fix reparse of tools 2026-04-24 15:19:34 -07:00
Enrico Ros 1348dbf493 AIX: update _upstreams 2026-04-24 15:19:33 -07:00
Enrico Ros ce677f3cd9 LLMs: OpenAI: GPT 5.5 2026-04-24 15:19:33 -07:00
Enrico Ros 39203d78e3 LLMs: OpenAI: hide lots of older models, so by default the latest are shown 2026-04-24 15:19:33 -07:00
Enrico Ros 2ef7daf369 LLMs: Gemini: hide 3.0 Pro (silently remapped to 3.1 by Gemini). Fixes #1082 2026-04-24 15:19:33 -07:00
Enrico Ros cff3d90613 AIX: DeepSeek V4: fix function calling 2026-04-24 05:45:53 -07:00
Enrico Ros 9f89243d7f AIX: DeepSeek V4: fix swallowing of tool parts 2026-04-24 05:45:53 -07:00
Enrico Ros 784ee9a4da AIX: DeepSeek V4: wires and parser NS 2026-04-24 05:45:53 -07:00
Enrico Ros 678e6b8ba1 AIX: Gemini Interactions: terminate on error 2026-04-24 05:45:53 -07:00
Enrico Ros 30e301c496 BlockOpUpstreamResume: Stop/Cancel 2026-04-24 03:59:50 -07:00
Enrico Ros b22904f6bb AIX: Gemini Interactions: Cancel + Delete
Also see: googleapis/python-genai#1971
2026-04-24 03:40:34 -07:00
Enrico Ros 3f0de7ddca CH: Auto-Title beam chats when done. Fixes #1078 2026-04-24 03:32:04 -07:00
Enrico Ros 9a6f0f9202 AppChat: never re-open an opened beam. Fixes #1079 2026-04-24 03:24:56 -07:00
Enrico Ros 4f0bae5657 AppChat: do not re-beam or regenerate while beam is open. Fixes #1079 2026-04-24 03:19:17 -07:00
Enrico Ros 2101f06195 Roll AIX 2026-04-24 03:04:09 -07:00
Enrico Ros 6d54b5594c Autotitle: Use natural capitalization. Fixes #1077 2026-04-24 02:48:28 -07:00
Enrico Ros 36b8e5b1df Chat: show Stop/Cancel on streaming upstream runs 2026-04-24 02:47:17 -07:00
Enrico Ros 8252d671c7 LLMs: Gemini: Deep Research models support images 2026-04-24 02:47:13 -07:00
Enrico Ros 30d97c94aa LLMs: DeepSeek: bits (note: vision is still not available) 2026-04-24 02:47:13 -07:00
Enrico Ros 82654a00d4 AIX: Streaming (hinting) review and Gemini Interactions API fix 2026-04-24 02:47:09 -07:00
Enrico Ros 9595f14ddc LLM: DeepSeek V4 (flash, pro) + thinking/reasoning_effort fix 2026-04-23 23:59:09 -07:00
Enrico Ros 8c496074b2 LLMs: DeepSeek: add V4 models 2026-04-23 23:30:41 -07:00
Enrico Ros 4d097d7136 LLMs: DeepSeek: add V4 support infra 2026-04-23 23:30:34 -07:00
Enrico Ros 178619d275 AI Settings: match the defaults description. Fixes #1076 2026-04-23 23:29:20 -07:00
Enrico Ros 59c8b2538d Merge pull request #1074 from tredondo/patch-1
chore: fix Zod 4 type-strictness issue (#1072)
2026-04-23 22:57:01 -07:00
Enrico Ros 443b72c52a AIX: OpenAI Responses: fix Zod 4 build error in tools .catch()
Bare `return;` produced `void`, which Zod 4 rejects for a
`.catch()` on `z.array(...).optional()` expecting `Tool[] | undefined`.
Return `undefined` explicitly, matching the existing pattern at
line 1204.

Fixes #1072
2026-04-23 22:56:19 -07:00
Enrico Ros ae13abef45 Nobody can tell @fredliubojin what to resume 2026-04-23 22:22:16 -07:00
Ted Robertson 83ae02ef9b chore: fix Zod 4 type-strictness issue (#1072) 2026-04-23 19:51:49 -07:00
85 changed files with 1941 additions and 434 deletions
+10
@@ -32,6 +32,16 @@ The `gh` command is available to interact with GitHub from the terminal, but **N
- **Always use `git mv` instead of `mv`** when renaming or moving files - preserves git history tracking
- **NEVER run `git stash`** - it causes work loss
**Branch contents:**
- `main` is the open-source build: local-first, BYO-keys, full AIX and provider coverage
- `dev` extends `main` with the hosted/cloud layer: auth, Zync sync, Cloud Fabric, Stripe, multi-tenant, admin pages; it is the recommended build for users, with the best user experience of any multi-model chat application
- Cloud/auth/sync code stays on `dev`; non-cloud improvements (UX, AIX, model support, bug fixes) can land on either branch
**Branch workflow:**
- `dev` is rebased on top of `main` (never merged) - `main` changes flow into `dev` on the next rebase, no manual forward-port needed
- Never `git merge` between the two branches - breaks the linear topology
- Backporting `dev` -> `main` is a re-implementation, never a cherry-pick - keep `main`-side edits minimal/additive so the existing `dev` version lands cleanly on rebase; split into small commits when natural
### Core Directory Structure
You are started from the root of the repository (i.e. where the git folder is or scripts should be run from).
+7
@@ -17,6 +17,13 @@ Architecture and system documentation is available in the `/kb/` knowledge base,
#### CSF - Client-Side Fetch
- **[CSF.md](systems/client-side-fetch.md)** - Direct browser-to-API communication for LLM requests
#### LLM - Language Model Metadata
- **[LLM-editorial-pubdate.md](modules/LLM-editorial-pubdate.md)** - Where we have editorial control over per-model metadata vs dynamic discovery; `pubDate` field semantics, propagation chain, resolution rules, per-vendor matrix
- **[LLM-models-catalog-pipeline.md](modules/LLM-models-catalog-pipeline.md)** - Forward-looking pipeline: extraction script, snapshot artifact, website consumption, future schema extensions
#### LLM - Vendor APIs
- **[LLM-gemini-interactions.md](modules/LLM-gemini-interactions.md)** - Gemini Interactions API (Deep Research): endpoints, status taxonomy, two retrieval paths (SSE replay vs JSON GET), known failure modes (10-min cuts, zombies), UI surface
### Systems Documentation
#### Core Platform Systems
+106
@@ -0,0 +1,106 @@
# LLM Editorial Control Surface
This document maps where Big-AGI has editorial control over per-model metadata (and therefore can guarantee fields like `pubDate`, curated `description`, `chatPrice`, `benchmark`, `parameterSpecs`, etc.) versus where it must rely on the vendor API's dynamic discovery (and therefore cannot guarantee them).
For the forward-looking pipeline (extraction script, snapshot, website consumption, future schema extensions), see [LLM-models-catalog-pipeline.md](LLM-models-catalog-pipeline.md).
## The `pubDate` field
`pubDate?: string` (validated as `/^\d{8}$/`, e.g. `'20250929'`) is **optional** in the wire schema and on `DLLM`. It was added to:
- `ModelDescription_schema` in `src/modules/llms/server/llm.server.types.ts` - the canonical wire type
- `OrtVendorLookupResult` in the same file - so OpenRouter inherits it via `llmOrt*Lookup`
- `DLLM` in `src/common/stores/llms/llms.types.ts` - the persisted client model
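For reference, a minimal sketch of the wire-schema shape; the `pubDate` regex and optionality are from this doc, while the surrounding field names are illustrative (the real schema lives in `src/modules/llms/server/llm.server.types.ts`):
```ts
import { z } from 'zod';

// Sketch only: the actual ModelDescription_schema has many more fields
const ModelDescription_schema = z.object({
  id: z.string(),
  label: z.string(),
  // ...other fields elided...
  pubDate: z.string().regex(/^\d{8}$/).optional(), // e.g. '20250929'
});

type ModelDescription = z.infer<typeof ModelDescription_schema>;
```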
### Where `pubDate` is guaranteed (always emitted)
- **Editorial entries** in 12 hybrid/editorial vendors (282 models). Hand-curated, externally corroborated. Future entries in these arrays are expected to include `pubDate`.
- **Anthropic 0-day placeholder** (`llmsAntCreatePlaceholderModel`): when the API surfaces an Anthropic model not in the editorial list, the placeholder uses the API's `created_at` ISO date, falling back to today via `formatPubDate()`.
- **Gemini 0-day fallback** (`geminiModelToModelDescription`): when the API returns a Gemini model not in `_knownGeminiModels`, the converter falls back to today via `formatPubDate()` (Gemini API does not expose a creation timestamp).
### Where `pubDate` is omitted (optional)
- **Symlink entries** (`KnownLink`) - inherit the target's `pubDate` via the merge logic in `fromManualMapping`.
- **Unknown variants resolved through `super`/`fallback`** in `fromManualMapping` for non-Anthropic/non-Gemini vendors - the field is left undefined rather than fabricated.
- **Dynamic-only vendors** (OpenRouter, TogetherAI, Novita, ChutesAI, FireworksAI, TLUS, Azure, LM Studio, LocalAI, FastAPI, ArceeAI, LLMAPI) - no editorial knob; pubDate flows in only when the underlying lookup or upstream API populates it.
The rationale: today's date is a defensible 0-day proxy only when we know we're seeing a brand-new model the vendor just announced (Anthropic and Gemini's "discovery via official model list" paths). For arbitrary dynamic vendors, fabricating today would mark old/well-known models as new - misleading. Better to omit.
### Propagation chain
- `fromManualMapping()` in `src/modules/llms/server/models.mappings.ts` - copies the field for OAI-style vendors when present
- `geminiModelToModelDescription()` in `src/modules/llms/server/gemini/gemini.models.ts` - copies for Gemini, falls back to today for unknowns
- `llmsAntCreatePlaceholderModel()` in `src/modules/llms/server/anthropic/anthropic.models.ts` - emits from API `created_at` (or today)
- `_mergeLookup()` in `src/modules/llms/server/openai/models/openrouter.models.ts` - merges for OpenRouter cross-vendor inheritance
- `_createDLLMFromModelDescription()` in `src/modules/llms/llm.client.ts` - copies onto the persisted DLLM when present
- `formatPubDate()` helper in `src/modules/llms/server/models.mappings.ts` - shared `'YYYYMMDD'` formatter for the 0-day-fillable paths
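A plausible shape for the shared formatter, assuming it simply renders a `Date` (defaulting to now) as `'YYYYMMDD'`; the actual implementation in `models.mappings.ts` may differ:
```ts
// Hedged sketch of the shared 'YYYYMMDD' formatter used by the 0-day-fillable paths
function formatPubDate(date: Date = new Date()): string {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, '0');
  const d = String(date.getDate()).padStart(2, '0');
  return `${y}${m}${d}`; // matches /^\d{8}$/, e.g. '20250929'
}

// Example: the Anthropic 0-day placeholder prefers the API's created_at, else today
const pubDate = formatPubDate(/* new Date(api.created_at) when available */);
```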
### Semantics
`pubDate` is the **earliest public availability** of the model - the date on which the vendor first made this specific model usable by external users via any channel (consumer app, web, console, API, partner, open-weights upload).
It is **not**:
- The date Big-AGI added the entry to its catalog (Ollama uses `added` for that)
- The training-data cutoff (proposed but not implemented; see `src/common/stores/llms/llms.types.next.ts:217`)
- The date the model snapshot was built (suffixes like `-1212` may refer to build dates, but `pubDate` tracks public availability)
### Resolution rules (when sources conflict)
1. **Date-suffixed model IDs**: when the suffix matches a documented announcement, the suffix is canonical (vendor convention). xAI, OpenAI, and Mistral all use suffixes that closely track release dates.
2. **Anthropic exception**: Anthropic's date suffixes are typically the **snapshot/training-cutoff date, not the public release date**. For example, `claude-3-7-sonnet-20250219` was released on 2025-02-24, `claude-opus-4-20250514` was released 2025-05-22, and `claude-haiku-4-5-20251001` was released 2025-10-15. Always corroborate against Anthropic's blog/press for the actual release date. Only `claude-sonnet-4-5-20250929` and `claude-opus-4-1-20250805` have suffixes that match.
3. **Closed beta -> public beta -> GA**: use the first date *external* users could access the specific variant.
4. **Family-headline IDs and dated snapshots** (e.g., `claude-opus-4-1` and `claude-opus-4-1-20250805`): typically share a release date.
5. **Hosted on a third party** (Groq hosting Llama, OpenPipe mirroring others, OpenRouter aggregating): use the *underlying* model's original release date by its creator, not when the host added it.
6. **Symlinks** (entries with `symLink:`): inherit the target's date.
7. **Partial dates** (only month known): use the 1st of the month and tag as MEDIUM confidence in the editor's note.
## Editorial control matrix
Three categories:
- **Editorial** - the vendor file contains hand-curated entries; we control descriptions, pricing, benchmarks, interfaces, parameter specs, and `pubDate`.
- **Hybrid** - the API returns the live model list, and editorial entries (keyed by id/idPrefix) merge over the API data via `fromManualMapping`. We control everything except *which models exist*.
- **Dynamic** - the API is the only source of model identity and metadata. Big-AGI cannot reliably populate `pubDate` here (no editorial knob).
| Vendor | Category | File | Array | Entries | `pubDate` populated |
|---|---|---|---|---|---|
| Anthropic | Hybrid | `anthropic/anthropic.models.ts` | `hardcodedAnthropicModels` | 12 | 12/12 HIGH |
| Gemini | Hybrid | `gemini/gemini.models.ts` | `_knownGeminiModels` | 33 | 33/33 HIGH |
| OpenAI | Hybrid | `openai/models/openai.models.ts` | `_knownOpenAIChatModels` | 96 | 95/96 HIGH/MED (`osb-120b` skipped, speculative) |
| xAI | Hybrid | `openai/models/xai.models.ts` | `_knownXAIChatModels` | 13 | 13/13 HIGH (pilot) |
| Mistral | Hybrid | `openai/models/mistral.models.ts` | `_knownMistralModelDetails` | 41 | 41/41 (40 HIGH, 1 MED for legacy `mistral-medium`) |
| Moonshot (Kimi) | Hybrid | `openai/models/moonshot.models.ts` | `_knownMoonshotModels` | 13 | 13/13 (10 HIGH, 3 MED for v1 base models) |
| Perplexity | Editorial | `openai/models/perplexity.models.ts` | `_knownPerplexityChatModels` | 4 | 4/4 HIGH |
| MiniMax | Editorial | `openai/models/minimax.models.ts` | `_knownMiniMaxModels` | 10 | 10/10 HIGH |
| DeepSeek | Hybrid | `openai/models/deepseek.models.ts` | `_knownDeepseekChatModels` | 4 | 4/4 HIGH |
| Groq | Hybrid (host) | `openai/models/groq.models.ts` | `_knownGroqModels` | 11 | 11/11 HIGH (underlying-model date) |
| Z.AI / GLM | Hybrid | `openai/models/zai.models.ts` | `_knownZAIModels` | 17 | 16/17 (`glm-5-code` UNCONFIRMED) |
| OpenPipe | Editorial (mirror) | `openai/models/openpipe.models.ts` | `_knownOpenPipeChatModels` | 30 | 30/30 HIGH (all upstream-mirror, no OpenPipe originals) |
| Bedrock | Reuses Anthropic | `bedrock/bedrock.models.ts` | -> `hardcodedAnthropicModels` | (12) | inherited |
| Ollama | Editorial (catalog) | `ollama/ollama.models.ts` | `OLLAMA_BASE_MODELS` | 209 | **deferred** - see notes |
| Arcee AI | Dynamic | `openai/models/arceeai.models.ts` | `_arceeKnownModels` | 0 | n/a (empty) |
| LLMAPI | Dynamic | `openai/models/llmapi.models.ts` | `_llmapiKnownModels` | 0 | n/a (empty) |
| Alibaba | Dynamic | `openai/models/alibaba.models.ts` | `_knownAlibabaChatModels` | 0 | n/a (empty) |
| OpenRouter | Dynamic + delegated lookup | `openai/models/openrouter.models.ts` | (parser) | -- | inherited via `llmOrt*Lookup` |
| TogetherAI | Dynamic | `openai/models/together.models.ts` | (parser) | -- | no |
| FireworksAI | Dynamic | `openai/models/fireworksai.models.ts` | (parser) | -- | no |
| Novita | Dynamic | `openai/models/novita.models.ts` | (parser) | -- | no |
| ChutesAI | Dynamic | `openai/models/chutesai.models.ts` | (parser) | -- | no |
| TLUS | Dynamic | `openai/models/tlusapi.models.ts` | (parser) | -- | no |
| Azure | Dynamic | `openai/models/azure.models.ts` | (parser) | -- | no |
| LM Studio | Dynamic | `openai/models/lmstudio.models.ts` | (parser) | -- | no |
| LocalAI | Dynamic | `openai/models/localai.models.ts` | (parser) | -- | no |
| FastAPI | Dynamic | `openai/models/fastapi.models.ts` | (parser) | -- | no |
**Totals**: 284 editorial entries across 12 vendors, of which **282** have corroborated `pubDate` and **2** are intentional gaps (`osb-120b` speculative, `glm-5-code` not yet announced). All 12 vendor files type-check clean.
### Notes
- **Hybrid** vendors are still effectively editorial for the models we know about: when an API id matches a hardcoded `idPrefix` (or `id`), `fromManualMapping` injects all the editorial fields. Unknown ids fall through to a default-shaped placeholder where `pubDate` is undefined.
- **OpenRouter** delegates back to Anthropic / Gemini / OpenAI editorial lookups via `llmOrtAntLookup_ThinkingVariants`, `llmOrtGemLookup`, `llmOrtOaiLookup`. `pubDate` flows through these lookups, so OpenRouter-served Claude/Gemini/GPT models get `pubDate` automatically once the underlying editorial entry has it.
- **Bedrock** finds Anthropic editorial via `llmBedrockFindAnthropicModel` and strips unsupported interfaces - `pubDate` inherits from Anthropic.
- **Ollama** is deferred: 209 entries keyed by upstream model family (e.g. `qwen3.6`, `kimi-k2`, `glm-4.6`). Each entry's `pubDate` would need to be the upstream creator's release date (Meta, Alibaba, Moonshot, Z.AI, etc.). This is large-scale upstream research; better handled in a follow-up pass once cross-vendor `pubDate` data is consolidated and reusable.
- **Dynamic-only** vendors get nothing automatic. To add `pubDate` for them we'd have to seed editorial entries (which is what `fromManualMapping`'s mapping mechanism was built for); this is a per-vendor decision and out of scope for the initial rollout.
+88
@@ -0,0 +1,88 @@
# Gemini Interactions API
The Interactions API powers Gemini's agent runs (Deep Research today, more agent types planned). This doc is the source of truth for protocol shape, failure modes, and the recovery model — code comments link here instead of repeating the rationale.
## References
- **GH [#1088](https://github.com/enricoros/big-AGI/issues/1088)** — Auto-resume for Deep Research; Recover button
- **GH [#1095](https://github.com/enricoros/big-AGI/issues/1095)** — Visualizations toggle (`agent_config.visualization`)
- **Google forum [143098](https://discuss.ai.google.dev/t/interactions-api-connection-breaks-at-the-10-minutes-mark/143098)** — 10-min SSE cut
- **Google forum [143099](https://discuss.ai.google.dev/t/streaming-resume-broken-on-interactions-api-deep-research-often-cannot-resume/143099)** — Streaming resume re-cuts
- **Upstream specs** — `_upstream/gemini.interactions.spec.md`, `gemini.interactions.guide.md`, `gemini.deep-research.guide.md`
## Endpoints
| Verb | Path | Purpose |
|--------|-------------------------------------------|-------------------------------------------------------------------|
| POST | `/v1beta/interactions` | Start a run. We always send `stream:true, background:true, store:true` |
| GET | `/v1beta/interactions/{id}?stream=true` | Reattach via SSE replay (full event sequence from start) |
| GET | `/v1beta/interactions/{id}` | Fetch the resource as JSON (one-shot) |
| POST | `/v1beta/interactions/{id}/cancel` | Stop a background run |
| DELETE | `/v1beta/interactions/{id}` | Remove the stored record (does NOT cancel an in-flight run) |
Retention: 1 day free, 55 days paid.
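A minimal sketch of how a client might hit these endpoints; the paths and the `stream`/`background`/`store` flags are from the table above, while the base URL, header handling, and body shape are assumptions:
```ts
const BASE = 'https://generativelanguage.googleapis.com/v1beta';
const headers = { 'x-goog-api-key': process.env.GEMINI_API_KEY!, 'Content-Type': 'application/json' };

// Start a run: always stream + background + store, so the run survives a dropped connection
async function startInteraction(input: object, agentConfig?: object): Promise<Response> {
  return fetch(`${BASE}/interactions`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ input, agent_config: agentConfig, stream: true, background: true, store: true }),
  }); // the response body is the SSE stream
}

// Reattach via SSE replay (full event sequence from the start)
const resumeSSE = (id: string) => fetch(`${BASE}/interactions/${id}?stream=true`, { headers });

// One-shot JSON fetch (the Recover path)
const recoverJSON = (id: string) => fetch(`${BASE}/interactions/${id}`, { headers });

// Stop a background run, then remove the stored record (DELETE alone does NOT cancel)
const cancel = (id: string) => fetch(`${BASE}/interactions/${id}/cancel`, { method: 'POST', headers });
const remove = (id: string) => fetch(`${BASE}/interactions/${id}`, { method: 'DELETE', headers });
```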
## Status taxonomy
| Status | Meaning | Handling |
|-------------------|-----------------------------------------------|-------------------------------------------------------|
| `in_progress` | Live run **or** zombie (see C) | Surface diagnostics; offer Resume/Recover/Stop |
| `completed` | Done with content in `outputs[]` | Emit fragments, `tokenStopReason='ok'` |
| `failed` | Server-side failure | Terminating issue |
| `cancelled` | We or another client cancelled | Close as `cg-issue` |
| `incomplete` | Stopped early (token limit) — partial outputs | Note + `tokenStopReason='out-of-tokens'` |
| `requires_action` | Not expected for Deep Research | Fail loudly so we notice |
## Two retrieval paths
| Path | Endpoint | Parser | Use case |
|-----------------------|-----------------------------------|-------------------------------------------|-----------------------------------|
| SSE replay | `GET ?stream=true` | `createGeminiInteractionsParserSSE` | Canonical resume; live deltas |
| JSON GET (recovery) | `GET` (no `stream`) | `createGeminiInteractionsParserNS` | Recover when SSE is broken |
Both replay from the start — `ContentReassembler` REPLACES content on reattach, so partial replay (`last_event_id`) is intentionally NOT used. The NS parser walks `outputs[]` (thoughts, text, images, audio) and emits the same particles the SSE parser would, in one batch.
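In code the branch boils down to a mode switch; a sketch, reusing `BASE` and `headers` from the Endpoints sketch above and assuming a mode value shaped like the `AixReattachMode` (`'replay' | 'snapshot'`) seen in the client diffs below:
```ts
type AixReattachMode = 'replay' | 'snapshot';

// Sketch: choose the retrieval path for an existing interaction id.
// Both paths rebuild content from the start; ContentReassembler replaces rather
// than appends, which is why partial replay via last_event_id is deliberately unused.
async function reattach(id: string, mode: AixReattachMode): Promise<Response> {
  const url = `${BASE}/interactions/${id}` + (mode === 'replay' ? '?stream=true' : '');
  const res = await fetch(url, { headers });
  // mode === 'replay'  : pipe res.body into createGeminiInteractionsParserSSE (live deltas)
  // mode === 'snapshot': parse res.json() with createGeminiInteractionsParserNS (one batch over outputs[])
  return res;
}
```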
## Failure modes
### A. 10-minute SSE cut (forum 143098)
The SSE connection gets cut at exactly 600 s, regardless of activity. The cut is malformed (JSON error array instead of clean SSE close) and we treat it as stream-closed-early. The run typically **continues** server-side and reaches `completed`. **Recover (JSON GET)** retrieves the full report.
### B. Streaming resume re-cuts (forum 143099)
A fresh SSE replay can re-cut at the same 10-minute boundary on long runs, so Resume alone never reaches `interaction.complete`. **Recover** is the fallback.
### C. Zombie interactions (#1088)
Resource sits in `status: in_progress` for **days** with `outputs: []` — the generator crashed but the status never transitioned. **Not recoverable** (no data was ever produced). The NS parser surfaces `created`, `updated`, output count, and a "stuck for over an hour" hint so the user can decide to delete and retry.
### D. Connection drop mid-run
Network blip; resource is fine. **Resume (SSE replay)** picks up cleanly.
## UI
`BlockOpUpstreamResume` renders up to three buttons:
| Button | Action | Shown when |
|----------|-----------------------------------|---------------------------------------------------------|
| Resume | SSE replay | `onResume` provided |
| Recover | JSON GET (one-shot) | `upstreamHandle.uht` in `_NS_RECOVER_UHTS` |
| Stop | Cancel + delete upstream resource | `onDelete` provided |
The Recover gate is an inline `uht === 'vnd.gem.interactions'` check in `BlockOpUpstreamResume.tsx` — extend when another vendor needs the same fallback. Stop is intentionally NOT gated by Resume/Recover busy state — it's the escape hatch for hung resumes.
## Visualization control (#1095)
Deep Research accepts `agent_config.visualization: 'auto' | 'off'`. Exposed as `llmVndGeminiAgentViz` (label "Visualizations"). Forwarded only when explicitly `'off'` so the upstream `'auto'` default stays untouched. Useful when merging multiple reports — image fragments break Beam fusion.
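A sketch of the forwarding rule; the option name is from this doc, while the request-body assembly around it is an assumption:
```ts
// Only an explicit 'off' is forwarded; leaving the key absent preserves upstream's 'auto' default
function buildAgentConfig(llmVndGeminiAgentViz?: 'auto' | 'off') {
  const agent_config: Record<string, unknown> = { /* ...other agent settings... */ };
  if (llmVndGeminiAgentViz === 'off')
    agent_config.visualization = 'off';
  return agent_config;
}
```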
## Code map
| File | Role |
|--------------------------------------------------------------------------------------|-------------------------------------------------------|
| `aix/server/dispatch/wiretypes/gemini.interactions.wiretypes.ts` | Zod schemas (RequestBody, Interaction, StreamEvent) |
| `aix/server/dispatch/chatGenerate/adapters/gemini.interactionsCreate.ts` | POST body (input + agent_config) |
| `aix/server/dispatch/chatGenerate/parsers/gemini.interactions.parser.ts` | SSE parser + NS parser |
| `aix/server/dispatch/chatGenerate/chatGenerate.dispatch.ts` (`gemini` case) | Resume dispatch: SSE vs JSON branch |
| `apps/chat/components/message/BlockOpUpstreamResume.tsx` | Resume / Recover / Stop UI |
| `apps/chat/components/ChatMessageList.tsx` (`handleMessageUpstreamResume`) | Wires click handler to `aixReattachContent_DMessage_orThrow` |
+78
@@ -0,0 +1,78 @@
# LLM Models Catalog Pipeline (forward-looking)
Status: **proposal / partially implemented**. Companion to [LLM-editorial-pubdate.md](LLM-editorial-pubdate.md), which describes the durable reference (`pubDate` semantics, editorial-vs-dynamic matrix, propagation chain).
This document captures the forward-looking pipeline that turns Big-AGI's editorial model metadata into website value-add (plots, decision helpers, comparison tools at big-agi.com).
## Goal
Stand up a database/datastore that the website (`~/dev/website`) can query for plots, decision helpers, and comparison tools - without requiring the website to call our authenticated tRPC endpoints.
## Stages
### Stage 1: source of truth (in this repo) — DONE
Editorial files in `src/modules/llms/server/` remain the canonical source for:
- Identity: id, label, vendor
- Capabilities: `interfaces`, `parameterSpecs`, `contextWindow`, `maxCompletionTokens`
- Pricing: `chatPrice` (input / output / cache tiers)
- Benchmarks: `benchmark.cbaElo` (Chat Bot Arena ELO)
- Lifecycle: `pubDate`, `isLegacy`, `isPreview`, `hidden`, deprecation comments
Well-typed, version-controlled, reviewed - every model edit is a code change with diff history. 282 entries currently carry `pubDate` (see editorial-control matrix).
### Stage 2: extraction script — IN PROGRESS
A build-time script (e.g. `scripts/llms/export-models.ts`) that:
1. Loads every editorial vendor's model array.
2. Normalizes per-vendor shapes (array vs Record, `id` vs `idPrefix`, `KnownLink` symlinks) to a single row format.
3. Resolves symlinks (target's `pubDate` flows through).
4. Writes a single JSON snapshot: `data/models-catalog.json` (one row per model, with vendor + the editorial fields above).
Open question: do we want this committed (gives the website a stable artifact / public URL) or built on-demand in CI? **Recommend committed snapshot** under `data/` so consumers get a stable URL.
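A sketch of the Stage 2 exporter under stated assumptions: the row type, the symlink marker, and the vendor-loader shape are illustrative, not the shipped code. The versioned envelope also addresses open question 4 below.
```ts
import { writeFileSync } from 'node:fs';

// Illustrative row shape: one flattened record per model
interface CatalogRow {
  vendor: string;
  id: string;
  label: string;
  pubDate?: string;       // 'YYYYMMDD'
  contextWindow?: number;
  chatPrice?: unknown;
  benchmark?: { cbaElo?: number };
}

// Hypothetical editorial entry: either a full model or a symlink to another id
type EditorialEntry = CatalogRow | { id: string; symLink: string };

function exportCatalog(vendors: Record<string, EditorialEntry[]>) {
  const rows: CatalogRow[] = [];
  for (const [vendor, entries] of Object.entries(vendors)) {
    const byId = new Map(entries.map((e) => [e.id, e] as [string, EditorialEntry]));
    for (const entry of entries) {
      // resolve symlinks: the target's pubDate (and other fields) flow through
      const resolved = 'symLink' in entry ? byId.get(entry.symLink) : entry;
      if (!resolved || 'symLink' in resolved) continue; // skip dangling/chained links
      rows.push({ ...resolved, vendor, id: entry.id });
    }
  }
  // versioned envelope so website consumers can tolerate schema evolution
  const snapshot = { schemaVersion: 1, generatedAt: new Date().toISOString(), models: rows };
  writeFileSync('data/models-catalog.json', JSON.stringify(snapshot, null, 2));
}
```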
### Stage 3: enrichment — NOT STARTED
The exported snapshot gets enriched with data we don't currently track in editorial files:
- **Knowledge cutoff** (proposed in `llms.types.next.ts:217` but never implemented; should be added to `ModelDescription_schema` as a follow-up).
- **MMLU / HumanEval / SWE-bench / GPQA / MATH** scores (currently only `cbaElo`; richer benchmarks belong in a separate block).
- **Throughput / latency** numbers (per-vendor, possibly per-region).
- **Modalities matrix** (input image, input audio, input video, input PDF, output image, output audio).
- **Weights availability** (closed / open / restricted), license.
Sources for enrichment: HuggingFace cards, vendor docs, Artificial Analysis, LLM-Stats, official benchmarks. Some can be scraped on a cadence; others need editorial review.
### Stage 4: website consumption — NOT STARTED
The website (`~/dev/website`) consumes the snapshot to render:
- **Timeline plot**: `pubDate` (x-axis) vs `cbaElo` (y-axis), grouped by vendor - shows the frontier and rate of progress.
- **Cost-per-quality plot**: `chatPrice.output` vs `cbaElo` - "best model per dollar".
- **Decision helpers**: filter by capability (`interfaces`), context window, pricing tier, vendor.
- **Comparison cards**: side-by-side specs.
- **Lifecycle alerts**: deprecation warnings for retiring models.
## Open questions
1. **Where does enrichment data live?** A separate `data/models-enrichment.json` (joined by id at build time) keeps editorial files clean but introduces a join surface. Alternative: extend `ModelDescription_schema` with optional enrichment fields and treat editorial files as the only source. Recommend the separate file approach - editorial files stay focused on vendor-API integration; enrichment evolves on a different cadence.
2. **How fresh does the website need to be?** If daily, build the snapshot in CI on push and publish to a static URL. If real-time, consume tRPC directly - more work but fewer freshness gaps.
3. **Do we expose `pubDate` and other editorial metadata via tRPC publicly, or only via the snapshot?** The current tRPC routes require auth; the website should consume the snapshot, not live tRPC.
4. **Schema versioning** - if `ModelDescription_schema` evolves, the snapshot consumers need to be tolerant. Include a `schemaVersion` field in the snapshot envelope.
## Future extensions to `ModelDescription_schema`
Beyond `pubDate`, the natural follow-ups (in priority order):
1. **`knowledgeCutoff?: string`** (`'YYYY-MM'` or `'YYYY-MM-DD'`) - already proposed in `llms.types.next.ts`. Useful for the timeline plot and for context-aware prompts.
2. **`deprecationDate?: string`** - currently exists informally as `deprecated?: string` on `_knownGeminiModels`; should be promoted to the schema.
3. **`license?: string`** - especially important for open-weights models (apache-2.0, mit, llama-community, custom).
4. **`weights?: 'closed' | 'open' | 'restricted'`** - quick filter for "can I run this myself?".
5. **`benchmarks?: { mmlu?: number, humaneval?: number, gpqa?: number, ... }`** - richer than the current `cbaElo`-only block.
6. **`modalities?: { in: string[], out: string[] }`** - more precise than `interfaces` for input/output capability matrices.
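Taken together, the proposed additions would look roughly like this on the wire type; all names below are proposals from the list above, not implemented fields:
```ts
// Proposal sketch: optional extensions to ModelDescription_schema, per the priority list above
interface ModelDescriptionExtensions {
  pubDate?: string;                            // 'YYYYMMDD' (implemented)
  knowledgeCutoff?: string;                    // 'YYYY-MM' or 'YYYY-MM-DD'
  deprecationDate?: string;                    // promotes the informal `deprecated` field
  license?: string;                            // e.g. 'apache-2.0', 'mit', 'llama-community'
  weights?: 'closed' | 'open' | 'restricted';  // quick filter: "can I run this myself?"
  benchmarks?: { mmlu?: number; humaneval?: number; gpqa?: number };
  modalities?: { in: string[]; out: string[] };
}
```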
+6 -4
@@ -583,9 +583,11 @@ export function AppChat() {
}, []);
useGlobalShortcuts('AppChat', React.useMemo(() => [
// focused conversation
{ key: 'z', ctrl: true, shift: true, disabled: isFocusedChatEmpty, action: handleMessageRegenerateLastInFocusedPane, description: 'Retry' },
{ key: 'b', ctrl: true, shift: true, disabled: isFocusedChatEmpty, action: handleMessageBeamLastInFocusedPane, description: 'Beam Edit' },
// focused conversation (excluded when Beam is open so the keystroke passes through to the browser)
...(beamOpenStoreInFocusedPane ? [] : [
{ key: 'z', ctrl: true, shift: true, disabled: isFocusedChatEmpty, action: handleMessageRegenerateLastInFocusedPane, description: 'Retry' },
{ key: 'b', ctrl: true, shift: true, disabled: isFocusedChatEmpty, action: handleMessageBeamLastInFocusedPane, description: 'Beam Edit' },
]),
{ key: 'o', ctrl: true, action: handleConversationsImportFormFilePicker },
{ key: 's', ctrl: true, action: () => handleFileSaveConversation(focusedPaneConversationId) },
{ key: 'n', ctrl: true, shift: true, action: () => handleConversationNewInFocusedPane(false, false) },
@@ -603,7 +605,7 @@ export function AppChat() {
{ key: 'p', ctrl: true, action: () => personaDropdownRef.current?.openListbox() /*, description: 'Open Persona Dropdown'*/ },
// focused conversation llm
{ key: 'o', ctrl: true, shift: true, action: handleOpenChatLlmOptions },
], [focusedPaneConversationId, handleConversationNewInFocusedPane, handleConversationReset, handleConversationsImportFormFilePicker, handleDeleteConversations, handleFileSaveConversation, handleMessageBeamLastInFocusedPane, handleMessageRegenerateLastInFocusedPane, handleMoveFocus, handleNavigateHistoryInFocusedPane, handleOpenChatLlmOptions, isFocusedChatEmpty]));
], [beamOpenStoreInFocusedPane, focusedPaneConversationId, handleConversationNewInFocusedPane, handleConversationReset, handleConversationsImportFormFilePicker, handleDeleteConversations, handleFileSaveConversation, handleMessageBeamLastInFocusedPane, handleMessageRegenerateLastInFocusedPane, handleMoveFocus, handleNavigateHistoryInFocusedPane, handleOpenChatLlmOptions, isFocusedChatEmpty]));
return <>
+52 -15
@@ -6,6 +6,7 @@ import { Box, List } from '@mui/joy';
import type { SystemPurposeExample } from '../../../data';
import type { AixReattachMode } from '~/modules/aix/client/aix.client';
import type { DiagramConfig } from '~/modules/aifn/digrams/DiagramsModal';
import { speakText } from '~/modules/speex/speex.client';
@@ -123,7 +124,16 @@ export function ChatMessageList(props: {
}
}, [conversationHandler, conversationId, onConversationExecuteHistory]);
const handleMessageUpstreamResume = React.useCallback(async (generator: DMessageGenerator, messageId: DMessageId) => {
// Resume in-flight tracking - lives at this level (NOT inside BlockOpUpstreamResume) so it
// survives any remount of the message bubble during a long-running stream (e.g. Deep Research).
// - `resumeInFlight` (state) drives the loading/Detach UI on BlockOpUpstreamResume via props.
// - `resumeAbortersRef` (ref) holds the AbortController so Detach can abort even after a remount.
// Map keyed by messageId so multiple messages could in principle resume concurrently.
const [resumeInFlight, setResumeInFlight] = React.useState<Record<DMessageId, AixReattachMode>>({});
const resumeAbortersRef = React.useRef<Map<DMessageId, AbortController>>(new Map());
const handleMessageUpstreamResume = React.useCallback(async (generator: DMessageGenerator, messageId: DMessageId, mode: AixReattachMode) => {
if (!conversationId || !conversationHandler) return;
if (!generator.upstreamHandle) throw new Error('No upstream handle on generator');
@@ -131,20 +141,36 @@ export function ChatMessageList(props: {
const llmId = generator.mgt === 'aix' ? generator.aix.mId : undefined;
if (!llmId) throw new Error('No model id on generator');
const controller = new AbortController();
resumeAbortersRef.current.set(messageId, controller);
setResumeInFlight(prev => ({ ...prev, [messageId]: mode }));
const { aixCreateChatGenerateContext, aixReattachContent_DMessage_orThrow } = await import('~/modules/aix/client/aix.client');
const result = await aixReattachContent_DMessage_orThrow(
llmId,
generator,
aixCreateChatGenerateContext('conversation', conversationId),
{ abortSignal: 'NON_ABORTABLE', throttleParallelThreads: 0 },
async (update, isDone) => {
conversationHandler.messageEdit(messageId, {
fragments: update.fragments,
generator: update.generator,
pendingIncomplete: update.pendingIncomplete,
}, isDone, isDone); // remove the pending state and updte only when done
},
);
try {
await aixReattachContent_DMessage_orThrow(
llmId,
generator,
aixCreateChatGenerateContext('conversation', conversationId),
mode,
{ abortSignal: controller.signal, throttleParallelThreads: 0 }, // Detach: aborting kills the local fetch; upstream run keeps going.
async (update, isDone) => {
conversationHandler.messageEdit(messageId, {
fragments: update.fragments,
generator: update.generator,
pendingIncomplete: update.pendingIncomplete,
}, isDone, isDone); // remove the pending state and update only when done
},
);
} finally {
// Clear local tracking only if this attempt is still the current one (avoid races on rapid retry)
if (resumeAbortersRef.current.get(messageId) === controller)
resumeAbortersRef.current.delete(messageId);
setResumeInFlight(prev => {
if (prev[messageId] !== mode) return prev;
const { [messageId]: _, ...rest } = prev;
return rest;
});
}
// Manual reattach is one-shot: on failure (e.g. upstream 404 from expired or already-consumed handle),
// drop the upstreamHandle so the Resume button doesn't keep luring the user into the same error.
@@ -156,6 +182,11 @@ export function ChatMessageList(props: {
// }, false /* messageComplete */, true /* touch */);
}, [conversationHandler, conversationId]);
const handleMessageUpstreamDetach = React.useCallback((messageId: DMessageId) => {
resumeAbortersRef.current.get(messageId)?.abort();
}, []);
const handleMessageUpstreamDelete = React.useCallback(async (generator: DMessageGenerator, messageId: DMessageId) => {
if (!conversationId || !conversationHandler) return;
if (!generator.upstreamHandle) throw new Error('No upstream handle on generator');
@@ -395,7 +426,11 @@ export function ChatMessageList(props: {
{filteredMessages.map((message, idx) => {
// Optimization: only memo complete components, or we'd be memoizing garbage
// Optimization: only memo complete components, or we'd be memoizing garbage (fragments
// change every chunk during streaming, so the equality check would always fail).
// CAVEAT: switching between memo and non-memo at the same position causes React to
// remount the subtree (different component types). Any state that must survive that
// boundary lives on this component (e.g. resumeInFlight, resumeAbortersRef).
const ChatMessageMemoOrNot = !message.pendingIncomplete ? ChatMessageMemo : ChatMessage;
return props.isMessageSelectionMode ? (
@@ -427,7 +462,9 @@ export function ChatMessageList(props: {
onMessageBranch={handleMessageBranch}
onMessageContinue={handleMessageContinue}
onMessageUpstreamResume={handleMessageUpstreamResume}
onMessageUpstreamDetach={handleMessageUpstreamDetach}
onMessageUpstreamDelete={handleMessageUpstreamDelete}
upstreamResumeMode={resumeInFlight[message.id]}
onMessageDelete={handleMessageDelete}
onMessageFragmentAppend={handleMessageAppendFragment}
onMessageFragmentDelete={handleMessageDeleteFragment}
@@ -33,7 +33,10 @@ const _styles = {
} as const,
'& nav > ol > li:first-of-type': {
overflow: 'hidden',
maxWidth: { xs: '110px', md: '140px' },
// allow the chat title to use available space, shrinking gracefully when the bar is narrow
// NOTE: already performed by virtue of the breadcrumb having agi-ellipsize on the crumbs
// flexShrink: 1,
// minWidth: '60px',
} as const,
} as const,
@@ -15,6 +15,7 @@ import { KeyStroke } from '~/common/components/KeyStroke';
import { OptimaBarControlMethods, OptimaBarDropdownMemo, OptimaDropdownItems } from '~/common/layout/optima/bar/OptimaBarDropdown';
import { findModelsServiceOrNull } from '~/common/stores/llms/store-llms';
import { isDeepEqual } from '~/common/util/hooks/useDeep';
import { sortLLMsByServiceLabel } from '~/common/stores/llms/components/llms.dropdown.utils';
import { optimaActions, optimaOpenModels } from '~/common/layout/optima/useOptima';
import { useAllLLMs } from '~/common/stores/llms/hooks/useAllLLMs';
import { useModelDomain } from '~/common/stores/llms/hooks/useModelDomain';
@@ -72,7 +73,10 @@ function LLMDropdown(props: {
return lcFilterString ? true : isLLMVisible(llm);
});
for (const llm of filteredLLMs) {
// sort by service label so vendor groups appear alphabetically (groups remain contiguous because sort is stable on equal keys)
const sortedLLMs = sortLLMsByServiceLabel(filteredLLMs);
for (const llm of sortedLLMs) {
// add separators when changing services
if (!prevServiceId || llm.sId !== prevServiceId) {
const vendor = findModelVendor(llm.vId);
@@ -16,6 +16,7 @@ import MoreVertIcon from '@mui/icons-material/MoreVert';
import StarOutlineRoundedIcon from '@mui/icons-material/StarOutlineRounded';
import type { DConversationId } from '~/common/stores/chat/chat.conversation';
import { ChatBeamIcon } from '~/common/components/icons/ChatBeamIcon';
import { CloseablePopup } from '~/common/components/CloseablePopup';
import { DFolder, useFolderStore } from '~/common/stores/folders/store-chat-folders';
import { DebouncedInputMemo } from '~/common/components/DebouncedInput';
@@ -89,6 +90,7 @@ function ChatDrawer(props: {
// external state
const {
clearFilters,
filterHasBeamOpen, toggleFilterHasBeamOpen,
filterHasDocFragments, toggleFilterHasDocFragments,
filterHasImageAssets, toggleFilterHasImageAssets,
filterHasStars, toggleFilterHasStars,
@@ -98,7 +100,7 @@ function ChatDrawer(props: {
} = useChatDrawerFilters();
const { activeFolder, allFolders, enableFolders, toggleEnableFolders } = useFolders(props.activeFolderId);
const { filteredChatsCount, filteredChatIDs, filteredChatsAreEmpty, filteredChatsBarBasis, filteredChatsIncludeActive, renderNavItems } = useChatDrawerRenderItems(
props.activeConversationId, props.chatPanesConversationIds, debouncedSearchQuery, activeFolder, allFolders, filterHasStars, filterHasImageAssets, filterHasDocFragments, filterIsArchived, navGrouping, searchSorting, showRelativeSize, searchDepth,
props.activeConversationId, props.chatPanesConversationIds, debouncedSearchQuery, activeFolder, allFolders, filterHasBeamOpen, filterHasStars, filterHasImageAssets, filterHasDocFragments, filterIsArchived, navGrouping, searchSorting, showRelativeSize, searchDepth,
);
const [uiComplexityMode, contentScaling] = useUIPreferencesStore(useShallow((state) => [state.complexityMode, state.contentScaling]));
const zenMode = uiComplexityMode === 'minimal';
@@ -240,6 +242,10 @@ function ChatDrawer(props: {
<ListItemDecorator>{filterHasDocFragments && <CheckRoundedIcon />}</ListItemDecorator>
Has Attachments <AttachFileRoundedIcon />
</MenuItem>
<MenuItem onClick={toggleFilterHasBeamOpen}>
<ListItemDecorator>{filterHasBeamOpen && <CheckRoundedIcon />}</ListItemDecorator>
Beam Open <ChatBeamIcon />
</MenuItem>
<ListDivider />
<ListItem>
@@ -288,8 +294,8 @@ function ChatDrawer(props: {
)}
</Dropdown>
), [
filterHasDocFragments, filterHasImageAssets, filterHasStars, isSearching, navGrouping, searchSorting, searchDepth, filterIsArchived, showPersonaIcons, showRelativeSize,
toggleFilterHasDocFragments, toggleFilterHasImageAssets, toggleFilterHasStars, toggleFilterIsArchived, toggleShowPersonaIcons, toggleShowRelativeSize,
filterHasBeamOpen, filterHasDocFragments, filterHasImageAssets, filterHasStars, isSearching, navGrouping, searchSorting, searchDepth, filterIsArchived, showPersonaIcons, showRelativeSize,
toggleFilterHasBeamOpen, toggleFilterHasDocFragments, toggleFilterHasImageAssets, toggleFilterHasStars, toggleFilterIsArchived, toggleShowPersonaIcons, toggleShowRelativeSize,
]);
const displayNavItems = React.useMemo(() => {
@@ -304,6 +310,18 @@ function ChatDrawer(props: {
return activeItem ? [...sliced, activeItem] : sliced;
}, [renderNavItems, renderLimit, props.activeConversationId]);
// when filters/search transition from active to inactive, the active chat may end up
// submerged below the fold of a much longer list - scroll it back into view
const chatsListRef = React.useRef<HTMLDivElement>(null);
const isFiltering = isSearching || filterHasBeamOpen || filterHasDocFragments || filterHasImageAssets || filterHasStars || filterIsArchived;
React.useLayoutEffect(() => {
if (isFiltering) return;
const activeEl = chatsListRef.current?.querySelector('[aria-current="true"]') as HTMLElement | null;
activeEl?.scrollIntoView({ block: 'nearest' });
}, [isFiltering]);
return <>
{/* Drawer Header */}
@@ -390,7 +408,7 @@ function ChatDrawer(props: {
</Box>
{/* Chat Titles List (shrink as half the rate as the Folders List) */}
<Box sx={{ flexGrow: 1, flexShrink: 1, flexBasis: '20rem', overflowY: 'auto', ...themeScalingMap[contentScaling].chatDrawerItemSx }}>
<Box key='chatlist' ref={chatsListRef} sx={{ flexGrow: 1, flexShrink: 1, flexBasis: '20rem', overflowY: 'auto', ...themeScalingMap[contentScaling].chatDrawerItemSx }}>
{displayNavItems.map((item, idx) => item.type === 'nav-item-chat-data' ? (
<ChatDrawerItemMemo
key={'nav-chat-' + item.conversationId}
@@ -422,7 +440,7 @@ function ChatDrawer(props: {
{filterHasStars && <StarOutlineRoundedIcon sx={{ color: 'primary.softColor', fontSize: 'xl', mb: -0.5, mr: 1 }} />}
{item.message}
</Typography>
{(filterHasStars || filterHasImageAssets || filterHasDocFragments || filterIsArchived) && (
{(filterHasBeamOpen || filterHasStars || filterHasImageAssets || filterHasDocFragments || filterIsArchived) && (
<Tooltip title='Clear Filters'>
<IconButton size='sm' color='primary' onClick={clearFilters}>
<ClearIcon />
@@ -308,6 +308,7 @@ function ChatDrawerItem(props: {
// Active or Also Open
<Sheet
aria-current={isActive ? 'true' : undefined}
variant={isActive ? 'solid' : 'outlined'}
invertedColors={isActive}
onClick={!isActive ? handleConversationActivate : undefined}
@@ -86,6 +86,7 @@ export function useChatDrawerRenderItems(
filterByQuery: string,
activeFolder: DFolder | null,
allFolders: DFolder[],
filterHasBeamOpen: boolean,
filterHasStars: boolean,
filterHasImageAssets: boolean,
filterHasDocFragments: boolean,
@@ -146,7 +147,8 @@ export function useChatDrawerRenderItems(
}
// filter for required attributes
if ((filterHasStars && !hasStars) || (filterHasImageAssets && !hasImages) || (filterHasDocFragments && !hasDocs))
const hasBeamOpen = openBeamConversationIds[_c.id];
if ((filterHasBeamOpen && !hasBeamOpen) || (filterHasStars && !hasStars) || (filterHasImageAssets && !hasImages) || (filterHasDocFragments && !hasDocs))
return null;
// rich properties
@@ -186,7 +188,7 @@ export function useChatDrawerRenderItems(
? allFolders.find(folder => folder.conversationIds.includes(_c.id)) ?? null
: null,
updatedAt: _c.updated || _c.created || 0,
hasBeamOpen: !!openBeamConversationIds?.[_c.id],
hasBeamOpen,
messageCount,
beingGenerated: !!_c._abortController, // FIXME: when the AbortController is moved at the message level, derive the state in the conv
systemPurposeId: _c.systemPurposeId,
@@ -287,19 +289,21 @@ export function useChatDrawerRenderItems(
renderNavItems.push({
type: 'nav-item-info-message',
message: (filterHasStars && (filterHasImageAssets || filterHasDocFragments)) ? 'No results'
: filterHasDocFragments ? 'No attachment results'
: filterHasImageAssets ? 'No image results'
: filterHasStars ? 'No starred results'
: filterIsArchived ? 'No archived conversations'
: isSearching ? 'Text not found'
: 'No conversations in folder',
: filterHasBeamOpen ? 'No beam conversations'
: filterHasDocFragments ? 'No attachment results'
: filterHasImageAssets ? 'No image results'
: filterHasStars ? 'No starred results'
: filterIsArchived ? 'No archived conversations'
: isSearching ? 'Text not found'
: 'No conversations in folder',
});
} else {
// filtering reminder (will be rendered with a clear button too)
if (filterHasStars || filterHasImageAssets || filterHasDocFragments || filterIsArchived) {
if (filterHasBeamOpen || filterHasStars || filterHasImageAssets || filterHasDocFragments || filterIsArchived) {
renderNavItems.unshift({
type: 'nav-item-info-message',
message: `${filterIsArchived ? 'Showing' : 'Filtering by'} ${[
filterHasBeamOpen && 'beam',
filterHasStars && 'stars',
filterHasImageAssets && 'images',
filterHasDocFragments && 'attachments',
@@ -2,9 +2,13 @@ import * as React from 'react';
import TimeAgo from 'react-timeago';
import { Box, Button, ButtonGroup, Tooltip, Typography } from '@mui/joy';
import DownloadIcon from '@mui/icons-material/Download';
import LinkOffRoundedIcon from '@mui/icons-material/LinkOffRounded';
import PlayArrowRoundedIcon from '@mui/icons-material/PlayArrowRounded';
import StopRoundedIcon from '@mui/icons-material/StopRounded';
import type { AixReattachMode } from '~/modules/aix/client/aix.client';
import type { DMessageGenerator } from '~/common/stores/chat/chat.message';
@@ -12,53 +16,65 @@ const ARM_TIMEOUT_MS = 4000;
/**
* FIXME: COMPLETE THIS
* Resume controls for an upstream-stored run.
* - Resume: SSE replay (live deltas) - canonical path. Always offered when onResume exists.
* - Recover: one-shot JSON GET - shown only for vendors that benefit from it (Gemini Interactions).
* - Detach: abort the local fetch but leave the upstream run alive. Visible only when a resume
* is in-flight (`inFlightMode != null`). Resume/Recover stay available afterwards.
* - Stop: terminate the upstream run + delete the resource.
*
* IMPORTANT: in-flight state is owned by the parent (`inFlightMode` + `onDetach`) so it survives
* remounts that happen while a long-running stream is active (e.g. Deep Research).
*/
export function BlockOpUpstreamResume(props: {
upstreamHandle: Exclude<DMessageGenerator['upstreamHandle'], undefined>,
onResume?: () => void | Promise<void>;
onCancel?: () => void | Promise<void>;
pending?: boolean; // true iff a local in-flight op (initial POST or resume); drives the state machine + hides the expiry footer
inFlightMode?: AixReattachMode; // set by the parent while a resume is in flight; drives the loading/Detach UI
onResume?: (mode: AixReattachMode) => void | Promise<void>;
onDetach?: () => void;
onDelete?: () => void | Promise<void>;
}) {
// state
const [isResuming, setIsResuming] = React.useState(false);
const [isCancelling, setIsCancelling] = React.useState(false);
// local state - only for short-lived ops the parent doesn't own
const [isDeleting, setIsDeleting] = React.useState(false);
const [deleteArmed, setDeleteArmed] = React.useState(false);
const [error, setError] = React.useState<string | null>(null);
// expiration: boolean is evaluated at render (may lag briefly if nothing re-renders past expiry).
// TimeAgo handles its own tick for the label; the button's disabled state is the only consumer of this flag.
const { expiresAt, runId = '' } = props.upstreamHandle;
const isExpired = expiresAt != null && Date.now() > expiresAt;
const { expiresAt /*, runId = ''*/ } = props.upstreamHandle;
// State machine - mutually exclusive triplet (idle | initial-POST | resume | recover):
// - Idle : !pending - run not active locally (incl. post-reload, since
// chats.converters.ts clears pendingIncomplete on hydrate).
// - Initial POST : pending && !inFlightMode - first generation streaming.
// - Resume replay : pending && mode='replay' - we own this resume cycle.
// - Recover snap : pending && mode='snapshot' - we own this snapshot fetch.
//
// Visibility matrix (see BlockOpUpstreamResume props doc):
// Resume Recover Detach Cancel
// Idle ✅ ✅¹ — ✅
// Initial POST — — — ✅
// Resume in flight — — ✅ ✅
// Recover in flight — ✅² — —
// ¹ only for Gemini Interactions ² with loading spinner
const isReplaying = props.inFlightMode === 'replay';
const isSnapshotting = props.inFlightMode === 'snapshot';
const isIdle = !props.pending;
const canRecoverVendor = props.upstreamHandle.uht === 'vnd.gem.interactions';
const showResume = isIdle && !!props.onResume;
const showRecover = (isIdle || isSnapshotting) && !!props.onResume && canRecoverVendor;
const showDetach = isReplaying && !!props.onDetach;
const showCancel = !isSnapshotting && !!props.onDelete;
// handlers
const handleResume = React.useCallback(async () => {
const handleResume = React.useCallback((mode: AixReattachMode) => {
if (!props.onResume) return;
setError(null);
setIsResuming(true);
try {
await props.onResume();
} catch (err: any) {
setError(err?.message || 'Resume failed');
} finally {
setIsResuming(false);
}
}, [props]);
const handleCancel = React.useCallback(async () => {
if (!props.onCancel) return;
setError(null);
setIsCancelling(true);
try {
await props.onCancel();
} catch (err: any) {
setError(err?.message || 'Cancel failed');
} finally {
setIsCancelling(false);
}
// fire-and-forget: parent owns the promise lifecycle and the abort controller.
// If it rejects, the parent surfaces the error via its own UI; we stay silent.
Promise.resolve(props.onResume(mode)).catch(() => { /* parent handles */ });
}, [props]);
// Two-click arm: first click arms (visible red "Confirm?"), second click (within ARM_TIMEOUT_MS) executes.
@@ -87,7 +103,6 @@ export function BlockOpUpstreamResume(props: {
return () => clearTimeout(t);
}, [deleteArmed]);
return (
<Box
sx={{
@@ -99,41 +114,53 @@ export function BlockOpUpstreamResume(props: {
}}
>
<ButtonGroup>
{props.onResume && (
<Tooltip title='Resume generation from last checkpoint'>
{showResume && (
<Tooltip title='Resume by re-streaming from the upstream run'>
<Button
disabled={isResuming || isCancelling || isDeleting || isExpired}
loading={isResuming}
disabled={isDeleting}
startDecorator={<PlayArrowRoundedIcon color='success' />}
onClick={handleResume}
onClick={() => handleResume('replay')}
>
Resume
</Button>
</Tooltip>
)}
{props.onCancel && (
<Tooltip title='Cancel the response generation'>
{showRecover && (
<Tooltip title='Fetch the result without streaming - recovers stuck or hung runs'>
<Button
disabled={isResuming || isCancelling || isDeleting}
loading={isCancelling}
// startDecorator={<CancelIcon />}
onClick={handleCancel}
disabled={isDeleting}
loading={isSnapshotting}
loadingPosition='start'
startDecorator={<DownloadIcon />}
onClick={() => handleResume('snapshot')}
>
Cancel
Recover
</Button>
</Tooltip>
)}
{props.onDelete && (
<Tooltip title={deleteArmed ? 'Click again to confirm - cancels the run upstream (no resume after)' : 'Cancel the upstream run'}>
{showDetach && (
<Tooltip title='Close this connection only - the upstream run keeps going. Click Resume or Recover later to fetch results.'>
<Button
disabled={isDeleting}
startDecorator={<LinkOffRoundedIcon />}
onClick={props.onDetach}
>
Detach
</Button>
</Tooltip>
)}
{showCancel && (
<Tooltip title={deleteArmed ? 'Click again to confirm - cancels the upstream run and clears the handle' : 'Cancel the upstream run'}>
<Button
loading={isDeleting}
color={deleteArmed ? 'danger' : 'neutral'}
variant={deleteArmed ? 'solid' : 'outlined'}
startDecorator={<StopRoundedIcon />}
onClick={handleDelete}
disabled={isResuming || isCancelling || isDeleting}
disabled={isDeleting}
>
{deleteArmed ? 'Confirm?' : 'Cancel'}
</Button>
@@ -147,7 +174,7 @@ export function BlockOpUpstreamResume(props: {
</Typography>
)}
{!!expiresAt && <Typography level='body-xs' sx={{ fontSize: '0.65rem', opacity: 0.6 }}>
{!props.pending && !!expiresAt && <Typography level='body-xs' sx={{ fontSize: '0.65rem', opacity: 0.6 }}>
{/*Run ID: {runId.slice(0, 12)}...*/}
{/*{!!expiresAt && <> · Expires <TimeAgo date={expiresAt} /></>}*/}
Expires <TimeAgo date={expiresAt} />
@@ -29,6 +29,7 @@ import VerticalAlignBottomIcon from '@mui/icons-material/VerticalAlignBottom';
import VisibilityIcon from '@mui/icons-material/Visibility';
import VisibilityOffIcon from '@mui/icons-material/VisibilityOff';
import type { AixReattachMode } from '~/modules/aix/client/aix.client';
import { ModelVendorAnthropic } from '~/modules/llms/vendors/anthropic/anthropic.vendor';
import { AnthropicIcon } from '~/common/components/icons/vendors/AnthropicIcon';
@@ -161,8 +162,10 @@ export function ChatMessage(props: {
onMessageBeam?: (messageId: string) => Promise<void>,
onMessageBranch?: (messageId: string) => void,
onMessageContinue?: (messageId: string, continueText: null | string) => void,
onMessageUpstreamResume?: (generator: DMessageGenerator, messageId: string) => Promise<void>,
onMessageUpstreamResume?: (generator: DMessageGenerator, messageId: string, mode: AixReattachMode) => Promise<void>,
onMessageUpstreamDetach?: (messageId: string) => void,
onMessageUpstreamDelete?: (generator: DMessageGenerator, messageId: string) => Promise<void>,
upstreamResumeMode?: AixReattachMode, // set by parent while a resume is in flight on this message
onMessageDelete?: (messageId: string) => void,
onMessageFragmentAppend?: (messageId: DMessageId, fragment: DMessageFragment) => void,
onMessageFragmentDelete?: (messageId: DMessageId, fragmentId: DMessageFragmentId) => void,
@@ -247,7 +250,7 @@ export function ChatMessage(props: {
// const wordsDiff = useWordsDifference(textSubject, props.diffPreviousText, showDiff);
- const { onMessageAssistantFrom, onMessageDelete, onMessageFragmentAppend, onMessageFragmentDelete, onMessageFragmentReplace, onMessageContinue, onMessageUpstreamResume, onMessageUpstreamDelete } = props;
+ const { onMessageAssistantFrom, onMessageDelete, onMessageFragmentAppend, onMessageFragmentDelete, onMessageFragmentReplace, onMessageContinue, onMessageUpstreamResume, onMessageUpstreamDetach, onMessageUpstreamDelete } = props;
const handleFragmentNew = React.useCallback(() => {
onMessageFragmentAppend?.(messageId, createTextContentFragment(''));
@@ -265,11 +268,15 @@ export function ChatMessage(props: {
onMessageContinue?.(messageId, continueText);
}, [messageId, onMessageContinue]);
- const handleUpstreamResume = React.useCallback(() => {
+ const handleUpstreamResume = React.useCallback((mode: AixReattachMode) => {
    if (!messageGenerator) return;
-   return onMessageUpstreamResume?.(messageGenerator, messageId);
+   return onMessageUpstreamResume?.(messageGenerator, messageId, mode);
}, [messageGenerator, messageId, onMessageUpstreamResume]);
const handleUpstreamDetach = React.useCallback(() => {
onMessageUpstreamDetach?.(messageId);
}, [messageId, onMessageUpstreamDetach]);
const handleUpstreamDelete = React.useCallback(() => {
if (!messageGenerator) return;
return onMessageUpstreamDelete?.(messageGenerator, messageId);
@@ -898,11 +905,14 @@ export function ChatMessage(props: {
/>
)}
- {/* Upstream Resume - shows whenever there's a stored handle (incl. post-reload, where no error fragment is present) */}
- {!messagePendingIncomplete && props.isBottom && fromAssistant && messageGenerator?.upstreamHandle && (!!onMessageUpstreamResume || !!onMessageUpstreamDelete) && (
+ {/* Upstream Resume - shows whenever there's a stored handle (incl. post-reload, and while streaming so Stop can cancel the upstream run) */}
+ {props.isBottom && fromAssistant && messageGenerator?.upstreamHandle && (!!onMessageUpstreamResume || !!onMessageUpstreamDelete) && (
<BlockOpUpstreamResume
upstreamHandle={messageGenerator.upstreamHandle}
pending={messagePendingIncomplete}
inFlightMode={props.upstreamResumeMode}
onResume={onMessageUpstreamResume ? handleUpstreamResume : undefined}
onDetach={onMessageUpstreamDetach ? handleUpstreamDetach : undefined}
onDelete={onMessageUpstreamDelete ? handleUpstreamDelete : undefined}
/>
)}
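Parent-side wiring for the mode-aware resume, sketched under assumptions (resumeUpstreamRun is a made-up stand-in for the app-level resume flow):

// Hypothetical parent handler - tracks which reattach mode is in flight so the
// child can render the matching spinner via `upstreamResumeMode`.
const [resumeMode, setResumeMode] = React.useState<AixReattachMode | undefined>(undefined);
const handleMessageUpstreamResume = React.useCallback(async (generator: DMessageGenerator, messageId: string, mode: AixReattachMode) => {
  setResumeMode(mode);
  try {
    await resumeUpstreamRun(generator, messageId, mode); // assumed app-level function
  } finally {
    setResumeMode(undefined);
  }
}, []);
// ... <ChatMessage onMessageUpstreamResume={handleMessageUpstreamResume} upstreamResumeMode={resumeMode} ... />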
@@ -149,6 +149,10 @@ export function ContentFragments(props: {
// return null;
case 'ma':
// skip rendering empty reasoning fragments (created as vehicles for vendor state / reasoning continuity)
const isActivelyStreaming = isLastFragment && !!props.messagePendingIncomplete;
if (!part.aText && !part.redactedData?.length && !isActivelyStreaming)
return null;
const BlockPartModelAuxMemoOrNot = optimizeMemoBeforeLastBlock ? BlockPartModelAuxMemo : BlockPartModelAux;
return (
<BlockPartModelAuxMemoOrNot
@@ -166,9 +166,9 @@ export function AppChatSettingsAI() {
tooltip={<>
When Claude uses tools like code execution, it may produce text and image files stored in Anthropic&apos;s File API. This setting controls whether Big-AGI should automatically download and embed them in the chat.
<ul>
- <li><b>Off</b>: keep as references (default).</li>
- <li><b>Inline</b>: download and embed text/images.</li>
- <li><b>Inline + Free</b>: embed, then delete from Anthropic to free storage.</li>
+ <li><b>Show</b>: keep as references.</li>
+ <li><b>Embed</b>: download and embed text/images (default).</li>
+ <li><b>Embed + Free</b>: embed, then delete from Anthropic to free storage.</li>
</ul>
Only affects Anthropic models.
</>}
+1 -1
@@ -23,7 +23,7 @@ export const Release = {
// this is here to trigger revalidation of data, e.g. models refresh
Monotonics: {
- Aix: 67,
+ Aix: 70,
NewsVersion: 204,
},
@@ -5,6 +5,7 @@ import { bareBonesPromptMixer } from '~/modules/persona/pmix/pmix';
import { SystemPurposes } from '../../data';
import { BeamStore, createBeamVanillaStore } from '~/modules/beam/store-beam_vanilla';
import { autoConversationTitle } from '~/modules/aifn/autotitle/autoTitle';
import { useModuleBeamStore } from '~/modules/beam/store-module-beam';
import type { DConversationId } from '~/common/stores/chat/chat.conversation';
@@ -275,6 +276,10 @@ export class ConversationHandler {
// close beam
terminateKeepingSettings();
// auto-title the conversation if enabled (parity with chat-persona flow — fixes #1078)
if (getChatAutoAI().autoTitleChat)
void autoConversationTitle(this.conversationId, false);
};
beamOpen(viewHistory, getChatLLMId(), !!destReplaceMessageId, onBeamSuccess);
+6 -2
@@ -19,6 +19,7 @@ import { StarIconUnstyled, StarredNoXL2 } from '~/common/components/StarIcons';
import { TooltipOutlined } from '~/common/components/TooltipOutlined';
import { findModelsServiceOrNull, getChatLLMId, llmsStoreActions } from '~/common/stores/llms/store-llms';
import { optimaActions, optimaOpenModels } from '~/common/layout/optima/useOptima';
import { sortLLMsByServiceLabel } from '~/common/stores/llms/components/llms.dropdown.utils';
import { useToggleableStringSet } from '~/common/util/hooks/useToggleableStringSet';
import { useUIPreferencesStore } from '~/common/stores/store-ui';
import { useVisibleLLMs } from '~/common/stores/llms/llms.hooks';
@@ -202,12 +203,15 @@ export function useLLMSelect(
const optimizeToSingleVisibleId = (!controlledOpen && _filteredLLMs.length > LLM_SELECT_REDUCE_OPTIONS) ? llmId : null; // id to keep visible when optimizing
const optionsArray = React.useMemo(() => {
+ // sort LLMs alphabetically by service label so vendor groups appear in a stable order (groups remain contiguous because sort is stable on equal keys)
+ const sortedLLMs = sortLLMsByServiceLabel(_filteredLLMs);
  // check if we have multiple services (to show collapsible headers)
- const hasMultipleServices = _filteredLLMs.some((llm, i, arr) => i > 0 && llm.sId !== arr[i - 1].sId);
+ const hasMultipleServices = sortedLLMs.some((llm, i, arr) => i > 0 && llm.sId !== arr[i - 1].sId);
  // create the option items
  let prevServiceId: DModelsServiceId | null = null;
- return _filteredLLMs.reduce((acc, llm, _index) => {
+ return sortedLLMs.reduce((acc, llm, _index) => {
if (optimizeToSingleVisibleId && llm.id !== optimizeToSingleVisibleId)
return acc;
+7 -1
@@ -103,7 +103,13 @@ export type DMessageFragmentVendorState = Record<string, unknown> & {
thoughtSignature?: string; // Gemini 3+ - echoed back to maintain reasoning context
};
openai?: {
- // Responses API reasoning item continuity handle
+ // Responses API reasoning item continuity handle.
// IMPORTANT: OpenAI-private encryption + server-side item id; never round-trip to xAI.
reasoningItem?: { id?: string; encryptedContent?: string; };
};
xai?: {
// xAI Responses API reasoning item continuity handle.
// IMPORTANT: xAI-private encryption + server-side item id; never round-trip to OpenAI.
reasoningItem?: { id?: string; encryptedContent?: string; };
};
// Future: anthropic?: { ... }
@@ -42,17 +42,44 @@ export interface LLMServiceGroup {
}
/**
- * Group LLMs by service, resolving service display labels.
+ * Resolve display label for each unique service in the input.
+ * Fallback chain: service.label -> vendor.name -> service.id.
*/
function _resolveServiceLabels(llms: ReadonlyArray<DLLM>): Map<DModelsServiceId, string> {
const labelById = new Map<DModelsServiceId, string>();
for (const llm of llms) {
if (labelById.has(llm.sId)) continue;
const vendor = findModelVendor(llm.vId);
labelById.set(llm.sId, findModelsServiceOrNull(llm.sId)?.label || vendor?.name || llm.sId);
}
return labelById;
}
/**
* Stably sort LLMs by their service label (alphabetical, locale-aware).
* Preserves intra-service order (e.g. starred-first), since JS sort is stable.
*/
export function sortLLMsByServiceLabel<T extends DLLM>(llms: ReadonlyArray<T>): T[] {
if (llms.length < 2) return [...llms];
const labelById = _resolveServiceLabels(llms);
return [...llms].sort((a, b) => labelById.get(a.sId)!.localeCompare(labelById.get(b.sId)!));
}
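A quick usage sketch of the two helpers (the model entries are made up; with no registered services the labels fall back to the service ids):

// Hypothetical data - two services; the stable sort preserves intra-service order.
const llms = [
  { id: 'm1', sId: 'srv-zeta', vId: 'openai' },
  { id: 'm2', sId: 'srv-alpha', vId: 'anthropic' },
  { id: 'm3', sId: 'srv-zeta', vId: 'openai' },
] as unknown as DLLM[];
const sorted = sortLLMsByServiceLabel(llms);  // -> m2 (alpha), then m1, m3 (zeta); m1 stays ahead of m3
const groups = groupLLMsByService(llms);      // -> [{ serviceId: 'srv-alpha', models: [m2] }, { serviceId: 'srv-zeta', models: [m1, m3] }]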
/**
* Group LLMs by service, alphabetically sorted by service label.
* Preserves intra-service order.
*/
export function groupLLMsByService(llms: ReadonlyArray<DLLM>): LLMServiceGroup[] {
+ const labelById = _resolveServiceLabels(llms);
+ if (llms.length >= 2)
+   llms = [...llms].sort((a, b) => labelById.get(a.sId)!.localeCompare(labelById.get(b.sId)!));
  const groups: LLMServiceGroup[] = [];
  let currentGroup: LLMServiceGroup | null = null;
  for (const llm of llms) {
    if (!currentGroup || currentGroup.serviceId !== llm.sId) {
-     const vendor = findModelVendor(llm.vId);
-     const serviceLabel = findModelsServiceOrNull(llm.sId)?.label || vendor?.name || llm.sId;
-     currentGroup = { serviceId: llm.sId, serviceLabel, models: [] };
+     currentGroup = { serviceId: llm.sId, serviceLabel: labelById.get(llm.sId)!, models: [] };
groups.push(currentGroup);
}
currentGroup.models.push(llm);
+11 -1
@@ -175,7 +175,8 @@ export const DModelParameterRegistry = {
label: 'Thinking',
type: 'enum',
description: 'Enable or disable extended thinking mode.',
- values: ['none', 'high'],
+ values: ['none', 'high', 'max'],
// 'max' is for now DeepSeek V4-specific (reasoning_effort=max); other vendors restrict via enumValues
// undefined means vendor default (usually 'high', i.e. thinking enabled)
}),
@@ -348,6 +349,15 @@ export const DModelParameterRegistry = {
// when undefined, the model chooses automatically
},
// Gemini Interactions API agent_config - per-agent knobs (Deep Research only today)
llmVndGeminiAgentViz: _enumDef({
label: 'Visualizations',
type: 'enum',
description: 'Charts and images in Deep Research reports. Disable for text-only output (helpful when merging multiple reports).',
values: ['auto', 'off'],
// undefined means upstream default ('auto'); we only forward when explicitly 'off'
}),
// NOTE: we don't have this as a parameter, as for now we use it in tandem with llmVndGeminiGoogleSearch
// llmVndGeminiUrlContext: {
// label: 'URL Context',
+15
@@ -25,6 +25,7 @@ export interface DLLM {
label: string;
created: number | 0;
updated?: number | 0;
pubDate?: string; // official release date in 'YYYYMMDD'
description: string;
hidden: boolean;
@@ -137,6 +138,20 @@ export function getLLMMaxOutputTokens(llm: DLLM | null): DLLMMaxOutputTokens | u
return llm.userMaxOutputTokens ?? llm.maxOutputTokens;
}
/**
* Parse the model's editorial `pubDate` ('YYYYMMDD') into a Date, or null if missing/malformed.
* Date is constructed at local midnight - pubDate is day-precision, no time component.
*/
export function getLLMPubDate(llm: DLLM | null | undefined): Date | null {
const p = llm?.pubDate;
if (!p || !/^\d{8}$/.test(p)) return null;
const y = parseInt(p.slice(0, 4), 10);
const m = parseInt(p.slice(4, 6), 10) - 1; // JS Date months are 0-indexed
const d = parseInt(p.slice(6, 8), 10);
const date = new Date(y, m, d);
return Number.isFinite(date.getTime()) ? date : null;
}
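A usage sketch - the day precision lends itself to simple age checks, e.g. flagging recently published models (the 30-day threshold is illustrative):

// Hypothetical badge logic: mark models published within the last 30 days.
const pub = getLLMPubDate(llm);
const isNewModel = !!pub && (Date.now() - pub.getTime()) < 30 * 24 * 60 * 60 * 1000;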
/// Interfaces ///
// do not change anything below! those will be persisted in data
+1 -1
@@ -49,7 +49,7 @@ export async function autoConversationTitle(conversationId: string, forceReplace
autoTitleLlmId,
'You are an AI conversation titles assistant who specializes in creating expressive yet few-words chat titles.',
`Analyze the given short conversation (every line is truncated) and extract a concise chat title that summarizes the conversation in as little as a couple of words.
- Only respond with the lowercase short title and nothing else.
+ Only respond with the short title and nothing else.
\`\`\`
${historyLines.join('\n')}
+23 -8
@@ -834,11 +834,11 @@ export class ContentReassembler {
}
- private onSetVendorState(vs: Extract<AixWire_Particles.PartParticleOp, { p: 'svs' }>): void {
+ private onSetVendorState({ state, vendor }: Extract<AixWire_Particles.PartParticleOp, { p: 'svs' }>): void {
    // Promote Anthropic container state -> Generator (message-scoped, for cross-turn reuse)
-   if (vs.vendor === 'anthropic' && 'container' in vs.state) {
-     const { id, expiresAt } = vs.state.container;
+   if (vendor === 'anthropic' && 'container' in state) {
+     const { id, expiresAt } = state.container;
if (id && expiresAt)
this.S.generator = {
...this.S.generator,
@@ -855,11 +855,12 @@ export class ContentReassembler {
return;
}
- // Guard: OpenAI reasoningItem state must land on the ma (reasoning) fragment that produced it.
+ // Guard: reasoningItem state must land on the ma (reasoning) fragment that produced it.
  // If no summary was appended during the reasoning item (summary disabled / skipped), the last
  // fragment will belong to an unrelated preceding item - dropping the handle is safer than contaminating.
- if (vs.vendor === 'openai' && 'reasoningItem' in vs.state && lastFragment.part.pt !== 'ma') {
-   console.warn('[ContentReassembler] OpenAI reasoningItem state without preceding ma fragment - dropping continuity handle', { lastFragmentPt: lastFragment.part.pt });
+ // Applies to both OpenAI and xAI namespaces; each is opaque/private to its producing vendor.
+ if ((vendor === 'openai' || vendor === 'xai') && 'reasoningItem' in state && lastFragment.part.pt !== 'ma') {
+   console.warn(`[ContentReassembler] ${vendor} reasoningItem state without preceding ma fragment - dropping continuity handle`, { lastFragmentPt: lastFragment.part.pt });
return;
}
@@ -868,7 +869,7 @@ export class ContentReassembler {
...lastFragment,
vendorState: {
...lastFragment.vendorState,
- [vs.vendor]: vs.state,
+ [vendor]: state,
},
});
}
@@ -905,9 +906,18 @@ export class ContentReassembler {
/**
* Stores raw termination data from the wire - classification deferred to finalizeReassembly()
*/
- private onCGEnd({ terminationReason, tokenStopReason }: Extract<AixWire_Particles.ChatGenerateOp, { cg: 'end' }>): void {
+ private onCGEnd({ terminationReason, tokenStopReason, tokenStopError }: Extract<AixWire_Particles.ChatGenerateOp, { cg: 'end' }>): void {
// Diagnostic: detect late 'end' particles overriding a prior termination (parser bug, replayed wire, or upstream advisory after a clean end).
// Behavior unchanged - we still apply the override - but the warning makes the override visible client-side, mirroring the server-side
// 'setDialectEnded ... (overriding)' warning in ChatGenerateTransmitter and the existing setClientAborted/setClientExcepted warnings here.
if (this.S.terminationReason)
console.warn(`[DEV] [ContentReassembler] onCGEnd: overriding prior termination '${this.S.terminationReason}' with '${terminationReason}' (wire stop: ${this.S.dialectStopReason ?? 'none'} -> ${tokenStopReason ?? 'none'})`);
this.S.terminationReason = terminationReason;
this.S.dialectStopReason = tokenStopReason;
// Vendor-composed stop error, surfaced as a complementary error fragment alongside the generic classification message
if (tokenStopError)
this._appendErrorFragment(tokenStopError);
}
/**
@@ -989,6 +999,11 @@ export class ContentReassembler {
}
private onCGIssue({ issueId: _issueId /* Redundant as we add an Error Fragment already */, issueText, issueHint }: Extract<AixWire_Particles.ChatGenerateOp, { cg: 'issue' }> & { issueHint?: DMessageErrorPart['hint'] }): void {
// Diagnostic: detect issue particles arriving after a clean termination (e.g. OpenAI rate-limit advisory after response.completed).
// Behavior unchanged - the issue is still appended - but the warning surfaces that we are mutating a finished message.
if (this.S.terminationReason && this.S.terminationReason === 'done-dialect')
console.warn(`[DEV] [ContentReassembler] onCGIssue: appending issue after clean '${this.S.terminationReason}' (wire stop: ${this.S.dialectStopReason ?? 'none'}): ${issueText}`);
// NOTE: not sure I like the flow at all here
// there seem to be some bad conditions when issues are raised while the active part is not text
if (MERGE_ISSUES_INTO_TEXT_PART_IF_OPEN) {
@@ -409,11 +409,15 @@ export async function aixCGR_ChatSequence_FromDMessagesOrThrow(
break;
case 'ma':
- // Preserve reasoning continuity across turns. Two channels, any one is sufficient:
+ // Preserve reasoning continuity across turns. Three channels, any one is sufficient:
  // - Anthropic: part.textSignature / part.redactedData (bespoke fields, see Anthropic extended thinking docs)
- // - OpenAI/Gemini: _vnd sidecar (reasoningItem.* / thoughtSignature, generic vendor-state mechanism)
+ // - OpenAI Responses / Gemini: _vnd sidecar (reasoningItem.* / thoughtSignature, opaque continuity handle)
+ // - DeepSeek V4 (OpenAI chat-completions): plain reasoning text in aText is the payload itself
  const oaiReasoning = _vnd?.openai?.reasoningItem;
- const hasReasoningHandle = aPart.textSignature || aPart.redactedData?.length || oaiReasoning?.encryptedContent || oaiReasoning?.id;
+ const hasReasoningHandle =
+   (aPart.textSignature || aPart.redactedData?.length)
+   || (oaiReasoning?.encryptedContent || oaiReasoning?.id)
+   || (aPart.aText && aPart.aType === 'reasoning'); // DeepSeek V4 reasoning in plain text - NOTE: will send LOTS of 'ma' parts (e.g. to Gemini, which doesn't even need them)
if (hasReasoningHandle) {
const aModelAuxPart = aPart as AixParts_ModelAuxPart; // NOTE: this is a forced cast from readonly string[] to string[], but not a big deal here
modelMessage.parts.push(_vnd ? { ...aModelAuxPart, _vnd } : aModelAuxPart);
@@ -653,7 +657,7 @@ function _clientCreateAixMetaInReferenceToPart(items: DMetaReferenceItem[]): Aix
export async function clientHotFixGenerateRequest_ApplyAll(llmInterfaces: DLLM['interfaces'], aixChatGenerate: AixAPIChatGenerate_Request, modelName: string): Promise<{
- shallDisableStreaming: boolean;
+ hotfixNoStream: boolean;
workaroundsCount: number;
}> {
@@ -676,12 +680,12 @@ export async function clientHotFixGenerateRequest_ApplyAll(llmInterfaces: DLLM['
workaroundsCount += await clientHotFixGenerateRequest_ConvertWebP(aixChatGenerate, 'image/jpeg');
// Disable streaming for select chat models that don't support it (e.g. o1-preview (old) and o1-2024-12-17)
- const shallDisableStreaming = llmInterfaces.includes(LLM_IF_HOTFIX_NoStream);
+ const hotfixNoStream = llmInterfaces.includes(LLM_IF_HOTFIX_NoStream);
  if (workaroundsCount > 0)
    console.warn(`[DEV] Working around '${modelName}' model limitations: client-side applied ${workaroundsCount} workarounds`);
- return { shallDisableStreaming, workaroundsCount };
+ return { hotfixNoStream, workaroundsCount };
}
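The call sites further down collapse the caller's streaming preference against these constraints; as a pure function the decision reads (illustrative helper, not in the diff):

// Illustrative: the effective wire-streaming decision used at both generate call sites.
function resolveWireStreaming(callerStreaming: boolean, hotfixNoStream: boolean, forceNoStream: boolean | undefined): boolean {
  return !hotfixNoStream && !forceNoStream ? callerStreaming : false;
}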
@@ -37,7 +37,7 @@ export async function* clientSideChatGenerate(
return dispatch;
});
- yield* executeChatGenerateWithContinuation(dispatchCreator, streaming, abortSignal, _d);
+ yield* executeChatGenerateWithContinuation(dispatchCreator, abortSignal, _d);
}
/**
@@ -48,7 +48,7 @@ export async function* clientSideReattachUpstream(
access: AixAPI_Access,
resumeHandle: AixAPI_ResumeHandle,
context: AixAPI_Context_ChatGenerate,
- streaming: true,
+ streaming: boolean,
connectionOptions: AixAPI_ConnectionOptions_ChatGenerate,
abortSignal: AbortSignal,
): AsyncGenerator<AixWire_Particles.ChatGenerateOp, void> {
@@ -56,7 +56,7 @@ export async function* clientSideReattachUpstream(
const _d: AixDebugObject = _createClientDebugConfig(access, connectionOptions, context.name);
const dispatchCreator = () => createChatGenerateResumeDispatch(access, resumeHandle, streaming);
- yield * executeChatGenerateWithContinuation(dispatchCreator, streaming, abortSignal, _d);
+ yield * executeChatGenerateWithContinuation(dispatchCreator, abortSignal, _d);
}
/**
+47 -34
@@ -70,7 +70,7 @@ export function aixCreateModelFromLLMOptions(
llmVndAntEffort, llmVndGemEffort, llmVndOaiEffort, llmVndMiscEffort,
llmVndAnt1MContext, llmVndAntInfSpeed, llmVndAntSkills, llmVndAntThinkingBudget, llmVndAntWebDynamic, llmVndAntWebFetch, llmVndAntWebFetchMaxUses, llmVndAntWebSearch, llmVndAntWebSearchMaxUses,
llmVndBedrockAPI,
- llmVndGeminiAspectRatio, llmVndGeminiImageSize, llmVndGeminiCodeExecution, llmVndGeminiComputerUse, llmVndGeminiGoogleSearch, llmVndGeminiMediaResolution, llmVndGeminiThinkingBudget,
+ llmVndGeminiAgentViz, llmVndGeminiAspectRatio, llmVndGeminiImageSize, llmVndGeminiCodeExecution, llmVndGeminiComputerUse, llmVndGeminiGoogleSearch, llmVndGeminiMediaResolution, llmVndGeminiThinkingBudget,
// llmVndMoonshotWebSearch,
llmVndOaiRestoreMarkdown, llmVndOaiVerbosity, llmVndOaiWebSearchContext, llmVndOaiWebSearchGeolocation, llmVndOaiImageGeneration, llmVndOaiCodeInterpreter,
llmVndOrtWebSearch,
@@ -143,6 +143,7 @@ export function aixCreateModelFromLLMOptions(
// Gemini
...(llmVndGeminiInteractions ? { vndGeminiAPI: 'interactions-agent' } : {}),
...(llmVndGeminiAgentViz === 'off' ? { vndGeminiAgentViz: 'off' } : {}), // Deep Research agent_config.visualization - only forward when explicitly disabled
...(llmVndGeminiAspectRatio ? { vndGeminiAspectRatio: llmVndGeminiAspectRatio } : {}),
...(llmVndGeminiCodeExecution === 'auto' ? { vndGeminiCodeExecution: llmVndGeminiCodeExecution } : {}),
...(llmVndGeminiComputerUse ? { vndGeminiComputerUse: llmVndGeminiComputerUse } : {}),
@@ -342,7 +343,7 @@ export async function aixChatGenerateText_Simple(
aixContextRef: AixAPI_Context_ChatGenerate['ref'],
// optional options
clientOptions?: Partial<AixClientOptions>, // this makes the abortController optional
- // optional callback for streaming
+ // optional callback - if provided, streaming is activated
onTextStreamUpdate?: (text: string, isDone: boolean, generator: DMessageGenerator) => MaybePromise<void>,
): Promise<string> {
@@ -363,14 +364,13 @@ export async function aixChatGenerateText_Simple(
// Aix Context
const aixContext = aixCreateChatGenerateContext(aixContextName, aixContextRef);
- // Aix Streaming - implicit if the callback is provided
- let aixStreaming = !!onTextStreamUpdate;
+ // Caller streaming preference - implicit: stream if a callback is provided
+ const callerStreaming = !!onTextStreamUpdate;
  // Client-side late stage model HotFixes
- const { shallDisableStreaming } = await clientHotFixGenerateRequest_ApplyAll(llm.interfaces, aixChatGenerate, llmParameters.llmRef || llm.id);
- if (shallDisableStreaming || aixModel.forceNoStream)
-   aixStreaming = false;
+ const { hotfixNoStream } = await clientHotFixGenerateRequest_ApplyAll(llm.interfaces, aixChatGenerate, llmParameters.llmRef || llm.id);
+ const wireStreaming = !hotfixNoStream && !aixModel.forceNoStream ? callerStreaming : false;
// Variable to store the final text
@@ -398,11 +398,11 @@ export async function aixChatGenerateText_Simple(
aixModel,
aixChatGenerate,
aixContext,
- aixStreaming,
+ wireStreaming,
state.generator,
abortSignal,
clientOptions?.throttleParallelThreads ?? 0,
- !aixStreaming ? undefined : async (ll: AixChatGenerateContent_LL, _isDone: boolean /* we want to issue this, in case the next action is an exception */) => {
+ !onTextStreamUpdate ? undefined : async (ll: AixChatGenerateContent_LL, _isDone: boolean /* we want to issue this, in case the next action is an exception */) => {
_llToL2Simple(ll, state);
if (onTextStreamUpdate && state.text !== null)
await onTextStreamUpdate(state.text, false, state.generator);
@@ -521,7 +521,7 @@ type _AixChatGenerateContent_DMessageGuts_WithOutcome = AixChatGenerateContent_D
* @param llmId - ID of the Language Model to use
* @param aixChatGenerate - Multi-modal chat generation request specifics, including Tools and high-level metadata
* @param aixContext - Information about how this chat generation is being used
- * @param aixStreaming - Whether to use streaming for generation
+ * @param aixStreaming - Caller's wire-streaming preference. Subject to override by model/hotfix constraints, or dispatch constraints
* @param clientOptions - Client options for the operation
* @param onStreamingUpdate - Optional callback for streaming updates
*
@@ -551,10 +551,9 @@ export async function aixChatGenerateContent_DMessage_orThrow<TServiceSettings e
vndAntTransformInlineFiles: aixAccess.dialect === 'anthropic' ? getVndAntInlineFiles() : undefined,
});
- // Client-side late stage model HotFixes
- const { shallDisableStreaming } = await clientHotFixGenerateRequest_ApplyAll(llm.interfaces, aixChatGenerate, llmParameters.llmRef || llm.id);
- if (shallDisableStreaming || aixModel.forceNoStream)
-   aixStreaming = false;
+ // Client-side late stage model HotFixes - collapse the caller's requested streaming preference into the effective wire-streaming decision after constraints (hotfix gate, model.forceNoStream)
+ const { hotfixNoStream } = await clientHotFixGenerateRequest_ApplyAll(llm.interfaces, aixChatGenerate, llmParameters.llmRef || llm.id);
+ const wireStreaming = !hotfixNoStream && !aixModel.forceNoStream ? aixStreaming : false;
// Legacy Note: awaited OpenAI moderation check was removed (was only on this codepath)
@@ -584,7 +583,7 @@ export async function aixChatGenerateContent_DMessage_orThrow<TServiceSettings e
aixModel,
aixChatGenerate,
aixContext,
- aixStreaming,
+ wireStreaming,
dMessage.generator,
clientOptions.abortSignal,
clientOptions.throttleParallelThreads ?? 0,
@@ -646,22 +645,30 @@ function _finalizeLlmMetricsWithCosts(cgMetricsLg: undefined | DMetricsChatGener
// --- L2 - Content Generation reattachment as DMessage ---
/**
* Reattach mode selects how to reconstruct an in-progress upstream run:
* - 'replay' - canonical: SSE replays the event sequence from the start. Live deltas reach
* the UI as the run progresses (or as past content is replayed).
* - 'snapshot' - one-shot JSON GET returns the resource as-is right now. Used to recover when
* the SSE endpoint is broken upstream but the resource itself is still readable.
*
* Names describe what you get, not how. See `kb/modules/LLM-gemini-interactions.md` for failure modes.
*/
export type AixReattachMode = 'replay' | 'snapshot';
/**
* Reattach facade: wraps `aixChatGenerateContent_DMessage_orThrow` for the reattach-to-upstream flow.
- * - Validates the generator carries an `upstreamHandle`
- * - Stubs the unused chat-generate request, and
- * - Seeds the base function so the LL's reattach branch fires.
  *
+ * On an in-progress upstream run (Gemini Deep Research today, extensible to OAI Responses), the server
+ * just needs the handle to GET-poll; no chat-generate body is needed. This facade:
+ * - validates the generator carries an `upstreamHandle`,
+ * - stubs the chat-generate request (unused on the reattach path - the server uses the handle),
+ * - seeds the base function via `clientOptions.reattachGenerator` so the LL's reattach branch fires.
  *
- * The reassembler starts with empty fragments; since Gemini Interactions snapshots are cumulative,
- * the stream will rebuild the complete content from scratch. Any partial content from the original run is replaced.
+ * The reassembler replaces content on reattach (Gemini Interactions snapshots are cumulative, so this rebuilds from scratch).
*/
export async function aixReattachContent_DMessage_orThrow(
llmId: DLLMId,
reattachGenerator: Readonly<DMessageGenerator>,
aixContext: AixAPI_Context_ChatGenerate,
mode: AixReattachMode,
clientOptions: Pick<AixClientOptions, 'abortSignal' | 'throttleParallelThreads'>,
onStreamingUpdate?: (update: AixChatGenerateContent_DMessageGuts, isDone: boolean) => MaybePromise<void>,
): Promise<_AixChatGenerateContent_DMessageGuts_WithOutcome> {
@@ -676,7 +683,7 @@ export async function aixReattachContent_DMessage_orThrow(
llmId,
stubChatGenerate,
aixContext,
- true, // streaming
+ mode === 'replay', // wire-level: SSE demuxer (replay) vs one-shot JSON body (snapshot)
{ ...clientOptions, reattachGenerator: reattachGenerator as any /* guaranteed by the check */ },
onStreamingUpdate,
);
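A hedged usage sketch of the facade for the snapshot/'Recover' path (the context name and surrounding plumbing are assumptions):

// Illustrative call: recover a stuck run with a one-shot snapshot instead of SSE replay.
const guts = await aixReattachContent_DMessage_orThrow(
  llmId,
  message.generator!, // must carry an upstreamHandle, or the facade throws
  aixCreateChatGenerateContext('chat-reattach', conversationId), // assumed name/ref values
  'snapshot',
  { abortSignal: controller.signal },
);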
@@ -753,7 +760,7 @@ export type AixChatGenerateTerminal_LL = 'completed' | 'aborted' | 'failed';
*
* Contract:
* - empty fragments means no content yet, and no error
- * - aixStreaming hints the source, but can be respected or not
+ * - wireStreaming hints the wire transport (SSE vs single response), but can be respected or not by the dispatch (e.g. SSE-only APIs ignore a `false` value)
* - onReassemblyUpdate is optional, you can ignore the updates and await the final result
* - errors become Error fragments, and they can be dialect-sent, dispatch-excepts, client-read issues or even user aborts
* - DOES NOT THROW, but the final accumulator may contain error fragments
@@ -772,7 +779,7 @@ export type AixChatGenerateTerminal_LL = 'completed' | 'aborted' | 'failed';
* - special parts include 'In Reference To' (a decorator of messages)
* - other special parts include the Anthropic Caching hints, on select message
* @param aixContext specifies the scope of the caller, such as what's the high level objective of this call
- * @param aixStreaming requests the source to provide incremental updates
+ * @param wireStreaming the effective wire-level streaming decision (already collapsed from caller preference + model/hotfix constraints); drives tRPC `streaming` field and downstream dispatch body shape
* @param initialGenerator generator initial value, which will be updated for every new piece of information received
* @param abortSignal allows the caller to stop the operation
* @param throttleParallelThreads allows the caller to limit the number of parallel threads
@@ -790,7 +797,7 @@ async function _aixChatGenerateContent_LL(
aixModel: AixAPI_Model,
aixChatGenerate: AixAPIChatGenerate_Request,
aixContext: AixAPI_Context_ChatGenerate,
- aixStreaming: boolean,
+ wireStreaming: boolean,
// others
initialGenerator: DMessageGenerator,
abortSignal: AbortSignal,
@@ -804,10 +811,13 @@ async function _aixChatGenerateContent_LL(
const inspectorTransport = !inspectorEnabled ? undefined : aixAccess.clientSideFetch ? 'csf' : 'trpc';
const inspectorContext = !inspectorEnabled ? undefined : { contextName: aixContext.name, contextRef: aixContext.ref };
- // [DEV] Inspector - request body override
+ // Inspector - override request body
const requestBodyOverrideJson = inspectorEnabled && aixClientDebuggerGetRBO();
const debugRequestBodyOverride = !requestBodyOverrideJson ? false : JSON.parse(requestBodyOverrideJson);
// Inspector - force disable streaming (note: dispatches may still override this)
if (getAixDebuggerNoStreaming()) wireStreaming = false;
/**
* FIXME: implement client selection of resumability - aixAccess option?
* NOTE: for Gemini Deep Research, it's on by default, so both auto-reattach on network breaks (currently disabled)
@@ -827,8 +837,11 @@ async function _aixChatGenerateContent_LL(
// [CSF] Pre-load client-side executor if needed - type inference works here, no need to type
let clientSideChatGenerate;
let clientSideReattachUpstream;
- if (aixAccess.clientSideFetch)
-   ({ clientSideChatGenerate, clientSideReattachUpstream } = await _loadCsfModuleOrThrow());
+ if (aixAccess.clientSideFetch) {
+   const csf = await _loadCsfModuleOrThrow();
+   clientSideChatGenerate = csf.clientSideChatGenerate;
+   clientSideReattachUpstream = csf.clientSideReattachUpstream;
+ }
// Client-side particle transforms:
@@ -891,7 +904,7 @@ async function _aixChatGenerateContent_LL(
aixModel,
aixChatGenerate,
aixContext,
- getAixDebuggerNoStreaming() ? false : aixStreaming,
+ wireStreaming,
aixConnectionOptions,
abortSignal,
) :
@@ -901,7 +914,7 @@ async function _aixChatGenerateContent_LL(
model: aixModel,
chatGenerate: aixChatGenerate,
context: aixContext,
- streaming: getAixDebuggerNoStreaming() ? false : aixStreaming, // [DEV] disable streaming if set in the UX (testing)
+ streaming: wireStreaming,
connectionOptions: aixConnectionOptions,
}, { signal: abortSignal })
@@ -912,7 +925,7 @@ async function _aixChatGenerateContent_LL(
aixAccess,
accumulator_LL.generator.upstreamHandle,
aixContext,
- true, // streaming - reattach is only validated for streaming for now
+ wireStreaming,
aixConnectionOptions,
abortSignal,
) :
@@ -921,7 +934,7 @@ async function _aixChatGenerateContent_LL(
access: aixAccess,
upstreamHandle: accumulator_LL.generator.upstreamHandle,
context: aixContext,
- streaming: true,
+ streaming: wireStreaming,
connectionOptions: aixConnectionOptions,
}, { signal: abortSignal })
@@ -7,6 +7,7 @@ import { Box, Card, Chip, Divider, Sheet, Typography } from '@mui/joy';
import { RenderCodeMemo } from '~/modules/blocks/code/RenderCode';
import { ExpanderControlledBox } from '~/common/components/ExpanderControlledBox';
import { objectDeepCloneWithStringLimit } from '~/common/util/objectUtils';
import TimelapseIcon from '@mui/icons-material/Timelapse';
import type { AixClientDebugger } from './memstore-aix-client-debugger';
@@ -184,12 +185,10 @@ export function AixDebuggerFrame(props: {
{/* List of particles */}
{frame.particles.map((particle, idx) => {
- // truncated preview of particle content
+ // preview of particle content: preserve structure, trim long string fields
  let jsonPreview = '';
  try {
-   const content = particle.content;
-   jsonPreview = JSON.stringify(content).substring(0, 1024);
-   if (jsonPreview.length >= 1024) jsonPreview += '...';
+   jsonPreview = JSON.stringify(objectDeepCloneWithStringLimit(particle.content, 'aix-debugger-particle', 64));
} catch (e) {
jsonPreview = 'Error parsing content';
}
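For intuition, a sketch of what a string-limiting deep clone can look like - this is an assumption about objectDeepCloneWithStringLimit's behavior, not its actual implementation (the real call also takes a debug-context label):

// Illustrative only: recursively clone, truncating any string longer than `limit`.
function deepCloneWithStringLimit(value: unknown, limit: number): unknown {
  if (typeof value === 'string')
    return value.length > limit ? value.slice(0, limit) + '…' : value;
  if (Array.isArray(value))
    return value.map(v => deepCloneWithStringLimit(v, limit));
  if (value && typeof value === 'object')
    return Object.fromEntries(Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, deepCloneWithStringLimit(v, limit)]));
  return value; // numbers, booleans, null, undefined pass through
}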
+3 -3
@@ -30,7 +30,7 @@ export const aixRouter = createTRPCRouter({
const _d = _createDebugConfig(input.access, input.connectionOptions, input.context.name);
const dispatchCreator = () => createChatGenerateDispatch(input.access, input.model, input.chatGenerate, input.streaming, !!input.connectionOptions?.enableResumability);
- yield* executeChatGenerateWithContinuation(dispatchCreator, input.streaming, ctx.reqSignal, _d);
+ yield* executeChatGenerateWithContinuation(dispatchCreator, ctx.reqSignal, _d);
}),
/**
@@ -42,14 +42,14 @@ export const aixRouter = createTRPCRouter({
access: AixWire_API.Access_schema,
upstreamHandle: AixWire_API.UpstreamHandle_schema, // reattach uses a handle instead of 'model + chatGenerate'
context: AixWire_API.ContextChatGenerate_schema,
- streaming: z.literal(true), // reattach is always streaming
+ streaming: z.boolean(),
connectionOptions: AixWire_API.ConnectionOptionsChatGenerate_schema.pick({ debugDispatchRequest: true }).optional(), // debugDispatchRequest
}))
.mutation(async function* ({ input, ctx }) {
const _d = _createDebugConfig(input.access, input.connectionOptions, input.context.name);
const dispatchCreator = () => createChatGenerateResumeDispatch(input.access, input.upstreamHandle, input.streaming);
- yield* executeChatGenerateWithContinuation(dispatchCreator, input.streaming, ctx.reqSignal, _d);
+ yield* executeChatGenerateWithContinuation(dispatchCreator, ctx.reqSignal, _d);
}),
/**
+12 -1
@@ -104,11 +104,20 @@ export namespace AixWire_Parts {
openai: z.object({
// Responses API reasoning item continuity handle. Sub-object mirrors the shape of the source output item
// and parallels _vnd Anthropic's { container: { id, expiresAt } } pattern.
// IMPORTANT: this blob is OpenAI-server-encrypted; do NOT round-trip to xAI (different keys + private item ids).
reasoningItem: z.object({
id: z.string().optional(), // rs_... - item id
encryptedContent: z.string().optional(), // blob returned when include:['reasoning.encrypted_content']
}).optional(),
}).optional(),
xai: z.object({
// xAI Responses API reasoning item continuity handle. Same WIRE shape as OpenAI's, but the encrypted_content
// is encrypted with xAI's keys and the item id references xAI server state - NOT cross-portable to OpenAI.
reasoningItem: z.object({
id: z.string().optional(),
encryptedContent: z.string().optional(),
}).optional(),
}).optional(),
// NOTE: we do NOT use this mechanism for per-vendor customization/ALT for parts
// anthropic: z.object({
// containerUpload: z.object({
@@ -507,6 +516,7 @@ export namespace AixWire_API {
// Gemini
vndGeminiAPI: z.enum(['interactions-agent']).optional(), // opt-in per-model API dialect; unset = generateContent
vndGeminiAgentViz: z.enum(['auto', 'off']).optional(), // agent_config.visualization; default 'auto' upstream
vndGeminiAspectRatio: z.enum(['1:1', '2:3', '3:2', '3:4', '4:3', '9:16', '16:9', '21:9']).optional(),
vndGeminiCodeExecution: z.enum(['auto']).optional(),
vndGeminiComputerUse: z.enum(['browser']).optional(),
@@ -689,7 +699,7 @@ export namespace AixWire_Particles {
export type ChatControlOp =
// | { cg: 'start' } // not really used for now
- | { cg: 'end', terminationReason: CGEndReason /* we know why we're sending 'end' */, tokenStopReason?: GCTokenStopReason /* we may or may not have gotten a logical token stop reason from the dispatch */ }
+ | { cg: 'end', terminationReason: CGEndReason /* we know why we're sending 'end' */, tokenStopReason?: GCTokenStopReason /* we may or may not have gotten a logical token stop reason from the dispatch */, tokenStopError?: string /* optional vendor-composed human-readable detail paired with tokenStopReason */ }
| { cg: 'issue', issueId: CGIssueId, issueText: string }
| { cg: 'aix-info', ait: 'flow-cont' /* important: establishes a checkpoint */, text: string }
| { cg: 'aix-retry-reset', rScope: 'srv-dispatch' | 'srv-op' | 'cli-ll', rClearStrategy: 'none' | 'since-checkpoint' | 'all', reason: string, attempt: number, maxAttempts: number, delayMs: number, causeHttp?: number, causeConn?: string }
@@ -786,6 +796,7 @@ export namespace AixWire_Particles {
| { vendor: 'anthropic', state: { container: { id: string; expiresAt: string } } } // message-level
| { vendor: 'gemini', state: { thoughtSignature: string } } // fragment-level
| { vendor: 'openai', state: { reasoningItem: { id?: string, encryptedContent?: string } } } // fragment-level (attach to ma reasoning fragment)
| { vendor: 'xai', state: { reasoningItem: { id?: string, encryptedContent?: string } } } // fragment-level - DISTINCT from openai (different encryption keys, different server-side ids)
// | { vendor: string, state: Record<string, unknown> } // disable catch-all because it forces casts in type discriminations
)
;
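The union narrows cleanly on `vendor`; a small consumption sketch (handler bodies elided, the function itself is illustrative):

// Illustrative narrowing - TypeScript infers the exact `state` shape per vendor.
function applyVendorState(op: Extract<AixWire_Particles.PartParticleOp, { p: 'svs' }>) {
  switch (op.vendor) {
    case 'anthropic': return void op.state.container.id;       // { container: { id, expiresAt } }
    case 'gemini':    return void op.state.thoughtSignature;    // { thoughtSignature }
    case 'openai':
    case 'xai':       return void op.state.reasoningItem?.id;   // namespaces stay separate downstream
  }
}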
@@ -56,6 +56,7 @@ export class ChatGenerateTransmitter implements IParticleTransmitter {
// Token stop reason
private tokenStopReason: AixWire_Particles.GCTokenStopReason | undefined = undefined;
private tokenStopError: string | undefined = undefined;
// Metrics
private accMetrics: AixWire_Particles.CGSelectMetrics | undefined = undefined;
@@ -105,6 +106,7 @@ export class ChatGenerateTransmitter implements IParticleTransmitter {
cg: 'end',
terminationReason: this.terminationReason,
tokenStopReason: this.tokenStopReason, // See NOTE above - || (dispatchOrDialectIssue ? 'cg-issue' : 'ok'),
...(this.tokenStopError && { tokenStopError: this.tokenStopError }),
});
// Keep this in a terminated state, so that every subsequent call will yield errors (not implemented)
// this.terminationReason = null;
@@ -201,12 +203,13 @@ export class ChatGenerateTransmitter implements IParticleTransmitter {
this.setDialectEnded('issue-dialect');
}
- setTokenStopReason(reason: AixWire_Particles.GCTokenStopReason) {
+ setTokenStopReason(reason: AixWire_Particles.GCTokenStopReason, errorText?: string) {
    if (SERVER_DEBUG_WIRE)
-     console.log('|token-stop|', reason);
+     console.log('|token-stop|', reason, errorText ?? '');
    if (this.tokenStopReason && this.tokenStopReason !== reason)
      console.warn(`[Aix.${this.prettyDialect}] setTokenStopReason('${reason}'): already has token stop reason '${this.tokenStopReason}' (overriding)`);
    this.tokenStopReason = reason;
+   if (errorText) this.tokenStopError = errorText;
}
@@ -35,7 +35,7 @@ export function aixAnthropicHostedFeatures(model: AixAPI_Model, chatGenerate: Ai
const _hasAixToolRestrictivePolicy = chatGenerate.toolsPolicy?.type === 'any' || chatGenerate.toolsPolicy?.type === 'function_call';
// Dynamic web tools (20260209) require code execution for programmatic tool calling
- const hasDynamicWebTools = model.vndAntWebDynamic === true && (model.vndAntWebSearch === 'auto' || model.vndAntWebFetch === 'auto');
+ // const hasDynamicWebTools = model.vndAntWebDynamic === true && (model.vndAntWebSearch === 'auto' || model.vndAntWebFetch === 'auto');
// Programmatic Tool Calling - tools with allowed_callers or input_examples
const programmaticToolCalling = chatGenerate.tools?.some(tool =>
@@ -45,10 +45,17 @@ export function aixAnthropicHostedFeatures(model: AixAPI_Model, chatGenerate: Ai
),
) ?? false;
// [Anthropic, issue #1087] Dynamic web tools (20260209) have INTERNAL code execution. We do not
// explicitly add the code_execution tool nor the beta header for them: Anthropic enables what is
// needed implicitly behind the scenes.
return {
disableAllHostedTools: !!(_hasAixCustomTools && _hasAixToolRestrictivePolicy),
enable1MContext: model.vndAnt1MContext === true,
- enableCodeExecution: !!model.vndAntSkills || !!model.vndAntContainerId || hasDynamicWebTools || programmaticToolCalling,
+ enableCodeExecution:
+   !!model.vndAntSkills ||
+   // || hasDynamicWebTools // https://platform.claude.com/docs/en/agents-and-tools/tool-use/server-tools#dynamic-filtering-with-code-execution
+   // || !!model.vndAntContainerId // do not re-enable code execution just for continuity - would have parasitic effects: https://github.com/enricoros/big-AGI/issues/1087#issuecomment-4340352958
+   programmaticToolCalling,
enableFastMode: model.vndAntInfSpeed === 'fast',
enableSkills: !!model.vndAntSkills,
enableStrictOutputs: !!model.strictJsonOutput || !!model.strictToolInvocations,
@@ -284,7 +291,9 @@ export function aixToAnthropicMessageCreate(model: AixAPI_Model, _chatGenerate:
name: 'tool_search_tool_bm25',
});
- // Code Execution tool - required for dynamic filtering, Skills, etc.
+ // Code Execution tool - for Skills, container reuse, and Programmatic Tool Calling.
+ // Note: NOT added for dynamic web tools (_20260209) - they execute code internally and adding
+ // a standalone environment confuses the model (issue #1087).
if (enableCodeExecution)
hostedTools.push({ type: 'code_execution_20260120', name: 'code_execution' });
@@ -415,8 +424,10 @@ function* _generateAnthropicMessagesContentBlocks({ parts, role }: AixMessages_C
break;
case 'ma':
- if (!part.aText && !part.textSignature && !part.redactedData)
-   throw new Error('Extended Thinking data is missing');
+ if (!part.aText && !part.textSignature && !part.redactedData) {
+   console.warn('Anthropic: broken empty thinking block', { part });
+   break;
+ }
if (part.aText && part.textSignature)
yield { role: 'assistant', content: AnthropicWire_Blocks.ThinkingBlock(part.aText, part.textSignature) };
for (const redactedData of part.redactedData || [])
@@ -86,7 +86,8 @@ export function aixToGeminiInteractionsCreate(model: AixAPI_Model, chatGenerateR
agent_config: {
type: 'deep-research',
thinking_summaries: 'auto', // Enable thought_summary blocks - without this the API would not emit summaries during streaming
- // visualization defaults to 'auto' upstream; leave unset to keep the default (agent may generate charts/images).
+ // visualization: forwarded only when the client explicitly opts out; 'auto' (default) is left unset so the agent may generate charts/images.
...(model.vndGeminiAgentViz === 'off' && { visualization: 'off' }),
},
}),
// non-DR agents: use native system_instruction field (matches gemini.generateContent.ts convention)
@@ -37,6 +37,7 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
const chatGenerate = aixSpillSystemToUser(_chatGenerate);
// Dialect incompatibilities -> Hotfixes
// [DeepSeek, 2026-04-24] V4 doesn't require strict alternation but we keep coalescing for cleanliness; the reducer only merges assistant/user, tool messages stay separate (parallel tool_calls).
const hotFixAlternateUserAssistantRoles = openAIDialect === 'deepseek' || openAIDialect === 'perplexity';
const hotFixRemoveEmptyMessages = openAIDialect === 'moonshot' || openAIDialect === 'perplexity'; // [Moonshot, 2026-02-10] consecutive assistant messages (empty + content) break Moonshot - coalesce to fix
const hotFixRemoveStreamOptions = openAIDialect === 'azure' || openAIDialect === 'mistral';
@@ -59,7 +60,7 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
throw new Error('This service does not support function calls');
// Convert the chat messages to the OpenAI 4-Messages format
- let chatMessages = _toOpenAIMessages(chatGenerate.systemMessage, chatGenerate.chatSequence, hotFixOpenAIOFamily);
+ let chatMessages = _toOpenAIMessages(openAIDialect, chatGenerate.systemMessage, chatGenerate.chatSequence, hotFixOpenAIOFamily);
// Apply hotfixes
@@ -69,6 +70,13 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
if (hotFixAlternateUserAssistantRoles)
chatMessages = _fixAlternateUserAssistantRoles(chatMessages);
// [DeepSeek, 2026-04-24] When tools are present and thinking isn't disabled, V4 demands reasoning_content on EVERY assistant message in history
// Inject '' placeholder where missing; real reasoning is attached by _toOpenAIMessages
if (openAIDialect === 'deepseek' && chatGenerate.tools?.length)
for (const m of chatMessages)
if (m.role === 'assistant' && m.reasoning_content === undefined)
m.reasoning_content = '';
// constrained output modes - both JSON and tool invocations
// const strictJsonOutput = !!model.strictJsonOutput;
@@ -145,18 +153,23 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
&& openAIDialect !== 'deepseek' && openAIDialect !== 'moonshot' && openAIDialect !== 'zai' // MoonShot maps to none->disabled / high->enabled
&& openAIDialect !== 'perplexity' // Perplexity has its own block below with stricter validation
) {
if (reasoningEffort === 'max') // domain validation
throw new Error(`OpenAI ChatCompletions API does not support '${reasoningEffort}' reasoning effort`);
// for: 'alibaba' | 'azure' | 'groq' | 'lmstudio' | 'localai' | 'mistral' | 'openai' | 'openpipe' | 'togetherai' | 'xai'
payload.reasoning_effort = reasoningEffort;
}
// [Moonshot] Kimi K2.5 reasoning effort -> thinking mode (only 'none' and 'high' supported for now)
// [Z.ai] GLM thinking mode: binary enabled/disabled (supports GLM-4.5 series and higher) - https://docs.z.ai/guides/capabilities/thinking-mode
// [DeepSeek, 2026-04-23] V4 thinking control https://api-docs.deepseek.com/guides/thinking_mode
if (reasoningEffort && (openAIDialect === 'deepseek' || openAIDialect === 'moonshot' || openAIDialect === 'zai')) {
- if (reasoningEffort !== 'none' && reasoningEffort !== 'high') // domain validation
-   throw new Error(`${openAIDialect} only supports reasoning effort 'none' or 'high', got '${reasoningEffort}'`);
+ const allowedEffort = openAIDialect === 'deepseek' ? ['none', 'high', 'max'] : ['none', 'high'];
+ if (!allowedEffort.includes(reasoningEffort)) // domain validation
+   throw new Error(`${openAIDialect} only supports reasoning effort ${allowedEffort.join(', ')}, got '${reasoningEffort}'`);
- payload.thinking = { type: reasoningEffort === 'none' ? 'disabled' : 'enabled' };
+ payload.thinking = { type: reasoningEffort !== 'none' ? 'enabled' : 'disabled' };
// [DeepSeek, 2026-04-23] DeepSeek also supports effort control for reasoning-enabled requests - set it here as it was carved from the reasoningEffort setter before
if (openAIDialect === 'deepseek' && reasoningEffort !== 'none')
payload.reasoning_effort = reasoningEffort;
}
@@ -348,19 +361,23 @@ function _fixAlternateUserAssistantRoles(chatMessages: TRequestMessages): TReque
};
}
- // if the current item has the same role as the last item, concatenate their content
+ // If current item has the same role as the last, coalesce ONLY assistant/user.
+ // Tool/system/developer must stay separate - tool messages each pair with a tool_call_id; merging corrupts the protocol.
  if (acc.length > 0) {
    const lastItem = acc[acc.length - 1];
    if (lastItem.role === historyItem.role) {
      if (lastItem.role === 'assistant') {
        lastItem.content += hotFixSquashTextSeparator + historyItem.content;
-     } else if (lastItem.role === 'user') {
+       return acc;
+     }
+     if (lastItem.role === 'user') {
        lastItem.content = [
          ...(Array.isArray(lastItem.content) ? lastItem.content : [OpenAIWire_ContentParts.TextContentPart(lastItem.content)]),
          ...(Array.isArray(historyItem.content) ? historyItem.content : historyItem.content ? [OpenAIWire_ContentParts.TextContentPart(historyItem.content)] : []),
        ];
+       return acc;
      }
-     return acc;
+     // fall through to push for tool/system/developer - each stays its own message
    }
}
}
@@ -442,7 +459,10 @@ function _fixVndOaiRestoreMarkdown_Inline(payload: TRequest) {
}*/
- function _toOpenAIMessages(systemMessage: AixMessages_SystemMessage | null, chatSequence: AixMessages_ChatMessage[], hotFixOpenAIo1Family: boolean): TRequestMessages {
+ function _toOpenAIMessages(openAIDialect: OpenAIDialects, systemMessage: AixMessages_SystemMessage | null, chatSequence: AixMessages_ChatMessage[], hotFixOpenAIo1Family: boolean): TRequestMessages {
// [DeepSeek, 2026-04-24] V4 thinking-by-default - reasoning_content must round-trip on tool-call turns; payload is the 'ma' part's aText (unlike Gemini/OpenAI-Responses which carry opaque handles).
const echoDeepseekReasoning = openAIDialect === 'deepseek';
// Transform the chat messages into OpenAI's format (an array of 'system', 'user', 'assistant', and 'tool' messages)
const chatMessages: TRequestMessages = [];
@@ -555,6 +575,8 @@ function _toOpenAIMessages(systemMessage: AixMessages_SystemMessage | null, chat
break;
case 'model':
// Accumulate 'ma' reasoning text across this turn; echoed below onto the assistant message if it carries tool_calls (DeepSeek only).
let pendingReasoningText = '';
for (const part of parts) {
const currentMessage = chatMessages[chatMessages.length - 1];
switch (part.pt) {
@@ -630,7 +652,9 @@ function _toOpenAIMessages(systemMessage: AixMessages_SystemMessage | null, chat
break;
case 'ma':
- // ignore this thinking block - Anthropic only
+ // [DeepSeek only] accumulate reasoning text for the echo-back below. Other dialects ignore 'ma' (reasoning continuity flows via _vnd opaque handles, not via this adapter).
+ if (echoDeepseekReasoning && part.aType === 'reasoning' && part.aText)
+   pendingReasoningText += part.aText;
break;
case 'tool_response':
@@ -651,6 +675,18 @@ function _toOpenAIMessages(systemMessage: AixMessages_SystemMessage | null, chat
}
}
// [DeepSeek] attach accumulated reasoning to this turn's assistant message only if it carries tool_calls; plain-text turns don't need the echo per docs.
if (echoDeepseekReasoning && pendingReasoningText) {
for (let i = chatMessages.length - 1; i >= 0; i--) {
const m = chatMessages[i];
if (m.role !== 'assistant') continue;
if (m.tool_calls?.length)
m.reasoning_content = pendingReasoningText;
break; // stop at the most recent assistant message from this turn
}
}
break;
}
}
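For concreteness, a sketch of the assistant turn the echo above produces on a tool-call round-trip (values are made up):

// Illustrative request fragment - reasoning_content rides along with the tool_calls message.
const assistantTurn = {
  role: 'assistant' as const,
  content: '',
  reasoning_content: 'User wants the weather; call the tool first.', // echoed 'ma' aText
  tool_calls: [{ id: 'call_1', type: 'function' as const, function: { name: 'get_weather', arguments: '{"city":"Turin"}' } }],
};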
@@ -495,12 +495,15 @@ function _toOpenAIResponsesRequestInput(systemMessage: AixMessages_SystemMessage
case 'ma':
// Preserve reasoning continuity across turns via _vnd.openai.reasoningItem (set by openai.responses.parser).
// Stateless (store=false, our default): encryptedContent is the protocol-critical blob for the provider to reconstruct internal reasoning state.
// Round-trip ONLY when both encrypted_content AND id are present (canonical, complete handle).
// - bare id without EC -> 404 "Item with id rs_... not found" in stateless mode
// - bare EC without id -> torn handle, undefined behavior across providers/versions
// Defense-in-depth: matches the parser's capture gate; rejects torn handles even if any sneak through.
// ma fragments without an openai handle are common (e.g., DeepSeek reasoning_content emits ma fragments
// with no continuity blob) - skip without warning to avoid log noise on cross-vendor history.
const oaiReasoning = modelPart._vnd?.openai?.reasoningItem;
- if (oaiReasoning?.encryptedContent || oaiReasoning?.id)
+ if (oaiReasoning?.encryptedContent && oaiReasoning?.id)
newReasoningMessage(oaiReasoning.id, oaiReasoning.encryptedContent);
else
console.warn('[DEV] OpenAI Responses: skipping reasoning item due to missing encrypted content and id', { modelPart });
break;
case 'tool_response':
@@ -8,7 +8,7 @@ import { aixSpillShallFlush, aixSpillSystemToUser, approxDocPart_To_String } fro
// configuration
- const AIX_XAI_ADD_ENCRYPTED_REASONING = false;
+ const AIX_XAI_ADD_ENCRYPTED_REASONING = true;
// const AIX_XAI_ADD_INLINE_CITATIONS = true; // yes but we don't know how yet
@@ -99,13 +99,13 @@ export function aixToXAIResponses(
if (reasoningEffort === 'none' || reasoningEffort === 'minimal' || reasoningEffort === 'xhigh' || reasoningEffort === 'max') // domain validation
throw new Error(`XAI Responses API does not support reasoning effort '${reasoningEffort}'`);
- if (reasoningEffort) {
-   payload.reasoning = {
-     effort: reasoningEffort,
-     // generate_summary: unsupported
-     // summary: unsupported, defaults to 'detailed'
-   };
- }
+ // Always request detailed reasoning summaries - grok-4.3 and others have always-on reasoning
+ // but only return summary text when explicitly requested. Also set effort when configured
+ // (only grok-4.20-multi-agent supports effort).
+ payload.reasoning = {
+   ...(reasoningEffort ? { effort: reasoningEffort } : {}),
+   summary: 'detailed',
+ };
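Sketch of the resulting `reasoning` request section (values illustrative):

// With effort configured (e.g. grok-4.20-multi-agent):
const exampleReasoning = { effort: 'low', summary: 'detailed' as const };
// Without effort (always-on reasoning models like grok-4.3):
const exampleReasoningDefault = { summary: 'detailed' as const };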
// Add include options for reasoning and specialized for tool sources
if (AIX_XAI_ADD_ENCRYPTED_REASONING)
@@ -329,12 +329,15 @@ function _toXAIResponsesInput(
break;
case 'ma':
- // xAI reuses the OpenAI Responses continuity namespace (_vnd.openai.reasoningItem).
- // Only active when AIX_XAI_ADD_ENCRYPTED_REASONING is enabled and encrypted_content is captured;
- // otherwise the handle is absent and we skip to avoid "Item with id rs_... not found" style errors.
- const oaiReasoning = part._vnd?.openai?.reasoningItem;
- if (oaiReasoning?.encryptedContent || oaiReasoning?.id)
-   newReasoningItem(oaiReasoning.id, oaiReasoning.encryptedContent);
+ // xAI uses its OWN _vnd namespace - the wire schema mirrors OpenAI's, but encrypted_content is
+ // encrypted with xAI-private keys and the rs_... id references xAI-private server state. Crossing
+ // these (e.g., replaying an OpenAI handle to xAI or vice versa) yields "Item with id rs_... not
+ // found" or silent reasoning corruption.
+ // Round-trip ONLY when both encrypted_content AND id are present (canonical, complete handle).
+ // Defense-in-depth: matches the parser's capture gate; rejects torn handles even if any sneak through.
+ const xaiReasoning = part._vnd?.xai?.reasoningItem;
+ if (xaiReasoning?.encryptedContent && xaiReasoning?.id)
+   newReasoningItem(xaiReasoning.id, xaiReasoning.encryptedContent);
break;
case 'tool_response':
@@ -55,7 +55,6 @@ export class DispatchContinuationSignal extends Error {
*/
export async function* executeChatGenerateWithContinuation(
dispatchCreatorFn: () => Promise<ChatGenerateDispatch>,
- streaming: boolean,
abortSignal: AbortSignal,
_d: AixDebugObject,
): AsyncGenerator<AixWire_Particles.ChatGenerateOp, void> {
@@ -65,7 +64,7 @@ export async function* executeChatGenerateWithContinuation(
for (let turn = 0; turn <= MAX_CONTINUATION_TURNS; turn++) {
try {
- yield* executeChatGenerateWithOperationRetry(currentCreator, streaming, abortSignal, _d);
+ yield* executeChatGenerateWithOperationRetry(currentCreator, abortSignal, _d);
return; // normal completion
} catch (error) {
@@ -25,7 +25,7 @@ import { createAnthropicFileInlineTransform } from './parsers/anthropic.transfor
import { createAnthropicMessageParser, createAnthropicMessageParserNS } from './parsers/anthropic.parser';
import { createBedrockConverseParserNS, createBedrockConverseStreamParser } from './parsers/bedrock-converse.parser';
import { createGeminiGenerateContentResponseParser } from './parsers/gemini.parser';
- import { createGeminiInteractionsParser } from './parsers/gemini.interactions.parser';
+ import { createGeminiInteractionsParserNS, createGeminiInteractionsParserSSE } from './parsers/gemini.interactions.parser';
import { createOpenAIChatCompletionsChunkParser, createOpenAIChatCompletionsParserNS } from './parsers/openai.parser';
import { createOpenAIResponseParserNS, createOpenAIResponsesEventParser } from './parsers/openai.responses.parser';
@@ -37,7 +37,8 @@ export type ChatGenerateDispatch = {
/** Used by dialects that need multi-step I/O. The returned response is consumed normally via demuxerFormat/chatGenerateParse */
customConnect?: (signal: AbortSignal) => Promise<Response>;
bodyTransform?: AixDemuxers.StreamBodyTransform;
- demuxerFormat: AixDemuxers.StreamDemuxerFormat;
+ /** Source of truth for the consumer mode: null = NS */
+ demuxerFormat: null | AixDemuxers.StreamDemuxerFormat;
chatGenerateParse: ChatGenerateParseFunction;
particleTransform?: ChatGenerateParticleTransformFunction;
};
@@ -173,6 +174,7 @@ export async function createChatGenerateDispatch(access: AixAPI_Access, model: A
// [Gemini Interactions API - ALPHA TEST] SSE-native: POST with stream=true, upstream returns event-stream we pipe through the fast-sse demuxer.
if (model.vndGeminiAPI === 'interactions-agent') {
if (!streaming) console.warn(`[DEV] Gemini Interactions API - only supported in SSE mode, ignoring streaming=false for model ${model.id}`);
const request: ChatGenerateDispatchRequest = {
...geminiAccess(access, null, GeminiInteractionsWire_API_Interactions.postPath, false),
method: 'POST',
@@ -186,8 +188,9 @@ export async function createChatGenerateDispatch(access: AixAPI_Access, model: A
if (signal.aborted) throw error; // preserve abort identity for the executor's abort classifier
throw new Error(`Gemini Interactions POST: ${error?.message || 'upstream error'}`); // rewrapping TRPCFetcherError as plain Error makes the retrier treat it as non-retryable
}),
/** Upstream hardcodes stream=true + background=true (required by deep-research agents) and has no non-streaming alternative. */
demuxerFormat: 'fast-sse',
chatGenerateParse: createGeminiInteractionsParser(requestedModelName),
chatGenerateParse: createGeminiInteractionsParserSSE(requestedModelName),
};
}
@@ -244,9 +247,9 @@ export async function createChatGenerateDispatch(access: AixAPI_Access, model: A
case 'zai':
// newer: OpenAI Responses API, for models that support it and all XAI models
const isResponsesAPI = !!model.vndOaiResponsesAPI;
const isXAIModel = dialect === 'xai'; // All XAI models are accessed via Responses now
if (isResponsesAPI || isXAIModel) {
const isResponsesAPI = !!model.vndOaiResponsesAPI || isXAIModel;
if (isResponsesAPI) {
return {
request: {
...openAIAccess(access, model.id, OPENAI_API_PATHS.responses),
@@ -261,11 +264,17 @@ export async function createChatGenerateDispatch(access: AixAPI_Access, model: A
*
* Note: Response format is compatible with OpenAI parser.
*/
body: isXAIModel ? aixToXAIResponses(model, chatGenerate, streaming, enableResumability)
body: isXAIModel
? aixToXAIResponses(model, chatGenerate, streaming, enableResumability)
: aixToOpenAIResponses(dialect, model, chatGenerate, streaming, enableResumability),
},
demuxerFormat: streaming ? 'fast-sse' : null,
chatGenerateParse: streaming ? createOpenAIResponsesEventParser() : createOpenAIResponseParserNS(),
// IMPORTANT: tag the parser with the actual vendor so reasoning continuity blobs
// (encrypted_content + rs_... id) land in the matching _vnd namespace and never leak
// across providers (different keys + different server-side state).
chatGenerateParse: streaming
? createOpenAIResponsesEventParser(isXAIModel ? 'xai' : 'openai')
: createOpenAIResponseParserNS(isXAIModel ? 'xai' : 'openai'),
};
}
@@ -316,18 +325,20 @@ export async function createChatGenerateResumeDispatch(access: AixAPI_Access, re
return {
request: { url: `${url}?${queryParams.toString()}`, method: 'GET', headers },
demuxerFormat: streaming ? 'fast-sse' : null,
chatGenerateParse: streaming ? createOpenAIResponsesEventParser() : createOpenAIResponseParserNS(),
chatGenerateParse: streaming ? createOpenAIResponsesEventParser('openai') : createOpenAIResponseParserNS('openai'),
};
case 'gemini': {
// [Gemini Interactions] Reattach via SSE stream - GET /interactions/{id}?stream=true replays all events from the start (intentional - client's ContentReassembler replaces message content on reattach; partial resume via last_event_id is deliberately NOT used).
// [Gemini Interactions] Reattach: SSE replay (?stream=true) or JSON snapshot (no query). See kb/modules/LLM-gemini-interactions.md.
if (resumeHandle.uht !== 'vnd.gem.interactions')
throw new Error(`Resume handle mismatch for gemini: expected 'vnd.gem.interactions', got '${resumeHandle.uht}'`);
const { url: _baseUrl, headers: _headers } = geminiAccess(access, null, GeminiInteractionsWire_API_Interactions.getPath(resumeHandle.runId /* Gemini interaction.id */), false);
return {
request: { url: `${_baseUrl}${_baseUrl.includes('?') ? '&' : '?'}stream=true`, method: 'GET', headers: _headers },
demuxerFormat: 'fast-sse',
chatGenerateParse: createGeminiInteractionsParser(null /* model name unknown at resume time - caller's DMessage already has it */),
request: { url: streaming ? `${_baseUrl}${_baseUrl.includes('?') ? '&' : '?'}stream=true` : _baseUrl, method: 'GET', headers: _headers },
demuxerFormat: streaming ? 'fast-sse' : null,
chatGenerateParse: streaming
? createGeminiInteractionsParserSSE(null /* model name unknown at resume time - caller's DMessage already has it */)
: createGeminiInteractionsParserNS(null),
};
}
@@ -393,6 +404,21 @@ export async function executeChatGenerateDelete(access: AixAPI_Access, handle: A
case 'gemini':
if (handle.uht !== 'vnd.gem.interactions')
throw new Error(`Delete handle mismatch for gemini: expected 'vnd.gem.interactions', got '${handle.uht}'`);
// Gemini: cancel the background run first (stops token generation), then DELETE the stored record.
// The DELETE endpoint only removes the resource; it does NOT cancel an in-flight run.
// Cancel may 404 "Method not found" on the Developer API (API-key mode, googleapis/python-genai#1971) -
// we log the outcome and proceed to DELETE so local cleanup still happens.
const { url: cancelUrl, headers: cancelHeaders } = geminiAccess(access, null, GeminiInteractionsWire_API_Interactions.cancelPath(handle.runId), false);
try {
const cancelResp = await fetchResponseOrTRPCThrow({ url: cancelUrl, method: 'POST', body: {}, headers: cancelHeaders, signal: abortSignal, name: 'Aix.Gemini.Interactions.cancel', throwWithoutName: true });
console.log(`[AIX] Gemini.Interactions.cancel: ok=${cancelResp.ok} status=${cancelResp.status}`);
} catch (error: any) {
if (abortSignal.aborted) throw error;
const status = error instanceof TRPCFetcherError ? error.httpStatus : undefined;
console.log(`[AIX] Gemini.Interactions.cancel: failed status=${status ?? '?'} msg=${error?.message ?? 'unknown'}`);
}
({ url, headers } = geminiAccess(access, null, GeminiInteractionsWire_API_Interactions.deletePath(handle.runId), false));
name = 'Aix.Gemini.Interactions.delete';
break;
@@ -26,7 +26,6 @@ import { heartbeatsWhileAwaiting } from '../heartbeatsWhileAwaiting';
*/
export async function* executeChatGenerateDispatch(
dispatchCreatorFn: () => Promise<ChatGenerateDispatch>,
streaming: boolean,
intakeAbortSignal: AbortSignal,
_d: AixDebugObject,
parseContext?: { retriesAvailable: boolean },
@@ -59,7 +58,7 @@ export async function* executeChatGenerateDispatch(
const innerStream = (async function* () {
// Consume dispatch response
if (!streaming)
if (dispatch.demuxerFormat === null /* NS */)
yield* _consumeDispatchUnified(dispatchResponse, dispatch.chatGenerateParse, chatGenerateTx, _d, parseContext);
else
yield* _consumeDispatchStream(dispatchResponse, dispatch.bodyTransform ?? null, dispatch.demuxerFormat, dispatch.chatGenerateParse, chatGenerateTx, _d, parseContext);
@@ -44,7 +44,6 @@ export class OperationRetrySignal extends Error {
*/
export async function* executeChatGenerateWithOperationRetry(
dispatchCreatorFn: () => Promise<ChatGenerateDispatch>,
streaming: boolean,
abortSignal: AbortSignal,
_d: AixDebugObject,
): AsyncGenerator<AixWire_Particles.ChatGenerateOp, void> {
@@ -55,7 +54,7 @@ export async function* executeChatGenerateWithOperationRetry(
while (true) {
try {
yield* executeChatGenerateDispatch(dispatchCreatorFn, streaming, abortSignal, _d, {
yield* executeChatGenerateDispatch(dispatchCreatorFn, abortSignal, _d, {
retriesAvailable: attemptNumber < maxAttempts,
});
@@ -15,8 +15,8 @@ export interface IParticleTransmitter {
/** End the current part and flush it, which also calls `setDialectEnded('issue-dialect')` */
setDialectTerminatingIssue(dialectText: string, symbol: string | null, serverLog: ParticleServerLogLevel): void;
/** Communicates the finish reason to the client - Data only, this does not do Control, like the above */
setTokenStopReason(reason: AixWire_Particles.GCTokenStopReason): void;
/** Communicates the finish reason to the client - Data only. Optional `errorText` is a vendor-composed string rendered as a complementary error fragment alongside the generic classification message. */
setTokenStopReason(reason: AixWire_Particles.GCTokenStopReason, errorText?: string): void;
// Parts data //
@@ -404,7 +404,7 @@ export function createAnthropicMessageParser(): ChatGenerateParseFunction {
// -> Token Stop Reason
const tokenStopReason = _fromAnthropicStopReason(delta.stop_reason, 'message_delta');
if (tokenStopReason !== null)
pt.setTokenStopReason(tokenStopReason);
pt.setTokenStopReason(tokenStopReason, _formatAnthropicStopError(delta.stop_details));
// NOTE: we have more fields we're not parsing yet - https://platform.claude.com/docs/en/api/typescript/messages#message_delta_usage
if (usage?.output_tokens && messageStartTime) {
@@ -511,6 +511,7 @@ export function createAnthropicMessageParserNS(): ChatGenerateParseFunction {
content,
container,
stop_reason,
stop_details,
usage,
} = AnthropicWire_API_Message_Create.Response_schema.parse(JSON.parse(fullData));
@@ -653,7 +654,7 @@ export function createAnthropicMessageParserNS(): ChatGenerateParseFunction {
// -> Token Stop Reason (pause_turn already thrown above)
const tokenStopReason = _fromAnthropicStopReason(stop_reason, 'parser_NS');
if (tokenStopReason !== null)
pt.setTokenStopReason(tokenStopReason);
pt.setTokenStopReason(tokenStopReason, _formatAnthropicStopError(stop_details));
};
}
@@ -681,6 +682,19 @@ function _emitContainerState(pt: IParticleTransmitter, container: { id: string;
});
}
/** Compose a human-readable error string from Anthropic's stop_details. Returns undefined when nothing useful to surface. */
function _formatAnthropicStopError(stopDetails: { type: string; category?: string | null; explanation?: string | null } | null | undefined): string | undefined {
if (!stopDetails) return undefined;
if (stopDetails.type !== 'refusal') {
aixResilientUnknownValue('Anthropic', 'stopDetailsType', stopDetails.type);
return undefined;
}
const parts: string[] = [];
if (stopDetails.category) parts.push(`[${stopDetails.category}]`);
if (stopDetails.explanation) parts.push(stopDetails.explanation);
return parts.length ? `Refusal: ${parts.join(' ')}` : undefined;
}
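// Illustrative sketch (hypothetical inputs, based on the refusal shape above):
//   _formatAnthropicStopError(null)                                                   -> undefined
//   _formatAnthropicStopError({ type: 'some_future_type' })                           -> undefined (+ aixResilientUnknownValue report)
//   _formatAnthropicStopError({ type: 'refusal', category: 'cyber', explanation: 'Exploit code request.' })
//                                                                                     -> 'Refusal: [cyber] Exploit code request.'
//   _formatAnthropicStopError({ type: 'refusal', category: null, explanation: null })  -> undefined (nothing useful to surface)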
// --- Shared server tool result handlers (used by both S and NS parsers) ---
@@ -5,6 +5,7 @@ import type { ChatGenerateParseFunction } from '../chatGenerate.dispatch';
import type { IParticleTransmitter } from './IParticleTransmitter';
import { GeminiInteractionsWire_API_Interactions } from '../../wiretypes/gemini.interactions.wiretypes';
import { IssueSymbols } from '../ChatGenerateTransmitter';
import { geminiConvertPCM2WAV } from './gemini.audioutils';
@@ -44,7 +45,7 @@ type BlockState = {
* the cursor (or from start if omitted). Our parser is position-idempotent within a single run
* because the transmitter's state carries across events.
*/
export function createGeminiInteractionsParser(requestedModelName: string | null): ChatGenerateParseFunction {
export function createGeminiInteractionsParserSSE(requestedModelName: string | null): ChatGenerateParseFunction {
const parserCreationTimestamp = Date.now();
let timeToFirstContent: number | undefined;
@@ -150,6 +151,9 @@ export function createGeminiInteractionsParser(requestedModelName: string | null
if (!deltaParse.success) {
// Empty deltas ({}) appear alongside placeholder blocks (e.g. internal tool slots) - silent skip
if (event.delta && Object.keys(event.delta).length === 0) break;
// Known-but-not-surfaced delta types (mirrors NS parser's INTERNAL_OUTPUT_TYPES policy + spec's document/video variants we don't model) - silent skip
const deltaType = (event.delta as { type?: string })?.type;
if (deltaType && (GeminiInteractionsWire_API_Interactions.INTERNAL_OUTPUT_TYPES.has(deltaType) || deltaType === 'document' || deltaType === 'video')) break;
console.warn('[GeminiInteractions] unknown content.delta shape at index', event.index, event.delta);
break;
}
@@ -218,11 +222,16 @@ export function createGeminiInteractionsParser(requestedModelName: string | null
}
case 'error':
// Observed mid-stream with an empty payload between content blocks - non-fatal, the stream
// continues with further events and eventually an interaction.complete. Silent-skip empty
// payloads (Beta noise); warn only when actual error info is present.
if (event.error?.message || event.error?.code)
console.warn('[GeminiInteractions] SSE error event:', event.error);
// Two observed shapes:
// 1) Empty payload mid-stream (Beta noise): the stream continues with further events and
// eventually an interaction.complete - silent-skip.
// 2) Populated payload with message/code: terminal upstream error (also how Gemini reports
// cancelled interactions: HTTP 500 to the cancel call + an error SSE on the stream).
// Surface as a dialect-terminating issue so the UI renders it and the stream ends cleanly.
if (event.error?.message || event.error?.code) {
const errorText = `${event.error.code ? `${event.error.code}: ` : ''}${event.error.message || 'Upstream error.'}`;
pt.setDialectTerminatingIssue(errorText, IssueSymbols.Generic, 'srv-warn');
}
break;
default: {
@@ -235,6 +244,192 @@ export function createGeminiInteractionsParser(requestedModelName: string | null
}
/**
* Non-streaming parser: reads the GET /v1beta/interactions/{id} JSON body once and emits the same
* particles the SSE parser would, in a single batch.
*
* Used by the "Recover" path when SSE delivery is broken upstream (10-min cuts; see KB doc) but the
* resource is still fetchable. We always re-emit the upstream handle so failed/in_progress runs
* remain retryable; only `status: completed` clears it (via the reassembler's outcome=='completed' policy).
*
* See `kb/modules/LLM-gemini-interactions.md` for failure modes and recovery model.
*/
export function createGeminiInteractionsParserNS(requestedModelName: string | null): ChatGenerateParseFunction {
const parserCreationTimestamp = Date.now();
return function parse(pt: IParticleTransmitter, rawEventData: string, _eventName?: string): void {
// model name (preserved from caller's DMessage on resume; first-call only on fresh fetches)
if (requestedModelName != null)
pt.setModelName(requestedModelName);
// parse + validate against the Interaction resource schema (looseObject - tolerant to upstream additions)
let rawJson: unknown;
try {
rawJson = JSON.parse(rawEventData);
} catch (e: any) {
throw new Error(`malformed Interaction JSON: ${e?.message || String(e)}`);
}
const parsed = GeminiInteractionsWire_API_Interactions.Interaction_schema.safeParse(rawJson);
if (!parsed.success) {
console.warn('[GeminiInteractions-NS] unexpected Interaction shape:', rawJson);
throw new Error('Gemini Interactions: unexpected resource shape (no `id`/`status` fields)');
}
const interaction = parsed.data;
// upstream handle - preserve so user can retry / delete
pt.setUpstreamHandle(interaction.id, 'vnd.gem.interactions');
// Walk outputs in order. Each output is loose; we safeParse against KnownOutput_schema and
// silently skip INTERNAL_OUTPUT_TYPES (tool calls/results). Order matters - thoughts and
// text interleave in the report and the user reads them top-to-bottom.
const outputs = interaction.outputs ?? [];
let lastEmittedKind: 'thought' | 'text' | 'image' | 'audio' | null = null;
for (const rawOut of outputs) {
const outType = (rawOut as { type?: string })?.type;
// silent-skip internal tool-call outputs (matches SSE parser policy for INTERNAL_OUTPUT_TYPES)
if (outType && GeminiInteractionsWire_API_Interactions.INTERNAL_OUTPUT_TYPES.has(outType))
continue;
const knownOut = GeminiInteractionsWire_API_Interactions.KnownOutput_schema.safeParse(rawOut);
if (!knownOut.success) {
if (outType) console.warn('[GeminiInteractions-NS] unknown output type, skipping:', outType);
continue;
}
// emit a part boundary when switching kinds, mirrors SSE behavior on content.start across indices
if (lastEmittedKind !== null && lastEmittedKind !== knownOut.data.type)
pt.endMessagePart();
switch (knownOut.data.type) {
case 'thought': {
const summary = knownOut.data.summary;
if (typeof summary === 'string') {
if (summary) pt.appendReasoningText(summary);
} else if (Array.isArray(summary)) {
for (const item of summary)
if (item.text) pt.appendReasoningText(item.text);
}
if (knownOut.data.signature)
pt.setReasoningSignature(knownOut.data.signature);
lastEmittedKind = 'thought';
break;
}
case 'text': {
if (knownOut.data.text)
pt.appendText(knownOut.data.text);
// Citations: matches SSE policy - the DISABLE_CITATIONS kill-switch drops them for Deep Research
if (!DISABLE_CITATIONS && knownOut.data.annotations) {
for (const annRaw of knownOut.data.annotations) {
const ann = GeminiInteractionsWire_API_Interactions.UrlCitationAnnotation_schema.safeParse(annRaw);
if (!ann.success) continue;
const a = ann.data;
pt.appendUrlCitation(a.title || a.url, a.url, undefined, a.start_index, a.end_index, undefined, undefined);
}
}
lastEmittedKind = 'text';
break;
}
case 'image': {
if (knownOut.data.data && knownOut.data.mime_type)
pt.appendImageInline(knownOut.data.mime_type, knownOut.data.data, 'Gemini Generated Image', 'Gemini', '', true);
else if (knownOut.data.uri)
pt.appendText(`\n[Image: ${knownOut.data.uri}]\n`);
lastEmittedKind = 'image';
break;
}
case 'audio': {
if (knownOut.data.data && knownOut.data.mime_type) {
const mime = knownOut.data.mime_type.toLowerCase();
const isPCM = mime.startsWith('audio/l16') || mime.includes('codec=pcm');
if (isPCM) {
try {
const wav = geminiConvertPCM2WAV(knownOut.data.mime_type, knownOut.data.data);
pt.appendAudioInline(wav.mimeType, wav.base64Data, 'Gemini Generated Audio', 'Gemini', wav.durationMs);
} catch (error) {
console.warn('[GeminiInteractions-NS] audio PCM convert failed:', error);
}
} else {
pt.appendAudioInline(knownOut.data.mime_type, knownOut.data.data, 'Gemini Generated Audio', 'Gemini', 0);
}
}
lastEmittedKind = 'audio';
break;
}
default: {
const _exhaustive: never = knownOut.data;
break;
}
}
}
// close out any open part before the terminal status emission
if (lastEmittedKind !== null) pt.endMessagePart();
// Terminal status -> stop reason + dialect end (mirrors _handleInteractionComplete)
switch (interaction.status) {
case 'completed':
_emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
pt.setTokenStopReason('ok');
pt.setDialectEnded('done-dialect');
break;
case 'failed':
_emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
pt.setDialectTerminatingIssue('Deep Research interaction failed', null, 'srv-warn');
break;
case 'cancelled':
_emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
pt.setTokenStopReason('cg-issue');
pt.setDialectEnded('done-dialect');
break;
case 'incomplete':
pt.appendText('\n_Response incomplete (run stopped early)._\n');
_emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
pt.setTokenStopReason('out-of-tokens');
pt.setDialectEnded('done-dialect');
break;
case 'requires_action':
pt.setDialectTerminatingIssue('Deep Research returned requires_action (not supported in this client)', null, 'srv-warn');
break;
case 'in_progress': {
// Two scenarios both surface as `in_progress`:
// 1) Run is genuinely live server-side (just slow) - polling later will yield content.
// 2) "Zombie": the generator crashed but the status never transitioned. Stays `in_progress`
// for days with no outputs. Not recoverable - the only remedy is delete + retry.
// We can't disambiguate from one frame, so we surface {created, updated, outputs.length}
// and let the user decide. `tokenStopReason='cg-issue'` keeps the upstream handle alive
// (vs 'ok' which would clear it via the reassembler's clean-completion policy).
// see kb/modules/LLM-gemini-interactions.md#failure-modes (C)
const elapsedMin = _minutesSince(interaction.created);
const updatedMin = _minutesSince(interaction.updated);
const outCount = (interaction.outputs ?? []).length;
const lines: string[] = ['\n_Deep Research run is **`in_progress`** server-side._\n'];
if (elapsedMin != null) lines.push(`- Started: **${_humanDuration(elapsedMin)} ago**`);
if (updatedMin != null && updatedMin !== elapsedMin) lines.push(`- Last server update: **${_humanDuration(updatedMin)} ago**`);
lines.push(`- Outputs so far: **${outCount === 0 ? 'none' : outCount}**`);
// Heuristic threshold: stale-and-empty for >60 min is almost certainly a zombie.
const looksStuck = outCount === 0 && elapsedMin != null && elapsedMin > 60;
if (looksStuck)
lines.push('\nThis run looks **stuck** (no content for over an hour). Click **Cancel** to delete it and try again.');
else
lines.push('\nTry **Recover** again in a few minutes; if it stays empty, click **Cancel** to delete and retry.');
pt.appendText(lines.join('\n') + '\n');
pt.setTokenStopReason('cg-issue');
pt.setDialectEnded('done-dialect');
break;
}
default: {
const _exhaustiveCheck: never = interaction.status;
console.warn('[GeminiInteractions-NS] unreachable status', interaction.status);
break;
}
}
};
}
// --- helpers ---
function _classifyContentKind(rawType: unknown): BlockState['kind'] {
@@ -364,3 +559,22 @@ function _emitUsageMetrics(
pt.updateMetrics(m);
}
/** Minutes elapsed between an upstream ISO 8601 timestamp and now. Returns null on parse failure. */
function _minutesSince(iso: string | undefined | null): number | null {
if (!iso) return null;
const ms = Date.parse(iso);
if (!Number.isFinite(ms)) return null;
return Math.max(0, (Date.now() - ms) / 60_000);
}
/** Human-readable elapsed-time string for in_progress diagnostic messages. */
function _humanDuration(minutes: number): string {
if (minutes < 1) return 'less than a minute';
if (minutes < 60) return `${Math.round(minutes)} min`;
const hours = minutes / 60;
if (hours < 24) return `${Math.round(hours * 10) / 10} hours`;
const days = hours / 24;
return `${Math.round(days * 10) / 10} days`;
}
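// Illustrative sketch (hypothetical values) of the helpers' rounding behavior:
//   _minutesSince(undefined) -> null; _minutesSince('not-a-date') -> null
//   _humanDuration(0.4)  -> 'less than a minute'
//   _humanDuration(42.6) -> '43 min'
//   _humanDuration(90)   -> '1.5 hours'
//   _humanDuration(3600) -> '2.5 days'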
@@ -494,6 +494,10 @@ export function createOpenAIChatCompletionsParserNS(): ChatGenerateParseFunction
} else if (message.content !== undefined && message.content !== null)
throw new Error(`unexpected message content type: ${typeof message.content}`);
// [DeepSeek, 2026-04-24] Non-streaming reasoning_content -> 'ma' reasoning part (mirror of streaming path above)
if (typeof message.reasoning_content === 'string' && message.reasoning_content)
pt.appendReasoningText(message.reasoning_content);
// [OpenRouter, 2025-01-20] Handle structured reasoning_details
if (Array.isArray(message.reasoning_details)) {
for (const reasoningDetail of message.reasoning_details) {
@@ -18,6 +18,21 @@ const OPENAI_RESPONSES_SAME_PART_SPACER = '\n\n';
const INLINE_IMAGE_SKIP_RESIZE_MAX_B64_BYTES = 250_000; // skip resize for small images (e.g. code interpreter charts)
/**
* Wishlist marker: hosted tool calls (web_search_call, image_generation_call, code_interpreter_call, ...)
* are rendered via ephemeral OperationState/inline-asset paths and are NOT round-tripped as structured
* fragments. This breaks stateless multi-turn with reasoning models. See PRD.FUTURE-atol.md "Wishlist:
* Hosted tool invocations as first-class fragments".
*/
// const _hostedToolWishlistSeen = new Set<string>();
function _hostedToolWishlistHint(family: 'web_search' | 'image_generation' | 'code_interpreter' | 'custom_tool'): void {
// if (_hostedToolWishlistSeen.has(family)) return;
// _hostedToolWishlistSeen.add(family);
// NOTE: log disabled because it was firing constantly everywhere; just implement this
// console.log(`[DEV] AIX: ATOL wishlist - hosted '${family}' call observed; not round-tripped as a structured fragment yet (see kb/product/PRD.FUTURE-atol.md)`);
}
/**
* Safely sanitizes a URL for display in placeholders by removing query parameters and paths
* to prevent leaking sensitive information while keeping the domain recognizable.
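// A minimal sketch of what such a sanitizer can look like (the actual implementation and its exact
// name live outside this hunk; the fallback string here is an assumption):
// function _sanitizeUrlForDisplay(rawUrl: string): string {
//   try {
//     return new URL(rawUrl).hostname; // keep only the domain - drops credentials, path, query, hash
//   } catch {
//     return 'invalid URL'; // never echo unparsable input back to the UI
//   }
// }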
@@ -46,6 +61,11 @@ type TEventType = OpenAIWire_API_Responses.StreamingEvent['type'];
// cached config for the image_generation hosted tool, captured at response.created
type TImageGenToolCfg = Extract<OpenAIWire_Responses_Tools.Tool, { type: 'image_generation' }>;
/** Extract the image_generation tool config from the echoed tools array (API does not echo `model` per-item). Shared by streaming and non-streaming paths. */
function _findImageGenToolCfg(tools: TResponse['tools']): TImageGenToolCfg | undefined {
return tools?.find((t): t is TImageGenToolCfg => t.type === 'image_generation');
}
/**
 * We need this just to ensure events are not out of order, as our streaming is progressive
@@ -79,6 +99,7 @@ class ResponseParserStateMachine {
// streaming state tracking
#hasFunctionCalls: boolean = false; // tracks if we've seen function_call output items
#responseSealed: boolean = false; // true once response.completed/failed/incomplete has been processed - trailing 'error' events are advisory only
// hosted tool configuration echo (captured at response.created)
#imageGenToolCfg: TImageGenToolCfg | undefined;
@@ -244,12 +265,19 @@ class ResponseParserStateMachine {
return this.#hasFunctionCalls;
}
markResponseSealed() {
this.#responseSealed = true;
}
get responseSealed() {
return this.#responseSealed;
}
// Hosted tool config capture
captureHostedToolConfigs(tools: TResponse['tools']) {
if (!tools?.length) return;
this.#imageGenToolCfg = tools.find((t): t is TImageGenToolCfg => t.type === 'image_generation');
this.#imageGenToolCfg = _findImageGenToolCfg(tools);
}
get imageGenToolCfg() {
@@ -261,8 +289,13 @@ class ResponseParserStateMachine {
/**
* OpenAI Responses API Streaming Parser
*
 * @param vendor 'openai' or 'xai' - tags the reasoning continuity handle so it round-trips back
* to the SAME provider. The OpenAI Responses wire format is shared with xAI, but the encrypted_content blob
* and the rs_... id are vendor-server-private (different keys, different state). Mixing them produces
* "Item with id rs_... not found" or worse silent corruption.
*/
export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
export function createOpenAIResponsesEventParser(vendor: 'openai' | 'xai'): ChatGenerateParseFunction {
const R = new ResponseParserStateMachine();
@@ -353,11 +386,13 @@ export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
}
// -> End of the response
R.markResponseSealed();
pt.setDialectEnded('done-dialect'); // OpenAI Responses: 'response.completed'
break;
case 'response.failed':
R.setResponse(eventType, event.response);
R.markResponseSealed();
pt.setTokenStopReason('cg-issue'); // generic issue?
console.warn(`[DEV] AIX: FIXME: OpenAI-Response failed ${eventType}:`, event.response);
// TODO: extract and forward error details
@@ -366,6 +401,7 @@ export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
case 'response.incomplete':
// TODO: We haven't seen one of those events yet; we need to see what happens and parse it!
R.setResponse(eventType, event.response);
R.markResponseSealed();
// -> Status: handle incomplete response
if (event.response.incomplete_details?.reason === 'max_output_tokens')
@@ -406,22 +442,28 @@ export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
// NOTE: the authoritative encrypted_content arrives on .done (differs from the earlier .added event).
const { id: reasoningId, encrypted_content: reasoningEC } = doneItem;
// [DEV] surface cases that diverge from our continuity round-trip expectations
// Capture ONLY when BOTH encrypted_content AND id are present (the canonical reasoning item shape).
// - id-only: refers to server state we don't keep in stateless mode (store: false, our default) -> 404 next turn
// - EC-only: a "torn" handle that breaks future stateful flows and possible id<->EC integrity checks
// - neither: nothing to round-trip
// [DEV] surface divergences from this contract
if (!reasoningId && !reasoningEC)
console.warn('[DEV] AIX: OpenAI Responses: reasoning item done with neither id nor encrypted_content - no continuity handle captured for this turn', { doneItem });
console.warn(`[DEV] AIX: ${vendor} Responses: reasoning item done with neither id nor encrypted_content - no continuity handle captured for this turn`, { doneItem });
else if (!reasoningEC)
console.log('[DEV] AIX: OpenAI Responses: reasoning item done has id but no encrypted_content - stateless round-trip requires include:[\'reasoning.encrypted_content\'] on the request');
console.log(`[DEV] AIX: ${vendor} Responses: reasoning item done has id but no encrypted_content - dropping handle (stateless round-trip requires include:['reasoning.encrypted_content'] on the request)`);
else if (!reasoningId)
console.log(`[DEV] AIX: ${vendor} Responses: reasoning item done has encrypted_content but no id - dropping handle (incomplete reasoning item from upstream)`);
if (reasoningEC || reasoningId) {
if (reasoningEC && reasoningId) {
// Defensive: ensure an ma fragment exists as the attach target for the svs particle below.
pt.appendReasoningText('');
pt.sendSetVendorState({
p: 'svs',
vendor: 'openai',
vendor: vendor,
state: {
reasoningItem: {
...(reasoningId ? { id: reasoningId } : {}),
...(reasoningEC ? { encryptedContent: reasoningEC } : {}),
id: reasoningId,
encryptedContent: reasoningEC,
},
},
});
@@ -448,6 +490,7 @@ export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
break;
case 'image_generation_call':
_hostedToolWishlistHint('image_generation');
// -> IGC: process completed image generation using 'ii' particle for inline images
const { id: igId, result: igResult, revised_prompt: igRevisedPrompt } = doneItem;
const igDoneText = !igRevisedPrompt?.length ? 'Image generated'
@@ -698,6 +741,14 @@ export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
const errorMessage = safeErrorString(event.error?.message || event?.message) ?? undefined;
const errorParam = safeErrorString(event.error?.param || event?.param) ?? undefined;
// Trailing-error guard: if the response already reached a terminal state (completed/failed/incomplete),
// an 'error' event arriving after is an upstream advisory (e.g. rate-limit headroom) and must NOT
// override the prior termination - otherwise it flips the message to red and the Beam ray to 'error'.
if (R.responseSealed) {
console.warn(`[DEV] AIX: OpenAI Responses: trailing 'error' after sealed response - ignored: ${errorCode || 'Error'}: ${errorMessage || 'unknown.'}${errorParam ? ` (param: ${errorParam})` : ''}`);
break;
}
// Transmit the error as text - note: throw if you want to transmit as 'error'
// FIXME: potential point for throwing OperationRetrySignal (using 'srv-warn' for now)
pt.setDialectTerminatingIssue(`${errorCode || 'Error'}: ${errorMessage || 'unknown.'}${errorParam ? ` (param: ${errorParam})` : ''}`, IssueSymbols.Generic, 'srv-warn');
@@ -740,8 +791,11 @@ export function createOpenAIResponsesEventParser(): ChatGenerateParseFunction {
/**
* OpenAI Responses API Non-Streaming Parser
*
 * @param vendor 'openai' or 'xai' - see createOpenAIResponsesEventParser for the rationale on
* why xAI gets its own _vnd namespace (different encryption keys + private item ids).
*/
export function createOpenAIResponseParserNS(): ChatGenerateParseFunction {
export function createOpenAIResponseParserNS(vendor: 'openai' | 'xai'): ChatGenerateParseFunction {
const parserCreationTimestamp = Date.now();
@@ -765,6 +819,9 @@ export function createOpenAIResponseParserNS(): ChatGenerateParseFunction {
if (response.model)
pt.setModelName(response.model);
// -> Hosted tool config capture (needed for enriching done-item particles with tool params the API does not echo per-item, e.g. image_generation.model)
const imageGenToolCfg = _findImageGenToolCfg(response.tools);
// -> Upstream Handle (for remote control: resume, cancel, delete)
// NOTE: we don't do it for full responses, because they're supposed to be 'complete' - i.e. no 'background' execution
@@ -875,25 +932,29 @@ export function createOpenAIResponseParserNS(): ChatGenerateParseFunction {
pt.appendReasoningText(item.text);
}
// Capture the continuity handle (encrypted_content + id) for stateless multi-turn round-tripping.
// Attached to the ma fragment produced by the summary above; if no summary was emitted, this may
// attach to an unrelated preceding fragment - tolerable as the worst case is a misfiled blob.
// FIXME: make sure we are attaching to an 'ma' (i.e. reasoning text or something was emitted)
if (reasoningEC || reasoningId)
// [DEV] surface cases that diverge from our continuity round-trip expectations (see streaming path for rationale)
if (!reasoningId && !reasoningEC)
console.warn(`[DEV] AIX: ${vendor}-Response-NS: reasoning item has neither id nor encrypted_content - no continuity handle captured for this turn`, { oItem });
else if (!reasoningEC)
console.log(`[DEV] AIX: ${vendor}-Response-NS: reasoning item has id but no encrypted_content - dropping handle (stateless round-trip requires include:['reasoning.encrypted_content'] on the request)`);
else if (!reasoningId)
console.log(`[DEV] AIX: ${vendor}-Response-NS: reasoning item has encrypted_content but no id - dropping handle (incomplete reasoning item from upstream)`);
// Capture ONLY when both id and encryptedContent are present (canonical, complete handle).
if (reasoningEC && reasoningId) {
// Defensive: ensure an ma fragment exists as the attach target for the svs particle below (parity with the streaming path).
pt.appendReasoningText('');
pt.sendSetVendorState({
p: 'svs',
vendor: 'openai',
vendor: vendor,
state: {
reasoningItem: {
...(reasoningId ? { id: reasoningId } : {}),
...(reasoningEC ? { encryptedContent: reasoningEC } : {}),
id: reasoningId,
encryptedContent: reasoningEC,
},
},
});
else if (!reasoningId && !reasoningEC)
console.warn('[DEV] AIX: OpenAI-Response-NS: reasoning item has neither id nor encrypted_content - no continuity handle captured for this turn', { oItem });
else if (!reasoningEC)
console.log('[DEV] AIX: OpenAI-Response-NS: reasoning item has id but no encrypted_content - stateless round-trip requires include:[\'reasoning.encrypted_content\'] on the request');
}
break;
// Message contains the main 'assistant' response
@@ -957,6 +1018,7 @@ export function createOpenAIResponseParserNS(): ChatGenerateParseFunction {
break;
case 'image_generation_call':
_hostedToolWishlistHint('image_generation');
// -> IGC: process completed image generation using 'ii' particle for inline images
const { result: igResult, revised_prompt: igRevisedPrompt } = oItem;
// Create inline image with base64 data
@@ -965,7 +1027,7 @@ export function createOpenAIResponseParserNS(): ChatGenerateParseFunction {
_imageGenerationMimeType(oItem), // infer from output_format echoed in the item
igResult,
igRevisedPrompt || 'Generated image',
AIX_OAI_DEFAULT_IMAGE_GEN_MODEL, // generator: non-streaming path has no captured tool config, use current default
imageGenToolCfg?.model || AIX_OAI_DEFAULT_IMAGE_GEN_MODEL, // generator: read from echoed tools (API does not echo model per-item), fallback to current default
igRevisedPrompt || '', // prompt used
);
else
@@ -1150,6 +1212,7 @@ function _imageGenerationMimeType(item: { output_format?: string }): string {
* - citations: High-quality links (2-3) via annotations in message content
*/
function _forwardDoneWebSearchCallItem(pt: IParticleTransmitter, webSearchCall: Extract<OpenAIWire_API_Responses.Response['output'][number], { type: 'web_search_call' }>, opId: string): void {
_hostedToolWishlistHint('web_search');
const { action, status } = webSearchCall;
const doneOpts = { opId, state: 'done' } as const;
@@ -1203,6 +1266,7 @@ function _forwardDoneWebSearchCallItem(pt: IParticleTransmitter, webSearchCall:
* - addCodeExecutionResponse for each output result
*/
function _forwardDoneCodeInterpreterCallItem(pt: IParticleTransmitter, codeInterpreterCall: Extract<OpenAIWire_API_Responses.Response['output'][number], { type: 'code_interpreter_call' }>): void {
_hostedToolWishlistHint('code_interpreter');
const { id, code, outputs, status /*,container_id*/ } = codeInterpreterCall;
// <- Emit code (like Gemini's executableCode)
@@ -21,7 +21,7 @@ export namespace AixDemuxers {
 * - 'fast-sse' is our own parser, optimized for performance; prefer it over 'sse' when possible (check for full compatibility with the upstream)
* - 'json-nl' is used by Ollama
*/
export type StreamDemuxerFormat = 'fast-sse' | 'json-nl' | null;
export type StreamDemuxerFormat = 'fast-sse' | 'json-nl';
/**
@@ -34,8 +34,8 @@ export namespace AixDemuxers {
return createFastEventSourceDemuxer();
case 'json-nl':
return _createJsonNlDemuxer();
case null:
return _nullStreamDemuxerWarn;
default:
throw new Error(`Unsupported stream demuxer format: ${format}`);
}
}
@@ -115,12 +115,3 @@ function _createJsonNlDemuxer(): AixDemuxers.StreamDemuxer {
},
};
}
const _nullStreamDemuxerWarn: AixDemuxers.StreamDemuxer = {
demux: () => {
console.warn('Null demuxer called - shall not happen, as it is only created in non-streaming');
return [];
},
flushRemaining: () => [],
};
@@ -1,7 +1,7 @@
<!--
Upstream snapshot - DO NOT EDIT - run _upstream/sync.sh to refresh
Source: https://platform.claude.com/docs/en/api/messages/create.md
Synced: 2026-04-23
Synced: 2026-04-24
Consumed by: anthropic.wiretypes.ts, anthropic.parser.ts, anthropic.messageCreate.ts, anthropic.transform-fileInline.ts
-->
@@ -2429,7 +2429,7 @@ Learn more about the Messages API in our [user guide](https://docs.claude.com/en
Configuration options for the model's output, such as the output format.
- `effort: optional "low" or "medium" or "high" or 2 more`
- `effort: optional "low" or "medium" or "high" or "max"`
All possible effort levels.
@@ -2439,8 +2439,6 @@ Learn more about the Messages API in our [user guide](https://docs.claude.com/en
- `"high"`
- `"xhigh"`
- `"max"`
- `format: optional JSONOutputFormat`
@@ -3822,15 +3820,15 @@ Learn more about the Messages API in our [user guide](https://docs.claude.com/en
Used to remove "long tail" low probability responses. [Learn more technical details here](https://towardsdatascience.com/how-to-sample-from-language-models-682bceb97277).
Recommended for advanced use cases only. You usually only need to use `temperature`.
Recommended for advanced use cases only.
- `top_p: optional number`
Use nucleus sampling.
In nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by `top_p`. You should either alter `temperature` or `top_p`, but not both.
In nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by `top_p`.
Recommended for advanced use cases only. You usually only need to use `temperature`.
Recommended for advanced use cases only.
### Returns
@@ -1,7 +1,7 @@
<!--
Upstream snapshot - DO NOT EDIT - run _upstream/sync.sh to refresh
Source: https://ai.google.dev/gemini-api/docs/deep-research.md.txt
Synced: 2026-04-23
Synced: 2026-04-24
Consumed by: gemini.interactions.wiretypes.ts, gemini.interactions.parser.ts, gemini.interactionsCreate.ts, gemini.interactionsPoller.ts
Companion: ./gemini.interactions.guide.md (the Interactions API guide)
-->
@@ -1,7 +1,7 @@
<!--
Upstream snapshot - DO NOT EDIT - run _upstream/sync.sh to refresh
Source: https://ai.google.dev/gemini-api/docs/interactions.md.txt
Synced: 2026-04-23
Synced: 2026-04-24
Consumed by: gemini.interactions.wiretypes.ts, gemini.interactions.parser.ts, gemini.interactionsCreate.ts, gemini.interactionsPoller.ts
Companion: ./gemini.interactions.spec.md (the Interactions API reference spec), ./gemini.deep-research.guide.md (the Deep Research agent guide)
-->
@@ -1,7 +1,7 @@
<!--
Upstream snapshot - DO NOT EDIT - run _upstream/sync.sh to refresh
Source: https://ai.google.dev/api/interactions-api.md.txt
Synced: 2026-04-23
Synced: 2026-04-24
Consumed by: gemini.interactions.wiretypes.ts, gemini.interactions.parser.ts, gemini.interactionsCreate.ts, gemini.interactionsPoller.ts
Companion: ./gemini.interactions.guide.md (the Interactions API guide)
-->
@@ -1,7 +1,7 @@
<!--
Upstream snapshot - DO NOT EDIT - run _upstream/sync.sh to refresh
Source: https://developers.openai.com/api/reference/resources/responses/methods/create/index.md
Synced: 2026-04-23
Synced: 2026-04-24
Consumed by: openai.wiretypes.ts, openai.responses.parser.ts, openai.responsesCreate.ts
-->
@@ -13,6 +13,10 @@ const hotFixAntShipNoEmptyTextBlocks = true; // Replace empty text blocks with a
*
* ## Updates
*
* ### 2026-04-24 - API Sync: stop_details for structured refusals
* - Response: added `stop_details` ({ type: 'refusal', category: 'cyber'|'bio'|null, explanation: string|null })
* - event_MessageDelta.delta: added `stop_details` (arrives alongside stop_reason in streaming)
*
* ### 2026-03-21 - API Sync: GA tool versions, thinking display, caller updates, cache_control
* - Tools: Added web_search_20260209 (GA), web_fetch_20260209/20260309 (GA), code_execution_20260120 (GA REPL)
* - Request: Added top-level `cache_control` for automatic caching (Feb 2026)
@@ -825,6 +829,16 @@ export namespace AnthropicWire_API_Message_Create {
'model_context_window_exceeded',
]);
/**
* Structured stop details, paired with stop_reason. Currently only populated when stop_reason === 'refusal'.
* Both `type` and `category` are loosely typed for forward-compat - parser warns on unknown `type`.
*/
const StopDetails_schema = z.object({
type: z.enum(['refusal']).or(z.string()),
category: z.enum(['cyber', 'bio']).or(z.string()).nullish(),
explanation: z.string().nullish(),
});
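// Illustrative sketch - how the pair appears on the wire for a refusal (values hypothetical):
//   { "stop_reason": "refusal", "stop_details": { "type": "refusal", "category": "bio", "explanation": "..." } }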
/// Request
export type Request = z.infer<typeof Request_schema>;
@@ -1030,6 +1044,12 @@ export namespace AnthropicWire_API_Message_Create {
// Which custom stop sequence was generated, if any.
stop_sequence: z.string().nullable(),
/**
* Structured stop details. Present when stop_reason === 'refusal' (carries category + explanation).
* In streaming, stop_details is null at message_start and appears on message_delta alongside stop_reason.
*/
stop_details: StopDetails_schema.nullish(),
/**
* Billing and rate-limit usage.
* Token counts represent the underlying cost to Anthropic's systems.
@@ -1088,6 +1108,10 @@ export namespace AnthropicWire_API_Message_Create {
delta: z.object({
stop_reason: StopReason_schema.nullable(),
stop_sequence: z.string().nullable(),
/**
* Structured stop details - present alongside stop_reason === 'refusal' (category + explanation).
*/
stop_details: StopDetails_schema.nullish(),
/**
* Container state updates - present when Skills/code_execution tools are used.
* Provides container id/expiry that may differ from message_start if the container was created mid-stream.
@@ -23,8 +23,12 @@ export namespace GeminiInteractionsWire_API_Interactions {
export const getPath = (id: string) => `/v1beta/interactions/${encodeURIComponent(id)}`;
// DELETE. Removes the stored record. Orthogonal to cancel; when removed, the original connection may still be running and streaming
export const deletePath = (id: string) => `/v1beta/interactions/${encodeURIComponent(id)}`;
// POST. Only cancels background interactions that are still running
export const cancelPath = (id: string) => `/v1beta/interactions/${encodeURIComponent(id)}/cancel`;
// -- Request Body (POST /v1beta/interactions) --
@@ -163,7 +167,7 @@ export namespace GeminiInteractionsWire_API_Interactions {
// the parser prefers inline and falls back to a URI note when only `uri` is present.
data: z.string().optional(), // base64-encoded bytes
uri: z.string().optional(),
mime_type: z.string(),
mime_type: z.string().optional(), // spec: optional - parser still requires it before emitting inline
resolution: z.string().optional(), // 'low' | 'medium' | 'high' | 'ultra_high'
});
@@ -172,7 +176,7 @@ export namespace GeminiInteractionsWire_API_Interactions {
// Per docs: data or uri, mime_type covers both PCM (audio/l16) and packaged formats (audio/wav, audio/mp3, ...).
data: z.string().optional(),
uri: z.string().optional(),
mime_type: z.string(),
mime_type: z.string().optional(), // spec: optional - parser still requires it before emitting inline
rate: z.number().optional(), // sample rate, when known
channels: z.number().optional(),
});
@@ -189,6 +189,13 @@ export namespace OpenAIWire_Messages {
/** [OpenRouter, 2025-01-20] Reasoning traces with multiple blocks (summary, text, encrypted). */
reasoning_details: z.array(OpenAIWire_ContentParts.OpenRouter_ReasoningDetail_schema).optional(),
/**
* [DeepSeek, 2026-04-24] Chain-of-thought reasoning text.
* - Response: emitted by V4 thinking-by-default; parsed into a 'ma' reasoning part.
* - (this) Request: MUST be echoed back on assistant turns that carry tool_calls (otherwise HTTP 400: "The reasoning_content in the thinking mode must be passed back to the API.").
*/
reasoning_content: z.string().nullable().optional(),
// function_call: // ignored, as it's deprecated
// name: _optionalParticipantName, // omitted by choice: generally unsupported
});
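// Illustrative sketch - the echo-back requirement above in practice: an assistant turn that carried
// tool_calls must resend its reasoning_content on the next request (all values hypothetical):
// {
//   role: 'assistant',
//   content: null,
//   reasoning_content: 'User asked for the weather; calling get_weather first.',
//   tool_calls: [{ id: 'call_1', type: 'function', function: { name: 'get_weather', arguments: '{"city":"Turin"}' } }],
// }
// Omitting reasoning_content on such a turn is what triggers the HTTP 400 quoted above.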
@@ -331,7 +338,7 @@ export namespace OpenAIWire_API_Chat_Completions {
stream_options: z.object({
include_usage: z.boolean().optional(), // If set, an additional chunk will be streamed with a 'usage' field on the entire request.
}).optional(),
reasoning_effort: z.enum(['none', 'minimal', 'low', 'medium', 'high', 'xhigh']).optional(), // [OpenAI, 2024-12-17] [Perplexity, 2025-06-23] reasoning effort
reasoning_effort: z.enum(['none', 'minimal', 'low', 'medium', 'high', 'xhigh', 'max']).optional(), // [OpenAI, 2024-12-17] [Perplexity, 2025-06-23] reasoning effort; [DeepSeek, 2026-04-23] 'max' added for V4
// OpenAI and [OpenRouter, 2025-01-20] Verbosity parameter - maps to output_config.effort for Anthropic models
// https://openrouter.ai/docs/api/reference/parameters#verbosity
verbosity: z.enum([
@@ -342,7 +349,7 @@ export namespace OpenAIWire_API_Chat_Completions {
// [OpenRouter, 2025-11-11] Unified reasoning parameter for all models
reasoning: z.object({
max_tokens: z.int().optional(), // Token-based control (Anthropic, Gemini): 1024-32000
effort: z.enum(['none', 'minimal', 'low', 'medium', 'high', 'xhigh']).optional(), // Effort-based control (OpenAI o1/o3/GPT-5, xAI, DeepSeek): allocates % of max_tokens
effort: z.enum(['none', 'minimal', 'low', 'medium', 'high', 'xhigh', 'max']).optional(), // Effort-based control (OpenAI o1/o3/GPT-5, xAI, DeepSeek): allocates % of max_tokens
enabled: z.boolean().optional(), // Simple enable with medium effort defaults
exclude: z.boolean().optional(), // Use reasoning internally without returning it in response
}).optional(),
@@ -447,6 +454,8 @@ export namespace OpenAIWire_API_Chat_Completions {
search_after_date_filter: z.string().optional(), // Date filter in MM/DD/YYYY format
// [Moonshot, 2026-01-26] Kimi K2.5 thinking mode control
// [Z.ai, 2025-xx] GLM thinking mode: type 'enabled' | 'disabled'
// [DeepSeek, 2026-04-23] V4 thinking mode: same binary shape; depth is controlled via top-level `reasoning_effort`
thinking: z.object({
type: z.enum(['enabled', 'disabled']),
}).optional(),
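// Illustrative sketch - DeepSeek V4 thinking at maximum depth combines both knobs (request fragment,
// model id hypothetical):
//   { "model": "deepseek-v4", "thinking": { "type": "enabled" }, "reasoning_effort": "max" }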
@@ -1174,9 +1183,11 @@ export namespace OpenAIWire_Responses_Items {
// [OpenAI 2026-03-xx] DEPRECATED: query might not always be present in done event
query: z.string().optional(),
// the output websites, if any [{"type":"url","url":"https://www.enricoros.com/"}, {"type":"url","url": "https://linkedin.com/in/enricoros/"}, ...]
// [OpenAI 2026-04-23, GPT-5.5] new source types: { type: 'api', name: 'oai-calculator' } for hosted-tool invocations (no url)
sources: z.array(z.object({
type: z.literal('url').optional(), // source type
url: z.string(),
type: z.enum(['url', 'api']).or(z.string()).optional(), // 'url' (default) | 'api' (GPT-5.5 hosted tools) | future types
url: z.string().nullish(), // optional: 'api' sources have no url, only name
name: z.string().nullish(), // for 'api' sources (e.g., 'oai-calculator')
// [OpenAI 2026-03-xx] not present anymore
// title: z.string().optional(),
// snippet: z.string().optional(),
@@ -1437,6 +1448,7 @@ export namespace OpenAIWire_Responses_Tools {
const WebSearchTool_schema = z.object({
type: z.enum(['web_search', 'web_search_preview', 'web_search_preview_2025_03_11']),
search_context_size: z.enum(['low', 'medium', 'high']).optional(),
// [OpenAI 2026-04-23, GPT-5.5] API echoes user_location as `null` (not undefined) when unset - so .nullish()
user_location: z.object({
type: z.literal('approximate'),
// API echoes these as `null` when unset, not omitted - so .nullish()
@@ -1444,7 +1456,7 @@ export namespace OpenAIWire_Responses_Tools {
country: z.string().nullish(),
region: z.string().nullish(),
timezone: z.string().nullish(),
}).optional(),
}).nullish(),
external_web_access: z.boolean().optional(),
});
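// Illustrative sketch - why `.nullish()` on user_location: a hypothetical echoed config when unset:
//   { "type": "web_search", "search_context_size": "medium", "user_location": null }
// `.optional()` only tolerates a missing key (undefined); the explicit `null` above would fail parsing.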
@@ -1641,7 +1653,7 @@ export namespace OpenAIWire_API_Responses {
// NOTE: .catch() gracefully degrades to undefined since this is a non-critical enrichment path
tools: z.array(OpenAIWire_Responses_Tools.Tool_schema).optional().catch((ctx) => {
console.warn('[DEV] AIX: OpenAI Responses: unable to parse echoed tools, ignoring:', { tools: ctx.value });
return;
return undefined;
}),
output: z.array(OpenAIWire_Responses_Items.OutputItem_schema),
@@ -118,9 +118,9 @@ export namespace XAIWire_API_Responses {
// configure reasoning
// [2026-01-22] OBSOLETE - only grok-3-mini (!)
reasoning: z.object({
effort: z.enum([/*'none', 'minimal',*/ 'low', 'medium', 'high' /*, 'xhigh'*/]).nullish(), // XAI: 3 levels only
effort: z.enum([/*'none', 'minimal',*/ 'low', 'medium', 'high' /*, 'xhigh'*/]).nullish(), // only grok-4.20-multi-agent; grok-4.3 and grok-4-1-fast error if set
summary: z.enum(['auto', 'concise', 'detailed']).nullish(), // request reasoning summaries
// [XAI-UNSUPPORTED] // generate_summary: z.string().nullish(),
// [XAI-UNSUPPORTED] // summary: z.enum(['auto', 'concise', 'detailed']).nullish(), // XAI: The model shall always return 'detailed'
}).nullish(),
// configure search
@@ -1,7 +1,9 @@
import * as React from 'react';
import { useShallow } from 'zustand/react/shallow';
import { Alert, Box, CircularProgress } from '@mui/joy';
import { Alert, Box, Button, CircularProgress } from '@mui/joy';
import ContentCopyIcon from '@mui/icons-material/ContentCopy';
import TelegramIcon from '@mui/icons-material/Telegram';
import { ConfirmationModal } from '~/common/components/modals/ConfirmationModal';
import { ShortcutKey, useGlobalShortcuts } from '~/common/components/shortcuts/useGlobalShortcuts';
@@ -204,13 +206,30 @@ export function BeamView(props: {
isMobile={props.isMobile}
rayIds={rayIds}
showRayAdd={cardAdd}
showRaysOps={(isScattering || raysReady < 2) ? undefined : raysReady}
hadImportedRays={hadImportedRays}
onIncreaseRayCount={handleRayIncreaseCount}
onRaysOperation={handleRaysOperation}
// linkedLlmId={currentGatherLlmId}
/>
{/* Rays Action Bar (2+ ready beams) - sibling of the grid (NOT a grid child); an in-grid spanning element with gridColumn:'1/-1' pins all auto-fit tracks open and leaves dead whitespace when raysCount < tracksCount. Fixes #1073. */}
{(!isScattering && raysReady >= 2) && (
<Box sx={{ display: 'flex', justifyContent: 'center', gap: 2, mx: 'var(--Pad)' }}>
<Button size='sm' variant='outlined' color='neutral' onClick={() => handleRaysOperation('copy')} endDecorator={<ContentCopyIcon sx={{ fontSize: 'md' }} />} sx={{
backgroundColor: 'background.surface',
'&:hover': { backgroundColor: 'background.popup' },
}}>
Copy {raysReady}
</Button>
<Button size='sm' variant='outlined' color='success' onClick={() => handleRaysOperation('use')} endDecorator={<TelegramIcon sx={{ fontSize: 'xl' }} />} sx={{
justifyContent: 'space-between',
backgroundColor: 'background.surface',
'&:hover': { backgroundColor: 'background.popup' },
}}>
Use {raysReady === 2 ? 'both' : 'all ' + raysReady} messages
</Button>
</Box>
)}
{/* Gapper between Rays and Merge, without compromising the auto margin of the Ray Grid */}
<Box />
@@ -246,9 +265,9 @@ export function BeamView(props: {
onPositive={handleStartMergeConfirmation}
// lowStakes
noTitleBar
confirmationText='Some responses are still being generated. Do you want to stop and proceed with merging the available responses now?'
positiveActionText='Proceed with Merge'
negativeActionText='Wait for All Responses'
confirmationText={'Some replies are still generating. Merge what\'s ready?'}
positiveActionText='Merge now'
negativeActionText='Wait for all'
negativeActionStartDecorator={
<CircularProgress color='neutral' sx={{ '--CircularProgress-size': '24px', '--CircularProgress-trackThickness': '1px' }} />
}
@@ -149,7 +149,8 @@ export function BeamFusionGrid(props: {
</Box> : (
<Typography level='body-sm' sx={{ opacity: 0.8 }}>
{/*You need two or more replies for a {currentFactory?.shortLabel?.toLocaleLowerCase() ?? ''} merge.*/}
Waiting for multiple responses.
{/*Waiting for multiple responses.*/}
Merge needs 2+ replies. Beam some first.
</Typography>
)}
</BeamCard>
@@ -49,7 +49,7 @@ export async function executeGatherInstruction(_i: GatherInstruction, inputs: Ex
if (!inputs.chatMessages.length)
throw new Error('No conversation history available');
if (!inputs.rayMessages.length)
throw new Error('No responses available');
throw new Error('Needs at least two Beams');
for (let rayMessage of inputs.rayMessages)
if (rayMessage.role !== 'assistant')
throw new Error('Invalid response role');
@@ -58,7 +58,7 @@ export function gatherStartFusion(
if (chatMessages.length < 1)
return onError('No conversation history available');
if (rayMessages.length <= 1)
return onError('No responses available');
return onError('Needs at least two Beams');
if (!initialFusion.llmId)
return onError('No Merge model selected');
@@ -122,7 +122,7 @@ The final output should reflect a deep understanding of the user's preferences a
addLabel: 'Add Breakdown',
cardTitle: 'Evaluation Table',
Icon: TableViewRoundedIcon as typeof SvgIcon,
description: 'Analyzes and compares AI responses, offering a structured framework to support your response choice.',
description: 'Analyzes and compares replies, with a structured framework to support your choice.',
createInstructions: () => [
{
type: 'gather',
@@ -3,8 +3,6 @@ import * as React from 'react';
import type { SxProps } from '@mui/joy/styles/types';
import { Box, Button } from '@mui/joy';
import AddCircleOutlineRoundedIcon from '@mui/icons-material/AddCircleOutlineRounded';
import ContentCopyIcon from '@mui/icons-material/ContentCopy';
import TelegramIcon from '@mui/icons-material/Telegram';
import type { BeamStoreApi } from '../store-beam.hooks';
import { BeamCard } from '../BeamCard';
@@ -32,10 +30,8 @@ export function BeamRayGrid(props: {
hadImportedRays: boolean,
isMobile: boolean,
onIncreaseRayCount: () => void,
onRaysOperation: (operation: 'copy' | 'use') => void,
rayIds: string[],
showRayAdd: boolean,
showRaysOps: undefined | number,
}) {
const raysCount = props.rayIds.length;
@@ -71,25 +67,6 @@ export function BeamRayGrid(props: {
</BeamCard>
)}
{/* Multi-Use and Copy Buttons */}
{!!props.showRaysOps && (
<Box sx={{ gridColumn: '1 / -1', display: 'flex', justifyContent: 'center', gap: 2, mt: 2 }}>
<Button size='sm' variant='outlined' color='neutral' onClick={() => props.onRaysOperation('copy')} endDecorator={<ContentCopyIcon sx={{ fontSize: 'md' }} />} sx={{
backgroundColor: 'background.surface',
'&:hover': { backgroundColor: 'background.popup' },
}}>
Copy {props.showRaysOps}
</Button>
<Button size='sm' variant='outlined' color='success' onClick={() => props.onRaysOperation('use')} endDecorator={<TelegramIcon sx={{ fontSize: 'xl' }} />} sx={{
justifyContent: 'space-between',
backgroundColor: 'background.surface',
'&:hover': { backgroundColor: 'background.popup' },
}}>
Use {props.showRaysOps == 2 ? 'both' : 'all ' + props.showRaysOps} messages
</Button>
</Box>
)}
{/*/!* Takes a full row *!/*/}
{/*<Divider sx={{*/}
{/* gridColumn: '1 / -1',*/}
@@ -76,6 +76,12 @@ const createRootSlice: StateCreator<BeamStore, [], [], RootStoreSlice> = (_set,
open: (chatHistory: Readonly<DMessage[]>, initialChatLlmId: DLLMId | null, isEditMode: boolean, callback: BeamSuccessCallback) => {
const { isOpen: wasAlreadyOpen, terminateKeepingSettings, loadBeamConfig, hadImportedRays, setRayLlmIds, setCurrentGatherLlmId } = _get();
// if already open, preserve the live state (rays, fusions, callback) - re-invocation must never wipe an ongoing beam
if (wasAlreadyOpen) {
console.warn('[DEV] Beam is already open');
return;
}
// reset pending operations
terminateKeepingSettings();
@@ -107,6 +107,7 @@ function _createDLLMFromModelDescription(d: ModelDescriptionSchema, service: DMo
label: d.label,
created: d.created || 0,
updated: d.updated || 0,
...(d.pubDate && { pubDate: d.pubDate }),
description: d.description,
hidden: !!d.hidden,
@@ -15,7 +15,7 @@ import WarningRoundedIcon from '@mui/icons-material/WarningRounded';
import { type DPricingChatGenerate, isLLMChatFree_cached, llmChatPricing_adjusted } from '~/common/stores/llms/llms.pricing';
import type { ModelOptionsContext } from '~/common/layout/optima/store-layout-optima';
import { DLLMId, DModelInterfaceV1, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, isLLMVisible, LLM_IF_HOTFIX_NoStream, LLM_IF_HOTFIX_NoTemperature, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
import { DLLMId, DModelInterfaceV1, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, getLLMPubDate, isLLMVisible, LLM_IF_HOTFIX_NoStream, LLM_IF_HOTFIX_NoTemperature, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
import { FormLabelStart } from '~/common/components/forms/FormLabelStart';
import { GoodModal } from '~/common/components/modals/GoodModal';
import { LLMImplicitParametersRuntimeFallback } from '~/common/stores/llms/llms.parameters';
@@ -280,6 +280,7 @@ export function LLMOptionsModal(props: { id: DLLMId, context?: ModelOptionsConte
// cache
const adjChatPricing = llmChatPricing_adjusted(llm);
const pubDate = getLLMPubDate(llm);
return (
@@ -502,7 +503,8 @@ export function LLMOptionsModal(props: { id: DLLMId, context?: ModelOptionsConte
id: {llm.id}<br />
context: <b>{getLLMContextTokens(llm)?.toLocaleString() ?? 'not provided'}</b> tokens{` · `}
max output: <b>{getLLMMaxOutputTokens(llm)?.toLocaleString() ?? 'not provided'}</b><br />
{!!llm.created && <>created: <TimeAgo date={new Date(llm.created * 1000)} /><br /></>}
{!!pubDate && <>published: <b>{pubDate.toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' })}</b> · <TimeAgo date={pubDate} /><br /></>}
{!!llm.created && <>indexed: <TimeAgo date={new Date(llm.created * 1000)} /><br /></>}
{/*· tags: {llm.tags.join(', ')}*/}
{!!adjChatPricing && prettyPricingComponent(adjChatPricing)}
{/*{!!llm.benchmark && <>benchmark: <b>{llm.benchmark.cbaElo?.toLocaleString() || '(unk) '}</b> CBA Elo<br /></>}*/}
@@ -51,6 +51,7 @@ const _oaiEffortOptions = [
] as const;
const _miscEffortOptions = [
{ value: 'max', label: 'Max', description: 'Hardest thinking' } as const,
{ value: 'high', label: 'On', description: 'Multi-step reasoning' } as const,
{ value: 'none', label: 'Off', description: 'Disable thinking mode' } as const,
{ value: _UNSPECIFIED, label: 'Default', description: 'Model Default' } as const,
@@ -122,6 +123,11 @@ const _geminiGoogleSearchOptions = [
{ value: _UNSPECIFIED, label: 'Off', description: 'Default (disabled)' },
] as const;
const _geminiAgentVizOptions = [
{ value: _UNSPECIFIED, label: 'Auto', description: 'Default - agent may include charts/images' },
{ value: 'off', label: 'Off', description: 'Text only (better when merging multiple reports)' },
] as const;
const _geminiMediaResolutionOptions = [
{ value: 'mr_high', label: 'High', description: 'Best quality' },
{ value: 'mr_medium', label: 'Medium', description: 'Balanced' },
@@ -244,6 +250,7 @@ export function LLMParametersEditor(props: {
llmVndAntWebSearch,
llmVndAntWebSearchMaxUses,
llmVndGemEffort,
llmVndGeminiAgentViz,
llmVndGeminiAspectRatio,
llmVndGeminiCodeExecution,
llmVndGeminiGoogleSearch,
@@ -686,6 +693,19 @@ export function LLMParametersEditor(props: {
/>
)}
{showParam('llmVndGeminiAgentViz') && (
<FormSelectControl
title='Visualizations'
tooltip='Charts and images in Deep Research reports. Disable for text-only output (helpful when merging multiple reports).'
value={llmVndGeminiAgentViz ?? _UNSPECIFIED}
onChange={(value) => {
if (value === _UNSPECIFIED || !value) onRemoveParameter('llmVndGeminiAgentViz');
else onChangeParameter({ llmVndGeminiAgentViz: value });
}}
options={_geminiAgentVizOptions}
/>
)}
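{/* editorial note: selecting 'Auto' removes the stored override via onRemoveParameter, so the model default applies instead of persisting a value */}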
{/*{showParam('llmVndMoonshotWebSearch') && (*/}
{/* <FormSelectControl*/}
@@ -9,11 +9,12 @@ import VisibilityOutlinedIcon from '@mui/icons-material/VisibilityOutlined';
import type { DModelsServiceId } from '~/common/stores/llms/llms.service.types';
import { isLLMChatFree_cached } from '~/common/stores/llms/llms.pricing';
import { DLLM, DLLMId, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, isLLMCustomUserParameters, isLLMHidden, LLM_IF_ANT_PromptCaching, LLM_IF_GEM_CodeExecution, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_OAI_Vision, LLM_IF_Outputs_Audio, LLM_IF_Outputs_Image, LLM_IF_Tools_WebSearch } from '~/common/stores/llms/llms.types';
import { DLLM, DLLMId, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, getLLMPubDate, isLLMCustomUserParameters, isLLMHidden, LLM_IF_ANT_PromptCaching, LLM_IF_GEM_CodeExecution, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_OAI_Vision, LLM_IF_Outputs_Audio, LLM_IF_Outputs_Image, LLM_IF_Tools_WebSearch } from '~/common/stores/llms/llms.types';
import { GoodTooltip } from '~/common/components/GoodTooltip';
import { PhGearSixIcon } from '~/common/components/icons/phosphor/PhGearSixIcon';
import { STAR_EMOJI, StarredToggle, starredToggleStyle } from '~/common/components/StarIcons';
import { findModelsServiceOrNull, llmsStoreActions } from '~/common/stores/llms/store-llms';
import { sortLLMsByServiceLabel } from '~/common/stores/llms/components/llms.dropdown.utils';
import { useLLMsByService } from '~/common/stores/llms/llms.hooks';
import { useIsMobile } from '~/common/components/useMatchMedia';
import { useModelDomains } from '~/common/stores/llms/hooks/useModelDomains';
@@ -98,6 +99,10 @@ export const ModelItem = React.memo(function ModelItem(props: {
const isNotSymlink = !llm.label.startsWith('🔗'); // getLLMLabel exception: need access to the base
const llmLabel = getLLMLabel(llm);
// "new" badge: shown only when pubDate is set AND within the last 30 days
const pubDate = getLLMPubDate(llm);
const isRecentlyPublished = pubDate ? (Date.now() - pubDate.getTime()) < 30 * 24 * 60 * 60 * 1000 : false;
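// e.g. (editorial note) a model with pubDate 2026-04-10 still shows "new" on 2026-05-05 (25 days old) and loses the badge from ~2026-05-10 (30 days, time-of-day aside)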
const handleLLMConfigure = React.useCallback((event: React.MouseEvent) => {
event.stopPropagation();
@@ -226,6 +231,7 @@ export const ModelItem = React.memo(function ModelItem(props: {
</>}
{/* Features Chips - sync with `useLLMSelect.tsx` */}
{isRecentlyPublished && isNotSymlink && pubDate && <GoodTooltip title={`Released ${pubDate.toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' })}`}><Chip size='sm' variant='solid' sx={isHidden ? styles.chipDisabled : { bgcolor: '#d4ff3a', color: 'black', fontWeight: 'lg' }}>new</Chip></GoodTooltip>}
{featuresChipMemo}
{seemsFree && isNotSymlink && <Chip size='sm' color='success' variant='plain' sx={isHidden ? styles.chipDisabled : styles.chipFree}>free</Chip>}
@@ -283,7 +289,9 @@ export function ModelsList(props: {
// are we showing multiple services
const showAllServices = !props.filterServiceId;
const hasManyServices = llms.length >= 2 && llms.some(llm => llm.sId !== llms[0].sId);
// sort by service label so vendor groups appear alphabetically when showing all services (single-service view keeps existing order)
const orderedLLMs = showAllServices ? sortLLMsByServiceLabel(llms) : llms;
const hasManyServices = orderedLLMs.length >= 2 && orderedLLMs.some(llm => llm.sId !== orderedLLMs[0].sId);
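// e.g. (editorial note) services labeled 'Anthropic', 'Groq', 'OpenAI' render as three groups in that alphabetical order, regardless of the order they were configured in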
let lastGroupLabel = '';
// derived
@@ -293,7 +301,7 @@ export function ModelsList(props: {
// generate the list items, prepending headers when necessary
const items: React.JSX.Element[] = [];
for (const llm of llms) {
for (const llm of orderedLLMs) {
// skip hidden models if requested
if (!props.showHiddenModels && isLLMHidden(llm))
@@ -177,7 +177,8 @@ export function anthropicBetaFeatures(options?: AnthropicHostedFeatures): string
if (options?.enable1MContext)
bf.add('context-1m-2025-08-07');
// Code execution (for dynamic web tools PFC, or Skills) + files API for container downloads
// Code execution (for Skills, container reuse, Programmatic Tool Calling) + files API for container downloads.
// NOT enabled for dynamic web tools (_20260209): those have internal code execution managed by Anthropic.
// Note: SDK defines code-execution-2025-05-22; we use 2025-08-25 (newer iteration, not yet in SDK types).
// Code execution may be GA now (most SDK examples skip the beta namespace), but keeping for safety.
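// e.g. (editorial note) with enableCodeExecution set, bf gains 'code-execution-2025-08-25' (string per the note above), mirroring the context-1m flag added earlier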
if (options?.enableCodeExecution) {
@@ -6,7 +6,7 @@ import { Release } from '~/common/app.release';
import type { ModelDescriptionSchema, OrtVendorLookupResult } from '../llm.server.types';
import { createVariantInjector, ModelVariantMap } from '../llm.server.variants';
import { llmDevCheckModels_DEV } from '../models.mappings';
import { formatPubDate, llmDevCheckModels_DEV } from '../models.mappings';
// Note: these model definitions are shared across Anthropic API, OpenRouter, and AWS Bedrock.
@@ -214,12 +214,13 @@ export function llmsAntInjectVariants(acc: ModelDescriptionSchema[], model: Mode
}
export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: boolean })[] = [
export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: boolean, pubDate: string /* make it required for the defs */ })[] = [
// Claude 4.7 models
{
id: 'claude-opus-4-7', // Active - 2026-04-16
label: 'Claude Opus 4.7',
pubDate: '20260416',
description: 'Most capable generally available model for complex reasoning and agentic coding',
contextWindow: 1_000_000, // 1M GA at standard pricing (no opt-in required)
maxCompletionTokens: 128000,
@@ -239,6 +240,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-opus-4-6', // Active
label: 'Claude Opus 4.6',
pubDate: '20260205',
description: 'Previous most intelligent model for complex agents and coding, with adaptive thinking',
contextWindow: 1_000_000, // 1M GA at standard pricing since 2026-03-13 (no opt-in required)
maxCompletionTokens: 128000,
@@ -255,6 +257,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-sonnet-4-6', // Active
label: 'Claude Sonnet 4.6',
pubDate: '20260217',
description: 'Best combination of speed and intelligence for everyday tasks',
contextWindow: 1_000_000, // 1M GA at standard pricing since 2026-03-13 (no opt-in required)
maxCompletionTokens: 128000, // docs say 64000, API reports 128000
@@ -272,6 +275,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-opus-4-5-20251101', // Active
label: 'Claude Opus 4.5',
pubDate: '20251124',
description: 'Previous most intelligent model with advanced reasoning for complex agentic workflows',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -286,6 +290,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-sonnet-4-5-20250929', // Active
label: 'Claude Sonnet 4.5',
pubDate: '20250929',
description: 'Previous best combination of speed and intelligence for complex agents and coding',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -311,6 +316,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-haiku-4-5-20251001', // Active
label: 'Claude Haiku 4.5',
pubDate: '20251015',
description: 'Fastest model with exceptional speed and performance',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -324,6 +330,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-opus-4-1-20250805', // Active
label: 'Claude Opus 4.1',
pubDate: '20250805',
description: 'Exceptional model for specialized complex tasks requiring advanced reasoning',
contextWindow: 200000,
maxCompletionTokens: 32000,
@@ -338,6 +345,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
hidden: true, // Deprecated: April 14, 2026 | Retiring: June 15, 2026 | Replacement: claude-opus-4-7
id: 'claude-opus-4-20250514', // Deprecated
label: 'Claude Opus 4 [Deprecated]',
pubDate: '20250522',
description: 'Previous flagship model. Deprecated April 14, 2026, retiring June 15, 2026.',
contextWindow: 200000,
maxCompletionTokens: 32000,
@@ -351,6 +359,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
hidden: true, // Deprecated: April 14, 2026 | Retiring: June 15, 2026 | Replacement: claude-sonnet-4-6
id: 'claude-sonnet-4-20250514', // Deprecated
label: 'Claude Sonnet 4 [Deprecated]',
pubDate: '20250522',
description: 'High-performance model. Deprecated April 14, 2026, retiring June 15, 2026.',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -379,6 +388,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-3-7-sonnet-20250219', // Retired | Deprecated: October 28, 2025 | Retired: February 19, 2026 | Replacement: claude-opus-4-6
label: 'Claude Sonnet 3.7 [Retired]',
pubDate: '20250224',
description: 'High-performance model with early extended thinking. Retired February 19, 2026.',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -396,6 +406,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-3-5-haiku-20241022', // Retired | Deprecated: December 19, 2025 | Retired: February 19, 2026
label: 'Claude Haiku 3.5 [Retired]',
pubDate: '20241104',
description: 'Intelligence at blazing speeds. Retired February 19, 2026.',
contextWindow: 200000,
maxCompletionTokens: 8192,
@@ -413,6 +424,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
hidden: true, // deprecated
id: 'claude-3-haiku-20240307', // Deprecated | Deprecated: February 19, 2026 | Retiring: April 20, 2026 | Replacement: claude-haiku-4-5-20251001
label: 'Claude Haiku 3 [Deprecated]',
pubDate: '20240313',
description: 'Fast and compact model for near-instant responsiveness. Deprecated February 19, 2026, retiring April 20, 2026.',
contextWindow: 200000,
maxCompletionTokens: 4096,
@@ -595,11 +607,13 @@ export function llmsAntCreatePlaceholderModel(model: AnthropicWire_API_Models_Li
parameterSpecs.push(...ANT_TOOLS);
const maxInputTokens = model.max_input_tokens;
const createdAt = model.created_at ? new Date(model.created_at) : undefined;
return {
id: model.id,
idVariant: '::placeholder',
label: model.display_name,
created: Math.round(new Date(model.created_at).getTime() / 1000),
created: createdAt ? Math.round(createdAt.getTime() / 1000) : undefined,
pubDate: formatPubDate(createdAt), // 0-day: use Anthropic API's created_at, or today if unset
description: 'Newest model, description not available yet.',
contextWindow: maxInputTokens ?? 200_000, // report API value as-is (no cap for unknown models)
maxCompletionTokens: model.max_tokens || 32768,
@@ -755,5 +769,5 @@ export function llmOrtAntLookup_ThinkingVariants(orModelName: string): OrtVendor
.map((spec) => ({ ...spec }));
// initialTemperature: not set - Anthropic models use the global fallback (0.5)
return { interfaces, parameterSpecs };
return { pubDate: model.pubDate, interfaces, parameterSpecs };
}
@@ -6,7 +6,7 @@ import { Release } from '~/common/app.release';
import type { ModelDescriptionSchema, OrtVendorLookupResult } from '../llm.server.types';
import { createVariantInjector, ModelVariantMap } from '../llm.server.variants';
import { llmDevCheckModels_DEV } from '../models.mappings';
import { formatPubDate, llmDevCheckModels_DEV } from '../models.mappings';
// dev options
@@ -72,7 +72,7 @@ const geminiExpFree: ModelDescriptionSchema['chatPrice'] = {
};
// Pricing based on https://ai.google.dev/pricing (Apr 22, 2026)
// Pricing based on https://ai.google.dev/pricing (Apr 24, 2026)
const gemini31FlashLitePricing: ModelDescriptionSchema['chatPrice'] = {
input: 0.25, // text/image/video; audio is $0.50 but we don't differentiate yet
@@ -186,7 +186,7 @@ const _knownGeminiModels: ({
symLink?: string,
deprecated?: string, // Gemini may provide deprecation dates
// _delete removed - models are now physically removed from the list instead of marked for deletion
} & Pick<ModelDescriptionSchema, 'interfaces' | 'parameterSpecs' | 'chatPrice' | 'hidden' | 'benchmark'>)[] = [
} & Pick<ModelDescriptionSchema, 'pubDate' | 'interfaces' | 'parameterSpecs' | 'chatPrice' | 'hidden' | 'benchmark'> & { pubDate: string /* make it required */})[] = [
/// Generation 3.1
@@ -195,6 +195,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3.1-pro-preview',
labelOverride: 'Gemini 3.1 Pro Preview',
pubDate: '20260219',
isPreview: true,
chatPrice: gemini30ProPricing, // same pricing as 3 Pro
interfaces: IF_30,
@@ -213,6 +214,7 @@ const _knownGeminiModels: ({
hidden: true, // specialized variant for custom tool prioritization
id: 'models/gemini-3.1-pro-preview-customtools',
labelOverride: 'Gemini 3.1 Pro Preview (Custom Tools)',
pubDate: '20260219',
isPreview: true,
chatPrice: gemini30ProPricing,
interfaces: IF_30,
@@ -230,6 +232,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3.1-flash-image-preview',
labelOverride: 'Nano Banana 2',
pubDate: '20260226',
isPreview: true,
chatPrice: gemini31FlashImagePricing,
interfaces: IF_30,
@@ -247,6 +250,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3.1-flash-lite-preview',
labelOverride: 'Gemini 3.1 Flash-Lite Preview',
pubDate: '20260303',
isPreview: true,
chatPrice: gemini31FlashLitePricing,
interfaces: IF_30,
@@ -262,10 +266,13 @@ const _knownGeminiModels: ({
/// Generation 3.0
// 3.0 Pro (Preview) - Released November 18, 2025; DEPRECATED: shutdown March 9, 2026 (still served by API as of Apr 17, 2026)
// 3.0 Pro (Preview) - Released November 18, 2025; SHUT DOWN March 9, 2026 - now silently routed to gemini-3.1-pro-preview
// Kept hidden (still returned by API) to avoid confusing users with a silently-redirected model.
{
hidden: true, // March 9, 2026: API silently routes 'gemini-3-pro-preview' to 'gemini-3.1-pro-preview' - hide to prevent user confusion
id: 'models/gemini-3-pro-preview',
labelOverride: 'Gemini 3 Pro Preview',
pubDate: '20251118',
isPreview: true,
deprecated: '2026-03-09',
chatPrice: gemini30ProPricing,
@@ -284,6 +291,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3-pro-image-preview',
labelOverride: 'Nano Banana Pro', // Marketing name for the technical model ID
pubDate: '20251120',
isPreview: true,
chatPrice: gemini30ProImagePricing,
interfaces: IF_30,
@@ -299,6 +307,7 @@ const _knownGeminiModels: ({
{
id: 'models/nano-banana-pro-preview',
labelOverride: 'Nano Banana Pro',
pubDate: '20251120',
symLink: 'models/gemini-3-pro-image-preview',
// copied from symlink
isPreview: true,
@@ -318,6 +327,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3-flash-preview',
labelOverride: 'Gemini 3 Flash Preview',
pubDate: '20251217',
isPreview: true,
chatPrice: gemini30FlashPricing,
interfaces: IF_30,
@@ -335,8 +345,10 @@ const _knownGeminiModels: ({
// 2.5 Pro (Stable) - Released June 17, 2025; DEPRECATED: shutdown June 17, 2026
{
hidden: true, // outperformed by 3.1 Pro (1493) and even 3 Flash (1474) - deprecated in 2 months
id: 'models/gemini-2.5-pro',
labelOverride: 'Gemini 2.5 Pro',
pubDate: '20250617',
deprecated: '2026-06-17',
chatPrice: gemini25ProPricing,
interfaces: IF_25,
@@ -359,6 +371,7 @@ const _knownGeminiModels: ({
{
hidden: true, // single-turn-only model - unhide and just send a message to make use of this
id: 'models/gemini-2.5-pro-preview-tts',
pubDate: '20250520',
isPreview: true,
chatPrice: gemini25ProPreviewTTSPricing,
interfaces: [
@@ -376,10 +389,11 @@ const _knownGeminiModels: ({
{
id: 'models/deep-research-preview-04-2026',
labelOverride: 'Deep Research Preview (2026-04)',
pubDate: '20260421',
isPreview: true,
chatPrice: gemini25ProPricing, // pricing not explicitly listed; using 2.5 Pro as baseline
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
parameterSpecs: [],
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
parameterSpecs: [{ paramId: 'llmVndGeminiAgentViz' }],
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
// 128K input, 64K output
},
@@ -388,22 +402,24 @@ const _knownGeminiModels: ({
{
id: 'models/deep-research-max-preview-04-2026',
labelOverride: 'Deep Research Max Preview (2026-04)',
pubDate: '20260421',
isPreview: true,
chatPrice: gemini25ProPricing, // baseline estimate (see note above)
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
parameterSpecs: [],
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
parameterSpecs: [{ paramId: 'llmVndGeminiAgentViz' }],
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
},
// Deep Research Pro Preview - Released December 12, 2025
// Deep Research Pro Preview - Released December 11, 2025
{
hidden: true, // yield to newer 2026-04 models
id: 'models/deep-research-pro-preview-12-2025',
labelOverride: 'Deep Research Pro Preview',
pubDate: '20251211',
isPreview: true,
chatPrice: gemini25ProPricing,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
parameterSpecs: [{ paramId: 'llmVndGeminiThinkingBudget' }],
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
parameterSpecs: [{ paramId: 'llmVndGeminiAgentViz' }, { paramId: 'llmVndGeminiThinkingBudget' }],
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
// Note: 128K input context, 64K output context
},
@@ -412,8 +428,10 @@ const _knownGeminiModels: ({
// 2.5 Flash
{
hidden: true, // outperformed by 3 Flash Preview (1474 vs 1411) - deprecated in 2 months
id: 'models/gemini-2.5-flash',
labelOverride: 'Gemini 2.5 Flash',
pubDate: '20250617',
deprecated: '2026-06-17',
chatPrice: gemini25FlashPricing,
interfaces: IF_25,
@@ -441,6 +459,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-2.5-computer-use-preview-10-2025',
labelOverride: 'Gemini 2.5 Computer Use Preview 10-2025',
pubDate: '20251007',
isPreview: true,
chatPrice: gemini25ProPricing, // Uses same pricing as 2.5 Pro (pricing page doesn't list separately)
// NOTE: sweep shows fn=['auto'] only (no 'roundtrip') - partial Fn capability, do not advertise LLM_IF_OAI_Fn
@@ -458,6 +477,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-robotics-er-1.6-preview',
labelOverride: 'Gemini Robotics-ER 1.6 Preview',
pubDate: '20260414',
isPreview: true,
chatPrice: geminiRoboticsER16Pricing,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_OAI_Reasoning],
@@ -467,8 +487,10 @@ const _knownGeminiModels: ({
// 2.5 Flash-Based: Gemini Robotics-ER 1.5 Preview - Released September 25, 2025; DEPRECATED: shutdown April 30, 2026
{
hidden: true, // superseded by Robotics-ER 1.6 - shutdown April 30, 2026
id: 'models/gemini-robotics-er-1.5-preview',
labelOverride: 'Gemini Robotics-ER 1.5 Preview',
pubDate: '20250925',
isPreview: true,
deprecated: '2026-04-30',
chatPrice: gemini25FlashPricing, // Uses same pricing as 2.5 Flash per pricing page
@@ -481,6 +503,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-2.5-flash-image',
labelOverride: 'Nano Banana',
pubDate: '20251002',
deprecated: '2026-10-02',
chatPrice: { input: 0.30, output: undefined }, // Per pricing page: $0.30 text/image input, $0.039 per image output; the text-output price is not stated
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -501,6 +524,7 @@ const _knownGeminiModels: ({
hidden: true, // audio outputs are unavailable
id: 'models/gemini-3.1-flash-tts-preview',
labelOverride: 'Gemini 3.1 Flash TTS Preview',
pubDate: '20260415',
isPreview: true,
chatPrice: gemini31FlashTTSPricing,
interfaces: [
@@ -516,6 +540,7 @@ const _knownGeminiModels: ({
{
hidden: true, // audio outputs are unavailable as of 2025-05-27
id: 'models/gemini-2.5-flash-preview-tts',
pubDate: '20250520',
isPreview: true,
chatPrice: gemini25FlashPreviewTTSPricing,
interfaces: [
@@ -543,6 +568,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-2.5-flash-lite',
labelOverride: 'Gemini 2.5 Flash-Lite',
pubDate: '20250722',
deprecated: '2026-07-22',
chatPrice: gemini25FlashLitePricing,
interfaces: IF_25,
@@ -573,14 +599,18 @@ const _knownGeminiModels: ({
// 2.0 Flash - DEPRECATED: shutdown June 1, 2026 (announced Feb 18, 2026)
{
hidden: true, // outclassed by all Flash models in 2.5/3.x series - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash-001',
pubDate: '20250205',
deprecated: '2026-06-01',
chatPrice: gemini20FlashPricing,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_GEM_CodeExecution],
benchmark: { cbaElo: 1360 }, // gemini-2.0-flash-001
},
{
hidden: true, // outclassed by all Flash models in 2.5/3.x series - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash',
pubDate: '20250205',
symLink: 'models/gemini-2.0-flash-001',
deprecated: '2026-06-01',
// copied from symlink
@@ -591,7 +621,9 @@ const _knownGeminiModels: ({
// 2.0 Flash Lite - DEPRECATED: shutdown June 1, 2026 (announced Feb 18, 2026)
{
hidden: true, // outclassed by 2.5/3.1 Flash-Lite - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash-lite',
pubDate: '20250225',
chatPrice: gemini20FlashLitePricing,
symLink: 'models/gemini-2.0-flash-lite-001',
deprecated: '2026-06-01',
@@ -599,7 +631,9 @@ const _knownGeminiModels: ({
benchmark: { cbaElo: 1310 },
},
{
hidden: true, // outclassed by 2.5/3.1 Flash-Lite - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash-lite-001',
pubDate: '20250225',
chatPrice: gemini20FlashLitePricing,
deprecated: '2026-06-01',
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn],
@@ -639,6 +673,7 @@ const _knownGeminiModels: ({
// Gemma 4 Models - Released April 2, 2026
{
id: 'models/gemma-4-31b-it',
pubDate: '20260402',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
parameterSpecs: [{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'high'] }],
@@ -648,6 +683,7 @@ const _knownGeminiModels: ({
{
hidden: true, // smaller MoE variant
id: 'models/gemma-4-26b-a4b-it',
pubDate: '20260402',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
parameterSpecs: [{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'high'] }],
@@ -658,6 +694,7 @@ const _knownGeminiModels: ({
// Gemma 3n Model (newer than 3, first seen on the May 2025 update)
{
id: 'models/gemma-3n-e4b-it',
pubDate: '20250626',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Free tier only according to pricing page
@@ -665,6 +702,7 @@ const _knownGeminiModels: ({
},
{
id: 'models/gemma-3n-e2b-it',
pubDate: '20250626',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Free tier only according to pricing page
@@ -676,6 +714,7 @@ const _knownGeminiModels: ({
// - LLM_IF_HOTFIX_Sys0ToUsr0, because: "Developer instruction is not enabled for models/gemma-3-27b-it"
{
id: 'models/gemma-3-27b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Pricing page indicates free tier only
@@ -685,6 +724,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-12b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -693,6 +733,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-4b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -701,6 +742,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-1b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -939,6 +981,7 @@ export function geminiModelToModelDescription(geminiModel: GeminiWire_API_Models
label: label,
// created: ...
// updated: ...
pubDate: knownModel?.pubDate ?? formatPubDate(), // 0-day fallback; the editorial entry is the source of truth; today's date is a placeholder until editorial catches up
description: descriptionLong,
contextWindow: contextWindow,
maxCompletionTokens: outputTokenLimit,
@@ -1026,5 +1069,5 @@ export function llmOrtGemLookup(orModelName: string): OrtVendorLookupResult | un
?.filter(spec => _ORT_GEM_PARAM_ALLOWLIST.has(spec.paramId))
.map(spec => ({ ...spec }));
return { interfaces, parameterSpecs, initialTemperature: GEMINI_DEFAULT_TEMPERATURE };
return { pubDate: knownModel.pubDate, interfaces, parameterSpecs, initialTemperature: GEMINI_DEFAULT_TEMPERATURE };
}
@@ -94,6 +94,7 @@ const ModelParameterSpec_schema = z.object({
// Bedrock
'llmVndBedrockAPI',
// Gemini
'llmVndGeminiAgentViz',
'llmVndGeminiAspectRatio',
'llmVndGeminiCodeExecution',
'llmVndGeminiComputerUse',
@@ -137,6 +138,7 @@ export const ModelDescription_schema = z.object({
label: z.string(),
created: z.int().optional(),
updated: z.int().optional(),
pubDate: z.string().regex(/^\d{8}$/).optional(), // editorial: model's official public release date 'YYYYMMDD'. Required for editorial entries (KnownModelEditorial) and for 0-day-fillable paths (Anthropic placeholder, Gemini unknown-model fallback). Omitted for dynamic-only vendors and unknown variants where we have no reliable signal.
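// e.g. '20260505' passes the /^\d{8}$/ check; '2026-05-05' and '202605' do not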
description: z.string(),
contextWindow: z.int().nullable(),
interfaces: z.array(z.enum(LLMS_ALL_INTERFACES).or(z.string())), // backward compatibility: don't break client-side interface parsing when the server is newer
@@ -155,6 +157,7 @@ export const ModelDescription_schema = z.object({
// Each vendor's lookup filters to only what works through OpenRouter's OAI-compatible API.
// OpenRouter merges these with its own auto-detected interfaces and params.
export type OrtVendorLookupResult = {
pubDate?: ModelDescriptionSchema['pubDate'];
interfaces?: ModelDescriptionSchema['interfaces'];
parameterSpecs?: ModelDescriptionSchema['parameterSpecs'];
initialTemperature?: number; // vendor-specific default (e.g. Gemini 1.0); undefined = use global fallback (0.5)
@@ -111,6 +111,28 @@ export function llmDevValidateParameterSpecs_DEV(model: ModelDescriptionSchema):
}
// -- pubDate helpers --
/**
* Format an epoch / Date / nothing as 'YYYYMMDD'.
* Accepts either a Unix epoch (seconds), a Date, or undefined (-> today).
*/
export function formatPubDate(input?: number | Date): string {
let date: Date;
if (input instanceof Date && Number.isFinite(input.getTime()))
date = input;
else if (typeof input === 'number' && Number.isFinite(input) && input > 0) {
const candidate = new Date(input * 1000);
date = Number.isFinite(candidate.getTime()) ? candidate : new Date();
} else
date = new Date();
const y = date.getUTCFullYear();
const m = String(date.getUTCMonth() + 1).padStart(2, '0');
const d = String(date.getUTCDate()).padStart(2, '0');
return `${y}${m}${d}`;
}
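// Editorial sketch (not part of this diff): expected outputs of the helper above.
//   formatPubDate(new Date(Date.UTC(2026, 4, 5)))   // -> '20260505' (Date input, UTC)
//   formatPubDate(Date.UTC(2026, 4, 5) / 1000)      // -> '20260505' (Unix epoch in seconds)
//   formatPubDate()                                 // -> today's UTC date as 'YYYYMMDD'
//   formatPubDate(-1)                               // non-positive epoch falls back to today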
// -- Manual model mappings: types and helper --
export type ManualMappings = (KnownModel | KnownLink)[];
@@ -224,6 +246,7 @@ export function fromManualMapping(mappings: (KnownModel | KnownLink)[], upstream
};
// apply optional fields
if (m.pubDate) md.pubDate = m.pubDate;
if (m.parameterSpecs) md.parameterSpecs = m.parameterSpecs;
if (m.maxCompletionTokens) md.maxCompletionTokens = m.maxCompletionTokens;
if (m.benchmark) md.benchmark = m.benchmark;
@@ -1,38 +1,72 @@
import { LLM_IF_HOTFIX_StripImages, LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
import { LLM_IF_HOTFIX_StripImages, LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
import type { ModelDescriptionSchema } from '../../llm.server.types';
import { fromManualMapping, ManualMappings } from '../../models.mappings';
const IF_3 = [LLM_IF_HOTFIX_StripImages, LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json];
const IF_4 = [LLM_IF_HOTFIX_StripImages, LLM_IF_OAI_Chat, LLM_IF_OAI_Fn];
// [DeepSeek, 2026-04-24] V4 release - https://api-docs.deepseek.com/news/news260424
// - V4-Pro: 1.6T total / 49B active params; V4-Flash: 284B total / 13B active params (Novel Attention: token-wise compression + DSA)
// - Model IDs listed by /models: deepseek-v4-flash, deepseek-v4-pro
// - 1M context is the default across services; text-only (no vision/multimodal)
// - Legacy aliases still accepted until 2026-07-24: deepseek-chat -> v4-flash (thinking disabled), deepseek-reasoner -> v4-flash (thinking enabled)
// - Reasoning control: object `thinking: { type: 'enabled'|'disabled', reasoning_effort?: 'high'|'max' }`
// (the live API also accepts type: 'adaptive', but it is undocumented and empirically behaves the same as 'enabled'
// on current builds -- deliberately not exposed here; add it once docs + semantics stabilize)
// - V3.2 endpoints no longer accessible via direct model ID (API returns only v4-flash/v4-pro)
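// Editorial sketch (assumption, not repo code): per the notes above, the V4 reasoning
// control rides on an otherwise OpenAI-compatible chat body, roughly:
//   { model: 'deepseek-v4-pro',
//     messages: [{ role: 'user', content: '...' }],
//     thinking: { type: 'enabled', reasoning_effort: 'max' } }
// with the legacy 'deepseek-chat' / 'deepseek-reasoner' aliases pre-setting thinking off/on.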
const _knownDeepseekChatModels: ManualMappings = [
// [Models and Pricing](https://api-docs.deepseek.com/quick_start/pricing)
// [List Models](https://api-docs.deepseek.com/api/list-models)
// [Release Notes - V3.2](https://api-docs.deepseek.com/news/news251201) - Released 2025-12-01
{
idPrefix: 'deepseek-v4-pro',
label: 'DeepSeek V4 Pro',
pubDate: '20260424',
description: 'Premium reasoning model with 1M context. Supports extended thinking modes, JSON output, and function calling.',
contextWindow: 1_048_576, // 1M
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],
parameterSpecs: [
{ paramId: 'llmVndMiscEffort', enumValues: ['none', 'high', 'max'] },
],
maxCompletionTokens: 65536, // conservative default; docs advertise up to 384K
chatPrice: { input: 1.74, output: 3.48, cache: { cType: 'oai-ac', read: 0.145 } },
benchmark: { cbaElo: 1463 }, // lmarena: deepseek-v4-pro (thinking variant 1462, near-tied)
},
{
idPrefix: 'deepseek-v4-flash',
label: 'DeepSeek V4 Flash',
pubDate: '20260424',
description: 'Fast general-purpose model with 1M context. Supports extended thinking modes, JSON output, and function calling.',
contextWindow: 1_048_576, // 1M
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],
parameterSpecs: [
{ paramId: 'llmVndMiscEffort', enumValues: ['none', 'high', 'max'] },
],
maxCompletionTokens: 65536, // conservative default; docs advertise up to 384K
chatPrice: { input: 0.14, output: 0.28, cache: { cType: 'oai-ac', read: 0.028 } },
benchmark: { cbaElo: 1439 }, // lmarena: deepseek-v4-flash-thinking (non-thinking variant 1433)
},
// Legacy aliases - API routes both to deepseek-v4-flash with thinking pre-set
{
idPrefix: 'deepseek-reasoner',
label: 'DeepSeek V3.2 (Reasoner)',
description: 'Reasoning model with Chain-of-Thought capabilities, 128K context length. Supports JSON output and function calling.',
contextWindow: 131072, // 128K
interfaces: [...IF_3, LLM_IF_OAI_Reasoning],
// parameterSpecs: [
// { paramId: 'llmVndMiscEffort', enumValues: ['none', 'high'] }, // not supported: this model is reasoning only
// ],
maxCompletionTokens: 32768, // default, max: 65536
chatPrice: { input: 0.28, output: 0.42, cache: { cType: 'oai-ac', read: 0.028 } },
benchmark: { cbaElo: 1425 }, // deepseek-v3.2-exp-thinking
label: 'DeepSeek Reasoner (legacy)',
description: 'Legacy alias: routes to DeepSeek V4 Flash with thinking enabled. Retires 2026-07-24.',
contextWindow: 1_048_576,
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],
maxCompletionTokens: 65536,
chatPrice: { input: 0.14, output: 0.28, cache: { cType: 'oai-ac', read: 0.028 } },
benchmark: { cbaElo: 1439 }, // lmarena: deepseek-v4-flash-thinking
isLegacy: true,
},
{
idPrefix: 'deepseek-chat',
label: 'DeepSeek V3.2',
description: 'General-purpose model with 128K context length. Supports JSON output and function calling.',
contextWindow: 131072, // 128K
interfaces: IF_3,
maxCompletionTokens: 8192, // default is 4096, max is 8192
chatPrice: { input: 0.28, output: 0.42, cache: { cType: 'oai-ac', read: 0.028 } },
benchmark: { cbaElo: 1424 }, // deepseek-v3.2
label: 'DeepSeek Chat (legacy)',
description: 'Legacy alias: routes to DeepSeek V4 Flash with thinking disabled. Retires 2026-07-24.',
contextWindow: 1_048_576,
interfaces: IF_4,
maxCompletionTokens: 65536,
chatPrice: { input: 0.14, output: 0.28, cache: { cType: 'oai-ac', read: 0.028 } },
benchmark: { cbaElo: 1433 }, // lmarena: deepseek-v4-flash (non-thinking)
isLegacy: true,
},
];
@@ -23,6 +23,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'meta-llama/llama-4-scout-17b-16e-instruct',
label: 'Llama 4 Scout · 17B × 16E (Preview)',
pubDate: '20250405',
description: 'Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -33,6 +34,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'qwen/qwen3-32b',
label: 'Qwen 3 · 32B (Preview)',
pubDate: '20250428',
description: 'Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 40960,
@@ -43,6 +45,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'moonshotai/kimi-k2-instruct-0905',
label: 'Kimi K2 Instruct 0905 (Preview)',
pubDate: '20250905',
description: 'Kimi K2 1T MoE model (32B active, 384 experts). Advanced agentic coding. 262K context, 16K max output. ~200 t/s on Groq.',
contextWindow: 262144,
maxCompletionTokens: 16384,
@@ -53,6 +56,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'moonshotai/kimi-k2-instruct',
label: 'Kimi K2 Instruct (Deprecated)',
pubDate: '20250711',
symLink: 'moonshotai/kimi-k2-instruct-0905',
contextWindow: 131072, // API returns 131K (vs 262K for the 0905 version)
maxCompletionTokens: 16384,
@@ -69,6 +73,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'groq/compound',
label: 'Compound (Agentic System)',
pubDate: '20250904',
description: 'Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -78,6 +83,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'groq/compound-mini',
label: 'Compound Mini (Agentic System)',
pubDate: '20250904',
description: 'Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -89,6 +95,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'openai/gpt-oss-120b',
label: 'GPT OSS 120B',
pubDate: '20250805',
description: 'OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -99,6 +106,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'openai/gpt-oss-safeguard-20b',
label: 'GPT OSS Safeguard 20B (Preview)',
pubDate: '20251029',
description: 'OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -108,6 +116,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'openai/gpt-oss-20b',
label: 'GPT OSS 20B',
pubDate: '20250805',
description: 'OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -120,6 +129,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'llama-3.3-70b-versatile',
label: 'Llama 3.3 · 70B Versatile',
pubDate: '20241206',
description: 'Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 32768,
@@ -129,6 +139,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'llama-3.1-8b-instant',
label: 'Llama 3.1 · 8B Instant',
pubDate: '20240723',
description: 'Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 131072,
@@ -22,6 +22,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.7',
label: 'MiniMax M2.7',
pubDate: '20260318',
description: 'Latest flagship with recursive self-improvement and agentic capabilities. 200K context, 131K max output. ~60 t/s.',
contextWindow: 204800,
maxCompletionTokens: 131072,
@@ -31,6 +32,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.7-highspeed',
label: 'MiniMax M2.7 (Highspeed)',
pubDate: '20260318',
description: 'Faster M2.7 variant at ~100 t/s. 200K context, 131K max output.',
contextWindow: 204800,
maxCompletionTokens: 131072,
@@ -42,6 +44,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.5',
label: 'MiniMax M2.5',
pubDate: '20260212',
description: 'Strong coding and reasoning, best value. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -51,6 +54,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.5-highspeed',
label: 'MiniMax M2.5 (Highspeed)',
pubDate: '20260212',
description: 'Faster M2.5 variant at ~100 t/s. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -62,6 +66,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2-her',
label: 'MiniMax M2-her',
pubDate: '20260127',
description: 'Dialogue-first model for immersive roleplay, character-driven chat, and expressive multi-turn conversations. 64K context.',
contextWindow: 65536,
maxCompletionTokens: 2048,
@@ -73,6 +78,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.1',
label: 'MiniMax M2.1',
pubDate: '20251223',
description: '230B params (10B active), multilingual coding. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -83,6 +89,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.1-highspeed',
label: 'MiniMax M2.1 (Highspeed)',
pubDate: '20251223',
description: 'Faster M2.1 variant. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -95,6 +102,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2',
label: 'MiniMax M2',
pubDate: '20251027',
description: '230B params (10B active), agentic and reasoning. 200K context, 128K max output.',
contextWindow: 204800,
maxCompletionTokens: 128000,
@@ -107,6 +115,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M1',
label: 'MiniMax M1',
pubDate: '20250616',
description: '456B total / 45.9B active MoE with lightning attention. 1M context, 40K max output.',
contextWindow: 1000000,
maxCompletionTokens: 40000,
@@ -119,6 +128,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-01',
label: 'MiniMax 01',
pubDate: '20250114',
description: 'Legacy flagship. 1M context.',
contextWindow: 1000192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -19,80 +19,81 @@ const DEV_DEBUG_MISTRAL_MODELS = Release.IsNodeDevBuild; // not in staging to re
const _knownMistralModelDetails: Record<string, {
label?: string; // override the API-provided name
pubDate?: string; // YYYYMMDD - earliest public availability (announcement / La Plateforme / HF upload)
chatPrice?: { input: number; output: number };
benchmark?: { cbaElo: number };
hidden?: boolean;
}> = {
// Premier models - Mistral 3 (Dec 2025)
'mistral-large-2512': { chatPrice: { input: 0.5, output: 1.5 }, benchmark: { cbaElo: 1415 } }, // Mistral Large 3 - MoE 41B active / 675B total
'mistral-large-2411': { chatPrice: { input: 2, output: 6 }, benchmark: { cbaElo: 1305 }, hidden: true }, // older version
'mistral-large-latest': { chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // → 2512
'mistral-large-2512': { pubDate: '20251202', chatPrice: { input: 0.5, output: 1.5 }, benchmark: { cbaElo: 1415 } }, // Mistral Large 3 - MoE 41B active / 675B total
'mistral-large-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, benchmark: { cbaElo: 1305 }, hidden: true }, // older version
'mistral-large-latest': { pubDate: '20251202', chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // → 2512
'mistral-medium-2508': { chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1410 } }, // Mistral Medium 3
'mistral-medium-2505': { chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1387 }, hidden: true }, // older version
'mistral-medium-latest': { chatPrice: { input: 0.4, output: 2 }, hidden: true }, // → 2508
'mistral-medium': { chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-medium-2508': { pubDate: '20250812', chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1410 } }, // Mistral Medium 3.1
'mistral-medium-2505': { pubDate: '20250507', chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1387 }, hidden: true }, // Mistral Medium 3
'mistral-medium-latest': { pubDate: '20250812', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // → 2508
'mistral-medium': { pubDate: '20231211', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink (legacy: original Mistral Medium prototype on La Plateforme beta)
'magistral-medium-2509': { chatPrice: { input: 2, output: 5 }, benchmark: { cbaElo: 1304 } }, // reasoning (leaderboard: magistral-medium-2506 = 1304)
'magistral-medium-latest': { chatPrice: { input: 2, output: 5 }, hidden: true }, // symlink
'magistral-medium-2509': { pubDate: '20250917', chatPrice: { input: 2, output: 5 }, benchmark: { cbaElo: 1304 } }, // reasoning (leaderboard: magistral-medium-2506 = 1304)
'magistral-medium-latest': { pubDate: '20250917', chatPrice: { input: 2, output: 5 }, hidden: true }, // symlink
'devstral-2512': { label: 'Devstral 2 (2512)', chatPrice: { input: 0.4, output: 2 } }, // Devstral 2 - 123B coding agents (API returns "Mistral Vibe Cli")
'devstral-latest': { label: 'Devstral 2 (latest)', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'devstral-medium-latest': { label: 'Devstral 2 (latest)', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-vibe-cli-latest': { label: 'Devstral 2 (latest)', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // alternate ID for devstral-latest
'devstral-medium-2507': { chatPrice: { input: 0.4, output: 2 }, hidden: true }, // older version
'devstral-2512': { label: 'Devstral 2 (2512)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 } }, // Devstral 2 - 123B coding agents (API returns "Mistral Vibe Cli")
'devstral-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'devstral-medium-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-vibe-cli-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // alternate ID for devstral-latest
'devstral-medium-2507': { pubDate: '20250710', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // older version
'mistral-large-pixtral-2411': { chatPrice: { input: 2, output: 6 } }, // Pixtral Large (alternate ID)
'pixtral-large-2411': { chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'pixtral-large-latest': { chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'mistral-large-pixtral-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 } }, // Pixtral Large (alternate ID)
'pixtral-large-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'pixtral-large-latest': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'codestral-2508': { chatPrice: { input: 0.3, output: 0.9 } }, // code generation
'codestral-latest': { chatPrice: { input: 0.3, output: 0.9 }, hidden: true }, // symlink
'codestral-2508': { pubDate: '20250730', chatPrice: { input: 0.3, output: 0.9 } }, // code generation (Codestral 25.08)
'codestral-latest': { pubDate: '20250730', chatPrice: { input: 0.3, output: 0.9 }, hidden: true }, // symlink
'voxtral-small-2507': { chatPrice: { input: 0.1, output: 0.3 } }, // voice (text tokens)
'voxtral-small-latest': { chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'voxtral-small-2507': { pubDate: '20250715', chatPrice: { input: 0.1, output: 0.3 } }, // voice (text tokens)
'voxtral-small-latest': { pubDate: '20250715', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'voxtral-mini-2507': { chatPrice: { input: 0.04, output: 0.04 } }, // voice (text tokens)
'voxtral-mini-latest': { chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // symlink
'voxtral-mini-2507': { pubDate: '20250715', chatPrice: { input: 0.04, output: 0.04 } }, // voice (text tokens)
'voxtral-mini-latest': { pubDate: '20250715', chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // symlink
// Ministral 3 family (Dec 2025) - multimodal, multilingual, Apache 2.0
'ministral-14b-2512': { chatPrice: { input: 0.2, output: 0.2 } }, // Ministral 3 14B
'ministral-14b-latest': { chatPrice: { input: 0.2, output: 0.2 }, hidden: true }, // symlink
'ministral-14b-2512': { pubDate: '20251202', chatPrice: { input: 0.2, output: 0.2 } }, // Ministral 3 14B
'ministral-14b-latest': { pubDate: '20251202', chatPrice: { input: 0.2, output: 0.2 }, hidden: true }, // symlink
'ministral-8b-2512': { chatPrice: { input: 0.15, output: 0.15 } }, // Ministral 3 8B
'ministral-8b-2410': { chatPrice: { input: 0.1, output: 0.1 }, benchmark: { cbaElo: 1237 }, hidden: true }, // older version
'ministral-8b-latest': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'ministral-8b-2512': { pubDate: '20251202', chatPrice: { input: 0.15, output: 0.15 } }, // Ministral 3 8B
'ministral-8b-2410': { pubDate: '20241016', chatPrice: { input: 0.1, output: 0.1 }, benchmark: { cbaElo: 1237 }, hidden: true }, // older version
'ministral-8b-latest': { pubDate: '20251202', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'ministral-3b-2512': { chatPrice: { input: 0.1, output: 0.1 } }, // Ministral 3 3B
'ministral-3b-2410': { chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // older version
'ministral-3b-latest': { chatPrice: { input: 0.1, output: 0.1 }, hidden: true }, // symlink
'ministral-3b-2512': { pubDate: '20251202', chatPrice: { input: 0.1, output: 0.1 } }, // Ministral 3 3B
'ministral-3b-2410': { pubDate: '20241016', chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // older version
'ministral-3b-latest': { pubDate: '20251202', chatPrice: { input: 0.1, output: 0.1 }, hidden: true }, // symlink
// Open models
'mistral-small-2603': { chatPrice: { input: 0.15, output: 0.6 } }, // Mistral Small 4 - 119B hybrid (instruct+reasoning+coding), 256k ctx
'mistral-small-2506': { chatPrice: { input: 0.1, output: 0.3 }, benchmark: { cbaElo: 1357 }, hidden: true }, // Mistral Small 3.2
'mistral-small-latest': { chatPrice: { input: 0.15, output: 0.6 }, hidden: true }, // → 2603
'mistral-small-2603': { pubDate: '20260316', chatPrice: { input: 0.15, output: 0.6 } }, // Mistral Small 4 - 119B hybrid (instruct+reasoning+coding), 256k ctx
'mistral-small-2506': { pubDate: '20250620', chatPrice: { input: 0.1, output: 0.3 }, benchmark: { cbaElo: 1357 }, hidden: true }, // Mistral Small 3.2
'mistral-small-latest': { pubDate: '20260316', chatPrice: { input: 0.15, output: 0.6 }, hidden: true }, // → 2603
'labs-mistral-small-creative': { label: 'Mistral Small Creative', chatPrice: { input: 0.1, output: 0.3 } }, // creative writing, roleplay (Labs)
'labs-mistral-small-creative': { label: 'Mistral Small Creative', pubDate: '20251211', chatPrice: { input: 0.1, output: 0.3 } }, // creative writing, roleplay (Labs)
'labs-leanstral-2603': { label: 'Leanstral (2603)', chatPrice: { input: 0, output: 0 } }, // Lean 4 formal proof engineering (Labs, free for a limited period)
'labs-leanstral-2603': { label: 'Leanstral (2603)', pubDate: '20260316', chatPrice: { input: 0, output: 0 } }, // Lean 4 formal proof engineering (Labs, free for a limited period)
'magistral-small-2509': { chatPrice: { input: 0.5, output: 1.5 } }, // reasoning
'magistral-small-latest': { chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // symlink
'magistral-small-2509': { pubDate: '20250917', chatPrice: { input: 0.5, output: 1.5 } }, // reasoning
'magistral-small-latest': { pubDate: '20250917', chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // symlink
'labs-devstral-small-2512': { label: 'Devstral Small 2 (2512)', chatPrice: { input: 0.1, output: 0.3 } }, // Devstral Small 2 - 24B coding agents (Labs)
'devstral-small-2507': { chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // older version
'devstral-small-latest': { label: 'Devstral Small 2 (latest)', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'labs-devstral-small-2512': { label: 'Devstral Small 2 (2512)', pubDate: '20251209', chatPrice: { input: 0.1, output: 0.3 } }, // Devstral Small 2 - 24B coding agents (Labs)
'devstral-small-2507': { pubDate: '20250710', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // older version (Devstral Small 1.1)
'devstral-small-latest': { label: 'Devstral Small 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'pixtral-12b-2409': { chatPrice: { input: 0.15, output: 0.15 } }, // vision
'pixtral-12b-latest': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b-2409': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 } }, // vision
'pixtral-12b-latest': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'open-mistral-nemo-2407': { chatPrice: { input: 0.15, output: 0.15 } }, // NeMo
'open-mistral-nemo': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'open-mistral-nemo-2407': { pubDate: '20240718', chatPrice: { input: 0.15, output: 0.15 } }, // NeMo
'open-mistral-nemo': { pubDate: '20240718', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
// Legacy (kept for reference, no longer in API)
'open-mistral-7b': { chatPrice: { input: 0.25, output: 0.25 }, hidden: true },
'open-mistral-7b': { pubDate: '20230927', chatPrice: { input: 0.25, output: 0.25 }, hidden: true },
};
@@ -28,7 +28,8 @@ const _PS_Reasoning: ModelDescriptionSchema['parameterSpecs'] = [
* Moonshot AI (Kimi) models.
* - models list and pricing: https://platform.kimi.ai/docs/pricing/chat (was platform.moonshot.ai - now 301 redirect)
* - API docs: https://platform.kimi.ai/docs/api/chat
* - updated: 2026-04-20
* - updated: 2026-05-04
* - NOTE: K2 series (non-2.5/2.6) is scheduled for discontinuation on 2026-05-25 per Moonshot docs.
*/
const _knownMoonshotModels: ManualMappings = [
@@ -36,6 +37,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2.6',
label: 'Kimi K2.6',
pubDate: '20260420',
description: 'Native multimodal flagship (text, image, video inputs) with thinking and non-thinking modes. Stronger long-form coding, improved instruction compliance and self-correction. 256K context.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -49,6 +51,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2.5',
label: 'Kimi K2.5',
pubDate: '20260127',
description: 'Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -58,12 +61,13 @@ const _knownMoonshotModels: ManualMappings = [
benchmark: { cbaElo: 1451 }, // kimi-k2.5-thinking
},
// Kimi K2 Series - Latest Models
// Kimi K2 Series - scheduled for discontinuation on 2026-05-25
// Fast, Thinking
{
idPrefix: 'kimi-k2-thinking-turbo',
label: 'Kimi K2 Thinking Turbo',
pubDate: '20251106',
description: 'High-speed reasoning model with advanced thinking and tool calling capabilities. Faster inference (~50 tok/s) with optimized performance. 256K context. Temperature 1.0 recommended.',
contextWindow: 262144,
maxCompletionTokens: 65536,
@@ -76,6 +80,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2-thinking',
label: 'Kimi K2 Thinking',
pubDate: '20251106',
description: 'Advanced reasoning model with multi-step thinking and autonomous tool calling (200-300 sequential calls). Interleaves chain-of-thought with tool use. 256K context. Temperature 1.0 recommended.',
contextWindow: 262144,
maxCompletionTokens: 65536,
@@ -89,6 +94,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2-0905-preview',
label: 'Kimi K2 0905 (Preview)',
pubDate: '20250905',
description: 'State-of-the-art MoE model (1T total, 32B active) with extended 256K context. Enhanced agentic coding intelligence and improved instruction following.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -102,6 +108,7 @@ const _knownMoonshotModels: ManualMappings = [
hidden: true,
idPrefix: 'kimi-k2-0711-preview',
label: 'Kimi K2 0711 (Preview)',
pubDate: '20250711',
description: 'Earlier preview variant with 128K context. Superseded by 0905 version.',
contextWindow: 131072,
maxCompletionTokens: 16384,
@@ -114,6 +121,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2-turbo-preview',
label: 'Kimi K2 Turbo (Preview)',
pubDate: '20250801',
description: 'High-speed variant with 60-100 tokens/second output. 256K context. Optimized for real-time applications and agentic tasks.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -127,6 +135,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-128k',
label: 'V1 128K',
pubDate: '20240206',
description: 'Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -136,6 +145,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-32k',
label: 'V1 32K',
pubDate: '20240206',
description: 'Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -145,6 +155,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-8k',
label: 'V1 8K',
pubDate: '20240206',
description: 'Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 8192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -157,6 +168,7 @@ const _knownMoonshotModels: ManualMappings = [
// hidden: false - intentionally visible; the only non-hidden vision model for now
idPrefix: 'moonshot-v1-128k-vision-preview',
label: 'V1 128K Vision (Preview)',
pubDate: '20250115',
description: 'Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -166,6 +178,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-32k-vision-preview',
label: 'V1 32K Vision (Preview)',
pubDate: '20250115',
description: 'Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -176,6 +189,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-8k-vision-preview',
label: 'V1 8K Vision (Preview)',
pubDate: '20250115',
description: 'Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 8192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -12,6 +12,23 @@ import { fromManualMapping, KnownModel, llmDevCheckModels_DEV, ManualMappings }
// OpenAI Model Variants
export const hardcodedOpenAIVariants: ModelVariantMap = {
// GPT-5.5 with reasoning disabled (non-thinking) - supports temperature control
'gpt-5.5-2026-04-23': {
idVariant: '::thinking-none',
label: 'GPT-5.5 (No-thinking)',
hidden: true, // hidden by default as redundant, user can unhide in settings
description: 'Supports temperature control for creative applications. GPT-5.5 with reasoning disabled (reasoning_effort=none).',
interfaces: [LLM_IF_OAI_Responses, LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_OAI_PromptCaching], // NO LLM_IF_OAI_Reasoning, NO LLM_IF_HOTFIX_NoTemperature
parameterSpecs: [
{ paramId: 'llmVndOaiEffort', enumValues: ['none', 'low', 'medium', 'high', 'xhigh'], initialValue: 'none', hidden: true }, // factory 'none', not changeable
{ paramId: 'llmVndOaiWebSearchContext' },
{ paramId: 'llmVndOaiVerbosity' },
{ paramId: 'llmVndOaiImageGeneration' },
{ paramId: 'llmVndOaiCodeInterpreter' },
{ paramId: 'llmForceNoStream' },
],
},
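// Note (illustrative assumption, not confirmed by this diff): a variant entry augments the
// base model's description, and the composed id is the base id plus the idVariant suffix,
// e.g. 'gpt-5.5-2026-04-23' + '::thinking-none' -> 'gpt-5.5-2026-04-23::thinking-none',
// letting the no-thinking sibling be listed (and hidden) independently of the base snapshot.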
// GPT-5.4 with reasoning disabled (non-thinking) - supports temperature control
'gpt-5.4-2026-03-05': {
idVariant: '::thinking-none',
@@ -88,12 +105,67 @@ const PS_DEEP_RESEARCH = [{ paramId: 'llmVndOaiWebSearchContext' as const, initi
// https://platform.openai.com/docs/pricing
export const _knownOpenAIChatModels: ManualMappings = [
/// GPT-5.5 series - Released April 23, 2026
// GPT-5.5
{
idPrefix: 'gpt-5.5-2026-04-23',
label: 'GPT-5.5 (2026-04-23)',
pubDate: '20260423',
description: 'New baseline for complex production workflows. Stronger task execution, more precise tool use, more efficient reasoning with fewer tokens. 1M token context.',
contextWindow: 1050000,
maxCompletionTokens: 128000,
interfaces: [LLM_IF_OAI_Responses, ...IFS_CHAT_CACHE_REASON, LLM_IF_HOTFIX_NoTemperature],
parameterSpecs: [
{ paramId: 'llmVndOaiEffort', enumValues: ['none', 'low', 'medium', 'high', 'xhigh'], initialValue: 'medium' }, // medium is the new default for 5.5
{ paramId: 'llmVndOaiWebSearchContext' },
{ paramId: 'llmVndOaiVerbosity' },
{ paramId: 'llmVndOaiImageGeneration' },
{ paramId: 'llmVndOaiCodeInterpreter' },
{ paramId: 'llmForceNoStream' },
],
chatPrice: { input: 5, cache: { cType: 'oai-ac', read: 0.5 }, output: 30 },
// benchmark: TBD - no CBA ELO yet
},
{
idPrefix: 'gpt-5.5',
label: 'GPT-5.5',
symLink: 'gpt-5.5-2026-04-23',
},
// GPT-5.5 Pro
{
idPrefix: 'gpt-5.5-pro-2026-04-23',
label: 'GPT-5.5 Pro (2026-04-23)',
pubDate: '20260423',
description: 'Most capable model for complex tasks. Uses more compute for smarter, more precise responses on the hardest problems.',
contextWindow: 1050000,
maxCompletionTokens: 272000,
interfaces: [LLM_IF_OAI_Responses, ...IFS_CHAT_MIN, LLM_IF_OAI_Reasoning, LLM_IF_HOTFIX_NoTemperature],
parameterSpecs: [
{ paramId: 'llmVndOaiEffort', enumValues: ['medium', 'high', 'xhigh'] }, // Pro: no low/none
{ paramId: 'llmVndOaiWebSearchContext' },
{ paramId: 'llmVndOaiVerbosity' },
{ paramId: 'llmVndOaiImageGeneration' },
{ paramId: 'llmForceNoStream' },
],
chatPrice: { input: 30, output: 180 },
// benchmark: TBD
},
{
idPrefix: 'gpt-5.5-pro',
label: 'GPT-5.5 Pro',
symLink: 'gpt-5.5-pro-2026-04-23',
},
/// GPT-5.4 series - Released March 5, 2026
// GPT-5.4
{
idPrefix: 'gpt-5.4-2026-03-05',
label: 'GPT-5.4 (2026-03-05)',
pubDate: '20260305',
description: 'Most capable and efficient frontier model for professional work. Native computer use, improved reasoning, coding, and agentic workflows with 1M token context.',
contextWindow: 1050000,
maxCompletionTokens: 128000,
@@ -119,6 +191,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-pro-2026-03-05',
label: 'GPT-5.4 Pro (2026-03-05)',
pubDate: '20260305',
description: 'Most capable model for complex tasks. Uses more compute for smarter, more precise responses on difficult problems.',
contextWindow: 1050000,
maxCompletionTokens: 272000,
@@ -143,6 +216,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-mini-2026-03-17',
label: 'GPT-5.4 Mini (2026-03-17)',
pubDate: '20260317',
description: 'Strongest mini model for coding, computer use, and subagents. GPT-5.4-class intelligence at lower cost and latency.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -168,6 +242,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-nano-2026-03-17',
label: 'GPT-5.4 Nano (2026-03-17)',
pubDate: '20260317',
description: 'Cheapest GPT-5.4-class model for simple high-volume tasks like classification and data extraction.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -196,6 +271,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.3-codex',
label: 'GPT-5.3 Codex',
pubDate: '20260205',
description: 'Most capable agentic coding model. Combines frontier coding performance of GPT-5.2-Codex with reasoning and professional knowledge of GPT-5.2. ~25% faster.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -216,6 +292,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // Research preview, ChatGPT Pro only - API access limited to design partners
idPrefix: 'gpt-5.3-codex-spark',
label: 'GPT-5.3 Codex Spark',
pubDate: '20260212',
description: 'Text-only research preview optimized for real-time coding iteration. Delivers 1000+ tokens/sec on low-latency hardware.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -228,10 +305,11 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: TBD
},
// GPT-5.3 Chat Latest - Released March 4, 2026
// GPT-5.3 Chat Latest - Released March 3, 2026
{
idPrefix: 'gpt-5.3-chat-latest',
label: 'GPT-5.3 Instant',
pubDate: '20260303',
description: 'GPT-5.3 model powering ChatGPT. Points to the GPT-5.3 Instant snapshot currently used in ChatGPT.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -250,8 +328,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.2
{
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.2-2025-12-11',
label: 'GPT-5.2 (2025-12-11)',
pubDate: '20251211',
description: 'Most capable model for professional work and long-running agents. Improvements in general intelligence, long-context, agentic tool-calling, and vision.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -268,6 +348,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
benchmark: { cbaElo: 1441 }, // gpt-5.2-high
},
{
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.2',
label: 'GPT-5.2',
symLink: 'gpt-5.2-2025-12-11',
@@ -275,8 +356,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.2 Codex
{
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.2-codex',
label: 'GPT-5.2 Codex',
pubDate: '20251211',
description: 'GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar environments. Supports low, medium, high, and xhigh reasoning effort settings.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -293,8 +376,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.2 Chat Latest
{
hidden: true, // superseded by GPT-5.3 Instant
idPrefix: 'gpt-5.2-chat-latest',
label: 'GPT-5.2 Instant',
pubDate: '20251211',
description: 'GPT-5.2 model powering ChatGPT. Fast, capable for everyday work with clear improvements in info-seeking, how-tos, technical writing.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -311,8 +396,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.2 Pro
{
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5.2-pro-2025-12-11',
label: 'GPT-5.2 Pro (2025-12-11)',
pubDate: '20251211',
description: 'Smartest and most trustworthy option for difficult questions. Uses more compute for harder thinking on complex domains like programming.',
contextWindow: 400000,
maxCompletionTokens: 272000,
@@ -328,6 +415,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: TBD
},
{
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5.2-pro',
label: 'GPT-5.2 Pro',
symLink: 'gpt-5.2-pro-2025-12-11',
@@ -338,8 +426,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.1
{
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.1-2025-11-13',
label: 'GPT-5.1 (2025-11-13)',
pubDate: '20251113',
description: 'The best model for coding and agentic tasks with configurable reasoning effort.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -355,6 +445,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
benchmark: { cbaElo: 1455 }, // gpt-5.1-high
},
{
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.1',
label: 'GPT-5.1',
symLink: 'gpt-5.1-2025-11-13',
@@ -362,8 +453,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.1 Chat Latest
{
hidden: true, // superseded by GPT-5.3 Instant
idPrefix: 'gpt-5.1-chat-latest',
label: 'GPT-5.1 Instant',
pubDate: '20251112',
description: 'GPT-5.1 Instant with adaptive reasoning. More conversational with improved instruction following.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -381,8 +474,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5.1 Codex Max
{
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex-max',
label: 'GPT-5.1 Codex Max',
pubDate: '20251119',
description: 'Our most intelligent coding model optimized for long-horizon, agentic coding tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -398,8 +493,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
},
// GPT-5.1 Codex
{
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex',
label: 'GPT-5.1 Codex',
pubDate: '20251113',
description: 'A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -415,8 +512,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
},
// GPT-5.1 Codex Mini
{
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex-mini',
label: 'GPT-5.1 Codex Mini',
pubDate: '20251113',
description: 'Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -436,8 +535,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5
{
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5-2025-08-07',
label: 'GPT-5 (2025-08-07)',
pubDate: '20250807',
description: 'The best model for coding and agentic tasks across domains.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -453,6 +554,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
benchmark: { cbaElo: 1433 }, // gpt-5-high
},
{
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5',
label: 'GPT-5',
symLink: 'gpt-5-2025-08-07',
@@ -460,8 +562,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5 Pro
{
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5-pro-2025-10-06',
label: 'GPT-5 Pro (2025-10-06)',
pubDate: '20251006',
description: 'Version of GPT-5 that uses more compute to produce smarter and more precise responses. Designed for tough problems.',
contextWindow: 400000,
maxCompletionTokens: 272000,
@@ -471,6 +575,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: has not been measured yet
},
{
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5-pro',
label: 'GPT-5 Pro',
symLink: 'gpt-5-pro-2025-10-06',
@@ -481,6 +586,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // deprecated per OpenAI docs (2026-04)
idPrefix: 'gpt-5-chat-latest',
label: 'GPT-5 ChatGPT (Non-Thinking)',
pubDate: '20250807',
description: 'GPT-5 model used in ChatGPT. Points to the GPT-5 snapshot currently used in ChatGPT.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -495,6 +601,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // deprecated per OpenAI docs (2026-04), superseded by gpt-5.1-codex/gpt-5.3-codex
idPrefix: 'gpt-5-codex',
label: 'GPT-5 Codex',
pubDate: '20250915',
description: 'A version of GPT-5 optimized for agentic coding in Codex.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -511,8 +618,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5 Search API
{
hidden: true, // poor quality - use llmVndOaiWebSearchContext on regular models instead
idPrefix: 'gpt-5-search-api-2025-10-14',
label: 'GPT-5 Search API (2025-10-14)',
pubDate: '20251014',
description: 'Updated web search model in Chat Completions API. 60% cheaper with domain filtering support.',
contextWindow: 400000,
maxCompletionTokens: 100000,
@@ -522,6 +631,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: TBD
},
{
hidden: true, // poor quality - use llmVndOaiWebSearchContext on regular models instead
idPrefix: 'gpt-5-search-api',
label: 'GPT-5 Search API',
symLink: 'gpt-5-search-api-2025-10-14',
@@ -529,8 +639,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5 mini
{
hidden: true, // superseded by GPT-5.4 Mini
idPrefix: 'gpt-5-mini-2025-08-07',
label: 'GPT-5 Mini (2025-08-07)',
pubDate: '20250807',
description: 'A faster, more cost-efficient version of GPT-5 for well-defined tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -540,6 +652,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
benchmark: { cbaElo: 1390 }, // gpt-5-mini-high
},
{
hidden: true, // superseded by GPT-5.4 Mini
idPrefix: 'gpt-5-mini',
label: 'GPT-5 Mini',
symLink: 'gpt-5-mini-2025-08-07',
@@ -547,8 +660,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-5 nano
{
hidden: true, // superseded by GPT-5.4 Nano
idPrefix: 'gpt-5-nano-2025-08-07',
label: 'GPT-5 Nano (2025-08-07)',
pubDate: '20250807',
description: 'Fastest, most cost-efficient version of GPT-5 for summarization and classification tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -558,6 +673,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
benchmark: { cbaElo: 1337 }, // gpt-5-nano-high
},
{
hidden: true, // superseded by GPT-5.4 Nano
idPrefix: 'gpt-5-nano',
label: 'GPT-5 Nano',
symLink: 'gpt-5-nano-2025-08-07',
@@ -588,6 +704,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // UNSUPPORTED YET
idPrefix: 'computer-use-preview-2025-03-11',
label: 'Computer Use Preview (2025-03-11)',
pubDate: '20250311',
description: 'Specialized model for computer use tool. Optimized for computer interaction capabilities.',
contextWindow: 8192,
maxCompletionTokens: 1024,
@@ -608,8 +725,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// o4-mini-deep-research - (v1/responses API)
{
idPrefix: 'o4-mini-deep-research-2025-06-26',
label: 'o4 Mini Deep Research (2025-06-26)',
description: 'Faster, more affordable deep research model for complex, multi-step research tasks.',
label: 'o4 Mini Deep Research [Deprecated]',
pubDate: '20250626',
isLegacy: true,
description: 'Faster, more affordable deep research model for complex, multi-step research tasks. [Shutdown: 2026-07-23 - migrate to GPT-5.5 with web search.]',
contextWindow: 200000,
maxCompletionTokens: 100000,
interfaces: [LLM_IF_OAI_Responses, ...IFS_CHAT_CACHE_REASON, LLM_IF_HOTFIX_NoTemperature],
@@ -625,8 +744,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
/// o4-mini
{
idPrefix: 'o4-mini-2025-04-16',
label: 'o4 Mini (2025-04-16)',
description: 'Latest o4-mini model. Optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks.',
label: 'o4 Mini [Deprecated]',
pubDate: '20250416',
isLegacy: true,
description: 'Latest o4-mini model. Optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Mini.]',
contextWindow: 200000,
maxCompletionTokens: 100000,
interfaces: IFS_CHAT_CACHE_REASON,
@@ -643,8 +764,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// o3-deep-research - (v1/responses API)
{
idPrefix: 'o3-deep-research-2025-06-26',
label: 'o3 Deep Research (2025-06-26)',
description: 'Our most powerful deep research model for complex, multi-step research tasks.',
label: 'o3 Deep Research [Deprecated]',
pubDate: '20250626',
isLegacy: true,
description: 'Our most powerful deep research model for complex, multi-step research tasks. [Shutdown: 2026-07-23 - migrate to GPT-5.5 Pro with web search.]',
contextWindow: 200000,
maxCompletionTokens: 100000,
interfaces: [LLM_IF_OAI_Responses, ...IFS_CHAT_CACHE_REASON, LLM_IF_HOTFIX_NoTemperature],
@@ -661,6 +784,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-pro-2025-06-10',
label: 'o3 Pro (2025-06-10)',
pubDate: '20250610',
description: 'Version of o3 with more compute for better responses. Provides consistently better answers for complex tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -679,6 +803,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-2025-04-16',
label: 'o3 (2025-04-16)',
pubDate: '20250416',
description: 'A well-rounded and powerful model across domains. Sets a new standard for math, science, coding, and visual reasoning tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -696,8 +821,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// o3-mini
{
idPrefix: 'o3-mini-2025-01-31',
label: 'o3 Mini (2025-01-31)',
description: 'Latest o3-mini model snapshot. High intelligence at the same cost and latency targets of o1-mini. Excels at science, math, and coding tasks.',
label: 'o3 Mini [Deprecated]',
pubDate: '20250131',
isLegacy: true,
description: 'Latest o3-mini model snapshot. High intelligence at the same cost and latency targets of o1-mini. Excels at science, math, and coding tasks. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Mini.]',
contextWindow: 200000,
maxCompletionTokens: 100000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_HOTFIX_StripImages],
@@ -716,6 +843,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true,
idPrefix: 'o1-pro-2025-03-19',
label: 'o1 Pro (2025-03-19)',
pubDate: '20250319',
description: 'A version of o1 with more compute for better responses. Provides consistently better answers for complex tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -733,8 +861,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// o1
{
idPrefix: 'o1-2024-12-17',
label: 'o1 (2024-12-17)',
description: 'Previous full o-series reasoning model.',
label: 'o1 [Deprecated]',
pubDate: '20241217',
isLegacy: true,
description: 'Previous full o-series reasoning model. [Shutdown: 2026-10-23 - migrate to GPT-5.5 or o3.]',
contextWindow: 200000,
maxCompletionTokens: 100000,
interfaces: IFS_CHAT_CACHE_REASON,
@@ -755,6 +885,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-2025-04-14',
label: 'GPT-4.1 (2025-04-14)',
pubDate: '20250414',
description: 'Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -772,6 +903,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-mini-2025-04-14',
label: 'GPT-4.1 Mini (2025-04-14)',
pubDate: '20250414',
description: 'Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency by nearly half and cost by 83%.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -788,8 +920,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// GPT-4.1 nano
{
idPrefix: 'gpt-4.1-nano-2025-04-14',
label: 'GPT-4.1 Nano (2025-04-14)',
description: 'Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance with low latency, ideal for tasks like classification or autocompletion.',
label: 'GPT-4.1 Nano [Deprecated]',
pubDate: '20250414',
isLegacy: true,
description: 'Fastest, most cost-effective GPT-4.1 model. Delivers exceptional performance with low latency, ideal for tasks like classification or autocompletion. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Nano.]',
contextWindow: 1047576,
maxCompletionTokens: 32768,
interfaces: IFS_CHAT_CACHE,
@@ -809,6 +943,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-1.5',
label: 'GPT Audio 1.5',
pubDate: '20260224',
description: 'Best voice model for audio in, audio out with Chat Completions. Accepts audio inputs and outputs.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -819,8 +954,10 @@ export const _knownOpenAIChatModels: ManualMappings = [
// gpt-audio
{
hidden: true, // superseded by GPT Audio 1.5
idPrefix: 'gpt-audio-2025-08-28',
label: 'GPT Audio (2025-08-28)',
pubDate: '20250828',
description: 'First generally available audio model. Accepts audio inputs and outputs, and can be used in the Chat Completions REST API.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -829,6 +966,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: TBD
},
{
hidden: true, // superseded by GPT Audio 1.5
idPrefix: 'gpt-audio',
label: 'GPT Audio',
symLink: 'gpt-audio-2025-08-28',
@@ -836,6 +974,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-mini-2025-12-15',
label: 'GPT Audio Mini (2025-12-15)',
pubDate: '20251215',
description: 'Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -845,6 +984,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-mini-2025-10-06',
label: 'GPT Audio Mini (2025-10-06)',
pubDate: '20251006',
hidden: true, // previous version
description: 'Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.',
contextWindow: 128000,
@@ -867,6 +1007,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-11-20',
label: 'GPT-4o (2024-11-20)',
pubDate: '20241120',
description: 'Snapshot of gpt-4o from November 20th, 2024.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -877,6 +1018,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-08-06',
label: 'GPT-4o (2024-08-06)',
pubDate: '20240806',
hidden: true, // previous version
description: 'Snapshot that supports Structured Outputs. gpt-4o currently points to this version.',
contextWindow: 128000,
@@ -888,6 +1030,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-05-13',
label: 'GPT-4o (2024-05-13)',
pubDate: '20240513',
hidden: true, // previous version
description: 'Original gpt-4o snapshot from May 13, 2024.',
contextWindow: 128000,
@@ -908,6 +1051,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-search-preview-2025-03-11',
label: 'GPT-4o Search Preview (2025-03-11)',
pubDate: '20250311',
description: 'Latest snapshot of the GPT-4o model optimized for web search capabilities.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -928,6 +1072,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-audio-preview-2025-06-03',
label: 'GPT-4o Audio Preview (2025-06-03)',
pubDate: '20250603',
description: 'Latest snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -940,6 +1085,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-audio-preview-2024-12-17',
label: 'GPT-4o Audio Preview (2024-12-17)',
pubDate: '20241217',
description: 'Snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -958,6 +1104,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-mini-2024-07-18',
label: 'GPT-4o Mini (2024-07-18)',
pubDate: '20240718',
description: 'Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more capable than GPT-3.5 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -974,6 +1121,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // UNSUPPORTED yet (audio output model)
idPrefix: 'gpt-4o-mini-audio-preview-2024-12-17',
label: 'GPT-4o Mini Audio Preview (2024-12-17)',
pubDate: '20241217',
description: 'Snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -992,6 +1140,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-mini-search-preview-2025-03-11',
label: 'GPT-4o Mini Search Preview (2025-03-11)',
pubDate: '20250311',
description: 'Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1011,6 +1160,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-turbo-2024-04-09',
label: 'GPT-4 Turbo (2024-04-09)',
pubDate: '20240409',
hidden: true, // OLD
description: 'GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and function calling. gpt-4-turbo currently points to this version.',
contextWindow: 128000,
@@ -1027,6 +1177,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0125-preview',
label: 'GPT-4 Turbo (0125)',
pubDate: '20240125',
hidden: true, // OLD
description: 'GPT-4 Turbo preview model intended to reduce cases of "laziness" where the model doesn\'t complete a task.',
contextWindow: 128000,
@@ -1038,6 +1189,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-1106-preview', // GPT-4 Turbo preview model
label: 'GPT-4 Turbo (1106)',
pubDate: '20231106',
hidden: true, // OLD
description: 'GPT-4 Turbo preview model featuring improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.',
contextWindow: 128000,
@@ -1057,6 +1209,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0613',
label: 'GPT-4 (0613)',
pubDate: '20230613',
hidden: true, // OLD
description: 'Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.',
contextWindow: 8192,
@@ -1068,6 +1221,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0314',
label: 'GPT-4 (0314)',
pubDate: '20230314',
hidden: true, // OLD
description: 'Snapshot of gpt-4 from March 14th 2023 with function calling data. Data up to Sep 2021.',
contextWindow: 8192,
@@ -1090,6 +1244,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-3.5-turbo-0125',
label: '3.5-Turbo (2024-01-25)',
pubDate: '20240125',
hidden: true, // OLD
description: 'The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.',
contextWindow: 16385,
@@ -1101,6 +1256,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-3.5-turbo-1106',
label: '3.5-Turbo (1106)',
pubDate: '20231106',
hidden: true, // OLD
description: 'GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.',
contextWindow: 16385,
@@ -1220,6 +1376,12 @@ export function openAIInjectVariants(acc: ModelDescriptionSchema[], model: Model
const _manualOrderingIdPrefixes = [
// GPT-5.5
'gpt-5.5-20',
'gpt-5.5-pro-20',
'gpt-5.5-pro',
'gpt-5.5-chat-latest',
'gpt-5.5',
// GPT-5.4
'gpt-5.4-20',
'gpt-5.4-pro-20',
@@ -1419,6 +1581,7 @@ export function llmOrtOaiLookup(orModelName: string): OrtVendorLookupResult | un
// typemap to known models
const ortOaiRefMap: Record<string, string | null> = {
// renames
'gpt-5.5-chat': 'gpt-5.5-2026-04-23', // no chat-latest yet, map to snapshot
'gpt-5.4-chat': 'gpt-5.4-2026-03-05', // no chat-latest yet, map to snapshot
'gpt-5.3-chat': 'gpt-5.3-chat-latest',
'gpt-5.2-chat': 'gpt-5.2-chat-latest',
@@ -1453,5 +1616,5 @@ export function llmOrtOaiLookup(orModelName: string): OrtVendorLookupResult | un
// initialTemperature: not set - OpenAI models use the global fallback (0.5);
// NoTemperature models are handled client-side via LLM_IF_HOTFIX_NoTemperature (not propagated to OR)
return { interfaces, parameterSpecs };
return { interfaces, parameterSpecs, pubDate: entry.pubDate };
}
@@ -12,6 +12,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4.1-2025-04-14',
label: '💾➜ GPT-4.1 (2025-04-14)',
pubDate: '20250414',
description: 'Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -22,6 +23,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4.1-mini-2025-04-14',
label: '💾➜ GPT-4.1 Mini (2025-04-14)',
pubDate: '20250414',
description: 'Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency and cost.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -32,6 +34,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4o-mini-2024-07-18',
label: '💾➜ GPT-4o Mini (2024-07-18)',
pubDate: '20240718',
description: 'Affordable model for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -41,6 +44,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4o-2024-08-06',
label: '💾➜ GPT-4o (2024-08-06)',
pubDate: '20240806',
description: 'Advanced, multimodal flagship model that\'s cheaper and faster than GPT-4 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -51,6 +55,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-3.5-turbo-0125',
label: '💾➜ GPT-3.5 Turbo (0125)',
pubDate: '20240125',
description: 'The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats',
contextWindow: 16385,
maxCompletionTokens: 4096,
@@ -63,6 +68,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gemini-1.0-pro-001',
label: '💾➜ Gemini 1.0 Pro',
pubDate: '20240215',
description: 'Google\'s Gemini 1.0 Pro model',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -70,6 +76,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gemini-1.5-flash-001',
label: '💾➜ Gemini 1.5 Flash',
pubDate: '20240514',
description: 'Google\'s Gemini 1.5 Flash model - fast and efficient',
contextWindow: 1000000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn],
@@ -79,6 +86,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Meta-Llama-3.1-8B-Instruct',
label: '💾 Llama 3.1 · 8B Instruct',
pubDate: '20240723',
description: 'Meta Llama 3.1 8B Instruct - hosted inference with per-token pricing',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -87,6 +95,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Meta-Llama-3.1-70B-Instruct',
label: '💾 Llama 3.1 · 70B Instruct',
pubDate: '20240723',
description: 'Meta Llama 3.1 70B Instruct - hosted inference with per-token pricing',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -95,6 +104,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.1-8B',
label: '💾 Llama 3.1 · 8B Base',
pubDate: '20240723',
description: 'Meta Llama 3.1 8B base model for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -102,6 +112,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.1-70B',
label: '💾 Llama 3.1 · 70B Base',
pubDate: '20240723',
description: 'Meta Llama 3.1 70B base model for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -111,6 +122,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.2-1B-Instruct',
label: '💾 Llama 3.2 · 1B Instruct',
pubDate: '20240925',
description: 'Meta Llama 3.2 1B Instruct - lightweight model for edge and mobile deployment',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -118,6 +130,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.2-3B-Instruct',
label: '💾 Llama 3.2 · 3B Instruct',
pubDate: '20240925',
description: 'Meta Llama 3.2 3B Instruct - efficient model for edge deployment',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -127,6 +140,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.3-70B-Instruct',
label: '💾 Llama 3.3 · 70B Instruct',
pubDate: '20241206',
description: 'Meta Llama 3.3 70B Instruct - latest 70B model with performance comparable to Llama 3.1 405B',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -136,6 +150,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2-VL-7B-Instruct',
label: '💾 Qwen 2 · VL 7B Instruct',
pubDate: '20240830',
description: 'Alibaba Qwen 2 Vision-Language 7B Instruct - multimodal model for text and image understanding',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -145,6 +160,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-1.5B-Instruct',
label: '💾 Qwen 2.5 · 1.5B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 1.5B Instruct - efficient small model',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -152,6 +168,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-7B-Instruct',
label: '💾 Qwen 2.5 · 7B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 7B Instruct - balanced performance and efficiency',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -159,6 +176,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-14B-Instruct',
label: '💾 Qwen 2.5 · 14B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 14B Instruct - hosted inference (hourly compute unit pricing)',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -166,6 +184,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-72B-Instruct',
label: '💾 Qwen 2.5 · 72B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 72B Instruct - flagship model with performance comparable to Llama 3.1 405B',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -173,6 +192,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-Coder-7B-Instruct',
label: '💾 Qwen 2.5 · Coder 7B Instruct',
pubDate: '20241112',
description: 'Alibaba Qwen 2.5 Coder 7B Instruct - specialized for code generation and understanding',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -180,6 +200,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-Coder-32B-Instruct',
label: '💾 Qwen 2.5 · Coder 32B Instruct',
pubDate: '20241112',
description: 'Alibaba Qwen 2.5 Coder 32B Instruct - specialized for code generation and understanding',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -189,6 +210,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen3-8B',
label: '💾 Qwen 3 · 8B Base',
pubDate: '20250429',
description: 'Alibaba Qwen 3 8B base model for fine-tuning - supports thinking and non-thinking modes',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -196,6 +218,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen3-14B',
label: '💾 Qwen 3 · 14B Base',
pubDate: '20250429',
description: 'Alibaba Qwen 3 14B base model for fine-tuning - supports thinking and non-thinking modes',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -205,6 +228,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-1b-it',
label: '💾 Gemma 3 · 1B IT',
pubDate: '20250312',
description: 'Google Gemma 3 1B instruction-tuned - lightweight text-only model',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -212,6 +236,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-4b-it',
label: '💾 Gemma 3 · 4B IT',
pubDate: '20250312',
description: 'Google Gemma 3 4B instruction-tuned - efficient multimodal model with 128K context',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -219,6 +244,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-12b-it',
label: '💾 Gemma 3 · 12B IT',
pubDate: '20250312',
description: 'Google Gemma 3 12B instruction-tuned - balanced multimodal model with 128K context',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -226,6 +252,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-27b-it',
label: '💾 Gemma 3 · 27B IT',
pubDate: '20250312',
description: 'Google Gemma 3 27B instruction-tuned - largest Gemma 3 multimodal model with 128K context',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -235,6 +262,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'mistralai/Mistral-Nemo-Base-2407',
label: '💾 Mistral Nemo · Base',
pubDate: '20240718',
description: 'Mistral Nemo 12B base model (July 2024) for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -242,6 +270,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'mistralai/Mistral-Small-24B-Base-2501',
label: '💾 Mistral Small · 24B Base',
pubDate: '20250130',
description: 'Mistral Small 24B base model (Jan 2025) - competitive with larger models while faster',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -162,8 +162,11 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
// -- Vendor parameter & interface inheritance --
const llmRef = model.id.replace(/^[^/]+\//, '');
let initialTemperature: number | undefined;
let pubDate: string | undefined;
const _mergeLookup = (lookup: OrtVendorLookupResult | undefined) => {
if (lookup?.pubDate !== undefined)
pubDate = lookup.pubDate;
if (lookup?.interfaces)
for (const iface of lookup.interfaces)
if (!interfaces.includes(iface))
@@ -246,7 +249,10 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
// 0-day: xAI/Grok/Moonshot/Z.ai/DeepSeek models get default reasoning effort if not inherited
if (interfaces.includes(LLM_IF_OAI_Reasoning) && !parameterSpecs.some(p => p.paramId === 'llmVndMiscEffort')) {
// console.log('[DEV] openRouterModelToModelDescription: unexpected xAI/Grok/DeepSeek reasoning model:', model.id);
parameterSpecs.push({ paramId: 'llmVndMiscEffort' }); // binary thinking for these vendors
// Binary thinking only: OpenRouter's unified reasoning API currently rejects 'max' (see openai.chatCompletions.ts).
// We pin enumValues here so the shared llmVndMiscEffort registry (which also includes 'max' for native DeepSeek V4)
// does not surface 'max' in the UI for OR-routed models that can't honor it.
parameterSpecs.push({ paramId: 'llmVndMiscEffort', enumValues: ['none', 'high'] });
}
break;
@@ -267,6 +273,7 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
idPrefix: model.id,
// latest: ...
label,
...(pubDate !== undefined && { pubDate }),
description: model.description?.length > 280 ? model.description.slice(0, 277) + '...' : model.description,
contextWindow,
maxCompletionTokens,
@@ -39,6 +39,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-deep-research',
label: 'Sonar Deep Research',
pubDate: '20250214',
description: 'Expert-level research model for exhaustive searches and comprehensive reports. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning],
@@ -59,6 +60,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-reasoning-pro',
label: 'Sonar Reasoning Pro',
pubDate: '20250218',
description: 'Premier reasoning model (DeepSeek R1) with Chain of Thought. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning],
@@ -78,6 +80,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-pro',
label: 'Sonar Pro',
pubDate: '20250121',
description: 'Advanced search model for complex queries and deep content understanding. 200k context.',
contextWindow: 200000,
maxCompletionTokens: 8000,
@@ -96,6 +99,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar',
label: 'Sonar',
pubDate: '20250121',
description: 'Lightweight, cost-effective search model for quick, grounded answers. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -16,7 +16,14 @@ const DEV_DEBUG_XAI_MODELS = (Release.TenantSlug as any) === 'staging' /* ALSO I
// Known xAI Models - Manual Mappings
// List on: https://docs.x.ai/docs/models?cluster=us-east-1
// Verified: 2026-04-16
// Verified: 2026-05-03
// Flat pricing for Grok 4.3 flagship (April 2026)
const PRICE_43 = {
input: 1.25,
output: 2.5,
cache: { cType: 'oai-ac' as const, read: 0.2 },
};
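// Quick cost check (illustrative; assumes $/M-token units, as chatPrice uses elsewhere in
// this registry): 1M input + 100K output on PRICE_43 is roughly 1 x $1.25 + 0.1 x $2.50 = $1.50,
// with cached input re-reads billed at $0.20/M instead of $1.25/M.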
// Flat pricing for Grok 4.20 flagship models
const PRICE_420 = {
@@ -82,11 +89,27 @@ const XAI_PAR_Pre4: ModelDescriptionSchema['parameterSpecs'] = [] as const;
const _knownXAIChatModels: ManualMappings = [
// Grok 4.20 (flagship, March 2026) - note: model IDs use dot (4.20), unlike earlier models
// Grok 4.3 (flagship, April 2026) - always-on reasoning, no reasoning_effort support
{
idPrefix: 'grok-4.3',
label: 'Grok 4.3',
pubDate: '20260417',
description: 'xAI\'s latest flagship model with always-on reasoning and a 1M token context window. Supports text, image, and video inputs with improved agentic performance at lower cost.',
contextWindow: 1000000,
maxCompletionTokens: undefined,
interfaces: [...XAI_IF_Vision, LLM_IF_OAI_Reasoning],
parameterSpecs: XAI_PAR, // no reasoning_effort - always-on reasoning
chatPrice: PRICE_43,
benchmark: { cbaElo: 1456 }, // grok-4.3
},
// Grok 4.20 (flagship, March 2026) - superseded by 4.3
{
hidden: true, // yield to 4.3
idPrefix: 'grok-4.20-0309-reasoning',
label: 'Grok 4.20 Reasoning',
description: 'xAI\'s most advanced flagship reasoning model with a 2M token context window. Deep reasoning and problem-solving capabilities with text and image inputs.',
pubDate: '20260309',
description: 'xAI\'s previous flagship reasoning model with a 2M token context window. Deep reasoning and problem-solving capabilities with text and image inputs.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
interfaces: [...XAI_IF_Vision, LLM_IF_OAI_Reasoning],
@@ -95,9 +118,11 @@ const _knownXAIChatModels: ManualMappings = [
benchmark: { cbaElo: 1480 }, // grok-4.20-beta-0309-reasoning (CBA name)
},
{
hidden: true, // yield to 4.3
idPrefix: 'grok-4.20-0309-non-reasoning',
label: 'Grok 4.20',
description: 'xAI\'s most advanced flagship model with a 2M token context window. Non-reasoning variant for fast, high-quality responses with text and image inputs.',
pubDate: '20260309',
description: 'xAI\'s previous flagship model with a 2M token context window. Non-reasoning variant for fast, high-quality responses with text and image inputs.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
interfaces: XAI_IF_Vision,
@@ -108,6 +133,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4.20-multi-agent-0309',
label: 'Grok 4.20 Multi-Agent',
pubDate: '20260309',
description: 'Multi-agent reasoning model that runs 4 specialized agents in parallel (coordinator, fact-checker, analyst, challenger) for collaborative verification with reduced hallucination.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -125,6 +151,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4-1-fast-reasoning',
label: 'Grok 4.1 Fast Reasoning',
pubDate: '20251119',
description: 'Next generation frontier multimodal model optimized for high-performance agentic tool calling with a 2M token context window. Trained specifically for real-world enterprise use cases with exceptional performance on agentic workflows.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -136,6 +163,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4-1-fast-non-reasoning',
label: 'Grok 4.1 Fast', // 'Grok 4.1 Fast Non-Reasoning'
pubDate: '20251119',
description: 'Next generation frontier multimodal model optimized for high-performance agentic tool calling with a 2M token context window. Non-reasoning variant for instant responses.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -150,6 +178,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.1
idPrefix: 'grok-4-fast-reasoning',
label: 'Grok 4 Fast Reasoning',
pubDate: '20250919',
description: 'Cost-efficient reasoning model with a 2M token context window. Optimized for fast reasoning in agentic workflows. 98% cost reduction vs Grok 4 with comparable performance.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -162,6 +191,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.1
idPrefix: 'grok-4-fast-non-reasoning',
label: 'Grok 4 Fast', // 'Grok 4 Fast Non-Reasoning'
pubDate: '20250919',
description: 'Cost-efficient non-reasoning model with a 2M token context window. Same weights as grok-4-fast-reasoning but constrained by non-reasoning system prompt for quick responses.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -174,6 +204,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.20
idPrefix: 'grok-4-0709',
label: 'Grok 4 (0709)',
pubDate: '20250709',
description: 'xAI\'s most advanced model, offering state-of-the-art reasoning and problem-solving capabilities over a massive 256k context window. Supports text and image inputs.',
contextWindow: 256000,
maxCompletionTokens: undefined,
@@ -187,6 +218,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-3',
label: 'Grok 3',
pubDate: '20250217',
description: 'xAI flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.',
contextWindow: 131072,
maxCompletionTokens: undefined,
@@ -198,6 +230,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-3-mini',
label: 'Grok 3 Mini',
pubDate: '20250217',
description: 'A lightweight model that is fast and smart for logic-based tasks. Supports function calling and structured outputs.',
contextWindow: 131072,
maxCompletionTokens: undefined,
@@ -214,6 +247,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-code-fast-1',
label: 'Grok Code Fast 1',
pubDate: '20250828',
description: 'Specialized reasoning model for agentic coding workflows. Fast, economical, and optimized for code generation, debugging, and software development tasks.',
contextWindow: 256000,
maxCompletionTokens: undefined,
@@ -227,6 +261,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-2-vision-1212',
label: 'Grok 2 Vision (1212)',
pubDate: '20241212',
description: 'xAI model grok-2-vision-1212 with image and text input capabilities. Supports text generation with a 32,768 token context window.',
contextWindow: 32768,
maxCompletionTokens: undefined,
@@ -320,6 +355,7 @@ export async function xaiFetchModelDescriptions(access: OpenAIAccessSchema): Pro
// manual sort order - desired display order
const _xaiIdStartsWithOrder = [
'grok-4.3',
'grok-4.20-0309-reasoning',
'grok-4.20-0309-non-reasoning',
'grok-4.20-multi-agent-0309',
@@ -32,6 +32,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-5',
label: 'GLM-5',
pubDate: '20260211',
description: 'Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode.',
contextWindow: 204800, // 200K
interfaces: _IF_Reasoning,
@@ -43,6 +44,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-5-code',
label: 'GLM-5 Code',
// pubDate: UNCONFIRMED - 'glm-5-code' not in Z.ai pricing table or release-notes; Z.ai's coding plan documents GLM-5.1 / GLM-5-Turbo / GLM-4.7 / GLM-4.5-Air, no 'glm-5-code'
description: 'GLM-5 optimized for coding tasks. Uses the dedicated Coding endpoint. 200K context, thinking mode.',
contextWindow: 204800, // 200K
interfaces: _IF_Reasoning,
@@ -58,6 +60,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7',
label: 'GLM-4.7',
pubDate: '20251222',
description: 'Latest-gen GLM model with 128K context. Thinking mode activated by default.',
contextWindow: 131072, // 128K
interfaces: _IF_Reasoning,
@@ -69,6 +72,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7-flashx',
label: 'GLM-4.7 FlashX', // fast, low cost
pubDate: '20260119',
description: 'Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -80,6 +84,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7-flash',
label: 'GLM-4.7 Flash (Free)',
pubDate: '20260119',
description: 'Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -94,6 +99,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v-flashx',
label: 'GLM-4.6 V FlashX',
pubDate: '20251208',
description: 'Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -106,6 +112,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v-flash',
label: 'GLM-4.6 V Flash (Free)',
pubDate: '20251208',
description: 'Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -117,6 +124,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v',
label: 'GLM-4.6 V',
pubDate: '20251208',
description: 'Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -131,6 +139,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6',
label: 'GLM-4.6',
pubDate: '20250930',
description: 'GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -144,6 +153,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-ocr',
label: 'GLM-OCR (Vision, OCR)',
pubDate: '20260203',
description: 'Specialized OCR model for text extraction from images and documents.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_HOTFIX_NoWebP],
@@ -158,6 +168,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5v',
label: 'GLM-4.5 V',
pubDate: '20250811',
description: 'Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.',
contextWindow: 98304, // 96K
interfaces: _IF_Vision_Reasoning,
@@ -173,6 +184,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-flash',
label: 'GLM-4.5 Flash (Free)',
pubDate: '20250728',
description: 'Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -185,6 +197,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-airx',
label: 'GLM-4.5 AirX',
pubDate: '20250728',
description: 'Extended lightweight GLM-4.5 variant. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -197,6 +210,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-air',
label: 'GLM-4.5 Air',
pubDate: '20250728',
description: 'Lightweight GLM-4.5 variant. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -209,6 +223,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-x',
label: 'GLM-4.5 X',
pubDate: '20250728',
description: 'Extended GLM-4.5 model. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -221,6 +236,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5',
label: 'GLM-4.5',
pubDate: '20250728',
description: 'Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -234,6 +250,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4-32b-0414-128k',
label: 'GLM-4 32B (0414) 128K',
pubDate: '20250414',
description: 'GLM-4 32B model with 128K context, 16K output.',
contextWindow: 131072,
interfaces: _IF_Chat,
@@ -6,4 +6,7 @@ cd "$(dirname "$0")/../../.."
# Run with npx tsx (will download on-demand if needed)
# Uses npx cache, lightweight and no local install required
exec npx -y tsx tools/data/llms/llm-registry-sync.ts "$@"
npx -y tsx tools/data/llms/llm-registry-sync.ts "$@"
# Then dump a fresh JSON snapshot next to the DB.
exec npx -y tsx tools/data/llms/llm-registry-sync.ts --export-db tools/data/llms/llm-registry.json
@@ -41,6 +41,7 @@ interface CliOptions {
discordWebhook?: string;
notifyFilters?: string;
validate?: boolean;
exportDbPath?: string; // --export-db <path>: read-only DB dump (no API calls, no sync)
}
interface StoredModel {
@@ -53,6 +54,7 @@ interface StoredModel {
deleted_at: string | null;
created: number | null;
updated: number | null;
pub_date: string | null;
context_window: number | null;
max_completion_tokens: number | null;
interfaces: string | null;
@@ -90,6 +92,13 @@ function extractSimplePrice(price: any): number | null {
return null;
}
/** Idempotent schema migration: adds a column if it doesn't already exist. Safe to call on every run. */
function ensureColumn(db: DatabaseSync, table: string, column: string, columnDef: string): void {
const cols = db.prepare(`PRAGMA table_info(${table})`).all() as Array<{ name: string }>;
if (!cols.some((c) => c.name === column))
db.exec(`ALTER TABLE ${table} ADD COLUMN ${column} ${columnDef}`);
}
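// Usage (see initDatabase below): ensureColumn(db, 'models', 'pub_date', 'TEXT');
// once the column exists the PRAGMA check short-circuits, so calling this on every
// startup is a safe no-op rather than a failing ALTER TABLE.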
function initDatabase(): DatabaseSync {
const db = new DatabaseSync(DB_PATH);
@@ -105,6 +114,7 @@ function initDatabase(): DatabaseSync {
deleted_at TEXT,
created INTEGER,
updated INTEGER,
pub_date TEXT,
context_window INTEGER,
max_completion_tokens INTEGER,
interfaces TEXT,
@@ -131,6 +141,9 @@ function initDatabase(): DatabaseSync {
)
`);
// Migrations for existing DBs (safe no-ops on fresh DBs that already have the column from CREATE TABLE).
ensureColumn(db, 'models', 'pub_date', 'TEXT');
return db;
}
@@ -157,15 +170,16 @@ function saveChanges(
): void {
if (changes.new.length > 0) {
const stmt = db.prepare(`
-INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated,
+INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated, pub_date,
context_window, max_completion_tokens, interfaces, description,
benchmark_elo, benchmark_mmlu, price_input, price_output, original_json, deleted_at)
-VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
+VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
ON CONFLICT (id, vendor, service) DO UPDATE SET
label = excluded.label,
last_seen = excluded.last_seen,
created = excluded.created,
updated = excluded.updated,
pub_date = excluded.pub_date,
context_window = excluded.context_window,
max_completion_tokens = excluded.max_completion_tokens,
interfaces = excluded.interfaces,
@@ -188,6 +202,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -208,6 +223,7 @@ function saveChanges(
last_seen = ?,
created = ?,
updated = ?,
pub_date = ?,
context_window = ?,
max_completion_tokens = ?,
interfaces = ?,
@@ -229,6 +245,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -247,11 +264,13 @@ function saveChanges(
if (changes.unchanged.length > 0) {
const stmt = db.prepare(`
-INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated,
+INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated, pub_date,
context_window, max_completion_tokens, interfaces, description,
benchmark_elo, benchmark_mmlu, price_input, price_output, original_json, deleted_at)
-VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
-ON CONFLICT (id, vendor, service) DO UPDATE SET last_seen = excluded.last_seen
+VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
+ON CONFLICT (id, vendor, service) DO UPDATE SET
+  last_seen = excluded.last_seen,
+  pub_date = excluded.pub_date
`);
for (const model of changes.unchanged) {
@@ -264,6 +283,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -310,6 +330,114 @@ function saveSyncHistory(
);
}
// ============================================================================
// Snapshot Export
// ============================================================================
interface CatalogModel {
id: string;
vendor: string;
service: string;
label: string;
pubDate: string | null;
firstSeen: string;
lastSeen: string;
deletedAt: string | null;
created: number | null;
updated: number | null;
contextWindow: number | null;
maxCompletionTokens: number | null;
interfaces: string[] | null;
description: string | null;
benchmarkElo: number | null;
priceInput: number | null;
priceOutput: number | null;
}
interface CatalogSnapshot {
schemaVersion: number;
exportedAt: string;
totalCount: number;
activeCount: number;
deletedCount: number;
byVendor: Record<string, number>;
models: CatalogModel[];
}
/** Dump the entire registry (active + soft-deleted) to a JSON file. Read-only on the DB. */
function exportSnapshot(db: DatabaseSync, outPath: string): void {
const rows = db.prepare(`
SELECT id, vendor, service, label, pub_date, first_seen, last_seen, deleted_at,
created, updated, context_window, max_completion_tokens, interfaces, description,
benchmark_elo, price_input, price_output
FROM models
ORDER BY vendor, service, id
`).all() as unknown as Array<StoredModel & { interfaces: string | null }>;
const byVendor: Record<string, number> = {};
let activeCount = 0;
let deletedCount = 0;
const models: CatalogModel[] = rows.map((r) => {
byVendor[r.vendor] = (byVendor[r.vendor] || 0) + 1;
if (r.deleted_at) deletedCount++;
else activeCount++;
let parsedInterfaces: string[] | null = null;
if (r.interfaces) {
try {
const parsed = JSON.parse(r.interfaces);
if (Array.isArray(parsed)) parsedInterfaces = parsed;
} catch {
// leave null on parse failure
}
}
return {
id: r.id,
vendor: r.vendor,
service: r.service,
label: r.label,
pubDate: r.pub_date,
firstSeen: r.first_seen,
lastSeen: r.last_seen,
deletedAt: r.deleted_at,
created: r.created,
updated: r.updated,
contextWindow: r.context_window,
maxCompletionTokens: r.max_completion_tokens,
interfaces: parsedInterfaces,
description: r.description,
benchmarkElo: r.benchmark_elo,
priceInput: r.price_input,
priceOutput: r.price_output,
};
});
const snapshot: CatalogSnapshot = {
schemaVersion: 1,
exportedAt: new Date().toISOString(),
totalCount: rows.length,
activeCount,
deletedCount,
byVendor,
models,
};
// Write atomically: write to temp, then rename. Avoids partial reads if a consumer is watching.
const dir = path.dirname(path.resolve(outPath));
if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
const tmpPath = `${outPath}.tmp`;
fs.writeFileSync(tmpPath, JSON.stringify(snapshot, null, 2));
fs.renameSync(tmpPath, outPath);
console.log(
`${COLORS.green}✓ Exported${COLORS.reset} ${rows.length} models ` +
`(${activeCount} active, ${deletedCount} deleted) ` +
`${COLORS.dim}-> ${path.resolve(outPath)}${COLORS.reset}`,
);
}
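For orientation, a snapshot written by exportSnapshot would look roughly like this (shape from CatalogSnapshot above; all values invented):

    const example: CatalogSnapshot = {
      schemaVersion: 1,
      exportedAt: '2026-05-05T08:33:06.000Z',
      totalCount: 2,
      activeCount: 1,
      deletedCount: 1,
      byVendor: { openai: 1, zai: 1 },
      models: [/* CatalogModel entries, ordered by vendor, service, id */],
    };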
// ============================================================================
// Change Detection
// ============================================================================
@@ -353,6 +481,9 @@ function detectChanges(
existingModel.context_window !== (model.contextWindow ?? null) ||
existingModel.max_completion_tokens !== (model.maxCompletionTokens ?? null) ||
existingModel.interfaces !== modelInterfaces;
// NOTE: pub_date intentionally EXCLUDED from change detection. On first run after upgrade,
// all rows go from NULL -> editorial value, which would fire ~hundreds of spurious "updated"
// notifications. The unchanged-touch path below silently backfills pub_date instead.
if (hasChanged) {
changes.updated.push(model);
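To make the silent backfill concrete, a sketch of the predicate above with invented values — pub_date differs (NULL in the DB vs. a new editorial date) but none of the compared fields do, so the model lands in changes.unchanged and the unchanged-path upsert fills in pub_date without firing a notification:

    const existingModel = { context_window: 98304, max_completion_tokens: 98304, interfaces: '["oai-chat"]' };
    const model = { contextWindow: 98304, maxCompletionTokens: 98304 };
    const modelInterfaces = '["oai-chat"]'; // invented value
    const hasChanged =
      existingModel.context_window !== (model.contextWindow ?? null) ||
      existingModel.max_completion_tokens !== (model.maxCompletionTokens ?? null) ||
      existingModel.interfaces !== modelInterfaces;
    console.log(hasChanged); // false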
@@ -542,6 +673,10 @@ function parseArgs(): CliOptions {
case '--validate':
options.validate = true;
break;
case '--export-db':
options.exportDbPath = nextArg;
i++;
break;
}
}
@@ -566,6 +701,7 @@ ${COLORS.bright}Options:${COLORS.reset}
--posthog-key <key> PostHog API key for analytics
--discord-webhook <url> Discord webhook URL
--notify-filters <list> Comma-separated vendor list (e.g., openai,anthropic)
--export-db <path> Read-only DB dump to JSON (no API calls, no sync). Run separately from sync.
--help Show this help
${COLORS.bright}Examples:${COLORS.reset}
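With the new flag, the snapshot step shown in the wrapper script above amounts to: npx -y tsx tools/data/llms/llm-registry-sync.ts --export-db tools/data/llms/llm-registry.json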
@@ -961,6 +1097,17 @@ async function main() {
try {
const options = parseArgs();
// --export-db: read-only DB dump. No config, no sync, no API calls.
if (options.exportDbPath) {
const db = initDatabase();
try {
exportSnapshot(db, options.exportDbPath);
} finally {
db.close();
}
return;
}
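The try/finally guarantees the handle closes even when the export throws (an unwritable path, say). The same resource pattern as a generic sketch — withDb is illustrative, not in the script:

    function withDb<T>(fn: (db: DatabaseSync) => T): T {
      const db = initDatabase();
      try {
        return fn(db); // finally runs whether fn returns or throws
      } finally {
        db.close();
      }
    }

    // e.g. withDb((db) => exportSnapshot(db, options.exportDbPath!));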
let servicesConfig: Record<string, AixAPI_Access>;
if (options.config) {