mirror of
https://github.com/enricoros/big-AGI.git
synced 2026-05-10 21:50:14 -07:00
@@ -17,6 +17,13 @@ Architecture and system documentation is available in the `/kb/` knowledge base,

#### CSF - Client-Side Fetch

- **[CSF.md](systems/client-side-fetch.md)** - Direct browser-to-API communication for LLM requests

#### LLM - Language Model Metadata

- **[LLM-editorial-control.md](modules/LLM-editorial-pubdate.md)** - Where we have editorial control over per-model metadata vs dynamic discovery; `pubDate` field semantics, propagation chain, resolution rules, per-vendor matrix
- **[LLM-models-catalog-pipeline.md](modules/LLM-models-catalog-pipeline.md)** - Forward-looking pipeline: extraction script, snapshot artifact, website consumption, future schema extensions

#### LLM - Vendor APIs

- **[LLM-gemini-interactions.md](modules/LLM-gemini-interactions.md)** - Gemini Interactions API (Deep Research): endpoints, status taxonomy, two retrieval paths (SSE replay vs JSON GET), known failure modes (10-min cuts, zombies), UI surface

### Systems Documentation

#### Core Platform Systems
@@ -0,0 +1,106 @@

# LLM Editorial Control Surface

This document maps where Big-AGI has editorial control over per-model metadata (and can therefore guarantee fields like `pubDate`, curated `description`, `chatPrice`, `benchmark`, `parameterSpecs`, etc.) versus where it must rely on the vendor API's dynamic discovery (and therefore cannot guarantee them).

For the forward-looking pipeline (extraction script, snapshot, website consumption, future schema extensions), see [LLM-models-catalog-pipeline.md](LLM-models-catalog-pipeline.md).

## The `pubDate` field

`pubDate?: string` (validated as `/^\d{8}$/`, e.g. `'20250929'`) is **optional** in the wire schema and on `DLLM`. It was added to:

- `ModelDescription_schema` in `src/modules/llms/server/llm.server.types.ts` - the canonical wire type
- `OrtVendorLookupResult` in the same file - so OpenRouter inherits it via `llmOrt*Lookup`
- `DLLM` in `src/common/stores/llms/llms.types.ts` - the persisted client model
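A minimal sketch of the wire-format check (the regex comes from the schema description above; the helper name is illustrative, not code from the repo):

```typescript
// Illustrative helper, not the actual schema code: checks a candidate
// pubDate against the /^\d{8}$/ wire format, e.g. '20250929'.
function isValidPubDate(pubDate: string | undefined): pubDate is string {
  return pubDate !== undefined && /^\d{8}$/.test(pubDate);
}
```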
### Where `pubDate` is guaranteed (always emitted)

- **Editorial entries** in 12 hybrid/editorial vendors (282 models). Hand-curated, externally corroborated. Future entries in these arrays are expected to include `pubDate`.
- **Anthropic 0-day placeholder** (`llmsAntCreatePlaceholderModel`): when the API surfaces an Anthropic model not in the editorial list, the placeholder uses the API's `created_at` ISO date, falling back to today via `formatPubDate()`.
- **Gemini 0-day fallback** (`geminiModelToModelDescription`): when the API returns a Gemini model not in `_knownGeminiModels`, the converter falls back to today via `formatPubDate()` (the Gemini API does not expose a creation timestamp).

### Where `pubDate` is omitted (optional)

- **Symlink entries** (`KnownLink`) - inherit the target's `pubDate` via the merge logic in `fromManualMapping`.
- **Unknown variants resolved through `super`/`fallback`** in `fromManualMapping` for non-Anthropic/non-Gemini vendors - the field is left undefined rather than fabricated.
- **Dynamic-only vendors** (OpenRouter, TogetherAI, Novita, ChutesAI, FireworksAI, TLUS, Azure, LM Studio, LocalAI, FastAPI, ArceeAI, LLMAPI) - no editorial knob; `pubDate` flows in only when the underlying lookup or upstream API populates it.

The rationale: today's date is a defensible 0-day proxy only when we know we are seeing a brand-new model the vendor just announced (Anthropic's and Gemini's "discovery via official model list" paths). For arbitrary dynamic vendors, fabricating today's date would mark old, well-known models as new - misleading. Better to omit.
### Propagation chain

- `fromManualMapping()` in `src/modules/llms/server/models.mappings.ts` - copies the field for OAI-style vendors when present
- `geminiModelToModelDescription()` in `src/modules/llms/server/gemini/gemini.models.ts` - copies for Gemini, falls back to today for unknowns
- `llmsAntCreatePlaceholderModel()` in `src/modules/llms/server/anthropic/anthropic.models.ts` - emits from API `created_at` (or today)
- `_mergeLookup()` in `src/modules/llms/server/openai/models/openrouter.models.ts` - merges for OpenRouter cross-vendor inheritance
- `_createDLLMFromModelDescription()` in `src/modules/llms/llm.client.ts` - copies onto the persisted DLLM when present
- `formatPubDate()` helper in `src/modules/llms/server/models.mappings.ts` - shared `'YYYYMMDD'` formatter for the 0-day-fillable paths
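The shared formatter's behavior can be sketched as follows; this is a re-creation from the description above, and the real `formatPubDate()` in `models.mappings.ts` may differ in detail:

```typescript
// Sketch of a 'YYYYMMDD' formatter matching the description above;
// not the repo's actual implementation.
function formatPubDateSketch(date: Date = new Date()): string {
  const y = date.getFullYear().toString();
  const m = String(date.getMonth() + 1).padStart(2, '0');
  const d = String(date.getDate()).padStart(2, '0');
  return `${y}${m}${d}`;
}
```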
### Semantics

`pubDate` is the **earliest public availability** of the model - the date on which the vendor first made this specific model usable by external users via any channel (consumer app, web, console, API, partner, open-weights upload).

It is **not**:

- The date Big-AGI added the entry to its catalog (Ollama uses `added` for that)
- The training-data cutoff (proposed but not implemented; see `src/common/stores/llms/llms.types.next.ts:217`)
- The date the model snapshot was built (suffixes like `-1212` may refer to build dates, but `pubDate` tracks public availability)

### Resolution rules (when sources conflict)

1. **Date-suffixed model IDs**: when the suffix matches a documented announcement, the suffix is canonical (vendor convention). xAI, OpenAI, and Mistral all use suffixes that closely track release dates.
2. **Anthropic exception**: Anthropic's date suffixes are typically the **snapshot/training-cutoff date, not the public release date**. For example, `claude-3-7-sonnet-20250219` was released on 2025-02-24, `claude-opus-4-20250514` on 2025-05-22, and `claude-haiku-4-5-20251001` on 2025-10-15. Always corroborate against Anthropic's blog/press for the actual release date. Only `claude-sonnet-4-5-20250929` and `claude-opus-4-1-20250805` have suffixes that match.
3. **Closed beta -> public beta -> GA**: use the first date *external* users could access the specific variant.
4. **Family-headline IDs and dated snapshots** (e.g., `claude-opus-4-1` and `claude-opus-4-1-20250805`): typically share a release date.
5. **Hosted on a third party** (Groq hosting Llama, OpenPipe mirroring others, OpenRouter aggregating): use the *underlying* model's original release date from its creator, not the date the host added it.
6. **Symlinks** (entries with `symLink:`): inherit the target's date.
7. **Partial dates** (only the month known): use the 1st of the month and tag as MEDIUM confidence in the editor's note.
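Rule 7 is mechanical enough to sketch as a small helper; the name and error handling are hypothetical, not code from the repo:

```typescript
// Hypothetical helper for rule 7: pad a partial 'YYYYMM' date to the 1st of
// the month; a full 'YYYYMMDD' value passes through unchanged.
function padPartialPubDate(partial: string): string {
  if (/^\d{8}$/.test(partial)) return partial;        // already a full date
  if (/^\d{6}$/.test(partial)) return partial + '01'; // only the month is known -> 1st, MEDIUM confidence
  throw new Error(`unrecognized pubDate: ${partial}`);
}
```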
## Editorial control matrix

Three categories:

- **Editorial** - the vendor file contains hand-curated entries; we control descriptions, pricing, benchmarks, interfaces, parameter specs, and `pubDate`.
- **Hybrid** - the API returns the live model list, and editorial entries (keyed by id/idPrefix) merge over the API data via `fromManualMapping`. We control everything except *which models exist*.
- **Dynamic** - the API is the only source of model identity and metadata. Big-AGI cannot reliably populate `pubDate` here (no editorial knob).

| Vendor | Category | File | Array | Entries | `pubDate` populated |
|---|---|---|---|---|---|
| Anthropic | Hybrid | `anthropic/anthropic.models.ts` | `hardcodedAnthropicModels` | 12 | 12/12 HIGH |
| Gemini | Hybrid | `gemini/gemini.models.ts` | `_knownGeminiModels` | 33 | 33/33 HIGH |
| OpenAI | Hybrid | `openai/models/openai.models.ts` | `_knownOpenAIChatModels` | 96 | 95/96 HIGH/MED (`osb-120b` skipped, speculative) |
| xAI | Hybrid | `openai/models/xai.models.ts` | `_knownXAIChatModels` | 13 | 13/13 HIGH (pilot) |
| Mistral | Hybrid | `openai/models/mistral.models.ts` | `_knownMistralModelDetails` | 41 | 41/41 (40 HIGH, 1 MED for legacy `mistral-medium`) |
| Moonshot (Kimi) | Hybrid | `openai/models/moonshot.models.ts` | `_knownMoonshotModels` | 13 | 13/13 (10 HIGH, 3 MED for v1 base models) |
| Perplexity | Editorial | `openai/models/perplexity.models.ts` | `_knownPerplexityChatModels` | 4 | 4/4 HIGH |
| MiniMax | Editorial | `openai/models/minimax.models.ts` | `_knownMiniMaxModels` | 10 | 10/10 HIGH |
| DeepSeek | Hybrid | `openai/models/deepseek.models.ts` | `_knownDeepseekChatModels` | 4 | 4/4 HIGH |
| Groq | Hybrid (host) | `openai/models/groq.models.ts` | `_knownGroqModels` | 11 | 11/11 HIGH (underlying-model date) |
| Z.AI / GLM | Hybrid | `openai/models/zai.models.ts` | `_knownZAIModels` | 17 | 16/17 (`glm-5-code` UNCONFIRMED) |
| OpenPipe | Editorial (mirror) | `openai/models/openpipe.models.ts` | `_knownOpenPipeChatModels` | 30 | 30/30 HIGH (all upstream-mirror, no OpenPipe originals) |
| Bedrock | Reuses Anthropic | `bedrock/bedrock.models.ts` | -> `hardcodedAnthropicModels` | (12) | inherited |
| Ollama | Editorial (catalog) | `ollama/ollama.models.ts` | `OLLAMA_BASE_MODELS` | 209 | **deferred** - see notes |
| Arcee AI | Dynamic | `openai/models/arceeai.models.ts` | `_arceeKnownModels` | 0 | n/a (empty) |
| LLMAPI | Dynamic | `openai/models/llmapi.models.ts` | `_llmapiKnownModels` | 0 | n/a (empty) |
| Alibaba | Dynamic | `openai/models/alibaba.models.ts` | `_knownAlibabaChatModels` | 0 | n/a (empty) |
| OpenRouter | Dynamic + delegated lookup | `openai/models/openrouter.models.ts` | (parser) | -- | inherited via `llmOrt*Lookup` |
| TogetherAI | Dynamic | `openai/models/together.models.ts` | (parser) | -- | no |
| FireworksAI | Dynamic | `openai/models/fireworksai.models.ts` | (parser) | -- | no |
| Novita | Dynamic | `openai/models/novita.models.ts` | (parser) | -- | no |
| ChutesAI | Dynamic | `openai/models/chutesai.models.ts` | (parser) | -- | no |
| TLUS | Dynamic | `openai/models/tlusapi.models.ts` | (parser) | -- | no |
| Azure | Dynamic | `openai/models/azure.models.ts` | (parser) | -- | no |
| LM Studio | Dynamic | `openai/models/lmstudio.models.ts` | (parser) | -- | no |
| LocalAI | Dynamic | `openai/models/localai.models.ts` | (parser) | -- | no |
| FastAPI | Dynamic | `openai/models/fastapi.models.ts` | (parser) | -- | no |
**Totals**: 284 editorial entries across 12 vendors, of which **282** have corroborated `pubDate` and **2** are intentional gaps (`osb-120b` speculative, `glm-5-code` not yet announced). All 12 vendor files type-check clean.

### Notes

- **Hybrid** vendors are still effectively editorial for the models we know about: when an API id matches a hardcoded `idPrefix` (or `id`), `fromManualMapping` injects all the editorial fields. Unknown ids fall through to a default-shaped placeholder where `pubDate` is undefined.
- **OpenRouter** delegates back to the Anthropic / Gemini / OpenAI editorial lookups via `llmOrtAntLookup_ThinkingVariants`, `llmOrtGemLookup`, and `llmOrtOaiLookup`. `pubDate` flows through these lookups, so OpenRouter-served Claude/Gemini/GPT models get `pubDate` automatically once the underlying editorial entry has it.
- **Bedrock** finds the Anthropic editorial entry via `llmBedrockFindAnthropicModel` and strips unsupported interfaces - `pubDate` inherits from Anthropic.
- **Ollama** is deferred: 209 entries keyed by upstream model family (e.g. `qwen3.6`, `kimi-k2`, `glm-4.6`). Each entry's `pubDate` would need to be the upstream creator's release date (Meta, Alibaba, Moonshot, Z.AI, etc.). This is large-scale upstream research, better handled in a follow-up pass once cross-vendor `pubDate` data is consolidated and reusable.
- **Dynamic-only** vendors get nothing automatic. To add `pubDate` for them we would have to seed editorial entries (which is what `fromManualMapping`'s mapping mechanism was built for); this is a per-vendor decision and out of scope for the initial rollout.
@@ -0,0 +1,88 @@

# Gemini Interactions API

The Interactions API powers Gemini's agent runs (Deep Research today, more agent types planned). This doc is the source of truth for protocol shape, failure modes, and the recovery model — code comments link here instead of repeating the rationale.

## References

- **GH [#1088](https://github.com/enricoros/big-AGI/issues/1088)** — Auto-resume for Deep Research; Recover button
- **GH [#1095](https://github.com/enricoros/big-AGI/issues/1095)** — Visualizations toggle (`agent_config.visualization`)
- **Google forum [143098](https://discuss.ai.google.dev/t/interactions-api-connection-breaks-at-the-10-minutes-mark/143098)** — 10-min SSE cut
- **Google forum [143099](https://discuss.ai.google.dev/t/streaming-resume-broken-on-interactions-api-deep-research-often-cannot-resume/143099)** — Streaming resume re-cuts
- **Upstream specs** — `_upstream/gemini.interactions.spec.md`, `gemini.interactions.guide.md`, `gemini.deep-research.guide.md`
## Endpoints

| Verb | Path | Purpose |
|--------|-------------------------------------------|------------------------------------------------------------------------|
| POST | `/v1beta/interactions` | Start a run. We always send `stream:true, background:true, store:true` |
| GET | `/v1beta/interactions/{id}?stream=true` | Reattach via SSE replay (full event sequence from start) |
| GET | `/v1beta/interactions/{id}` | Fetch the resource as JSON (one-shot) |
| POST | `/v1beta/interactions/{id}/cancel` | Stop a background run |
| DELETE | `/v1beta/interactions/{id}` | Remove the stored record (does NOT cancel an in-flight run) |

Retention: 1 day free, 55 days paid.
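The two GET shapes differ only in the `stream` query parameter; a minimal URL builder, purely for illustration (the production client may construct requests differently, and the base URL is an assumption):

```typescript
// Illustrative only: build the reattach URL for an interaction id,
// following the paths in the endpoints table above.
function interactionUrl(base: string, id: string, stream: boolean): string {
  return `${base}/v1beta/interactions/${encodeURIComponent(id)}${stream ? '?stream=true' : ''}`;
}
```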
## Status taxonomy

| Status | Meaning | Handling |
|-------------------|-----------------------------------------------|--------------------------------------------------------|
| `in_progress` | Live run **or** zombie (see C) | Surface diagnostics; offer Resume/Recover/Stop |
| `completed` | Done with content in `outputs[]` | Emit fragments, `tokenStopReason='ok'` |
| `failed` | Server-side failure | Terminating issue |
| `cancelled` | We or another client cancelled | Close as `cg-issue` |
| `incomplete` | Stopped early (token limit) — partial outputs | Note + `tokenStopReason='out-of-tokens'` |
| `requires_action` | Not expected for Deep Research | Fail loudly so we notice |
## Two retrieval paths

| Path | Endpoint | Parser | Use case |
|---------------------|---------------------|-------------------------------------|-------------------------------|
| SSE replay | `GET ?stream=true` | `createGeminiInteractionsParserSSE` | Canonical resume; live deltas |
| JSON GET (recovery) | `GET` (no `stream`) | `createGeminiInteractionsParserNS` | Recover when SSE is broken |

Both replay from the start — `ContentReassembler` REPLACES content on reattach, so partial replay (`last_event_id`) is intentionally NOT used. The NS parser walks `outputs[]` (thoughts, text, images, audio) and emits the same particles the SSE parser would, in one batch.
## Failure modes

### A. 10-minute SSE cut (forum 143098)

The SSE connection gets cut at exactly 600 s, regardless of activity. The cut is malformed (a JSON error array instead of a clean SSE close), and we treat it as stream-closed-early. The run typically **continues** server-side and reaches `completed`. **Recover (JSON GET)** retrieves the full report.

### B. Streaming resume re-cuts (forum 143099)

A fresh SSE replay can re-cut at the same 10-minute boundary on long runs, so Resume alone never reaches `interaction.complete`. **Recover** is the fallback.

### C. Zombie interactions (#1088)

The resource sits in `status: in_progress` for **days** with `outputs: []` — the generator crashed but the status never transitioned. **Not recoverable** (no data was ever produced). The NS parser surfaces `created`, `updated`, the output count, and a "stuck for over an hour" hint so the user can decide to delete and retry.

### D. Connection drop mid-run

Network blip; the resource is fine. **Resume (SSE replay)** picks up cleanly.
## UI

`BlockOpUpstreamResume` renders up to three buttons:

| Button | Action | Shown when |
|---------|-----------------------------------|--------------------------------------------|
| Resume | SSE replay | `onResume` provided |
| Recover | JSON GET (one-shot) | `upstreamHandle.uht` ∈ `_NS_RECOVER_UHTS` |
| Stop | Cancel + delete upstream resource | `onDelete` provided |

The Recover gate is an inline `uht === 'vnd.gem.interactions'` check in `BlockOpUpstreamResume.tsx` — extend it when another vendor needs the same fallback. Stop is intentionally NOT gated by the Resume/Recover busy state — it's the escape hatch for hung resumes.
## Visualization control (#1095)

Deep Research accepts `agent_config.visualization: 'auto' | 'off'`. Exposed as `llmVndGeminiAgentViz` (label "Visualizations"). Forwarded only when explicitly `'off'`, so the upstream `'auto'` default stays untouched. Useful when merging multiple reports — image fragments break Beam fusion.
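The forward-only-when-`'off'` rule amounts to a one-line conditional. A sketch, where the field name follows #1095 but the helper itself is hypothetical:

```typescript
// Hypothetical sketch: include agent_config.visualization only when the user
// explicitly chose 'off', leaving the upstream 'auto' default untouched.
function buildAgentConfig(viz?: 'auto' | 'off'): { visualization?: 'off' } {
  return viz === 'off' ? { visualization: 'off' } : {};
}
```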
## Code map

| File | Role |
|-----------------------------------------------------------------------------|------------------------------------------------------------------|
| `aix/server/dispatch/wiretypes/gemini.interactions.wiretypes.ts` | Zod schemas (RequestBody, Interaction, StreamEvent) |
| `aix/server/dispatch/chatGenerate/adapters/gemini.interactionsCreate.ts` | POST body (input + agent_config) |
| `aix/server/dispatch/chatGenerate/parsers/gemini.interactions.parser.ts` | SSE parser + NS parser |
| `aix/server/dispatch/chatGenerate/chatGenerate.dispatch.ts` (`gemini` case) | Resume dispatch: SSE vs JSON branch |
| `apps/chat/components/message/BlockOpUpstreamResume.tsx` | Resume / Recover / Stop UI |
| `apps/chat/components/ChatMessageList.tsx` (`handleMessageUpstreamResume`) | Wires the click handler to `aixReattachContent_DMessage_orThrow` |
@@ -0,0 +1,78 @@

# LLM Models Catalog Pipeline (forward-looking)

Status: **proposal / partially implemented**. Companion to [LLM-editorial-control.md](LLM-editorial-pubdate.md), which describes the durable reference (`pubDate` semantics, editorial-vs-dynamic matrix, propagation chain).

This document captures the forward-looking pipeline that turns Big-AGI's editorial model metadata into website value-add (plots, decision helpers, comparison tools at big-agi.com).

## Goal

Stand up a database/datastore that the website (`~/dev/website`) can query for plots, decision helpers, and comparison tools - without requiring the website to call our authenticated tRPC endpoints.
## Stages

### Stage 1: source of truth (in this repo) — DONE

Editorial files in `src/modules/llms/server/` remain the canonical source for:

- Identity: id, label, vendor
- Capabilities: `interfaces`, `parameterSpecs`, `contextWindow`, `maxCompletionTokens`
- Pricing: `chatPrice` (input / output / cache tiers)
- Benchmarks: `benchmark.cbaElo` (Chat Bot Arena ELO)
- Lifecycle: `pubDate`, `isLegacy`, `isPreview`, `hidden`, deprecation comments

Well-typed, version-controlled, reviewed - every model edit is a code change with diff history. 282 entries currently carry `pubDate` (see the editorial-control matrix).
### Stage 2: extraction script — IN PROGRESS

A build-time script (e.g. `scripts/llms/export-models.ts`) that:

1. Loads every editorial vendor's model array.
2. Normalizes per-vendor shapes (array vs Record, `id` vs `idPrefix`, `KnownLink` symlinks) to a single row format.
3. Resolves symlinks (the target's `pubDate` flows through).
4. Writes a single JSON snapshot: `data/models-catalog.json` (one row per model, with vendor + the editorial fields above).

Open question: do we want this committed (gives the website a stable artifact / public URL) or built on-demand in CI? **Recommend a committed snapshot** under `data/` so consumers get a stable URL.
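Step 3 (symlink resolution) can be sketched as follows; the row type and names are assumptions for illustration, not the script's actual code:

```typescript
// Illustrative sketch of stage-2 symlink resolution: a row with symLink
// inherits the target's pubDate when it lacks its own.
interface CatalogRow { id: string; symLink?: string; pubDate?: string; }

function resolveRowPubDate(row: CatalogRow, byId: Map<string, CatalogRow>): string | undefined {
  if (row.pubDate) return row.pubDate;                  // own date wins
  if (row.symLink) return byId.get(row.symLink)?.pubDate; // inherit from the target
  return undefined;                                     // never fabricated
}
```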
### Stage 3: enrichment — NOT STARTED

The exported snapshot gets enriched with data we don't currently track in editorial files:

- **Knowledge cutoff** (proposed in `llms.types.next.ts:217` but never implemented; should be added to `ModelDescription_schema` as a follow-up).
- **MMLU / HumanEval / SWE-bench / GPQA / MATH** scores (currently only `cbaElo`; richer benchmarks belong in a separate block).
- **Throughput / latency** numbers (per-vendor, possibly per-region).
- **Modalities matrix** (input image, input audio, input video, input PDF, output image, output audio).
- **Weights availability** (closed / open / restricted), license.

Sources for enrichment: HuggingFace cards, vendor docs, Artificial Analysis, LLM-Stats, official benchmarks. Some can be scraped on a cadence; some need editorial review.
### Stage 4: website consumption — NOT STARTED

The website (`~/dev/website`) consumes the snapshot to render:

- **Timeline plot**: `pubDate` (x-axis) vs `cbaElo` (y-axis), grouped by vendor - shows the frontier and rate of progress.
- **Cost-per-quality plot**: `chatPrice.output` vs `cbaElo` - "best model per dollar".
- **Decision helpers**: filter by capability (`interfaces`), context window, pricing tier, vendor.
- **Comparison cards**: side-by-side specs.
- **Lifecycle alerts**: deprecation warnings for retiring models.
## Open questions

1. **Where does enrichment data live?** A separate `data/models-enrichment.json` (joined by id at build time) keeps editorial files clean but introduces a join surface. Alternative: extend `ModelDescription_schema` with optional enrichment fields and treat editorial files as the only source. Recommend the separate-file approach - editorial files stay focused on vendor-API integration; enrichment evolves on a different cadence.
2. **How fresh does the website need to be?** If daily, build the snapshot in CI on push and publish to a static URL. If real-time, consume tRPC directly - more work but fewer freshness gaps.
3. **Do we expose `pubDate` and other editorial metadata via tRPC publicly, or only via the snapshot?** The current tRPC routes require auth; the website should consume the snapshot, not live tRPC.
4. **Schema versioning** - if `ModelDescription_schema` evolves, snapshot consumers need to be tolerant. Include a `schemaVersion` field in the snapshot envelope.
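A versioned envelope for the snapshot (question 4) might look like this; the exact shape is a proposal, not an implemented format:

```typescript
// Proposed (not implemented) snapshot envelope: a schemaVersion field lets
// website consumers tolerate schema evolution.
interface CatalogSnapshotEnvelope {
  schemaVersion: number;
  generatedAt: string; // ISO timestamp of the export run
  models: { id: string; vendor: string; pubDate?: string }[];
}

const exampleSnapshot: CatalogSnapshotEnvelope = {
  schemaVersion: 1,
  generatedAt: '2025-01-01T00:00:00Z',
  models: [{ id: 'example-model', vendor: 'example-vendor', pubDate: '20250929' }],
};
```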
## Future extensions to `ModelDescription_schema`

Beyond `pubDate`, the natural follow-ups (in priority order):

1. **`knowledgeCutoff?: string`** (`'YYYY-MM'` or `'YYYY-MM-DD'`) - already proposed in `llms.types.next.ts`. Useful for the timeline plot and for context-aware prompts.
2. **`deprecationDate?: string`** - currently exists informally as `deprecated?: string` on `_knownGeminiModels`; should be promoted to the schema.
3. **`license?: string`** - especially important for open-weights models (apache-2.0, mit, llama-community, custom).
4. **`weights?: 'closed' | 'open' | 'restricted'`** - a quick filter for "can I run this myself?".
5. **`benchmarks?: { mmlu?: number, humaneval?: number, gpqa?: number, ... }`** - richer than the current `cbaElo`-only block.
6. **`modalities?: { in: string[], out: string[] }`** - more precise than `interfaces` for input/output capability matrices.
@@ -6,6 +6,7 @@ import { Box, List } from '@mui/joy';

import type { SystemPurposeExample } from '../../../data';

import type { AixReattachMode } from '~/modules/aix/client/aix.client';
import type { DiagramConfig } from '~/modules/aifn/digrams/DiagramsModal';
import { speakText } from '~/modules/speex/speex.client';
@@ -123,7 +124,16 @@ export function ChatMessageList(props: {
    }
  }, [conversationHandler, conversationId, onConversationExecuteHistory]);

  const handleMessageUpstreamResume = React.useCallback(async (generator: DMessageGenerator, messageId: DMessageId) => {

  // Resume in-flight tracking - lives at this level (NOT inside BlockOpUpstreamResume) so it
  // survives any remount of the message bubble during a long-running stream (e.g. Deep Research).
  // - `resumeInFlight` (state) drives the loading/Detach UI on BlockOpUpstreamResume via props.
  // - `resumeAbortersRef` (ref) holds the AbortController so Detach can abort even after a remount.
  // Map keyed by messageId so multiple messages could in principle resume concurrently.
  const [resumeInFlight, setResumeInFlight] = React.useState<Record<DMessageId, AixReattachMode>>({});
  const resumeAbortersRef = React.useRef<Map<DMessageId, AbortController>>(new Map());

  const handleMessageUpstreamResume = React.useCallback(async (generator: DMessageGenerator, messageId: DMessageId, mode: AixReattachMode) => {
    if (!conversationId || !conversationHandler) return;
    if (!generator.upstreamHandle) throw new Error('No upstream handle on generator');
@@ -131,20 +141,36 @@ export function ChatMessageList(props: {
    const llmId = generator.mgt === 'aix' ? generator.aix.mId : undefined;
    if (!llmId) throw new Error('No model id on generator');

    const controller = new AbortController();
    resumeAbortersRef.current.set(messageId, controller);
    setResumeInFlight(prev => ({ ...prev, [messageId]: mode }));

    const { aixCreateChatGenerateContext, aixReattachContent_DMessage_orThrow } = await import('~/modules/aix/client/aix.client');
    const result = await aixReattachContent_DMessage_orThrow(
      llmId,
      generator,
      aixCreateChatGenerateContext('conversation', conversationId),
      { abortSignal: 'NON_ABORTABLE', throttleParallelThreads: 0 },
      async (update, isDone) => {
        conversationHandler.messageEdit(messageId, {
          fragments: update.fragments,
          generator: update.generator,
          pendingIncomplete: update.pendingIncomplete,
        }, isDone, isDone); // remove the pending state and update only when done
      },
    );
    try {
      await aixReattachContent_DMessage_orThrow(
        llmId,
        generator,
        aixCreateChatGenerateContext('conversation', conversationId),
        mode,
        { abortSignal: controller.signal, throttleParallelThreads: 0 }, // Detach: aborting kills the local fetch; the upstream run keeps going.
        async (update, isDone) => {
          conversationHandler.messageEdit(messageId, {
            fragments: update.fragments,
            generator: update.generator,
            pendingIncomplete: update.pendingIncomplete,
          }, isDone, isDone); // remove the pending state and update only when done
        },
      );
    } finally {
      // Clear local tracking only if this attempt is still the current one (avoid races on rapid retry)
      if (resumeAbortersRef.current.get(messageId) === controller)
        resumeAbortersRef.current.delete(messageId);
      setResumeInFlight(prev => {
        if (prev[messageId] !== mode) return prev;
        const { [messageId]: _, ...rest } = prev;
        return rest;
      });
    }

    // Manual reattach is one-shot: on failure (e.g. an upstream 404 from an expired or already-consumed handle),
    // drop the upstreamHandle so the Resume button doesn't keep luring the user into the same error.
@@ -156,6 +182,11 @@ export function ChatMessageList(props: {
    // }, false /* messageComplete */, true /* touch */);
  }, [conversationHandler, conversationId]);

  const handleMessageUpstreamDetach = React.useCallback((messageId: DMessageId) => {
    resumeAbortersRef.current.get(messageId)?.abort();
  }, []);

  const handleMessageUpstreamDelete = React.useCallback(async (generator: DMessageGenerator, messageId: DMessageId) => {
    if (!conversationId || !conversationHandler) return;
    if (!generator.upstreamHandle) throw new Error('No upstream handle on generator');
@@ -395,7 +426,11 @@ export function ChatMessageList(props: {

      {filteredMessages.map((message, idx) => {

        // Optimization: only memo complete components, or we'd be memoizing garbage
        // Optimization: only memo complete components, or we'd be memoizing garbage (fragments
        // change every chunk during streaming, so the equality check would always fail).
        // CAVEAT: switching between memo and non-memo at the same position causes React to
        // remount the subtree (different component types). Any state that must survive that
        // boundary lives on this component (e.g. resumeInFlight, resumeAbortersRef).
        const ChatMessageMemoOrNot = !message.pendingIncomplete ? ChatMessageMemo : ChatMessage;

        return props.isMessageSelectionMode ? (
@@ -427,7 +462,9 @@ export function ChatMessageList(props: {
            onMessageBranch={handleMessageBranch}
            onMessageContinue={handleMessageContinue}
            onMessageUpstreamResume={handleMessageUpstreamResume}
            onMessageUpstreamDetach={handleMessageUpstreamDetach}
            onMessageUpstreamDelete={handleMessageUpstreamDelete}
            upstreamResumeMode={resumeInFlight[message.id]}
            onMessageDelete={handleMessageDelete}
            onMessageFragmentAppend={handleMessageAppendFragment}
            onMessageFragmentDelete={handleMessageDeleteFragment}
@@ -2,9 +2,13 @@ import * as React from 'react';
import TimeAgo from 'react-timeago';

import { Box, Button, ButtonGroup, Tooltip, Typography } from '@mui/joy';
import DownloadIcon from '@mui/icons-material/Download';
import LinkOffRoundedIcon from '@mui/icons-material/LinkOffRounded';
import PlayArrowRoundedIcon from '@mui/icons-material/PlayArrowRounded';
import StopRoundedIcon from '@mui/icons-material/StopRounded';

import type { AixReattachMode } from '~/modules/aix/client/aix.client';

import type { DMessageGenerator } from '~/common/stores/chat/chat.message';
@@ -12,54 +16,65 @@ const ARM_TIMEOUT_MS = 4000;
|
||||
|
||||
|
||||
/**
|
||||
* FIXME: COMPLETE THIS
|
||||
* Resume controls for an upstream-stored run.
|
||||
* - Resume: SSE replay (live deltas) - canonical path. Always offered when onResume exists.
|
||||
* - Recover: one-shot JSON GET - shown only for vendors that benefit from it (Gemini Interactions).
|
||||
* - Detach: abort the local fetch but leave the upstream run alive. Visible only when a resume
|
||||
* is in-flight (`inFlightMode != null`). Resume/Recover stay available afterwards.
|
||||
* - Stop: terminate the upstream run + delete the resource.
|
||||
*
|
||||
* IMPORTANT: in-flight state is owned by the parent (`inFlightMode` + `onDetach`) so it survives
|
||||
* remounts that happen while a long-running stream is active (e.g. Deep Research).
|
||||
*/
|
||||
export function BlockOpUpstreamResume(props: {
|
||||
upstreamHandle: Exclude<DMessageGenerator['upstreamHandle'], undefined>,
|
||||
pending?: boolean; // true while the message is actively streaming; labels the Delete button as "Stop"
|
||||
onResume?: () => void | Promise<void>;
|
||||
onCancel?: () => void | Promise<void>;
|
||||
pending?: boolean; // true iff a local in-flight op (initial POST or resume); drives the state machine + hides the expiry footer
|
||||
inFlightMode?: AixReattachMode; // set by the parent while a resume is in flight; drives the loading/Detach UI
|
||||
onResume?: (mode: AixReattachMode) => void | Promise<void>;
|
||||
onDetach?: () => void;
|
||||
onDelete?: () => void | Promise<void>;
|
||||
}) {
|
||||

// state
const [isResuming, setIsResuming] = React.useState(false);
const [isCancelling, setIsCancelling] = React.useState(false);
// local state - only for short-lived ops the parent doesn't own
const [isDeleting, setIsDeleting] = React.useState(false);
const [deleteArmed, setDeleteArmed] = React.useState(false);
const [error, setError] = React.useState<string | null>(null);

// expiration: boolean is evaluated at render (may lag briefly if nothing re-renders past expiry).
// TimeAgo handles its own tick for the label; the button's disabled state is the only consumer of this flag.
const { expiresAt /*, runId = ''*/ } = props.upstreamHandle;
// const isExpired = expiresAt != null && Date.now() > expiresAt;

// State machine - mutually exclusive states (idle | initial-POST | resume | recover):
// - Idle : !pending - run not active locally (incl. post-reload, since
// chats.converters.ts clears pendingIncomplete on hydrate).
// - Initial POST : pending && !inFlightMode - first generation streaming.
// - Resume replay : pending && mode='replay' - we own this resume cycle.
// - Recover snap : pending && mode='snapshot' - we own this snapshot fetch.
//
// Visibility matrix (see BlockOpUpstreamResume props doc):
// Resume Recover Detach Cancel
// Idle ✅ ✅¹ — ✅
// Initial POST — — — ✅
// Resume in flight — — ✅ ✅
// Recover in flight — ✅² — —
// ¹ only for Gemini Interactions ² with loading spinner
const isReplaying = props.inFlightMode === 'replay';
const isSnapshotting = props.inFlightMode === 'snapshot';
const isIdle = !props.pending;

const canRecoverVendor = props.upstreamHandle.uht === 'vnd.gem.interactions';
const showResume = isIdle && !!props.onResume;
const showRecover = (isIdle || isSnapshotting) && !!props.onResume && canRecoverVendor;
const showDetach = isReplaying && !!props.onDetach;
const showCancel = !isSnapshotting && !!props.onDelete;
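The visibility matrix reduces to four derived booleans over three inputs. A standalone sketch for illustration (the `deriveVisibility` helper and its option names are hypothetical; the logic mirrors the component's `show*` flags):

```typescript
type ReattachMode = 'replay' | 'snapshot';

// Hypothetical extraction of the component's visibility derivation.
function deriveVisibility(opts: {
  pending: boolean;              // a local op (initial POST or resume) is in flight
  inFlightMode?: ReattachMode;   // set by the parent while a resume/recover is in flight
  uht: string;                   // upstream handle type
  hasResume: boolean;            // onResume provided
  hasDetach: boolean;            // onDetach provided
  hasDelete: boolean;            // onDelete provided
}) {
  const isReplaying = opts.inFlightMode === 'replay';
  const isSnapshotting = opts.inFlightMode === 'snapshot';
  const isIdle = !opts.pending;
  const canRecoverVendor = opts.uht === 'vnd.gem.interactions';
  return {
    showResume: isIdle && opts.hasResume,
    showRecover: (isIdle || isSnapshotting) && opts.hasResume && canRecoverVendor,
    showDetach: isReplaying && opts.hasDetach,
    showCancel: !isSnapshotting && opts.hasDelete,
  };
}
```

Note the asymmetry: Recover stays visible while its own snapshot fetch is in flight (to show the spinner), whereas Resume hides in every non-idle state.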

// handlers

const handleResume = React.useCallback(async () => {
const handleResume = React.useCallback((mode: AixReattachMode) => {
if (!props.onResume) return;
setError(null);
setIsResuming(true);
try {
await props.onResume();
} catch (err: any) {
setError(err?.message || 'Resume failed');
} finally {
setIsResuming(false);
}
}, [props]);

const handleCancel = React.useCallback(async () => {
if (!props.onCancel) return;
setError(null);
setIsCancelling(true);
try {
await props.onCancel();
} catch (err: any) {
setError(err?.message || 'Cancel failed');
} finally {
setIsCancelling(false);
}
// fire-and-forget: parent owns the promise lifecycle and the abort controller.
// If it rejects, the parent surfaces the error via its own UI; we stay silent.
Promise.resolve(props.onResume(mode)).catch(() => { /* parent handles */ });
}, [props]);

// Two-click arm: first click arms (visible red "Confirm?"), second click (within ARM_TIMEOUT_MS) executes.
@@ -88,7 +103,6 @@ export function BlockOpUpstreamResume(props: {
return () => clearTimeout(t);
}, [deleteArmed]);


return (
<Box
sx={{
@@ -100,43 +114,55 @@ export function BlockOpUpstreamResume(props: {
}}
>
<ButtonGroup>
{props.onResume && (
<Tooltip title='Resume generation from last checkpoint'>
{showResume && (
<Tooltip title='Resume by re-streaming from the upstream run'>
<Button
disabled={isResuming || isCancelling || isDeleting}
loading={isResuming}
disabled={isDeleting}
startDecorator={<PlayArrowRoundedIcon color='success' />}
onClick={handleResume}
onClick={() => handleResume('replay')}
>
Resume
</Button>
</Tooltip>
)}

{props.onCancel && (
<Tooltip title='Cancel the response generation'>
{showRecover && (
<Tooltip title='Fetch the result without streaming - recovers stuck or hung runs'>
<Button
disabled={isResuming || isCancelling || isDeleting}
loading={isCancelling}
// startDecorator={<CancelIcon />}
onClick={handleCancel}
disabled={isDeleting}
loading={isSnapshotting}
loadingPosition='start'
startDecorator={<DownloadIcon />}
onClick={() => handleResume('snapshot')}
>
Cancel
Recover
</Button>
</Tooltip>
)}

{props.onDelete && (
<Tooltip title={deleteArmed ? 'Click again to confirm - cancels the run upstream (no resume after)' : (props.pending ? 'Stop this response and cancel the upstream run' : 'Cancel the upstream run')}>
{showDetach && (
<Tooltip title='Close this connection only - the upstream run keeps going. Click Resume or Recover later to fetch results.'>
<Button
disabled={isDeleting}
startDecorator={<LinkOffRoundedIcon />}
onClick={props.onDetach}
>
Detach
</Button>
</Tooltip>
)}

{showCancel && (
<Tooltip title={deleteArmed ? 'Click again to confirm - cancels the upstream run and clears the handle' : 'Cancel the upstream run'}>
<Button
loading={isDeleting}
color={deleteArmed ? 'danger' : 'neutral'}
variant={deleteArmed ? 'solid' : 'outlined'}
startDecorator={<StopRoundedIcon />}
onClick={handleDelete}
disabled={isCancelling || isDeleting}
disabled={isDeleting}
>
{deleteArmed ? 'Confirm?' : (props.pending ? 'Stop' : 'Cancel')}
{deleteArmed ? 'Confirm?' : 'Cancel'}
</Button>
</Tooltip>
)}

@@ -29,6 +29,7 @@ import VerticalAlignBottomIcon from '@mui/icons-material/VerticalAlignBottom';
import VisibilityIcon from '@mui/icons-material/Visibility';
import VisibilityOffIcon from '@mui/icons-material/VisibilityOff';

import type { AixReattachMode } from '~/modules/aix/client/aix.client';
import { ModelVendorAnthropic } from '~/modules/llms/vendors/anthropic/anthropic.vendor';

import { AnthropicIcon } from '~/common/components/icons/vendors/AnthropicIcon';
@@ -161,8 +162,10 @@ export function ChatMessage(props: {
onMessageBeam?: (messageId: string) => Promise<void>,
onMessageBranch?: (messageId: string) => void,
onMessageContinue?: (messageId: string, continueText: null | string) => void,
onMessageUpstreamResume?: (generator: DMessageGenerator, messageId: string) => Promise<void>,
onMessageUpstreamResume?: (generator: DMessageGenerator, messageId: string, mode: AixReattachMode) => Promise<void>,
onMessageUpstreamDetach?: (messageId: string) => void,
onMessageUpstreamDelete?: (generator: DMessageGenerator, messageId: string) => Promise<void>,
upstreamResumeMode?: AixReattachMode, // set by parent while a resume is in flight on this message
onMessageDelete?: (messageId: string) => void,
onMessageFragmentAppend?: (messageId: DMessageId, fragment: DMessageFragment) => void
onMessageFragmentDelete?: (messageId: DMessageId, fragmentId: DMessageFragmentId) => void,
@@ -247,7 +250,7 @@ export function ChatMessage(props: {
// const wordsDiff = useWordsDifference(textSubject, props.diffPreviousText, showDiff);

const { onMessageAssistantFrom, onMessageDelete, onMessageFragmentAppend, onMessageFragmentDelete, onMessageFragmentReplace, onMessageContinue, onMessageUpstreamResume, onMessageUpstreamDelete } = props;
const { onMessageAssistantFrom, onMessageDelete, onMessageFragmentAppend, onMessageFragmentDelete, onMessageFragmentReplace, onMessageContinue, onMessageUpstreamResume, onMessageUpstreamDetach, onMessageUpstreamDelete } = props;

const handleFragmentNew = React.useCallback(() => {
onMessageFragmentAppend?.(messageId, createTextContentFragment(''));
@@ -265,11 +268,15 @@ export function ChatMessage(props: {
onMessageContinue?.(messageId, continueText);
}, [messageId, onMessageContinue]);

const handleUpstreamResume = React.useCallback(() => {
const handleUpstreamResume = React.useCallback((mode: AixReattachMode) => {
if (!messageGenerator) return;
return onMessageUpstreamResume?.(messageGenerator, messageId);
return onMessageUpstreamResume?.(messageGenerator, messageId, mode);
}, [messageGenerator, messageId, onMessageUpstreamResume]);

const handleUpstreamDetach = React.useCallback(() => {
onMessageUpstreamDetach?.(messageId);
}, [messageId, onMessageUpstreamDetach]);

const handleUpstreamDelete = React.useCallback(() => {
if (!messageGenerator) return;
return onMessageUpstreamDelete?.(messageGenerator, messageId);
@@ -903,7 +910,9 @@ export function ChatMessage(props: {
<BlockOpUpstreamResume
upstreamHandle={messageGenerator.upstreamHandle}
pending={messagePendingIncomplete}
onResume={(!messagePendingIncomplete && onMessageUpstreamResume) ? handleUpstreamResume : undefined}
inFlightMode={props.upstreamResumeMode}
onResume={onMessageUpstreamResume ? handleUpstreamResume : undefined}
onDetach={onMessageUpstreamDetach ? handleUpstreamDetach : undefined}
onDelete={onMessageUpstreamDelete ? handleUpstreamDelete : undefined}
/>
)}

@@ -23,7 +23,7 @@ export const Release = {

// this is here to trigger revalidation of data, e.g. models refresh
Monotonics: {
Aix: 69,
Aix: 70,
NewsVersion: 204,
},


@@ -0,0 +1,14 @@
import * as React from 'react';

import { SvgIcon, SvgIconProps } from '@mui/joy';

/*
* Source: 'https://phosphoricons.com/' - list-checks (regular)
*/
export function PhListChecks(props: SvgIconProps) {
return (
<SvgIcon viewBox='0 0 256 256' stroke='none' fill='currentColor' width='24' height='24' {...props}>
<path d='M128,128a8,8,0,0,1-8,8H40a8,8,0,0,1,0-16h80A8,8,0,0,1,128,128ZM40,72H184a8,8,0,0,0,0-16H40a8,8,0,0,0,0,16Zm80,112H40a8,8,0,0,0,0,16h80a8,8,0,0,0,0-16Zm133.66-50.34a8,8,0,0,0-11.32,0L208,171.31l-10.34-10.34a8,8,0,0,0-11.32,11.32l16,16a8,8,0,0,0,11.32,0l40-40A8,8,0,0,0,253.66,131.66Zm0,64a8,8,0,0,0-11.32,0L208,235.31l-10.34-10.34a8,8,0,0,0-11.32,11.32l16,16a8,8,0,0,0,11.32,0l40-40A8,8,0,0,0,253.66,195.66Z' />
</SvgIcon>
);
}
@@ -349,6 +349,15 @@ export const DModelParameterRegistry = {
// when undefined, the model chooses automatically
},

// Gemini Interactions API agent_config - per-agent knobs (Deep Research only today)
llmVndGeminiAgentViz: _enumDef({
label: 'Visualizations',
type: 'enum',
description: 'Charts and images in Deep Research reports. Disable for text-only output (helpful when merging multiple reports).',
values: ['auto', 'off'],
// undefined means upstream default ('auto'); we only forward when explicitly 'off'
}),

// NOTE: we don't have this as a parameter, as for now we use it in tandem with llmVndGeminiGoogleSearch
// llmVndGeminiUrlContext: {
// label: 'URL Context',

@@ -25,6 +25,7 @@ export interface DLLM {
label: string;
created: number | 0;
updated?: number | 0;
pubDate?: string; // official release date in 'YYYYMMDD'
description: string;
hidden: boolean;

@@ -137,6 +138,20 @@ export function getLLMMaxOutputTokens(llm: DLLM | null): DLLMMaxOutputTokens | u
return llm.userMaxOutputTokens ?? llm.maxOutputTokens;
}

/**
* Parse the model's editorial `pubDate` ('YYYYMMDD') into a Date, or null if missing/malformed.
* Date is constructed at local midnight - pubDate is day-precision, no time component.
*/
export function getLLMPubDate(llm: DLLM | null | undefined): Date | null {
const p = llm?.pubDate;
if (!p || !/^\d{8}$/.test(p)) return null;
const y = parseInt(p.slice(0, 4), 10);
const m = parseInt(p.slice(4, 6), 10) - 1; // JS Date months are 0-indexed
const d = parseInt(p.slice(6, 8), 10);
const date = new Date(y, m, d);
return Number.isFinite(date.getTime()) ? date : null;
}
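Isolated for illustration, the new parser behaves like this (same body as the hunk; the `llm` parameter is narrowed to just the `pubDate` field so the sketch is self-contained):

```typescript
/** Parse an editorial 'YYYYMMDD' pubDate into a local-midnight Date, or null if missing/malformed. */
function getLLMPubDate(llm: { pubDate?: string } | null | undefined): Date | null {
  const p = llm?.pubDate;
  if (!p || !/^\d{8}$/.test(p)) return null;
  const y = parseInt(p.slice(0, 4), 10);
  const m = parseInt(p.slice(4, 6), 10) - 1; // JS Date months are 0-indexed
  const d = parseInt(p.slice(6, 8), 10);
  const date = new Date(y, m, d);
  return Number.isFinite(date.getTime()) ? date : null;
}
```

One behavioral note: the regex only guarantees eight digits, and the `Date(y, m, d)` constructor silently rolls over out-of-range months/days (e.g. month 13 becomes January of the next year), so an 8-digit but semantically invalid pubDate yields a rolled-over Date rather than null.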

/// Interfaces ///

// do not change anything below! those will be persisted in data

@@ -70,7 +70,7 @@ export function aixCreateModelFromLLMOptions(
llmVndAntEffort, llmVndGemEffort, llmVndOaiEffort, llmVndMiscEffort,
llmVndAnt1MContext, llmVndAntInfSpeed, llmVndAntSkills, llmVndAntThinkingBudget, llmVndAntWebDynamic, llmVndAntWebFetch, llmVndAntWebFetchMaxUses, llmVndAntWebSearch, llmVndAntWebSearchMaxUses,
llmVndBedrockAPI,
llmVndGeminiAspectRatio, llmVndGeminiImageSize, llmVndGeminiCodeExecution, llmVndGeminiComputerUse, llmVndGeminiGoogleSearch, llmVndGeminiMediaResolution, llmVndGeminiThinkingBudget,
llmVndGeminiAgentViz, llmVndGeminiAspectRatio, llmVndGeminiImageSize, llmVndGeminiCodeExecution, llmVndGeminiComputerUse, llmVndGeminiGoogleSearch, llmVndGeminiMediaResolution, llmVndGeminiThinkingBudget,
// llmVndMoonshotWebSearch,
llmVndOaiRestoreMarkdown, llmVndOaiVerbosity, llmVndOaiWebSearchContext, llmVndOaiWebSearchGeolocation, llmVndOaiImageGeneration, llmVndOaiCodeInterpreter,
llmVndOrtWebSearch,
@@ -143,6 +143,7 @@ export function aixCreateModelFromLLMOptions(

// Gemini
...(llmVndGeminiInteractions ? { vndGeminiAPI: 'interactions-agent' } : {}),
...(llmVndGeminiAgentViz === 'off' ? { vndGeminiAgentViz: 'off' } : {}), // Deep Research agent_config.visualization - only forward when explicitly disabled
...(llmVndGeminiAspectRatio ? { vndGeminiAspectRatio: llmVndGeminiAspectRatio } : {}),
...(llmVndGeminiCodeExecution === 'auto' ? { vndGeminiCodeExecution: llmVndGeminiCodeExecution } : {}),
...(llmVndGeminiComputerUse ? { vndGeminiComputerUse: llmVndGeminiComputerUse } : {}),
@@ -644,22 +645,30 @@ function _finalizeLlmMetricsWithCosts(cgMetricsLg: undefined | DMetricsChatGener

// --- L2 - Content Generation reattachment as DMessage ---

/**
* Reattach mode selects how to reconstruct an in-progress upstream run:
* - 'replay' - canonical: SSE replays the event sequence from the start. Live deltas reach
* the UI as the run progresses (or as past content is replayed).
* - 'snapshot' - one-shot JSON GET returns the resource as-is right now. Used to recover when
* the SSE endpoint is broken upstream but the resource itself is still readable.
*
* Names describe what you get, not how. See `kb/modules/LLM-gemini-interactions.md` for failure modes.
*/
export type AixReattachMode = 'replay' | 'snapshot';

/**
* Reattach facade: wraps `aixChatGenerateContent_DMessage_orThrow` for the reattach-to-upstream flow.
* - Validates the generator carries an `upstreamHandle`
* - Stubs the unused chat-generate request, and
* - Seeds the base function so the LL's reattach branch fires.
*
* On an in-progress upstream run (Gemini Deep Research today, extensible to OAI Responses), the server
* just needs the handle to GET-poll; no chat-generate body is needed. This facade:
* - validates the generator carries an `upstreamHandle`,
* - stubs the chat-generate request (unused on the reattach path - the server uses the handle),
* - seeds the base function via `clientOptions.reattachGenerator` so the LL's reattach branch fires.
*
* The reassembler starts with empty fragments; since Gemini Interactions snapshots are cumulative,
* the stream will rebuild the complete content from scratch. Any partial content from the original run is replaced.
* The reassembler replaces content on reattach (Gemini Interactions snapshots are cumulative, so this rebuilds from scratch).
*/
export async function aixReattachContent_DMessage_orThrow(
llmId: DLLMId,
reattachGenerator: Readonly<DMessageGenerator>,
aixContext: AixAPI_Context_ChatGenerate,
mode: AixReattachMode,
clientOptions: Pick<AixClientOptions, 'abortSignal' | 'throttleParallelThreads'>,
onStreamingUpdate?: (update: AixChatGenerateContent_DMessageGuts, isDone: boolean) => MaybePromise<void>,
): Promise<_AixChatGenerateContent_DMessageGuts_WithOutcome> {
@@ -674,7 +683,7 @@ export async function aixReattachContent_DMessage_orThrow(
llmId,
stubChatGenerate,
aixContext,
true, // streaming
mode === 'replay', // wire-level: SSE demuxer (replay) vs one-shot JSON body (snapshot)
{ ...clientOptions, reattachGenerator: reattachGenerator as any /* guaranteed by the check */ },
onStreamingUpdate,
);

@@ -516,6 +516,7 @@ export namespace AixWire_API {

// Gemini
vndGeminiAPI: z.enum(['interactions-agent']).optional(), // opt-in per-model API dialect; unset = generateContent
vndGeminiAgentViz: z.enum(['auto', 'off']).optional(), // agent_config.visualization; default 'auto' upstream
vndGeminiAspectRatio: z.enum(['1:1', '2:3', '3:2', '3:4', '4:3', '9:16', '16:9', '21:9']).optional(),
vndGeminiCodeExecution: z.enum(['auto']).optional(),
vndGeminiComputerUse: z.enum(['browser']).optional(),

@@ -86,7 +86,8 @@ export function aixToGeminiInteractionsCreate(model: AixAPI_Model, chatGenerateR
agent_config: {
type: 'deep-research',
thinking_summaries: 'auto', // Enable thought_summary blocks - without this the API would not emit summaries during streaming
// visualization defaults to 'auto' upstream; leave unset to keep the default (agent may generate charts/images).
// visualization: forwarded only when the client explicitly opts out; 'auto' (default) is left unset so the agent may generate charts/images.
...(model.vndGeminiAgentViz === 'off' && { visualization: 'off' }),
},
}),
// non-DR agents: use native system_instruction field (matches gemini.generateContent.ts convention)

@@ -25,7 +25,7 @@ import { createAnthropicFileInlineTransform } from './parsers/anthropic.transfor
import { createAnthropicMessageParser, createAnthropicMessageParserNS } from './parsers/anthropic.parser';
import { createBedrockConverseParserNS, createBedrockConverseStreamParser } from './parsers/bedrock-converse.parser';
import { createGeminiGenerateContentResponseParser } from './parsers/gemini.parser';
import { createGeminiInteractionsParserSSE } from './parsers/gemini.interactions.parser';
import { createGeminiInteractionsParserNS, createGeminiInteractionsParserSSE } from './parsers/gemini.interactions.parser';
import { createOpenAIChatCompletionsChunkParser, createOpenAIChatCompletionsParserNS } from './parsers/openai.parser';
import { createOpenAIResponseParserNS, createOpenAIResponsesEventParser } from './parsers/openai.responses.parser';

@@ -329,16 +329,16 @@ export async function createChatGenerateResumeDispatch(access: AixAPI_Access, re
};

case 'gemini': {
// [Gemini Interactions] Reattach via SSE stream - GET /interactions/{id}?stream=true replays all events from the start (intentional - client's ContentReassembler replaces message content on reattach; partial resume via last_event_id is deliberately NOT used).
// [Gemini Interactions] Reattach: SSE replay (?stream=true) or JSON snapshot (no query). See kb/modules/LLM-gemini-interactions.md.
if (resumeHandle.uht !== 'vnd.gem.interactions')
throw new Error(`Resume handle mismatch for gemini: expected 'vnd.gem.interactions', got '${resumeHandle.uht}'`);
if (!streaming) console.warn(`[DEV] Gemini Interactions API - Resume only supported in SSE mode, ignoring streaming=false for ${resumeHandle.runId}`);
const { url: _baseUrl, headers: _headers } = geminiAccess(access, null, GeminiInteractionsWire_API_Interactions.getPath(resumeHandle.runId /* Gemini interaction.id */), false);
return {
request: { url: `${_baseUrl}${_baseUrl.includes('?') ? '&' : '?'}stream=true`, method: 'GET', headers: _headers },
/** Again, only support SSE here, for now (see comment in `createChatGenerateDispatch`) */
demuxerFormat: 'fast-sse',
chatGenerateParse: createGeminiInteractionsParserSSE(null /* model name unknown at resume time - caller's DMessage already has it */),
request: { url: streaming ? `${_baseUrl}${_baseUrl.includes('?') ? '&' : '?'}stream=true` : _baseUrl, method: 'GET', headers: _headers },
demuxerFormat: streaming ? 'fast-sse' : null,
chatGenerateParse: streaming
? createGeminiInteractionsParserSSE(null /* model name unknown at resume time - caller's DMessage already has it */)
: createGeminiInteractionsParserNS(null),
};
}
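The dispatch's URL choice can be isolated into a tiny pure function (hypothetical `buildInteractionsResumeUrl` helper; the query-append logic matches the hunk's ternary):

```typescript
/**
 * Build the Gemini Interactions retrieval URL:
 * - streaming (SSE replay): append stream=true, with '&' if the base URL already has a query
 *   (e.g. an ?key= API key), '?' otherwise.
 * - non-streaming (JSON snapshot): the bare resource URL, GET once.
 */
function buildInteractionsResumeUrl(baseUrl: string, streaming: boolean): string {
  if (!streaming) return baseUrl; // one-shot JSON GET of the Interaction resource
  return `${baseUrl}${baseUrl.includes('?') ? '&' : '?'}stream=true`;
}
```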
@@ -151,6 +151,9 @@ export function createGeminiInteractionsParserSSE(requestedModelName: string | n
if (!deltaParse.success) {
// Empty deltas ({}) appear alongside placeholder blocks (e.g. internal tool slots) - silent skip
if (event.delta && Object.keys(event.delta).length === 0) break;
// Known-but-not-surfaced delta types (mirrors NS parser's INTERNAL_OUTPUT_TYPES policy + spec's document/video variants we don't model) - silent skip
const deltaType = (event.delta as { type?: string })?.type;
if (deltaType && (GeminiInteractionsWire_API_Interactions.INTERNAL_OUTPUT_TYPES.has(deltaType) || deltaType === 'document' || deltaType === 'video')) break;
console.warn('[GeminiInteractions] unknown content.delta shape at index', event.index, event.delta);
break;
}
@@ -241,6 +244,192 @@ export function createGeminiInteractionsParserSSE(requestedModelName: string | n
}


/**
* Non-streaming parser: reads the GET /v1beta/interactions/{id} JSON body once and emits the same
* particles the SSE parser would, in a single batch.
*
* Used by the "Recover" path when SSE delivery is broken upstream (10-min cuts; see KB doc) but the
* resource is still fetchable. We always re-emit the upstream handle so failed/in_progress runs
* remain retryable; only `status: completed` clears it (via the reassembler's outcome=='completed' policy).
*
* See `kb/modules/LLM-gemini-interactions.md` for failure modes and recovery model.
*/
export function createGeminiInteractionsParserNS(requestedModelName: string | null): ChatGenerateParseFunction {

const parserCreationTimestamp = Date.now();

return function parse(pt: IParticleTransmitter, rawEventData: string, _eventName?: string): void {

// model name (preserved from caller's DMessage on resume; first-call only on fresh fetches)
if (requestedModelName != null)
pt.setModelName(requestedModelName);

// parse + validate against the Interaction resource schema (looseObject - tolerant to upstream additions)
let rawJson: unknown;
try {
rawJson = JSON.parse(rawEventData);
} catch (e: any) {
throw new Error(`malformed Interaction JSON: ${e?.message || String(e)}`);
}
const parsed = GeminiInteractionsWire_API_Interactions.Interaction_schema.safeParse(rawJson);
if (!parsed.success) {
console.warn('[GeminiInteractions-NS] unexpected Interaction shape:', rawJson);
throw new Error('Gemini Interactions: unexpected resource shape (no `id`/`status` fields)');
}
const interaction = parsed.data;

// upstream handle - preserve so user can retry / delete
pt.setUpstreamHandle(interaction.id, 'vnd.gem.interactions');

// Walk outputs in order. Each output is loose; we safeParse against KnownOutput_schema and
// silently skip INTERNAL_OUTPUT_TYPES (tool calls/results). Order matters - thoughts and
// text interleave in the report and the user reads them top-to-bottom.
const outputs = interaction.outputs ?? [];
let lastEmittedKind: 'thought' | 'text' | 'image' | 'audio' | null = null;
for (const rawOut of outputs) {
const outType = (rawOut as { type?: string })?.type;

// silent-skip internal tool-call outputs (matches SSE parser policy for INTERNAL_OUTPUT_TYPES)
if (outType && GeminiInteractionsWire_API_Interactions.INTERNAL_OUTPUT_TYPES.has(outType))
continue;

const knownOut = GeminiInteractionsWire_API_Interactions.KnownOutput_schema.safeParse(rawOut);
if (!knownOut.success) {
if (outType) console.warn('[GeminiInteractions-NS] unknown output type, skipping:', outType);
continue;
}

// emit a part boundary when switching kinds, mirrors SSE behavior on content.start across indices
if (lastEmittedKind !== null && lastEmittedKind !== knownOut.data.type)
pt.endMessagePart();

switch (knownOut.data.type) {
case 'thought': {
const summary = knownOut.data.summary;
if (typeof summary === 'string') {
if (summary) pt.appendReasoningText(summary);
} else if (Array.isArray(summary)) {
for (const item of summary)
if (item.text) pt.appendReasoningText(item.text);
}
if (knownOut.data.signature)
pt.setReasoningSignature(knownOut.data.signature);
lastEmittedKind = 'thought';
break;
}
case 'text': {
if (knownOut.data.text)
pt.appendText(knownOut.data.text);
// Citations: matches SSE policy - the DISABLE_CITATIONS kill-switch decides whether Deep Research drops them
if (!DISABLE_CITATIONS && knownOut.data.annotations) {
for (const annRaw of knownOut.data.annotations) {
const ann = GeminiInteractionsWire_API_Interactions.UrlCitationAnnotation_schema.safeParse(annRaw);
if (!ann.success) continue;
const a = ann.data;
pt.appendUrlCitation(a.title || a.url, a.url, undefined, a.start_index, a.end_index, undefined, undefined);
}
}
lastEmittedKind = 'text';
break;
}
case 'image': {
if (knownOut.data.data && knownOut.data.mime_type)
pt.appendImageInline(knownOut.data.mime_type, knownOut.data.data, 'Gemini Generated Image', 'Gemini', '', true);
else if (knownOut.data.uri)
pt.appendText(`\n[Image: ${knownOut.data.uri}]\n`);
lastEmittedKind = 'image';
break;
}
case 'audio': {
if (knownOut.data.data && knownOut.data.mime_type) {
const mime = knownOut.data.mime_type.toLowerCase();
const isPCM = mime.startsWith('audio/l16') || mime.includes('codec=pcm');
if (isPCM) {
try {
const wav = geminiConvertPCM2WAV(knownOut.data.mime_type, knownOut.data.data);
pt.appendAudioInline(wav.mimeType, wav.base64Data, 'Gemini Generated Audio', 'Gemini', wav.durationMs);
} catch (error) {
console.warn('[GeminiInteractions-NS] audio PCM convert failed:', error);
}
} else {
pt.appendAudioInline(knownOut.data.mime_type, knownOut.data.data, 'Gemini Generated Audio', 'Gemini', 0);
}
}
lastEmittedKind = 'audio';
break;
}
default: {
const _exhaustive: never = knownOut.data;
break;
}
}
}
// close out any open part before the terminal status emission
|
||||
if (lastEmittedKind !== null) pt.endMessagePart();
|
||||
|
||||
// Terminal status -> stop reason + dialect end (mirrors _handleInteractionComplete)
|
||||
switch (interaction.status) {
|
||||
case 'completed':
|
||||
_emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
|
||||
pt.setTokenStopReason('ok');
|
||||
pt.setDialectEnded('done-dialect');
|
||||
break;
|
||||
case 'failed':
|
||||
_emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
|
      pt.setDialectTerminatingIssue('Deep Research interaction failed', null, 'srv-warn');
      break;

    case 'cancelled':
      _emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
      pt.setTokenStopReason('cg-issue');
      pt.setDialectEnded('done-dialect');
      break;

    case 'incomplete':
      pt.appendText('\n_Response incomplete (run stopped early)._\n');
      _emitUsageMetrics(pt, interaction.usage, parserCreationTimestamp, undefined);
      pt.setTokenStopReason('out-of-tokens');
      pt.setDialectEnded('done-dialect');
      break;

    case 'requires_action':
      pt.setDialectTerminatingIssue('Deep Research returned requires_action (not supported in this client)', null, 'srv-warn');
      break;

    case 'in_progress': {
      // Two scenarios both surface as `in_progress`:
      // 1) Run is genuinely live server-side (just slow) - polling later will yield content.
      // 2) "Zombie": the generator crashed but the status never transitioned. Stays `in_progress`
      //    for days with no outputs. Not recoverable - the only remedy is delete + retry.
      // We can't disambiguate from one frame, so we surface {created, updated, outputs.length}
      // and let the user decide. `tokenStopReason='cg-issue'` keeps the upstream handle alive
      // (vs 'ok' which would clear it via the reassembler's clean-completion policy).
      // see kb/modules/LLM-gemini-interactions.md#failure-modes (C)
      const elapsedMin = _minutesSince(interaction.created);
      const updatedMin = _minutesSince(interaction.updated);
      const outCount = (interaction.outputs ?? []).length;
      const lines: string[] = ['\n_Deep Research run is **`in_progress`** server-side._\n'];
      if (elapsedMin != null) lines.push(`- Started: **${_humanDuration(elapsedMin)} ago**`);
      if (updatedMin != null && updatedMin !== elapsedMin) lines.push(`- Last server update: **${_humanDuration(updatedMin)} ago**`);
      lines.push(`- Outputs so far: **${outCount === 0 ? 'none' : outCount}**`);
      // Heuristic threshold: stale-and-empty for >60 min is almost certainly a zombie.
      const looksStuck = outCount === 0 && elapsedMin != null && elapsedMin > 60;
      if (looksStuck)
        lines.push('\nThis run looks **stuck** (no content for over an hour). Click **Cancel** to delete it and try again.');
      else
        lines.push('\nTry **Recover** again in a few minutes; if it stays empty, click **Cancel** to delete and retry.');
      pt.appendText(lines.join('\n') + '\n');
      pt.setTokenStopReason('cg-issue');
      pt.setDialectEnded('done-dialect');
      break;
    }

    default: {
      const _exhaustiveCheck: never = interaction.status;
      console.warn('[GeminiInteractions-NS] unreachable status', interaction.status);
      break;
    }
  }
};
}
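The stuck-run heuristic described in the comments above can be factored into a pure, testable predicate. A minimal TypeScript sketch - the function name and the injectable `nowMs`/`thresholdMin` parameters are illustrative, not part of the source:

```typescript
/** True when an `in_progress` Deep Research run looks like a zombie:
 *  no outputs yet, and created more than `thresholdMin` minutes ago. */
function looksStuckRun(
  createdIso: string | undefined,
  outputsCount: number,
  nowMs: number = Date.now(),
  thresholdMin: number = 60, // same stale-and-empty threshold as the parser above
): boolean {
  if (!createdIso) return false; // can't judge staleness without a timestamp
  const createdMs = Date.parse(createdIso);
  if (!Number.isFinite(createdMs)) return false;
  const elapsedMin = Math.max(0, (nowMs - createdMs) / 60_000);
  return outputsCount === 0 && elapsedMin > thresholdMin;
}

console.log(looksStuckRun('2026-05-01T00:00:00Z', 0, Date.parse('2026-05-01T02:00:00Z'))); // true
console.log(looksStuckRun('2026-05-01T00:00:00Z', 3, Date.parse('2026-05-01T02:00:00Z'))); // false
```

With a fixed `nowMs` the heuristic is deterministic: a run created two hours ago with zero outputs is flagged, while one that has produced partial outputs is not.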

// --- helpers ---

function _classifyContentKind(rawType: unknown): BlockState['kind'] {

@@ -370,3 +559,22 @@ function _emitUsageMetrics(

  pt.updateMetrics(m);
}

+ /** Minutes elapsed between an upstream ISO 8601 timestamp and now. Returns null on parse failure. */
+ function _minutesSince(iso: string | undefined | null): number | null {
+   if (!iso) return null;
+   const ms = Date.parse(iso);
+   if (!Number.isFinite(ms)) return null;
+   return Math.max(0, (Date.now() - ms) / 60_000);
+ }

+ /** Human-readable elapsed-time string for in_progress diagnostic messages. */
+ function _humanDuration(minutes: number): string {
+   if (minutes < 1) return 'less than a minute';
+   if (minutes < 60) return `${Math.round(minutes)} min`;
+   const hours = minutes / 60;
+   if (hours < 24) return `${Math.round(hours * 10) / 10} hours`;
+   const days = hours / 24;
+   return `${Math.round(days * 10) / 10} days`;
+ }

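To see `_humanDuration`'s bucketing concretely, here is the helper reproduced from the hunk above so it runs standalone, with a sample input at each boundary:

```typescript
/** Human-readable elapsed-time string (as in the hunk above). */
function _humanDuration(minutes: number): string {
  if (minutes < 1) return 'less than a minute';
  if (minutes < 60) return `${Math.round(minutes)} min`;
  const hours = minutes / 60;
  if (hours < 24) return `${Math.round(hours * 10) / 10} hours`;
  const days = hours / 24;
  return `${Math.round(days * 10) / 10} days`;
}

// Bucket boundaries: <1 min, <60 min, <24 h, else days (one decimal place).
console.log(_humanDuration(0.5));  // less than a minute
console.log(_humanDuration(30));   // 30 min
console.log(_humanDuration(90));   // 1.5 hours
console.log(_humanDuration(2880)); // 2 days
```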
@@ -167,7 +167,7 @@ export namespace GeminiInteractionsWire_API_Interactions {
  // the parser prefers inline and falls back to a URI note when only `uri` is present.
  data: z.string().optional(), // base64-encoded bytes
  uri: z.string().optional(),
- mime_type: z.string(),
+ mime_type: z.string().optional(), // spec: optional - parser still requires it before emitting inline
  resolution: z.string().optional(), // 'low' | 'medium' | 'high' | 'ultra_high'
});

@@ -176,7 +176,7 @@ export namespace GeminiInteractionsWire_API_Interactions {
  // Per docs: data or uri, mime_type covers both PCM (audio/l16) and packaged formats (audio/wav, audio/mp3, ...).
  data: z.string().optional(),
  uri: z.string().optional(),
- mime_type: z.string(),
+ mime_type: z.string().optional(), // spec: optional - parser still requires it before emitting inline
  rate: z.number().optional(), // sample rate, when known
  channels: z.number().optional(),
});

@@ -107,6 +107,7 @@ function _createDLLMFromModelDescription(d: ModelDescriptionSchema, service: DMo
  label: d.label,
  created: d.created || 0,
  updated: d.updated || 0,
+ ...(d.pubDate && { pubDate: d.pubDate }),
  description: d.description,
  hidden: !!d.hidden,

@@ -15,7 +15,7 @@ import WarningRoundedIcon from '@mui/icons-material/WarningRounded';

import { type DPricingChatGenerate, isLLMChatFree_cached, llmChatPricing_adjusted } from '~/common/stores/llms/llms.pricing';
import type { ModelOptionsContext } from '~/common/layout/optima/store-layout-optima';
- import { DLLMId, DModelInterfaceV1, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, isLLMVisible, LLM_IF_HOTFIX_NoStream, LLM_IF_HOTFIX_NoTemperature, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
+ import { DLLMId, DModelInterfaceV1, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, getLLMPubDate, isLLMVisible, LLM_IF_HOTFIX_NoStream, LLM_IF_HOTFIX_NoTemperature, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
import { FormLabelStart } from '~/common/components/forms/FormLabelStart';
import { GoodModal } from '~/common/components/modals/GoodModal';
import { LLMImplicitParametersRuntimeFallback } from '~/common/stores/llms/llms.parameters';

@@ -280,6 +280,7 @@ export function LLMOptionsModal(props: { id: DLLMId, context?: ModelOptionsConte

  // cache
  const adjChatPricing = llmChatPricing_adjusted(llm);
+ const pubDate = getLLMPubDate(llm);


  return (

@@ -502,7 +503,8 @@ export function LLMOptionsModal(props: { id: DLLMId, context?: ModelOptionsConte
  id: {llm.id}<br />
  context: <b>{getLLMContextTokens(llm)?.toLocaleString() ?? 'not provided'}</b> tokens{` · `}
  max output: <b>{getLLMMaxOutputTokens(llm)?.toLocaleString() ?? 'not provided'}</b><br />
- {!!llm.created && <>created: <TimeAgo date={new Date(llm.created * 1000)} /><br /></>}
+ {!!pubDate && <>published: <b>{pubDate.toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' })}</b> · <TimeAgo date={pubDate} /><br /></>}
+ {!!llm.created && <>indexed: <TimeAgo date={new Date(llm.created * 1000)} /><br /></>}
  {/*· tags: {llm.tags.join(', ')}*/}
  {!!adjChatPricing && prettyPricingComponent(adjChatPricing)}
  {/*{!!llm.benchmark && <>benchmark: <b>{llm.benchmark.cbaElo?.toLocaleString() || '(unk) '}</b> CBA Elo<br /></>}*/}

@@ -123,6 +123,11 @@ const _geminiGoogleSearchOptions = [
  { value: _UNSPECIFIED, label: 'Off', description: 'Default (disabled)' },
] as const;

+ const _geminiAgentVizOptions = [
+   { value: _UNSPECIFIED, label: 'Auto', description: 'Default - agent may include charts/images' },
+   { value: 'off', label: 'Off', description: 'Text only (better when merging multiple reports)' },
+ ] as const;

const _geminiMediaResolutionOptions = [
  { value: 'mr_high', label: 'High', description: 'Best quality' },
  { value: 'mr_medium', label: 'Medium', description: 'Balanced' },

@@ -245,6 +250,7 @@ export function LLMParametersEditor(props: {
  llmVndAntWebSearch,
  llmVndAntWebSearchMaxUses,
  llmVndGemEffort,
+ llmVndGeminiAgentViz,
  llmVndGeminiAspectRatio,
  llmVndGeminiCodeExecution,
  llmVndGeminiGoogleSearch,

@@ -687,6 +693,19 @@ export function LLMParametersEditor(props: {
  />
)}

+ {showParam('llmVndGeminiAgentViz') && (
+   <FormSelectControl
+     title='Visualizations'
+     tooltip='Charts and images in Deep Research reports. Disable for text-only output (helpful when merging multiple reports).'
+     value={llmVndGeminiAgentViz ?? _UNSPECIFIED}
+     onChange={(value) => {
+       if (value === _UNSPECIFIED || !value) onRemoveParameter('llmVndGeminiAgentViz');
+       else onChangeParameter({ llmVndGeminiAgentViz: value });
+     }}
+     options={_geminiAgentVizOptions}
+   />
+ )}

{/*{showParam('llmVndMoonshotWebSearch') && (*/}
{/* <FormSelectControl*/}

@@ -9,7 +9,7 @@ import VisibilityOutlinedIcon from '@mui/icons-material/VisibilityOutlined';

import type { DModelsServiceId } from '~/common/stores/llms/llms.service.types';
import { isLLMChatFree_cached } from '~/common/stores/llms/llms.pricing';
- import { DLLM, DLLMId, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, isLLMCustomUserParameters, isLLMHidden, LLM_IF_ANT_PromptCaching, LLM_IF_GEM_CodeExecution, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_OAI_Vision, LLM_IF_Outputs_Audio, LLM_IF_Outputs_Image, LLM_IF_Tools_WebSearch } from '~/common/stores/llms/llms.types';
+ import { DLLM, DLLMId, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, getLLMPubDate, isLLMCustomUserParameters, isLLMHidden, LLM_IF_ANT_PromptCaching, LLM_IF_GEM_CodeExecution, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_OAI_Vision, LLM_IF_Outputs_Audio, LLM_IF_Outputs_Image, LLM_IF_Tools_WebSearch } from '~/common/stores/llms/llms.types';
import { GoodTooltip } from '~/common/components/GoodTooltip';
import { PhGearSixIcon } from '~/common/components/icons/phosphor/PhGearSixIcon';
import { STAR_EMOJI, StarredToggle, starredToggleStyle } from '~/common/components/StarIcons';

@@ -99,6 +99,10 @@ export const ModelItem = React.memo(function ModelItem(props: {
  const isNotSymlink = !llm.label.startsWith('🔗'); // getLLMLabel exception: need access to the base
  const llmLabel = getLLMLabel(llm);

+ // "new" badge: shown only when pubDate is set AND within the last 30 days
+ const pubDate = getLLMPubDate(llm);
+ const isRecentlyPublished = pubDate ? (Date.now() - pubDate.getTime()) < 30 * 24 * 60 * 60 * 1000 : false;


  const handleLLMConfigure = React.useCallback((event: React.MouseEvent) => {
    event.stopPropagation();

@@ -227,6 +231,7 @@ export const ModelItem = React.memo(function ModelItem(props: {
  </>}

  {/* Features Chips - sync with `useLLMSelect.tsx` */}
+ {isRecentlyPublished && isNotSymlink && pubDate && <GoodTooltip title={`Released ${pubDate.toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' })}`}><Chip size='sm' variant='solid' sx={isHidden ? styles.chipDisabled : { bgcolor: '#d4ff3a', color: 'black', fontWeight: 'lg' }}>new</Chip></GoodTooltip>}
  {featuresChipMemo}
  {seemsFree && isNotSymlink && <Chip size='sm' color='success' variant='plain' sx={isHidden ? styles.chipDisabled : styles.chipFree}>free</Chip>}

@@ -6,7 +6,7 @@ import { Release } from '~/common/app.release';

import type { ModelDescriptionSchema, OrtVendorLookupResult } from '../llm.server.types';
import { createVariantInjector, ModelVariantMap } from '../llm.server.variants';
- import { llmDevCheckModels_DEV } from '../models.mappings';
+ import { formatPubDate, llmDevCheckModels_DEV } from '../models.mappings';


// Note: these model definitions are shared across Anthropic API, OpenRouter, and AWS Bedrock.

@@ -214,12 +214,13 @@ export function llmsAntInjectVariants(acc: ModelDescriptionSchema[], model: Mode
}


- export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: boolean })[] = [
+ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: boolean, pubDate: string /* make it required for the defs */ })[] = [

  // Claude 4.7 models
  {
    id: 'claude-opus-4-7', // Active - 2026-04-16
    label: 'Claude Opus 4.7',
+   pubDate: '20260416',
    description: 'Most capable generally available model for complex reasoning and agentic coding',
    contextWindow: 1_000_000, // 1M GA at standard pricing (no opt-in required)
    maxCompletionTokens: 128000,

@@ -239,6 +240,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-opus-4-6', // Active
    label: 'Claude Opus 4.6',
+   pubDate: '20260205',
    description: 'Previous most intelligent model for complex agents and coding, with adaptive thinking',
    contextWindow: 1_000_000, // 1M GA at standard pricing since 2026-03-13 (no opt-in required)
    maxCompletionTokens: 128000,

@@ -255,6 +257,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-sonnet-4-6', // Active
    label: 'Claude Sonnet 4.6',
+   pubDate: '20260217',
    description: 'Best combination of speed and intelligence for everyday tasks',
    contextWindow: 1_000_000, // 1M GA at standard pricing since 2026-03-13 (no opt-in required)
    maxCompletionTokens: 128000, // docs say 64000, API reports 128000

@@ -272,6 +275,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-opus-4-5-20251101', // Active
    label: 'Claude Opus 4.5',
+   pubDate: '20251124',
    description: 'Previous most intelligent model with advanced reasoning for complex agentic workflows',
    contextWindow: 200000,
    maxCompletionTokens: 64000,

@@ -286,6 +290,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-sonnet-4-5-20250929', // Active
    label: 'Claude Sonnet 4.5',
+   pubDate: '20250929',
    description: 'Previous best combination of speed and intelligence for complex agents and coding',
    contextWindow: 200000,
    maxCompletionTokens: 64000,

@@ -311,6 +316,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-haiku-4-5-20251001', // Active
    label: 'Claude Haiku 4.5',
+   pubDate: '20251015',
    description: 'Fastest model with exceptional speed and performance',
    contextWindow: 200000,
    maxCompletionTokens: 64000,

@@ -324,6 +330,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-opus-4-1-20250805', // Active
    label: 'Claude Opus 4.1',
+   pubDate: '20250805',
    description: 'Exceptional model for specialized complex tasks requiring advanced reasoning',
    contextWindow: 200000,
    maxCompletionTokens: 32000,

@@ -338,6 +345,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
    hidden: true, // Deprecated: April 14, 2026 | Retiring: June 15, 2026 | Replacement: claude-opus-4-7
    id: 'claude-opus-4-20250514', // Deprecated
    label: 'Claude Opus 4 [Deprecated]',
+   pubDate: '20250522',
    description: 'Previous flagship model. Deprecated April 14, 2026, retiring June 15, 2026.',
    contextWindow: 200000,
    maxCompletionTokens: 32000,

@@ -351,6 +359,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
    hidden: true, // Deprecated: April 14, 2026 | Retiring: June 15, 2026 | Replacement: claude-sonnet-4-6
    id: 'claude-sonnet-4-20250514', // Deprecated
    label: 'Claude Sonnet 4 [Deprecated]',
+   pubDate: '20250522',
    description: 'High-performance model. Deprecated April 14, 2026, retiring June 15, 2026.',
    contextWindow: 200000,
    maxCompletionTokens: 64000,

@@ -379,6 +388,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-3-7-sonnet-20250219', // Retired | Deprecated: October 28, 2025 | Retired: February 19, 2026 | Replacement: claude-opus-4-6
    label: 'Claude Sonnet 3.7 [Retired]',
+   pubDate: '20250224',
    description: 'High-performance model with early extended thinking. Retired February 19, 2026.',
    contextWindow: 200000,
    maxCompletionTokens: 64000,

@@ -396,6 +406,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
  {
    id: 'claude-3-5-haiku-20241022', // Retired | Deprecated: December 19, 2025 | Retired: February 19, 2026
    label: 'Claude Haiku 3.5 [Retired]',
+   pubDate: '20241104',
    description: 'Intelligence at blazing speeds. Retired February 19, 2026.',
    contextWindow: 200000,
    maxCompletionTokens: 8192,

@@ -413,6 +424,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
    hidden: true, // deprecated
    id: 'claude-3-haiku-20240307', // Deprecated | Deprecated: February 19, 2026 | Retiring: April 20, 2026 | Replacement: claude-haiku-4-5-20251001
    label: 'Claude Haiku 3 [Deprecated]',
+   pubDate: '20240313',
    description: 'Fast and compact model for near-instant responsiveness. Deprecated February 19, 2026, retiring April 20, 2026.',
    contextWindow: 200000,
    maxCompletionTokens: 4096,

@@ -595,11 +607,13 @@ export function llmsAntCreatePlaceholderModel(model: AnthropicWire_API_Models_Li
  parameterSpecs.push(...ANT_TOOLS);

  const maxInputTokens = model.max_input_tokens;
+ const createdAt = model.created_at ? new Date(model.created_at) : undefined;
  return {
    id: model.id,
    idVariant: '::placeholder',
    label: model.display_name,
-   created: Math.round(new Date(model.created_at).getTime() / 1000),
+   created: createdAt ? Math.round(createdAt.getTime() / 1000) : undefined,
+   pubDate: formatPubDate(createdAt), // 0-day: use Anthropic API's created_at, or today if unset
    description: 'Newest model, description not available yet.',
    contextWindow: maxInputTokens ?? 200_000, // report API value as-is (no cap for unknown models)
    maxCompletionTokens: model.max_tokens || 32768,

@@ -755,5 +769,5 @@ export function llmOrtAntLookup_ThinkingVariants(orModelName: string): OrtVendor
    .map((spec) => ({ ...spec }));

  // initialTemperature: not set - Anthropic models use the global fallback (0.5)
- return { interfaces, parameterSpecs };
+ return { pubDate: model.pubDate, interfaces, parameterSpecs };
}

||||
@@ -6,7 +6,7 @@ import { Release } from '~/common/app.release';
|
||||
|
||||
import type { ModelDescriptionSchema, OrtVendorLookupResult } from '../llm.server.types';
|
||||
import { createVariantInjector, ModelVariantMap } from '../llm.server.variants';
|
||||
import { llmDevCheckModels_DEV } from '../models.mappings';
|
||||
import { formatPubDate, llmDevCheckModels_DEV } from '../models.mappings';
|
||||
|
||||
|
||||
// dev options
|
||||
@@ -186,7 +186,7 @@ const _knownGeminiModels: ({
|
||||
symLink?: string,
|
||||
deprecated?: string, // Gemini may provide deprecation dates
|
||||
// _delete removed - models are now physically removed from the list instead of marked for deletion
|
||||
} & Pick<ModelDescriptionSchema, 'interfaces' | 'parameterSpecs' | 'chatPrice' | 'hidden' | 'benchmark'>)[] = [
|
||||
} & Pick<ModelDescriptionSchema, 'pubDate' | 'interfaces' | 'parameterSpecs' | 'chatPrice' | 'hidden' | 'benchmark'> & { pubDate: string /* make it required */})[] = [
|
||||
|
||||
/// Generation 3.1
|
||||
|
||||
@@ -195,6 +195,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-3.1-pro-preview',
|
||||
labelOverride: 'Gemini 3.1 Pro Preview',
|
||||
pubDate: '20260219',
|
||||
isPreview: true,
|
||||
chatPrice: gemini30ProPricing, // same pricing as 3 Pro
|
||||
interfaces: IF_30,
|
||||
@@ -213,6 +214,7 @@ const _knownGeminiModels: ({
|
||||
hidden: true, // specialized variant for custom tool prioritization
|
||||
id: 'models/gemini-3.1-pro-preview-customtools',
|
||||
labelOverride: 'Gemini 3.1 Pro Preview (Custom Tools)',
|
||||
pubDate: '20260219',
|
||||
isPreview: true,
|
||||
chatPrice: gemini30ProPricing,
|
||||
interfaces: IF_30,
|
||||
@@ -230,6 +232,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-3.1-flash-image-preview',
|
||||
labelOverride: 'Nano Banana 2',
|
||||
pubDate: '20260226',
|
||||
isPreview: true,
|
||||
chatPrice: gemini31FlashImagePricing,
|
||||
interfaces: IF_30,
|
||||
@@ -242,11 +245,30 @@ const _knownGeminiModels: ({
|
||||
benchmark: undefined, // Non-benchmarkable because generates images
|
||||
},
|
||||
|
||||
// 3.1 Flash-Lite (Stable) - Released May 2026 (graduated from preview)
|
||||
// First Flash-Lite model in the Gemini 3 series - cost-efficient, high-throughput
|
||||
{
|
||||
id: 'models/gemini-3.1-flash-lite',
|
||||
labelOverride: 'Gemini 3.1 Flash-Lite',
|
||||
pubDate: '20260506',
|
||||
chatPrice: gemini31FlashLitePricing,
|
||||
interfaces: IF_30,
|
||||
parameterSpecs: [
|
||||
{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'low', 'medium', 'high'] },
|
||||
{ paramId: 'llmVndGeminiMediaResolution' },
|
||||
{ paramId: 'llmVndGeminiCodeExecution' },
|
||||
{ paramId: 'llmVndGeminiGoogleSearch' },
|
||||
],
|
||||
benchmark: { cbaElo: 1438 }, // same lineage as gemini-3.1-flash-lite-preview
|
||||
},
|
||||
|
||||
// 3.1 Flash-Lite (Preview) - Released March 3, 2026
|
||||
// First Flash-Lite model in the Gemini 3 series - cost-efficient, high-throughput
|
||||
{
|
||||
hidden: true, // superseded by stable gemini-3.1-flash-lite (May 2026)
|
||||
id: 'models/gemini-3.1-flash-lite-preview',
|
||||
labelOverride: 'Gemini 3.1 Flash-Lite Preview',
|
||||
pubDate: '20260303',
|
||||
isPreview: true,
|
||||
chatPrice: gemini31FlashLitePricing,
|
||||
interfaces: IF_30,
|
||||
@@ -268,6 +290,7 @@ const _knownGeminiModels: ({
|
||||
hidden: true, // March 9, 2026: API silently routes 'gemini-3-pro-preview' to 'gemini-3.1-pro-preview' - hide to prevent user confusion
|
||||
id: 'models/gemini-3-pro-preview',
|
||||
labelOverride: 'Gemini 3 Pro Preview',
|
||||
pubDate: '20251118',
|
||||
isPreview: true,
|
||||
deprecated: '2026-03-09',
|
||||
chatPrice: gemini30ProPricing,
|
||||
@@ -286,6 +309,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-3-pro-image-preview',
|
||||
labelOverride: 'Nano Banana Pro', // Marketing name for the technical model ID
|
||||
pubDate: '20251120',
|
||||
isPreview: true,
|
||||
chatPrice: gemini30ProImagePricing,
|
||||
interfaces: IF_30,
|
||||
@@ -301,6 +325,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/nano-banana-pro-preview',
|
||||
labelOverride: 'Nano Banana Pro',
|
||||
pubDate: '20251120',
|
||||
symLink: 'models/gemini-3-pro-image-preview',
|
||||
// copied from symlink
|
||||
isPreview: true,
|
||||
@@ -320,6 +345,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-3-flash-preview',
|
||||
labelOverride: 'Gemini 3 Flash Preview',
|
||||
pubDate: '20251217',
|
||||
isPreview: true,
|
||||
chatPrice: gemini30FlashPricing,
|
||||
interfaces: IF_30,
|
||||
@@ -340,6 +366,7 @@ const _knownGeminiModels: ({
|
||||
hidden: true, // outperformed by 3.1 Pro (1493) and even 3 Flash (1474) - deprecated in 2 months
|
||||
id: 'models/gemini-2.5-pro',
|
||||
labelOverride: 'Gemini 2.5 Pro',
|
||||
pubDate: '20250617',
|
||||
deprecated: '2026-06-17',
|
||||
chatPrice: gemini25ProPricing,
|
||||
interfaces: IF_25,
|
||||
@@ -362,6 +389,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // single-turn-only model - unhide and just send a message to make use of this
|
||||
id: 'models/gemini-2.5-pro-preview-tts',
|
||||
pubDate: '20250520',
|
||||
isPreview: true,
|
||||
chatPrice: gemini25ProPreviewTTSPricing,
|
||||
interfaces: [
|
||||
@@ -379,10 +407,11 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/deep-research-preview-04-2026',
|
||||
labelOverride: 'Deep Research Preview (2026-04)',
|
||||
pubDate: '20260421',
|
||||
isPreview: true,
|
||||
chatPrice: gemini25ProPricing, // pricing not explicitly listed; using 2.5 Pro as baseline
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
|
||||
parameterSpecs: [],
|
||||
parameterSpecs: [{ paramId: 'llmVndGeminiAgentViz' }],
|
||||
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
|
||||
// 128K input, 64K output
|
||||
},
|
||||
@@ -391,22 +420,24 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/deep-research-max-preview-04-2026',
|
||||
labelOverride: 'Deep Research Max Preview (2026-04)',
|
||||
pubDate: '20260421',
|
||||
isPreview: true,
|
||||
chatPrice: gemini25ProPricing, // baseline estimate (see note above)
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
|
||||
parameterSpecs: [],
|
||||
parameterSpecs: [{ paramId: 'llmVndGeminiAgentViz' }],
|
||||
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
|
||||
},
|
||||
|
||||
// Deep Research Pro Preview - Released December 12, 2025
|
||||
// Deep Research Pro Preview - Released December 11, 2025
|
||||
{
|
||||
hidden: true, // yield to newer 2026-04 models
|
||||
id: 'models/deep-research-pro-preview-12-2025',
|
||||
labelOverride: 'Deep Research Pro Preview',
|
||||
pubDate: '20251211',
|
||||
isPreview: true,
|
||||
chatPrice: gemini25ProPricing,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
|
||||
parameterSpecs: [{ paramId: 'llmVndGeminiThinkingBudget' }],
|
||||
parameterSpecs: [{ paramId: 'llmVndGeminiAgentViz' }, { paramId: 'llmVndGeminiThinkingBudget' }],
|
||||
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
|
||||
// Note: 128K input context, 64K output context
|
||||
},
|
||||
@@ -418,6 +449,7 @@ const _knownGeminiModels: ({
|
||||
hidden: true, // outperformed by 3 Flash Preview (1474 vs 1411) - deprecated in 2 months
|
||||
id: 'models/gemini-2.5-flash',
|
||||
labelOverride: 'Gemini 2.5 Flash',
|
||||
pubDate: '20250617',
|
||||
deprecated: '2026-06-17',
|
||||
chatPrice: gemini25FlashPricing,
|
||||
interfaces: IF_25,
|
||||
@@ -445,6 +477,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-2.5-computer-use-preview-10-2025',
|
||||
labelOverride: 'Gemini 2.5 Computer Use Preview 10-2025',
|
||||
pubDate: '20251007',
|
||||
isPreview: true,
|
||||
chatPrice: gemini25ProPricing, // Uses same pricing as 2.5 Pro (pricing page doesn't list separately)
|
||||
// NOTE: sweep shows fn=['auto'] only (no 'roundtrip') - partial Fn capability, do not advertise LLM_IF_OAI_Fn
|
||||
@@ -462,6 +495,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-robotics-er-1.6-preview',
|
||||
labelOverride: 'Gemini Robotics-ER 1.6 Preview',
|
||||
pubDate: '20260414',
|
||||
isPreview: true,
|
||||
chatPrice: geminiRoboticsER16Pricing,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_OAI_Reasoning],
|
||||
@@ -474,6 +508,7 @@ const _knownGeminiModels: ({
|
||||
hidden: true, // superseded by Robotics-ER 1.6 - shutdown April 30, 2026
|
||||
id: 'models/gemini-robotics-er-1.5-preview',
|
||||
labelOverride: 'Gemini Robotics-ER 1.5 Preview',
|
||||
pubDate: '20250925',
|
||||
isPreview: true,
|
||||
deprecated: '2026-04-30',
|
||||
chatPrice: gemini25FlashPricing, // Uses same pricing as 2.5 Flash per pricing page
|
||||
@@ -486,6 +521,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-2.5-flash-image',
|
||||
labelOverride: 'Nano Banana',
|
||||
pubDate: '20251002',
|
||||
deprecated: '2026-10-02',
|
||||
chatPrice: { input: 0.30, output: undefined }, // Per pricing page: $0.30 text/image input, $0.039 per image output, but the text output is not stated
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
|
||||
@@ -506,6 +542,7 @@ const _knownGeminiModels: ({
|
||||
hidden: true, // audio outputs are unavailable
|
||||
id: 'models/gemini-3.1-flash-tts-preview',
|
||||
labelOverride: 'Gemini 3.1 Flash TTS Preview',
|
||||
pubDate: '20260415',
|
||||
isPreview: true,
|
||||
chatPrice: gemini31FlashTTSPricing,
|
||||
interfaces: [
|
||||
@@ -521,6 +558,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // audio outputs are unavailable as of 2025-05-27
|
||||
id: 'models/gemini-2.5-flash-preview-tts',
|
||||
pubDate: '20250520',
|
||||
isPreview: true,
|
||||
chatPrice: gemini25FlashPreviewTTSPricing,
|
||||
interfaces: [
|
||||
@@ -548,6 +586,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
id: 'models/gemini-2.5-flash-lite',
|
||||
labelOverride: 'Gemini 2.5 Flash-Lite',
|
||||
pubDate: '20250722',
|
||||
deprecated: '2026-07-22',
|
||||
chatPrice: gemini25FlashLitePricing,
|
||||
interfaces: IF_25,
|
||||
@@ -580,6 +619,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // outclassed by all Flash models in 2.5/3.x series - shutdown in ~5 weeks
|
||||
id: 'models/gemini-2.0-flash-001',
|
||||
pubDate: '20250205',
|
||||
deprecated: '2026-06-01',
|
||||
chatPrice: gemini20FlashPricing,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_GEM_CodeExecution],
|
||||
@@ -588,6 +628,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // outclassed by all Flash models in 2.5/3.x series - shutdown in ~5 weeks
|
||||
id: 'models/gemini-2.0-flash',
|
||||
pubDate: '20250205',
|
||||
symLink: 'models/gemini-2.0-flash-001',
|
||||
deprecated: '2026-06-01',
|
||||
// copied from symlink
|
||||
@@ -600,6 +641,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // outclassed by 2.5/3.1 Flash-Lite - shutdown in ~5 weeks
|
||||
id: 'models/gemini-2.0-flash-lite',
|
||||
pubDate: '20250225',
|
||||
chatPrice: gemini20FlashLitePricing,
|
||||
symLink: 'models/gemini-2.0-flash-lite-001',
|
||||
deprecated: '2026-06-01',
|
||||
@@ -609,6 +651,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // outclassed by 2.5/3.1 Flash-Lite - shutdown in ~5 weeks
|
||||
id: 'models/gemini-2.0-flash-lite-001',
|
||||
pubDate: '20250225',
|
||||
chatPrice: gemini20FlashLitePricing,
|
||||
deprecated: '2026-06-01',
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn],
|
||||
@@ -648,6 +691,7 @@ const _knownGeminiModels: ({
|
||||
// Gemma 4 Models - Released April 2, 2026
|
||||
{
|
||||
id: 'models/gemma-4-31b-it',
|
||||
pubDate: '20260402',
|
||||
isPreview: true,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
|
||||
parameterSpecs: [{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'high'] }],
|
||||
@@ -657,6 +701,7 @@ const _knownGeminiModels: ({
|
||||
{
|
||||
hidden: true, // smaller MoE variant
|
||||
id: 'models/gemma-4-26b-a4b-it',
|
||||
pubDate: '20260402',
|
||||
isPreview: true,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
parameterSpecs: [{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'high'] }],
@@ -667,6 +712,7 @@ const _knownGeminiModels: ({
// Gemma 3n Model (newer than 3, first seen on the May 2025 update)
{
id: 'models/gemma-3n-e4b-it',
pubDate: '20250626',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Free tier only according to pricing page
@@ -674,6 +720,7 @@ const _knownGeminiModels: ({
},
{
id: 'models/gemma-3n-e2b-it',
pubDate: '20250626',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Free tier only according to pricing page
@@ -685,6 +732,7 @@ const _knownGeminiModels: ({
// - LLM_IF_HOTFIX_Sys0ToUsr0, because: "Developer instruction is not enabled for models/gemma-3-27b-it"
{
id: 'models/gemma-3-27b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Pricing page indicates free tier only
@@ -694,6 +742,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-12b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -702,6 +751,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-4b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -710,6 +760,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-1b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -781,6 +832,7 @@ const _sortOderIdPrefix: string[] = [
'models/gemini-3.1-pro-preview-customtools',
'models/gemini-3.1-flash-image-preview',
'models/gemini-3.1-flash-preview',
'models/gemini-3.1-flash-lite',
'models/gemini-3.1-flash-lite-preview',
'models/gemini-3.1-flash-tts-preview',
'models/gemini-3.1-',
@@ -948,6 +1000,7 @@ export function geminiModelToModelDescription(geminiModel: GeminiWire_API_Models
label: label,
// created: ...
// updated: ...
pubDate: knownModel?.pubDate ?? formatPubDate(), // 0-day fallback; the editorial entry is the source of truth; today's date is a placeholder until editorial catches up
description: descriptionLong,
contextWindow: contextWindow,
maxCompletionTokens: outputTokenLimit,
@@ -1035,5 +1088,5 @@ export function llmOrtGemLookup(orModelName: string): OrtVendorLookupResult | un
?.filter(spec => _ORT_GEM_PARAM_ALLOWLIST.has(spec.paramId))
.map(spec => ({ ...spec }));

return { interfaces, parameterSpecs, initialTemperature: GEMINI_DEFAULT_TEMPERATURE };
return { pubDate: knownModel.pubDate, interfaces, parameterSpecs, initialTemperature: GEMINI_DEFAULT_TEMPERATURE };
}

@@ -94,6 +94,7 @@ const ModelParameterSpec_schema = z.object({
// Bedrock
'llmVndBedrockAPI',
// Gemini
'llmVndGeminiAgentViz',
'llmVndGeminiAspectRatio',
'llmVndGeminiCodeExecution',
'llmVndGeminiComputerUse',
@@ -137,6 +138,7 @@ export const ModelDescription_schema = z.object({
label: z.string(),
created: z.int().optional(),
updated: z.int().optional(),
pubDate: z.string().regex(/^\d{8}$/).optional(), // editorial: model's official public release date 'YYYYMMDD'. Required for editorial entries (KnownModelEditorial) and for 0-day-fillable paths (Anthropic placeholder, Gemini unknown-model fallback). Omitted for dynamic-only vendors and unknown variants where we have no reliable signal.
description: z.string(),
contextWindow: z.int().nullable(),
interfaces: z.array(z.enum(LLMS_ALL_INTERFACES).or(z.string())), // backward compatibility: to not Break client-side interface parsing on newer server
@@ -155,6 +157,7 @@ export const ModelDescription_schema = z.object({
// Each vendor's lookup filters to only what works through OpenRouter's OAI-compatible API.
// OpenRouter merges these with its own auto-detected interfaces and params.
export type OrtVendorLookupResult = {
pubDate?: ModelDescriptionSchema['pubDate'];
interfaces?: ModelDescriptionSchema['interfaces'];
parameterSpecs?: ModelDescriptionSchema['parameterSpecs'];
initialTemperature?: number; // vendor-specific default (e.g. Gemini 1.0); undefined = use global fallback (0.5)

@@ -111,6 +111,28 @@ export function llmDevValidateParameterSpecs_DEV(model: ModelDescriptionSchema):
}


// -- pubDate helpers --

/**
* Format an epoch / Date / nothing as 'YYYYMMDD'.
* Accepts either a Unix epoch (seconds), a Date, or undefined (-> today).
*/
export function formatPubDate(input?: number | Date): string {
let date: Date;
if (input instanceof Date && Number.isFinite(input.getTime()))
date = input;
else if (typeof input === 'number' && Number.isFinite(input) && input > 0) {
const candidate = new Date(input * 1000);
date = Number.isFinite(candidate.getTime()) ? candidate : new Date();
} else
date = new Date();
const y = date.getUTCFullYear();
const m = String(date.getUTCMonth() + 1).padStart(2, '0');
const d = String(date.getUTCDate()).padStart(2, '0');
return `${y}${m}${d}`;
}


// -- Manual model mappings: types and helper --

export type ManualMappings = (KnownModel | KnownLink)[];
@@ -224,6 +246,7 @@ export function fromManualMapping(mappings: (KnownModel | KnownLink)[], upstream
};

// apply optional fields
if (m.pubDate) md.pubDate = m.pubDate;
if (m.parameterSpecs) md.parameterSpecs = m.parameterSpecs;
if (m.maxCompletionTokens) md.maxCompletionTokens = m.maxCompletionTokens;
if (m.benchmark) md.benchmark = m.benchmark;

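The `formatPubDate` helper added in the hunk above can be exercised standalone. This sketch mirrors its logic verbatim for illustration (outside the diff; not part of the patch):

```typescript
// Standalone mirror of the formatPubDate helper from the diff above, for illustration.
// Renders a Unix epoch (seconds), a Date, or nothing (-> today) as 'YYYYMMDD' in UTC.
function formatPubDate(input?: number | Date): string {
  let date: Date;
  if (input instanceof Date && Number.isFinite(input.getTime()))
    date = input;
  else if (typeof input === 'number' && Number.isFinite(input) && input > 0) {
    const candidate = new Date(input * 1000); // epoch is in seconds, Date wants ms
    date = Number.isFinite(candidate.getTime()) ? candidate : new Date();
  } else
    date = new Date(); // no/invalid input -> today (the '0-day fallback')
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  const d = String(date.getUTCDate()).padStart(2, '0');
  return `${y}${m}${d}`;
}

console.log(formatPubDate(new Date(Date.UTC(2025, 2, 12)))); // -> '20250312'
console.log(formatPubDate(1733097600)); // 2024-12-02T00:00:00Z -> '20241202'
```

Note the UTC accessors: using local-time getters would shift the date by a day near midnight for some timezones, which matters for a release-date stamp.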
@@ -20,6 +20,7 @@ const _knownDeepseekChatModels: ManualMappings = [
{
idPrefix: 'deepseek-v4-pro',
label: 'DeepSeek V4 Pro',
pubDate: '20260424',
description: 'Premium reasoning model with 1M context. Supports extended thinking modes, JSON output, and function calling.',
contextWindow: 1_048_576, // 1M
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],
@@ -33,6 +34,7 @@ const _knownDeepseekChatModels: ManualMappings = [
{
idPrefix: 'deepseek-v4-flash',
label: 'DeepSeek V4 Flash',
pubDate: '20260424',
description: 'Fast general-purpose model with 1M context. Supports extended thinking modes, JSON output, and function calling.',
contextWindow: 1_048_576, // 1M
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],

@@ -23,6 +23,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'meta-llama/llama-4-scout-17b-16e-instruct',
label: 'Llama 4 Scout · 17B × 16E (Preview)',
pubDate: '20250405',
description: 'Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -33,6 +34,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'qwen/qwen3-32b',
label: 'Qwen 3 · 32B (Preview)',
pubDate: '20250428',
description: 'Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 40960,
@@ -43,6 +45,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'moonshotai/kimi-k2-instruct-0905',
label: 'Kimi K2 Instruct 0905 (Preview)',
pubDate: '20250905',
description: 'Kimi K2 1T MoE model (32B active, 384 experts). Advanced agentic coding. 262K context, 16K max output. ~200 t/s on Groq.',
contextWindow: 262144,
maxCompletionTokens: 16384,
@@ -53,6 +56,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'moonshotai/kimi-k2-instruct',
label: 'Kimi K2 Instruct (Deprecated)',
pubDate: '20250711',
symLink: 'moonshotai/kimi-k2-instruct-0905',
contextWindow: 131072, // API returns 131K (vs 262K for the 0905 version)
maxCompletionTokens: 16384,
@@ -69,6 +73,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'groq/compound',
label: 'Compound (Agentic System)',
pubDate: '20250904',
description: 'Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -78,6 +83,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'groq/compound-mini',
label: 'Compound Mini (Agentic System)',
pubDate: '20250904',
description: 'Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -89,6 +95,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'openai/gpt-oss-120b',
label: 'GPT OSS 120B',
pubDate: '20250805',
description: 'OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -99,6 +106,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'openai/gpt-oss-safeguard-20b',
label: 'GPT OSS Safeguard 20B (Preview)',
pubDate: '20251029',
description: 'OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -108,6 +116,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'openai/gpt-oss-20b',
label: 'GPT OSS 20B',
pubDate: '20250805',
description: 'OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -120,6 +129,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'llama-3.3-70b-versatile',
label: 'Llama 3.3 · 70B Versatile',
pubDate: '20241206',
description: 'Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 32768,
@@ -129,6 +139,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'llama-3.1-8b-instant',
label: 'Llama 3.1 · 8B Instant',
pubDate: '20240723',
description: 'Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 131072,

@@ -22,6 +22,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.7',
label: 'MiniMax M2.7',
pubDate: '20260318',
description: 'Latest flagship with recursive self-improvement and agentic capabilities. 200K context, 131K max output. ~60 t/s.',
contextWindow: 204800,
maxCompletionTokens: 131072,
@@ -31,6 +32,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.7-highspeed',
label: 'MiniMax M2.7 (Highspeed)',
pubDate: '20260318',
description: 'Faster M2.7 variant at ~100 t/s. 200K context, 131K max output.',
contextWindow: 204800,
maxCompletionTokens: 131072,
@@ -42,6 +44,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.5',
label: 'MiniMax M2.5',
pubDate: '20260212',
description: 'Strong coding and reasoning, best value. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -51,6 +54,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.5-highspeed',
label: 'MiniMax M2.5 (Highspeed)',
pubDate: '20260212',
description: 'Faster M2.5 variant at ~100 t/s. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -62,6 +66,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2-her',
label: 'MiniMax M2-her',
pubDate: '20260127',
description: 'Dialogue-first model for immersive roleplay, character-driven chat, and expressive multi-turn conversations. 64K context.',
contextWindow: 65536,
maxCompletionTokens: 2048,
@@ -73,6 +78,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.1',
label: 'MiniMax M2.1',
pubDate: '20251223',
description: '230B params (10B active), multilingual coding. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -83,6 +89,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.1-highspeed',
label: 'MiniMax M2.1 (Highspeed)',
pubDate: '20251223',
description: 'Faster M2.1 variant. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -95,6 +102,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2',
label: 'MiniMax M2',
pubDate: '20251027',
description: '230B params (10B active), agentic and reasoning. 200K context, 128K max output.',
contextWindow: 204800,
maxCompletionTokens: 128000,
@@ -107,6 +115,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M1',
label: 'MiniMax M1',
pubDate: '20250616',
description: '456B total / 45.9B active MoE with lightning attention. 1M context, 40K max output.',
contextWindow: 1000000,
maxCompletionTokens: 40000,
@@ -119,6 +128,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-01',
label: 'MiniMax 01',
pubDate: '20250114',
description: 'Legacy flagship. 1M context.',
contextWindow: 1000192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],

@@ -19,80 +19,81 @@ const DEV_DEBUG_MISTRAL_MODELS = Release.IsNodeDevBuild; // not in staging to re

const _knownMistralModelDetails: Record<string, {
label?: string; // override the API-provided name
pubDate?: string; // YYYYMMDD - earliest public availability (announcement / La Plateforme / HF upload)
chatPrice?: { input: number; output: number };
benchmark?: { cbaElo: number };
hidden?: boolean;
}> = {

// Premier models - Mistral 3 (Dec 2025)
'mistral-large-2512': { chatPrice: { input: 0.5, output: 1.5 }, benchmark: { cbaElo: 1415 } }, // Mistral Large 3 - MoE 41B active / 675B total
'mistral-large-2411': { chatPrice: { input: 2, output: 6 }, benchmark: { cbaElo: 1305 }, hidden: true }, // older version
'mistral-large-latest': { chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // → 2512
'mistral-large-2512': { pubDate: '20251202', chatPrice: { input: 0.5, output: 1.5 }, benchmark: { cbaElo: 1415 } }, // Mistral Large 3 - MoE 41B active / 675B total
'mistral-large-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, benchmark: { cbaElo: 1305 }, hidden: true }, // older version
'mistral-large-latest': { pubDate: '20251202', chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // → 2512

'mistral-medium-2508': { chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1410 } }, // Mistral Medium 3
'mistral-medium-2505': { chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1387 }, hidden: true }, // older version
'mistral-medium-latest': { chatPrice: { input: 0.4, output: 2 }, hidden: true }, // → 2508
'mistral-medium': { chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-medium-2508': { pubDate: '20250812', chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1410 } }, // Mistral Medium 3.1
'mistral-medium-2505': { pubDate: '20250507', chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1387 }, hidden: true }, // Mistral Medium 3
'mistral-medium-latest': { pubDate: '20250812', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // → 2508
'mistral-medium': { pubDate: '20231211', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink (legacy: original Mistral Medium prototype on La Plateforme beta)

'magistral-medium-2509': { chatPrice: { input: 2, output: 5 }, benchmark: { cbaElo: 1304 } }, // reasoning (leaderboard: magistral-medium-2506 = 1304)
'magistral-medium-latest': { chatPrice: { input: 2, output: 5 }, hidden: true }, // symlink
'magistral-medium-2509': { pubDate: '20250917', chatPrice: { input: 2, output: 5 }, benchmark: { cbaElo: 1304 } }, // reasoning (leaderboard: magistral-medium-2506 = 1304)
'magistral-medium-latest': { pubDate: '20250917', chatPrice: { input: 2, output: 5 }, hidden: true }, // symlink

'devstral-2512': { label: 'Devstral 2 (2512)', chatPrice: { input: 0.4, output: 2 } }, // Devstral 2 - 123B coding agents (API returns "Mistral Vibe Cli")
'devstral-latest': { label: 'Devstral 2 (latest)', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'devstral-medium-latest': { label: 'Devstral 2 (latest)', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-vibe-cli-latest': { label: 'Devstral 2 (latest)', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // alternate ID for devstral-latest
'devstral-medium-2507': { chatPrice: { input: 0.4, output: 2 }, hidden: true }, // older version
'devstral-2512': { label: 'Devstral 2 (2512)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 } }, // Devstral 2 - 123B coding agents (API returns "Mistral Vibe Cli")
'devstral-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'devstral-medium-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-vibe-cli-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // alternate ID for devstral-latest
'devstral-medium-2507': { pubDate: '20250710', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // older version

'mistral-large-pixtral-2411': { chatPrice: { input: 2, output: 6 } }, // Pixtral Large (alternate ID)
'pixtral-large-2411': { chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'pixtral-large-latest': { chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'mistral-large-pixtral-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 } }, // Pixtral Large (alternate ID)
'pixtral-large-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'pixtral-large-latest': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink

'codestral-2508': { chatPrice: { input: 0.3, output: 0.9 } }, // code generation
'codestral-latest': { chatPrice: { input: 0.3, output: 0.9 }, hidden: true }, // symlink
'codestral-2508': { pubDate: '20250730', chatPrice: { input: 0.3, output: 0.9 } }, // code generation (Codestral 25.08)
'codestral-latest': { pubDate: '20250730', chatPrice: { input: 0.3, output: 0.9 }, hidden: true }, // symlink

'voxtral-small-2507': { chatPrice: { input: 0.1, output: 0.3 } }, // voice (text tokens)
'voxtral-small-latest': { chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'voxtral-small-2507': { pubDate: '20250715', chatPrice: { input: 0.1, output: 0.3 } }, // voice (text tokens)
'voxtral-small-latest': { pubDate: '20250715', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink

'voxtral-mini-2507': { chatPrice: { input: 0.04, output: 0.04 } }, // voice (text tokens)
'voxtral-mini-latest': { chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // symlink
'voxtral-mini-2507': { pubDate: '20250715', chatPrice: { input: 0.04, output: 0.04 } }, // voice (text tokens)
'voxtral-mini-latest': { pubDate: '20250715', chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // symlink

// Ministral 3 family (Dec 2025) - multimodal, multilingual, Apache 2.0
'ministral-14b-2512': { chatPrice: { input: 0.2, output: 0.2 } }, // Ministral 3 14B
'ministral-14b-latest': { chatPrice: { input: 0.2, output: 0.2 }, hidden: true }, // symlink
'ministral-14b-2512': { pubDate: '20251202', chatPrice: { input: 0.2, output: 0.2 } }, // Ministral 3 14B
'ministral-14b-latest': { pubDate: '20251202', chatPrice: { input: 0.2, output: 0.2 }, hidden: true }, // symlink

'ministral-8b-2512': { chatPrice: { input: 0.15, output: 0.15 } }, // Ministral 3 8B
'ministral-8b-2410': { chatPrice: { input: 0.1, output: 0.1 }, benchmark: { cbaElo: 1237 }, hidden: true }, // older version
'ministral-8b-latest': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'ministral-8b-2512': { pubDate: '20251202', chatPrice: { input: 0.15, output: 0.15 } }, // Ministral 3 8B
'ministral-8b-2410': { pubDate: '20241016', chatPrice: { input: 0.1, output: 0.1 }, benchmark: { cbaElo: 1237 }, hidden: true }, // older version
'ministral-8b-latest': { pubDate: '20251202', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink

'ministral-3b-2512': { chatPrice: { input: 0.1, output: 0.1 } }, // Ministral 3 3B
'ministral-3b-2410': { chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // older version
'ministral-3b-latest': { chatPrice: { input: 0.1, output: 0.1 }, hidden: true }, // symlink
'ministral-3b-2512': { pubDate: '20251202', chatPrice: { input: 0.1, output: 0.1 } }, // Ministral 3 3B
'ministral-3b-2410': { pubDate: '20241016', chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // older version
'ministral-3b-latest': { pubDate: '20251202', chatPrice: { input: 0.1, output: 0.1 }, hidden: true }, // symlink

// Open models
'mistral-small-2603': { chatPrice: { input: 0.15, output: 0.6 } }, // Mistral Small 4 - 119B hybrid (instruct+reasoning+coding), 256k ctx
'mistral-small-2506': { chatPrice: { input: 0.1, output: 0.3 }, benchmark: { cbaElo: 1357 }, hidden: true }, // Mistral Small 3.2
'mistral-small-latest': { chatPrice: { input: 0.15, output: 0.6 }, hidden: true }, // → 2603
'mistral-small-2603': { pubDate: '20260316', chatPrice: { input: 0.15, output: 0.6 } }, // Mistral Small 4 - 119B hybrid (instruct+reasoning+coding), 256k ctx
'mistral-small-2506': { pubDate: '20250620', chatPrice: { input: 0.1, output: 0.3 }, benchmark: { cbaElo: 1357 }, hidden: true }, // Mistral Small 3.2
'mistral-small-latest': { pubDate: '20260316', chatPrice: { input: 0.15, output: 0.6 }, hidden: true }, // → 2603

'labs-mistral-small-creative': { label: 'Mistral Small Creative', chatPrice: { input: 0.1, output: 0.3 } }, // creative writing, roleplay (Labs)
'labs-mistral-small-creative': { label: 'Mistral Small Creative', pubDate: '20251211', chatPrice: { input: 0.1, output: 0.3 } }, // creative writing, roleplay (Labs)

'labs-leanstral-2603': { label: 'Leanstral (2603)', chatPrice: { input: 0, output: 0 } }, // Lean 4 formal proof engineering (Labs, free for limited period)
'labs-leanstral-2603': { label: 'Leanstral (2603)', pubDate: '20260316', chatPrice: { input: 0, output: 0 } }, // Lean 4 formal proof engineering (Labs, free for limited period)

'magistral-small-2509': { chatPrice: { input: 0.5, output: 1.5 } }, // reasoning
'magistral-small-latest': { chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // symlink
'magistral-small-2509': { pubDate: '20250917', chatPrice: { input: 0.5, output: 1.5 } }, // reasoning
'magistral-small-latest': { pubDate: '20250917', chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // symlink

'labs-devstral-small-2512': { label: 'Devstral Small 2 (2512)', chatPrice: { input: 0.1, output: 0.3 } }, // Devstral Small 2 - 24B coding agents (Labs)
'devstral-small-2507': { chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // older version
'devstral-small-latest': { label: 'Devstral Small 2 (latest)', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'labs-devstral-small-2512': { label: 'Devstral Small 2 (2512)', pubDate: '20251209', chatPrice: { input: 0.1, output: 0.3 } }, // Devstral Small 2 - 24B coding agents (Labs)
'devstral-small-2507': { pubDate: '20250710', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // older version (Devstral Small 1.1)
'devstral-small-latest': { label: 'Devstral Small 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink

'pixtral-12b-2409': { chatPrice: { input: 0.15, output: 0.15 } }, // vision
'pixtral-12b-latest': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b-2409': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 } }, // vision
'pixtral-12b-latest': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink

'open-mistral-nemo-2407': { chatPrice: { input: 0.15, output: 0.15 } }, // NeMo
'open-mistral-nemo': { chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'open-mistral-nemo-2407': { pubDate: '20240718', chatPrice: { input: 0.15, output: 0.15 } }, // NeMo
'open-mistral-nemo': { pubDate: '20240718', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink

// Legacy (kept for reference, no longer in API)
'open-mistral-7b': { chatPrice: { input: 0.25, output: 0.25 }, hidden: true },
'open-mistral-7b': { pubDate: '20230927', chatPrice: { input: 0.25, output: 0.25 }, hidden: true },
};

@@ -28,7 +28,8 @@ const _PS_Reasoning: ModelDescriptionSchema['parameterSpecs'] = [
|
||||
* Moonshot AI (Kimi) models.
|
||||
* - models list and pricing: https://platform.kimi.ai/docs/pricing/chat (was platform.moonshot.ai - now 301 redirect)
|
||||
* - API docs: https://platform.kimi.ai/docs/api/chat
|
||||
* - updated: 2026-04-20
|
||||
* - updated: 2026-05-04
|
||||
* - NOTE: K2 series (non-2.5/2.6) is scheduled for discontinuation on 2026-05-25 per Moonshot docs.
|
||||
*/
|
||||
const _knownMoonshotModels: ManualMappings = [
|
||||
|
||||
@@ -36,6 +37,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
{
|
||||
idPrefix: 'kimi-k2.6',
|
||||
label: 'Kimi K2.6',
|
||||
pubDate: '20260420',
|
||||
description: 'Native multimodal flagship (text, image, video inputs) with thinking and non-thinking modes. Stronger long-form coding, improved instruction compliance and self-correction. 256K context.',
|
||||
contextWindow: 262144,
|
||||
maxCompletionTokens: 32768,
|
||||
@@ -49,6 +51,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
{
|
||||
idPrefix: 'kimi-k2.5',
|
||||
label: 'Kimi K2.5',
|
||||
pubDate: '20260127',
|
||||
description: 'Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.',
|
||||
contextWindow: 262144,
|
||||
maxCompletionTokens: 32768,
|
||||
@@ -58,12 +61,13 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
benchmark: { cbaElo: 1451 }, // kimi-k2.5-thinking
|
||||
},
|
||||
|
||||
// Kimi K2 Series - Latest Models
|
||||
// Kimi K2 Series - scheduled for discontinuation on 2026-05-25
|
||||
|
||||
// Fast, Thinking
|
||||
{
|
||||
idPrefix: 'kimi-k2-thinking-turbo',
|
||||
label: 'Kimi K2 Thinking Turbo',
|
||||
pubDate: '20251106',
|
||||
description: 'High-speed reasoning model with advanced thinking and tool calling capabilities. Faster inference (~50 tok/s) with optimized performance. 256K context. Temperature 1.0 recommended.',
|
||||
contextWindow: 262144,
|
||||
maxCompletionTokens: 65536,
|
||||
@@ -76,6 +80,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
{
|
||||
idPrefix: 'kimi-k2-thinking',
|
||||
label: 'Kimi K2 Thinking',
|
||||
pubDate: '20251106',
|
||||
description: 'Advanced reasoning model with multi-step thinking and autonomous tool calling (200-300 sequential calls). Interleaves chain-of-thought with tool use. 256K context. Temperature 1.0 recommended.',
|
||||
contextWindow: 262144,
|
||||
maxCompletionTokens: 65536,
|
||||
@@ -89,6 +94,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
{
|
||||
idPrefix: 'kimi-k2-0905-preview',
|
||||
label: 'Kimi K2 0905 (Preview)',
|
||||
pubDate: '20250905',
|
||||
description: 'State-of-the-art MoE model (1T total, 32B active) with extended 256K context. Enhanced agentic coding intelligence and improved instruction following.',
|
||||
contextWindow: 262144,
|
||||
maxCompletionTokens: 32768,
|
||||
@@ -102,6 +108,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
hidden: true,
|
||||
idPrefix: 'kimi-k2-0711-preview',
|
||||
label: 'Kimi K2 0711 (Preview)',
|
||||
pubDate: '20250711',
|
||||
description: 'Earlier preview variant with 128K context. Superseded by 0905 version.',
|
||||
contextWindow: 131072,
|
||||
maxCompletionTokens: 16384,
|
||||
@@ -114,6 +121,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
{
|
||||
idPrefix: 'kimi-k2-turbo-preview',
|
||||
label: 'Kimi K2 Turbo (Preview)',
|
||||
pubDate: '20250801',
|
||||
description: 'High-speed variant with 60-100 tokens/second output. 256K context. Optimized for real-time applications and agentic tasks.',
|
||||
contextWindow: 262144,
|
||||
maxCompletionTokens: 32768,
|
||||
@@ -127,6 +135,7 @@ const _knownMoonshotModels: ManualMappings = [
|
||||
{
|
||||
idPrefix: 'moonshot-v1-128k',
|
||||
label: 'V1 128K',
|
||||
pubDate: '20240206',
|
||||
description: 'Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.',
|
||||
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -136,6 +145,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-32k',
label: 'V1 32K',
+ pubDate: '20240206',
description: 'Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -145,6 +155,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-8k',
label: 'V1 8K',
+ pubDate: '20240206',
description: 'Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 8192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -157,6 +168,7 @@ const _knownMoonshotModels: ManualMappings = [
// hidden: false, not hidden - only non-hidden vision for now
idPrefix: 'moonshot-v1-128k-vision-preview',
label: 'V1 128K Vision (Preview)',
+ pubDate: '20250115',
description: 'Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -166,6 +178,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-32k-vision-preview',
label: 'V1 32K Vision (Preview)',
+ pubDate: '20250115',
description: 'Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -176,6 +189,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-8k-vision-preview',
label: 'V1 8K Vision (Preview)',
+ pubDate: '20250115',
description: 'Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 8192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],

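The Moonshot hunks above add an editorial `pubDate` to each prefix-keyed manual mapping. A minimal sketch of how such `idPrefix` entries might resolve against a concrete model id — `ManualMappingSketch` and `resolveByPrefix` are illustrative names, not the repository's actual types:

```typescript
// Hedged sketch: minimal shape of a manual mapping entry (field names taken
// from the diff above; the real ManualMappings type carries more fields).
interface ManualMappingSketch {
  idPrefix: string;
  label: string;
  pubDate?: string; // compact 'YYYYMMDD' editorial release date
  description: string;
  contextWindow: number;
}

// Resolve a model id against the mappings by longest matching prefix, so
// 'moonshot-v1-32k-vision-preview' wins over the shorter 'moonshot-v1-32k'.
function resolveByPrefix(modelId: string, mappings: ManualMappingSketch[]): ManualMappingSketch | undefined {
  return mappings
    .filter((m) => modelId.startsWith(m.idPrefix))
    .sort((a, b) => b.idPrefix.length - a.idPrefix.length)[0];
}

const sketchMappings: ManualMappingSketch[] = [
  { idPrefix: 'moonshot-v1-32k', label: 'V1 32K', pubDate: '20240206', description: 'Legacy V1 model with 32K context.', contextWindow: 32768 },
  { idPrefix: 'moonshot-v1-32k-vision-preview', label: 'V1 32K Vision (Preview)', pubDate: '20250115', description: 'Legacy vision model with 32K context.', contextWindow: 32768 },
];
```

Longest-prefix matching matters here precisely because the vision-preview ids share the plain `moonshot-v1-*` prefix.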
@@ -111,6 +111,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.5-2026-04-23',
label: 'GPT-5.5 (2026-04-23)',
+ pubDate: '20260423',
description: 'New baseline for complex production workflows. Stronger task execution, more precise tool use, more efficient reasoning with fewer tokens. 1M token context.',
contextWindow: 1050000,
maxCompletionTokens: 128000,
@@ -136,6 +137,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.5-pro-2026-04-23',
label: 'GPT-5.5 Pro (2026-04-23)',
+ pubDate: '20260423',
description: 'Most capable model for complex tasks. Uses more compute for smarter, more precise responses on the hardest problems.',
contextWindow: 1050000,
maxCompletionTokens: 272000,
@@ -163,6 +165,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-2026-03-05',
label: 'GPT-5.4 (2026-03-05)',
+ pubDate: '20260305',
description: 'Most capable and efficient frontier model for professional work. Native computer use, improved reasoning, coding, and agentic workflows with 1M token context.',
contextWindow: 1050000,
maxCompletionTokens: 128000,
@@ -188,6 +191,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-pro-2026-03-05',
label: 'GPT-5.4 Pro (2026-03-05)',
+ pubDate: '20260305',
description: 'Most capable model for complex tasks. Uses more compute for smarter, more precise responses on difficult problems.',
contextWindow: 1050000,
maxCompletionTokens: 272000,
@@ -212,6 +216,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-mini-2026-03-17',
label: 'GPT-5.4 Mini (2026-03-17)',
+ pubDate: '20260317',
description: 'Strongest mini model for coding, computer use, and subagents. GPT-5.4-class intelligence at lower cost and latency.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -237,6 +242,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-nano-2026-03-17',
label: 'GPT-5.4 Nano (2026-03-17)',
+ pubDate: '20260317',
description: 'Cheapest GPT-5.4-class model for simple high-volume tasks like classification and data extraction.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -265,6 +271,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.3-codex',
label: 'GPT-5.3 Codex',
+ pubDate: '20260205',
description: 'Most capable agentic coding model. Combines frontier coding performance of GPT-5.2-Codex with reasoning and professional knowledge of GPT-5.2. ~25% faster.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -285,6 +292,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // Research preview, ChatGPT Pro only - API access limited to design partners
idPrefix: 'gpt-5.3-codex-spark',
label: 'GPT-5.3 Codex Spark',
+ pubDate: '20260212',
description: 'Text-only research preview optimized for real-time coding iteration. Delivers 1000+ tokens/sec on low-latency hardware.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -297,10 +305,11 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: TBD
},

- // GPT-5.3 Chat Latest - Released March 4, 2026
+ // GPT-5.3 Chat Latest - Released March 3, 2026
{
idPrefix: 'gpt-5.3-chat-latest',
label: 'GPT-5.3 Instant',
+ pubDate: '20260303',
description: 'GPT-5.3 model powering ChatGPT. Points to the GPT-5.3 Instant snapshot currently used in ChatGPT.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -322,6 +331,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.2-2025-12-11',
label: 'GPT-5.2 (2025-12-11)',
+ pubDate: '20251211',
description: 'Most capable model for professional work and long-running agents. Improvements in general intelligence, long-context, agentic tool-calling, and vision.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -349,6 +359,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.2-codex',
label: 'GPT-5.2 Codex',
+ pubDate: '20251211',
description: 'GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar environments. Supports low, medium, high, and xhigh reasoning effort settings.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -368,6 +379,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Instant
idPrefix: 'gpt-5.2-chat-latest',
label: 'GPT-5.2 Instant',
+ pubDate: '20251211',
description: 'GPT-5.2 model powering ChatGPT. Fast, capable for everyday work with clear improvements in info-seeking, how-tos, technical writing.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -387,6 +399,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5.2-pro-2025-12-11',
label: 'GPT-5.2 Pro (2025-12-11)',
+ pubDate: '20251211',
description: 'Smartest and most trustworthy option for difficult questions. Uses more compute for harder thinking on complex domains like programming.',
contextWindow: 400000,
maxCompletionTokens: 272000,
@@ -416,6 +429,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.1-2025-11-13',
label: 'GPT-5.1 (2025-11-13)',
+ pubDate: '20251113',
description: 'The best model for coding and agentic tasks with configurable reasoning effort.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -442,6 +456,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Instant
idPrefix: 'gpt-5.1-chat-latest',
label: 'GPT-5.1 Instant',
+ pubDate: '20251112',
description: 'GPT-5.1 Instant with adaptive reasoning. More conversational with improved instruction following.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -462,6 +477,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex-max',
label: 'GPT-5.1 Codex Max',
+ pubDate: '20251119',
description: 'Our most intelligent coding model optimized for long-horizon, agentic coding tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -480,6 +496,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex',
label: 'GPT-5.1 Codex',
+ pubDate: '20251113',
description: 'A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -498,6 +515,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex-mini',
label: 'GPT-5.1 Codex Mini',
+ pubDate: '20251113',
description: 'Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -520,6 +538,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5-2025-08-07',
label: 'GPT-5 (2025-08-07)',
+ pubDate: '20250807',
description: 'The best model for coding and agentic tasks across domains.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -546,6 +565,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5-pro-2025-10-06',
label: 'GPT-5 Pro (2025-10-06)',
+ pubDate: '20251006',
description: 'Version of GPT-5 that uses more compute to produce smarter and more precise responses. Designed for tough problems.',
contextWindow: 400000,
maxCompletionTokens: 272000,
@@ -566,6 +586,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // deprecated per OpenAI docs (2026-04)
idPrefix: 'gpt-5-chat-latest',
label: 'GPT-5 ChatGPT (Non-Thinking)',
+ pubDate: '20250807',
description: 'GPT-5 model used in ChatGPT. Points to the GPT-5 snapshot currently used in ChatGPT.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -580,6 +601,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // deprecated per OpenAI docs (2026-04), superseded by gpt-5.1-codex/gpt-5.3-codex
idPrefix: 'gpt-5-codex',
label: 'GPT-5 Codex',
+ pubDate: '20250915',
description: 'A version of GPT-5 optimized for agentic coding in Codex.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -599,6 +621,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // poor quality - use llmVndOaiWebSearchContext on regular models instead
idPrefix: 'gpt-5-search-api-2025-10-14',
label: 'GPT-5 Search API (2025-10-14)',
+ pubDate: '20251014',
description: 'Updated web search model in Chat Completions API. 60% cheaper with domain filtering support.',
contextWindow: 400000,
maxCompletionTokens: 100000,
@@ -619,6 +642,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4 Mini
idPrefix: 'gpt-5-mini-2025-08-07',
label: 'GPT-5 Mini (2025-08-07)',
+ pubDate: '20250807',
description: 'A faster, more cost-efficient version of GPT-5 for well-defined tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -639,6 +663,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4 Nano
idPrefix: 'gpt-5-nano-2025-08-07',
label: 'GPT-5 Nano (2025-08-07)',
+ pubDate: '20250807',
description: 'Fastest, most cost-efficient version of GPT-5 for summarization and classification tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -679,6 +704,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // UNSUPPORTED YET
idPrefix: 'computer-use-preview-2025-03-11',
label: 'Computer Use Preview (2025-03-11)',
+ pubDate: '20250311',
description: 'Specialized model for computer use tool. Optimized for computer interaction capabilities.',
contextWindow: 8192,
maxCompletionTokens: 1024,
@@ -700,6 +726,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o4-mini-deep-research-2025-06-26',
label: 'o4 Mini Deep Research [Deprecated]',
+ pubDate: '20250626',
isLegacy: true,
description: 'Faster, more affordable deep research model for complex, multi-step research tasks. [Shutdown: 2026-07-23 - migrate to GPT-5.5 with web search.]',
contextWindow: 200000,
@@ -718,6 +745,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o4-mini-2025-04-16',
label: 'o4 Mini [Deprecated]',
+ pubDate: '20250416',
isLegacy: true,
description: 'Latest o4-mini model. Optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Mini.]',
contextWindow: 200000,
@@ -737,6 +765,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-deep-research-2025-06-26',
label: 'o3 Deep Research [Deprecated]',
+ pubDate: '20250626',
isLegacy: true,
description: 'Our most powerful deep research model for complex, multi-step research tasks. [Shutdown: 2026-07-23 - migrate to GPT-5.5 Pro with web search.]',
contextWindow: 200000,
@@ -755,6 +784,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-pro-2025-06-10',
label: 'o3 Pro (2025-06-10)',
+ pubDate: '20250610',
description: 'Version of o3 with more compute for better responses. Provides consistently better answers for complex tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -773,6 +803,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-2025-04-16',
label: 'o3 (2025-04-16)',
+ pubDate: '20250416',
description: 'A well-rounded and powerful model across domains. Sets a new standard for math, science, coding, and visual reasoning tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -791,6 +822,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-mini-2025-01-31',
label: 'o3 Mini [Deprecated]',
+ pubDate: '20250131',
isLegacy: true,
description: 'Latest o3-mini model snapshot. High intelligence at the same cost and latency targets of o1-mini. Excels at science, math, and coding tasks. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Mini.]',
contextWindow: 200000,
@@ -811,6 +843,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true,
idPrefix: 'o1-pro-2025-03-19',
label: 'o1 Pro (2025-03-19)',
+ pubDate: '20250319',
description: 'A version of o1 with more compute for better responses. Provides consistently better answers for complex tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -829,6 +862,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o1-2024-12-17',
label: 'o1 [Deprecated]',
+ pubDate: '20241217',
isLegacy: true,
description: 'Previous full o-series reasoning model. [Shutdown: 2026-10-23 - migrate to GPT-5.5 or o3.]',
contextWindow: 200000,
@@ -851,6 +885,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-2025-04-14',
label: 'GPT-4.1 (2025-04-14)',
+ pubDate: '20250414',
description: 'Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -868,6 +903,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-mini-2025-04-14',
label: 'GPT-4.1 Mini (2025-04-14)',
+ pubDate: '20250414',
description: 'Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency by nearly half and cost by 83%.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -885,6 +921,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-nano-2025-04-14',
label: 'GPT-4.1 Nano [Deprecated]',
+ pubDate: '20250414',
isLegacy: true,
description: 'Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance with low latency, ideal for tasks like classification or autocompletion. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Nano.]',
contextWindow: 1047576,
@@ -906,6 +943,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-1.5',
label: 'GPT Audio 1.5',
+ pubDate: '20260224',
description: 'Best voice model for audio in, audio out with Chat Completions. Accepts audio inputs and outputs.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -919,6 +957,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT Audio 1.5
idPrefix: 'gpt-audio-2025-08-28',
label: 'GPT Audio (2025-08-28)',
+ pubDate: '20250828',
description: 'First generally available audio model. Accepts audio inputs and outputs, and can be used in the Chat Completions REST API.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -935,6 +974,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-mini-2025-12-15',
label: 'GPT Audio Mini (2025-12-15)',
+ pubDate: '20251215',
description: 'Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -944,6 +984,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-mini-2025-10-06',
label: 'GPT Audio Mini (2025-10-06)',
+ pubDate: '20251006',
hidden: true, // previous version
description: 'Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.',
contextWindow: 128000,
@@ -966,6 +1007,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-11-20',
label: 'GPT-4o (2024-11-20)',
+ pubDate: '20241120',
description: 'Snapshot of gpt-4o from November 20th, 2024.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -976,6 +1018,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-08-06',
label: 'GPT-4o (2024-08-06)',
+ pubDate: '20240806',
hidden: true, // previous version
description: 'Snapshot that supports Structured Outputs. gpt-4o currently points to this version.',
contextWindow: 128000,
@@ -987,6 +1030,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-05-13',
label: 'GPT-4o (2024-05-13)',
+ pubDate: '20240513',
hidden: true, // previous version
description: 'Original gpt-4o snapshot from May 13, 2024.',
contextWindow: 128000,
@@ -1007,6 +1051,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-search-preview-2025-03-11',
label: 'GPT-4o Search Preview (2025-03-11)',
+ pubDate: '20250311',
description: 'Latest snapshot of the GPT-4o model optimized for web search capabilities.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1027,6 +1072,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-audio-preview-2025-06-03',
label: 'GPT-4o Audio Preview (2025-06-03)',
+ pubDate: '20250603',
description: 'Latest snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1039,6 +1085,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-audio-preview-2024-12-17',
label: 'GPT-4o Audio Preview (2024-12-17)',
+ pubDate: '20241217',
description: 'Snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1057,6 +1104,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-mini-2024-07-18',
label: 'GPT-4o Mini (2024-07-18)',
+ pubDate: '20240718',
description: 'Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more capable than GPT-3.5 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1073,6 +1121,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // UNSUPPORTED yet (audio output model)
idPrefix: 'gpt-4o-mini-audio-preview-2024-12-17',
label: 'GPT-4o Mini Audio Preview (2024-12-17)',
+ pubDate: '20241217',
description: 'Snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1091,6 +1140,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-mini-search-preview-2025-03-11',
label: 'GPT-4o Mini Search Preview (2025-03-11)',
+ pubDate: '20250311',
description: 'Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1110,6 +1160,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-turbo-2024-04-09',
label: 'GPT-4 Turbo (2024-04-09)',
+ pubDate: '20240409',
hidden: true, // OLD
description: 'GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and function calling. gpt-4-turbo currently points to this version.',
contextWindow: 128000,
@@ -1126,6 +1177,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0125-preview',
label: 'GPT-4 Turbo (0125)',
+ pubDate: '20240125',
hidden: true, // OLD
description: 'GPT-4 Turbo preview model intended to reduce cases of "laziness" where the model doesn\'t complete a task.',
contextWindow: 128000,
@@ -1137,6 +1189,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-1106-preview', // GPT-4 Turbo preview model
label: 'GPT-4 Turbo (1106)',
+ pubDate: '20231106',
hidden: true, // OLD
description: 'GPT-4 Turbo preview model featuring improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.',
contextWindow: 128000,
@@ -1156,6 +1209,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0613',
label: 'GPT-4 (0613)',
+ pubDate: '20230613',
hidden: true, // OLD
description: 'Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.',
contextWindow: 8192,
@@ -1167,6 +1221,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0314',
label: 'GPT-4 (0314)',
+ pubDate: '20230314',
hidden: true, // OLD
description: 'Snapshot of gpt-4 from March 14th 2023 with function calling data. Data up to Sep 2021.',
contextWindow: 8192,
@@ -1189,6 +1244,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-3.5-turbo-0125',
label: '3.5-Turbo (2024-01-25)',
+ pubDate: '20240125',
hidden: true, // OLD
description: 'The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.',
contextWindow: 16385,
@@ -1200,6 +1256,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-3.5-turbo-1106',
label: '3.5-Turbo (1106)',
+ pubDate: '20231106',
hidden: true, // OLD
description: 'GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.',
contextWindow: 16385,
@@ -1559,5 +1616,5 @@ export function llmOrtOaiLookup(orModelName: string): OrtVendorLookupResult | un

// initialTemperature: not set - OpenAI models use the global fallback (0.5);
// NoTemperature models are handled client-side via LLM_IF_HOTFIX_NoTemperature (not propagated to OR)
- return { interfaces, parameterSpecs };
+ return { interfaces, parameterSpecs, pubDate: entry.pubDate };
}

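The final hunk of this file changes `llmOrtOaiLookup` to return `pubDate: entry.pubDate` alongside `interfaces` and `parameterSpecs`, so the curated date now rides through the OpenRouter lookup. A hedged sketch of how a caller might merge that curated date with a dynamically discovered one — `LookupEntry` and `mergePubDate` are hypothetical names, not the repo's actual identifiers:

```typescript
// Minimal shape of the lookup result after the change above (sketch only).
interface LookupEntry {
  interfaces: string[];
  parameterSpecs: string[];
  pubDate?: string; // 'YYYYMMDD', present only when a curated entry matched
}

// Prefer the curated editorial date; otherwise keep whatever the dynamic
// vendor API reported (which may itself be undefined).
function mergePubDate(curated: LookupEntry | undefined, apiReported?: string): string | undefined {
  return curated?.pubDate ?? apiReported;
}
```

The `??` fallback mirrors the document's editorial-vs-dynamic split: curated metadata wins where it exists, discovery fills the rest.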
@@ -12,6 +12,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4.1-2025-04-14',
label: '💾➜ GPT-4.1 (2025-04-14)',
+ pubDate: '20250414',
description: 'Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -22,6 +23,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4.1-mini-2025-04-14',
label: '💾➜ GPT-4.1 Mini (2025-04-14)',
+ pubDate: '20250414',
description: 'Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency and cost.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -32,6 +34,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4o-mini-2024-07-18',
label: '💾➜ GPT-4o Mini (2024-07-18)',
+ pubDate: '20240718',
description: 'Affordable model for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -41,6 +44,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4o-2024-08-06',
label: '💾➜ GPT-4o (2024-08-06)',
+ pubDate: '20240806',
description: 'Advanced, multimodal flagship model that\'s cheaper and faster than GPT-4 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -51,6 +55,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-3.5-turbo-0125',
label: '💾➜ GPT-3.5 Turbo (0125)',
+ pubDate: '20240125',
description: 'The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats',
contextWindow: 16385,
maxCompletionTokens: 4096,
@@ -63,6 +68,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gemini-1.0-pro-001',
label: '💾➜ Gemini 1.0 Pro',
+ pubDate: '20240215',
description: 'Google\'s Gemini 1.0 Pro model',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -70,6 +76,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gemini-1.5-flash-001',
label: '💾➜ Gemini 1.5 Flash',
+ pubDate: '20240514',
description: 'Google\'s Gemini 1.5 Flash model - fast and efficient',
contextWindow: 1000000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn],
@@ -79,6 +86,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Meta-Llama-3.1-8B-Instruct',
label: '💾 Llama 3.1 · 8B Instruct',
+ pubDate: '20240723',
description: 'Meta Llama 3.1 8B Instruct - hosted inference with per-token pricing',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -87,6 +95,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Meta-Llama-3.1-70B-Instruct',
label: '💾 Llama 3.1 · 70B Instruct',
+ pubDate: '20240723',
description: 'Meta Llama 3.1 70B Instruct - hosted inference with per-token pricing',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -95,6 +104,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.1-8B',
label: '💾 Llama 3.1 · 8B Base',
+ pubDate: '20240723',
description: 'Meta Llama 3.1 8B base model for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -102,6 +112,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.1-70B',
label: '💾 Llama 3.1 · 70B Base',
+ pubDate: '20240723',
description: 'Meta Llama 3.1 70B base model for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -111,6 +122,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.2-1B-Instruct',
label: '💾 Llama 3.2 · 1B Instruct',
+ pubDate: '20240925',
description: 'Meta Llama 3.2 1B Instruct - lightweight model for edge and mobile deployment',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -118,6 +130,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.2-3B-Instruct',
label: '💾 Llama 3.2 · 3B Instruct',
+ pubDate: '20240925',
description: 'Meta Llama 3.2 3B Instruct - efficient model for edge deployment',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -127,6 +140,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.3-70B-Instruct',
label: '💾 Llama 3.3 · 70B Instruct',
+ pubDate: '20241206',
description: 'Meta Llama 3.3 70B Instruct - latest 70B model with performance comparable to Llama 3.1 405B',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -136,6 +150,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2-VL-7B-Instruct',
label: '💾 Qwen 2 · VL 7B Instruct',
+ pubDate: '20240830',
description: 'Alibaba Qwen 2 Vision-Language 7B Instruct - multimodal model for text and image understanding',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -145,6 +160,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-1.5B-Instruct',
label: '💾 Qwen 2.5 · 1.5B Instruct',
+ pubDate: '20240919',
description: 'Alibaba Qwen 2.5 1.5B Instruct - efficient small model',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -152,6 +168,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-7B-Instruct',
label: '💾 Qwen 2.5 · 7B Instruct',
+ pubDate: '20240919',
description: 'Alibaba Qwen 2.5 7B Instruct - balanced performance and efficiency',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -159,6 +176,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-14B-Instruct',
label: '💾 Qwen 2.5 · 14B Instruct',
+ pubDate: '20240919',
description: 'Alibaba Qwen 2.5 14B Instruct - hosted inference (hourly compute unit pricing)',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -166,6 +184,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-72B-Instruct',
label: '💾 Qwen 2.5 · 72B Instruct',
+ pubDate: '20240919',
description: 'Alibaba Qwen 2.5 72B Instruct - flagship model with performance comparable to Llama 3.1 405B',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -173,6 +192,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-Coder-7B-Instruct',
label: '💾 Qwen 2.5 · Coder 7B Instruct',
+ pubDate: '20241112',
description: 'Alibaba Qwen 2.5 Coder 7B Instruct - specialized for code generation and understanding',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -180,6 +200,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-Coder-32B-Instruct',
label: '💾 Qwen 2.5 · Coder 32B Instruct',
+ pubDate: '20241112',
description: 'Alibaba Qwen 2.5 Coder 32B Instruct - specialized for code generation and understanding',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -189,6 +210,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen3-8B',
label: '💾 Qwen 3 · 8B Base',
+ pubDate: '20250429',
description: 'Alibaba Qwen 3 8B base model for fine-tuning - supports thinking and non-thinking modes',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -196,6 +218,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen3-14B',
label: '💾 Qwen 3 · 14B Base',
+ pubDate: '20250429',
description: 'Alibaba Qwen 3 14B base model for fine-tuning - supports thinking and non-thinking modes',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -205,6 +228,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-1b-it',
label: '💾 Gemma 3 · 1B IT',
+ pubDate: '20250312',
description: 'Google Gemma 3 1B instruction-tuned - lightweight text-only model',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -212,6 +236,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-4b-it',
label: '💾 Gemma 3 · 4B IT',
+ pubDate: '20250312',
||||
description: 'Google Gemma 3 4B instruction-tuned - efficient multimodal model with 128K context',
|
||||
contextWindow: 128000,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
|
||||
@@ -219,6 +244,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
|
||||
{
|
||||
id: 'google/gemma-3-12b-it',
|
||||
label: '💾 Gemma 3 · 12B IT',
|
||||
pubDate: '20250312',
|
||||
description: 'Google Gemma 3 12B instruction-tuned - balanced multimodal model with 128K context',
|
||||
contextWindow: 128000,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
|
||||
@@ -226,6 +252,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
|
||||
{
|
||||
id: 'google/gemma-3-27b-it',
|
||||
label: '💾 Gemma 3 · 27B IT',
|
||||
pubDate: '20250312',
|
||||
description: 'Google Gemma 3 27B instruction-tuned - largest Gemma 3 multimodal model with 128K context',
|
||||
contextWindow: 128000,
|
||||
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
|
||||
@@ -235,6 +262,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
|
||||
{
|
||||
id: 'mistralai/Mistral-Nemo-Base-2407',
|
||||
label: '💾 Mistral Nemo · Base',
|
||||
pubDate: '20240718',
|
||||
description: 'Mistral Nemo 12B base model (July 2024) for fine-tuning',
|
||||
contextWindow: 128000,
|
||||
interfaces: [LLM_IF_OAI_Chat],
|
||||
@@ -242,6 +270,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
|
||||
{
|
||||
id: 'mistralai/Mistral-Small-24B-Base-2501',
|
||||
label: '💾 Mistral Small · 24B Base',
|
||||
pubDate: '20250130',
|
||||
description: 'Mistral Small 24B base model (Jan 2025) - competitive with larger models while faster',
|
||||
contextWindow: 32768,
|
||||
interfaces: [LLM_IF_OAI_Chat],
|
||||
|
||||
@@ -162,8 +162,11 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
// -- Vendor parameter & interface inheritance --
const llmRef = model.id.replace(/^[^/]+\//, '');
let initialTemperature: number | undefined;
+let pubDate: string | undefined;

const _mergeLookup = (lookup: OrtVendorLookupResult | undefined) => {
+  if (lookup?.pubDate !== undefined)
+    pubDate = lookup.pubDate;
if (lookup?.interfaces)
for (const iface of lookup.interfaces)
if (!interfaces.includes(iface))
@@ -270,6 +273,7 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
idPrefix: model.id,
// latest: ...
label,
+...(pubDate !== undefined && { pubDate }),
description: model.description?.length > 280 ? model.description.slice(0, 277) + '...' : model.description,
contextWindow,
maxCompletionTokens,

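The `_mergeLookup` hunk above is the whole resolution rule: later vendor lookups override `pubDate` (last writer wins), while `interfaces` accumulate without duplicates. A standalone sketch of that logic, assuming a minimal `VendorLookup` shape for illustration (not the repository's actual `OrtVendorLookupResult` type):

```typescript
// Assumed shape for illustration; the real type lives in the OpenRouter vendor code.
interface VendorLookup { pubDate?: string; interfaces?: string[] }

// Fold a sequence of lookups, mirroring _mergeLookup: pubDate is overridden by
// later lookups, interfaces are deduplicated into a single accumulator.
function mergeLookups(lookups: Array<VendorLookup | undefined>): { pubDate?: string; interfaces: string[] } {
  let pubDate: string | undefined;
  const interfaces: string[] = [];
  for (const lookup of lookups) {
    if (lookup?.pubDate !== undefined)
      pubDate = lookup.pubDate; // last writer wins
    if (lookup?.interfaces)
      for (const iface of lookup.interfaces)
        if (!interfaces.includes(iface))
          interfaces.push(iface); // dedupe on insert
  }
  return { pubDate, interfaces };
}
```

The spread `...(pubDate !== undefined && { pubDate })` in the second hunk then keeps the field entirely absent, rather than `undefined`, when no lookup supplied a date.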
@@ -39,6 +39,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-deep-research',
label: 'Sonar Deep Research',
+pubDate: '20250214',
description: 'Expert-level research model for exhaustive searches and comprehensive reports. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning],
@@ -59,6 +60,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-reasoning-pro',
label: 'Sonar Reasoning Pro',
+pubDate: '20250218',
description: 'Premier reasoning model (DeepSeek R1) with Chain of Thought. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning],
@@ -78,6 +80,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-pro',
label: 'Sonar Pro',
+pubDate: '20250121',
description: 'Advanced search model for complex queries and deep content understanding. 200k context.',
contextWindow: 200000,
maxCompletionTokens: 8000,
@@ -96,6 +99,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar',
label: 'Sonar',
+pubDate: '20250121',
description: 'Lightweight, cost-effective search model for quick, grounded answers. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],

@@ -93,6 +93,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4.3',
label: 'Grok 4.3',
+pubDate: '20260417',
description: 'xAI\'s latest flagship model with always-on reasoning and a 1M token context window. Supports text, image, and video inputs with improved agentic performance at lower cost.',
contextWindow: 1000000,
maxCompletionTokens: undefined,
@@ -107,6 +108,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.3
idPrefix: 'grok-4.20-0309-reasoning',
label: 'Grok 4.20 Reasoning',
+pubDate: '20260309',
description: 'xAI\'s previous flagship reasoning model with a 2M token context window. Deep reasoning and problem-solving capabilities with text and image inputs.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -119,6 +121,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.3
idPrefix: 'grok-4.20-0309-non-reasoning',
label: 'Grok 4.20',
+pubDate: '20260309',
description: 'xAI\'s previous flagship model with a 2M token context window. Non-reasoning variant for fast, high-quality responses with text and image inputs.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -130,6 +133,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4.20-multi-agent-0309',
label: 'Grok 4.20 Multi-Agent',
+pubDate: '20260309',
description: 'Multi-agent reasoning model that runs 4 specialized agents in parallel (coordinator, fact-checker, analyst, challenger) for collaborative verification with reduced hallucination.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -147,6 +151,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4-1-fast-reasoning',
label: 'Grok 4.1 Fast Reasoning',
+pubDate: '20251119',
description: 'Next generation frontier multimodal model optimized for high-performance agentic tool calling with a 2M token context window. Trained specifically for real-world enterprise use cases with exceptional performance on agentic workflows.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -158,6 +163,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4-1-fast-non-reasoning',
label: 'Grok 4.1 Fast', // 'Grok 4.1 Fast Non-Reasoning'
+pubDate: '20251119',
description: 'Next generation frontier multimodal model optimized for high-performance agentic tool calling with a 2M token context window. Non-reasoning variant for instant responses.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -172,6 +178,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.1
idPrefix: 'grok-4-fast-reasoning',
label: 'Grok 4 Fast Reasoning',
+pubDate: '20250919',
description: 'Cost-efficient reasoning model with a 2M token context window. Optimized for fast reasoning in agentic workflows. 98% cost reduction vs Grok 4 with comparable performance.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -184,6 +191,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.1
idPrefix: 'grok-4-fast-non-reasoning',
label: 'Grok 4 Fast', // 'Grok 4 Fast Non-Reasoning'
+pubDate: '20250919',
description: 'Cost-efficient non-reasoning model with a 2M token context window. Same weights as grok-4-fast-reasoning but constrained by non-reasoning system prompt for quick responses.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -196,6 +204,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.20
idPrefix: 'grok-4-0709',
label: 'Grok 4 (0709)',
+pubDate: '20250709',
description: 'xAI\'s most advanced model, offering state-of-the-art reasoning and problem-solving capabilities over a massive 256k context window. Supports text and image inputs.',
contextWindow: 256000,
maxCompletionTokens: undefined,
@@ -209,6 +218,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-3',
label: 'Grok 3',
+pubDate: '20250217',
description: 'xAI flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.',
contextWindow: 131072,
maxCompletionTokens: undefined,
@@ -220,6 +230,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-3-mini',
label: 'Grok 3 Mini',
+pubDate: '20250217',
description: 'A lightweight model that is fast and smart for logic-based tasks. Supports function calling and structured outputs.',
contextWindow: 131072,
maxCompletionTokens: undefined,
@@ -236,6 +247,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-code-fast-1',
label: 'Grok Code Fast 1',
+pubDate: '20250828',
description: 'Specialized reasoning model for agentic coding workflows. Fast, economical, and optimized for code generation, debugging, and software development tasks.',
contextWindow: 256000,
maxCompletionTokens: undefined,
@@ -249,6 +261,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-2-vision-1212',
label: 'Grok 2 Vision (1212)',
+pubDate: '20241212',
description: 'xAI model grok-2-vision-1212 with image and text input capabilities. Supports text generation with a 32,768 token context window.',
contextWindow: 32768,
maxCompletionTokens: undefined,

@@ -32,6 +32,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-5',
label: 'GLM-5',
+pubDate: '20260211',
description: 'Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode.',
contextWindow: 204800, // 200K
interfaces: _IF_Reasoning,
@@ -43,6 +44,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-5-code',
label: 'GLM-5 Code',
+// pubDate: UNCONFIRMED - 'glm-5-code' not in Z.ai pricing table or release-notes; Z.ai's coding plan documents GLM-5.1 / GLM-5-Turbo / GLM-4.7 / GLM-4.5-Air, no 'glm-5-code'
description: 'GLM-5 optimized for coding tasks. Uses the dedicated Coding endpoint. 200K context, thinking mode.',
contextWindow: 204800, // 200K
interfaces: _IF_Reasoning,
@@ -58,6 +60,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7',
label: 'GLM-4.7',
+pubDate: '20251222',
description: 'Latest-gen GLM model with 128K context. Thinking mode activated by default.',
contextWindow: 131072, // 128K
interfaces: _IF_Reasoning,
@@ -69,6 +72,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7-flashx',
label: 'GLM-4.7 FlashX', // fast, low cost
+pubDate: '20260119',
description: 'Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -80,6 +84,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7-flash',
label: 'GLM-4.7 Flash (Free)',
+pubDate: '20260119',
description: 'Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -94,6 +99,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v-flashx',
label: 'GLM-4.6 V FlashX',
+pubDate: '20251208',
description: 'Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -106,6 +112,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v-flash',
label: 'GLM-4.6 V Flash (Free)',
+pubDate: '20251208',
description: 'Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -117,6 +124,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v',
label: 'GLM-4.6 V',
+pubDate: '20251208',
description: 'Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -131,6 +139,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6',
label: 'GLM-4.6',
+pubDate: '20250930',
description: 'GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -144,6 +153,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-ocr',
label: 'GLM-OCR (Vision, OCR)',
+pubDate: '20260203',
description: 'Specialized OCR model for text extraction from images and documents.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_HOTFIX_NoWebP],
@@ -158,6 +168,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5v',
label: 'GLM-4.5 V',
+pubDate: '20250811',
description: 'Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.',
contextWindow: 98304, // 96K
interfaces: _IF_Vision_Reasoning,
@@ -173,6 +184,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-flash',
label: 'GLM-4.5 Flash (Free)',
+pubDate: '20250728',
description: 'Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -185,6 +197,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-airx',
label: 'GLM-4.5 AirX',
+pubDate: '20250728',
description: 'Extended lightweight GLM-4.5 variant. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -197,6 +210,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-air',
label: 'GLM-4.5 Air',
+pubDate: '20250728',
description: 'Lightweight GLM-4.5 variant. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -209,6 +223,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-x',
label: 'GLM-4.5 X',
+pubDate: '20250728',
description: 'Extended GLM-4.5 model. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -221,6 +236,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5',
label: 'GLM-4.5',
+pubDate: '20250728',
description: 'Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -234,6 +250,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4-32b-0414-128k',
label: 'GLM-4 32B (0414) 128K',
+pubDate: '20250414',
description: 'GLM-4 32B model with 128K context, 16K output.',
contextWindow: 131072,
interfaces: _IF_Chat,

Regular → Executable
+4 -1
@@ -6,4 +6,7 @@ cd "$(dirname "$0")/../../.."

# Run with npx tsx (will download on-demand if needed)
# Uses npx cache, lightweight and no local install required
-exec npx -y tsx tools/data/llms/llm-registry-sync.ts "$@"
+npx -y tsx tools/data/llms/llm-registry-sync.ts "$@"
+
+# Then dump a fresh JSON snapshot next to the DB.
+exec npx -y tsx tools/data/llms/llm-registry-sync.ts --export-db tools/data/llms/llm-registry.json

@@ -41,6 +41,7 @@ interface CliOptions {
discordWebhook?: string;
notifyFilters?: string;
validate?: boolean;
+exportDbPath?: string; // --export-db <path>: read-only DB dump (no API calls, no sync)
}

interface StoredModel {
@@ -53,6 +54,7 @@ interface StoredModel {
deleted_at: string | null;
created: number | null;
updated: number | null;
+pub_date: string | null;
context_window: number | null;
max_completion_tokens: number | null;
interfaces: string | null;
@@ -90,6 +92,13 @@ function extractSimplePrice(price: any): number | null {
return null;
}

+/** Idempotent schema migration: adds a column if it doesn't already exist. Safe to call on every run. */
+function ensureColumn(db: DatabaseSync, table: string, column: string, columnDef: string): void {
+  const cols = db.prepare(`PRAGMA table_info(${table})`).all() as Array<{ name: string }>;
+  if (!cols.some((c) => c.name === column))
+    db.exec(`ALTER TABLE ${table} ADD COLUMN ${column} ${columnDef}`);
+}
+
function initDatabase(): DatabaseSync {
const db = new DatabaseSync(DB_PATH);

@@ -105,6 +114,7 @@ function initDatabase(): DatabaseSync {
deleted_at TEXT,
created INTEGER,
updated INTEGER,
+pub_date TEXT,
context_window INTEGER,
max_completion_tokens INTEGER,
interfaces TEXT,
@@ -131,6 +141,9 @@ function initDatabase(): DatabaseSync {
)
`);

+// Migrations for existing DBs (safe no-ops on fresh DBs that already have the column from CREATE TABLE).
+ensureColumn(db, 'models', 'pub_date', 'TEXT');
+
return db;
}

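The `ensureColumn` helper added above reduces to a pure decision: given the column names that `PRAGMA table_info` reports, emit an `ALTER TABLE ... ADD COLUMN` only for columns that are missing, so the migration can run on every start. A hedged sketch of that decision as a testable pure function (the `planMigration` name is hypothetical, not part of the script):

```typescript
// Given existing column names, compute the ALTER TABLE statements an idempotent
// migration would run. Pure, so it is trivial to verify without opening a DB.
function planMigration(table: string, existing: string[], wanted: Record<string, string>): string[] {
  const stmts: string[] = [];
  for (const [column, def] of Object.entries(wanted))
    if (!existing.includes(column))
      stmts.push(`ALTER TABLE ${table} ADD COLUMN ${column} ${def}`);
  return stmts; // empty array = schema already up to date
}
```

Running this with `pub_date` absent yields exactly the one statement the `ensureColumn(db, 'models', 'pub_date', 'TEXT')` call executes; running it again after the column exists yields nothing, which is what makes the migration safe to repeat.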
@@ -157,15 +170,16 @@ function saveChanges(
): void {
if (changes.new.length > 0) {
const stmt = db.prepare(`
-INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated,
+INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated, pub_date,
context_window, max_completion_tokens, interfaces, description,
benchmark_elo, benchmark_mmlu, price_input, price_output, original_json, deleted_at)
-VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
+VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
ON CONFLICT (id, vendor, service) DO UPDATE SET
label = excluded.label,
last_seen = excluded.last_seen,
created = excluded.created,
updated = excluded.updated,
+pub_date = excluded.pub_date,
context_window = excluded.context_window,
max_completion_tokens = excluded.max_completion_tokens,
interfaces = excluded.interfaces,
@@ -188,6 +202,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
+model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -208,6 +223,7 @@ function saveChanges(
last_seen = ?,
created = ?,
updated = ?,
+pub_date = ?,
context_window = ?,
max_completion_tokens = ?,
interfaces = ?,
@@ -229,6 +245,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
+model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -247,11 +264,13 @@ function saveChanges(

if (changes.unchanged.length > 0) {
const stmt = db.prepare(`
-INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated,
+INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated, pub_date,
context_window, max_completion_tokens, interfaces, description,
benchmark_elo, benchmark_mmlu, price_input, price_output, original_json, deleted_at)
-VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
-ON CONFLICT (id, vendor, service) DO UPDATE SET last_seen = excluded.last_seen
+VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
+ON CONFLICT (id, vendor, service) DO UPDATE SET
+last_seen = excluded.last_seen,
+pub_date = excluded.pub_date
`);

for (const model of changes.unchanged) {
@@ -264,6 +283,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
+model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,

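The `VALUES` edits above illustrate the classic pitfall of positional upserts: every column added to the `INSERT` list needs a matching `?` (here, 17 became 18), and the bind order must match. A small hedged helper, hypothetical and not part of the script, that derives the placeholder row from the column list keeps the two in sync by construction:

```typescript
// Build a "VALUES (?, ?, ..., NULL)"-style placeholder row from the column list,
// so adding a column (e.g. pub_date) can never desynchronize the ? count.
// Trailing literals cover fixed values like the deleted_at NULL in the hunk above.
function placeholderRow(columns: string[], trailingLiterals: string[] = []): string {
  return `VALUES (${[...columns.map(() => '?'), ...trailingLiterals].join(', ')})`;
}
```

With the 18 bound columns plus the literal `NULL` for `deleted_at`, this reproduces the new `VALUES` line exactly, and a column rename or addition only ever touches one array.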
@@ -310,6 +330,114 @@ function saveSyncHistory(
);
}

+// ============================================================================
+// Snapshot Export
+// ============================================================================
+
+interface CatalogModel {
+id: string;
+vendor: string;
+service: string;
+label: string;
+pubDate: string | null;
+firstSeen: string;
+lastSeen: string;
+deletedAt: string | null;
+created: number | null;
+updated: number | null;
+contextWindow: number | null;
+maxCompletionTokens: number | null;
+interfaces: string[] | null;
+description: string | null;
+benchmarkElo: number | null;
+priceInput: number | null;
+priceOutput: number | null;
+}
+
+interface CatalogSnapshot {
+schemaVersion: number;
+exportedAt: string;
+totalCount: number;
+activeCount: number;
+deletedCount: number;
+byVendor: Record<string, number>;
+models: CatalogModel[];
+}
+
+/** Dump the entire registry (active + soft-deleted) to a JSON file. Read-only on the DB. */
+function exportSnapshot(db: DatabaseSync, outPath: string): void {
+const rows = db.prepare(`
+SELECT id, vendor, service, label, pub_date, first_seen, last_seen, deleted_at,
+created, updated, context_window, max_completion_tokens, interfaces, description,
+benchmark_elo, price_input, price_output
+FROM models
+ORDER BY vendor, service, id
+`).all() as unknown as Array<StoredModel & { interfaces: string | null }>;
+
+const byVendor: Record<string, number> = {};
+let activeCount = 0;
+let deletedCount = 0;
+
+const models: CatalogModel[] = rows.map((r) => {
+byVendor[r.vendor] = (byVendor[r.vendor] || 0) + 1;
+if (r.deleted_at) deletedCount++;
+else activeCount++;
+
+let parsedInterfaces: string[] | null = null;
+if (r.interfaces) {
+try {
+const parsed = JSON.parse(r.interfaces);
+if (Array.isArray(parsed)) parsedInterfaces = parsed;
+} catch {
+// leave null on parse failure
+}
+}
+
+return {
+id: r.id,
+vendor: r.vendor,
+service: r.service,
+label: r.label,
+pubDate: r.pub_date,
+firstSeen: r.first_seen,
+lastSeen: r.last_seen,
+deletedAt: r.deleted_at,
+created: r.created,
+updated: r.updated,
+contextWindow: r.context_window,
+maxCompletionTokens: r.max_completion_tokens,
+interfaces: parsedInterfaces,
+description: r.description,
+benchmarkElo: r.benchmark_elo,
+priceInput: r.price_input,
+priceOutput: r.price_output,
+};
+});
+
+const snapshot: CatalogSnapshot = {
+schemaVersion: 1,
+exportedAt: new Date().toISOString(),
+totalCount: rows.length,
+activeCount,
+deletedCount,
+byVendor,
+models,
+};
+
+// Write atomically: write to temp, then rename. Avoids partial reads if a consumer is watching.
+const dir = path.dirname(path.resolve(outPath));
+if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
+const tmpPath = `${outPath}.tmp`;
+fs.writeFileSync(tmpPath, JSON.stringify(snapshot, null, 2));
+fs.renameSync(tmpPath, outPath);
+
+console.log(
+`${COLORS.green}✓ Exported${COLORS.reset} ${rows.length} models ` +
+`(${activeCount} active, ${deletedCount} deleted) ` +
+`${COLORS.dim}-> ${path.resolve(outPath)}${COLORS.reset}`,
+);
+}
+
// ============================================================================
// Change Detection
// ============================================================================

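The temp-file-then-rename step in `exportSnapshot` is the standard atomic-publish pattern: on POSIX filesystems, a rename within the same filesystem replaces the target in one step, so a consumer watching the path sees either the old snapshot or the complete new one, never a half-written file. A self-contained sketch of that pattern (function name assumed for illustration):

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Write JSON atomically: a reader either sees the previous file or the
// complete new one. The .tmp sibling lives on the same filesystem, which is
// what makes the final renameSync an atomic replace.
function writeJsonAtomic(outPath: string, data: unknown): void {
  fs.mkdirSync(path.dirname(outPath), { recursive: true }); // no-op if it exists
  const tmpPath = `${outPath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2));
  fs.renameSync(tmpPath, outPath); // atomic replace on the same filesystem
}
```

Note the `.tmp` file is created next to the destination rather than in the OS temp directory; a rename across filesystems would degrade to copy-plus-delete and lose atomicity.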
@@ -353,6 +481,9 @@ function detectChanges(
existingModel.context_window !== (model.contextWindow ?? null) ||
existingModel.max_completion_tokens !== (model.maxCompletionTokens ?? null) ||
existingModel.interfaces !== modelInterfaces;
+// NOTE: pub_date intentionally EXCLUDED from change detection. On first run after upgrade,
+// all rows go from NULL -> editorial value, which would fire ~hundreds of spurious "updated"
+// notifications. The unchanged-touch path below silently backfills pub_date instead.

if (hasChanged) {
changes.updated.push(model);
@@ -542,6 +673,10 @@ function parseArgs(): CliOptions {
case '--validate':
options.validate = true;
break;
+case '--export-db':
+options.exportDbPath = nextArg;
+i++;
+break;
}
}

@@ -566,6 +701,7 @@ ${COLORS.bright}Options:${COLORS.reset}
--posthog-key <key> PostHog API key for analytics
--discord-webhook <url> Discord webhook URL
--notify-filters <list> Comma-separated vendor list (e.g., openai,anthropic)
+--export-db <path> Read-only DB dump to JSON (no API calls, no sync). Run separately from sync.
--help Show this help

${COLORS.bright}Examples:${COLORS.reset}
@@ -961,6 +1097,17 @@ async function main() {
try {
const options = parseArgs();

+// --export-db: read-only DB dump. No config, no sync, no API calls.
+if (options.exportDbPath) {
+const db = initDatabase();
+try {
+exportSnapshot(db, options.exportDbPath);
+} finally {
+db.close();
+}
+return;
+}
+
let servicesConfig: Record<string, AixAPI_Access>;

if (options.config) {
||||
Reference in New Issue
Block a user