Compare commits

...

8 Commits

Author SHA1 Message Date
Enrico Ros 1bf1b744b9 llm-registry-sync: export models 2026-05-05 01:33:06 -07:00
Enrico Ros ee2d7114c7 llm-registry-sync: record/sync pub date (the next update won't have the spam; pub date not used for change detection) 2026-05-05 01:33:06 -07:00
Enrico Ros 3b1b54b3a3 KB: +llm-editorial 2026-05-05 01:33:06 -07:00
Enrico Ros 524029a882 Models List: show new (<30 days) models 2026-05-05 00:54:34 -07:00
Enrico Ros 69161d29a7 LLMs: Gemini typo 2026-05-05 00:29:13 -07:00
Enrico Ros 8a542c1af4 LLMs: display the pubDate 2026-05-05 00:16:01 -07:00
Enrico Ros fe16970624 LLMs: PubDates 2026-05-05 00:01:06 -07:00
Enrico Ros e21abdef45 LLMs: pubDate support 2026-05-04 13:48:29 -07:00
24 changed files with 664 additions and 68 deletions
+4
@@ -17,6 +17,10 @@ Architecture and system documentation is available in the `/kb/` knowledge base,
#### CSF - Client-Side Fetch
- **[CSF.md](systems/client-side-fetch.md)** - Direct browser-to-API communication for LLM requests
#### LLM - Language Model Metadata
- **[LLM-editorial-pubdate.md](modules/LLM-editorial-pubdate.md)** - Where we have editorial control over per-model metadata vs dynamic discovery; `pubDate` field semantics, propagation chain, resolution rules, per-vendor matrix
- **[LLM-models-catalog-pipeline.md](modules/LLM-models-catalog-pipeline.md)** - Forward-looking pipeline: extraction script, snapshot artifact, website consumption, future schema extensions
### Systems Documentation
#### Core Platform Systems
+106
@@ -0,0 +1,106 @@
# LLM Editorial Control Surface
This document maps where Big-AGI has editorial control over per-model metadata (and therefore can guarantee fields like `pubDate`, curated `description`, `chatPrice`, `benchmark`, `parameterSpecs`, etc.) versus where it must rely on the vendor API's dynamic discovery (and therefore cannot guarantee them).
For the forward-looking pipeline (extraction script, snapshot, website consumption, future schema extensions), see [LLM-models-catalog-pipeline.md](LLM-models-catalog-pipeline.md).
## The `pubDate` field
`pubDate?: string` (validated as `/^\d{8}$/`, e.g. `'20250929'`) is **optional** in the wire schema and on `DLLM`. It was added to:
- `ModelDescription_schema` in `src/modules/llms/server/llm.server.types.ts` - the canonical wire type
- `OrtVendorLookupResult` in the same file - so OpenRouter inherits it via `llmOrt*Lookup`
- `DLLM` in `src/common/stores/llms/llms.types.ts` - the persisted client model
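For illustration, the format rule can be mirrored with a plain guard (a minimal sketch; `isValidPubDate` is a hypothetical helper, not a function in the codebase):

```typescript
// Hypothetical guard mirroring the wire-schema rule /^\d{8}$/ (illustration only).
const isValidPubDate = (s: unknown): s is string =>
  typeof s === 'string' && /^\d{8}$/.test(s);
```

A compact `'20250929'` passes; ISO-style `'2025-09-29'` and partial dates do not.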
### Where `pubDate` is guaranteed (always emitted)
- **Editorial entries** in 12 hybrid/editorial vendors (282 models). Hand-curated, externally corroborated. Future entries in these arrays are expected to include `pubDate`.
- **Anthropic 0-day placeholder** (`llmsAntCreatePlaceholderModel`): when the API surfaces an Anthropic model not in the editorial list, the placeholder uses the API's `created_at` ISO date, falling back to today via `formatPubDate()`.
- **Gemini 0-day fallback** (`geminiModelToModelDescription`): when the API returns a Gemini model not in `_knownGeminiModels`, the converter falls back to today via `formatPubDate()` (Gemini API does not expose a creation timestamp).
### Where `pubDate` is omitted (optional)
- **Symlink entries** (`KnownLink`) - inherit the target's `pubDate` via the merge logic in `fromManualMapping`.
- **Unknown variants resolved through `super`/`fallback`** in `fromManualMapping` for non-Anthropic/non-Gemini vendors - the field is left undefined rather than fabricated.
- **Dynamic-only vendors** (OpenRouter, TogetherAI, Novita, ChutesAI, FireworksAI, TLUS, Azure, LM Studio, LocalAI, FastAPI, Alibaba, ArceeAI, LLMAPI) - no editorial knob; `pubDate` flows in only when the underlying lookup or upstream API populates it.
The rationale: today's date is a defensible 0-day proxy only when we know we're seeing a brand-new model the vendor just announced (Anthropic's and Gemini's "discovery via official model list" paths). For arbitrary dynamic vendors, fabricating today's date would mark old, well-known models as new - misleading. Better to omit.
### Propagation chain
- `fromManualMapping()` in `src/modules/llms/server/models.mappings.ts` - copies the field for OAI-style vendors when present
- `geminiModelToModelDescription()` in `src/modules/llms/server/gemini/gemini.models.ts` - copies for Gemini, falls back to today for unknowns
- `llmsAntCreatePlaceholderModel()` in `src/modules/llms/server/anthropic/anthropic.models.ts` - emits from API `created_at` (or today)
- `_mergeLookup()` in `src/modules/llms/server/openai/models/openrouter.models.ts` - merges for OpenRouter cross-vendor inheritance
- `_createDLLMFromModelDescription()` in `src/modules/llms/llm.client.ts` - copies onto the persisted DLLM when present
- `formatPubDate()` helper in `src/modules/llms/server/models.mappings.ts` - shared `'YYYYMMDD'` formatter for the 0-day-fillable paths
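A sketch of what that shared formatter plausibly looks like (assumed shape; the real `formatPubDate()` in `models.mappings.ts` may differ):

```typescript
// Assumed shape of the shared 'YYYYMMDD' formatter: format the given date,
// or fall back to today when the caller has no timestamp (the 0-day paths).
function formatPubDate(date?: Date): string {
  const d = date ?? new Date();
  const pad2 = (n: number) => String(n).padStart(2, '0');
  return `${d.getFullYear()}${pad2(d.getMonth() + 1)}${pad2(d.getDate())}`;
}
```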
### Semantics
`pubDate` is the **earliest public availability** of the model - the date on which the vendor first made this specific model usable by external users via any channel (consumer app, web, console, API, partner, open-weights upload).
It is **not**:
- The date Big-AGI added the entry to its catalog (Ollama uses `added` for that)
- The training-data cutoff (proposed but not implemented; see `src/common/stores/llms/llms.types.next.ts:217`)
- The date the model snapshot was built (suffixes like `-1212` may refer to build dates, but `pubDate` tracks public availability)
### Resolution rules (when sources conflict)
1. **Date-suffixed model IDs**: when the suffix matches a documented announcement, the suffix is canonical (vendor convention). xAI, OpenAI, and Mistral all use suffixes that closely track release dates.
2. **Anthropic exception**: Anthropic's date suffixes are typically the **snapshot/training-cutoff date, not the public release date**. For example, `claude-3-7-sonnet-20250219` was released on 2025-02-24, `claude-opus-4-20250514` was released 2025-05-22, and `claude-haiku-4-5-20251001` was released 2025-10-15. Always corroborate against Anthropic's blog/press for the actual release date. Only `claude-sonnet-4-5-20250929` and `claude-opus-4-1-20250805` have suffixes that match.
3. **Closed beta -> public beta -> GA**: use the first date *external* users could access the specific variant.
4. **Family-headline IDs and dated snapshots** (e.g., `claude-opus-4-1` and `claude-opus-4-1-20250805`): typically share a release date.
5. **Hosted on a third party** (Groq hosting Llama, OpenPipe mirroring others, OpenRouter aggregating): use the *underlying* model's original release date by its creator, not when the host added it.
6. **Symlinks** (entries with `symLink:`): inherit the target's date.
7. **Partial dates** (only month known): use the 1st of the month and tag as MEDIUM confidence in the editor's note.
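To illustrate rules 1-2, a hypothetical helper that extracts a trailing date suffix from a model id (illustration only, not in the codebase) - remembering that for Anthropic the suffix is a snapshot date, not the release date:

```typescript
// Hypothetical: pull a trailing YYYYMMDD suffix from a model id (rule 1).
function dateSuffixOf(modelId: string): string | null {
  const m = modelId.match(/-(\d{8})$/);
  return m ? m[1] : null;
}

// Rule 1: for xAI/OpenAI/Mistral the suffix is canonical.
// Rule 2: dateSuffixOf('claude-3-7-sonnet-20250219') yields '20250219',
// but the corroborated pubDate is '20250224' (released 2025-02-24).
```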
## Editorial control matrix
Three categories:
- **Editorial** - the vendor file contains hand-curated entries; we control descriptions, pricing, benchmarks, interfaces, parameter specs, and `pubDate`.
- **Hybrid** - the API returns the live model list, and editorial entries (keyed by id/idPrefix) merge over the API data via `fromManualMapping`. We control everything except *which models exist*.
- **Dynamic** - the API is the only source of model identity and metadata. Big-AGI cannot reliably populate `pubDate` here (no editorial knob).
| Vendor | Category | File | Array | Entries | `pubDate` populated |
|---|---|---|---|---|---|
| Anthropic | Hybrid | `anthropic/anthropic.models.ts` | `hardcodedAnthropicModels` | 12 | 12/12 HIGH |
| Gemini | Hybrid | `gemini/gemini.models.ts` | `_knownGeminiModels` | 33 | 33/33 HIGH |
| OpenAI | Hybrid | `openai/models/openai.models.ts` | `_knownOpenAIChatModels` | 96 | 95/96 HIGH/MED (`osb-120b` skipped, speculative) |
| xAI | Hybrid | `openai/models/xai.models.ts` | `_knownXAIChatModels` | 13 | 13/13 HIGH (pilot) |
| Mistral | Hybrid | `openai/models/mistral.models.ts` | `_knownMistralModelDetails` | 41 | 41/41 (40 HIGH, 1 MED for legacy `mistral-medium`) |
| Moonshot (Kimi) | Hybrid | `openai/models/moonshot.models.ts` | `_knownMoonshotModels` | 13 | 13/13 (10 HIGH, 3 MED for v1 base models) |
| Perplexity | Editorial | `openai/models/perplexity.models.ts` | `_knownPerplexityChatModels` | 4 | 4/4 HIGH |
| MiniMax | Editorial | `openai/models/minimax.models.ts` | `_knownMiniMaxModels` | 10 | 10/10 HIGH |
| DeepSeek | Hybrid | `openai/models/deepseek.models.ts` | `_knownDeepseekChatModels` | 4 | 4/4 HIGH |
| Groq | Hybrid (host) | `openai/models/groq.models.ts` | `_knownGroqModels` | 11 | 11/11 HIGH (underlying-model date) |
| Z.AI / GLM | Hybrid | `openai/models/zai.models.ts` | `_knownZAIModels` | 17 | 16/17 (`glm-5-code` UNCONFIRMED) |
| OpenPipe | Editorial (mirror) | `openai/models/openpipe.models.ts` | `_knownOpenPipeChatModels` | 30 | 30/30 HIGH (all upstream-mirror, no OpenPipe originals) |
| Bedrock | Reuses Anthropic | `bedrock/bedrock.models.ts` | -> `hardcodedAnthropicModels` | (12) | inherited |
| Ollama | Editorial (catalog) | `ollama/ollama.models.ts` | `OLLAMA_BASE_MODELS` | 209 | **deferred** - see notes |
| Arcee AI | Dynamic | `openai/models/arceeai.models.ts` | `_arceeKnownModels` | 0 | n/a (empty) |
| LLMAPI | Dynamic | `openai/models/llmapi.models.ts` | `_llmapiKnownModels` | 0 | n/a (empty) |
| Alibaba | Dynamic | `openai/models/alibaba.models.ts` | `_knownAlibabaChatModels` | 0 | n/a (empty) |
| OpenRouter | Dynamic + delegated lookup | `openai/models/openrouter.models.ts` | (parser) | -- | inherited via `llmOrt*Lookup` |
| TogetherAI | Dynamic | `openai/models/together.models.ts` | (parser) | -- | no |
| FireworksAI | Dynamic | `openai/models/fireworksai.models.ts` | (parser) | -- | no |
| Novita | Dynamic | `openai/models/novita.models.ts` | (parser) | -- | no |
| ChutesAI | Dynamic | `openai/models/chutesai.models.ts` | (parser) | -- | no |
| TLUS | Dynamic | `openai/models/tlusapi.models.ts` | (parser) | -- | no |
| Azure | Dynamic | `openai/models/azure.models.ts` | (parser) | -- | no |
| LM Studio | Dynamic | `openai/models/lmstudio.models.ts` | (parser) | -- | no |
| LocalAI | Dynamic | `openai/models/localai.models.ts` | (parser) | -- | no |
| FastAPI | Dynamic | `openai/models/fastapi.models.ts` | (parser) | -- | no |
**Totals**: 284 editorial entries across 12 vendors, of which **282** have corroborated `pubDate` and **2** are intentional gaps (`osb-120b` speculative, `glm-5-code` not yet announced). All 12 vendor files type-check clean.
### Notes
- **Hybrid** vendors are still effectively editorial for the models we know about: when an API id matches a hardcoded `idPrefix` (or `id`), `fromManualMapping` injects all the editorial fields. Unknown ids fall through to a default-shaped placeholder where `pubDate` is undefined.
- **OpenRouter** delegates back to Anthropic / Gemini / OpenAI editorial lookups via `llmOrtAntLookup_ThinkingVariants`, `llmOrtGemLookup`, `llmOrtOaiLookup`. `pubDate` flows through these lookups, so OpenRouter-served Claude/Gemini/GPT models get `pubDate` automatically once the underlying editorial entry has it.
- **Bedrock** finds Anthropic editorial via `llmBedrockFindAnthropicModel` and strips unsupported interfaces - `pubDate` inherits from Anthropic.
- **Ollama** is deferred: 209 entries keyed by upstream model family (e.g. `qwen3.6`, `kimi-k2`, `glm-4.6`). Each entry's `pubDate` would need to be the upstream creator's release date (Meta, Alibaba, Moonshot, Z.AI, etc.). This is large-scale upstream research; better handled in a follow-up pass once cross-vendor `pubDate` data is consolidated and reusable.
- **Dynamic-only** vendors get nothing automatic. To add `pubDate` for them we'd have to seed editorial entries (which is what `fromManualMapping`'s mapping mechanism was built for); this is a per-vendor decision and out of scope for the initial rollout.
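The hybrid behavior described in these notes can be sketched as follows (hypothetical shapes; the real `fromManualMapping` signature and merge logic are richer):

```typescript
// Hypothetical sketch of the hybrid merge: editorial fields win for known ids;
// unknown ids fall through to a placeholder with pubDate left undefined (never fabricated).
interface EditorialEntry { idPrefix: string; pubDate?: string; description?: string }

function mergeEditorial(apiId: string, editorial: EditorialEntry[]) {
  const hit = editorial.find((e) => apiId.startsWith(e.idPrefix));
  return hit
    ? { id: apiId, pubDate: hit.pubDate, description: hit.description } // editorial wins
    : { id: apiId }; // unknown id: default-shaped placeholder, pubDate stays undefined
}
```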
+78
@@ -0,0 +1,78 @@
# LLM Models Catalog Pipeline (forward-looking)
Status: **proposal / partially implemented**. Companion to [LLM-editorial-pubdate.md](LLM-editorial-pubdate.md), which describes the durable reference (`pubDate` semantics, editorial-vs-dynamic matrix, propagation chain).
This document captures the forward-looking pipeline that turns Big-AGI's editorial model metadata into website value-add (plots, decision helpers, comparison tools at big-agi.com).
## Goal
Stand up a database/datastore that the website (`~/dev/website`) can query for plots, decision helpers, and comparison tools - without requiring the website to call our authenticated tRPC endpoints.
## Stages
### Stage 1: source of truth (in this repo) — DONE
Editorial files in `src/modules/llms/server/` remain the canonical source for:
- Identity: id, label, vendor
- Capabilities: `interfaces`, `parameterSpecs`, `contextWindow`, `maxCompletionTokens`
- Pricing: `chatPrice` (input / output / cache tiers)
- Benchmarks: `benchmark.cbaElo` (Chat Bot Arena ELO)
- Lifecycle: `pubDate`, `isLegacy`, `isPreview`, `hidden`, deprecation comments
Well-typed, version-controlled, reviewed - every model edit is a code change with diff history. 282 entries currently carry `pubDate` (see editorial-control matrix).
### Stage 2: extraction script — IN PROGRESS
A build-time script (e.g. `scripts/llms/export-models.ts`) that:
1. Loads every editorial vendor's model array.
2. Normalizes per-vendor shapes (array vs Record, `id` vs `idPrefix`, `KnownLink` symlinks) to a single row format.
3. Resolves symlinks (target's `pubDate` flows through).
4. Writes a single JSON snapshot: `data/models-catalog.json` (one row per model, with vendor + the editorial fields above).
Open question: do we want this committed (gives the website a stable artifact / public URL) or built on-demand in CI? **Recommend committed snapshot** under `data/` so consumers get a stable URL.
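Steps 2-3 might normalize to a row shape like this (a sketch under assumed field names; the actual script and `data/models-catalog.json` schema are still open):

```typescript
// Hypothetical row shape for data/models-catalog.json - one row per model.
interface CatalogRow {
  vendor: string;
  id: string;
  label: string;
  pubDate?: string;   // 'YYYYMMDD'
  symLinkTo?: string; // resolved before export (step 3)
}

// Step 3: symlink resolution - the target's pubDate flows through.
function resolveSymlink(row: CatalogRow, byId: Map<string, CatalogRow>): CatalogRow {
  if (!row.symLinkTo) return row;
  const target = byId.get(row.symLinkTo);
  return target
    ? { ...row, pubDate: row.pubDate ?? target.pubDate, symLinkTo: undefined }
    : row; // dangling symlink: leave untouched for the script to report
}
```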
### Stage 3: enrichment — NOT STARTED
The exported snapshot gets enriched with data we don't currently track in editorial files:
- **Knowledge cutoff** (proposed in `llms.types.next.ts:217` but never implemented; should be added to `ModelDescription_schema` as a follow-up).
- **MMLU / HumanEval / SWE-bench / GPQA / MATH** scores (currently only `cbaElo`; richer benchmarks belong in a separate block).
- **Throughput / latency** numbers (per-vendor, possibly per-region).
- **Modalities matrix** (input image, input audio, input video, input PDF, output image, output audio).
- **Weights availability** (closed / open / restricted), license.
Sources for enrichment: HuggingFace cards, vendor docs, Artificial Analysis, LLM-Stats, official benchmarks. Some can be scraped on a cadence; some need editorial review.
### Stage 4: website consumption — NOT STARTED
The website (`~/dev/website`) consumes the snapshot to render:
- **Timeline plot**: `pubDate` (x-axis) vs `cbaElo` (y-axis), grouped by vendor - shows the frontier and rate of progress.
- **Cost-per-quality plot**: `chatPrice.output` vs `cbaElo` - "best model per dollar".
- **Decision helpers**: filter by capability (`interfaces`), context window, pricing tier, vendor.
- **Comparison cards**: side-by-side specs.
- **Lifecycle alerts**: deprecation warnings for retiring models.
## Open questions
1. **Where does enrichment data live?** A separate `data/models-enrichment.json` (joined by id at build time) keeps editorial files clean but introduces a join surface. Alternative: extend `ModelDescription_schema` with optional enrichment fields and treat editorial files as the only source. Recommend the separate file approach - editorial files stay focused on vendor-API integration; enrichment evolves on a different cadence.
2. **How fresh does the website need to be?** If daily, build the snapshot in CI on push and publish to a static URL. If real-time, consume tRPC directly - more work but fewer freshness gaps.
3. **Do we expose `pubDate` and other editorial metadata via tRPC publicly, or only via the snapshot?** The current tRPC routes require auth; the website should consume the snapshot, not live tRPC.
4. **Schema versioning** - if `ModelDescription_schema` evolves, the snapshot consumers need to be tolerant. Include a `schemaVersion` field in the snapshot envelope.
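Question 4 suggests an envelope along these lines (a sketch; the field names are assumptions, not a decided schema):

```typescript
// Hypothetical snapshot envelope carrying the schemaVersion guard from question 4.
interface CatalogSnapshot {
  schemaVersion: number; // bump on breaking ModelDescription_schema changes
  generatedAt: string;   // ISO timestamp of the export run
  models: Array<{ vendor: string; id: string; pubDate?: string }>;
}

// Consumers reject snapshots newer than they understand, instead of misparsing them.
function assertReadable(snap: CatalogSnapshot, maxKnownVersion: number): void {
  if (snap.schemaVersion > maxKnownVersion)
    throw new Error(`snapshot schemaVersion ${snap.schemaVersion} > supported ${maxKnownVersion}`);
}
```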
## Future extensions to `ModelDescription_schema`
Beyond `pubDate`, the natural follow-ups (in priority order):
1. **`knowledgeCutoff?: string`** (`'YYYY-MM'` or `'YYYY-MM-DD'`) - already proposed in `llms.types.next.ts`. Useful for the timeline plot and for context-aware prompts.
2. **`deprecationDate?: string`** - currently exists informally as `deprecated?: string` on `_knownGeminiModels`; should be promoted to the schema.
3. **`license?: string`** - especially important for open-weights models (apache-2.0, mit, llama-community, custom).
4. **`weights?: 'closed' | 'open' | 'restricted'`** - quick filter for "can I run this myself?".
5. **`benchmarks?: { mmlu?: number, humaneval?: number, gpqa?: number, ... }`** - richer than the current `cbaElo`-only block.
6. **`modalities?: { in: string[], out: string[] }`** - more precise than `interfaces` for input/output capability matrices.
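Collected as a type, the follow-ups above might look like this (a proposal sketch only - none of these fields exist in `ModelDescription_schema` yet):

```typescript
// Hypothetical future additions to ModelDescription_schema - not implemented.
interface ModelDescriptionExtensions {
  knowledgeCutoff?: string;                    // 'YYYY-MM' or 'YYYY-MM-DD'
  deprecationDate?: string;                    // promotes Gemini's informal `deprecated`
  license?: string;                            // e.g. 'apache-2.0', 'mit', 'llama-community'
  weights?: 'closed' | 'open' | 'restricted';  // "can I run this myself?"
  benchmarks?: { mmlu?: number; humaneval?: number; gpqa?: number };
  modalities?: { in: string[]; out: string[] };
}
```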
+15
@@ -25,6 +25,7 @@ export interface DLLM {
label: string;
created: number | 0;
updated?: number | 0;
pubDate?: string; // official release date in 'YYYYMMDD'
description: string;
hidden: boolean;
@@ -137,6 +138,20 @@ export function getLLMMaxOutputTokens(llm: DLLM | null): DLLMMaxOutputTokens | u
return llm.userMaxOutputTokens ?? llm.maxOutputTokens;
}
/**
* Parse the model's editorial `pubDate` ('YYYYMMDD') into a Date, or null if missing/malformed.
* Date is constructed at local midnight - pubDate is day-precision, no time component.
*/
export function getLLMPubDate(llm: DLLM | null | undefined): Date | null {
const p = llm?.pubDate;
if (!p || !/^\d{8}$/.test(p)) return null;
const y = parseInt(p.slice(0, 4), 10);
const m = parseInt(p.slice(4, 6), 10) - 1; // JS Date months are 0-indexed
const d = parseInt(p.slice(6, 8), 10);
const date = new Date(y, m, d);
return Number.isFinite(date.getTime()) ? date : null;
}
/// Interfaces ///
// do not change anything below! those will be persisted in data
+1
@@ -107,6 +107,7 @@ function _createDLLMFromModelDescription(d: ModelDescriptionSchema, service: DMo
label: d.label,
created: d.created || 0,
updated: d.updated || 0,
...(d.pubDate && { pubDate: d.pubDate }),
description: d.description,
hidden: !!d.hidden,
@@ -15,7 +15,7 @@ import WarningRoundedIcon from '@mui/icons-material/WarningRounded';
import { type DPricingChatGenerate, isLLMChatFree_cached, llmChatPricing_adjusted } from '~/common/stores/llms/llms.pricing';
import type { ModelOptionsContext } from '~/common/layout/optima/store-layout-optima';
-import { DLLMId, DModelInterfaceV1, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, isLLMVisible, LLM_IF_HOTFIX_NoStream, LLM_IF_HOTFIX_NoTemperature, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
+import { DLLMId, DModelInterfaceV1, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, getLLMPubDate, isLLMVisible, LLM_IF_HOTFIX_NoStream, LLM_IF_HOTFIX_NoTemperature, LLM_IF_OAI_Reasoning } from '~/common/stores/llms/llms.types';
import { FormLabelStart } from '~/common/components/forms/FormLabelStart';
import { GoodModal } from '~/common/components/modals/GoodModal';
import { LLMImplicitParametersRuntimeFallback } from '~/common/stores/llms/llms.parameters';
@@ -280,6 +280,7 @@ export function LLMOptionsModal(props: { id: DLLMId, context?: ModelOptionsConte
// cache
const adjChatPricing = llmChatPricing_adjusted(llm);
const pubDate = getLLMPubDate(llm);
return (
@@ -502,7 +503,8 @@ export function LLMOptionsModal(props: { id: DLLMId, context?: ModelOptionsConte
id: {llm.id}<br />
context: <b>{getLLMContextTokens(llm)?.toLocaleString() ?? 'not provided'}</b> tokens{` · `}
max output: <b>{getLLMMaxOutputTokens(llm)?.toLocaleString() ?? 'not provided'}</b><br />
-{!!llm.created && <>created: <TimeAgo date={new Date(llm.created * 1000)} /><br /></>}
+{!!pubDate && <>published: <b>{pubDate.toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' })}</b> · <TimeAgo date={pubDate} /><br /></>}
+{!!llm.created && <>indexed: <TimeAgo date={new Date(llm.created * 1000)} /><br /></>}
{/*· tags: {llm.tags.join(', ')}*/}
{!!adjChatPricing && prettyPricingComponent(adjChatPricing)}
{/*{!!llm.benchmark && <>benchmark: <b>{llm.benchmark.cbaElo?.toLocaleString() || '(unk) '}</b> CBA Elo<br /></>}*/}
+6 -1
@@ -9,7 +9,7 @@ import VisibilityOutlinedIcon from '@mui/icons-material/VisibilityOutlined';
import type { DModelsServiceId } from '~/common/stores/llms/llms.service.types';
import { isLLMChatFree_cached } from '~/common/stores/llms/llms.pricing';
-import { DLLM, DLLMId, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, isLLMCustomUserParameters, isLLMHidden, LLM_IF_ANT_PromptCaching, LLM_IF_GEM_CodeExecution, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_OAI_Vision, LLM_IF_Outputs_Audio, LLM_IF_Outputs_Image, LLM_IF_Tools_WebSearch } from '~/common/stores/llms/llms.types';
+import { DLLM, DLLMId, getLLMContextTokens, getLLMLabel, getLLMMaxOutputTokens, getLLMPubDate, isLLMCustomUserParameters, isLLMHidden, LLM_IF_ANT_PromptCaching, LLM_IF_GEM_CodeExecution, LLM_IF_OAI_Fn, LLM_IF_OAI_Json, LLM_IF_OAI_PromptCaching, LLM_IF_OAI_Reasoning, LLM_IF_OAI_Vision, LLM_IF_Outputs_Audio, LLM_IF_Outputs_Image, LLM_IF_Tools_WebSearch } from '~/common/stores/llms/llms.types';
import { GoodTooltip } from '~/common/components/GoodTooltip';
import { PhGearSixIcon } from '~/common/components/icons/phosphor/PhGearSixIcon';
import { STAR_EMOJI, StarredToggle, starredToggleStyle } from '~/common/components/StarIcons';
@@ -99,6 +99,10 @@ export const ModelItem = React.memo(function ModelItem(props: {
const isNotSymlink = !llm.label.startsWith('🔗'); // getLLMLabel exception: need access to the base
const llmLabel = getLLMLabel(llm);
// "new" badge: shown only when pubDate is set AND within the last 30 days
const pubDate = getLLMPubDate(llm);
const isRecentlyPublished = pubDate ? (Date.now() - pubDate.getTime()) < 30 * 24 * 60 * 60 * 1000 : false;
const handleLLMConfigure = React.useCallback((event: React.MouseEvent) => {
event.stopPropagation();
@@ -227,6 +231,7 @@ export const ModelItem = React.memo(function ModelItem(props: {
</>}
{/* Features Chips - sync with `useLLMSelect.tsx` */}
{isRecentlyPublished && isNotSymlink && pubDate && <GoodTooltip title={`Released ${pubDate.toLocaleDateString(undefined, { year: 'numeric', month: 'short', day: 'numeric' })}`}><Chip size='sm' variant='solid' sx={isHidden ? styles.chipDisabled : { bgcolor: '#d4ff3a', color: 'black', fontWeight: 'lg' }}>new</Chip></GoodTooltip>}
{featuresChipMemo}
{seemsFree && isNotSymlink && <Chip size='sm' color='success' variant='plain' sx={isHidden ? styles.chipDisabled : styles.chipFree}>free</Chip>}
@@ -6,7 +6,7 @@ import { Release } from '~/common/app.release';
import type { ModelDescriptionSchema, OrtVendorLookupResult } from '../llm.server.types';
import { createVariantInjector, ModelVariantMap } from '../llm.server.variants';
-import { llmDevCheckModels_DEV } from '../models.mappings';
+import { formatPubDate, llmDevCheckModels_DEV } from '../models.mappings';
// Note: these model definitions are shared across Anthropic API, OpenRouter, and AWS Bedrock.
@@ -214,12 +214,13 @@ export function llmsAntInjectVariants(acc: ModelDescriptionSchema[], model: Mode
}
-export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: boolean })[] = [
+export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: boolean, pubDate: string /* make it required for the defs */ })[] = [
// Claude 4.7 models
{
id: 'claude-opus-4-7', // Active - 2026-04-16
label: 'Claude Opus 4.7',
pubDate: '20260416',
description: 'Most capable generally available model for complex reasoning and agentic coding',
contextWindow: 1_000_000, // 1M GA at standard pricing (no opt-in required)
maxCompletionTokens: 128000,
@@ -239,6 +240,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-opus-4-6', // Active
label: 'Claude Opus 4.6',
pubDate: '20260205',
description: 'Previous most intelligent model for complex agents and coding, with adaptive thinking',
contextWindow: 1_000_000, // 1M GA at standard pricing since 2026-03-13 (no opt-in required)
maxCompletionTokens: 128000,
@@ -255,6 +257,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-sonnet-4-6', // Active
label: 'Claude Sonnet 4.6',
pubDate: '20260217',
description: 'Best combination of speed and intelligence for everyday tasks',
contextWindow: 1_000_000, // 1M GA at standard pricing since 2026-03-13 (no opt-in required)
maxCompletionTokens: 128000, // docs say 64000, API reports 128000
@@ -272,6 +275,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-opus-4-5-20251101', // Active
label: 'Claude Opus 4.5',
pubDate: '20251124',
description: 'Previous most intelligent model with advanced reasoning for complex agentic workflows',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -286,6 +290,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-sonnet-4-5-20250929', // Active
label: 'Claude Sonnet 4.5',
pubDate: '20250929',
description: 'Previous best combination of speed and intelligence for complex agents and coding',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -311,6 +316,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-haiku-4-5-20251001', // Active
label: 'Claude Haiku 4.5',
pubDate: '20251015',
description: 'Fastest model with exceptional speed and performance',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -324,6 +330,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-opus-4-1-20250805', // Active
label: 'Claude Opus 4.1',
pubDate: '20250805',
description: 'Exceptional model for specialized complex tasks requiring advanced reasoning',
contextWindow: 200000,
maxCompletionTokens: 32000,
@@ -338,6 +345,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
hidden: true, // Deprecated: April 14, 2026 | Retiring: June 15, 2026 | Replacement: claude-opus-4-7
id: 'claude-opus-4-20250514', // Deprecated
label: 'Claude Opus 4 [Deprecated]',
pubDate: '20250522',
description: 'Previous flagship model. Deprecated April 14, 2026, retiring June 15, 2026.',
contextWindow: 200000,
maxCompletionTokens: 32000,
@@ -351,6 +359,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
hidden: true, // Deprecated: April 14, 2026 | Retiring: June 15, 2026 | Replacement: claude-sonnet-4-6
id: 'claude-sonnet-4-20250514', // Deprecated
label: 'Claude Sonnet 4 [Deprecated]',
pubDate: '20250522',
description: 'High-performance model. Deprecated April 14, 2026, retiring June 15, 2026.',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -379,6 +388,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-3-7-sonnet-20250219', // Retired | Deprecated: October 28, 2025 | Retired: February 19, 2026 | Replacement: claude-opus-4-6
label: 'Claude Sonnet 3.7 [Retired]',
pubDate: '20250224',
description: 'High-performance model with early extended thinking. Retired February 19, 2026.',
contextWindow: 200000,
maxCompletionTokens: 64000,
@@ -396,6 +406,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
{
id: 'claude-3-5-haiku-20241022', // Retired | Deprecated: December 19, 2025 | Retired: February 19, 2026
label: 'Claude Haiku 3.5 [Retired]',
pubDate: '20241104',
description: 'Intelligence at blazing speeds. Retired February 19, 2026.',
contextWindow: 200000,
maxCompletionTokens: 8192,
@@ -413,6 +424,7 @@ export const hardcodedAnthropicModels: (ModelDescriptionSchema & { isLegacy?: bo
hidden: true, // deprecated
id: 'claude-3-haiku-20240307', // Deprecated | Deprecated: February 19, 2026 | Retiring: April 20, 2026 | Replacement: claude-haiku-4-5-20251001
label: 'Claude Haiku 3 [Deprecated]',
pubDate: '20240313',
description: 'Fast and compact model for near-instant responsiveness. Deprecated February 19, 2026, retiring April 20, 2026.',
contextWindow: 200000,
maxCompletionTokens: 4096,
@@ -595,11 +607,13 @@ export function llmsAntCreatePlaceholderModel(model: AnthropicWire_API_Models_Li
parameterSpecs.push(...ANT_TOOLS);
const maxInputTokens = model.max_input_tokens;
const createdAt = model.created_at ? new Date(model.created_at) : undefined;
return {
id: model.id,
idVariant: '::placeholder',
label: model.display_name,
-created: Math.round(new Date(model.created_at).getTime() / 1000),
+created: createdAt ? Math.round(createdAt.getTime() / 1000) : undefined,
pubDate: formatPubDate(createdAt), // 0-day: use Anthropic API's created_at, or today if unset
description: 'Newest model, description not available yet.',
contextWindow: maxInputTokens ?? 200_000, // report API value as-is (no cap for unknown models)
maxCompletionTokens: model.max_tokens || 32768,
@@ -755,5 +769,5 @@ export function llmOrtAntLookup_ThinkingVariants(orModelName: string): OrtVendor
.map((spec) => ({ ...spec }));
// initialTemperature: not set - Anthropic models use the global fallback (0.5)
-return { interfaces, parameterSpecs };
+return { pubDate: model.pubDate, interfaces, parameterSpecs };
}
@@ -6,7 +6,7 @@ import { Release } from '~/common/app.release';
import type { ModelDescriptionSchema, OrtVendorLookupResult } from '../llm.server.types';
import { createVariantInjector, ModelVariantMap } from '../llm.server.variants';
-import { llmDevCheckModels_DEV } from '../models.mappings';
+import { formatPubDate, llmDevCheckModels_DEV } from '../models.mappings';
// dev options
@@ -186,7 +186,7 @@ const _knownGeminiModels: ({
symLink?: string,
deprecated?: string, // Gemini may provide deprecation dates
// _delete removed - models are now physically removed from the list instead of marked for deletion
-} & Pick<ModelDescriptionSchema, 'interfaces' | 'parameterSpecs' | 'chatPrice' | 'hidden' | 'benchmark'>)[] = [
+} & Pick<ModelDescriptionSchema, 'pubDate' | 'interfaces' | 'parameterSpecs' | 'chatPrice' | 'hidden' | 'benchmark'> & { pubDate: string /* make it required */})[] = [
/// Generation 3.1
@@ -195,6 +195,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3.1-pro-preview',
labelOverride: 'Gemini 3.1 Pro Preview',
pubDate: '20260219',
isPreview: true,
chatPrice: gemini30ProPricing, // same pricing as 3 Pro
interfaces: IF_30,
@@ -213,6 +214,7 @@ const _knownGeminiModels: ({
hidden: true, // specialized variant for custom tool prioritization
id: 'models/gemini-3.1-pro-preview-customtools',
labelOverride: 'Gemini 3.1 Pro Preview (Custom Tools)',
pubDate: '20260219',
isPreview: true,
chatPrice: gemini30ProPricing,
interfaces: IF_30,
@@ -230,6 +232,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3.1-flash-image-preview',
labelOverride: 'Nano Banana 2',
pubDate: '20260226',
isPreview: true,
chatPrice: gemini31FlashImagePricing,
interfaces: IF_30,
@@ -247,6 +250,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3.1-flash-lite-preview',
labelOverride: 'Gemini 3.1 Flash-Lite Preview',
pubDate: '20260303',
isPreview: true,
chatPrice: gemini31FlashLitePricing,
interfaces: IF_30,
@@ -268,6 +272,7 @@ const _knownGeminiModels: ({
hidden: true, // March 9, 2026: API silently routes 'gemini-3-pro-preview' to 'gemini-3.1-pro-preview' - hide to prevent user confusion
id: 'models/gemini-3-pro-preview',
labelOverride: 'Gemini 3 Pro Preview',
pubDate: '20251118',
isPreview: true,
deprecated: '2026-03-09',
chatPrice: gemini30ProPricing,
@@ -286,6 +291,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3-pro-image-preview',
labelOverride: 'Nano Banana Pro', // Marketing name for the technical model ID
pubDate: '20251120',
isPreview: true,
chatPrice: gemini30ProImagePricing,
interfaces: IF_30,
@@ -301,6 +307,7 @@ const _knownGeminiModels: ({
{
id: 'models/nano-banana-pro-preview',
labelOverride: 'Nano Banana Pro',
pubDate: '20251120',
symLink: 'models/gemini-3-pro-image-preview',
// copied from symlink
isPreview: true,
@@ -320,6 +327,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-3-flash-preview',
labelOverride: 'Gemini 3 Flash Preview',
pubDate: '20251217',
isPreview: true,
chatPrice: gemini30FlashPricing,
interfaces: IF_30,
@@ -340,6 +348,7 @@ const _knownGeminiModels: ({
hidden: true, // outperformed by 3.1 Pro (1493) and even 3 Flash (1474) - deprecated in 2 months
id: 'models/gemini-2.5-pro',
labelOverride: 'Gemini 2.5 Pro',
pubDate: '20250617',
deprecated: '2026-06-17',
chatPrice: gemini25ProPricing,
interfaces: IF_25,
@@ -362,6 +371,7 @@ const _knownGeminiModels: ({
{
hidden: true, // single-turn-only model - unhide and just send a message to make use of this
id: 'models/gemini-2.5-pro-preview-tts',
pubDate: '20250520',
isPreview: true,
chatPrice: gemini25ProPreviewTTSPricing,
interfaces: [
@@ -379,6 +389,7 @@ const _knownGeminiModels: ({
{
id: 'models/deep-research-preview-04-2026',
labelOverride: 'Deep Research Preview (2026-04)',
pubDate: '20260421',
isPreview: true,
chatPrice: gemini25ProPricing, // pricing not explicitly listed; using 2.5 Pro as baseline
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
@@ -391,6 +402,7 @@ const _knownGeminiModels: ({
{
id: 'models/deep-research-max-preview-04-2026',
labelOverride: 'Deep Research Max Preview (2026-04)',
pubDate: '20260421',
isPreview: true,
chatPrice: gemini25ProPricing, // baseline estimate (see note above)
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
@@ -398,11 +410,12 @@ const _knownGeminiModels: ({
benchmark: undefined, // Deep research model, not benchmarkable on standard tests
},
// Deep Research Pro Preview - Released December 11, 2025
{
hidden: true, // yield to newer 2026-04 models
id: 'models/deep-research-pro-preview-12-2025',
labelOverride: 'Deep Research Pro Preview',
pubDate: '20251211',
isPreview: true,
chatPrice: gemini25ProPricing,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Reasoning, LLM_IF_GEM_Interactions],
@@ -418,6 +431,7 @@ const _knownGeminiModels: ({
hidden: true, // outperformed by 3 Flash Preview (1474 vs 1411) - deprecated in 2 months
id: 'models/gemini-2.5-flash',
labelOverride: 'Gemini 2.5 Flash',
pubDate: '20250617',
deprecated: '2026-06-17',
chatPrice: gemini25FlashPricing,
interfaces: IF_25,
@@ -445,6 +459,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-2.5-computer-use-preview-10-2025',
labelOverride: 'Gemini 2.5 Computer Use Preview 10-2025',
pubDate: '20251007',
isPreview: true,
chatPrice: gemini25ProPricing, // Uses same pricing as 2.5 Pro (pricing page doesn't list separately)
// NOTE: sweep shows fn=['auto'] only (no 'roundtrip') - partial Fn capability, do not advertise LLM_IF_OAI_Fn
@@ -462,6 +477,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-robotics-er-1.6-preview',
labelOverride: 'Gemini Robotics-ER 1.6 Preview',
pubDate: '20260414',
isPreview: true,
chatPrice: geminiRoboticsER16Pricing,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_OAI_Reasoning],
@@ -474,6 +490,7 @@ const _knownGeminiModels: ({
hidden: true, // superseded by Robotics-ER 1.6 - shutdown April 30, 2026
id: 'models/gemini-robotics-er-1.5-preview',
labelOverride: 'Gemini Robotics-ER 1.5 Preview',
pubDate: '20250925',
isPreview: true,
deprecated: '2026-04-30',
chatPrice: gemini25FlashPricing, // Uses same pricing as 2.5 Flash per pricing page
@@ -486,6 +503,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-2.5-flash-image',
labelOverride: 'Nano Banana',
pubDate: '20251002',
deprecated: '2026-10-02',
chatPrice: { input: 0.30, output: undefined }, // Per pricing page: $0.30 text/image input, $0.039 per image output; text output pricing is not stated
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -506,6 +524,7 @@ const _knownGeminiModels: ({
hidden: true, // audio outputs are unavailable
id: 'models/gemini-3.1-flash-tts-preview',
labelOverride: 'Gemini 3.1 Flash TTS Preview',
pubDate: '20260415',
isPreview: true,
chatPrice: gemini31FlashTTSPricing,
interfaces: [
@@ -521,6 +540,7 @@ const _knownGeminiModels: ({
{
hidden: true, // audio outputs are unavailable as of 2025-05-27
id: 'models/gemini-2.5-flash-preview-tts',
pubDate: '20250520',
isPreview: true,
chatPrice: gemini25FlashPreviewTTSPricing,
interfaces: [
@@ -548,6 +568,7 @@ const _knownGeminiModels: ({
{
id: 'models/gemini-2.5-flash-lite',
labelOverride: 'Gemini 2.5 Flash-Lite',
pubDate: '20250722',
deprecated: '2026-07-22',
chatPrice: gemini25FlashLitePricing,
interfaces: IF_25,
@@ -580,6 +601,7 @@ const _knownGeminiModels: ({
{
hidden: true, // outclassed by all Flash models in 2.5/3.x series - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash-001',
pubDate: '20250205',
deprecated: '2026-06-01',
chatPrice: gemini20FlashPricing,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn, LLM_IF_GEM_CodeExecution],
@@ -588,6 +610,7 @@ const _knownGeminiModels: ({
{
hidden: true, // outclassed by all Flash models in 2.5/3.x series - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash',
pubDate: '20250205',
symLink: 'models/gemini-2.0-flash-001',
deprecated: '2026-06-01',
// copied from symlink
@@ -600,6 +623,7 @@ const _knownGeminiModels: ({
{
hidden: true, // outclassed by 2.5/3.1 Flash-Lite - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash-lite',
pubDate: '20250225',
chatPrice: gemini20FlashLitePricing,
symLink: 'models/gemini-2.0-flash-lite-001',
deprecated: '2026-06-01',
@@ -609,6 +633,7 @@ const _knownGeminiModels: ({
{
hidden: true, // outclassed by 2.5/3.1 Flash-Lite - shutdown in ~5 weeks
id: 'models/gemini-2.0-flash-lite-001',
pubDate: '20250225',
chatPrice: gemini20FlashLitePricing,
deprecated: '2026-06-01',
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn],
@@ -648,6 +673,7 @@ const _knownGeminiModels: ({
// Gemma 4 Models - Released April 2, 2026
{
id: 'models/gemma-4-31b-it',
pubDate: '20260402',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
parameterSpecs: [{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'high'] }],
@@ -657,6 +683,7 @@ const _knownGeminiModels: ({
{
hidden: true, // smaller MoE variant
id: 'models/gemma-4-26b-a4b-it',
pubDate: '20260402',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
parameterSpecs: [{ paramId: 'llmVndGemEffort', enumValues: ['minimal', 'high'] }],
@@ -667,6 +694,7 @@ const _knownGeminiModels: ({
// Gemma 3n Model (newer than 3, first seen on the May 2025 update)
{
id: 'models/gemma-3n-e4b-it',
pubDate: '20250626',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Free tier only according to pricing page
@@ -674,6 +702,7 @@ const _knownGeminiModels: ({
},
{
id: 'models/gemma-3n-e2b-it',
pubDate: '20250626',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Free tier only according to pricing page
@@ -685,6 +714,7 @@ const _knownGeminiModels: ({
// - LLM_IF_HOTFIX_Sys0ToUsr0, because: "Developer instruction is not enabled for models/gemma-3-27b-it"
{
id: 'models/gemma-3-27b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree, // Pricing page indicates free tier only
@@ -694,6 +724,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-12b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -702,6 +733,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-4b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -710,6 +742,7 @@ const _knownGeminiModels: ({
{
hidden: true, // keep larger model
id: 'models/gemma-3-1b-it',
pubDate: '20250312',
isPreview: true,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_HOTFIX_StripImages, LLM_IF_HOTFIX_Sys0ToUsr0],
chatPrice: geminiExpFree,
@@ -948,6 +981,7 @@ export function geminiModelToModelDescription(geminiModel: GeminiWire_API_Models
label: label,
// created: ...
// updated: ...
pubDate: knownModel?.pubDate ?? formatPubDate(), // 0-day fallback; the editorial entry is the source of truth; today's date is a placeholder until editorial catches up
description: descriptionLong,
contextWindow: contextWindow,
maxCompletionTokens: outputTokenLimit,
@@ -1035,5 +1069,5 @@ export function llmOrtGemLookup(orModelName: string): OrtVendorLookupResult | un
?.filter(spec => _ORT_GEM_PARAM_ALLOWLIST.has(spec.paramId))
.map(spec => ({ ...spec }));
return { pubDate: knownModel.pubDate, interfaces, parameterSpecs, initialTemperature: GEMINI_DEFAULT_TEMPERATURE };
}
@@ -137,6 +137,7 @@ export const ModelDescription_schema = z.object({
label: z.string(),
created: z.int().optional(),
updated: z.int().optional(),
pubDate: z.string().regex(/^\d{8}$/).optional(), // editorial: model's official public release date 'YYYYMMDD'. Required for editorial entries (KnownModelEditorial) and for 0-day-fillable paths (Anthropic placeholder, Gemini unknown-model fallback). Omitted for dynamic-only vendors and unknown variants where we have no reliable signal.
description: z.string(),
contextWindow: z.int().nullable(),
  interfaces: z.array(z.enum(LLMS_ALL_INTERFACES).or(z.string())), // backward compatibility: don't break client-side interface parsing when a newer server sends unknown interfaces
@@ -155,6 +156,7 @@ export const ModelDescription_schema = z.object({
// Each vendor's lookup filters to only what works through OpenRouter's OAI-compatible API.
// OpenRouter merges these with its own auto-detected interfaces and params.
export type OrtVendorLookupResult = {
pubDate?: ModelDescriptionSchema['pubDate'];
interfaces?: ModelDescriptionSchema['interfaces'];
parameterSpecs?: ModelDescriptionSchema['parameterSpecs'];
initialTemperature?: number; // vendor-specific default (e.g. Gemini 1.0); undefined = use global fallback (0.5)
@@ -111,6 +111,28 @@ export function llmDevValidateParameterSpecs_DEV(model: ModelDescriptionSchema):
}
// -- pubDate helpers --
/**
* Format an epoch / Date / nothing as 'YYYYMMDD'.
* Accepts either a Unix epoch (seconds), a Date, or undefined (-> today).
*/
export function formatPubDate(input?: number | Date): string {
let date: Date;
if (input instanceof Date && Number.isFinite(input.getTime()))
date = input;
else if (typeof input === 'number' && Number.isFinite(input) && input > 0) {
const candidate = new Date(input * 1000);
date = Number.isFinite(candidate.getTime()) ? candidate : new Date();
} else
date = new Date();
const y = date.getUTCFullYear();
const m = String(date.getUTCMonth() + 1).padStart(2, '0');
const d = String(date.getUTCDate()).padStart(2, '0');
return `${y}${m}${d}`;
}
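The helper can be exercised in isolation; a minimal sketch (re-declaring the function verbatim so the snippet is self-contained), showing the three input paths — `Date`, Unix epoch in seconds, and the today fallback:

```typescript
// Self-contained copy of formatPubDate, mirroring the helper above for illustration.
function formatPubDate(input?: number | Date): string {
  let date: Date;
  if (input instanceof Date && Number.isFinite(input.getTime()))
    date = input;
  else if (typeof input === 'number' && Number.isFinite(input) && input > 0) {
    const candidate = new Date(input * 1000); // epoch is in seconds, Date wants ms
    date = Number.isFinite(candidate.getTime()) ? candidate : new Date();
  } else
    date = new Date(); // no/invalid input -> today
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  const d = String(date.getUTCDate()).padStart(2, '0');
  return `${y}${m}${d}`;
}

console.log(formatPubDate(new Date(Date.UTC(2026, 1, 19)))); // '20260219' (JS months are 0-based)
console.log(formatPubDate(1708300800));                      // '20240219' (2024-02-19T00:00:00Z)
console.log(/^\d{8}$/.test(formatPubDate()));                // true - today's date, matches the schema regex
```

Using the UTC accessors keeps the output independent of the server's local timezone, so the same epoch always yields the same `YYYYMMDD` string.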
// -- Manual model mappings: types and helper --
export type ManualMappings = (KnownModel | KnownLink)[];
@@ -224,6 +246,7 @@ export function fromManualMapping(mappings: (KnownModel | KnownLink)[], upstream
};
// apply optional fields
if (m.pubDate) md.pubDate = m.pubDate;
if (m.parameterSpecs) md.parameterSpecs = m.parameterSpecs;
if (m.maxCompletionTokens) md.maxCompletionTokens = m.maxCompletionTokens;
if (m.benchmark) md.benchmark = m.benchmark;
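The conditional-copy idiom above only propagates fields that are truthy on the manual mapping, leaving the target untouched otherwise. A minimal standalone sketch of the pattern (the narrow `Mapping`/`Description` shapes here are hypothetical, not the real `KnownModel`/`ModelDescriptionSchema` types):

```typescript
// Hypothetical narrow shapes, for illustration only.
interface Mapping { pubDate?: string; benchmark?: { cbaElo: number } }
interface Description { pubDate?: string; benchmark?: { cbaElo: number } }

function applyOptional(m: Mapping, md: Description): Description {
  // Same idiom as fromManualMapping: copy a field only when the source sets it,
  // so absent mapping fields never overwrite or introduce keys on the description.
  if (m.pubDate) md.pubDate = m.pubDate;
  if (m.benchmark) md.benchmark = m.benchmark;
  return md;
}

console.log(applyOptional({ pubDate: '20260219' }, {})); // { pubDate: '20260219' }
console.log(applyOptional({}, {}));                      // {} - nothing copied
```

Note the truthiness check would also skip an empty string, which is harmless here since a valid `pubDate` is always 8 digits.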
@@ -20,6 +20,7 @@ const _knownDeepseekChatModels: ManualMappings = [
{
idPrefix: 'deepseek-v4-pro',
label: 'DeepSeek V4 Pro',
pubDate: '20260424',
description: 'Premium reasoning model with 1M context. Supports extended thinking modes, JSON output, and function calling.',
contextWindow: 1_048_576, // 1M
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],
@@ -33,6 +34,7 @@ const _knownDeepseekChatModels: ManualMappings = [
{
idPrefix: 'deepseek-v4-flash',
label: 'DeepSeek V4 Flash',
pubDate: '20260424',
description: 'Fast general-purpose model with 1M context. Supports extended thinking modes, JSON output, and function calling.',
contextWindow: 1_048_576, // 1M
interfaces: [...IF_4, LLM_IF_OAI_Reasoning],
@@ -23,6 +23,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'meta-llama/llama-4-scout-17b-16e-instruct',
label: 'Llama 4 Scout · 17B × 16E (Preview)',
pubDate: '20250405',
description: 'Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -33,6 +34,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'qwen/qwen3-32b',
label: 'Qwen 3 · 32B (Preview)',
pubDate: '20250428',
description: 'Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 40960,
@@ -43,6 +45,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'moonshotai/kimi-k2-instruct-0905',
label: 'Kimi K2 Instruct 0905 (Preview)',
pubDate: '20250905',
description: 'Kimi K2 1T MoE model (32B active, 384 experts). Advanced agentic coding. 262K context, 16K max output. ~200 t/s on Groq.',
contextWindow: 262144,
maxCompletionTokens: 16384,
@@ -53,6 +56,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'moonshotai/kimi-k2-instruct',
label: 'Kimi K2 Instruct (Deprecated)',
pubDate: '20250711',
symLink: 'moonshotai/kimi-k2-instruct-0905',
contextWindow: 131072, // API returns 131K (vs 262K for the 0905 version)
maxCompletionTokens: 16384,
@@ -69,6 +73,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'groq/compound',
label: 'Compound (Agentic System)',
pubDate: '20250904',
description: 'Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -78,6 +83,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'groq/compound-mini',
label: 'Compound Mini (Agentic System)',
pubDate: '20250904',
description: 'Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.',
contextWindow: 131072,
maxCompletionTokens: 8192,
@@ -89,6 +95,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'openai/gpt-oss-120b',
label: 'GPT OSS 120B',
pubDate: '20250805',
description: 'OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -99,6 +106,7 @@ const _knownGroqModels: ManualMappings = [
isPreview: true,
idPrefix: 'openai/gpt-oss-safeguard-20b',
label: 'GPT OSS Safeguard 20B (Preview)',
pubDate: '20251029',
description: 'OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -108,6 +116,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'openai/gpt-oss-20b',
label: 'GPT OSS 20B',
pubDate: '20250805',
description: 'OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 65536,
@@ -120,6 +129,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'llama-3.3-70b-versatile',
label: 'Llama 3.3 · 70B Versatile',
pubDate: '20241206',
description: 'Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 32768,
@@ -129,6 +139,7 @@ const _knownGroqModels: ManualMappings = [
{
idPrefix: 'llama-3.1-8b-instant',
label: 'Llama 3.1 · 8B Instant',
pubDate: '20240723',
description: 'Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.',
contextWindow: 131072,
maxCompletionTokens: 131072,
@@ -22,6 +22,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.7',
label: 'MiniMax M2.7',
pubDate: '20260318',
description: 'Latest flagship with recursive self-improvement and agentic capabilities. 200K context, 131K max output. ~60 t/s.',
contextWindow: 204800,
maxCompletionTokens: 131072,
@@ -31,6 +32,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.7-highspeed',
label: 'MiniMax M2.7 (Highspeed)',
pubDate: '20260318',
description: 'Faster M2.7 variant at ~100 t/s. 200K context, 131K max output.',
contextWindow: 204800,
maxCompletionTokens: 131072,
@@ -42,6 +44,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.5',
label: 'MiniMax M2.5',
pubDate: '20260212',
description: 'Strong coding and reasoning, best value. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -51,6 +54,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.5-highspeed',
label: 'MiniMax M2.5 (Highspeed)',
pubDate: '20260212',
description: 'Faster M2.5 variant at ~100 t/s. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -62,6 +66,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2-her',
label: 'MiniMax M2-her',
pubDate: '20260127',
description: 'Dialogue-first model for immersive roleplay, character-driven chat, and expressive multi-turn conversations. 64K context.',
contextWindow: 65536,
maxCompletionTokens: 2048,
@@ -73,6 +78,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.1',
label: 'MiniMax M2.1',
pubDate: '20251223',
description: '230B params (10B active), multilingual coding. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -83,6 +89,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2.1-highspeed',
label: 'MiniMax M2.1 (Highspeed)',
pubDate: '20251223',
description: 'Faster M2.1 variant. 200K context, 65K max output.',
contextWindow: 204800,
maxCompletionTokens: 65536,
@@ -95,6 +102,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M2',
label: 'MiniMax M2',
pubDate: '20251027',
description: '230B params (10B active), agentic and reasoning. 200K context, 128K max output.',
contextWindow: 204800,
maxCompletionTokens: 128000,
@@ -107,6 +115,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-M1',
label: 'MiniMax M1',
pubDate: '20250616',
description: '456B total / 45.9B active MoE with lightning attention. 1M context, 40K max output.',
contextWindow: 1000000,
maxCompletionTokens: 40000,
@@ -119,6 +128,7 @@ const _knownMiniMaxModels: ModelDescriptionSchema[] = [
{
id: 'MiniMax-01',
label: 'MiniMax 01',
pubDate: '20250114',
description: 'Legacy flagship. 1M context.',
contextWindow: 1000192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -19,80 +19,81 @@ const DEV_DEBUG_MISTRAL_MODELS = Release.IsNodeDevBuild; // not in staging to re
const _knownMistralModelDetails: Record<string, {
label?: string; // override the API-provided name
pubDate?: string; // YYYYMMDD - earliest public availability (announcement / La Plateforme / HF upload)
chatPrice?: { input: number; output: number };
benchmark?: { cbaElo: number };
hidden?: boolean;
}> = {
// Premier models - Mistral 3 (Dec 2025)
'mistral-large-2512': { pubDate: '20251202', chatPrice: { input: 0.5, output: 1.5 }, benchmark: { cbaElo: 1415 } }, // Mistral Large 3 - MoE 41B active / 675B total
'mistral-large-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, benchmark: { cbaElo: 1305 }, hidden: true }, // older version
'mistral-large-latest': { pubDate: '20251202', chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // → 2512
'mistral-medium-2508': { pubDate: '20250812', chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1410 } }, // Mistral Medium 3.1
'mistral-medium-2505': { pubDate: '20250507', chatPrice: { input: 0.4, output: 2 }, benchmark: { cbaElo: 1387 }, hidden: true }, // Mistral Medium 3
'mistral-medium-latest': { pubDate: '20250812', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // → 2508
'mistral-medium': { pubDate: '20231211', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink (legacy: original Mistral Medium prototype on La Plateforme beta)
'magistral-medium-2509': { pubDate: '20250917', chatPrice: { input: 2, output: 5 }, benchmark: { cbaElo: 1304 } }, // reasoning (leaderboard: magistral-medium-2506 = 1304)
'magistral-medium-latest': { pubDate: '20250917', chatPrice: { input: 2, output: 5 }, hidden: true }, // symlink
'devstral-2512': { label: 'Devstral 2 (2512)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 } }, // Devstral 2 - 123B coding agents (API returns "Mistral Vibe Cli")
'devstral-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'devstral-medium-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // symlink
'mistral-vibe-cli-latest': { label: 'Devstral 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // alternate ID for devstral-latest
'devstral-medium-2507': { pubDate: '20250710', chatPrice: { input: 0.4, output: 2 }, hidden: true }, // older version
'mistral-large-pixtral-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 } }, // Pixtral Large (alternate ID)
'pixtral-large-2411': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'pixtral-large-latest': { pubDate: '20241118', chatPrice: { input: 2, output: 6 }, hidden: true }, // symlink
'codestral-2508': { pubDate: '20250730', chatPrice: { input: 0.3, output: 0.9 } }, // code generation (Codestral 25.08)
'codestral-latest': { pubDate: '20250730', chatPrice: { input: 0.3, output: 0.9 }, hidden: true }, // symlink
'voxtral-small-2507': { pubDate: '20250715', chatPrice: { input: 0.1, output: 0.3 } }, // voice (text tokens)
'voxtral-small-latest': { pubDate: '20250715', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'voxtral-mini-2507': { pubDate: '20250715', chatPrice: { input: 0.04, output: 0.04 } }, // voice (text tokens)
'voxtral-mini-latest': { pubDate: '20250715', chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // symlink
// Ministral 3 family (Dec 2025) - multimodal, multilingual, Apache 2.0
'ministral-14b-2512': { pubDate: '20251202', chatPrice: { input: 0.2, output: 0.2 } }, // Ministral 3 14B
'ministral-14b-latest': { pubDate: '20251202', chatPrice: { input: 0.2, output: 0.2 }, hidden: true }, // symlink
'ministral-8b-2512': { pubDate: '20251202', chatPrice: { input: 0.15, output: 0.15 } }, // Ministral 3 8B
'ministral-8b-2410': { pubDate: '20241016', chatPrice: { input: 0.1, output: 0.1 }, benchmark: { cbaElo: 1237 }, hidden: true }, // older version
'ministral-8b-latest': { pubDate: '20251202', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'ministral-3b-2512': { pubDate: '20251202', chatPrice: { input: 0.1, output: 0.1 } }, // Ministral 3 3B
'ministral-3b-2410': { pubDate: '20241016', chatPrice: { input: 0.04, output: 0.04 }, hidden: true }, // older version
'ministral-3b-latest': { pubDate: '20251202', chatPrice: { input: 0.1, output: 0.1 }, hidden: true }, // symlink
// Open models
'mistral-small-2603': { pubDate: '20260316', chatPrice: { input: 0.15, output: 0.6 } }, // Mistral Small 4 - 119B hybrid (instruct+reasoning+coding), 256k ctx
'mistral-small-2506': { pubDate: '20250620', chatPrice: { input: 0.1, output: 0.3 }, benchmark: { cbaElo: 1357 }, hidden: true }, // Mistral Small 3.2
'mistral-small-latest': { pubDate: '20260316', chatPrice: { input: 0.15, output: 0.6 }, hidden: true }, // → 2603
'labs-mistral-small-creative': { label: 'Mistral Small Creative', pubDate: '20251211', chatPrice: { input: 0.1, output: 0.3 } }, // creative writing, roleplay (Labs)
'labs-leanstral-2603': { label: 'Leanstral (2603)', pubDate: '20260316', chatPrice: { input: 0, output: 0 } }, // Lean 4 formal proof engineering (Labs, free for limited period)
'magistral-small-2509': { pubDate: '20250917', chatPrice: { input: 0.5, output: 1.5 } }, // reasoning
'magistral-small-latest': { pubDate: '20250917', chatPrice: { input: 0.5, output: 1.5 }, hidden: true }, // symlink
'labs-devstral-small-2512': { label: 'Devstral Small 2 (2512)', pubDate: '20251209', chatPrice: { input: 0.1, output: 0.3 } }, // Devstral Small 2 - 24B coding agents (Labs)
'devstral-small-2507': { pubDate: '20250710', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // older version (Devstral Small 1.1)
'devstral-small-latest': { label: 'Devstral Small 2 (latest)', pubDate: '20251209', chatPrice: { input: 0.1, output: 0.3 }, hidden: true }, // symlink
'pixtral-12b-2409': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 } }, // vision
'pixtral-12b-latest': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'pixtral-12b': { pubDate: '20240911', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
'open-mistral-nemo-2407': { pubDate: '20240718', chatPrice: { input: 0.15, output: 0.15 } }, // NeMo
'open-mistral-nemo': { pubDate: '20240718', chatPrice: { input: 0.15, output: 0.15 }, hidden: true }, // symlink
// Legacy (kept for reference, no longer in API)
'open-mistral-7b': { pubDate: '20230927', chatPrice: { input: 0.25, output: 0.25 }, hidden: true },
};
@@ -28,7 +28,8 @@ const _PS_Reasoning: ModelDescriptionSchema['parameterSpecs'] = [
* Moonshot AI (Kimi) models.
* - models list and pricing: https://platform.kimi.ai/docs/pricing/chat (was platform.moonshot.ai - now 301 redirect)
* - API docs: https://platform.kimi.ai/docs/api/chat
* - updated: 2026-05-04
* - NOTE: K2 series (non-2.5/2.6) is scheduled for discontinuation on 2026-05-25 per Moonshot docs.
*/
const _knownMoonshotModels: ManualMappings = [
@@ -36,6 +37,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2.6',
label: 'Kimi K2.6',
pubDate: '20260420',
description: 'Native multimodal flagship (text, image, video inputs) with thinking and non-thinking modes. Stronger long-form coding, improved instruction compliance and self-correction. 256K context.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -49,6 +51,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2.5',
label: 'Kimi K2.5',
pubDate: '20260127',
description: 'Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -58,12 +61,13 @@ const _knownMoonshotModels: ManualMappings = [
benchmark: { cbaElo: 1451 }, // kimi-k2.5-thinking
},
// Kimi K2 Series - scheduled for discontinuation on 2026-05-25
// Fast, Thinking
{
idPrefix: 'kimi-k2-thinking-turbo',
label: 'Kimi K2 Thinking Turbo',
pubDate: '20251106',
description: 'High-speed reasoning model with advanced thinking and tool calling capabilities. Faster inference (~50 tok/s) with optimized performance. 256K context. Temperature 1.0 recommended.',
contextWindow: 262144,
maxCompletionTokens: 65536,
@@ -76,6 +80,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2-thinking',
label: 'Kimi K2 Thinking',
pubDate: '20251106',
description: 'Advanced reasoning model with multi-step thinking and autonomous tool calling (200-300 sequential calls). Interleaves chain-of-thought with tool use. 256K context. Temperature 1.0 recommended.',
contextWindow: 262144,
maxCompletionTokens: 65536,
@@ -89,6 +94,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2-0905-preview',
label: 'Kimi K2 0905 (Preview)',
pubDate: '20250905',
description: 'State-of-the-art MoE model (1T total, 32B active) with extended 256K context. Enhanced agentic coding intelligence and improved instruction following.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -102,6 +108,7 @@ const _knownMoonshotModels: ManualMappings = [
hidden: true,
idPrefix: 'kimi-k2-0711-preview',
label: 'Kimi K2 0711 (Preview)',
pubDate: '20250711',
description: 'Earlier preview variant with 128K context. Superseded by 0905 version.',
contextWindow: 131072,
maxCompletionTokens: 16384,
@@ -114,6 +121,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'kimi-k2-turbo-preview',
label: 'Kimi K2 Turbo (Preview)',
pubDate: '20250801',
description: 'High-speed variant with 60-100 tokens/second output. 256K context. Optimized for real-time applications and agentic tasks.',
contextWindow: 262144,
maxCompletionTokens: 32768,
@@ -127,6 +135,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-128k',
label: 'V1 128K',
pubDate: '20240206',
description: 'Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -136,6 +145,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-32k',
label: 'V1 32K',
pubDate: '20240206',
description: 'Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -145,6 +155,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-8k',
label: 'V1 8K',
pubDate: '20240206',
description: 'Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.',
contextWindow: 8192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -157,6 +168,7 @@ const _knownMoonshotModels: ManualMappings = [
// hidden: false, not hidden - only non-hidden vision for now
idPrefix: 'moonshot-v1-128k-vision-preview',
label: 'V1 128K Vision (Preview)',
pubDate: '20250115',
description: 'Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -166,6 +178,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-32k-vision-preview',
label: 'V1 32K Vision (Preview)',
pubDate: '20250115',
description: 'Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -176,6 +189,7 @@ const _knownMoonshotModels: ManualMappings = [
{
idPrefix: 'moonshot-v1-8k-vision-preview',
label: 'V1 8K Vision (Preview)',
pubDate: '20250115',
description: 'Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision for production.',
contextWindow: 8192,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -111,6 +111,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.5-2026-04-23',
label: 'GPT-5.5 (2026-04-23)',
pubDate: '20260423',
description: 'New baseline for complex production workflows. Stronger task execution, more precise tool use, more efficient reasoning with fewer tokens. 1M token context.',
contextWindow: 1050000,
maxCompletionTokens: 128000,
@@ -136,6 +137,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.5-pro-2026-04-23',
label: 'GPT-5.5 Pro (2026-04-23)',
pubDate: '20260423',
description: 'Most capable model for complex tasks. Uses more compute for smarter, more precise responses on the hardest problems.',
contextWindow: 1050000,
maxCompletionTokens: 272000,
@@ -163,6 +165,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-2026-03-05',
label: 'GPT-5.4 (2026-03-05)',
pubDate: '20260305',
description: 'Most capable and efficient frontier model for professional work. Native computer use, improved reasoning, coding, and agentic workflows with 1M token context.',
contextWindow: 1050000,
maxCompletionTokens: 128000,
@@ -188,6 +191,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-pro-2026-03-05',
label: 'GPT-5.4 Pro (2026-03-05)',
pubDate: '20260305',
description: 'Most capable model for complex tasks. Uses more compute for smarter, more precise responses on difficult problems.',
contextWindow: 1050000,
maxCompletionTokens: 272000,
@@ -212,6 +216,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-mini-2026-03-17',
label: 'GPT-5.4 Mini (2026-03-17)',
pubDate: '20260317',
description: 'Strongest mini model for coding, computer use, and subagents. GPT-5.4-class intelligence at lower cost and latency.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -237,6 +242,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.4-nano-2026-03-17',
label: 'GPT-5.4 Nano (2026-03-17)',
pubDate: '20260317',
description: 'Cheapest GPT-5.4-class model for simple high-volume tasks like classification and data extraction.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -265,6 +271,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-5.3-codex',
label: 'GPT-5.3 Codex',
pubDate: '20260205',
description: 'Most capable agentic coding model. Combines frontier coding performance of GPT-5.2-Codex with reasoning and professional knowledge of GPT-5.2. ~25% faster.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -285,6 +292,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // Research preview, ChatGPT Pro only - API access limited to design partners
idPrefix: 'gpt-5.3-codex-spark',
label: 'GPT-5.3 Codex Spark',
pubDate: '20260212',
description: 'Text-only research preview optimized for real-time coding iteration. Delivers 1000+ tokens/sec on low-latency hardware.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -297,10 +305,11 @@ export const _knownOpenAIChatModels: ManualMappings = [
// benchmark: TBD
},
// GPT-5.3 Chat Latest - Released March 3, 2026
{
idPrefix: 'gpt-5.3-chat-latest',
label: 'GPT-5.3 Instant',
pubDate: '20260303',
description: 'GPT-5.3 model powering ChatGPT. Points to the GPT-5.3 Instant snapshot currently used in ChatGPT.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -322,6 +331,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.2-2025-12-11',
label: 'GPT-5.2 (2025-12-11)',
pubDate: '20251211',
description: 'Most capable model for professional work and long-running agents. Improvements in general intelligence, long-context, agentic tool-calling, and vision.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -349,6 +359,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.2-codex',
label: 'GPT-5.2 Codex',
pubDate: '20251211',
description: 'GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar environments. Supports low, medium, high, and xhigh reasoning effort settings.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -368,6 +379,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Instant
idPrefix: 'gpt-5.2-chat-latest',
label: 'GPT-5.2 Instant',
pubDate: '20251211',
description: 'GPT-5.2 model powering ChatGPT. Fast, capable for everyday work with clear improvements in info-seeking, how-tos, technical writing.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -387,6 +399,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5.2-pro-2025-12-11',
label: 'GPT-5.2 Pro (2025-12-11)',
pubDate: '20251211',
description: 'Smartest and most trustworthy option for difficult questions. Uses more compute for harder thinking on complex domains like programming.',
contextWindow: 400000,
maxCompletionTokens: 272000,
@@ -416,6 +429,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5.1-2025-11-13',
label: 'GPT-5.1 (2025-11-13)',
pubDate: '20251113',
description: 'The best model for coding and agentic tasks with configurable reasoning effort.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -442,6 +456,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Instant
idPrefix: 'gpt-5.1-chat-latest',
label: 'GPT-5.1 Instant',
pubDate: '20251112',
description: 'GPT-5.1 Instant with adaptive reasoning. More conversational with improved instruction following.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -462,6 +477,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex-max',
label: 'GPT-5.1 Codex Max',
pubDate: '20251119',
description: 'Our most intelligent coding model optimized for long-horizon, agentic coding tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -480,6 +496,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex',
label: 'GPT-5.1 Codex',
pubDate: '20251113',
description: 'A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -498,6 +515,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.3 Codex
idPrefix: 'gpt-5.1-codex-mini',
label: 'GPT-5.1 Codex Mini',
pubDate: '20251113',
description: 'Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -520,6 +538,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5
idPrefix: 'gpt-5-2025-08-07',
label: 'GPT-5 (2025-08-07)',
pubDate: '20250807',
description: 'The best model for coding and agentic tasks across domains.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -546,6 +565,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4/5.5 Pro
idPrefix: 'gpt-5-pro-2025-10-06',
label: 'GPT-5 Pro (2025-10-06)',
pubDate: '20251006',
description: 'Version of GPT-5 that uses more compute to produce smarter and more precise responses. Designed for tough problems.',
contextWindow: 400000,
maxCompletionTokens: 272000,
@@ -566,6 +586,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // deprecated per OpenAI docs (2026-04)
idPrefix: 'gpt-5-chat-latest',
label: 'GPT-5 ChatGPT (Non-Thinking)',
pubDate: '20250807',
description: 'GPT-5 model used in ChatGPT. Points to the GPT-5 snapshot currently used in ChatGPT.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -580,6 +601,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // deprecated per OpenAI docs (2026-04), superseded by gpt-5.1-codex/gpt-5.3-codex
idPrefix: 'gpt-5-codex',
label: 'GPT-5 Codex',
pubDate: '20250915',
description: 'A version of GPT-5 optimized for agentic coding in Codex.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -599,6 +621,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // poor quality - use llmVndOaiWebSearchContext on regular models instead
idPrefix: 'gpt-5-search-api-2025-10-14',
label: 'GPT-5 Search API (2025-10-14)',
pubDate: '20251014',
description: 'Updated web search model in Chat Completions API. 60% cheaper with domain filtering support.',
contextWindow: 400000,
maxCompletionTokens: 100000,
@@ -619,6 +642,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4 Mini
idPrefix: 'gpt-5-mini-2025-08-07',
label: 'GPT-5 Mini (2025-08-07)',
pubDate: '20250807',
description: 'A faster, more cost-efficient version of GPT-5 for well-defined tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -639,6 +663,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT-5.4 Nano
idPrefix: 'gpt-5-nano-2025-08-07',
label: 'GPT-5 Nano (2025-08-07)',
pubDate: '20250807',
description: 'Fastest, most cost-efficient version of GPT-5 for summarization and classification tasks.',
contextWindow: 400000,
maxCompletionTokens: 128000,
@@ -679,6 +704,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // UNSUPPORTED YET
idPrefix: 'computer-use-preview-2025-03-11',
label: 'Computer Use Preview (2025-03-11)',
pubDate: '20250311',
description: 'Specialized model for computer use tool. Optimized for computer interaction capabilities.',
contextWindow: 8192,
maxCompletionTokens: 1024,
@@ -700,6 +726,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o4-mini-deep-research-2025-06-26',
label: 'o4 Mini Deep Research [Deprecated]',
pubDate: '20250626',
isLegacy: true,
description: 'Faster, more affordable deep research model for complex, multi-step research tasks. [Shutdown: 2026-07-23 - migrate to GPT-5.5 with web search.]',
contextWindow: 200000,
@@ -718,6 +745,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o4-mini-2025-04-16',
label: 'o4 Mini [Deprecated]',
pubDate: '20250416',
isLegacy: true,
description: 'Latest o4-mini model. Optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Mini.]',
contextWindow: 200000,
@@ -737,6 +765,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-deep-research-2025-06-26',
label: 'o3 Deep Research [Deprecated]',
pubDate: '20250626',
isLegacy: true,
description: 'Our most powerful deep research model for complex, multi-step research tasks. [Shutdown: 2026-07-23 - migrate to GPT-5.5 Pro with web search.]',
contextWindow: 200000,
@@ -755,6 +784,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-pro-2025-06-10',
label: 'o3 Pro (2025-06-10)',
pubDate: '20250610',
description: 'Version of o3 with more compute for better responses. Provides consistently better answers for complex tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -773,6 +803,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-2025-04-16',
label: 'o3 (2025-04-16)',
pubDate: '20250416',
description: 'A well-rounded and powerful model across domains. Sets a new standard for math, science, coding, and visual reasoning tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -791,6 +822,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o3-mini-2025-01-31',
label: 'o3 Mini [Deprecated]',
pubDate: '20250131',
isLegacy: true,
description: 'Latest o3-mini model snapshot. High intelligence at the same cost and latency targets of o1-mini. Excels at science, math, and coding tasks. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Mini.]',
contextWindow: 200000,
@@ -811,6 +843,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true,
idPrefix: 'o1-pro-2025-03-19',
label: 'o1 Pro (2025-03-19)',
pubDate: '20250319',
description: 'A version of o1 with more compute for better responses. Provides consistently better answers for complex tasks.',
contextWindow: 200000,
maxCompletionTokens: 100000,
@@ -829,6 +862,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'o1-2024-12-17',
label: 'o1 [Deprecated]',
pubDate: '20241217',
isLegacy: true,
description: 'Previous full o-series reasoning model. [Shutdown: 2026-10-23 - migrate to GPT-5.5 or o3.]',
contextWindow: 200000,
@@ -851,6 +885,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-2025-04-14',
label: 'GPT-4.1 (2025-04-14)',
pubDate: '20250414',
description: 'Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -868,6 +903,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-mini-2025-04-14',
label: 'GPT-4.1 Mini (2025-04-14)',
pubDate: '20250414',
description: 'Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency by nearly half and cost by 83%.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -885,6 +921,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4.1-nano-2025-04-14',
label: 'GPT-4.1 Nano [Deprecated]',
pubDate: '20250414',
isLegacy: true,
description: 'Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance with low latency, ideal for tasks like classification or autocompletion. [Shutdown: 2026-10-23 - migrate to GPT-5.4 Nano.]',
contextWindow: 1047576,
@@ -906,6 +943,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-1.5',
label: 'GPT Audio 1.5',
pubDate: '20260224',
description: 'Best voice model for audio in, audio out with Chat Completions. Accepts audio inputs and outputs.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -919,6 +957,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // superseded by GPT Audio 1.5
idPrefix: 'gpt-audio-2025-08-28',
label: 'GPT Audio (2025-08-28)',
pubDate: '20250828',
description: 'First generally available audio model. Accepts audio inputs and outputs, and can be used in the Chat Completions REST API.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -935,6 +974,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-mini-2025-12-15',
label: 'GPT Audio Mini (2025-12-15)',
pubDate: '20251215',
description: 'Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -944,6 +984,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-audio-mini-2025-10-06',
label: 'GPT Audio Mini (2025-10-06)',
pubDate: '20251006',
hidden: true, // previous version
description: 'Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.',
contextWindow: 128000,
@@ -966,6 +1007,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-11-20',
label: 'GPT-4o (2024-11-20)',
pubDate: '20241120',
description: 'Snapshot of gpt-4o from November 20th, 2024.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -976,6 +1018,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-08-06',
label: 'GPT-4o (2024-08-06)',
pubDate: '20240806',
hidden: true, // previous version
description: 'Snapshot that supports Structured Outputs. gpt-4o currently points to this version.',
contextWindow: 128000,
@@ -987,6 +1030,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-2024-05-13',
label: 'GPT-4o (2024-05-13)',
pubDate: '20240513',
hidden: true, // previous version
description: 'Original gpt-4o snapshot from May 13, 2024.',
contextWindow: 128000,
@@ -1007,6 +1051,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-search-preview-2025-03-11',
label: 'GPT-4o Search Preview (2025-03-11)',
pubDate: '20250311',
description: 'Latest snapshot of the GPT-4o model optimized for web search capabilities.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1027,6 +1072,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-audio-preview-2025-06-03',
label: 'GPT-4o Audio Preview (2025-06-03)',
pubDate: '20250603',
description: 'Latest snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1039,6 +1085,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-audio-preview-2024-12-17',
label: 'GPT-4o Audio Preview (2024-12-17)',
pubDate: '20241217',
description: 'Snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1057,6 +1104,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4o-mini-2024-07-18',
label: 'GPT-4o Mini (2024-07-18)',
pubDate: '20240718',
description: 'Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more capable than GPT-3.5 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1073,6 +1121,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // UNSUPPORTED yet (audio output model)
idPrefix: 'gpt-4o-mini-audio-preview-2024-12-17',
label: 'GPT-4o Mini Audio Preview (2024-12-17)',
pubDate: '20241217',
description: 'Snapshot for the Audio API model.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1091,6 +1140,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
hidden: true, // old
idPrefix: 'gpt-4o-mini-search-preview-2025-03-11',
label: 'GPT-4o Mini Search Preview (2025-03-11)',
pubDate: '20250311',
description: 'Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -1110,6 +1160,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-turbo-2024-04-09',
label: 'GPT-4 Turbo (2024-04-09)',
pubDate: '20240409',
hidden: true, // OLD
description: 'GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and function calling. gpt-4-turbo currently points to this version.',
contextWindow: 128000,
@@ -1126,6 +1177,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0125-preview',
label: 'GPT-4 Turbo (0125)',
pubDate: '20240125',
hidden: true, // OLD
description: 'GPT-4 Turbo preview model intended to reduce cases of "laziness" where the model doesn\'t complete a task.',
contextWindow: 128000,
@@ -1137,6 +1189,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-1106-preview', // GPT-4 Turbo preview model
label: 'GPT-4 Turbo (1106)',
pubDate: '20231106',
hidden: true, // OLD
description: 'GPT-4 Turbo preview model featuring improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.',
contextWindow: 128000,
@@ -1156,6 +1209,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0613',
label: 'GPT-4 (0613)',
pubDate: '20230613',
hidden: true, // OLD
description: 'Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.',
contextWindow: 8192,
@@ -1167,6 +1221,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-4-0314',
label: 'GPT-4 (0314)',
pubDate: '20230314',
hidden: true, // OLD
description: 'Snapshot of gpt-4 from March 14th 2023 with function calling data. Data up to Sep 2021.',
contextWindow: 8192,
@@ -1189,6 +1244,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-3.5-turbo-0125',
label: '3.5-Turbo (2024-01-25)',
pubDate: '20240125',
hidden: true, // OLD
description: 'The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.',
contextWindow: 16385,
@@ -1200,6 +1256,7 @@ export const _knownOpenAIChatModels: ManualMappings = [
{
idPrefix: 'gpt-3.5-turbo-1106',
label: '3.5-Turbo (1106)',
pubDate: '20231106',
hidden: true, // OLD
description: 'GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.',
contextWindow: 16385,
@@ -1559,5 +1616,5 @@ export function llmOrtOaiLookup(orModelName: string): OrtVendorLookupResult | un
// initialTemperature: not set - OpenAI models use the global fallback (0.5);
// NoTemperature models are handled client-side via LLM_IF_HOTFIX_NoTemperature (not propagated to OR)
return { interfaces, parameterSpecs, pubDate: entry.pubDate };
}
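The `pubDate` values added throughout this diff are compact `'YYYYMMDD'` strings, and one of the commits above ("Models List: show new (<30 days) models") uses them to flag recent models. A minimal sketch of how such strings could be parsed and checked for recency — function names here are illustrative, not taken from the codebase:

```typescript
// Hypothetical helpers (not part of this diff): interpret the 'YYYYMMDD'
// pubDate strings and flag models published within the last N days.

function parsePubDate(pubDate: string): Date | undefined {
  if (!/^\d{8}$/.test(pubDate)) return undefined; // reject malformed values
  const year = Number(pubDate.slice(0, 4));
  const month = Number(pubDate.slice(4, 6)) - 1; // JS Date months are 0-based
  const day = Number(pubDate.slice(6, 8));
  return new Date(Date.UTC(year, month, day));
}

function isNewModel(pubDate: string, now: Date = new Date(), windowDays = 30): boolean {
  const published = parsePubDate(pubDate);
  if (!published) return false;
  const ageMs = now.getTime() - published.getTime();
  // future-dated or unparseable entries are not "new"
  return ageMs >= 0 && ageMs < windowDays * 24 * 60 * 60 * 1000;
}
```

Keeping the field as a plain sortable string (rather than a `Date`) means it serializes cleanly through the registry snapshot and only needs parsing at display time.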
@@ -12,6 +12,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4.1-2025-04-14',
label: '💾➜ GPT-4.1 (2025-04-14)',
pubDate: '20250414',
description: 'Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -22,6 +23,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4.1-mini-2025-04-14',
label: '💾➜ GPT-4.1 Mini (2025-04-14)',
pubDate: '20250414',
description: 'Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency and cost.',
contextWindow: 1047576,
maxCompletionTokens: 32768,
@@ -32,6 +34,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4o-mini-2024-07-18',
label: '💾➜ GPT-4o Mini (2024-07-18)',
pubDate: '20240718',
description: 'Affordable model for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -41,6 +44,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-4o-2024-08-06',
label: '💾➜ GPT-4o (2024-08-06)',
pubDate: '20240806',
description: 'Advanced, multimodal flagship model that\'s cheaper and faster than GPT-4 Turbo.',
contextWindow: 128000,
maxCompletionTokens: 16384,
@@ -51,6 +55,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gpt-3.5-turbo-0125',
label: '💾➜ GPT-3.5 Turbo (0125)',
pubDate: '20240125',
description: 'The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats',
contextWindow: 16385,
maxCompletionTokens: 4096,
@@ -63,6 +68,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gemini-1.0-pro-001',
label: '💾➜ Gemini 1.0 Pro',
pubDate: '20240215',
description: 'Google\'s Gemini 1.0 Pro model',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn],
@@ -70,6 +76,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'gemini-1.5-flash-001',
label: '💾➜ Gemini 1.5 Flash',
pubDate: '20240514',
description: 'Google\'s Gemini 1.5 Flash model - fast and efficient',
contextWindow: 1000000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_OAI_Fn],
@@ -79,6 +86,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Meta-Llama-3.1-8B-Instruct',
label: '💾 Llama 3.1 · 8B Instruct',
pubDate: '20240723',
description: 'Meta Llama 3.1 8B Instruct - hosted inference with per-token pricing',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -87,6 +95,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Meta-Llama-3.1-70B-Instruct',
label: '💾 Llama 3.1 · 70B Instruct',
pubDate: '20240723',
description: 'Meta Llama 3.1 70B Instruct - hosted inference with per-token pricing',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -95,6 +104,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.1-8B',
label: '💾 Llama 3.1 · 8B Base',
pubDate: '20240723',
description: 'Meta Llama 3.1 8B base model for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -102,6 +112,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.1-70B',
label: '💾 Llama 3.1 · 70B Base',
pubDate: '20240723',
description: 'Meta Llama 3.1 70B base model for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -111,6 +122,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.2-1B-Instruct',
label: '💾 Llama 3.2 · 1B Instruct',
pubDate: '20240925',
description: 'Meta Llama 3.2 1B Instruct - lightweight model for edge and mobile deployment',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -118,6 +130,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.2-3B-Instruct',
label: '💾 Llama 3.2 · 3B Instruct',
pubDate: '20240925',
description: 'Meta Llama 3.2 3B Instruct - efficient model for edge deployment',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -127,6 +140,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'meta-llama/Llama-3.3-70B-Instruct',
label: '💾 Llama 3.3 · 70B Instruct',
pubDate: '20241206',
description: 'Meta Llama 3.3 70B Instruct - latest 70B model with performance comparable to Llama 3.1 405B',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -136,6 +150,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2-VL-7B-Instruct',
label: '💾 Qwen 2 · VL 7B Instruct',
pubDate: '20240830',
description: 'Alibaba Qwen 2 Vision-Language 7B Instruct - multimodal model for text and image understanding',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -145,6 +160,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-1.5B-Instruct',
label: '💾 Qwen 2.5 · 1.5B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 1.5B Instruct - efficient small model',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -152,6 +168,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-7B-Instruct',
label: '💾 Qwen 2.5 · 7B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 7B Instruct - balanced performance and efficiency',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -159,6 +176,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-14B-Instruct',
label: '💾 Qwen 2.5 · 14B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 14B Instruct - hosted inference (hourly compute unit pricing)',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -166,6 +184,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-72B-Instruct',
label: '💾 Qwen 2.5 · 72B Instruct',
pubDate: '20240919',
description: 'Alibaba Qwen 2.5 72B Instruct - flagship model with performance comparable to Llama 3.1 405B',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -173,6 +192,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-Coder-7B-Instruct',
label: '💾 Qwen 2.5 · Coder 7B Instruct',
pubDate: '20241112',
description: 'Alibaba Qwen 2.5 Coder 7B Instruct - specialized for code generation and understanding',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -180,6 +200,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen2.5-Coder-32B-Instruct',
label: '💾 Qwen 2.5 · Coder 32B Instruct',
pubDate: '20241112',
description: 'Alibaba Qwen 2.5 Coder 32B Instruct - specialized for code generation and understanding',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Fn, LLM_IF_OAI_Json],
@@ -189,6 +210,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen3-8B',
label: '💾 Qwen 3 · 8B Base',
pubDate: '20250429',
description: 'Alibaba Qwen 3 8B base model for fine-tuning - supports thinking and non-thinking modes',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -196,6 +218,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'Qwen/Qwen3-14B',
label: '💾 Qwen 3 · 14B Base',
pubDate: '20250429',
description: 'Alibaba Qwen 3 14B base model for fine-tuning - supports thinking and non-thinking modes',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -205,6 +228,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-1b-it',
label: '💾 Gemma 3 · 1B IT',
pubDate: '20250312',
description: 'Google Gemma 3 1B instruction-tuned - lightweight text-only model',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -212,6 +236,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-4b-it',
label: '💾 Gemma 3 · 4B IT',
pubDate: '20250312',
description: 'Google Gemma 3 4B instruction-tuned - efficient multimodal model with 128K context',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -219,6 +244,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-12b-it',
label: '💾 Gemma 3 · 12B IT',
pubDate: '20250312',
description: 'Google Gemma 3 12B instruction-tuned - balanced multimodal model with 128K context',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -226,6 +252,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'google/gemma-3-27b-it',
label: '💾 Gemma 3 · 27B IT',
pubDate: '20250312',
description: 'Google Gemma 3 27B instruction-tuned - largest Gemma 3 multimodal model with 128K context',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision],
@@ -235,6 +262,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'mistralai/Mistral-Nemo-Base-2407',
label: '💾 Mistral Nemo · Base',
pubDate: '20240718',
description: 'Mistral Nemo 12B base model (July 2024) for fine-tuning',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -242,6 +270,7 @@ const _knownOpenPipeChatModels: ModelDescriptionSchema[] = [
{
id: 'mistralai/Mistral-Small-24B-Base-2501',
label: '💾 Mistral Small · 24B Base',
pubDate: '20250130',
description: 'Mistral Small 24B base model (Jan 2025) - competitive with larger models while faster',
contextWindow: 32768,
interfaces: [LLM_IF_OAI_Chat],
@@ -162,8 +162,11 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
// -- Vendor parameter & interface inheritance --
const llmRef = model.id.replace(/^[^/]+\//, '');
let initialTemperature: number | undefined;
let pubDate: string | undefined;
const _mergeLookup = (lookup: OrtVendorLookupResult | undefined) => {
if (lookup?.pubDate !== undefined)
pubDate = lookup.pubDate;
if (lookup?.interfaces)
for (const iface of lookup.interfaces)
if (!interfaces.includes(iface))
@@ -270,6 +273,7 @@ export function openRouterModelToModelDescription(wireModel: object): ModelDescr
idPrefix: model.id,
// latest: ...
label,
...(pubDate !== undefined && { pubDate }),
description: model.description?.length > 280 ? model.description.slice(0, 277) + '...' : model.description,
contextWindow,
maxCompletionTokens,
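Outside the diff, the `_mergeLookup` accumulation above can be sketched in isolation — lookups apply in order, a later lookup's `pubDate` overriding an earlier one's, while interfaces accumulate as a de-duplicated union. The names below (`VendorLookup`, `resolveModelMeta`) and the two-lookup precedence example are illustrative assumptions, not the actual OpenRouter vendor tables:

```typescript
// Hypothetical stand-in for OrtVendorLookupResult: only the two merged fields.
interface VendorLookup {
  pubDate?: string;
  interfaces?: string[];
}

// Last-write-wins for pubDate; order-preserving de-duplicated union for interfaces.
function resolveModelMeta(lookups: Array<VendorLookup | undefined>) {
  let pubDate: string | undefined;
  const interfaces: string[] = [];
  for (const lookup of lookups) {
    if (lookup?.pubDate !== undefined)
      pubDate = lookup.pubDate; // later (more specific) lookup wins
    for (const iface of lookup?.interfaces ?? [])
      if (!interfaces.includes(iface))
        interfaces.push(iface);
  }
  return { pubDate, interfaces };
}

// A family-level lookup followed by a more specific per-model lookup:
const merged = resolveModelMeta([
  { pubDate: '20240718', interfaces: ['chat'] },         // family default
  { pubDate: '20250130', interfaces: ['chat', 'json'] }, // specific model
]);
// merged.pubDate resolves to the more specific '20250130'
```

This mirrors why the final description object uses `...(pubDate !== undefined && { pubDate })`: the field is only emitted when some lookup in the chain actually supplied a value.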
@@ -39,6 +39,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-deep-research',
label: 'Sonar Deep Research',
pubDate: '20250214',
description: 'Expert-level research model for exhaustive searches and comprehensive reports. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning],
@@ -59,6 +60,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-reasoning-pro',
label: 'Sonar Reasoning Pro',
pubDate: '20250218',
description: 'Premier reasoning model (DeepSeek R1) with Chain of Thought. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Reasoning],
@@ -78,6 +80,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar-pro',
label: 'Sonar Pro',
pubDate: '20250121',
description: 'Advanced search model for complex queries and deep content understanding. 200k context.',
contextWindow: 200000,
maxCompletionTokens: 8000,
@@ -96,6 +99,7 @@ const _knownPerplexityChatModels: ModelDescriptionSchema[] = [
{
id: 'sonar',
label: 'Sonar',
pubDate: '20250121',
description: 'Lightweight, cost-effective search model for quick, grounded answers. 128k context.',
contextWindow: 128000,
interfaces: [LLM_IF_OAI_Chat],
@@ -93,6 +93,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4.3',
label: 'Grok 4.3',
pubDate: '20260417',
description: 'xAI\'s latest flagship model with always-on reasoning and a 1M token context window. Supports text, image, and video inputs with improved agentic performance at lower cost.',
contextWindow: 1000000,
maxCompletionTokens: undefined,
@@ -107,6 +108,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.3
idPrefix: 'grok-4.20-0309-reasoning',
label: 'Grok 4.20 Reasoning',
pubDate: '20260309',
description: 'xAI\'s previous flagship reasoning model with a 2M token context window. Deep reasoning and problem-solving capabilities with text and image inputs.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -119,6 +121,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.3
idPrefix: 'grok-4.20-0309-non-reasoning',
label: 'Grok 4.20',
pubDate: '20260309',
description: 'xAI\'s previous flagship model with a 2M token context window. Non-reasoning variant for fast, high-quality responses with text and image inputs.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -130,6 +133,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4.20-multi-agent-0309',
label: 'Grok 4.20 Multi-Agent',
pubDate: '20260309',
description: 'Multi-agent reasoning model that runs 4 specialized agents in parallel (coordinator, fact-checker, analyst, challenger) for collaborative verification with reduced hallucination.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -147,6 +151,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4-1-fast-reasoning',
label: 'Grok 4.1 Fast Reasoning',
pubDate: '20251119',
description: 'Next generation frontier multimodal model optimized for high-performance agentic tool calling with a 2M token context window. Trained specifically for real-world enterprise use cases with exceptional performance on agentic workflows.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -158,6 +163,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-4-1-fast-non-reasoning',
label: 'Grok 4.1 Fast', // 'Grok 4.1 Fast Non-Reasoning'
pubDate: '20251119',
description: 'Next generation frontier multimodal model optimized for high-performance agentic tool calling with a 2M token context window. Non-reasoning variant for instant responses.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -172,6 +178,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.1
idPrefix: 'grok-4-fast-reasoning',
label: 'Grok 4 Fast Reasoning',
pubDate: '20250919',
description: 'Cost-efficient reasoning model with a 2M token context window. Optimized for fast reasoning in agentic workflows. 98% cost reduction vs Grok 4 with comparable performance.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -184,6 +191,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.1
idPrefix: 'grok-4-fast-non-reasoning',
label: 'Grok 4 Fast', // 'Grok 4 Fast Non-Reasoning'
pubDate: '20250919',
description: 'Cost-efficient non-reasoning model with a 2M token context window. Same weights as grok-4-fast-reasoning but constrained by non-reasoning system prompt for quick responses.',
contextWindow: 2000000,
maxCompletionTokens: undefined,
@@ -196,6 +204,7 @@ const _knownXAIChatModels: ManualMappings = [
hidden: true, // yield to 4.20
idPrefix: 'grok-4-0709',
label: 'Grok 4 (0709)',
pubDate: '20250709',
description: 'xAI\'s most advanced model, offering state-of-the-art reasoning and problem-solving capabilities over a massive 256k context window. Supports text and image inputs.',
contextWindow: 256000,
maxCompletionTokens: undefined,
@@ -209,6 +218,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-3',
label: 'Grok 3',
pubDate: '20250217',
description: 'xAI flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science.',
contextWindow: 131072,
maxCompletionTokens: undefined,
@@ -220,6 +230,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-3-mini',
label: 'Grok 3 Mini',
pubDate: '20250217',
description: 'A lightweight model that is fast and smart for logic-based tasks. Supports function calling and structured outputs.',
contextWindow: 131072,
maxCompletionTokens: undefined,
@@ -236,6 +247,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-code-fast-1',
label: 'Grok Code Fast 1',
pubDate: '20250828',
description: 'Specialized reasoning model for agentic coding workflows. Fast, economical, and optimized for code generation, debugging, and software development tasks.',
contextWindow: 256000,
maxCompletionTokens: undefined,
@@ -249,6 +261,7 @@ const _knownXAIChatModels: ManualMappings = [
{
idPrefix: 'grok-2-vision-1212',
label: 'Grok 2 Vision (1212)',
pubDate: '20241212',
description: 'xAI model grok-2-vision-1212 with image and text input capabilities. Supports text generation with a 32,768 token context window.',
contextWindow: 32768,
maxCompletionTokens: undefined,
@@ -32,6 +32,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-5',
label: 'GLM-5',
pubDate: '20260211',
description: 'Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode.',
contextWindow: 204800, // 200K
interfaces: _IF_Reasoning,
@@ -43,6 +44,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-5-code',
label: 'GLM-5 Code',
// pubDate: UNCONFIRMED - 'glm-5-code' not in Z.ai pricing table or release-notes; Z.ai's coding plan documents GLM-5.1 / GLM-5-Turbo / GLM-4.7 / GLM-4.5-Air, no 'glm-5-code'
description: 'GLM-5 optimized for coding tasks. Uses the dedicated Coding endpoint. 200K context, thinking mode.',
contextWindow: 204800, // 200K
interfaces: _IF_Reasoning,
@@ -58,6 +60,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7',
label: 'GLM-4.7',
pubDate: '20251222',
description: 'Latest-gen GLM model with 128K context. Thinking mode activated by default.',
contextWindow: 131072, // 128K
interfaces: _IF_Reasoning,
@@ -69,6 +72,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7-flashx',
label: 'GLM-4.7 FlashX', // fast, low cost
pubDate: '20260119',
description: 'Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -80,6 +84,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.7-flash',
label: 'GLM-4.7 Flash (Free)',
pubDate: '20260119',
description: 'Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -94,6 +99,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v-flashx',
label: 'GLM-4.6 V FlashX',
pubDate: '20251208',
description: 'Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -106,6 +112,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v-flash',
label: 'GLM-4.6 V Flash (Free)',
pubDate: '20251208',
description: 'Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -117,6 +124,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6v',
label: 'GLM-4.6 V',
pubDate: '20251208',
description: 'Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking.',
contextWindow: 131072,
interfaces: _IF_Vision_Reasoning,
@@ -131,6 +139,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.6',
label: 'GLM-4.6',
pubDate: '20250930',
description: 'GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning.',
contextWindow: 131072,
interfaces: _IF_Reasoning,
@@ -144,6 +153,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-ocr',
label: 'GLM-OCR (Vision, OCR)',
pubDate: '20260203',
description: 'Specialized OCR model for text extraction from images and documents.',
contextWindow: 131072,
interfaces: [LLM_IF_OAI_Chat, LLM_IF_OAI_Vision, LLM_IF_HOTFIX_NoWebP],
@@ -158,6 +168,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5v',
label: 'GLM-4.5 V',
pubDate: '20250811',
description: 'Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.',
contextWindow: 98304, // 96K
interfaces: _IF_Vision_Reasoning,
@@ -173,6 +184,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-flash',
label: 'GLM-4.5 Flash (Free)',
pubDate: '20250728',
description: 'Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -185,6 +197,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-airx',
label: 'GLM-4.5 AirX',
pubDate: '20250728',
description: 'Extended lightweight GLM-4.5 variant. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -197,6 +210,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-air',
label: 'GLM-4.5 Air',
pubDate: '20250728',
description: 'Lightweight GLM-4.5 variant. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -209,6 +223,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5-x',
label: 'GLM-4.5 X',
pubDate: '20250728',
description: 'Extended GLM-4.5 model. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -221,6 +236,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4.5',
label: 'GLM-4.5',
pubDate: '20250728',
description: 'Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.',
contextWindow: 98304,
interfaces: _IF_Reasoning,
@@ -234,6 +250,7 @@ const _knownZAIModels: ManualMappings = [
{
idPrefix: 'glm-4-32b-0414-128k',
label: 'GLM-4 32B (0414) 128K',
pubDate: '20250414',
description: 'GLM-4 32B model with 128K context, 16K output.',
contextWindow: 131072,
interfaces: _IF_Chat,
+4 -1
@@ -6,4 +6,7 @@ cd "$(dirname "$0")/../../.."
# Run with npx tsx (will download on-demand if needed)
# Uses npx cache, lightweight and no local install required
exec npx -y tsx tools/data/llms/llm-registry-sync.ts "$@"
npx -y tsx tools/data/llms/llm-registry-sync.ts "$@"
# Then dump a fresh JSON snapshot next to the DB.
exec npx -y tsx tools/data/llms/llm-registry-sync.ts --export-db tools/data/llms/llm-registry.json
+152 -5
@@ -41,6 +41,7 @@ interface CliOptions {
discordWebhook?: string;
notifyFilters?: string;
validate?: boolean;
exportDbPath?: string; // --export-db <path>: read-only DB dump (no API calls, no sync)
}
interface StoredModel {
@@ -53,6 +54,7 @@ interface StoredModel {
deleted_at: string | null;
created: number | null;
updated: number | null;
pub_date: string | null;
context_window: number | null;
max_completion_tokens: number | null;
interfaces: string | null;
@@ -90,6 +92,13 @@ function extractSimplePrice(price: any): number | null {
return null;
}
/** Idempotent schema migration: adds a column if it doesn't already exist. Safe to call on every run. */
function ensureColumn(db: DatabaseSync, table: string, column: string, columnDef: string): void {
const cols = db.prepare(`PRAGMA table_info(${table})`).all() as Array<{ name: string }>;
if (!cols.some((c) => c.name === column))
db.exec(`ALTER TABLE ${table} ADD COLUMN ${column} ${columnDef}`);
}
function initDatabase(): DatabaseSync {
const db = new DatabaseSync(DB_PATH);
@@ -105,6 +114,7 @@ function initDatabase(): DatabaseSync {
deleted_at TEXT,
created INTEGER,
updated INTEGER,
pub_date TEXT,
context_window INTEGER,
max_completion_tokens INTEGER,
interfaces TEXT,
@@ -131,6 +141,9 @@ function initDatabase(): DatabaseSync {
)
`);
// Migrations for existing DBs (safe no-ops on fresh DBs that already have the column from CREATE TABLE).
ensureColumn(db, 'models', 'pub_date', 'TEXT');
return db;
}
@@ -157,15 +170,16 @@ function saveChanges(
): void {
if (changes.new.length > 0) {
const stmt = db.prepare(`
INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated,
INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated, pub_date,
context_window, max_completion_tokens, interfaces, description,
benchmark_elo, benchmark_mmlu, price_input, price_output, original_json, deleted_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
ON CONFLICT (id, vendor, service) DO UPDATE SET
label = excluded.label,
last_seen = excluded.last_seen,
created = excluded.created,
updated = excluded.updated,
pub_date = excluded.pub_date,
context_window = excluded.context_window,
max_completion_tokens = excluded.max_completion_tokens,
interfaces = excluded.interfaces,
@@ -188,6 +202,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -208,6 +223,7 @@ function saveChanges(
last_seen = ?,
created = ?,
updated = ?,
pub_date = ?,
context_window = ?,
max_completion_tokens = ?,
interfaces = ?,
@@ -229,6 +245,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -247,11 +264,13 @@ function saveChanges(
if (changes.unchanged.length > 0) {
const stmt = db.prepare(`
INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated,
INSERT INTO models (id, vendor, service, label, first_seen, last_seen, created, updated, pub_date,
context_window, max_completion_tokens, interfaces, description,
benchmark_elo, benchmark_mmlu, price_input, price_output, original_json, deleted_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
ON CONFLICT (id, vendor, service) DO UPDATE SET last_seen = excluded.last_seen
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL)
ON CONFLICT (id, vendor, service) DO UPDATE SET
last_seen = excluded.last_seen,
pub_date = excluded.pub_date
`);
for (const model of changes.unchanged) {
@@ -264,6 +283,7 @@ function saveChanges(
timestamp,
model.created ?? null,
model.updated ?? null,
model.pubDate ?? null,
model.contextWindow ?? null,
model.maxCompletionTokens ?? null,
model.interfaces ? JSON.stringify(model.interfaces) : null,
@@ -310,6 +330,114 @@ function saveSyncHistory(
);
}
// ============================================================================
// Snapshot Export
// ============================================================================
interface CatalogModel {
id: string;
vendor: string;
service: string;
label: string;
pubDate: string | null;
firstSeen: string;
lastSeen: string;
deletedAt: string | null;
created: number | null;
updated: number | null;
contextWindow: number | null;
maxCompletionTokens: number | null;
interfaces: string[] | null;
description: string | null;
benchmarkElo: number | null;
priceInput: number | null;
priceOutput: number | null;
}
interface CatalogSnapshot {
schemaVersion: number;
exportedAt: string;
totalCount: number;
activeCount: number;
deletedCount: number;
byVendor: Record<string, number>;
models: CatalogModel[];
}
/** Dump the entire registry (active + soft-deleted) to a JSON file. Read-only on the DB. */
function exportSnapshot(db: DatabaseSync, outPath: string): void {
const rows = db.prepare(`
SELECT id, vendor, service, label, pub_date, first_seen, last_seen, deleted_at,
created, updated, context_window, max_completion_tokens, interfaces, description,
benchmark_elo, price_input, price_output
FROM models
ORDER BY vendor, service, id
`).all() as unknown as Array<StoredModel & { interfaces: string | null }>;
const byVendor: Record<string, number> = {};
let activeCount = 0;
let deletedCount = 0;
const models: CatalogModel[] = rows.map((r) => {
byVendor[r.vendor] = (byVendor[r.vendor] || 0) + 1;
if (r.deleted_at) deletedCount++;
else activeCount++;
let parsedInterfaces: string[] | null = null;
if (r.interfaces) {
try {
const parsed = JSON.parse(r.interfaces);
if (Array.isArray(parsed)) parsedInterfaces = parsed;
} catch {
// leave null on parse failure
}
}
return {
id: r.id,
vendor: r.vendor,
service: r.service,
label: r.label,
pubDate: r.pub_date,
firstSeen: r.first_seen,
lastSeen: r.last_seen,
deletedAt: r.deleted_at,
created: r.created,
updated: r.updated,
contextWindow: r.context_window,
maxCompletionTokens: r.max_completion_tokens,
interfaces: parsedInterfaces,
description: r.description,
benchmarkElo: r.benchmark_elo,
priceInput: r.price_input,
priceOutput: r.price_output,
};
});
const snapshot: CatalogSnapshot = {
schemaVersion: 1,
exportedAt: new Date().toISOString(),
totalCount: rows.length,
activeCount,
deletedCount,
byVendor,
models,
};
// Write atomically: write to temp, then rename. Avoids partial reads if a consumer is watching.
const dir = path.dirname(path.resolve(outPath));
if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
const tmpPath = `${outPath}.tmp`;
fs.writeFileSync(tmpPath, JSON.stringify(snapshot, null, 2));
fs.renameSync(tmpPath, outPath);
console.log(
`${COLORS.green}✓ Exported${COLORS.reset} ${rows.length} models ` +
`(${activeCount} active, ${deletedCount} deleted) ` +
`${COLORS.dim}-> ${path.resolve(outPath)}${COLORS.reset}`,
);
}
// ============================================================================
// Change Detection
// ============================================================================
@@ -353,6 +481,9 @@ function detectChanges(
existingModel.context_window !== (model.contextWindow ?? null) ||
existingModel.max_completion_tokens !== (model.maxCompletionTokens ?? null) ||
existingModel.interfaces !== modelInterfaces;
// NOTE: pub_date intentionally EXCLUDED from change detection. On first run after upgrade,
// all rows go from NULL -> editorial value, which would fire ~hundreds of spurious "updated"
// notifications. The unchanged-touch path below silently backfills pub_date instead.
if (hasChanged) {
changes.updated.push(model);
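The backfill behavior described in the NOTE can be seen in a reduced predicate. The row/model shapes here are trimmed to one representative field each (the real comparison also covers label, tokens, interfaces, etc.), so this is a sketch of the design choice, not the full `detectChanges` logic:

```typescript
// Trimmed shapes: one compared field plus the deliberately ignored pub_date.
interface StoredRow { context_window: number | null; pub_date: string | null }
interface LiveModel { contextWindow?: number; pubDate?: string }

function hasChanged(existing: StoredRow, model: LiveModel): boolean {
  // pub_date intentionally NOT compared: on the first run after the upgrade,
  // every row would go NULL -> editorial value and fire a spurious "updated"
  // notification. The unchanged-touch upsert backfills pub_date silently.
  return existing.context_window !== (model.contextWindow ?? null);
}

const row: StoredRow = { context_window: 131072, pub_date: null };
const live: LiveModel = { contextWindow: 131072, pubDate: '20240919' };
// Backfilling pub_date alone: hasChanged(row, live) stays false, no notification.
```

The trade-off is that a genuine editorial correction to `pubDate` also goes unannounced; it still lands in the DB via the `pub_date = excluded.pub_date` clause on the unchanged path.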
@@ -542,6 +673,10 @@ function parseArgs(): CliOptions {
case '--validate':
options.validate = true;
break;
case '--export-db':
options.exportDbPath = nextArg;
i++;
break;
}
}
@@ -566,6 +701,7 @@ ${COLORS.bright}Options:${COLORS.reset}
--posthog-key <key> PostHog API key for analytics
--discord-webhook <url> Discord webhook URL
--notify-filters <list> Comma-separated vendor list (e.g., openai,anthropic)
--export-db <path> Read-only DB dump to JSON (no API calls, no sync). Run separately from sync.
--help Show this help
${COLORS.bright}Examples:${COLORS.reset}
@@ -961,6 +1097,17 @@ async function main() {
try {
const options = parseArgs();
// --export-db: read-only DB dump. No config, no sync, no API calls.
if (options.exportDbPath) {
const db = initDatabase();
try {
exportSnapshot(db, options.exportDbPath);
} finally {
db.close();
}
return;
}
let servicesConfig: Record<string, AixAPI_Access>;
if (options.config) {
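A downstream consumer of the exported snapshot (e.g. the website's "new models" list mentioned in the commit log) might read the JSON along these lines. This is a hedged sketch: the shape mirrors `CatalogSnapshot` above, but the loader name, the 30-day window, and the path argument are assumptions, not code from this PR:

```typescript
import * as fs from 'node:fs';

// Subset of the exported CatalogSnapshot shape that this consumer needs.
interface SnapshotModel { id: string; vendor: string; pubDate: string | null; deletedAt: string | null }
interface Snapshot { schemaVersion: number; models: SnapshotModel[] }

// Load active models published within the last `maxAgeDays` days.
function loadRecentModels(snapshotPath: string, now: Date, maxAgeDays = 30): SnapshotModel[] {
  const snap = JSON.parse(fs.readFileSync(snapshotPath, 'utf8')) as Snapshot;
  if (snap.schemaVersion !== 1)
    throw new Error(`unsupported schemaVersion ${snap.schemaVersion}`);
  const cutoff = new Date(now.getTime() - maxAgeDays * 86_400_000);
  return snap.models.filter((m) => {
    if (m.deletedAt || !m.pubDate) return false; // skip soft-deleted and undated rows
    // pubDate is stored as 'YYYYMMDD'; expand to ISO for Date parsing
    const d = new Date(`${m.pubDate.slice(0, 4)}-${m.pubDate.slice(4, 6)}-${m.pubDate.slice(6, 8)}`);
    return d >= cutoff;
  });
}
```

Checking `schemaVersion` up front is the point of exporting it: the atomic temp-file-then-rename write guarantees the consumer never sees a half-written file, and the version gate guarantees it never misreads a future schema.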