LLM Models Catalog Pipeline (forward-looking)
Status: proposal / partially implemented. Companion to LLM-editorial-control.md, which describes the durable reference (pubDate semantics, editorial-vs-dynamic matrix, propagation chain).
This document captures the forward-looking pipeline that turns Big-AGI's editorial model metadata into website value-add (plots, decision helpers, comparison tools at big-agi.com).
Goal
Stand up a database/datastore that the website (~/dev/website) can query for plots, decision helpers, and comparison tools - without requiring the website to call our authenticated tRPC endpoints.
Stages
Stage 1: source of truth (in this repo) — DONE
Editorial files in src/modules/llms/server/ remain the canonical source for:
- Identity: id, label, vendor
- Capabilities: interfaces, parameterSpecs, contextWindow, maxCompletionTokens
- Pricing: chatPrice (input / output / cache tiers)
- Benchmarks: benchmark.cbaElo (Chat Bot Arena ELO)
- Lifecycle: pubDate, isLegacy, isPreview, hidden, deprecation comments
Well-typed, version-controlled, reviewed - every model edit is a code change with diff history. 282 entries currently carry pubDate (see editorial-control matrix).
Stage 2: extraction script — IN PROGRESS
A build-time script (e.g. scripts/llms/export-models.ts) that:
- Loads every editorial vendor's model array.
- Normalizes per-vendor shapes (array vs Record, id vs idPrefix, KnownLink symlinks) to a single row format.
- Resolves symlinks (target's pubDate flows through).
- Writes a single JSON snapshot: data/models-catalog.json (one row per model, with vendor + the editorial fields above).
Open question: do we want this committed (gives the website a stable artifact / public URL) or built on-demand in CI? Recommend committed snapshot under data/ so consumers get a stable URL.
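The extraction steps listed above could be sketched roughly as follows. This is a minimal illustration, not the actual script: the EditorialModel and CatalogRow shapes, the symLink field name, and the normalizeVendor helper are all hypothetical, and the choice to let the target's pubDate take precedence over the link's own is an assumption about what "flows through" means.

```typescript
// Hypothetical shapes: the real editorial types in src/modules/llms/server/ may differ.
interface EditorialModel {
  id?: string;
  idPrefix?: string;
  label: string;
  pubDate?: string;
  symLink?: string; // assumed field name: id of the model this entry points to
}

interface CatalogRow {
  id: string;
  vendor: string;
  label: string;
  pubDate?: string;
}

type VendorModels = EditorialModel[] | Record<string, EditorialModel>;

/** Flattens one vendor's models (array or Record) into catalog rows. */
function normalizeVendor(vendor: string, models: VendorModels): CatalogRow[] {
  // Records fall back to their key when the entry carries no id / idPrefix.
  const list: EditorialModel[] = Array.isArray(models)
    ? models
    : Object.entries(models).map(([key, m]) => ({ ...m, id: m.id ?? m.idPrefix ?? key }));

  // Index by resolved id so symlinks can look up their target.
  const byId = new Map<string, EditorialModel>();
  for (const model of list) {
    const id = model.id ?? model.idPrefix;
    if (id !== undefined) byId.set(id, model);
  }

  const rows: CatalogRow[] = [];
  for (const [id, model] of byId) {
    // Resolve one level of symlink; the target's pubDate wins when present (assumption).
    const target = model.symLink ? byId.get(model.symLink) : undefined;
    const pubDate = target?.pubDate ?? model.pubDate;
    const row: CatalogRow = { id, vendor, label: model.label };
    if (pubDate !== undefined) row.pubDate = pubDate;
    rows.push(row);
  }
  return rows;
}
```

The rows from every vendor could then be concatenated and serialized with JSON.stringify into data/models-catalog.json. Entries with neither an id nor an idPrefix are silently dropped in this sketch; a real script would probably want to fail loudly instead.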
Stage 3: enrichment — NOT STARTED
The exported snapshot gets enriched with data we don't currently track in editorial files:
- Knowledge cutoff (proposed in llms.types.next.ts:217 but never implemented; should be added to ModelDescription_schema as a follow-up).
- MMLU / HumanEval / SWE-bench / GPQA / MATH scores (currently only cbaElo; richer benchmarks belong in a separate block).
- Throughput / latency numbers (per-vendor, possibly per-region).
- Modalities matrix (input image, input audio, input video, input PDF, output image, output audio).
- Weights availability (closed / open / restricted), license.
Sources for enrichment: HuggingFace cards, vendor docs, Artificial Analysis, LLM-Stats, official benchmarks. Some can be scraped on a cadence; others need editorial review.
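One way the enrichment step might work is a build-time join by model id between the exported snapshot and a separately maintained enrichment list. The sketch below assumes that layout; the CatalogRow, EnrichmentRow, and joinEnrichment names are hypothetical, and the rule that editorial fields win on a key collision is an assumption rather than a decided policy.

```typescript
// Hypothetical row shapes, reduced to a few fields for illustration.
interface CatalogRow {
  id: string;
  vendor: string;
  cbaElo?: number;
}

interface EnrichmentRow {
  id: string;
  knowledgeCutoff?: string;
  weights?: 'closed' | 'open' | 'restricted';
  license?: string;
}

type EnrichedRow = CatalogRow & Omit<EnrichmentRow, 'id'>;

/**
 * Left-joins enrichment onto catalog rows by id.
 * Returns the joined rows plus the enrichment ids that matched no catalog row,
 * so stale enrichment entries can be flagged for editorial review.
 */
function joinEnrichment(
  catalog: CatalogRow[],
  enrichment: EnrichmentRow[],
): { rows: EnrichedRow[]; orphans: string[] } {
  const byId = new Map(enrichment.map((e) => [e.id, e] as const));
  const matched = new Set<string>();

  const rows = catalog.map((row): EnrichedRow => {
    const extra = byId.get(row.id);
    if (!extra) return row;
    matched.add(row.id);
    // Spread order: editorial (catalog) fields override enrichment on collision.
    return { ...extra, ...row };
  });

  const orphans = enrichment.map((e) => e.id).filter((id) => !matched.has(id));
  return { rows, orphans };
}
```

Reporting orphans makes the join surface visible: an enrichment entry whose model id was renamed or removed in the editorial files shows up in the list instead of silently disappearing.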
Stage 4: website consumption — NOT STARTED
The website (~/dev/website) consumes the snapshot to render:
- Timeline plot: pubDate (x-axis) vs cbaElo (y-axis), grouped by vendor - shows the frontier and rate of progress.
- Cost-per-quality plot: chatPrice.output vs cbaElo - "best model per dollar".
- Decision helpers: filter by capability (interfaces), context window, pricing tier, vendor.
- Comparison cards: side-by-side specs.
- Lifecycle alerts: deprecation warnings for retiring models.
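As an illustration of the cost-per-quality idea, the sketch below computes a simple Pareto frontier: the models for which no cheaper-or-equal model has an equal or higher Elo. The PlotRow shape, the outputPrice field (assumed to be a flattened chatPrice.output), and the costQualityFrontier helper are hypothetical names, and the optional context-window filter stands in for the decision helpers.

```typescript
// Hypothetical flattened row; outputPrice is assumed to come from chatPrice.output.
interface PlotRow {
  id: string;
  vendor: string;
  outputPrice?: number;
  cbaElo?: number;
  contextWindow?: number;
}

type ScoredRow = PlotRow & { outputPrice: number; cbaElo: number };

// Only rows with both a price and an Elo score can be placed on the plot.
function isScored(row: PlotRow): row is ScoredRow {
  return typeof row.outputPrice === 'number' && typeof row.cbaElo === 'number';
}

/**
 * Returns the "best model per dollar" frontier, cheapest first.
 * A row is kept only if its Elo is strictly higher than every cheaper-or-equal row.
 */
function costQualityFrontier(rows: PlotRow[], minContextWindow = 0): ScoredRow[] {
  const candidates = rows
    .filter(isScored)
    .filter((r) => (r.contextWindow ?? 0) >= minContextWindow)
    // Price ascending; on equal price, higher Elo first so it claims the slot.
    .sort((a, b) => a.outputPrice - b.outputPrice || b.cbaElo - a.cbaElo);

  const frontier: ScoredRow[] = [];
  let bestElo = -Infinity;
  for (const row of candidates) {
    if (row.cbaElo > bestElo) {
      frontier.push(row);
      bestElo = row.cbaElo;
    }
  }
  return frontier;
}
```

Rows missing either value are excluded rather than plotted at zero, which avoids unpriced or unrated models appearing on the frontier by accident.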
Open questions
- Where does enrichment data live? A separate data/models-enrichment.json (joined by id at build time) keeps editorial files clean but introduces a join surface. Alternative: extend ModelDescription_schema with optional enrichment fields and treat editorial files as the only source. Recommend the separate file approach - editorial files stay focused on vendor-API integration; enrichment evolves on a different cadence.
- How fresh does the website need to be? If daily, build the snapshot in CI on push and publish to a static URL. If real-time, consume tRPC directly - more work but fewer freshness gaps.
- Do we expose pubDate and other editorial metadata via tRPC publicly, or only via the snapshot? The current tRPC routes require auth; the website should consume the snapshot, not live tRPC.
- Schema versioning - if ModelDescription_schema evolves, the snapshot consumers need to be tolerant. Include a schemaVersion field in the snapshot envelope.
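For the schema-versioning question, a tolerant consumer might look something like the sketch below. Apart from schemaVersion, everything here is an assumption: the envelope layout (a models array next to the version), the SUPPORTED_SCHEMA_VERSION constant, and the parseSnapshot helper are hypothetical.

```typescript
// Hypothetical: the highest envelope version this consumer understands.
const SUPPORTED_SCHEMA_VERSION = 1;

// Assumed envelope layout; only schemaVersion is named in the proposal.
interface SnapshotEnvelope {
  schemaVersion: number;
  models: unknown[];
}

interface ModelRow {
  id: string;
  vendor: string;
  pubDate?: string;
}

/**
 * Parses a snapshot while tolerating schema drift:
 * unknown fields are ignored, malformed rows are skipped and counted,
 * and a newer schemaVersion is rejected outright.
 */
function parseSnapshot(json: string): { rows: ModelRow[]; skipped: number } {
  const data: unknown = JSON.parse(json);
  if (typeof data !== 'object' || data === null) throw new Error('snapshot is not an object');

  const { schemaVersion, models } = data as Partial<SnapshotEnvelope>;
  if (typeof schemaVersion !== 'number' || schemaVersion > SUPPORTED_SCHEMA_VERSION)
    throw new Error(`unsupported schemaVersion: ${String(schemaVersion)}`);
  if (!Array.isArray(models)) throw new Error('snapshot has no models array');

  const rows: ModelRow[] = [];
  let skipped = 0;
  for (const entry of models) {
    const candidate = entry as Partial<ModelRow> | null;
    if (
      typeof candidate === 'object' && candidate !== null &&
      typeof candidate.id === 'string' && typeof candidate.vendor === 'string'
    ) {
      // Copy only the fields this consumer knows about.
      const row: ModelRow = { id: candidate.id, vendor: candidate.vendor };
      if (typeof candidate.pubDate === 'string') row.pubDate = candidate.pubDate;
      rows.push(row);
    } else {
      skipped++;
    }
  }
  return { rows, skipped };
}
```

Copying known fields instead of passing rows through means added fields in a later snapshot cannot break rendering, while the skipped count gives the website something to log when rows stop matching.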
Future extensions to ModelDescription_schema
Beyond pubDate, the natural follow-ups (in priority order):
- knowledgeCutoff?: string ('YYYY-MM' or 'YYYY-MM-DD') - already proposed in llms.types.next.ts. Useful for the timeline plot and for context-aware prompts.
- deprecationDate?: string - currently exists informally as deprecated?: string on _knownGeminiModels; should be promoted to the schema.
- license?: string - especially important for open-weights models (apache-2.0, mit, llama-community, custom).
- weights?: 'closed' | 'open' | 'restricted' - quick filter for "can I run this myself?".
- benchmarks?: { mmlu?: number, humaneval?: number, gpqa?: number, ... } - richer than the current cbaElo-only block.
- modalities?: { in: string[], out: string[] } - more precise than interfaces for input/output capability matrices.
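Taken together, the proposed fields could be written down as below. This is only a sketch in plain TypeScript, not the form ModelDescription_schema actually takes; the ModelDescriptionExtensions name, the regular expression, and the isValidKnowledgeCutoff helper are hypothetical, and the benchmarks block lists only the three scores named above.

```typescript
// Hypothetical grouping of the proposed optional fields.
interface ModelDescriptionExtensions {
  knowledgeCutoff?: string; // 'YYYY-MM' or 'YYYY-MM-DD'
  deprecationDate?: string;
  license?: string; // e.g. 'apache-2.0', 'mit', 'llama-community', 'custom'
  weights?: 'closed' | 'open' | 'restricted';
  benchmarks?: { mmlu?: number; humaneval?: number; gpqa?: number };
  modalities?: { in: string[]; out: string[] };
}

// Accepts 'YYYY-MM' or 'YYYY-MM-DD'. Checks month and day ranges only,
// not calendar validity (so '2024-02-31' would still pass).
const KNOWLEDGE_CUTOFF_RE = /^\d{4}-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?$/;

function isValidKnowledgeCutoff(value: string): boolean {
  return KNOWLEDGE_CUTOFF_RE.test(value);
}

// Example entry using a subset of the fields; the values are made up.
const example: ModelDescriptionExtensions = {
  knowledgeCutoff: '2024-06',
  weights: 'open',
  license: 'apache-2.0',
  modalities: { in: ['text', 'image'], out: ['text'] },
};
```

Keeping every field optional means existing editorial entries stay valid as the fields are rolled out one at a time.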