Compare commits
1 Commit

| Author | SHA1 | Date |
|---|---|---|
| | 49aabddd71 | |
+3
-13
@@ -5,9 +5,6 @@
# All values have reasonable defaults, so you only need to change the ones you
# want to override.

# Use production mode unless you are developing locally.
NODE_ENV=production

# ------------------------------------------------------------------------------
# General settings:

@@ -60,9 +57,8 @@ NODE_ENV=production
# Requires additional setup. See `docs/google-sheets.md` for more information.
# PROMPT_LOGGING=false

# The port and network interface to listen on.
# The port to listen on.
# PORT=7860
# BIND_ADDRESS=0.0.0.0

# Whether cookies should be set without the Secure flag, for hosts that don't support SSL.
# USE_INSECURE_COOKIES=false
@@ -106,16 +102,10 @@ NODE_ENV=production
# Leave unset to never automatically refresh quotas.
# QUOTA_REFRESH_PERIOD=daily

# Specifies the number of proxies or load balancers in front of the server.
# For Cloudflare or Hugging Face deployments, the default of 1 is correct.
# For any other deployments, please see config.ts as the correct configuration
# depends on your setup. Misconfiguring this value can result in problems
# accurately tracking IP addresses and enforcing rate limits.
# TRUSTED_PROXIES=1

# ------------------------------------------------------------------------------
# Secrets and keys:
# For Huggingface, set them via the Secrets section in your Space's config UI. Do not set them in .env.
# Do not put any passwords or API keys directly in this file.
# For Huggingface, set them via the Secrets section in your Space's config UI.
# For Render, create a "secret file" called .env using the Environment tab.

# You can add multiple API keys by separating them with a comma.
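The last line above notes that multiple API keys can be joined into one comma-separated value. A minimal TypeScript sketch of that convention (a hypothetical helper for illustration only; the proxy's real key loading lives in its key-management module):

```typescript
// Hypothetical helper: split a comma-delimited key list from the
// environment into trimmed, non-empty keys.
function parseKeyList(envValue: string | undefined): string[] {
  if (!envValue) return [];
  return envValue
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// parseKeyList("sk-abc123, sk-def456") => ["sk-abc123", "sk-def456"]
```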
+1
-3
@@ -1,10 +1,8 @@
.env*
!.env.vault
.env
.venv
.vscode
.idea
build
greeting.md
node_modules

http-client.private.env.json
@@ -1,53 +1,34 @@
# OAI Reverse Proxy

Reverse proxy server for various LLM APIs.
Reverse proxy server for the OpenAI and Anthropic APIs. Forwards text generation requests while rejecting administrative/billing requests. Includes optional rate limiting and prompt filtering to prevent abuse.

### Table of Contents
- [What is this?](#what-is-this)
- [Features](#features)
- [Usage Instructions](#usage-instructions)
  - [Self-hosting](#self-hosting)
  - [Alternatives](#alternatives)
  - [Huggingface (outdated, not advised)](#huggingface-outdated-not-advised)
  - [Render (outdated, not advised)](#render-outdated-not-advised)
- [Why?](#why)
- [Usage Instructions](#setup-instructions)
  - [Deploy to Huggingface (Recommended)](#deploy-to-huggingface-recommended)
  - [Deploy to Repl.it (WIP)](#deploy-to-replit-wip)
  - [Local Development](#local-development)

## What is this?
This project allows you to run a reverse proxy server for various LLM APIs.
If you would like to provide a friend access to an API via keys you own, you can use this to keep your keys safe while still allowing them to generate text with the API. You can also use this if you'd like to build a client-side application which uses the OpenAI or Anthropic APIs, but don't want to build your own backend. You should never embed your real API keys in a client-side application. Instead, you can have your frontend connect to this reverse proxy and forward requests to the downstream service.

## Features
- [x] Support for multiple APIs
  - [x] [OpenAI](https://openai.com/)
  - [x] [Anthropic](https://www.anthropic.com/)
  - [x] [AWS Bedrock](https://aws.amazon.com/bedrock/)
  - [x] [Google MakerSuite/Gemini API](https://ai.google.dev/)
  - [x] [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service)
- [x] Translation from OpenAI-formatted prompts to any other API, including streaming responses
- [x] Multiple API keys with rotation and rate limit handling
- [x] Basic user management
  - [x] Simple role-based permissions
  - [x] Per-model token quotas
  - [x] Temporary user accounts
- [x] Prompt and completion logging
- [x] Abuse detection and prevention
This keeps your keys safe and allows you to use the rate limiting and prompt filtering features of the proxy to prevent abuse.

## Why?
OpenAI keys have full account permissions. They can revoke themselves, generate new keys, modify spend quotas, etc. **You absolutely should not share them, post them publicly, nor embed them in client-side applications as they can be easily stolen.**

This proxy only forwards text generation requests to the downstream service and rejects requests which would otherwise modify your account.

---

## Usage Instructions
If you'd like to run your own instance of this server, you'll need to deploy it somewhere and configure it with your API keys. A few easy options are provided below, though you can also deploy it to any other service you'd like if you know what you're doing and the service supports Node.js.
If you'd like to run your own instance of this proxy, you'll need to deploy it somewhere and configure it with your API keys. A few easy options are provided below, though you can also deploy it to any other service you'd like.

### Self-hosting
[See here for instructions on how to self-host the application on your own VPS or local machine.](./docs/self-hosting.md)

**Ensure you set the `TRUSTED_PROXIES` environment variable according to your deployment.** Refer to [.env.example](./.env.example) and [config.ts](./src/config.ts) for more information.

### Alternatives
Fiz and Sekrit are working on some alternative ways to deploy this conveniently. While I'm not involved in this effort beyond providing technical advice regarding my code, I'll link to their work here for convenience: [Sekrit's rentry](https://rentry.org/sekrit)

### Huggingface (outdated, not advised)
### Deploy to Huggingface (Recommended)
[See here for instructions on how to deploy to a Huggingface Space.](./docs/deploy-huggingface.md)

### Render (outdated, not advised)
### Deploy to Render
[See here for instructions on how to deploy to Render.com.](./docs/deploy-render.md)

## Local Development
@@ -59,12 +40,3 @@ To run the proxy locally for development or testing, install Node.js >= 18.0.0 a
4. Start the server in development mode with `npm run start:dev`.

You can also use `npm run start:dev:tsc` to enable project-wide type checking at the cost of slower startup times. `npm run type-check` can be used to run type checking without starting the server.

## Building
To build the project, run `npm run build`. This will compile the TypeScript code to JavaScript and output it to the `build` directory.

Note that if you are trying to build the server on a very memory-constrained (<= 1GB) VPS, you may need to run the build with `NODE_OPTIONS=--max_old_space_size=2048 npm run build` to avoid running out of memory during the build process, assuming you have swap enabled. The application itself should run fine on a 512MB VPS for most reasonable traffic levels.

## Forking
If you are forking the repository on GitGud, you may wish to disable GitLab CI/CD or you will be spammed with emails about failed builds due to not having any CI runners. You can do this by going to *Settings > General > Visibility, project features, permissions* and then disabling the "CI/CD" feature.
@@ -1,21 +0,0 @@
stages:
  - build

build_image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - |
      if [ "$CI_COMMIT_REF_NAME" = "main" ]; then
        TAG="latest"
      else
        TAG=$CI_COMMIT_REF_NAME
      fi
    - echo "Building image with tag $TAG"
    - BASE64_AUTH=$(echo -n "$DOCKER_HUB_USERNAME:$DOCKER_HUB_ACCESS_TOKEN" | base64)
    - echo "{\"auths\":{\"https://index.docker.io/v1/\":{\"auth\":\"$BASE64_AUTH\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/docker/ci/Dockerfile --destination docker.io/khanonci/oai-reverse-proxy:$TAG --build-arg CI_COMMIT_REF_NAME=$CI_COMMIT_REF_NAME --build-arg CI_COMMIT_SHA=$CI_COMMIT_SHA --build-arg CI_PROJECT_PATH=$CI_PROJECT_PATH
  only:
    - main
@@ -1,22 +0,0 @@
FROM node:18-bullseye-slim

WORKDIR /app
COPY . .

RUN npm ci
RUN npm run build
RUN npm prune --production

EXPOSE 7860
ENV PORT=7860
ENV NODE_ENV=production

ARG CI_COMMIT_REF_NAME
ARG CI_COMMIT_SHA
ARG CI_PROJECT_PATH

ENV GITGUD_BRANCH=$CI_COMMIT_REF_NAME
ENV GITGUD_COMMIT=$CI_COMMIT_SHA
ENV GITGUD_PROJECT=$CI_PROJECT_PATH

CMD [ "npm", "start" ]
@@ -1,17 +0,0 @@
# Before running this, create a .env and greeting.md file.
# Refer to .env.example for the required environment variables.
# User-generated content is stored in the data directory.
# When self-hosting, it's recommended to run this behind a reverse proxy like
# nginx or Caddy to handle SSL/TLS and rate limiting. Refer to
# docs/self-hosting.md for more information and an example nginx config.
version: '3.8'
services:
  oai-reverse-proxy:
    image: khanonci/oai-reverse-proxy:latest
    ports:
      - "127.0.0.1:7860:7860"
    env_file:
      - ./.env
    volumes:
      - ./greeting.md:/app/greeting.md
      - ./data:/app/data
@@ -1,7 +1,5 @@
# Deploy to Huggingface Space

**⚠️ This method is no longer recommended. Please use the [self-hosting instructions](./self-hosting.md) instead.**

This repository can be deployed to a [Huggingface Space](https://huggingface.co/spaces). This is a free service that allows you to run a simple server in the cloud. You can use it to safely share your OpenAI API key with a friend.

### 1. Get an API key
@@ -1,7 +1,4 @@
# Deploy to Render.com

**⚠️ This method is no longer recommended. Please use the [self-hosting instructions](./self-hosting.md) instead.**

Render.com offers a free tier that includes 750 hours of compute time per month. This is enough to run a single proxy instance 24/7. Instances shut down after 15 minutes without traffic but start up again automatically when a request is received. You can use something like https://app.checklyhq.com/ to ping your proxy every 15 minutes to keep it alive.

### 1. Create account
@@ -31,8 +28,6 @@ The service will be created according to the instructions in the `render.yaml` f
- For example, `OPENAI_KEY=sk-abc123`.
- Click **Save Changes**.

**IMPORTANT:** Set `TRUSTED_PROXIES=3`, otherwise users' IP addresses will not be recorded correctly (the server will see the IP address of Render's load balancer instead of the user's real IP address).

The service will automatically rebuild and deploy with the new environment variables. This will take a few minutes. The link to your deployed proxy will appear at the top of the page.

If you want to change the URL, go to the **Settings** tab of your Web Service and click the **Edit** button next to **Name**. You can also set a custom domain, though I haven't tried this yet.
@@ -1,150 +0,0 @@
# Quick self-hosting guide

Temporary guide for self-hosting. This will be improved in the future to provide more robust instructions and options. Provided commands are for Ubuntu.

This uses prebuilt Docker images for convenience. If you want to make adjustments to the code you can instead clone the repo and follow the Local Development guide in the [README](../README.md).

## Table of Contents
- [Requirements](#requirements)
- [Running the application](#running-the-application)
- [Setting up a reverse proxy](#setting-up-a-reverse-proxy)
  - [trycloudflare](#trycloudflare)
  - [nginx](#nginx)
    - [Example basic nginx configuration (no SSL)](#example-basic-nginx-configuration-no-ssl)
    - [Example with Cloudflare SSL](#example-with-cloudflare-ssl)
- [Updating/Restarting the application](#updatingrestarting-the-application)

## Requirements

- Docker
- Docker Compose
- A VPS with at least 512MB of RAM (1GB recommended)
- A domain name

If you don't have a VPS and domain name you can use TryCloudflare to set up a temporary URL that you can share with others. See [trycloudflare](#trycloudflare) for more information.

## Running the application

- Install Docker and Docker Compose
- Create a new directory for the application
  - This will contain your .env file, greeting file, and any user-generated files
- Execute the following commands:
  - ```
    touch .env
    touch greeting.md
    echo "OPENAI_KEY=your-openai-key" >> .env
    curl https://gitgud.io/khanon/oai-reverse-proxy/-/raw/main/docker/docker-compose-selfhost.yml -o docker-compose.yml
    ```
- You can set further environment variables and keys in the `.env` file. See [.env.example](../.env.example) for a list of available options.
- You can set a custom greeting in `greeting.md`. This will be displayed on the homepage.
- Run `docker compose up -d`

You can check logs with `docker compose logs -n 100 -f`.

The provided docker-compose file listens on port 7860 but binds to localhost only. You should use a reverse proxy to expose the application to the internet as described in the next section.

## Setting up a reverse proxy

Rather than exposing the application directly to the internet, it is recommended to set up a reverse proxy. This will allow you to use HTTPS and add additional security measures.

### trycloudflare

This will give you a temporary (72 hours) URL that you can use to let others connect to your instance securely, without having to set up a reverse proxy. If you are running the server on your home network, this is probably the best option.
- Install `cloudflared` following the instructions at [try.cloudflare.com](https://try.cloudflare.com/).
- Run `cloudflared tunnel --url http://localhost:7860`
- You will be given a temporary URL that you can share with others.

If you have a VPS, you should use a proper reverse proxy like nginx instead for a more permanent solution which will allow you to use your own domain name, handle SSL, and add additional security/anti-abuse measures.

### nginx

First, install nginx.
- `sudo apt update && sudo apt install nginx`

#### Example basic nginx configuration (no SSL)

- `sudo nano /etc/nginx/sites-available/oai.conf`
- ```
  server {
    listen 80;
    server_name example.com;

    location / {
      proxy_pass http://localhost:7860;
    }
  }
  ```
- Replace `example.com` with your domain name.
- Ctrl+X to exit, Y to save, Enter to confirm.
- `sudo ln -s /etc/nginx/sites-available/oai.conf /etc/nginx/sites-enabled`
- `sudo nginx -t`
  - This will check the configuration file for errors.
- `sudo systemctl restart nginx`
  - This will restart nginx and apply the new configuration.

#### Example with Cloudflare SSL

This allows you to use a self-signed certificate on the server, and have Cloudflare handle client SSL. You need to have a Cloudflare account and have your domain set up with Cloudflare already, pointing to your server's IP address.

- Set Cloudflare to use Full SSL mode. Since we are using a self-signed certificate, don't use Full (strict) mode.
- Create a self-signed certificate:
  - `openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/ssl/private/nginx-selfsigned.key -out /etc/ssl/certs/nginx-selfsigned.crt`
- `sudo nano /etc/nginx/sites-available/oai.conf`
- ```
  server {
    listen 443 ssl;
    server_name yourdomain.com www.yourdomain.com;

    ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
    ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;

    # Only allow inbound traffic from Cloudflare
    allow 173.245.48.0/20;
    allow 103.21.244.0/22;
    allow 103.22.200.0/22;
    allow 103.31.4.0/22;
    allow 141.101.64.0/18;
    allow 108.162.192.0/18;
    allow 190.93.240.0/20;
    allow 188.114.96.0/20;
    allow 197.234.240.0/22;
    allow 198.41.128.0/17;
    allow 162.158.0.0/15;
    allow 104.16.0.0/13;
    allow 104.24.0.0/14;
    allow 172.64.0.0/13;
    allow 131.0.72.0/22;
    deny all;

    location / {
      proxy_pass http://localhost:7860;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection 'upgrade';
      proxy_set_header Host $host;
      proxy_cache_bypass $http_upgrade;
    }

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
  }
  ```
- Replace `yourdomain.com` with your domain name.
- Ctrl+X to exit, Y to save, Enter to confirm.
- `sudo ln -s /etc/nginx/sites-available/oai.conf /etc/nginx/sites-enabled`

## Updating/Restarting the application

After making an .env change, you need to restart the application for it to take effect.

- `docker compose down`
- `docker compose up -d`

To update the application to the latest version:

- `docker compose pull`
- `docker compose down`
- `docker compose up -d`
- `docker image prune -f`
Generated
+10
-11
@@ -20,7 +20,7 @@
"copyfiles": "^2.4.1",
"cors": "^2.8.5",
"csrf-csrf": "^2.3.0",
"dotenv": "^16.3.1",
"dotenv": "^16.0.3",
"ejs": "^3.1.9",
"express": "^4.18.2",
"express-session": "^1.17.3",
@@ -36,7 +36,6 @@
"sanitize-html": "^2.11.0",
"sharp": "^0.32.6",
"showdown": "^2.1.0",
"source-map-support": "^0.5.21",
"stream-json": "^1.8.0",
"tiktoken": "^1.0.10",
"uuid": "^9.0.0",
@@ -62,6 +61,7 @@
"nodemon": "^3.0.1",
"pino-pretty": "^10.2.3",
"prettier": "^3.0.3",
"source-map-support": "^0.5.21",
"ts-node": "^10.9.1",
"typescript": "^5.1.3"
},
@@ -2249,14 +2249,11 @@
}
},
"node_modules/dotenv": {
"version": "16.3.1",
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.3.1.tgz",
"integrity": "sha512-IPzF4w4/Rd94bA9imS68tZBaYyBWSCE47V1RGuMrB94iyTOIEwRmVL2x/4An+6mETpLrKJ5hQkB8W4kFAadeIQ==",
"version": "16.0.3",
"resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.0.3.tgz",
"integrity": "sha512-7GO6HghkA5fYG9TYnNxi14/7K9f5occMlp3zXAuSxn7CKCxt9xbNWG7yF8hTCSUchlfWSe3uLmlPfigevRItzQ==",
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/motdotla/dotenv?sponsor=1"
}
},
"node_modules/duplexify": {
@@ -2824,9 +2821,9 @@
}
},
"node_modules/follow-redirects": {
"version": "1.15.4",
"resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.4.tgz",
"integrity": "sha512-Cr4D/5wlrb0z9dgERpUL3LrmPKVDsETIJhaCMeDfuFYcqa5bldGV6wBsAN6X/vxlXQtFBMrXdXxdL8CbDTGniw==",
"version": "1.15.2",
"resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.2.tgz",
"integrity": "sha512-VQLG33o04KaQ8uYi2tVNbdrWp1QWxNNea+nmIB4EVM28v0hmP17z7aG1+wAkNzVq4KeXTq3221ye5qTJP91JwA==",
"funding": [
{
"type": "individual",
@@ -5114,6 +5111,7 @@
"version": "0.6.1",
"resolved": "https://registry.npmjs.org/source-map/-/source-map-0.6.1.tgz",
"integrity": "sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==",
"devOptional": true,
"engines": {
"node": ">=0.10.0"
}
@@ -5130,6 +5128,7 @@
"version": "0.5.21",
"resolved": "https://registry.npmjs.org/source-map-support/-/source-map-support-0.5.21.tgz",
"integrity": "sha512-uBHU3L3czsIyYXKX88fdrGovxdSCoTGDRZ6SYXtSRxLZUzHg5P/66Ht6uoUlHu9EZod+inXhKo3qQgwXUT/y1w==",
"dev": true,
"dependencies": {
"buffer-from": "^1.0.0",
"source-map": "^0.6.0"
+3
-4
@@ -28,7 +28,7 @@
"copyfiles": "^2.4.1",
"cors": "^2.8.5",
"csrf-csrf": "^2.3.0",
"dotenv": "^16.3.1",
"dotenv": "^16.0.3",
"ejs": "^3.1.9",
"express": "^4.18.2",
"express-session": "^1.17.3",
@@ -44,7 +44,6 @@
"sanitize-html": "^2.11.0",
"sharp": "^0.32.6",
"showdown": "^2.1.0",
"source-map-support": "^0.5.21",
"stream-json": "^1.8.0",
"tiktoken": "^1.0.10",
"uuid": "^9.0.0",
@@ -70,12 +69,12 @@
"nodemon": "^3.0.1",
"pino-pretty": "^10.2.3",
"prettier": "^3.0.3",
"source-map-support": "^0.5.21",
"ts-node": "^10.9.1",
"typescript": "^5.1.3"
},
"overrides": {
"google-gax": "^3.6.1",
"postcss": "^8.4.31",
"follow-redirects": "^1.15.4"
"postcss": "^8.4.31"
}
}
@@ -16,8 +16,6 @@ export const USER_ASSETS_DIR = path.join(DATA_DIR, "user-files");
type Config = {
  /** The port the proxy server will listen on. */
  port: number;
  /** The network interface the proxy server will listen on. */
  bindAddress: string;
  /** Comma-delimited list of OpenAI API keys. */
  openaiKey?: string;
  /** Comma-delimited list of Anthropic API keys. */
  anthropicKey?: string;
@@ -200,57 +198,12 @@ type Config = {
   * configured ADMIN_KEY and go to /admin/service-info.
   **/
  staticServiceInfo?: boolean;
  /**
   * Trusted proxy hops. If you are deploying the server behind a reverse proxy
   * (Nginx, Cloudflare Tunnel, AWS WAF, etc.) the IP address of incoming
   * requests will be the IP address of the proxy, not the actual user.
   *
   * Depending on your hosting configuration, there may be multiple proxies/load
   * balancers between your server and the user. Each one will append the
   * incoming IP address to the `X-Forwarded-For` header. The user's real IP
   * address will be the first one in the list, assuming the header has not been
   * tampered with. Setting this value correctly ensures that the server doesn't
   * trust values in `X-Forwarded-For` not added by trusted proxies.
   *
   * In order for the server to determine the user's real IP address, you need
   * to tell it how many proxies are between the user and the server so it can
   * select the correct IP address from the `X-Forwarded-For` header.
   *
   * *WARNING:* If you set it incorrectly, the proxy will either record the
   * wrong IP address, or it will be possible for users to spoof their IP
   * addresses and bypass rate limiting. Check the request logs to see what
   * incoming X-Forwarded-For values look like.
   *
   * Examples:
   * - X-Forwarded-For: "34.1.1.1, 172.1.1.1, 10.1.1.1" => trustedProxies: 3
   * - X-Forwarded-For: "34.1.1.1" => trustedProxies: 1
   * - no X-Forwarded-For header => trustedProxies: 0 (the actual IP of the incoming request will be used)
   *
   * As of 2024/01/08:
   * For HuggingFace or Cloudflare Tunnel, use 1.
   * For Render, use 3.
   * For deployments not behind a load balancer, use 0.
   *
   * You should double check against your actual request logs to be sure.
   *
   * Defaults to 1, as most deployments are on HuggingFace or Cloudflare Tunnel.
   */
  trustedProxies?: number;
  /**
   * Whether to allow OpenAI tool usage. The proxy doesn't implement any
   * support for tools/function calling but can pass requests and responses as
   * is. Note that the proxy also cannot accurately track quota usage for
   * requests involving tools, so you must opt in to this feature at your own
   * risk.
   */
  allowOpenAIToolUsage?: boolean;
};

// To change configs, create a file called .env in the root directory.
// See .env.example for an example.
export const config: Config = {
  port: getEnvWithDefault("PORT", 7860),
  bindAddress: getEnvWithDefault("BIND_ADDRESS", "0.0.0.0"),
  openaiKey: getEnvWithDefault("OPENAI_KEY", ""),
  anthropicKey: getEnvWithDefault("ANTHROPIC_KEY", ""),
  googleAIKey: getEnvWithDefault("GOOGLE_AI_KEY", ""),
@@ -333,8 +286,6 @@ export const config: Config = {
  showRecentImages: getEnvWithDefault("SHOW_RECENT_IMAGES", true),
  useInsecureCookies: getEnvWithDefault("USE_INSECURE_COOKIES", isDev),
  staticServiceInfo: getEnvWithDefault("STATIC_SERVICE_INFO", false),
  trustedProxies: getEnvWithDefault("TRUSTED_PROXIES", 1),
  allowOpenAIToolUsage: getEnvWithDefault("ALLOW_OPENAI_TOOL_USAGE", false),
} as const;

function generateCookieSecret() {
@@ -425,7 +376,6 @@ export const SENSITIVE_KEYS: (keyof Config)[] = ["googleSheetsSpreadsheetId"];
 */
export const OMITTED_KEYS = [
  "port",
  "bindAddress",
  "logLevel",
  "openaiKey",
  "anthropicKey",
@@ -436,7 +386,6 @@ export const OMITTED_KEYS = [
  "proxyKey",
  "adminKey",
  "rejectPhrases",
  "rejectMessage",
  "showTokenCosts",
  "googleSheetsKey",
  "firebaseKey",
@@ -452,7 +401,6 @@ export const OMITTED_KEYS = [
  "staticServiceInfo",
  "checkKeys",
  "allowedModelFamilies",
  "trustedProxies"
] satisfies (keyof Config)[];
type OmitKeys = (typeof OMITTED_KEYS)[number];
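The `trustedProxies` doc comment above spells out how the client address is selected from `X-Forwarded-For`. As a minimal sketch of that selection logic (a hypothetical standalone helper for illustration; the server itself presumably delegates this to Express's trust-proxy handling rather than a function like this):

```typescript
// Sketch: pick the client IP the way the trustedProxies comment describes.
// Hypothetical helper, not the project's actual implementation.
function resolveClientIp(
  socketIp: string,
  xForwardedFor: string | undefined,
  trustedProxies: number
): string {
  if (trustedProxies === 0 || !xForwardedFor) return socketIp;
  const hops = xForwardedFor.split(",").map((s) => s.trim());
  // The last `trustedProxies - 1` entries were appended by our own proxies;
  // the entry `trustedProxies` hops from the end is the client's address.
  const index = hops.length - trustedProxies;
  return hops[Math.max(0, index)];
}

// "34.1.1.1, 172.1.1.1, 10.1.1.1" with trustedProxies=3 => "34.1.1.1"
// "34.1.1.1" with trustedProxies=1 => "34.1.1.1"
```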
+1
-18
@@ -61,25 +61,8 @@ export function renderPage(info: ServiceInfo) {
    <meta charset="utf-8" />
    <meta name="robots" content="noindex" />
    <title>${title}</title>
    <style>
      body {
        font-family: sans-serif;
        background-color: #f0f0f0;
        padding: 1em;
      }
      @media (prefers-color-scheme: dark) {
        body {
          background-color: #222;
          color: #eee;
        }

        a:link, a:visited {
          color: #bbe;
        }
      }
    </style>
  </head>
  <body>
  <body style="font-family: sans-serif; background-color: #f0f0f0; padding: 1em;">
    ${headerHtml}
    <hr />
    <h2>Service Info</h2>
+10
-1
@@ -173,7 +173,16 @@ anthropicRouter.post(
function maybeReassignModel(req: Request) {
  const model = req.body.model;
  if (!model.startsWith("gpt-")) return;
  req.body.model = "claude-2.1";

  const bigModel = process.env.CLAUDE_BIG_MODEL || "claude-v1-100k";
  const contextSize = req.promptTokens! + req.outputTokens!;
  if (contextSize > 8500) {
    req.log.debug(
      { model: bigModel, contextSize },
      "Using Claude 100k model for OpenAI-to-Anthropic request"
    );
    req.body.model = bigModel;
  }
}

export const anthropic = anthropicRouter;
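The reassignment above keys off an estimated context size: once the combined prompt and output token counts exceed the threshold, the request is moved to the larger-context Claude model. A compact sketch of that decision (illustrative only; the authoritative version is `maybeReassignModel` above):

```typescript
// Illustrative: choose a Claude model based on estimated context size,
// mirroring the threshold logic in maybeReassignModel above.
function pickClaudeModel(promptTokens: number, outputTokens: number): string {
  const bigModel = process.env.CLAUDE_BIG_MODEL || "claude-v1-100k";
  const contextSize = promptTokens + outputTokens;
  return contextSize > 8500 ? bigModel : "claude-2.1";
}

pickClaudeModel(8000, 300);  // => "claude-2.1" (8300 tokens fits)
pickClaudeModel(8000, 1000); // => "claude-v1-100k" (9000 exceeds 8500)
```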
@@ -1,58 +0,0 @@
/* Provides a single endpoint for all services. */
import { RequestHandler } from "express";
import { generateErrorMessage } from "zod-error";
import { APIFormat } from "../shared/key-management";
import {
  getServiceForModel,
  LLMService,
  MODEL_FAMILIES,
  MODEL_FAMILY_SERVICE,
  ModelFamily,
} from "../shared/models";
import { API_SCHEMA_VALIDATORS } from "../shared/api-schemas";

const detectApiFormat = (body: any, formats: APIFormat[]): APIFormat => {
  const errors = [];
  for (const format of formats) {
    const result = API_SCHEMA_VALIDATORS[format].safeParse(body);
    if (result.success) {
      return format;
    } else {
      errors.push(result.error);
    }
  }
  throw new Error(`Couldn't determine the format of your request. Errors: ${errors}`);
};

/**
 * Tries to infer LLMService and APIFormat using the model name and the presence
 * of certain fields in the request body.
 */
const inferService: RequestHandler = (req, res, next) => {
  const model = req.body.model;
  if (!model) {
    throw new Error("No model specified");
  }

  // Service determines the key provider and is typically determined by the
  // requested model, though some models are served by multiple services.
  // API format determines the expected request/response format.
  let service: LLMService;
  let inboundApi: APIFormat;
  let outboundApi: APIFormat;

  if (MODEL_FAMILIES.includes(model)) {
    service = MODEL_FAMILY_SERVICE[model as ModelFamily];
  } else {
    service = getServiceForModel(model);
  }

  // Each service has typically one API format.
  switch (service) {
    case "openai": {
      const detected = detectApiFormat(req.body, ["openai", "openai-text", "openai-image"]);

    }

  }
};
@@ -194,9 +194,7 @@ export function getCompletionFromBody(req: Request, body: Record<string, any>) {
  switch (format) {
    case "openai":
    case "mistral-ai":
      // Can be null if the model wants to invoke tools rather than return a
      // completion.
      return body.choices[0].message.content || "";
      return body.choices[0].message.content;
    case "openai-text":
      return body.choices[0].text;
    case "anthropic":
@@ -18,22 +18,6 @@ export const addAzureKey: RequestPreprocessor = (req) => {

  req.key = keyPool.get(model);
  req.body.model = model;

  // Handles the sole Azure API deviation from the OpenAI spec (that I know of)
  const notNullOrUndefined = (x: any) => x !== null && x !== undefined;
  if ([req.body.logprobs, req.body.top_logprobs].some(notNullOrUndefined)) {
    // OpenAI wants logprobs: true/false and top_logprobs: number
    // Azure seems to just want to combine them into logprobs: number
    // if (typeof req.body.logprobs === "boolean") {
    //   req.body.logprobs = req.body.top_logprobs || undefined;
    //   delete req.body.top_logprobs
    // }

    // Temporarily just disabling logprobs for Azure because their model support
    // is random: `This model does not support the 'logprobs' parameter.`
    delete req.body.logprobs;
    delete req.body.top_logprobs;
  }

  req.log.info(
    { key: req.key.hash, model },
@@ -1,11 +1,11 @@
import { RequestPreprocessor } from "../index";
import { countTokens } from "../../../../shared/tokenization";
import { assertNever } from "../../../../shared/utils";
import {
import type {
  GoogleAIChatMessage,
  MistralAIChatMessage,
  OpenAIChatMessage,
} from "../../../../shared/api-schemas";
} from "./transform-outbound-payload";

/**
 * Given a request with an already-transformed body, counts the number of
@@ -6,7 +6,7 @@ import { UserInputError } from "../../../../shared/errors";
import {
  MistralAIChatMessage,
  OpenAIChatMessage,
} from "../../../../shared/api-schemas";
} from "./transform-outbound-payload";

const rejectedClients = new Map<string, number>();
@@ -2,9 +2,9 @@ import express from "express";
import { Sha256 } from "@aws-crypto/sha256-js";
import { SignatureV4 } from "@smithy/signature-v4";
import { HttpRequest } from "@smithy/protocol-http";
import { AnthropicV1CompleteSchema } from "../../../../shared/api-schemas/anthropic";
import { keyPool } from "../../../../shared/key-management";
import { RequestPreprocessor } from "../index";
import { AnthropicV1CompleteSchema } from "./transform-outbound-payload";

const AMZ_HOST =
  process.env.AMZ_HOST || "bedrock-runtime.%REGION%.amazonaws.com";
@@ -32,9 +32,7 @@ export const signAwsRequest: RequestPreprocessor = async (req) => {
      temperature: true,
      top_k: true,
      top_p: true,
    })
    .strip()
    .parse(req.body);
  }).strip().parse(req.body);

  const credential = getCredentialParts(req);
  const host = AMZ_HOST.replace("%REGION%", credential.region);
@@ -70,7 +68,6 @@ type Credential = {
  secretAccessKey: string;
  region: string;
};

function getCredentialParts(req: express.Request): Credential {
  const [accessKeyId, secretAccessKey, region] = req.key!.key.split(":");
@@ -1,14 +1,206 @@
import { Request } from "express";
import { z } from "zod";
import { config } from "../../../../config";
import {
  isImageGenerationRequest,
  isTextGenerationRequest,
  isImageGenerationRequest,
} from "../../common";
import { RequestPreprocessor } from "../index";
import { openAIToAnthropic } from "../../../../shared/api-schemas/anthropic";
import { openAIToOpenAIText } from "../../../../shared/api-schemas/openai-text";
import { openAIToOpenAIImage } from "../../../../shared/api-schemas/openai-image";
import { openAIToGoogleAI } from "../../../../shared/api-schemas/google-ai";
import { fixMistralPrompt } from "../../../../shared/api-schemas/mistral-ai";
import { API_SCHEMA_VALIDATORS } from "../../../../shared/api-schemas";
import { APIFormat } from "../../../../shared/key-management";

const CLAUDE_OUTPUT_MAX = config.maxOutputTokensAnthropic;
const OPENAI_OUTPUT_MAX = config.maxOutputTokensOpenAI;

// TODO: move schemas to shared

// https://console.anthropic.com/docs/api/reference#-v1-complete
export const AnthropicV1CompleteSchema = z
  .object({
    model: z.string().max(100),
    prompt: z.string({
      required_error:
        "No prompt found. Are you sending an OpenAI-formatted request to the Claude endpoint?",
    }),
    max_tokens_to_sample: z.coerce
      .number()
      .int()
      .transform((v) => Math.min(v, CLAUDE_OUTPUT_MAX)),
    stop_sequences: z.array(z.string().max(500)).optional(),
    stream: z.boolean().optional().default(false),
    temperature: z.coerce.number().optional().default(1),
    top_k: z.coerce.number().optional(),
    top_p: z.coerce.number().optional(),
  })
  .strip();

// https://platform.openai.com/docs/api-reference/chat/create
const OpenAIV1ChatContentArraySchema = z.array(
  z.union([
    z.object({ type: z.literal("text"), text: z.string() }),
    z.object({
      type: z.literal("image_url"),
      image_url: z.object({
        url: z.string().url(),
        detail: z.enum(["low", "auto", "high"]).optional().default("auto"),
      }),
    }),
  ])
);

export const OpenAIV1ChatCompletionSchema = z
  .object({
    model: z.string().max(100),
    messages: z.array(
      z.object({
        role: z.enum(["system", "user", "assistant"]),
        content: z.union([z.string(), OpenAIV1ChatContentArraySchema]),
        name: z.string().optional(),
      }),
      {
        required_error:
          "No `messages` found. Ensure you've set the correct completion endpoint.",
        invalid_type_error:
          "Messages were not formatted correctly. Refer to the OpenAI Chat API documentation for more information.",
      }
    ),
    temperature: z.number().optional().default(1),
    top_p: z.number().optional().default(1),
    n: z
      .literal(1, {
        errorMap: () => ({
          message: "You may only request a single completion at a time.",
        }),
      })
      .optional(),
    stream: z.boolean().optional().default(false),
    stop: z
      .union([z.string().max(500), z.array(z.string().max(500))])
      .optional(),
    max_tokens: z.coerce
      .number()
      .int()
      .nullish()
      .default(16)
      .transform((v) => Math.min(v ?? OPENAI_OUTPUT_MAX, OPENAI_OUTPUT_MAX)),
    frequency_penalty: z.number().optional().default(0),
    presence_penalty: z.number().optional().default(0),
    logit_bias: z.any().optional(),
    user: z.string().max(500).optional(),
    seed: z.number().int().optional(),
  })
  .strip();

export type OpenAIChatMessage = z.infer<
  typeof OpenAIV1ChatCompletionSchema
>["messages"][0];

const OpenAIV1TextCompletionSchema = z
  .object({
    model: z
      .string()
      .max(100)
      .regex(
        /^gpt-3.5-turbo-instruct/,
        "Model must start with 'gpt-3.5-turbo-instruct'"
      ),
    prompt: z.string({
      required_error:
        "No `prompt` found. Ensure you've set the correct completion endpoint.",
    }),
    logprobs: z.number().int().nullish().default(null),
    echo: z.boolean().optional().default(false),
    best_of: z.literal(1).optional(),
    stop: z
      .union([z.string().max(500), z.array(z.string().max(500)).max(4)])
      .optional(),
    suffix: z.string().max(1000).optional(),
  })
  .strip()
  .merge(OpenAIV1ChatCompletionSchema.omit({ messages: true }));

// https://platform.openai.com/docs/api-reference/images/create
const OpenAIV1ImagesGenerationSchema = z
  .object({
    prompt: z.string().max(4000),
    model: z.string().max(100).optional(),
    quality: z.enum(["standard", "hd"]).optional().default("standard"),
    n: z.number().int().min(1).max(4).optional().default(1),
    response_format: z.enum(["url", "b64_json"]).optional(),
    size: z
      .enum(["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"])
      .optional()
      .default("1024x1024"),
    style: z.enum(["vivid", "natural"]).optional().default("vivid"),
    user: z.string().max(500).optional(),
  })
  .strip();

// https://developers.generativeai.google/api/rest/generativelanguage/models/generateContent
const GoogleAIV1GenerateContentSchema = z
  .object({
    model: z.string().max(100), //actually specified in path but we need it for the router
    stream: z.boolean().optional().default(false), // also used for router
    contents: z.array(
      z.object({
        parts: z.array(z.object({ text: z.string() })),
        role: z.enum(["user", "model"]),
      })
    ),
    tools: z.array(z.object({})).max(0).optional(),
    safetySettings: z.array(z.object({})).max(0).optional(),
    generationConfig: z.object({
      temperature: z.number().optional(),
      maxOutputTokens: z.coerce
        .number()
        .int()
        .optional()
        .default(16)
        .transform((v) => Math.min(v, 1024)), // TODO: Add config
      candidateCount: z.literal(1).optional(),
      topP: z.number().optional(),
      topK: z.number().optional(),
      stopSequences: z.array(z.string().max(500)).max(5).optional(),
    }),
  })
  .strip();

export type GoogleAIChatMessage = z.infer<
  typeof GoogleAIV1GenerateContentSchema
>["contents"][0];

// https://docs.mistral.ai/api#operation/createChatCompletion
const MistralAIV1ChatCompletionsSchema = z.object({
  model: z.string(),
  messages: z.array(
    z.object({
      role: z.enum(["system", "user", "assistant"]),
      content: z.string(),
    })
  ),
  temperature: z.number().optional().default(0.7),
  top_p: z.number().optional().default(1),
  max_tokens: z.coerce
    .number()
    .int()
    .nullish()
    .transform((v) => Math.min(v ?? OPENAI_OUTPUT_MAX, OPENAI_OUTPUT_MAX)),
  stream: z.boolean().optional().default(false),
  safe_mode: z.boolean().optional().default(false),
  random_seed: z.number().int().optional(),
});

export type MistralAIChatMessage = z.infer<
  typeof MistralAIV1ChatCompletionsSchema
>["messages"][0];

const VALIDATORS: Record<APIFormat, z.ZodSchema<any>> = {
  anthropic: AnthropicV1CompleteSchema,
  openai: OpenAIV1ChatCompletionSchema,
  "openai-text": OpenAIV1TextCompletionSchema,
  "openai-image": OpenAIV1ImagesGenerationSchema,
  "google-ai": GoogleAIV1GenerateContentSchema,
  "mistral-ai": MistralAIV1ChatCompletionsSchema,
};

/** Transforms an incoming request body to one that matches the target API. */
export const transformOutboundPayload: RequestPreprocessor = async (req) => {
@@ -19,17 +211,8 @@ export const transformOutboundPayload: RequestPreprocessor = async (req) => {

  if (alreadyTransformed || notTransformable) return;

  if (req.inboundApi === "mistral-ai") {
    const messages = req.body.messages;
    req.body.messages = fixMistralPrompt(messages);
    req.log.info(
      { old: messages.length, new: req.body.messages.length },
      "Fixed Mistral prompt"
    );
  }

  if (sameService) {
    const result = API_SCHEMA_VALIDATORS[req.inboundApi].safeParse(req.body);
    const result = VALIDATORS[req.inboundApi].safeParse(req.body);
    if (!result.success) {
      req.log.error(
        { issues: result.error.issues, body: req.body },
@@ -42,22 +225,22 @@ export const transformOutboundPayload: RequestPreprocessor = async (req) => {
  }

  if (req.inboundApi === "openai" && req.outboundApi === "anthropic") {
    req.body = openAIToAnthropic(req);
    req.body = openaiToAnthropic(req);
    return;
  }

  if (req.inboundApi === "openai" && req.outboundApi === "google-ai") {
    req.body = openAIToGoogleAI(req);
    req.body = openaiToGoogleAI(req);
    return;
  }

  if (req.inboundApi === "openai" && req.outboundApi === "openai-text") {
    req.body = openAIToOpenAIText(req);
    req.body = openaiToOpenaiText(req);
    return;
  }

  if (req.inboundApi === "openai" && req.outboundApi === "openai-image") {
    req.body = openAIToOpenAIImage(req);
    req.body = openaiToOpenaiImage(req);
    return;
  }

@@ -65,3 +248,277 @@ export const transformOutboundPayload: RequestPreprocessor = async (req) => {
    `'${req.inboundApi}' -> '${req.outboundApi}' request proxying is not supported. Make sure your client is configured to use the correct API.`
  );
};

function openaiToAnthropic(req: Request) {
  const { body } = req;
  const result = OpenAIV1ChatCompletionSchema.safeParse(body);
  if (!result.success) {
    req.log.warn(
      { issues: result.error.issues, body },
      "Invalid OpenAI-to-Anthropic request"
    );
    throw result.error;
  }

  req.headers["anthropic-version"] = "2023-06-01";

  const { messages, ...rest } = result.data;
  const prompt = openAIMessagesToClaudePrompt(messages);

  let stops = rest.stop
    ? Array.isArray(rest.stop)
      ? rest.stop
      : [rest.stop]
    : [];
  // Recommended by Anthropic
  stops.push("\n\nHuman:");
  // Helps with jailbreak prompts that send fake system messages and multi-bot
  // chats that prefix bot messages with "System: Respond as <bot name>".
  stops.push("\n\nSystem:");
  // Remove duplicates
  stops = [...new Set(stops)];

  return {
    // Model may be overridden in `calculate-context-size.ts` to avoid having
    // a circular dependency (`calculate-context-size.ts` needs an already-
    // transformed request body to count tokens, but this function would like
    // to know the count to select a model).
    model: process.env.CLAUDE_SMALL_MODEL || "claude-v1",
    prompt: prompt,
    max_tokens_to_sample: rest.max_tokens,
    stop_sequences: stops,
    stream: rest.stream,
    temperature: rest.temperature,
    top_p: rest.top_p,
  };
}

function openaiToOpenaiText(req: Request) {
  const { body } = req;
  const result = OpenAIV1ChatCompletionSchema.safeParse(body);
  if (!result.success) {
    req.log.warn(
      { issues: result.error.issues, body },
      "Invalid OpenAI-to-OpenAI-text request"
    );
    throw result.error;
  }

  const { messages, ...rest } = result.data;
  const prompt = flattenOpenAIChatMessages(messages);

  let stops = rest.stop
    ? Array.isArray(rest.stop)
      ? rest.stop
      : [rest.stop]
    : [];
  stops.push("\n\nUser:");
  stops = [...new Set(stops)];

  const transformed = { ...rest, prompt: prompt, stop: stops };
  return OpenAIV1TextCompletionSchema.parse(transformed);
}

// Takes the last chat message and uses it verbatim as the image prompt.
function openaiToOpenaiImage(req: Request) {
  const { body } = req;
  const result = OpenAIV1ChatCompletionSchema.safeParse(body);
  if (!result.success) {
    req.log.warn(
      { issues: result.error.issues, body },
      "Invalid OpenAI-to-OpenAI-image request"
    );
    throw result.error;
  }

  const { messages } = result.data;
  const prompt = messages.filter((m) => m.role === "user").pop()?.content;
  if (Array.isArray(prompt)) {
    throw new Error("Image generation prompt must be a text message.");
  }

  if (body.stream) {
    throw new Error(
      "Streaming is not supported for image generation requests."
    );
  }

  // Some frontends do weird things with the prompt, like prefixing it with a
  // character name or wrapping the entire thing in quotes. We will look for
  // the index of "Image:" and use everything after that as the prompt.

  const index = prompt?.toLowerCase().indexOf("image:");
  if (index === -1 || !prompt) {
    throw new Error(
      `Start your prompt with 'Image:' followed by a description of the image you want to generate (received: ${prompt}).`
    );
  }

  // TODO: Add some way to specify parameters via chat message
  const transformed = {
    model: body.model.includes("dall-e") ? body.model : "dall-e-3",
    quality: "standard",
    size: "1024x1024",
    response_format: "url",
    prompt: prompt.slice(index! + 6).trim(),
  };
  return OpenAIV1ImagesGenerationSchema.parse(transformed);
}

function openaiToGoogleAI(
  req: Request
): z.infer<typeof GoogleAIV1GenerateContentSchema> {
  const { body } = req;
  const result = OpenAIV1ChatCompletionSchema.safeParse({
    ...body,
    model: "gpt-3.5-turbo",
  });
  if (!result.success) {
    req.log.warn(
      { issues: result.error.issues, body },
      "Invalid OpenAI-to-Google AI request"
    );
    throw result.error;
  }

  const { messages, ...rest } = result.data;
  const foundNames = new Set<string>();
  const contents = messages
    .map((m) => {
      const role = m.role === "assistant" ? "model" : "user";
      // Detects character names so we can set stop sequences for them as Gemini
      // is prone to continuing as the next character.
      // If names are not available, we'll still try to prefix the message
      // with generic names so we can set stops for them but they don't work
      // as well as real names.
      const text = flattenOpenAIMessageContent(m.content);
      const propName = m.name?.trim();
      const textName =
        m.role === "system" ? "" : text.match(/^(.{0,50}?): /)?.[1]?.trim();
      const name =
        propName || textName || (role === "model" ? "Character" : "User");

      foundNames.add(name);

      // Prefixing messages with their character name seems to help avoid
      // Gemini trying to continue as the next character, or at the very least
      // ensures it will hit the stop sequence. Otherwise it will start a new
      // paragraph and switch perspectives.
      // The response will be very likely to include this prefix so frontends
      // will need to strip it out.
      const textPrefix = textName ? "" : `${name}: `;
      return {
        parts: [{ text: textPrefix + text }],
        role: m.role === "assistant" ? ("model" as const) : ("user" as const),
      };
    })
    .reduce<GoogleAIChatMessage[]>((acc, msg) => {
      const last = acc[acc.length - 1];
      if (last?.role === msg.role) {
        last.parts[0].text += "\n\n" + msg.parts[0].text;
      } else {
        acc.push(msg);
      }
      return acc;
    }, []);

  let stops = rest.stop
    ? Array.isArray(rest.stop)
      ? rest.stop
      : [rest.stop]
    : [];
  stops.push(...Array.from(foundNames).map((name) => `\n${name}:`));
  stops = [...new Set(stops)].slice(0, 5);

  return {
    model: "gemini-pro",
    stream: rest.stream,
    contents,
    tools: [],
    generationConfig: {
      maxOutputTokens: rest.max_tokens,
      stopSequences: stops,
      topP: rest.top_p,
      topK: 40, // openai schema doesn't have this, google ai defaults to 40
      temperature: rest.temperature,
    },
    safetySettings: [
      { category: "HARM_CATEGORY_HARASSMENT", threshold: "BLOCK_NONE" },
      { category: "HARM_CATEGORY_HATE_SPEECH", threshold: "BLOCK_NONE" },
      { category: "HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold: "BLOCK_NONE" },
      { category: "HARM_CATEGORY_DANGEROUS_CONTENT", threshold: "BLOCK_NONE" },
    ],
  };
}

export function openAIMessagesToClaudePrompt(messages: OpenAIChatMessage[]) {
  return (
    messages
      .map((m) => {
        let role: string = m.role;
        if (role === "assistant") {
          role = "Assistant";
        } else if (role === "system") {
          role = "System";
        } else if (role === "user") {
          role = "Human";
        }
        const name = m.name?.trim();
        const content = flattenOpenAIMessageContent(m.content);
        // https://console.anthropic.com/docs/prompt-design
        // `name` isn't supported by Anthropic but we can still try to use it.
        return `\n\n${role}: ${name ? `(as ${name}) ` : ""}${content}`;
      })
      .join("") + "\n\nAssistant:"
  );
}

function flattenOpenAIChatMessages(messages: OpenAIChatMessage[]) {
  // Temporary to allow experimenting with prompt strategies
  const PROMPT_VERSION: number = 1;
  switch (PROMPT_VERSION) {
    case 1:
      return (
        messages
          .map((m) => {
            // Claude-style human/assistant turns
            let role: string = m.role;
            if (role === "assistant") {
              role = "Assistant";
            } else if (role === "system") {
              role = "System";
            } else if (role === "user") {
              role = "User";
            }
            return `\n\n${role}: ${flattenOpenAIMessageContent(m.content)}`;
          })
          .join("") + "\n\nAssistant:"
      );
    case 2:
      return messages
        .map((m) => {
          // Claude without prefixes (except system) and no Assistant priming
          let role: string = "";
          if (role === "system") {
            role = "System: ";
          }
          return `\n\n${role}${flattenOpenAIMessageContent(m.content)}`;
        })
        .join("");
    default:
      throw new Error(`Unknown prompt version: ${PROMPT_VERSION}`);
  }
}

function flattenOpenAIMessageContent(
  content: OpenAIChatMessage["content"]
): string {
  return Array.isArray(content)
    ? content
        .map((contentItem) => {
          if ("text" in contentItem) return contentItem.text;
          if ("image_url" in contentItem) return "[ Uploaded Image Omitted ]";
        })
        .join("\n")
    : content;
}
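Since this hunk relocates the zod validators, a quick usage sketch may help: `safeParse` returns a discriminated result, and the `max_tokens` pipeline coerces, defaults, and clamps the value. (Standalone sketch with a trimmed-down stand-in schema; the full `OpenAIV1ChatCompletionSchema` is the one defined above, and the real limit comes from config rather than a constant.)

```typescript
import { z } from "zod";

// Trimmed-down stand-in for OpenAIV1ChatCompletionSchema, keeping just the
// max_tokens pipeline from above (coerce -> int -> default -> clamp).
const OPENAI_OUTPUT_MAX = 1024; // illustrative limit; really config-driven
const MiniChatSchema = z.object({
  model: z.string().max(100),
  max_tokens: z.coerce
    .number()
    .int()
    .nullish()
    .default(16)
    .transform((v) => Math.min(v ?? OPENAI_OUTPUT_MAX, OPENAI_OUTPUT_MAX)),
});

const ok = MiniChatSchema.safeParse({ model: "gpt-4", max_tokens: "99999" });
if (ok.success) {
  // max_tokens was coerced from a string and clamped to OPENAI_OUTPUT_MAX.
  console.log(ok.data.max_tokens); // 1024
}

const bad = MiniChatSchema.safeParse({ max_tokens: 10 });
if (!bad.success) {
  console.log(bad.error.issues[0].path); // ["model"] - missing field reported
}
```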
@@ -37,7 +37,6 @@ export const validateContextSize: RequestPreprocessor = async (req) => {
      break;
    case "mistral-ai":
      proxyMax = MISTRAL_AI_MAX_CONTENT;
      break;
    case "openai-image":
      return;
    default:
@@ -48,9 +47,7 @@
  let modelMax: number;
  if (model.match(/gpt-3.5-turbo-16k/)) {
    modelMax = 16384;
  } else if (model.match(/gpt-4-turbo(-preview)?$/)) {
    modelMax = 131072;
  } else if (model.match(/gpt-4-(0125|1106)(-preview)?$/)) {
  } else if (model.match(/gpt-4-1106(-preview)?/)) {
    modelMax = 131072;
  } else if (model.match(/^gpt-4(-\d{4})?-vision(-preview)?$/)) {
    modelMax = 131072;
@@ -72,8 +69,6 @@ export const validateContextSize: RequestPreprocessor = async (req) => {
    modelMax = GOOGLE_AI_MAX_CONTEXT;
  } else if (model.match(/^mistral-(tiny|small|medium)$/)) {
    modelMax = MISTRAL_AI_MAX_CONTENT;
  } else if (model.match(/^anthropic\.claude-v2:\d/)) {
    modelMax = 200000;
  } else if (model.match(/^anthropic\.claude/)) {
    // Not sure if AWS Claude has the same context limit as Anthropic Claude.
    modelMax = 100000;
@@ -15,18 +15,14 @@ import { keyPool } from "../../../shared/key-management";
const pipelineAsync = promisify(pipeline);

/**
 * `handleStreamedResponse` consumes and transforms a streamed response from the
 * upstream service, forwarding events to the client in their requested format.
 * After the entire stream has been consumed, it resolves with the full response
 * body so that subsequent middleware in the chain can process it as if it were
 * a non-streaming response.
 * Consume the SSE stream and forward events to the client. Once the stream
 * is closed, resolve with the full response body so that subsequent
 * middleware can work with it.
 *
 * In the event of an error, the request's streaming flag is unset and the non-
 * streaming response handler is called instead.
 *
 * If the error is retryable, that handler will re-enqueue the request and also
 * reset the streaming flag. Unfortunately the streaming flag is set and unset
 * in multiple places, so it's hard to keep track of.
 * Typically we would only need one of the raw response handlers to execute, but
* in the event a streamed request results in a non-200 response, we need to
|
||||
* fall back to the non-streaming response handler so that the error handler
|
||||
* can inspect the error response.
|
||||
*/
|
||||
export const handleStreamedResponse: RawResponseBodyHandler = async (
|
||||
proxyRes,
|
||||
@@ -52,8 +48,8 @@ export const handleStreamedResponse: RawResponseBodyHandler = async (
|
||||
`Starting to proxy SSE stream.`
|
||||
);
|
||||
|
||||
// Typically, streaming will have already been initialized by the request
|
||||
// queue to send heartbeat pings.
|
||||
// Users waiting in the queue already have a SSE connection open for the
|
||||
// heartbeat, so we can't always send the stream headers.
|
||||
if (!res.headersSent) {
|
||||
copySseResponseHeaders(proxyRes, res);
|
||||
initializeSseStream(res);
|
||||
@@ -62,11 +58,8 @@ export const handleStreamedResponse: RawResponseBodyHandler = async (
|
||||
const prefersNativeEvents = req.inboundApi === req.outboundApi;
|
||||
const contentType = proxyRes.headers["content-type"];
|
||||
|
||||
// Adapter turns some arbitrary stream (binary, JSON, etc.) into SSE events.
|
||||
const adapter = new SSEStreamAdapter({ contentType, api: req.outboundApi });
|
||||
// Aggregator compiles all events into a single response object.
|
||||
const aggregator = new EventAggregator({ format: req.outboundApi });
|
||||
// Transformer converts events to the user's requested format.
|
||||
const transformer = new SSEMessageTransformer({
|
||||
inputFormat: req.outboundApi,
|
||||
inputApiVersion: String(req.headers["anthropic-version"]),
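A sketch of how these three stages are typically wired (assumed from the surrounding code, not shown in this hunk): the upstream response is piped through the adapter and transformer to the client, while the aggregator collects every event so the full body can be returned to later middleware. Method names on the aggregator are assumptions.

  // Hypothetical wiring for illustration only:
  transformer.on("data", (event) => aggregator.addEvent(event));
  await pipelineAsync(proxyRes, adapter, transformer, res);
  return aggregator.getFinalResponse();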

@@ -12,7 +12,7 @@ import { assertNever } from "../../../shared/utils";
import {
  MistralAIChatMessage,
  OpenAIChatMessage,
-} from "../../../shared/api-schemas";
+} from "../request/preprocessors/transform-outbound-payload";

/** If prompt logging is enabled, enqueues the prompt for logging. */
export const logPrompt: ProxyResHandlerWithBody = async (

@@ -85,6 +85,9 @@ export class SSEStreamAdapter extends Transform {
      }
    } else {
      const { bytes } = payload;
+      // technically this is a transformation but we don't really distinguish
+      // between aws claude and anthropic claude at the APIFormat level, so
+      // these will short circuit the message transformer
      return [
        "event: completion",
        `data: ${Buffer.from(bytes, "base64").toString("utf8")}`,
@@ -92,7 +95,8 @@ export class SSEStreamAdapter extends Transform {
    }
  }

-  /** Processes an incoming array element from the Google AI JSON stream. */
+  // Google doesn't use event streams and just sends elements in an array over
+  // a long-lived HTTP connection. Needs stream-json to parse the array.
  protected processGoogleValue(value: any): string | null {
    try {
      const candidates = value.candidates ?? [{}];
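For context (an assumption, not shown in this hunk): such a response is usually parsed with stream-json, which emits one { key, value } pair per array element; each value can then be handed to processGoogleValue. A minimal sketch:

  import { parser } from "stream-json";
  import { streamArray } from "stream-json/streamers/StreamArray";
  // Inside the adapter, each parsed array element reaches processGoogleValue:
  proxyRes
    .pipe(parser())
    .pipe(streamArray())
    .on("data", ({ value }) => this.processGoogleValue(value));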

@@ -54,9 +54,7 @@ export function generateModelList(models = KNOWN_MISTRAL_AI_MODELS) {
}

const handleModelRequest: RequestHandler = (_req, res) => {
-  if (new Date().getTime() - modelsCacheTime < 1000 * 60) {
-    return res.status(200).json(modelsCache);
-  }
+  if (new Date().getTime() - modelsCacheTime < 1000 * 60) return modelsCache;
  const result = generateModelList();
  modelsCache = { object: "list", data: result };
  modelsCacheTime = new Date().getTime();

@@ -25,9 +25,7 @@ const KNOWN_MODELS = ["dall-e-2", "dall-e-3"];
let modelListCache: any = null;
let modelListValid = 0;
const handleModelRequest: RequestHandler = (_req, res) => {
-  if (new Date().getTime() - modelListValid < 1000 * 60) {
-    return res.status(200).json(modelListCache);
-  }
+  if (new Date().getTime() - modelListValid < 1000 * 60) return modelListCache;
  const result = generateModelList(KNOWN_MODELS);
  modelListCache = { object: "list", data: result };
  modelListValid = new Date().getTime();

+5
-7
@@ -28,8 +28,6 @@ import {

// https://platform.openai.com/docs/models/overview
export const KNOWN_OPENAI_MODELS = [
-  "gpt-4-turbo-preview",
-  "gpt-4-0125-preview",
  "gpt-4-1106-preview",
  "gpt-4-vision-preview",
  "gpt-4",
@@ -37,7 +35,7 @@ export const KNOWN_OPENAI_MODELS = [
  "gpt-4-0314", // EOL 2024-06-13
  "gpt-4-32k",
  "gpt-4-32k-0613",
-  // "gpt-4-32k-0314", // EOL 2024-06-13
+  "gpt-4-32k-0314", // EOL 2024-06-13
  "gpt-3.5-turbo",
  "gpt-3.5-turbo-0301", // EOL 2024-06-13
  "gpt-3.5-turbo-0613",
@@ -85,9 +83,7 @@ export function generateModelList(models = KNOWN_OPENAI_MODELS) {
}

const handleModelRequest: RequestHandler = (_req, res) => {
-  if (new Date().getTime() - modelsCacheTime < 1000 * 60) {
-    return res.status(200).json(modelsCache);
-  }
+  if (new Date().getTime() - modelsCacheTime < 1000 * 60) return modelsCache;
  const result = generateModelList();
  modelsCache = { object: "list", data: result };
  modelsCacheTime = new Date().getTime();
@@ -165,7 +161,9 @@ const openaiProxy = createQueueMiddleware({
  selfHandleResponse: true,
  logger,
  on: {
-    proxyReq: createOnProxyReqHandler({ pipeline: [addKey, finalizeBody], }),
+    proxyReq: createOnProxyReqHandler({
+      pipeline: [addKey, finalizeBody],
+    }),
    proxyRes: createOnProxyResHandler([openaiResponseHandler]),
    error: handleProxyError,
  },

+1
-1
@@ -527,7 +527,7 @@ function monitorHeartbeat(req: Request) {
  if (bytesSinceLast < minBytes) {
    req.log.warn(
      { minBytes, bytesSinceLast },
-      "Queued request is not processing enough heartbeat data or server is overloaded; killing connection."
+      "Queued request is processing enough heartbeat data or server is overloaded; killing connection."
    );
    res.destroy();
  }

+2
-2
@@ -19,8 +19,8 @@ proxyRouter.use((req, _res, next) => {
  next();
});
proxyRouter.use(
-  express.json({ limit: "10mb" }),
-  express.urlencoded({ extended: true, limit: "10mb" })
+  express.json({ limit: "1mb" }),
+  express.urlencoded({ extended: true, limit: "1mb" })
);
proxyRouter.use(gatekeeper);
proxyRouter.use(checkRisuToken);

+19
-31
@@ -12,7 +12,7 @@ import { setupAssetsDir } from "./shared/file-storage/setup-assets-dir";
import { keyPool } from "./shared/key-management";
import { adminRouter } from "./admin/routes";
import { proxyRouter } from "./proxy/routes";
-import { handleInfoPage } from "./info-page";
+import { handleInfoPage, renderPage } from "./info-page";
import { buildInfo } from "./service-info";
import { logQueue } from "./shared/prompt-logging";
import { start as startRequestQueue } from "./proxy/queue";
@@ -22,7 +22,6 @@ import { checkOrigin } from "./proxy/check-origin";
import { userRouter } from "./user/routes";

const PORT = config.port;
-const BIND_ADDRESS = config.bindAddress;

const app = express();
// middleware
@@ -51,7 +50,14 @@ app.use(
  })
);

-app.set("trust proxy", Number(config.trustedProxies));
+// TODO: Detect (or support manual configuration of) whether the app is behind
+// a load balancer/reverse proxy, which is necessary to determine request IP
+// addresses correctly.
+app.set("trust proxy", true);
+app.use((req, _res, next) => {
+  req.log.info({ ip: req.ip, forwardedFor: req.get("x-forwarded-for") });
+  next();
+});
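An illustration (not part of the commit) of what this changes: `trust proxy` tells Express how much of the `X-Forwarded-For` chain to believe when computing `req.ip`. For a hypothetical request carrying `X-Forwarded-For: 203.0.113.7, 10.0.0.2`:

  app.set("trust proxy", 1);    // trust one hop: req.ip === "10.0.0.2"
  app.set("trust proxy", true); // trust every hop: req.ip === "203.0.113.7"

Trusting every hop means a direct client can forge the header, so IP-based rate limiting is only reliable here if a proxy in front always overwrites `X-Forwarded-For`.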

app.set("view engine", "ejs");
app.set("views", [
@@ -66,11 +72,7 @@ app.get("/health", (_req, res) => res.sendStatus(200));
app.use(cors());
app.use(checkOrigin);

-if (config.staticServiceInfo) {
-  app.get("/", (_req, res) => res.sendStatus(200));
-} else {
-  app.get("/", handleInfoPage);
-}
+app.get("/", handleInfoPage);
app.get("/status", (req, res) => {
  res.json(buildInfo(req.protocol + "://" + req.get("host"), false));
});
@@ -124,18 +126,15 @@ async function start() {
  logger.info("Starting request queue...");
  startRequestQueue();

+  app.listen(PORT, async () => {
+    logger.info({ port: PORT }, "Now listening for connections.");
+    registerUncaughtExceptionHandler();
+  });

  const diskSpace = await checkDiskSpace(
    __dirname.startsWith("/app") ? "/app" : os.homedir()
  );

-  app.listen(PORT, BIND_ADDRESS, () => {
-    logger.info(
-      { port: PORT, interface: BIND_ADDRESS },
-      "Now listening for connections."
-    );
-    registerUncaughtExceptionHandler();
-  });

  logger.info(
    { build: process.env.BUILD_INFO, nodeEnv: process.env.NODE_ENV, diskSpace },
    "Startup complete."
@@ -165,18 +164,7 @@ function registerUncaughtExceptionHandler() {
 * didn't set it to something misleading.
 */
async function setBuildInfo() {
-  // For CI builds, use the env vars set during the build process
-  if (process.env.GITGUD_BRANCH) {
-    const sha = process.env.GITGUD_COMMIT?.slice(0, 7) || "unknown SHA";
-    const branch = process.env.GITGUD_BRANCH;
-    const repo = process.env.GITGUD_PROJECT;
-    const buildInfo = `[ci] ${sha} (${branch}@${repo})`;
-    process.env.BUILD_INFO = buildInfo;
-    logger.info({ build: buildInfo }, "Using build info from CI image.");
-    return;
-  }
-
-  // For render, the git directory is dockerignore'd so we use env vars
+  // Render .dockerignore's the .git directory but provides info in the env
  if (process.env.RENDER) {
    const sha = process.env.RENDER_GIT_COMMIT?.slice(0, 7) || "unknown SHA";
    const branch = process.env.RENDER_GIT_BRANCH || "unknown branch";
@@ -187,10 +175,10 @@ async function setBuildInfo() {
    return;
  }

-  // For huggingface and bare metal deployments, we can get the info from git
  try {
+    // Ignore git's complaints about dubious directory ownership on Huggingface
+    // (which evidently runs dockerized Spaces on Windows with weird NTFS perms)
    if (process.env.SPACE_ID) {
-      // TODO: may not be necessary anymore with adjusted Huggingface dockerfile
      childProcess.execSync("git config --global --add safe.directory /app");
    }

@@ -210,7 +198,7 @@ async function setBuildInfo() {

  let [sha, branch, remote, status] = await Promise.all(promises);

-  remote = remote.match(/.*[\/:]([\w-]+)\/([\w\-.]+?)(?:\.git)?$/) || [];
+  remote = remote.match(/.*[\/:]([\w-]+)\/([\w\-\.]+?)(?:\.git)?$/) || [];
  const repo = remote.slice(-2).join("/");
  status = status
    // ignore Dockerfile changes since that's how the user deploys the app
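For illustration (not part of the commit), both regex variants extract the owner and repository name from a hypothetical remote URL; `[\w\-.]` and `[\w\-\.]` match the same characters, since the dot needs no escaping inside a character class, so the change is cosmetic:

  "git@example.com:someuser/some-repo.git".match(/.*[\/:]([\w-]+)\/([\w\-.]+?)(?:\.git)?$/);
  // captures "someuser" and "some-repo"; remote.slice(-2).join("/") === "someuser/some-repo"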

@@ -1,92 +0,0 @@
-import { z } from "zod";
-import { Request } from "express";
-import { config } from "../../config";
-import {
-  flattenOpenAIMessageContent,
-  OpenAIChatMessage,
-  OpenAIV1ChatCompletionSchema,
-} from "./openai";
-
-const CLAUDE_OUTPUT_MAX = config.maxOutputTokensAnthropic;
-
-// https://console.anthropic.com/docs/api/reference#-v1-complete
-export const AnthropicV1CompleteSchema = z
-  .object({
-    model: z.string().max(100),
-    prompt: z.string({
-      required_error:
-        "No prompt found. Are you sending an OpenAI-formatted request to the Claude endpoint?",
-    }),
-    max_tokens_to_sample: z.coerce
-      .number()
-      .int()
-      .transform((v) => Math.min(v, CLAUDE_OUTPUT_MAX)),
-    stop_sequences: z.array(z.string().max(500)).optional(),
-    stream: z.boolean().optional().default(false),
-    temperature: z.coerce.number().optional().default(1),
-    top_k: z.coerce.number().optional(),
-    top_p: z.coerce.number().optional(),
-  })
-  .strip();
-
-export function openAIMessagesToClaudePrompt(messages: OpenAIChatMessage[]) {
-  return (
-    messages
-      .map((m) => {
-        let role: string = m.role;
-        if (role === "assistant") {
-          role = "Assistant";
-        } else if (role === "system") {
-          role = "System";
-        } else if (role === "user") {
-          role = "Human";
-        }
-        const name = m.name?.trim();
-        const content = flattenOpenAIMessageContent(m.content);
-        // https://console.anthropic.com/docs/prompt-design
-        // `name` isn't supported by Anthropic but we can still try to use it.
-        return `\n\n${role}: ${name ? `(as ${name}) ` : ""}${content}`;
-      })
-      .join("") + "\n\nAssistant:"
-  );
-}
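For illustration (not part of the commit): a hypothetical two-message chat and the resulting Claude prompt, including the priming "Assistant:" turn appended by the join:

  openAIMessagesToClaudePrompt([
    { role: "system", content: "Be terse." },
    { role: "user", content: "Hi!", name: "Bob" },
  ]);
  // => "\n\nSystem: Be terse.\n\nHuman: (as Bob) Hi!\n\nAssistant:"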
-
-export function openAIToAnthropic(req: Request) {
-  const { body } = req;
-  const result = OpenAIV1ChatCompletionSchema.safeParse(body);
-  if (!result.success) {
-    req.log.warn(
-      { issues: result.error.issues, body },
-      "Invalid OpenAI-to-Anthropic request"
-    );
-    throw result.error;
-  }
-
-  req.headers["anthropic-version"] = "2023-06-01";
-
-  const { messages, ...rest } = result.data;
-  const prompt = openAIMessagesToClaudePrompt(messages);
-
-  let stops = rest.stop
-    ? Array.isArray(rest.stop)
-      ? rest.stop
-      : [rest.stop]
-    : [];
-  // Recommended by Anthropic
-  stops.push("\n\nHuman:");
-  // Helps with jailbreak prompts that send fake system messages and multi-bot
-  // chats that prefix bot messages with "System: Respond as <bot name>".
-  stops.push("\n\nSystem:");
-  // Remove duplicates
-  stops = [...new Set(stops)];
-
-  return {
-    model: rest.model,
-    prompt: prompt,
-    max_tokens_to_sample: rest.max_tokens,
-    stop_sequences: stops,
-    stream: rest.stream,
-    temperature: rest.temperature,
-    top_p: rest.top_p,
-  };
-}
@@ -1,124 +0,0 @@
-import { z } from "zod";
-import { Request } from "express";
-import {
-  flattenOpenAIMessageContent,
-  OpenAIV1ChatCompletionSchema,
-} from "./openai";
-
-// https://developers.generativeai.google/api/rest/generativelanguage/models/generateContent
-export const GoogleAIV1GenerateContentSchema = z
-  .object({
-    model: z.string().max(100), // actually specified in path but we need it for the router
-    stream: z.boolean().optional().default(false), // also used for router
-    contents: z.array(
-      z.object({
-        parts: z.array(z.object({ text: z.string() })),
-        role: z.enum(["user", "model"]),
-      }),
-    ),
-    tools: z.array(z.object({})).max(0).optional(),
-    safetySettings: z.array(z.object({})).max(0).optional(),
-    generationConfig: z.object({
-      temperature: z.number().optional(),
-      maxOutputTokens: z.coerce
-        .number()
-        .int()
-        .optional()
-        .default(16)
-        .transform((v) => Math.min(v, 1024)), // TODO: Add config
-      candidateCount: z.literal(1).optional(),
-      topP: z.number().optional(),
-      topK: z.number().optional(),
-      stopSequences: z.array(z.string().max(500)).max(5).optional(),
-    }),
-  })
-  .strip();
-export type GoogleAIChatMessage = z.infer<
-  typeof GoogleAIV1GenerateContentSchema
->["contents"][0];
-
-export function openAIToGoogleAI(
-  req: Request,
-): z.infer<typeof GoogleAIV1GenerateContentSchema> {
-  const { body } = req;
-  const result = OpenAIV1ChatCompletionSchema.safeParse({
-    ...body,
-    model: "gpt-3.5-turbo",
-  });
-  if (!result.success) {
-    req.log.warn(
-      { issues: result.error.issues, body },
-      "Invalid OpenAI-to-Google AI request",
-    );
-    throw result.error;
-  }
-
-  const { messages, ...rest } = result.data;
-  const foundNames = new Set<string>();
-  const contents = messages
-    .map((m) => {
-      const role = m.role === "assistant" ? "model" : "user";
-      // Detects character names so we can set stop sequences for them as Gemini
-      // is prone to continuing as the next character.
-      // If names are not available, we'll still try to prefix the message
-      // with generic names so we can set stops for them but they don't work
-      // as well as real names.
-      const text = flattenOpenAIMessageContent(m.content);
-      const propName = m.name?.trim();
-      const textName =
-        m.role === "system" ? "" : text.match(/^(.{0,50}?): /)?.[1]?.trim();
-      const name =
-        propName || textName || (role === "model" ? "Character" : "User");
-
-      foundNames.add(name);
-
-      // Prefixing messages with their character name seems to help avoid
-      // Gemini trying to continue as the next character, or at the very least
-      // ensures it will hit the stop sequence. Otherwise it will start a new
-      // paragraph and switch perspectives.
-      // The response will be very likely to include this prefix so frontends
-      // will need to strip it out.
-      const textPrefix = textName ? "" : `${name}: `;
-      return {
-        parts: [{ text: textPrefix + text }],
-        role: m.role === "assistant" ? ("model" as const) : ("user" as const),
-      };
-    })
-    .reduce<GoogleAIChatMessage[]>((acc, msg) => {
-      const last = acc[acc.length - 1];
-      if (last?.role === msg.role) {
-        last.parts[0].text += "\n\n" + msg.parts[0].text;
-      } else {
-        acc.push(msg);
-      }
-      return acc;
-    }, []);
-
-  let stops = rest.stop
-    ? Array.isArray(rest.stop)
-      ? rest.stop
-      : [rest.stop]
-    : [];
-  stops.push(...Array.from(foundNames).map((name) => `\n${name}:`));
-  stops = [...new Set(stops)].slice(0, 5);
-
-  return {
-    model: "gemini-pro",
-    stream: rest.stream,
-    contents,
-    tools: [],
-    generationConfig: {
-      maxOutputTokens: rest.max_tokens,
-      stopSequences: stops,
-      topP: rest.top_p,
-      topK: 40, // openai schema doesn't have this, google ai defaults to 40
-      temperature: rest.temperature,
-    },
-    safetySettings: [
-      { category: "HARM_CATEGORY_HARASSMENT", threshold: "BLOCK_NONE" },
-      { category: "HARM_CATEGORY_HATE_SPEECH", threshold: "BLOCK_NONE" },
-      { category: "HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold: "BLOCK_NONE" },
-      { category: "HARM_CATEGORY_DANGEROUS_CONTENT", threshold: "BLOCK_NONE" },
-    ],
-  };
-}
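For illustration (not part of the commit): the reduce step above merges consecutive same-role turns, since the Gemini API expects alternating user/model messages. Given hypothetical inputs user "A", user "B", assistant "C" (name prefixes omitted here for brevity), the contents array becomes:

  [
    { role: "user", parts: [{ text: "A\n\nB" }] },
    { role: "model", parts: [{ text: "C" }] },
  ]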
@@ -1,21 +0,0 @@
-import { z } from "zod";
-import { APIFormat } from "../key-management";
-import { AnthropicV1CompleteSchema } from "./anthropic";
-import { OpenAIV1ChatCompletionSchema } from "./openai";
-import { OpenAIV1TextCompletionSchema } from "./openai-text";
-import { OpenAIV1ImagesGenerationSchema } from "./openai-image";
-import { GoogleAIV1GenerateContentSchema } from "./google-ai";
-import { MistralAIV1ChatCompletionsSchema } from "./mistral-ai";
-
-export { OpenAIChatMessage } from "./openai";
-export { GoogleAIChatMessage } from "./google-ai";
-export { MistralAIChatMessage } from "./mistral-ai";
-
-export const API_SCHEMA_VALIDATORS: Record<APIFormat, z.ZodSchema<any>> = {
-  anthropic: AnthropicV1CompleteSchema,
-  openai: OpenAIV1ChatCompletionSchema,
-  "openai-text": OpenAIV1TextCompletionSchema,
-  "openai-image": OpenAIV1ImagesGenerationSchema,
-  "google-ai": GoogleAIV1GenerateContentSchema,
-  "mistral-ai": MistralAIV1ChatCompletionsSchema,
-};
@@ -1,60 +0,0 @@
-import { z } from "zod";
-import { OPENAI_OUTPUT_MAX } from "./openai";
-
-// https://docs.mistral.ai/api#operation/createChatCompletion
-export const MistralAIV1ChatCompletionsSchema = z.object({
-  model: z.string(),
-  messages: z.array(
-    z.object({
-      role: z.enum(["system", "user", "assistant"]),
-      content: z.string(),
-    })
-  ),
-  temperature: z.number().optional().default(0.7),
-  top_p: z.number().optional().default(1),
-  max_tokens: z.coerce
-    .number()
-    .int()
-    .nullish()
-    .transform((v) => Math.min(v ?? OPENAI_OUTPUT_MAX, OPENAI_OUTPUT_MAX)),
-  stream: z.boolean().optional().default(false),
-  safe_prompt: z.boolean().optional().default(false),
-  random_seed: z.number().int().optional(),
-});
-export type MistralAIChatMessage = z.infer<
-  typeof MistralAIV1ChatCompletionsSchema
->["messages"][0];
-
-export function fixMistralPrompt(
-  messages: MistralAIChatMessage[]
-): MistralAIChatMessage[] {
-  // Mistral uses OpenAI format but has some additional requirements:
-  // - Only one system message per request, and it must be the first message if
-  //   present.
-  // - Final message must be a user message.
-  // - Cannot have multiple messages from the same role in a row.
-  // While frontends should be able to handle this, we can fix it here in the
-  // meantime.
-
-  return messages.reduce<MistralAIChatMessage[]>((acc, msg) => {
-    if (acc.length === 0) {
-      acc.push(msg);
-      return acc;
-    }
-
-    const copy = { ...msg };
-    // Reattribute subsequent system messages to the user
-    if (msg.role === "system") {
-      copy.role = "user";
-    }
-
-    // Consolidate multiple messages from the same role
-    const last = acc[acc.length - 1];
-    if (last.role === copy.role) {
-      last.content += "\n\n" + copy.content;
-    } else {
-      acc.push(copy);
-    }
-    return acc;
-  }, []);
-}
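For illustration (not part of the commit): a hypothetical message list and how fixMistralPrompt repairs it, reattributing a mid-conversation system message and merging the resulting same-role run:

  fixMistralPrompt([
    { role: "system", content: "Be terse." },
    { role: "system", content: "No emoji." },
    { role: "user", content: "Hi!" },
  ]);
  // => [
  //   { role: "system", content: "Be terse." },
  //   { role: "user", content: "No emoji.\n\nHi!" },
  // ]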
@@ -1,66 +0,0 @@
-import { z } from "zod";
-import { Request } from "express";
-import { OpenAIV1ChatCompletionSchema } from "./openai";
-
-// https://platform.openai.com/docs/api-reference/images/create
-export const OpenAIV1ImagesGenerationSchema = z
-  .object({
-    prompt: z.string().max(4000),
-    model: z.string().max(100).optional(),
-    quality: z.enum(["standard", "hd"]).optional().default("standard"),
-    n: z.number().int().min(1).max(4).optional().default(1),
-    response_format: z.enum(["url", "b64_json"]).optional(),
-    size: z
-      .enum(["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"])
-      .optional()
-      .default("1024x1024"),
-    style: z.enum(["vivid", "natural"]).optional().default("vivid"),
-    user: z.string().max(500).optional(),
-  })
-  .strip();
-
-// Takes the last chat message and uses it verbatim as the image prompt.
-export function openAIToOpenAIImage(req: Request) {
-  const { body } = req;
-  const result = OpenAIV1ChatCompletionSchema.safeParse(body);
-  if (!result.success) {
-    req.log.warn(
-      { issues: result.error.issues, body },
-      "Invalid OpenAI-to-OpenAI-image request",
-    );
-    throw result.error;
-  }
-
-  const { messages } = result.data;
-  const prompt = messages.filter((m) => m.role === "user").pop()?.content;
-  if (Array.isArray(prompt)) {
-    throw new Error("Image generation prompt must be a text message.");
-  }
-
-  if (body.stream) {
-    throw new Error(
-      "Streaming is not supported for image generation requests.",
-    );
-  }
-
-  // Some frontends do weird things with the prompt, like prefixing it with a
-  // character name or wrapping the entire thing in quotes. We will look for
-  // the index of "Image:" and use everything after that as the prompt.
-
-  const index = prompt?.toLowerCase().indexOf("image:");
-  if (index === -1 || !prompt) {
-    throw new Error(
-      `Start your prompt with 'Image:' followed by a description of the image you want to generate (received: ${prompt}).`,
-    );
-  }
-
-  // TODO: Add some way to specify parameters via chat message
-  const transformed = {
-    model: body.model.includes("dall-e") ? body.model : "dall-e-3",
-    quality: "standard",
-    size: "1024x1024",
-    response_format: "url",
-    prompt: prompt.slice(index! + 6).trim(),
-  };
-  return OpenAIV1ImagesGenerationSchema.parse(transformed);
-}
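For illustration (not part of the commit): given a hypothetical final user message, only the text after the "Image:" marker survives as the DALL-E prompt; messages without the marker are rejected with the error above.

  const prompt = "Please draw this. Image: a red fox in the snow";
  prompt.slice(prompt.toLowerCase().indexOf("image:") + 6).trim();
  // => "a red fox in the snow"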
@@ -1,56 +0,0 @@
-import { z } from "zod";
-import {
-  flattenOpenAIChatMessages,
-  OpenAIV1ChatCompletionSchema,
-} from "./openai";
-import { Request } from "express";
-
-export const OpenAIV1TextCompletionSchema = z
-  .object({
-    model: z
-      .string()
-      .max(100)
-      .regex(
-        /^gpt-3.5-turbo-instruct/,
-        "Model must start with 'gpt-3.5-turbo-instruct'"
-      ),
-    prompt: z.string({
-      required_error:
-        "No `prompt` found. Ensure you've set the correct completion endpoint.",
-    }),
-    logprobs: z.number().int().nullish().default(null),
-    echo: z.boolean().optional().default(false),
-    best_of: z.literal(1).optional(),
-    stop: z
-      .union([z.string().max(500), z.array(z.string().max(500)).max(4)])
-      .optional(),
-    suffix: z.string().max(1000).optional(),
-  })
-  .strip()
-  .merge(OpenAIV1ChatCompletionSchema.omit({ messages: true, logprobs: true }));
-
-export function openAIToOpenAIText(req: Request) {
-  const { body } = req;
-  const result = OpenAIV1ChatCompletionSchema.safeParse(body);
-  if (!result.success) {
-    req.log.warn(
-      { issues: result.error.issues, body },
-      "Invalid OpenAI-to-OpenAI-text request"
-    );
-    throw result.error;
-  }
-
-  const { messages, ...rest } = result.data;
-  const prompt = flattenOpenAIChatMessages(messages);
-
-  let stops = rest.stop
-    ? Array.isArray(rest.stop)
-      ? rest.stop
-      : [rest.stop]
-    : [];
-  stops.push("\n\nUser:");
-  stops = [...new Set(stops)];
-
-  const transformed = { ...rest, prompt: prompt, stop: stops };
-  return OpenAIV1TextCompletionSchema.parse(transformed);
-}
@@ -1,133 +0,0 @@
-import { z } from "zod";
-import { config } from "../../config";
-
-export const OPENAI_OUTPUT_MAX = config.maxOutputTokensOpenAI;
-
-// https://platform.openai.com/docs/api-reference/chat/create
-const OpenAIV1ChatContentArraySchema = z.array(
-  z.union([
-    z.object({ type: z.literal("text"), text: z.string() }),
-    z.object({
-      type: z.union([z.literal("image"), z.literal("image_url")]),
-      image_url: z.object({
-        url: z.string().url(),
-        detail: z.enum(["low", "auto", "high"]).optional().default("auto"),
-      }),
-    }),
-  ])
-);
-export const OpenAIV1ChatCompletionSchema = z
-  .object({
-    model: z.string().max(100),
-    messages: z.array(
-      z.object({
-        role: z.enum(["system", "user", "assistant", "tool", "function"]),
-        content: z.union([z.string(), OpenAIV1ChatContentArraySchema]),
-        name: z.string().optional(),
-        tool_calls: z.array(z.any()).optional(),
-        function_call: z.array(z.any()).optional(),
-        tool_call_id: z.string().optional(),
-      }),
-      {
-        required_error:
-          "No `messages` found. Ensure you've set the correct completion endpoint.",
-        invalid_type_error:
-          "Messages were not formatted correctly. Refer to the OpenAI Chat API documentation for more information.",
-      }
-    ),
-    temperature: z.number().optional().default(1),
-    top_p: z.number().optional().default(1),
-    n: z
-      .literal(1, {
-        errorMap: () => ({
-          message: "You may only request a single completion at a time.",
-        }),
-      })
-      .optional(),
-    stream: z.boolean().optional().default(false),
-    stop: z
-      .union([z.string().max(500), z.array(z.string().max(500))])
-      .optional(),
-    max_tokens: z.coerce
-      .number()
-      .int()
-      .nullish()
-      .default(16)
-      .transform((v) => Math.min(v ?? OPENAI_OUTPUT_MAX, OPENAI_OUTPUT_MAX)),
-    frequency_penalty: z.number().optional().default(0),
-    presence_penalty: z.number().optional().default(0),
-    logit_bias: z.any().optional(),
-    user: z.string().max(500).optional(),
-    seed: z.number().int().optional(),
-    // Be warned that Azure OpenAI combines these two into a single field.
-    // It's the only deviation from the OpenAI API that I'm aware of so I have
-    // special cased it in `addAzureKey` rather than expecting clients to do it.
-    logprobs: z.boolean().optional(),
-    top_logprobs: z.number().int().optional(),
-    // Quickly adding some newer tool usage params, not tested. They will be
-    // passed through to the API as-is.
-    tools: z.array(z.any()).optional(),
-    functions: z.array(z.any()).optional(),
-    tool_choice: z.any().optional(),
-    function_choice: z.any().optional(),
-    response_format: z.any(),
-  })
-  // Tool usage must be enabled via config because we currently have no way to
-  // track quota usage for them or enforce limits.
-  .omit(
-    Boolean(config.allowOpenAIToolUsage) ? {} : { tools: true, functions: true }
-  )
-  .strip();
-export type OpenAIChatMessage = z.infer<
-  typeof OpenAIV1ChatCompletionSchema
->["messages"][0];
-
-export function flattenOpenAIMessageContent(
-  content: OpenAIChatMessage["content"]
-): string {
-  return Array.isArray(content)
-    ? content
-        .map((contentItem) => {
-          if ("text" in contentItem) return contentItem.text;
-          if ("image_url" in contentItem) return "[ Uploaded Image Omitted ]";
-        })
-        .join("\n")
-    : content;
-}
-
-export function flattenOpenAIChatMessages(messages: OpenAIChatMessage[]) {
-  // Temporary to allow experimenting with prompt strategies
-  const PROMPT_VERSION: number = 1;
-  switch (PROMPT_VERSION) {
-    case 1:
-      return (
-        messages
-          .map((m) => {
-            // Claude-style human/assistant turns
-            let role: string = m.role;
-            if (role === "assistant") {
-              role = "Assistant";
-            } else if (role === "system") {
-              role = "System";
-            } else if (role === "user") {
-              role = "User";
-            }
-            return `\n\n${role}: ${flattenOpenAIMessageContent(m.content)}`;
-          })
-          .join("") + "\n\nAssistant:"
-      );
-    case 2:
-      return messages
-        .map((m) => {
-          // Claude without prefixes (except system) and no Assistant priming
-          let role: string = "";
-          if (m.role === "system") {
-            role = "System: ";
-          }
-          return `\n\n${role}${flattenOpenAIMessageContent(m.content)}`;
-        })
-        .join("");
-    default:
-      throw new Error(`Unknown prompt version: ${PROMPT_VERSION}`);
-  }
-}
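For illustration (not part of the commit): output of prompt version 1 for a hypothetical chat; version 2 would emit the same text without role prefixes (except "System: ") and without the trailing "Assistant:" priming turn.

  flattenOpenAIChatMessages([
    { role: "system", content: "Be terse." },
    { role: "user", content: "Hi!" },
  ]);
  // => "\n\nSystem: Be terse.\n\nUser: Hi!\n\nAssistant:"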
@@ -4,13 +4,8 @@ import os from "os";
import schedule from "node-schedule";
import { config } from "../../config";
import { logger } from "../../logger";
-import {
-  getServiceForModel,
-  LLMService,
-  MODEL_FAMILY_SERVICE,
-  ModelFamily,
-} from "../models";
-import { Key, KeyProvider, Model } from "./index";
+import { LLMService, MODEL_FAMILY_SERVICE, ModelFamily } from "../models";
+import { Key, Model, KeyProvider } from "./index";
import { AnthropicKeyProvider, AnthropicKeyUpdate } from "./anthropic/provider";
import { OpenAIKeyProvider, OpenAIKeyUpdate } from "./openai/provider";
import { GoogleAIKeyProvider } from "./google-ai/provider";
@@ -47,7 +42,7 @@ export class KeyPool {
  }

  public get(model: Model): Key {
-    const service = getServiceForModel(model);
+    const service = this.getServiceForModel(model);
    return this.getKeyProvider(service).get(model);
  }

@@ -77,7 +72,7 @@ export class KeyPool {
  public available(model: Model | "all" = "all"): number {
    return this.keyProviders.reduce((sum, provider) => {
      const includeProvider =
-        model === "all" || getServiceForModel(model) === provider.service;
+        model === "all" || this.getServiceForModel(model) === provider.service;
      return sum + (includeProvider ? provider.available() : 0);
    }, 0);
  }
@@ -114,6 +109,33 @@ export class KeyPool {
    provider.recheck();
  }

+  private getServiceForModel(model: Model): LLMService {
+    if (
+      model.startsWith("gpt") ||
+      model.startsWith("text-embedding-ada") ||
+      model.startsWith("dall-e")
+    ) {
+      // https://platform.openai.com/docs/models/model-endpoint-compatibility
+      return "openai";
+    } else if (model.startsWith("claude-")) {
+      // https://console.anthropic.com/docs/api/reference#parameters
+      return "anthropic";
+    } else if (model.includes("gemini")) {
+      // https://developers.generativeai.google.com/models/language
+      return "google-ai";
+    } else if (model.includes("mistral")) {
+      // https://docs.mistral.ai/platform/endpoints
+      return "mistral-ai";
+    } else if (model.startsWith("anthropic.claude")) {
+      // AWS offers models from a few providers
+      // https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html
+      return "aws";
+    } else if (model.startsWith("azure")) {
+      return "azure";
+    }
+    throw new Error(`Unknown service for model '${model}'`);
+  }
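For illustration (not part of the commit), how the prefix checks above route some hypothetical model ids to key providers; note that Bedrock-style "anthropic.claude" ids do not start with "claude-", so they fall through to the "aws" branch:

  // getServiceForModel("gpt-4-1106-preview")    => "openai"
  // getServiceForModel("claude-2.1")            => "anthropic"
  // getServiceForModel("gemini-pro")            => "google-ai"
  // getServiceForModel("mistral-medium")        => "mistral-ai"
  // getServiceForModel("anthropic.claude-v2:1") => "aws"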

  private getKeyProvider(service: LLMService): KeyProvider {
    return this.keyProviders.find((provider) => provider.service === service)!;
  }

@@ -73,12 +73,6 @@ export class OpenAIKeyChecker extends KeyCheckerBase<OpenAIKey> {

    const families = new Set<OpenAIModelFamily>();
    models.forEach(({ id }) => families.add(getOpenAIModelFamily(id, "turbo")));
-
-    // disable dall-e for trial keys due to very low per-day quota that tends to
-    // render the key unusable.
-    if (key.isTrial) {
-      families.delete("dall-e");
-    }

    // as of 2023-11-18, many keys no longer return the dalle3 model but still
    // have access to it via the api for whatever reason.
@@ -86,15 +80,6 @@ export class OpenAIKeyChecker extends KeyCheckerBase<OpenAIKey> {
    // families.delete("dall-e");
    // }

-    // as of 2024-01-10, the models endpoint has a bug and sometimes returns the
-    // gpt-4-32k-0314 snapshot even though the key doesn't have access to
-    // base gpt-4-32k. we will ignore this model if the snapshot is returned
-    // without the base model.
-    const has32k = models.find(({ id }) => id === "gpt-4-32k");
-    if (families.has("gpt4-32k") && !has32k) {
-      families.delete("gpt4-32k");
-    }

    // We want to update the key's model families here, but we don't want to
    // update its `lastChecked` timestamp because we need to let the liveness
    // check run before we can consider the key checked.

+1
-29
@@ -73,8 +73,7 @@ export const LLM_SERVICES = (<A extends readonly LLMService[]>(
] as const);

export const OPENAI_MODEL_FAMILY_MAP: { [regex: string]: OpenAIModelFamily } = {
-  "^gpt-4-turbo(-preview)?$": "gpt4-turbo",
-  "^gpt-4-(0125|1106)(-preview)?$": "gpt4-turbo",
+  "^gpt-4-1106(-preview)?$": "gpt4-turbo",
  "^gpt-4(-\\d{4})?-vision(-preview)?$": "gpt4-turbo",
  "^gpt-4-32k-\\d{4}$": "gpt4-32k",
  "^gpt-4-32k$": "gpt4-32k",
@@ -205,33 +204,6 @@ export function getModelFamilyForRequest(req: Request): ModelFamily {
  return (req.modelFamily = modelFamily);
}

-export function getServiceForModel(model: string): LLMService {
-  if (
-    model.startsWith("gpt") ||
-    model.startsWith("text-embedding-ada") ||
-    model.startsWith("dall-e")
-  ) {
-    // https://platform.openai.com/docs/models/model-endpoint-compatibility
-    return "openai";
-  } else if (model.startsWith("claude-")) {
-    // https://console.anthropic.com/docs/api/reference#parameters
-    return "anthropic";
-  } else if (model.includes("gemini")) {
-    // https://developers.generativeai.google.com/models/language
-    return "google-ai";
-  } else if (model.includes("mistral")) {
-    // https://docs.mistral.ai/platform/endpoints
-    return "mistral-ai";
-  } else if (model.startsWith("anthropic.claude")) {
-    // AWS offers models from a few providers
-    // https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html
-    return "aws";
-  } else if (model.startsWith("azure")) {
-    return "azure";
-  }
-  throw new Error(`Unknown service for model '${model}'`);
-}

function assertNever(x: never): never {
  throw new Error(`Called assertNever with argument ${x}.`);
}

@@ -1,5 +1,5 @@
+import { MistralAIChatMessage } from "../../proxy/middleware/request/preprocessors/transform-outbound-payload.js";
import * as tokenizer from "./mistral-tokenizer-js";
-import { MistralAIChatMessage } from "../api-schemas";

export function init() {
  tokenizer.initializemistralTokenizer();

@@ -2,7 +2,11 @@ import { Tiktoken } from "tiktoken/lite";
import cl100k_base from "tiktoken/encoders/cl100k_base.json";
import { logger } from "../../logger";
import { libSharp } from "../file-storage";
-import { GoogleAIChatMessage, OpenAIChatMessage } from "../api-schemas";
+import type {
+  GoogleAIChatMessage,
+  OpenAIChatMessage,
+} from "../../proxy/middleware/request/preprocessors/transform-outbound-payload";
+import { z } from "zod";

const log = logger.child({ module: "tokenizer", service: "openai" });
const GPT4_VISION_SYSTEM_PROMPT_SIZE = 170;
@@ -50,7 +54,7 @@ export async function getTokenCount(
    for (const item of value) {
      if (item.type === "text") {
        textContent += item.text;
-      } else if (["image", "image_url"].includes(item.type)) {
+      } else if (item.type === "image_url") {
        const { url, detail } = item.image_url;
        const cost = await getGpt4VisionTokenCost(url, detail);
        numTokens += cost ?? 0;
@@ -229,9 +233,7 @@ export function getOpenAIImageCost(params: {
  };
}

-export function estimateGoogleAITokenCount(
-  prompt: string | GoogleAIChatMessage[]
-) {
+export function estimateGoogleAITokenCount(prompt: string | GoogleAIChatMessage[]) {
  if (typeof prompt === "string") {
    return getTextTokenCount(prompt);
  }

@@ -1,25 +1,25 @@
import { Request } from "express";
-import { assertNever } from "../utils";
-import {
-  getTokenCount as getClaudeTokenCount,
-  init as initClaude,
-} from "./claude";
-import {
-  estimateGoogleAITokenCount,
-  getOpenAIImageCost,
-  getTokenCount as getOpenAITokenCount,
-  init as initOpenAi,
-} from "./openai";
-import {
-  getTokenCount as getMistralAITokenCount,
-  init as initMistralAI,
-} from "./mistral";
-import { APIFormat } from "../key-management";
-import {
+import type {
  GoogleAIChatMessage,
  MistralAIChatMessage,
  OpenAIChatMessage,
-} from "../api-schemas";
+} from "../../proxy/middleware/request/preprocessors/transform-outbound-payload";
+import { assertNever } from "../utils";
+import {
+  init as initClaude,
+  getTokenCount as getClaudeTokenCount,
+} from "./claude";
+import {
+  init as initOpenAi,
+  getTokenCount as getOpenAITokenCount,
+  getOpenAIImageCost,
+  estimateGoogleAITokenCount,
+} from "./openai";
+import {
+  init as initMistralAI,
+  getTokenCount as getMistralAITokenCount,
+} from "./mistral";
+import { APIFormat } from "../key-management";

export async function init() {
  initClaude();
@@ -37,11 +37,7 @@ type TokenCountRequest = { req: Request } & (
      service: "openai-text" | "anthropic" | "google-ai";
    }
  | { prompt?: GoogleAIChatMessage[]; completion?: never; service: "google-ai" }
-  | {
-      prompt: MistralAIChatMessage[];
-      completion?: never;
-      service: "mistral-ai";
-    }
+  | { prompt: MistralAIChatMessage[]; completion?: never; service: "mistral-ai" }
  | { prompt?: never; completion: string; service: APIFormat }
  | { prompt?: never; completion?: never; service: "openai-image" }
);