Per-user token quotas and automatic quota refreshing (khanon/oai-reverse-proxy!37)

This commit is contained in:
khanon
2023-08-28 19:33:14 +00:00
parent 785b1f69f3
commit cb780e85da
31 changed files with 544 additions and 145 deletions

Before

Width:  |  Height:  |  Size: 153 KiB

After

Width:  |  Height:  |  Size: 153 KiB

Before

Width:  |  Height:  |  Size: 22 KiB

After

Width:  |  Height:  |  Size: 22 KiB

Before

Width:  |  Height:  |  Size: 36 KiB

After

Width:  |  Height:  |  Size: 36 KiB

+3 -5
View File
@@ -12,12 +12,12 @@ This repository can be deployed to a [Huggingface Space](https://huggingface.co/
- Provide a name for your Space and select "Docker" as the SDK. Select "Blank" for the template.
- Click "Create Space" and wait for the Space to be created.
![Create Space](huggingface-createspace.png)
![Create Space](assets/huggingface-createspace.png)
### 3. Create an empty Dockerfile
- Once your Space is created, you'll see an option to "Create the Dockerfile in your browser". Click that link.
![Create Dockerfile](huggingface-dockerfile.png)
![Create Dockerfile](assets/huggingface-dockerfile.png)
- Paste the following into the text editor and click "Save".
```dockerfile
FROM node:18-bullseye-slim
@@ -34,7 +34,7 @@ CMD [ "npm", "start" ]
```
- Click "Commit new file to `main`" to save the Dockerfile.
![Commit](huggingface-savedockerfile.png)
![Commit](assets/huggingface-savedockerfile.png)
### 4. Set your API key as a secret
- Click the Settings button in the top right corner of your repository.
@@ -82,8 +82,6 @@ MAX_OUTPUT_TOKENS_ANTHROPIC=512
# Block prompts containing disallowed characters
REJECT_DISALLOWED=false
REJECT_MESSAGE="This content violates /aicg/'s acceptable use policy."
# Show exact quota usage on the Server Info page
QUOTA_DISPLAY_MODE=full
```
See `.env.example` for a full list of available settings, or check `config.ts` for details on what each setting does.
+3
View File
@@ -11,6 +11,7 @@ Several of these features require you to set secrets in your environment. If us
- [Memory](#memory)
- [Firebase Realtime Database](#firebase-realtime-database)
- [Firebase setup instructions](#firebase-setup-instructions)
- [User quotas](#user-quotas)
## No user management (`GATEKEEPER=none`)
@@ -59,3 +60,5 @@ To use Firebase Realtime Database to persist user data, set the following enviro
8. Set `GATEKEEPER_STORE` to `firebase_rtdb` in your environment if you haven't already.
The proxy server will attempt to connect to your Firebase Realtime Database at startup and will throw an error if it cannot connect. If you see this error, check that your `FIREBASE_RTDB_URL` and `FIREBASE_KEY` secrets are set correctly.
# User quotas
+36
View File
@@ -0,0 +1,36 @@
# User Quotas
When using `user_token` authentication, you can set (model) token quotas for user. These quotas are enforced by the proxy server and are separate from the quotas enforced by OpenAI.
You can set the default quota via environment variables. Quotas are enforced on a per-model basis, and count both prompt tokens and completion tokens. By default, all quotas are disabled.
Set the following environment variables to set the default quotas:
- `TOKEN_QUOTA_TURBO`
- `TOKEN_QUOTA_GPT4`
- `TOKEN_QUOTA_CLAUDE`
Quotas only apply to `normal`-type users; `special`-type users are exempt from quotas. You can change users' types via the REST API.
**Note that changes to these environment variables will only apply to newly created users.** To modify existing users' quotas, use the REST API or the admin UI.
## Automatically refreshing quotas
You can use the `QUOTA_REFRESH_PERIOD` environment variable to automatically refresh users' quotas periodically. This is useful if you want to give users a certain number of tokens per day, for example. The entire quota will be refreshed at the start of the specified period, and any tokens a user has not used will not be carried over.
Quotas for all models and users will be refreshed. If you haven't set `TOKEN_QUOTA_*` for a particular model, quotas for that model will not be refreshed (so any manually set quotas will not be overwritten).
Set the `QUOTA_REFRESH_PERIOD` environment variable to one of the following values:
- `daily` (at midnight)
- `hourly`
- leave unset to disable automatic refreshing
You can also use a cron expression, for example:
- Every 45 seconds: `"*/45 * * * * *"`
- Every 30 minutes: `"*/30 * * * *"`
- Every 6 hours: `"0 */6 * * *"`
- Every 3 days: `"0 0 */3 * *"`
- Daily, but at mid-day: `"0 12 * * *"`
Make sure to enclose the cron expression in quotation marks.
All times are in the server's local time zone. Refer to [crontab.guru](https://crontab.guru/) for more examples.