💥 OpenAI Proxy Server - Deploy LiteLLM

A simple, fast, and lightweight OpenAI-compatible server for calling 100+ LLM APIs in the OpenAI input/output format.

Endpoints:

  • /chat/completions - chat completions endpoint to call 100+ LLMs
  • /models - available models on server

We want to learn how we can make the proxy better! Meet the founders or join our discord

Local Usage

$ git clone https://github.com/BerriAI/litellm.git
$ cd ./litellm/openai-proxy
$ uvicorn main:app --host 0.0.0.0 --port 8000

Test Request

Make sure your API keys are set in the environment before sending these requests.

curl http://0.0.0.0:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
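
The same test request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the proxy from "Local Usage" is listening on port 8000, and the helper names `build_chat_request` and `send` are illustrative, not part of litellm.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, temperature=0.7):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("http://0.0.0.0:8000", "gpt-3.5-turbo", "Say this is a test!"))` should return the parsed JSON response.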

Setting LLM API keys

The server supports two ways of passing API keys to litellm:

  • Environment Variables - By default, the server assumes the LLM API keys are stored in environment variables
  • Dynamic Variables passed to /chat/completions
    • Set AUTH_STRATEGY=DYNAMIC in the environment
    • Pass the required auth params api_key, api_base, api_version with the request params
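
As a rough illustration of the dynamic option, the sketch below adds the auth params to a chat payload. It assumes that, with AUTH_STRATEGY=DYNAMIC, the server reads api_key, api_base, and api_version from the JSON body alongside "model"; the helper name `with_dynamic_auth` and the key value are placeholders.

```python
import json


def with_dynamic_auth(payload, api_key, api_base=None, api_version=None):
    """Return a copy of a chat payload with auth params added.

    Assumption: the server accepts these fields in the request body
    when AUTH_STRATEGY=DYNAMIC is set.
    """
    merged = dict(payload)
    merged["api_key"] = api_key
    # Only include optional params that were actually provided.
    if api_base is not None:
        merged["api_base"] = api_base
    if api_version is not None:
        merged["api_version"] = api_version
    return merged


base_payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
}
# "sk-placeholder" is a dummy value, not a real key.
body = json.dumps(with_dynamic_auth(base_payload, api_key="sk-placeholder"))
```

The resulting `body` string can be sent as the `-d` argument of the curl command shown earlier.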

Deploy on Google Cloud Run

Click the button to deploy to Google Cloud Run

Deploy

On a successful deploy, your Cloud Run Shell will show the deployment output.

Testing your deployed proxy

This assumes the required keys are set as environment variables.

https://litellm-7yjrj3ha2q-uc.a.run.app is our example proxy; substitute the URL of your own deployed Cloud Run app.

curl https://litellm-7yjrj3ha2q-uc.a.run.app/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'

Set LLM API Keys

Environment Variables

More info here

  1. In the Google Cloud console, go to Cloud Run.

  2. Click on the litellm service

  3. Click Edit and Deploy New Revision

  4. Enter your environment variables, for example OPENAI_API_KEY, ANTHROPIC_API_KEY

Deploy on Render

Click the button to deploy to Render

Deploy

On a successful deploy, https://dashboard.render.com/ should show the deployed service.

Advanced

Caching - Completion() and Embedding() Responses

Enable caching by adding the following credentials to your server environment:

REDIS_HOST = ""      # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = ""      # REDIS_PORT='18841'
REDIS_PASSWORD = ""  # REDIS_PASSWORD='liteLlmIsAmazing'
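
Before starting the server, it can help to confirm that these variables are actually present. The following sketch assumes all three variables must be non-empty for caching to work, which the source implies but does not state; `missing_redis_vars` is a hypothetical helper, not part of litellm.

```python
import os

# The three variable names listed above.
REQUIRED_REDIS_VARS = ("REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD")


def missing_redis_vars(environ=None):
    """Return the names of Redis variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_REDIS_VARS if not env.get(name)]
```

Calling `missing_redis_vars()` with no arguments checks the current process environment; an empty list means all three values are set.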

Test Caching

Send the same request twice:

curl http://0.0.0.0:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "write a poem about litellm!"}],
"temperature": 0.7
}'

curl http://0.0.0.0:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "write a poem about litellm!"}],
"temperature": 0.7
}'
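
The repeated request above can also be scripted. This sketch calls a request function twice and reports whether the two results match, along with how long each call took. It assumes that a cache hit shows up as a matching (and typically faster) second response, which the source does not spell out; `send_once` is a hypothetical zero-argument callable that you supply.

```python
import time


def compare_repeated_calls(send_once):
    """Call send_once() twice and compare the results and timings.

    send_once is any zero-argument callable that performs the request
    and returns something comparable, such as the message text.
    """
    results = []
    for _ in range(2):
        start = time.perf_counter()
        response = send_once()
        results.append((response, time.perf_counter() - start))
    (first, first_secs), (second, second_secs) = results
    return {
        "identical": first == second,
        "first_seconds": first_secs,
        "second_seconds": second_secs,
    }
```

Returning only the generated message text from `send_once` keeps the comparison from being affected by metadata fields that may differ between responses.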

Control caching per completion request