litellm
https://github.com/BerriAI/litellm/tree/main
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
LiteLLM manages:
- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- Consistent output: text responses will always be available at `['choices'][0]['message']['content']` (see the sketch after this list)
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
- Set budgets & rate limits per project, API key, and model
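A minimal SDK sketch of the first two points, assuming your provider keys are set in the environment (the placeholder key values are illustrative):

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "sk-..."      # your OpenAI key
os.environ["ANTHROPIC_API_KEY"] = "sk-..."   # your Anthropic key

messages = [{"role": "user", "content": "Hey, how's it going?"}]

# Same call signature for every provider; litellm translates the
# inputs to each provider's native completion endpoint.
response = completion(model="gpt-3.5-turbo", messages=messages)
response = completion(model="claude-3-haiku-20240307", messages=messages)

# Consistent output shape regardless of provider.
print(response["choices"][0]["message"]["content"])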
LiteLLM Proxy Server (LLM Gateway)
Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers
🚨 Stable Release: Use docker images with the `-stable` tag. These have undergone 12-hour load tests before being published.
Support for more providers: missing a provider or LLM platform? Raise a feature request.
Proxy Config.yaml
https://docs.litellm.ai/docs/proxy/configs#:~:text=Quick%20Start.%20Set%20a%20model%20alias%20for%20your
model_list:
- model_name: gpt-3.5-turbo ### RECEIVED MODEL NAME ###
litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
model: azure/gpt-turbo-small-eu ### MODEL NAME sent to `litellm.completion()` ###
api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
rpm: 6 # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
- model_name: bedrock-claude-v1
litellm_params:
model: bedrock/anthropic.claude-instant-v1
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: "os.environ/AZURE_API_KEY_CA"
rpm: 6
- model_name: anthropic-claude
litellm_params:
model: bedrock/anthropic.claude-instant-v1
### [OPTIONAL] SET AWS REGION ###
aws_region_name: us-east-1
- model_name: vllm-models
litellm_params:
model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
api_base: http://0.0.0.0:4000/v1
api_key: none
rpm: 1440
model_info:
version: 2
# Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
# Default models
# Works for ALL Providers and needs the default provider credentials in .env
- model_name: "*"
litellm_params:
model: "*"
litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
drop_params: True
success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env
general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
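With a `master_key` set, per-project budgets and rate limits are typically managed through virtual keys. A hedged sketch against the proxy's documented `/key/generate` endpoint; the model scope and `max_budget` value here are illustrative:

import requests

# Ask the proxy to mint a virtual key scoped to one model, with an
# illustrative budget; authenticate with the master key from the config.
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},
    json={"models": ["gpt-3.5-turbo"], "max_budget": 10},
)
print(resp.json())  # response contains the generated virtual key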
Step 2: Start Proxy with config
$ litellm --config /path/to/config.yaml

tip: Run with `--detailed_debug` if you need detailed debug logs:

$ litellm --config /path/to/config.yaml --detailed_debug

Step 3: Test it
This sends the request to the model where `model_name=gpt-3.5-turbo` in config.yaml. If multiple deployments share `model_name=gpt-3.5-turbo`, the proxy load-balances between them.

Langchain, OpenAI SDK Usage Examples
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
    ]
}
'
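The same request through the OpenAI Python SDK, pointing `base_url` at the proxy; the `api_key` assumes the master key from the config above (any value works if no key auth is configured):

from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy.
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)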
https://docs.litellm.ai/docs/proxy/quick_start
Quick Start
Quick start CLI, Config, Docker
LiteLLM Server (LLM Gateway) manages:
- Unified Interface: Calling 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI `ChatCompletions` & `Completions` format
- Cost tracking: Authentication, Spend Tracking & Budgets via Virtual Keys
- Load Balancing: between multiple models + deployments of the same model - the LiteLLM proxy can handle 1.5k+ requests/second during load tests (an SDK-side Router sketch follows this list)
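The same load-balancing logic is also available SDK-side via `litellm.Router`; a minimal sketch reusing two deployments from the config above, assuming the Azure keys are set in the environment:

import os
from litellm import Router

# Two deployments sharing one public model_name; the Router
# load-balances requests across them (mirrors the proxy config above).
router = Router(model_list=[
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-turbo-small-eu",
            "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/",
            "api_key": os.environ["AZURE_API_KEY_EU"],
            "rpm": 6,
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-turbo-small-ca",
            "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
            "api_key": os.environ["AZURE_API_KEY_CA"],
            "rpm": 6,
        },
    },
])

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response["choices"][0]["message"]["content"])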
$ pip install 'litellm[proxy]'
https://docs.litellm.ai/docs/completion/input
Per-provider support matrix for OpenAI-compatible input params: temperature, max_completion_tokens, max_tokens, top_p, stream, stream_options, stop, n, presence_penalty, frequency_penalty, functions, function_call, logit_bias, user, response_format, seed, tools, tool_choice, logprobs, top_logprobs, extra_headers. OpenAI supports the full set and Azure OpenAI nearly all of it; Anthropic, Openrouter, and Github support most params (some Github entries are model dependent); Bedrock support is partly model dependent; the remaining providers (Replicate, Anyscale, Cohere, Huggingface, AI21, VertexAI, Sagemaker, TogetherAI, AlephAlpha, NLP Cloud, Petals, Ollama, Databricks, ClarifAI) each support a smaller subset. See the table at the URL above for the exact per-provider breakdown.
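To check this matrix programmatically, and to silently drop unsupported params instead of erroring, a hedged sketch using `litellm.get_supported_openai_params` (the helper this docs page points to) and the `drop_params` setting, the module-level counterpart of `drop_params: True` under `litellm_settings` above:

import litellm
from litellm import completion, get_supported_openai_params

# List the OpenAI-compatible params a given model/provider accepts.
print(get_supported_openai_params(model="claude-3-haiku-20240307"))

# With drop_params on, params the provider doesn't support are
# dropped from the request instead of raising an error.
litellm.drop_params = True
response = completion(  # requires ANTHROPIC_API_KEY in the environment
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    logit_bias={},  # not supported by Anthropic; dropped, not an error
)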
Success Stories
https://github.com/wandb/openui/tree/main
https://www.tinyash.com/blog/litellm/#:~:text=litellm
https://zhuanlan.zhihu.com/p/692686053#:~:text=LiteLLM%E7%9A%84