---
title: v1.71.1-stable - 2x Higher Requests Per Second (RPS)
slug: v1.71.1-stable
date: 2025-05-24T10:00:00.000Z
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: >-
      https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: >-
      https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
hide_table_of_contents: false
---
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.71.1-stable
```

</TabItem>
<TabItem value="pip" label="Pip">

```shell
pip install litellm==1.71.1
```

</TabItem>
</Tabs>
## Key Highlights
LiteLLM v1.71.1-stable is live now. Here are the key highlights of this release:
- Performance improvements: LiteLLM can now scale to 200 RPS per instance with a 74ms median response time.
- File Permissions: Control file access across OpenAI, Azure, VertexAI.
- MCP x OpenAI: Use MCP servers with OpenAI Responses API.
## Performance Improvements
<Image img={require('../../img/perf_imp.png')} style={{ width: '800px', height: 'auto' }} />
This release brings aiohttp support for all LLM API providers. This means LiteLLM can now scale to 200 RPS per instance with a 40ms median latency overhead, doubling the RPS LiteLLM can sustain at that latency overhead.

You can opt in by enabling the flag below. (We expect to make this the default within a week.)
<Tabs>
<TabItem value="proxy" label="LiteLLM Proxy">

Set `USE_AIOHTTP_TRANSPORT=True` in your environment variables.

```shell
export USE_AIOHTTP_TRANSPORT="True"
```

</TabItem>
<TabItem value="sdk" label="LiteLLM Python SDK">

Set `litellm.use_aiohttp_transport = True` to enable the aiohttp transport.

```python
import litellm

litellm.use_aiohttp_transport = True  # default is False, enable this to use aiohttp transport

result = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(result)
```

</TabItem>
</Tabs>
## File Permissions
<Image img={require('../../img/files_api_graphic.png')} style={{ width: '800px', height: 'auto' }} />
This release brings support for File Permissions and Finetuning APIs to LiteLLM Managed Files. This is great for:
- Proxy Admins: as users can only view/edit/delete files they’ve created - even when using shared OpenAI/Azure/Vertex deployments.
- Developers: get a standard interface to use Files across Chat/Finetuning/Batch APIs (see the sketch below).
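As a concrete sketch of the developer workflow: upload a file once through the proxy with the standard OpenAI SDK, then reference it by ID. This is a minimal sketch, not the definitive flow; the proxy URL, virtual key, and file name below are placeholder assumptions.

```python
# Minimal sketch: LiteLLM Managed Files through the proxy, via the OpenAI SDK.
# base_url, api_key, and the file name are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Upload a file through the proxy. LiteLLM records which key created it,
# so other users of the same shared OpenAI/Azure/Vertex deployment
# cannot view, edit, or delete it.
uploaded = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# The creating key can retrieve the file; other keys are denied access.
print(client.files.retrieve(uploaded.id))
```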
## New Models / Updated Models

- Gemini (VertexAI, Google AI Studio)
- Anthropic
    - Claude-4 model family support - PR
- Bedrock
- VertexAI
- xAI
    - `xai/grok-3` pricing information - PR
- LM Studio
    - Structured JSON schema outputs support - PR
- SambaNova
    - Updated models and parameters - PR
- Databricks
- Azure
- Mistral
    - `devstral-small-2505` model pricing and context window - PR
- Ollama
    - Wildcard model support - PR
- CustomLLM
    - Embeddings support added - PR
- Featherless AI
    - Access to 4200+ models - PR
## LLM API Endpoints

- Image Edits
- Responses API
    - MCP support for Responses API - PR (example below)
- Files API
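To sketch the new MCP support on the Responses API: pass an MCP tool entry in `tools`, following the OpenAI Responses API's MCP tool schema. The server label and URL below are illustrative assumptions, not a LiteLLM-provided server.

```python
# Minimal sketch: Responses API call with an MCP server attached.
# The MCP server label and URL are illustrative assumptions.
import litellm

response = litellm.responses(
    model="openai/gpt-4o",
    input="Summarize the latest MCP spec changes.",
    tools=[
        {
            "type": "mcp",                                 # OpenAI-style MCP tool entry
            "server_label": "deepwiki",                    # assumed label
            "server_url": "https://mcp.deepwiki.com/mcp",  # assumed server
            "require_approval": "never",
        }
    ],
)
print(response)
```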
## Management Endpoints / UI
- Teams
- Keys
- Logs
- Guardrails
    - Config.yaml guardrails display - PR
- Organizations/Users
- Audit Logs
    - `/list` and `/info` endpoints for Audit Logs - PR
## Logging / Alerting Integrations

- Prometheus
    - Track `route` on `proxy_*` metrics - PR
- Langfuse
- DeepEval/ConfidentAI
    - Logging enabled for proxy and SDK - PR
- Logfire
    - Fix OTEL proxy server initialization when using Logfire - PR
## Authentication & Security

- JWT Authentication
- Custom Auth
    - Support for switching between custom auth and API key auth - PR (see the sketch below)
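As a minimal sketch of how custom auth and API key auth can coexist, assuming LiteLLM's documented custom auth hook signature (the token value checked here is a hypothetical example):

```python
# custom_auth.py - minimal sketch of a LiteLLM custom auth hook.
# The hook signature follows LiteLLM's custom auth interface;
# "internal-service-token" is a hypothetical example value.
from fastapi import Request

from litellm.proxy._types import UserAPIKeyAuth


async def user_api_key_auth(request: Request, api_key: str) -> UserAPIKeyAuth:
    # Accept requests that present our (hypothetical) internal service token.
    if api_key == "internal-service-token":
        return UserAPIKeyAuth(api_key=api_key)
    # Otherwise raise; with this release the proxy can be configured to fall
    # back to standard API key auth rather than rejecting the request outright.
    raise Exception("Unrecognized token - defer to API key auth")
```

The hook is referenced from the proxy config via `custom_auth` under `general_settings`.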
## Performance / Reliability Improvements

- aiohttp Transport
- Background Health Checks
    - Improved reliability - PR
- Response Handling
- Thread Management
    - Removed error-creating threads for reliability - PR
## General Proxy Improvements
## Bug Fixes
This release includes numerous bug fixes to improve stability and reliability, across the following areas:

- LLM Provider Fixes
- Authentication & Users
- Database & Infrastructure
- UI & Display
- Model & Routing
## New Contributors
- @DarinVerheijke made their first contribution in PR #10596
- @estsauver made their first contribution in PR #10929
- @mohittalele made their first contribution in PR #10665
- @pselden made their first contribution in PR #10899
- @unrealandychan made their first contribution in PR #10842
- @dastaiger made their first contribution in PR #10946
- @slytechnical made their first contribution in PR #10881
- @daarko10 made their first contribution in PR #11006
- @sorenmat made their first contribution in PR #10658
- @matthid made their first contribution in PR #10982
- @jgowdy-godaddy made their first contribution in PR #11032
- @bepotp made their first contribution in PR #11008
- @jmorenoc-o made their first contribution in PR #11031
- @martin-liu made their first contribution in PR #11076
- @gunjan-solanki made their first contribution in PR #11064
- @tokoko made their first contribution in PR #10980
- @spike-spiegel-21 made their first contribution in PR #10649
- @kreatoo made their first contribution in PR #10927
- @baejooc made their first contribution in PR #10887
- @keykbd made their first contribution in PR #11114
- @dalssoft made their first contribution in PR #11088
- @jtong99 made their first contribution in PR #10853
## Demo Instance
Here's a Demo Instance to test changes:
- Instance: https://demo.litellm.ai/
- Login Credentials:
    - Username: admin
    - Password: sk-1234