# Alibaba Qwen & Kimi Models

A complete integration guide for adding Alibaba Cloud's Qwen and Moonshot AI's Kimi models to your existing LiteLLM architecture.
## Excellent News: Full Native Support

LiteLLM has complete native support for both Alibaba Cloud Qwen (via DashScope) and Moonshot AI Kimi models. Since you're already using LiteLLM with Claude from Bedrock, adding these providers is straightforward and requires minimal configuration changes.
## Provider Support Status

### Qwen (Alibaba Cloud DashScope)

- Status: Fully supported (2024-2025)
- Provider prefix: `dashscope`
- OpenAI-compatible API
- International endpoint available

### Kimi (Moonshot AI)

- Status: Fully supported (v1.80.0+)
- Provider prefix: `moonshot`
- OpenAI-compatible API
- K2 reasoning models included
## Available Models

### Qwen Models (DashScope)

Qwen 3 and related latest models (recommended):

```text
dashscope/qwen3-32b
dashscope/qwen3-coder-480b-a35b-instruct
dashscope/qwen3-coder-30b-a3b-instruct
dashscope/qwen3-vl-flash
dashscope/qwen3-omni-30b-a3b-captioner   # Audio captioning
dashscope/qwen2.5-omni-7b                # Multimodal
dashscope/qwq-32b                        # Reasoning model
```

Qwen standard models:

```text
dashscope/qwen-turbo
dashscope/qwen-plus
dashscope/qwen-max
dashscope/qwen-turbo-latest
dashscope/qwen-plus-latest
dashscope/qwen-max-latest
```

### Kimi Models (Moonshot AI)

```text
moonshot/moonshot-v1-8k        # 8K context window
moonshot/moonshot-v1-32k       # 32K context window
moonshot/moonshot-v1-128k      # 128K context window
moonshot/kimi-k2               # Latest K2 model
moonshot/kimi-k2-thinking      # K2 with reasoning
moonshot/kimi-k2-instruct      # K2 instruction-tuned
```
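The three `moonshot-v1` tiers differ only in context window, so tier choice can be mechanical. A small helper sketch (hypothetical, not part of LiteLLM; window sizes approximated from the model names above) picks the smallest tier that fits a prompt:

```python
# Hypothetical helper: pick the smallest (cheapest) moonshot-v1 tier whose
# context window fits the prompt plus a reply budget. Window sizes below are
# approximations taken from the model names, not exact token limits.
MOONSHOT_TIERS = [
    ("moonshot/moonshot-v1-8k", 8_000),
    ("moonshot/moonshot-v1-32k", 32_000),
    ("moonshot/moonshot-v1-128k", 128_000),
]

def pick_kimi_model(prompt_tokens: int, reply_budget: int = 1_000) -> str:
    """Return the smallest moonshot-v1 model that fits prompt + reply."""
    needed = prompt_tokens + reply_budget
    for model, window in MOONSHOT_TIERS:
        if needed <= window:
            return model
    raise ValueError(f"Prompt of {prompt_tokens} tokens exceeds the 128K tier")

print(pick_kimi_model(5_000))    # fits the 8K tier
print(pick_kimi_model(60_000))   # needs the 128K tier
```

In practice you would measure `prompt_tokens` with the provider's tokenizer rather than estimating.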
## Integration Options

### Option 1: Direct SDK Integration (Python)

Quick integration for testing and development.

For Qwen:

```python
import os

from litellm import completion

# Set API key
os.environ['DASHSCOPE_API_KEY'] = "your-dashscope-api-key"

# Optional: international endpoint
os.environ['DASHSCOPE_API_BASE'] = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

# Make a call
response = completion(
    model="dashscope/qwen-turbo",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)

print(response.choices[0].message.content)
```
For Kimi:

```python
import os

from litellm import completion

# Set API key
os.environ['MOONSHOT_API_KEY'] = "your-moonshot-api-key"

# Make a call
response = completion(
    model="moonshot/moonshot-v1-8k",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)

print(response.choices[0].message.content)
```
### Option 2: LiteLLM Proxy Server (Recommended)

Best for production environments with an existing LiteLLM architecture.

Configuration file (`config.yaml`):

```yaml
model_list:
  # Existing Claude/Bedrock models
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_region_name: us-east-1

  # Add Qwen models
  - model_name: qwen-turbo
    litellm_params:
      model: dashscope/qwen-turbo
      api_key: os.environ/DASHSCOPE_API_KEY
      api_base: https://dashscope-intl.aliyuncs.com/compatible-mode/v1

  - model_name: qwen-max
    litellm_params:
      model: dashscope/qwen-max
      api_key: os.environ/DASHSCOPE_API_KEY

  # Add Kimi models
  - model_name: kimi-8k
    litellm_params:
      model: moonshot/moonshot-v1-8k
      api_key: os.environ/MOONSHOT_API_KEY

  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  timeout: 600
```
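A key benefit of this layout is that callers use provider-neutral aliases (`qwen-max`, `kimi-128k`) while the proxy resolves them to backend models. A stdlib-only sketch of that alias lookup (illustrative only; the real resolution happens inside the LiteLLM proxy):

```python
# Illustrative alias table mirroring the config.yaml above -- the actual
# routing lives inside the LiteLLM proxy, not in caller code.
MODEL_ALIASES = {
    "claude-3-opus": "bedrock/anthropic.claude-3-opus-20240229-v1:0",
    "qwen-turbo": "dashscope/qwen-turbo",
    "qwen-max": "dashscope/qwen-max",
    "kimi-8k": "moonshot/moonshot-v1-8k",
    "kimi-128k": "moonshot/moonshot-v1-128k",
}

def resolve(alias: str) -> str:
    """Map a public model alias to its backend model string."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias}") from None

print(resolve("kimi-128k"))  # moonshot/moonshot-v1-128k
```

Because clients only ever see the alias, swapping a backend (say, `qwen-max` to a newer snapshot) is a one-line config change with no client redeploys.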
Start the proxy:

```shell
# Set environment variables
export DASHSCOPE_API_KEY="your-dashscope-key"
export MOONSHOT_API_KEY="your-moonshot-key"

# Start the proxy
litellm --config config.yaml --port 4000
```
### Option 3: Docker Deployment

Containerized deployment for scalable production environments:

```shell
# Pull the LiteLLM image
docker pull ghcr.io/berriai/litellm:main-latest

# Run with environment variables
docker run -d \
  -p 4000:4000 \
  -e DASHSCOPE_API_KEY=your-dashscope-key \
  -e MOONSHOT_API_KEY=your-moonshot-key \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --port 4000
```
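For teams already using Compose, the same container can be declared as a service. A sketch assuming a local `config.yaml` and API keys supplied via the shell environment (adjust image tag, ports, and secrets handling to your setup):

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
    environment:
      - DASHSCOPE_API_KEY=${DASHSCOPE_API_KEY}
      - MOONSHOT_API_KEY=${MOONSHOT_API_KEY}
    volumes:
      - ./config.yaml:/app/config.yaml
```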
## Cost Comparison

Approximate pricing as of 2025 (USD per 1M tokens); check each provider's pricing page for current rates:
| Provider | Model | Input | Output |
|---|---|---|---|
| Bedrock | Claude 3 Opus | $15.00 | $75.00 |
| Bedrock | Claude 3 Sonnet | $3.00 | $15.00 |
| Qwen | qwen-turbo | ~$0.50 | ~$1.50 |
| Qwen | qwen-plus | ~$2.00 | ~$6.00 |
| Qwen | qwen-max | ~$4.00 | ~$12.00 |
| Kimi | moonshot-v1-8k | ~$1.00 | ~$3.00 |
| Kimi | moonshot-v1-128k | ~$2.00 | ~$6.00 |
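To make the table concrete, here is the arithmetic for a hypothetical monthly workload of 100M input and 20M output tokens, using the approximate rates above. Actual savings depend heavily on which Claude model you displace and your input/output mix:

```python
# Hypothetical monthly workload: 100M input + 20M output tokens.
# Rates are USD per 1M tokens, copied from the table above.
RATES = {
    "claude-3-opus": (15.00, 75.00),
    "claude-3-sonnet": (3.00, 15.00),
    "qwen-plus": (2.00, 6.00),
    "moonshot-v1-8k": (1.00, 3.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Total USD cost for input_m / output_m million tokens."""
    in_rate, out_rate = RATES[model]
    return input_m * in_rate + output_m * out_rate

opus = monthly_cost("claude-3-opus", 100, 20)
for model in RATES:
    cost = monthly_cost(model, 100, 20)
    print(f"{model:<16} ${cost:>6.0f}  ({1 - cost / opus:.0%} cheaper than Opus)")
```

Against Opus the savings sit at the top of the quoted range; against Sonnet they are smaller, so benchmark against the model you actually run.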
### Potential Cost Savings

Routing suitable tasks to Qwen or Kimi can cut token costs by roughly 60-90% relative to Claude models of comparable quality, with the exact figure depending on the baseline model and workload mix.
## Quick Start Checklist
- Get Alibaba Cloud DashScope API key
- Get Moonshot AI API key
- Add Qwen models to config.yaml
- Add Kimi models to config.yaml
- Set environment variables
- Test Qwen model calls
- Test Kimi model calls
- Verify streaming works
- Check cost tracking
- Configure fallbacks and load balancing
- Enable monitoring
- Set budget limits
- Deploy to production
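For the fallback item above, LiteLLM's router supports per-model fallback lists. One way to express "fail over from Claude to Qwen, then Kimi" in the proxy config (a sketch; the exact key and placement can vary by LiteLLM version, so confirm against the docs for your release):

```yaml
router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  timeout: 600
  # On failure of claude-3-opus, retry the request on qwen-max, then kimi-128k.
  fallbacks:
    - claude-3-opus: ["qwen-max", "kimi-128k"]
```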
## Recommended Approach

Use Qwen for:

- Chinese language tasks
- Code generation
- Cost-sensitive workloads
- High-volume requests

Use Kimi for:

- Long-context tasks (128K)
- Reasoning-heavy workloads
- An alternative to Claude
- Provider diversity
## Migration Strategy

- **Phase 1 (Weeks 1-2):** Add the new models alongside Claude for testing
- **Phase 2 (Weeks 3-4):** A/B test with traffic mirroring
- **Phase 3 (Week 5+):** Configure fallbacks and gradually shift traffic
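Phase 3's gradual shift can be prototyped as a weighted coin flip at the routing layer. This is an illustrative sketch only (`choose_model` is a hypothetical helper, not a LiteLLM API); in production, weighted deployments in the LiteLLM router are the cleaner way to split traffic:

```python
import random

def choose_model(shift_pct: float, rng: random.Random) -> str:
    """Send shift_pct percent of traffic to qwen-max, the rest to Claude."""
    return "qwen-max" if rng.random() * 100 < shift_pct else "claude-3-opus"

# Simulate ramping from 10% to 50% shifted traffic.
rng = random.Random(42)
for pct in (10, 50):
    picks = [choose_model(pct, rng) for _ in range(10_000)]
    share = picks.count("qwen-max") / len(picks)
    print(f"{pct}% target -> {share:.1%} observed")
```

Pair the ramp with the cost tracking and budget limits from the checklist so a regression at any shift percentage is caught before the next increase.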