# Alibaba Qwen & Kimi Models

A complete integration guide for adding Alibaba Cloud's Qwen and Moonshot AI's Kimi models to your existing LiteLLM architecture.
## Excellent News: Full Native Support

LiteLLM has complete native support for both Alibaba Cloud Qwen (via DashScope) and Moonshot AI Kimi models. Since you're already using LiteLLM with Claude from Bedrock, adding these providers is straightforward and requires minimal configuration changes.
## Provider Support Status

### Qwen (Alibaba Cloud DashScope)

- Status: Fully supported (2024-2025)
- Provider prefix: `dashscope`
- OpenAI-compatible API
- International endpoint available

### Kimi (Moonshot AI)

- Status: Fully supported (v1.80.0+)
- Provider prefix: `moonshot`
- OpenAI-compatible API
- K2 reasoning models included
## Available Models

### Qwen Models (DashScope)

Qwen 3 and related latest models (recommended):

```text
dashscope/qwen3-32b
dashscope/qwen3-coder-480b-a35b-instruct
dashscope/qwen3-coder-30b-a3b-instruct
dashscope/qwen3-vl-flash
dashscope/qwen3-omni-30b-a3b-captioner   # Audio captioning
dashscope/qwen2.5-omni-7b                # Multimodal
dashscope/qwq-32b                        # Reasoning model
```

Qwen standard models:

```text
dashscope/qwen-turbo
dashscope/qwen-plus
dashscope/qwen-max
dashscope/qwen-turbo-latest
dashscope/qwen-plus-latest
dashscope/qwen-max-latest
```

### Kimi Models (Moonshot AI)

```text
moonshot/moonshot-v1-8k        # 8K context window
moonshot/moonshot-v1-32k       # 32K context window
moonshot/moonshot-v1-128k      # 128K context window
moonshot/kimi-k2               # Latest K2 model
moonshot/kimi-k2-thinking      # K2 with reasoning
moonshot/kimi-k2-instruct      # K2 instruction-tuned
```
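The three `moonshot-v1` tiers differ only in context window, so tier choice can be mechanical. A small helper sketch (hypothetical, not part of LiteLLM; window sizes approximated from the model names above) picks the smallest tier that fits a prompt:

```python
# Hypothetical helper: pick the smallest (cheapest) moonshot-v1 tier whose
# context window fits the prompt plus a reply budget. Window sizes below are
# approximations taken from the model names, not exact token limits.
MOONSHOT_TIERS = [
    ("moonshot/moonshot-v1-8k", 8_000),
    ("moonshot/moonshot-v1-32k", 32_000),
    ("moonshot/moonshot-v1-128k", 128_000),
]

def pick_kimi_model(prompt_tokens: int, reply_budget: int = 1_000) -> str:
    """Return the smallest moonshot-v1 model that fits prompt + reply."""
    needed = prompt_tokens + reply_budget
    for model, window in MOONSHOT_TIERS:
        if needed <= window:
            return model
    raise ValueError(f"Prompt of {prompt_tokens} tokens exceeds the 128K tier")

print(pick_kimi_model(5_000))    # fits the 8K tier
print(pick_kimi_model(60_000))   # needs the 128K tier
```

In practice you would measure `prompt_tokens` with the provider's tokenizer rather than estimating.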
## Integration Options

### Option 1: Direct SDK Integration (Python)

Quick integration for testing and development.

For Qwen:

```python
import os

from litellm import completion

# Set API key
os.environ['DASHSCOPE_API_KEY'] = "your-dashscope-api-key"

# Optional: international endpoint
os.environ['DASHSCOPE_API_BASE'] = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

# Make a call
response = completion(
    model="dashscope/qwen-turbo",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)

print(response.choices[0].message.content)
```
For Kimi:

```python
import os

from litellm import completion

# Set API key
os.environ['MOONSHOT_API_KEY'] = "your-moonshot-api-key"

# Make a call
response = completion(
    model="moonshot/moonshot-v1-8k",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)

print(response.choices[0].message.content)
```
### Option 2: LiteLLM Proxy Server (Recommended)

Best for production environments with an existing LiteLLM architecture.

Configuration file (`config.yaml`):

```yaml
model_list:
  # Existing Claude/Bedrock models
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_region_name: us-east-1

  # Add Qwen models
  - model_name: qwen-turbo
    litellm_params:
      model: dashscope/qwen-turbo
      api_key: os.environ/DASHSCOPE_API_KEY
      api_base: https://dashscope-intl.aliyuncs.com/compatible-mode/v1

  - model_name: qwen-max
    litellm_params:
      model: dashscope/qwen-max
      api_key: os.environ/DASHSCOPE_API_KEY

  # Add Kimi models
  - model_name: kimi-8k
    litellm_params:
      model: moonshot/moonshot-v1-8k
      api_key: os.environ/MOONSHOT_API_KEY

  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  timeout: 600
```
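A key benefit of this layout is that callers use provider-neutral aliases (`qwen-max`, `kimi-128k`) while the proxy resolves them to backend models. A stdlib-only sketch of that alias lookup (illustrative only; the real resolution happens inside the LiteLLM proxy):

```python
# Illustrative alias table mirroring the config.yaml above -- the actual
# routing lives inside the LiteLLM proxy, not in caller code.
MODEL_ALIASES = {
    "claude-3-opus": "bedrock/anthropic.claude-3-opus-20240229-v1:0",
    "qwen-turbo": "dashscope/qwen-turbo",
    "qwen-max": "dashscope/qwen-max",
    "kimi-8k": "moonshot/moonshot-v1-8k",
    "kimi-128k": "moonshot/moonshot-v1-128k",
}

def resolve(alias: str) -> str:
    """Map a public model alias to its backend model string."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias}") from None

print(resolve("kimi-128k"))  # moonshot/moonshot-v1-128k
```

Because clients only ever see the alias, swapping a backend (say, `qwen-max` to a newer snapshot) is a one-line config change with no client redeploys.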
Start the proxy:

```shell
# Set environment variables
export DASHSCOPE_API_KEY="your-dashscope-key"
export MOONSHOT_API_KEY="your-moonshot-key"

# Start the proxy
litellm --config config.yaml --port 4000
```
### Option 3: Docker Deployment

Containerized deployment for scalable production environments:

```shell
# Pull the LiteLLM image
docker pull ghcr.io/berriai/litellm:main-latest

# Run with environment variables
docker run -d \
  -p 4000:4000 \
  -e DASHSCOPE_API_KEY=your-dashscope-key \
  -e MOONSHOT_API_KEY=your-moonshot-key \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --port 4000
```
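For teams already using Compose, the same container can be declared as a service. A sketch assuming a local `config.yaml` and API keys supplied via the shell environment (adjust image tag, ports, and secrets handling to your setup):

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
    environment:
      - DASHSCOPE_API_KEY=${DASHSCOPE_API_KEY}
      - MOONSHOT_API_KEY=${MOONSHOT_API_KEY}
    volumes:
      - ./config.yaml:/app/config.yaml
```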
## Cost Comparison

Approximate pricing as of 2025 (USD per 1M tokens); check each provider's pricing page for current rates:
| Provider | Model | Input | Output |
|---|---|---|---|
| Bedrock | Claude 3 Opus | $15.00 | $75.00 |
| Bedrock | Claude 3 Sonnet | $3.00 | $15.00 |
| Qwen | qwen-turbo | ~$0.50 | ~$1.50 |
| Qwen | qwen-plus | ~$2.00 | ~$6.00 |
| Qwen | qwen-max | ~$4.00 | ~$12.00 |
| Kimi | moonshot-v1-8k | ~$1.00 | ~$3.00 |
| Kimi | moonshot-v1-128k | ~$2.00 | ~$6.00 |
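To make the table concrete, here is the arithmetic for a hypothetical monthly workload of 100M input and 20M output tokens, using the approximate rates above. Actual savings depend heavily on which Claude model you displace and your input/output mix:

```python
# Hypothetical monthly workload: 100M input + 20M output tokens.
# Rates are USD per 1M tokens, copied from the table above.
RATES = {
    "claude-3-opus": (15.00, 75.00),
    "claude-3-sonnet": (3.00, 15.00),
    "qwen-plus": (2.00, 6.00),
    "moonshot-v1-8k": (1.00, 3.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Total USD cost for input_m / output_m million tokens."""
    in_rate, out_rate = RATES[model]
    return input_m * in_rate + output_m * out_rate

opus = monthly_cost("claude-3-opus", 100, 20)
for model in RATES:
    cost = monthly_cost(model, 100, 20)
    print(f"{model:<16} ${cost:>6.0f}  ({1 - cost / opus:.0%} cheaper than Opus)")
```

Against Opus the savings sit at the top of the quoted range; against Sonnet they are smaller, so benchmark against the model you actually run.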
### Potential Cost Savings

Routing suitable tasks to Qwen or Kimi can cut token costs by roughly 60-90% relative to Claude models of comparable quality, with the exact figure depending on the baseline model and workload mix.
## Quick Start Checklist
- Get Alibaba Cloud DashScope API key
- Get Moonshot AI API key
- Add Qwen models to config.yaml
- Add Kimi models to config.yaml
- Set environment variables
- Test Qwen model calls
- Test Kimi model calls
- Verify streaming works
- Check cost tracking
- Configure fallbacks and load balancing
- Enable monitoring
- Set budget limits
- Deploy to production
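For the fallback item above, LiteLLM's router supports per-model fallback lists. One way to express "fail over from Claude to Qwen, then Kimi" in the proxy config (a sketch; the exact key and placement can vary by LiteLLM version, so confirm against the docs for your release):

```yaml
router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  timeout: 600
  # On failure of claude-3-opus, retry the request on qwen-max, then kimi-128k.
  fallbacks:
    - claude-3-opus: ["qwen-max", "kimi-128k"]
```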
## Recommended Approach

Use Qwen for:

- Chinese language tasks
- Code generation
- Cost-sensitive workloads
- High-volume requests

Use Kimi for:

- Long-context tasks (128K)
- Reasoning-heavy workloads
- An alternative to Claude
- Provider diversity
## Migration Strategy

- **Phase 1 (Weeks 1-2):** Add the new models alongside Claude for testing
- **Phase 2 (Weeks 3-4):** A/B test with traffic mirroring
- **Phase 3 (Week 5+):** Configure fallbacks and gradually shift traffic
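Phase 3's gradual shift can be prototyped as a weighted coin flip at the routing layer. This is an illustrative sketch only (`choose_model` is a hypothetical helper, not a LiteLLM API); in production, weighted deployments in the LiteLLM router are the cleaner way to split traffic:

```python
import random

def choose_model(shift_pct: float, rng: random.Random) -> str:
    """Send shift_pct percent of traffic to qwen-max, the rest to Claude."""
    return "qwen-max" if rng.random() * 100 < shift_pct else "claude-3-opus"

# Simulate ramping from 10% to 50% shifted traffic.
rng = random.Random(42)
for pct in (10, 50):
    picks = [choose_model(pct, rng) for _ in range(10_000)]
    share = picks.count("qwen-max") / len(picks)
    print(f"{pct}% target -> {share:.1%} observed")
```

Pair the ramp with the cost tracking and budget limits from the checklist so a regression at any shift percentage is caught before the next increase.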