Alibaba Qwen & Kimi Models

Complete integration guide for adding Alibaba Cloud's Qwen and Moonshot AI's Kimi models to your existing LiteLLM architecture

Excellent News: Full Native Support!

LiteLLM has complete native support for both Alibaba Cloud Qwen (via DashScope) and Moonshot AI Kimi models. Since you're already running LiteLLM with Claude on Bedrock, adding these providers requires only minimal configuration changes.

Provider Support Status

Alibaba Cloud Qwen
  • Status: Fully supported (2024-2025)
  • Provider: dashscope
  • OpenAI-compatible API
  • International endpoint available
Moonshot AI Kimi
  • Status: Fully supported (v1.80.0+)
  • Provider: moonshot
  • OpenAI-compatible API
  • K2 reasoning models included

Available Models

Qwen Models (DashScope)

Qwen 3 Series (Latest - Recommended)

dashscope/qwen3-32b
dashscope/qwen3-coder-480b-a35b-instruct
dashscope/qwen3-coder-30b-a3b-instruct
dashscope/qwen3-vl-flash
dashscope/qwen3-omni-30b-a3b-captioner  # Audio captioning
dashscope/qwen2.5-omni-7b               # Multimodal
dashscope/qwq-32b                       # Reasoning model

Qwen Standard Models

dashscope/qwen-turbo
dashscope/qwen-plus
dashscope/qwen-max
dashscope/qwen-turbo-latest
dashscope/qwen-plus-latest
dashscope/qwen-max-latest

Kimi Models (Moonshot AI)

moonshot/moonshot-v1-8k      # 8K context window
moonshot/moonshot-v1-32k     # 32K context window
moonshot/moonshot-v1-128k    # 128K context window
moonshot/kimi-k2             # Latest K2 model
moonshot/kimi-k2-thinking    # K2 with reasoning
moonshot/kimi-k2-instruct    # K2 instruction-tuned

Integration Options

Option 1: Direct SDK Integration (Python)

Quick integration for testing and development:

For Qwen:

import os
from litellm import completion

# Set API key
os.environ['DASHSCOPE_API_KEY'] = "your-dashscope-api-key"

# Optional: International endpoint
os.environ['DASHSCOPE_API_BASE'] = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

# Make a call
response = completion(
    model="dashscope/qwen-turbo",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}]
)
print(response.choices[0].message.content)

For Kimi:

import os
from litellm import completion

# Set API key
os.environ['MOONSHOT_API_KEY'] = "your-moonshot-api-key"

# Make a call
response = completion(
    model="moonshot/moonshot-v1-8k",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}]
)
print(response.choices[0].message.content)

Option 2: LiteLLM Proxy Server (Recommended)

Best for production environments with existing LiteLLM architecture:

Configuration File (config.yaml):

model_list:
  # Existing Claude/Bedrock models
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_region_name: us-east-1
  
  # Add Qwen models
  - model_name: qwen-turbo
    litellm_params:
      model: dashscope/qwen-turbo
      api_key: os.environ/DASHSCOPE_API_KEY
      api_base: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  
  - model_name: qwen-max
    litellm_params:
      model: dashscope/qwen-max
      api_key: os.environ/DASHSCOPE_API_KEY
  
  # Add Kimi models
  - model_name: kimi-8k
    litellm_params:
      model: moonshot/moonshot-v1-8k
      api_key: os.environ/MOONSHOT_API_KEY
  
  - model_name: kimi-128k
    litellm_params:
      model: moonshot/moonshot-v1-128k
      api_key: os.environ/MOONSHOT_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  timeout: 600

Start the Proxy:

# Set environment variables
export DASHSCOPE_API_KEY="your-dashscope-key"
export MOONSHOT_API_KEY="your-moonshot-key"

# Start proxy
litellm --config config.yaml --port 4000
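
Once the proxy is up, any OpenAI-compatible client can call the aliased models. A minimal stdlib-only sketch (the `qwen-turbo` alias comes from the config above; the `sk-anything` bearer token is a placeholder, so substitute your proxy's master key if you have one configured):

```python
import json
import urllib.request

def build_chat_request(model, content, base_url="http://localhost:4000"):
    """Build an OpenAI-style chat completion request for the LiteLLM proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-anything",  # proxy key, if configured
        },
    )

if __name__ == "__main__":
    req = build_chat_request("qwen-turbo", "Hello from the proxy")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the proxy speaks the OpenAI wire format, the official `openai` SDK pointed at `http://localhost:4000` works the same way.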

Option 3: Docker Deployment

Containerized deployment for scalable production environments:

# Pull LiteLLM image
docker pull ghcr.io/berriai/litellm:main-latest

# Run with environment variables
docker run -d \
  -p 4000:4000 \
  -e DASHSCOPE_API_KEY=your-dashscope-key \
  -e MOONSHOT_API_KEY=your-moonshot-key \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --port 4000

Cost Comparison

Approximate pricing as of 2025 (per 1M tokens):

Provider   Model               Input     Output
Bedrock    Claude 3 Opus       $15.00    $75.00
Bedrock    Claude 3 Sonnet      $3.00    $15.00
Qwen       qwen-turbo          ~$0.50    ~$1.50
Qwen       qwen-plus           ~$2.00    ~$6.00
Qwen       qwen-max            ~$4.00   ~$12.00
Kimi       moonshot-v1-8k      ~$1.00    ~$3.00
Kimi       moonshot-v1-128k    ~$2.00    ~$6.00

Potential Cost Savings

60-90% per-token cost reduction compared to Claude models for similar-quality tasks (actual savings depend on your workload mix)
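
As a sanity check on that range, here's a back-of-the-envelope calculation using the approximate prices from the table above (illustrative figures, not quoted rates):

```python
# (input, output) in USD per 1M tokens, approximate 2025 figures from the table above
PRICES = {
    "claude-3-sonnet": (3.00, 15.00),
    "qwen-turbo": (0.50, 1.50),
    "moonshot-v1-8k": (1.00, 3.00),
}

def monthly_cost(model, input_tokens, output_tokens):
    """Cost in USD for the given token volumes (raw token counts, not millions)."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example workload: 100M input + 20M output tokens per month
claude = monthly_cost("claude-3-sonnet", 100e6, 20e6)  # $600
qwen = monthly_cost("qwen-turbo", 100e6, 20e6)         # $80
print(f"Claude Sonnet: ${claude:,.0f}  Qwen Turbo: ${qwen:,.0f}  "
      f"savings: {1 - qwen / claude:.0%}")
```

For this workload, Qwen Turbo comes in roughly 87% cheaper than Claude 3 Sonnet, consistent with the 60-90% range.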

Quick Start Checklist

  • Get Alibaba Cloud DashScope API key
  • Get Moonshot AI API key
  • Add Qwen models to config.yaml
  • Add Kimi models to config.yaml
  • Set environment variables
  • Test Qwen model calls
  • Test Kimi model calls
  • Verify streaming works
  • Check cost tracking
  • Configure fallbacks and load balancing
  • Enable monitoring
  • Set budget limits
  • Deploy to production

Recommended Approach

Use Qwen For
  • Chinese language tasks
  • Code generation
  • Cost-sensitive workloads
  • High-volume requests
Use Kimi For
  • Long-context tasks (128K)
  • Reasoning-heavy workloads
  • Alternative to Claude
  • Provider diversity

Migration Strategy

Phase 1 (Weeks 1-2): Add the new models alongside Claude for testing
Phase 2 (Weeks 3-4): A/B test with traffic mirroring
Phase 3 (Week 5+): Configure fallbacks and gradually shift traffic
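
For Phase 3, LiteLLM's router supports declarative fallbacks in config.yaml. A sketch extending the router_settings from Option 2 (the fallback syntax follows LiteLLM's fallback documentation; verify it against your installed version):

```yaml
router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  timeout: 600
  # If a Qwen/Kimi call fails after retries, replay the request on Claude
  fallbacks:
    - qwen-turbo: ["claude-3-opus"]
    - kimi-128k: ["claude-3-opus"]
```

Keys are the model_name aliases from model_list; each maps to an ordered list of models to try next.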

Resources

Official Documentation
Community & Support