Distributed Deployment of DeepSeek-V3 (671B) on H200 Clusters: Cloud Inference for Billion-Parameter Models
HPC-AI delivers stable, efficient, and high-performance computing resources. Quickly deploy and publish dedicated large model services on our cloud platform.
Step 0: Prepare Your Model
Run custom models by uploading them to your instance or shared storage. [How to upload data?]
Example: Mounting DeepSeek-V3 via Shared Storage
- Create shared storage (up to 1000GB) for multi-machine data access. [Custom storage capacity?]
- Launch an instance in the same region as your shared storage and mount it during configuration.
- Access the storage directory via Jupyter/SSH and download the model using Hugging Face CLI:
pip install huggingface_hub
huggingface-cli download --resume-download deepseek-ai/DeepSeek-V3 --local-dir ./models/DeepSeek-V3
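Before moving on, it can help to confirm that the download actually finished, since an interrupted transfer of a checkpoint this size is easy to miss. The following is a minimal sketch, not part of the platform or of huggingface_hub: `check_model_dir` is a hypothetical helper, and it assumes the checkpoint consists of a `config.json` plus `*.safetensors` shards directly under the `--local-dir` path.

```shell
# Hypothetical helper (not part of huggingface_hub): sanity-check a model directory.
# Assumes the checkpoint is a config.json plus *.safetensors shards in one directory.
check_model_dir() {
  dir="$1"
  if [ ! -f "$dir/config.json" ]; then
    echo "missing config.json in $dir"
    return 1
  fi
  # Count weight shards; tr strips the padding some wc implementations add.
  shards=$(find "$dir" -maxdepth 1 -name '*.safetensors' | wc -l | tr -d ' ')
  if [ "$shards" -eq 0 ]; then
    echo "no .safetensors shards in $dir"
    return 1
  fi
  echo "found $shards safetensors shard(s) in $dir"
}
```

For example, `check_model_dir ./models/DeepSeek-V3 && du -sh ./models/DeepSeek-V3` reports the shard count and the total size on disk. The check does not verify file integrity; it only catches a missing or empty download.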
Step 1: Configure Your Instance
- Launch two H200 x 8 instances (8 GPUs each, 16 in total) in the same region as your shared storage, and mount the storage on both.
- Configure the environment to leverage high-performance cluster features (InfiniBand/IBGDA):
export NCCL_IB_GID_INDEX=3
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_DISABLE=0
# Optional debugging:
# export CUDA_LAUNCH_BLOCKING=1
# export NCCL_DEBUG=INFO
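The exports above only apply to the shell in which they are run, so they need to be set on both nodes, in the same session that later launches the server. As a sketch of how that could be checked, the hypothetical helper below (`check_nccl_env` is not an NCCL or SGLang tool) reports any of the three variables that are still unset. It does not validate the values; in particular, `NCCL_SOCKET_IFNAME` should match the interface name that actually exists on your instance.

```shell
# Hypothetical helper: report which of the NCCL settings above are unset in this shell.
check_nccl_env() {
  missing=""
  for var in NCCL_IB_GID_INDEX NCCL_SOCKET_IFNAME NCCL_IB_DISABLE; do
    # Indirect lookup of the variable named by $var (POSIX-compatible).
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "unset:$missing"
    return 1
  fi
  echo "NCCL environment OK"
}
```

Running `check_nccl_env` on each node right before the launch command is a cheap way to catch a forgotten export after reconnecting over SSH.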
Step 2: Launch Distributed Inference with SGLang
Use SGLang for distributed inference across nodes.
Environment Setup
pip install "sglang[all]>=0.4.8"
sudo apt update && sudo apt install -y libnuma1
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
Multi-Node Launch
Install network tools and get the master node's IP:
sudo apt-get install -y net-tools
ifconfig # Note the eth0 inet address
Run these commands on their respective nodes. Both use --tp 16 (2 nodes x 8 GPUs) and the same --dist-init-addr; only --node-rank differs:
# Master Node (Node 0)
python3 -m sglang.launch_server --model-path <your_model_path> --tp 16 \
--dist-init-addr <MASTER_IP>:50000 --nnodes 2 --node-rank 0 \
--port 30000 --trust-remote-code
# Worker Node (Node 1)
python3 -m sglang.launch_server --model-path <your_model_path> --tp 16 \
--dist-init-addr <MASTER_IP>:50000 --nnodes 2 --node-rank 1 \
--port 30000 --trust-remote-code
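Because the two commands differ only in `--node-rank`, they can be generated from one place, and the master node can then be polled until the model has finished loading. The sketch below illustrates both ideas under stated assumptions: `launch_cmd` and `wait_for_server` are hypothetical helpers, the ports 50000 and 30000 are the ones used above, and the readiness check assumes the server answers on a `/health` route, which you should verify for your SGLang version.

```shell
# Hypothetical helper: print the launch command for one node.
# Arguments: model path, master IP, node rank, node count (default 2), GPUs per node (default 8).
launch_cmd() {
  model_path="$1"; master_ip="$2"; node_rank="$3"
  nnodes="${4:-2}"; gpus_per_node="${5:-8}"
  # Tensor parallelism spans every GPU in the cluster: 2 nodes x 8 GPUs = 16.
  tp=$((nnodes * gpus_per_node))
  echo "python3 -m sglang.launch_server --model-path $model_path --tp $tp" \
       "--dist-init-addr $master_ip:50000 --nnodes $nnodes --node-rank $node_rank" \
       "--port 30000 --trust-remote-code"
}

# Hypothetical helper: poll a URL until it answers successfully or the attempts run out.
# Assumption: the server exposes a /health route on its HTTP port.
wait_for_server() {
  url="$1"; attempts="${2:-120}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors, so only a successful response counts as ready.
    if curl -sf -o /dev/null "$url"; then
      echo "server ready"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "server not ready after $attempts attempts"
  return 1
}
```

For instance, `launch_cmd <your_model_path> <MASTER_IP> 1` prints the worker command shown above, and `wait_for_server http://localhost:30000/health` on the master node blocks until the server responds or roughly 20 minutes have passed with the default settings.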
Step 3: Publish, Manage & Monitor Your Service
- Publish your service by configuring instance startup options and HTTP ports.
- Test the service:
curl -s http://[HttpPortsAddress]/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "DeepSeek-V3",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 50
}'
- Terminate the service:
ps aux | grep sglang.launch_server
kill -9 <PID>
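`kill -9` sends SIGKILL, which gives the process no chance to clean up and may leave child processes behind. A gentler variant is to send SIGTERM first and escalate only if the process is still alive after a grace period. The helper below is a sketch of that pattern; `stop_gracefully` is a hypothetical name, the 30-second default is an arbitrary choice, and whether the server shuts down cleanly on SIGTERM is an assumption to verify in your environment.

```shell
# Hypothetical helper: ask a process to exit, and force-kill only if it ignores the request.
# Arguments: PID, grace period in seconds (default 30).
stop_gracefully() {
  pid="$1"; timeout="${2:-30}"
  kill -TERM "$pid" 2>/dev/null || { echo "no such process: $pid"; return 1; }
  waited=0
  # kill -0 sends no signal; it only tests whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      echo "force-killed $pid"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "stopped $pid"
}
```

Call it with the PID found via `ps aux | grep sglang.launch_server`, for example `stop_gracefully <PID> 60`, and repeat on each node, since every node runs its own server process.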