March 25, 2024

314-Billion-Parameter Grok-1 Inference Accelerated by 3.8x: Efficient and Easy-to-Use PyTorch+HuggingFace Version Is Here!


Grok-1, the 314-billion-parameter Mixture of Experts (MoE) model open-sourced by Elon Musk's xAI, is the largest open-source large language model to date, and its license permits modifying the model and freely distributing and commercializing those changes.

Since its release, Grok-1 has attracted wide attention in the open-source community and reached No. 1 worldwide on GitHub Trending.

However, Grok-1 is built with Rust+JAX, which presents a high barrier to entry for developers accustomed to mainstream software ecosystems such as Python+PyTorch+HuggingFace.

The Colossal-AI team followed up immediately, providing an easy-to-use Python+PyTorch+HuggingFace version of Grok-1 for all AI developers.

HuggingFace Download: https://huggingface.co/hpcai-tech/grok-1

Performance Optimization

Drawing on its accumulated expertise in system optimization for large AI models, Colossal-AI quickly added tensor parallelism support for Grok-1.
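Tensor parallelism splits the weight matrices inside each layer across GPUs, so that every GPU works on every layer at the same time. The sketch below illustrates the general idea with the column-parallel/row-parallel MLP split popularized by Megatron-LM, using plain Python lists as stand-ins for GPUs. All function names are hypothetical, the matrices are toy values, and this is not how Colossal-AI implements tensor parallelism for Grok-1.

```python
# Minimal, framework-free sketch of Megatron-style tensor parallelism for a
# two-layer MLP, y = relu(x @ A) @ B, with devices simulated as list entries.
# Illustrative only: this is not Colossal-AI's implementation.

def matmul(x, w):
    """Multiply row vector x (length k) by a k x n matrix w (list of rows)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def relu(v):
    return [max(0.0, e) for e in v]

def mlp(x, a, b):
    """Reference single-device computation."""
    return matmul(relu(matmul(x, a)), b)

def split_columns(w, parts):
    """Column-wise shards of w (assumes the column count divides evenly)."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def split_rows(w, parts):
    """Row-wise shards of w (assumes the row count divides evenly)."""
    step = len(w) // parts
    return [w[p * step:(p + 1) * step] for p in range(parts)]

def tensor_parallel_mlp(x, a, b, world_size):
    # Column-parallel layer 1: each rank owns a slice of A's columns and
    # computes the matching slice of the hidden activation. ReLU is
    # element-wise, so it can be applied to each slice independently.
    a_shards = split_columns(a, world_size)
    hidden = [relu(matmul(x, a_shards[r])) for r in range(world_size)]
    # Row-parallel layer 2: each rank owns the rows of B that line up with its
    # hidden slice, so no communication is needed between the two layers.
    b_shards = split_rows(b, world_size)
    partials = [matmul(hidden[r], b_shards[r]) for r in range(world_size)]
    # A single all-reduce (here, an element-wise sum) produces the full output.
    return [sum(vals) for vals in zip(*partials)]

x = [1.0, 2.0]
a = [[1.0, -1.0, 0.5, 2.0],
     [0.0, 1.0, -2.0, 1.0]]  # 2 x 4
b = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0],
     [1.0, 1.0]]             # 4 x 2
print(mlp(x, a, b))                                # [7.0, 5.0]
print(tensor_parallel_mlp(x, a, b, world_size=2))  # [7.0, 5.0]
```

The sharded result matches the single-device result because each rank only ever needs its own slice of the hidden activation; the one synchronization point per MLP is the final sum.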

On an 8 × H800 80GB server, inference latency is reduced by a factor of nearly 4 (3.8x) compared with methods such as JAX and HuggingFace's auto device map.
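One likely contributor to this gap is how an auto device map places a model that is too large for a single GPU: layers are assigned to GPUs in order, so a forward pass moves through the GPUs one after another. The simplified sketch below assumes a fixed memory size per layer and a fixed memory budget per GPU. `sequential_device_map` is a hypothetical helper, not HuggingFace Accelerate's actual placement algorithm, and the sizes are toy values rather than Grok-1's real layer sizes.

```python
# Simplified sketch of sequential ("auto device map"-style) layer placement.
# Assumption: each layer has a known size and each GPU a fixed budget (GB).
# This is NOT the actual algorithm used by HuggingFace Accelerate.

def sequential_device_map(layer_sizes_gb, budget_gb, num_devices):
    """Fill devices in order, moving to the next one when the current is full."""
    device_map, device, used = {}, 0, 0.0
    for layer, size in enumerate(layer_sizes_gb):
        if size > budget_gb:
            raise MemoryError(f"layer {layer} is larger than one device's budget")
        if used + size > budget_gb:
            device, used = device + 1, 0.0  # current device is full; move on
        if device >= num_devices:
            raise MemoryError("model does not fit on the available devices")
        device_map[layer] = device
        used += size
    return device_map

layer_sizes = [10.0] * 8  # toy sizes in GB
placement = sequential_device_map(layer_sizes, budget_gb=25.0, num_devices=4)
print(placement)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
```

Because layer 2 cannot start until layers 0 and 1 have finished on device 0, only one GPU is busy at any moment for a single request. With tensor parallelism, every layer is sharded across all GPUs so they compute together, at the cost of some inter-GPU communication per layer.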

Tutorial

After downloading and installing Colossal-AI, simply run the inference script:

./run_inference_fast.sh hpcaitech/grok-1

Model weights are downloaded and loaded automatically, and the inference results are aligned with those of the original implementation. The following figure shows a test of Grok-1 using greedy search.
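Greedy search selects the highest-scoring token at every decoding step, with no sampling, so for a fixed model and prompt its output is deterministic; this makes it a convenient setting for checking that two implementations produce the same results. Below is a minimal sketch, assuming a hypothetical `toy_logits` function in place of a real Grok-1 forward pass; `greedy_search` is likewise an illustrative name, not a Colossal-AI or HuggingFace API.

```python
# Minimal sketch of greedy search (greedy decoding).
# `toy_logits` is a hypothetical stand-in for a real model forward pass.

def greedy_search(next_token_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the single highest-scoring next token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Greedy: take the argmax; no sampling, so the output is deterministic.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break  # stop once the end-of-sequence token is produced
    return ids

def toy_logits(ids):
    # Hypothetical 5-token "model" that always prefers the token after the last one.
    vocab_size = 5
    favorite = (ids[-1] + 1) % vocab_size
    return [1.0 if t == favorite else 0.0 for t in range(vocab_size)]

print(greedy_search(toy_logits, prompt_ids=[0], max_new_tokens=6, eos_id=4))  # [0, 1, 2, 3, 4]
```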

More details can be found in:

https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/grok-1

In the near future, Colossal-AI will introduce further optimizations for Grok-1, such as parallel acceleration and quantization to reduce costs. Stay tuned!

Colossal-AI open source address: https://github.com/hpcaitech/ColossalAI
