# Deploy CodeGen Application on Intel Gaudi HPU with Docker Compose

This README provides instructions for deploying the CodeGen application using Docker Compose on systems equipped with Intel Gaudi HPUs.

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start Deployment](#quick-start-deployment)
- [Available Deployment Options](#available-deployment-options)
- [Configuration Parameters](#configuration-parameters)
- [Building Custom Images (Optional)](#building-custom-images-optional)
- [Validate Services](#validate-services)
- [Accessing the User Interface (UI)](#accessing-the-user-interface-ui)
- [Troubleshooting](#troubleshooting)
- [Stopping the Application](#stopping-the-application)
- [Next Steps](#next-steps)

## Overview

This guide focuses on running the pre-configured CodeGen service using Docker Compose on Intel Gaudi HPUs. It leverages containers optimized for Gaudi for the LLM serving component (vLLM or TGI), along with CPU-based containers for other microservices like embedding, retrieval, data preparation, the main gateway, and the UI.

## Prerequisites

- Docker and Docker Compose installed.
- Intel Gaudi HPU(s) with the necessary drivers and software stack installed on the host system (refer to the [Intel Gaudi Documentation](https://docs.habana.ai/en/latest/)); a quick way to verify the setup is sketched after this list.
- Git installed (for cloning the repository).
- Hugging Face Hub API token (for downloading models).
- Access to the internet (or a private model cache).
- Clone the `GenAIExamples` repository:

  ```bash
  git clone https://github.com/opea-project/GenAIExamples.git
  cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
  ```

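Optionally, before deploying, you can confirm that the Gaudi devices and the Habana container runtime are visible on the host. This is a minimal sketch; `hl-smi` ships with the Gaudi driver stack, and the runtime name reported by Docker may differ depending on how it was installed:

```bash
# List Gaudi devices and the installed driver/firmware versions
hl-smi

# Check that Docker reports the Habana runtime (name may vary by installation)
docker info | grep -i runtime
```
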
## Quick Start Deployment

This uses the default vLLM-based deployment profile (`codegen-gaudi-vllm`).

1. **Configure Environment:**
   Set the required environment variables in your shell:

   ```bash
   # Replace with your host's external IP address (do not use localhost or 127.0.0.1)
   export HOST_IP="your_external_ip_address"
   # Replace with your Hugging Face Hub API token
   export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

   # Optional: Configure proxy if needed
   # export http_proxy="your_http_proxy"
   # export https_proxy="your_https_proxy"
   # export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
   source ../../set_env.sh
   ```

   _Note: The compose file may read additional variables from `set_env.sh`. Ensure all required variables, such as the ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.), are set if you are not using the defaults from the compose file._

   For instance, the LLM can be changed by editing `set_env.sh`: the default

   ```
   export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
   ```

   can be changed to another model if needed, for example:

   ```
   export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
   ```

2. **Start Services (vLLM Profile):**

   ```bash
   docker compose --profile codegen-gaudi-vllm up -d
   ```

3. **Validate:**
   Wait several minutes for models to download and services to initialize (Gaudi initialization can take time). Check the container logs (`docker compose logs -f <service_name>`, especially `codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server`), or proceed to the validation steps below. A readiness check is sketched after this list.

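The exact log line that signals readiness depends on the serving backend and version; as a rough sketch (assuming the typical vLLM/uvicorn startup message), you can follow the logs until the OpenAI-compatible API reports it is up:

```bash
# Block until the Gaudi vLLM server logs its startup message
# ("Application startup complete" is typical for vLLM but may vary by version)
docker compose --profile codegen-gaudi-vllm logs -f codegen-vllm-gaudi-server 2>&1 \
  | grep -m1 "Application startup complete"
```
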
## Available Deployment Options

The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend accelerated on Gaudi.

### Default: vLLM-based Deployment (`--profile codegen-gaudi-vllm`)

- **Profile:** `codegen-gaudi-vllm`
- **Description:** Uses vLLM optimized for Intel Gaudi HPUs as the LLM serving engine. This is the default profile used in the Quick Start.
- **Gaudi Service:** `codegen-vllm-gaudi-server`
- **Other Services:** `codegen-llm-server`, `codegen-tei-embedding-server` (CPU), `codegen-retriever-server` (CPU), `redis-vector-db` (CPU), `codegen-dataprep-server` (CPU), `codegen-backend-server` (CPU), `codegen-gradio-ui-server` (CPU).

### TGI-based Deployment (`--profile codegen-gaudi-tgi`)

- **Profile:** `codegen-gaudi-tgi`
- **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel Gaudi HPUs as the LLM serving engine.
- **Gaudi Service:** `codegen-tgi-gaudi-server`
- **Other Services:** Same CPU-based services as the vLLM profile.
- **To Run:**

  ```bash
  # Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
  docker compose --profile codegen-gaudi-tgi up -d
  ```

## Configuration Parameters

### Environment Variables

Key parameters are configured via environment variables set before running `docker compose up`.

| Environment Variable                      | Description                                                                                                              | Default (Set Externally)                                    |
| :---------------------------------------- | :----------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------- |
| `HOST_IP`                                 | External IP address of the host machine. **Required.**                                                                   | `your_external_ip_address`                                   |
| `HUGGINGFACEHUB_API_TOKEN`                | Your Hugging Face Hub token for model access. **Required.**                                                              | `your_huggingface_token`                                     |
| `LLM_MODEL_ID`                            | Hugging Face model ID for the CodeGen LLM (used by the TGI/vLLM service). Configured within `compose.yaml` environment.  | `Qwen/Qwen2.5-Coder-7B-Instruct`                             |
| `EMBEDDING_MODEL_ID`                      | Hugging Face model ID for the embedding model (used by the TEI service). Configured within `compose.yaml` environment.   | `BAAI/bge-base-en-v1.5`                                      |
| `LLM_ENDPOINT`                            | Internal URL for the LLM serving endpoint (used by `llm-codegen-vllm-server`). Configured in `compose.yaml`.             | `http://codegen-vllm\|tgi-server:9000/v1/chat/completions`   |
| `TEI_EMBEDDING_ENDPOINT`                  | Internal URL for the Embedding service. Configured in `compose.yaml`.                                                    | `http://codegen-tei-embedding-server:80/embed`               |
| `DATAPREP_ENDPOINT`                       | Internal URL for the Data Preparation service. Configured in `compose.yaml`.                                             | `http://codegen-dataprep-server:80/dataprep`                 |
| `BACKEND_SERVICE_ENDPOINT`                | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`.                              | `http://${HOST_IP}:7778/v1/codegen`                          |
| `*_PORT` (Internal)                       | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`.                                                | N/A                                                          |
| `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required).                                                                                     | `""`                                                         |

Most of these parameters are set in `set_env.sh`; you can either modify that file or override the variables by exporting them in your shell:

```shell
source CodeGen/docker_compose/set_env.sh
```

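One way to override individual variables without editing the file is to export them after sourcing it; the values exported last are the ones Docker Compose sees. A minimal sketch (the model shown is just an example):

```bash
# Load the defaults, then override selected variables for this shell session
source CodeGen/docker_compose/set_env.sh
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
```
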
### Compose Profiles

Docker Compose profiles (`codegen-gaudi-vllm`, `codegen-gaudi-tgi`) select the Gaudi-accelerated LLM serving backend (vLLM or TGI). CPU-based services run under both profiles.

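To check which services a given profile would start without launching anything, `docker compose config` can be used; for example:

```bash
# List the services enabled by each profile (no containers are started)
docker compose --profile codegen-gaudi-vllm config --services
docker compose --profile codegen-gaudi-tgi config --services
```
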
### Docker Compose Gaudi Configuration

The `compose.yaml` file includes specific configurations for the Gaudi services (`codegen-vllm-gaudi-server`, `codegen-tgi-gaudi-server`):

```yaml
# Example snippet for codegen-vllm-gaudi-server
runtime: habana # Specifies the Habana runtime for Docker
volumes:
  - /dev/vfio:/dev/vfio # Mount necessary device files
cap_add:
  - SYS_NICE # Add capabilities needed by Gaudi drivers/runtime
ipc: host # Use host IPC namespace
environment:
  HABANA_VISIBLE_DEVICES: all # Make all Gaudi devices visible
  # Other model/service specific env vars
```

This setup grants the container access to Gaudi devices. Ensure the host system has the Habana Docker runtime correctly installed and configured.

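If Docker does not recognize the `habana` runtime, it usually needs to be registered in the Docker daemon configuration. A minimal sketch, assuming the runtime binary is installed at `/usr/bin/habana-container-runtime` (check your installation and the Intel Gaudi documentation for the authoritative path and procedure):

```bash
# WARNING: this overwrites any existing /etc/docker/daemon.json; merge by hand if you already have one
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "habana": {
      "path": "/usr/bin/habana-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker
```
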
## Building Custom Images (Optional)

If you need to modify microservices:

1. **For Gaudi Services (TGI/vLLM):** Refer to the specific build instructions for TGI-Gaudi or vLLM-Gaudi within [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) or their respective upstream projects. Building Gaudi-optimized images often requires a specific build environment.
2. **For CPU Services:** Follow the instructions in the `GenAIComps` component directories (e.g., `comps/codegen`, `comps/ui/gradio`). Use the provided Dockerfiles.
3. Tag your custom images.
4. Update the `image:` fields in the `compose.yaml` file (see the sketch after this list).

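As a rough sketch of steps 3 and 4 above (the image name, tag, and Dockerfile path are placeholders; use the Dockerfile of the component you actually modified):

```bash
# Build and tag a custom image for one component (names and paths are illustrative)
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t my-registry/codegen-component:custom -f <path/to/component/Dockerfile> .

# Then point the corresponding service in compose.yaml at the new tag:
#   image: my-registry/codegen-component:custom
```
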
## Validate Services

### Check Container Status

Ensure all containers are running, especially the Gaudi-accelerated LLM service:

```bash
docker compose --profile <profile_name> ps
# Example: docker compose --profile codegen-gaudi-vllm ps
```

Check the logs with `docker compose logs <service_name>`. Pay attention to the `codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server` logs for initialization status and errors.

### Run Validation Script/Commands

Use `curl` commands targeting the main service endpoints. Ensure `HOST_IP` is correctly set.

1. **Validate the LLM Serving Endpoint (vLLM example, using port 9000; adjust if your `compose.yaml` maps the port differently):**

   ```bash
   # This command structure targets the OpenAI-compatible vLLM endpoint
   curl http://${HOST_IP}:9000/v1/chat/completions \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
   ```

   _(If you deployed the TGI profile instead, see the sketch after this list.)_

2. **Validate the CodeGen Gateway (MegaService, default host port 7778):**

   ```bash
   curl http://${HOST_IP}:7778/v1/codegen \
     -H "Content-Type: application/json" \
     -d '{"messages": "Implement a sorting algorithm in Python."}'
   ```

   - **Expected Output:** A stream of JSON data chunks containing the generated code, ending in `data: [DONE]`.

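If you deployed the TGI profile, the Gaudi TGI container exposes the standard TGI `generate` API. A hedged example; replace `<tgi_host_port>` with the host port mapped to `codegen-tgi-gaudi-server` in `compose.yaml`:

```bash
# Replace <tgi_host_port> with the host port mapped to codegen-tgi-gaudi-server
curl http://${HOST_IP}:<tgi_host_port>/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Implement a basic Python class", "parameters": {"max_new_tokens": 32}}'
```
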
## Accessing the User Interface (UI)

UI options are similar to the Xeon deployment.

### Gradio UI (Default)

Access the default Gradio UI at:
`http://{HOST_IP}:5173`
_(Port `5173` is the default host mapping.)_

### Svelte UI (Optional)

1. Modify `compose.yaml`: swap the Gradio service for the Svelte one (`codegen-gaudi-ui-server`) and check the port mapping (e.g., `5173:5173`).
2. Restart: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5173`

### React UI (Optional)

1. Modify `compose.yaml`: swap the Gradio service for the React one (`codegen-gaudi-react-ui-server`) and check the port mapping (e.g., `5174:80`).
2. Restart: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5174`

### VS Code Extension (Optional)

Use the `Neural Copilot` extension configured with the CodeGen backend URL: `http://${HOST_IP}:7778/v1/codegen`. (See the Xeon README for detailed setup screenshots.)

## Troubleshooting

- **Gaudi Service Issues:**
  - Check the logs (`codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server`) for Habana/Gaudi specific errors.
  - Ensure the host drivers and the Habana Docker runtime (`habana-container-runtime`) are installed and working.
  - Verify `runtime: habana` and the volume mounts in `compose.yaml`.
  - Gaudi initialization can take significant time and memory. Monitor resource usage.
- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`, internet access, and proxy settings. Check the LLM service logs.
- **Connection Errors:** Verify `HOST_IP`, ports, and proxy settings. Use `docker ps` and check the service logs (a few quick checks are sketched below).

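A few quick connectivity checks (a minimal sketch; port `7778` is the default gateway mapping used in this guide and may differ if you changed it):

```bash
# Confirm the containers are up and inspect their port mappings
docker compose --profile codegen-gaudi-vllm ps

# Check that the gateway answers on the expected port (prints the HTTP status code)
curl -sS -o /dev/null -w "%{http_code}\n" http://${HOST_IP}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "ping"}'
```
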
## Stopping the Application

```bash
docker compose --profile <profile_name> down
# Example: docker compose --profile codegen-gaudi-vllm down
```

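`docker compose down` leaves any named volumes in place. If the deployment created volumes you also want to remove (for example, persisted vector store data), the `-v` flag can be added; a minimal sketch:

```bash
# Also remove named volumes created by this deployment
docker compose --profile codegen-gaudi-vllm down -v
```
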
## Next Steps

- Experiment with different models supported by TGI/vLLM on Gaudi.
- Consult [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) for microservice details.
- Refer to the main [CodeGen README](../../../../README.md) for benchmarking and Kubernetes deployment options.