# Deploy CodeGen Application on Intel Gaudi HPU with Docker Compose

This README provides instructions for deploying the CodeGen application using Docker Compose on systems equipped with Intel Gaudi HPUs.

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start Deployment](#quick-start-deployment)
- [Available Deployment Options](#available-deployment-options)
- [Configuration Parameters](#configuration-parameters)
- [Building Custom Images (Optional)](#building-custom-images-optional)
- [Validate Services](#validate-services)
- [Accessing the User Interface (UI)](#accessing-the-user-interface-ui)
- [Troubleshooting](#troubleshooting)
- [Stopping the Application](#stopping-the-application)
- [Next Steps](#next-steps)

## Overview

This guide focuses on running the pre-configured CodeGen service using Docker Compose on Intel Gaudi HPUs. It leverages containers optimized for Gaudi for the LLM serving component (vLLM or TGI), along with CPU-based containers for other microservices like embedding, retrieval, data preparation, the main gateway, and the UI.

## Prerequisites

- Docker and Docker Compose installed.
- Intel Gaudi HPU(s) with the necessary drivers and software stack installed on the host system (refer to the [Intel Gaudi Documentation](https://docs.habana.ai/en/latest/)); a quick way to verify the setup is sketched after this list.
- Git installed (for cloning the repository).
- Hugging Face Hub API token (for downloading models).
- Access to the internet (or a private model cache).
- Clone the `GenAIExamples` repository:

  ```bash
  git clone https://github.com/opea-project/GenAIExamples.git
  cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
  ```

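Optionally, before deploying, you can confirm that the Gaudi devices and the Habana container runtime are visible on the host. This is a minimal sketch; `hl-smi` ships with the Gaudi driver stack, and the runtime name reported by Docker may differ depending on how it was installed:

```bash
# List Gaudi devices and the installed driver/firmware versions
hl-smi

# Check that Docker reports the Habana runtime (name may vary by installation)
docker info | grep -i runtime
```
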
## Quick Start Deployment

This uses the default vLLM-based deployment profile (`codegen-gaudi-vllm`).

1. **Configure Environment:**
   Set the required environment variables in your shell:

   ```bash
   # Replace with your host's external IP address (do not use localhost or 127.0.0.1)
   export HOST_IP="your_external_ip_address"
   # Replace with your Hugging Face Hub API token
   export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

   # Optional: Configure proxy if needed
   # export http_proxy="your_http_proxy"
   # export https_proxy="your_https_proxy"
   # export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
   source ../../set_env.sh
   ```

   _Note: The compose file may read additional variables from `set_env.sh`. Ensure all required variables, such as the ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.), are set if you are not using the defaults from the compose file._

   For instance, the LLM can be changed by editing `set_env.sh`: the default

   ```
   export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
   ```

   can be changed to another model if needed, for example:

   ```
   export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
   ```

2. **Start Services (vLLM Profile):**

   ```bash
   docker compose --profile codegen-gaudi-vllm up -d
   ```

3. **Validate:**
   Wait several minutes for models to download and services to initialize (Gaudi initialization can take time). Check the container logs (`docker compose logs -f <service_name>`, especially `codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server`), or proceed to the validation steps below. A readiness check is sketched after this list.

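The exact log line that signals readiness depends on the serving backend and version; as a rough sketch (assuming the typical vLLM/uvicorn startup message), you can follow the logs until the OpenAI-compatible API reports it is up:

```bash
# Block until the Gaudi vLLM server logs its startup message
# ("Application startup complete" is typical for vLLM but may vary by version)
docker compose --profile codegen-gaudi-vllm logs -f codegen-vllm-gaudi-server 2>&1 \
  | grep -m1 "Application startup complete"
```
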
## Available Deployment Options

The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend accelerated on Gaudi.

### Default: vLLM-based Deployment (`--profile codegen-gaudi-vllm`)

- **Profile:** `codegen-gaudi-vllm`
- **Description:** Uses vLLM optimized for Intel Gaudi HPUs as the LLM serving engine. This is the default profile used in the Quick Start.
- **Gaudi Service:** `codegen-vllm-gaudi-server`
- **Other Services:** `codegen-llm-server`, `codegen-tei-embedding-server` (CPU), `codegen-retriever-server` (CPU), `redis-vector-db` (CPU), `codegen-dataprep-server` (CPU), `codegen-backend-server` (CPU), `codegen-gradio-ui-server` (CPU).

### TGI-based Deployment (`--profile codegen-gaudi-tgi`)

- **Profile:** `codegen-gaudi-tgi`
- **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel Gaudi HPUs as the LLM serving engine.
- **Gaudi Service:** `codegen-tgi-gaudi-server`
- **Other Services:** Same CPU-based services as the vLLM profile.
- **To Run:**

  ```bash
  # Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
  docker compose --profile codegen-gaudi-tgi up -d
  ```

## Configuration Parameters

### Environment Variables

Key parameters are configured via environment variables set before running `docker compose up`.

| Environment Variable                      | Description                                                                                                              | Default (Set Externally)                                    |
| :---------------------------------------- | :----------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------- |
| `HOST_IP`                                 | External IP address of the host machine. **Required.**                                                                   | `your_external_ip_address`                                   |
| `HUGGINGFACEHUB_API_TOKEN`                | Your Hugging Face Hub token for model access. **Required.**                                                              | `your_huggingface_token`                                     |
| `LLM_MODEL_ID`                            | Hugging Face model ID for the CodeGen LLM (used by the TGI/vLLM service). Configured within `compose.yaml` environment.  | `Qwen/Qwen2.5-Coder-7B-Instruct`                             |
| `EMBEDDING_MODEL_ID`                      | Hugging Face model ID for the embedding model (used by the TEI service). Configured within `compose.yaml` environment.   | `BAAI/bge-base-en-v1.5`                                      |
| `LLM_ENDPOINT`                            | Internal URL for the LLM serving endpoint (used by `llm-codegen-vllm-server`). Configured in `compose.yaml`.             | `http://codegen-vllm\|tgi-server:9000/v1/chat/completions`   |
| `TEI_EMBEDDING_ENDPOINT`                  | Internal URL for the Embedding service. Configured in `compose.yaml`.                                                    | `http://codegen-tei-embedding-server:80/embed`               |
| `DATAPREP_ENDPOINT`                       | Internal URL for the Data Preparation service. Configured in `compose.yaml`.                                             | `http://codegen-dataprep-server:80/dataprep`                 |
| `BACKEND_SERVICE_ENDPOINT`                | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`.                              | `http://${HOST_IP}:7778/v1/codegen`                          |
| `*_PORT` (Internal)                       | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`.                                                | N/A                                                          |
| `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required).                                                                                     | `""`                                                         |

Most of these parameters are set in `set_env.sh`; you can either modify that file or override the variables by exporting them in your shell:

```shell
source CodeGen/docker_compose/set_env.sh
```

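One way to override individual variables without editing the file is to export them after sourcing it; the values exported last are the ones Docker Compose sees. A minimal sketch (the model shown is just an example):

```bash
# Load the defaults, then override selected variables for this shell session
source CodeGen/docker_compose/set_env.sh
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
```
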
### Compose Profiles

Docker Compose profiles (`codegen-gaudi-vllm`, `codegen-gaudi-tgi`) select the Gaudi-accelerated LLM serving backend (vLLM or TGI). CPU-based services run under both profiles.

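To check which services a given profile would start without launching anything, `docker compose config` can be used; for example:

```bash
# List the services enabled by each profile (no containers are started)
docker compose --profile codegen-gaudi-vllm config --services
docker compose --profile codegen-gaudi-tgi config --services
```
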
### Docker Compose Gaudi Configuration

The `compose.yaml` file includes specific configurations for the Gaudi services (`codegen-vllm-gaudi-server`, `codegen-tgi-gaudi-server`):

```yaml
# Example snippet for codegen-vllm-gaudi-server
runtime: habana # Specifies the Habana runtime for Docker
volumes:
  - /dev/vfio:/dev/vfio # Mount necessary device files
cap_add:
  - SYS_NICE # Add capabilities needed by Gaudi drivers/runtime
ipc: host # Use host IPC namespace
environment:
  HABANA_VISIBLE_DEVICES: all # Make all Gaudi devices visible
  # Other model/service specific env vars
```

This setup grants the container access to Gaudi devices. Ensure the host system has the Habana Docker runtime correctly installed and configured.

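If Docker does not recognize the `habana` runtime, it usually needs to be registered in the Docker daemon configuration. A minimal sketch, assuming the runtime binary is installed at `/usr/bin/habana-container-runtime` (check your installation and the Intel Gaudi documentation for the authoritative path and procedure):

```bash
# WARNING: this overwrites any existing /etc/docker/daemon.json; merge by hand if you already have one
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "habana": {
      "path": "/usr/bin/habana-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
sudo systemctl restart docker
```
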
## Building Custom Images (Optional)

If you need to modify microservices:

1. **For Gaudi Services (TGI/vLLM):** Refer to the specific build instructions for TGI-Gaudi or vLLM-Gaudi within [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) or their respective upstream projects. Building Gaudi-optimized images often requires a specific build environment.
2. **For CPU Services:** Follow the instructions in the `GenAIComps` component directories (e.g., `comps/codegen`, `comps/ui/gradio`). Use the provided Dockerfiles.
3. Tag your custom images.
4. Update the `image:` fields in the `compose.yaml` file (see the sketch after this list).

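As a rough sketch of steps 3 and 4 above (the image name, tag, and Dockerfile path are placeholders; use the Dockerfile of the component you actually modified):

```bash
# Build and tag a custom image for one component (names and paths are illustrative)
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t my-registry/codegen-component:custom -f <path/to/component/Dockerfile> .

# Then point the corresponding service in compose.yaml at the new tag:
#   image: my-registry/codegen-component:custom
```
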
## Validate Services

### Check Container Status

Ensure all containers are running, especially the Gaudi-accelerated LLM service:

```bash
docker compose --profile <profile_name> ps
# Example: docker compose --profile codegen-gaudi-vllm ps
```

Check the logs with `docker compose logs <service_name>`. Pay attention to the `codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server` logs for initialization status and errors.

### Run Validation Script/Commands

Use `curl` commands targeting the main service endpoints. Ensure `HOST_IP` is correctly set.

1. **Validate the LLM Serving Endpoint (vLLM example, using port 9000; adjust if your `compose.yaml` maps the port differently):**

   ```bash
   # This command structure targets the OpenAI-compatible vLLM endpoint
   curl http://${HOST_IP}:9000/v1/chat/completions \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
   ```

   _(If you deployed the TGI profile instead, see the sketch after this list.)_

2. **Validate the CodeGen Gateway (MegaService, default host port 7778):**

   ```bash
   curl http://${HOST_IP}:7778/v1/codegen \
     -H "Content-Type: application/json" \
     -d '{"messages": "Implement a sorting algorithm in Python."}'
   ```

   - **Expected Output:** A stream of JSON data chunks containing the generated code, ending in `data: [DONE]`.

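If you deployed the TGI profile, the Gaudi TGI container exposes the standard TGI `generate` API. A hedged example; replace `<tgi_host_port>` with the host port mapped to `codegen-tgi-gaudi-server` in `compose.yaml`:

```bash
# Replace <tgi_host_port> with the host port mapped to codegen-tgi-gaudi-server
curl http://${HOST_IP}:<tgi_host_port>/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Implement a basic Python class", "parameters": {"max_new_tokens": 32}}'
```
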
## Accessing the User Interface (UI)

UI options are similar to the Xeon deployment.

### Gradio UI (Default)

Access the default Gradio UI at:
`http://{HOST_IP}:5173`
_(Port `5173` is the default host mapping.)_

### Svelte UI (Optional)

1. Modify `compose.yaml`: swap the Gradio service for the Svelte one (`codegen-gaudi-ui-server`) and check the port mapping (e.g., `5173:5173`).
2. Restart: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5173`

### React UI (Optional)

1. Modify `compose.yaml`: swap the Gradio service for the React one (`codegen-gaudi-react-ui-server`) and check the port mapping (e.g., `5174:80`).
2. Restart: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5174`

### VS Code Extension (Optional)

Use the `Neural Copilot` extension configured with the CodeGen backend URL: `http://${HOST_IP}:7778/v1/codegen`. (See the Xeon README for detailed setup screenshots.)

## Troubleshooting

- **Gaudi Service Issues:**
  - Check the logs (`codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server`) for Habana/Gaudi specific errors.
  - Ensure the host drivers and the Habana Docker runtime (`habana-container-runtime`) are installed and working.
  - Verify `runtime: habana` and the volume mounts in `compose.yaml`.
  - Gaudi initialization can take significant time and memory. Monitor resource usage.
- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`, internet access, and proxy settings. Check the LLM service logs.
- **Connection Errors:** Verify `HOST_IP`, ports, and proxy settings. Use `docker ps` and check the service logs (a few quick checks are sketched below).

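A few quick connectivity checks (a minimal sketch; port `7778` is the default gateway mapping used in this guide and may differ if you changed it):

```bash
# Confirm the containers are up and inspect their port mappings
docker compose --profile codegen-gaudi-vllm ps

# Check that the gateway answers on the expected port (prints the HTTP status code)
curl -sS -o /dev/null -w "%{http_code}\n" http://${HOST_IP}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "ping"}'
```
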
## Stopping the Application

```bash
docker compose --profile <profile_name> down
# Example: docker compose --profile codegen-gaudi-vllm down
```

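`docker compose down` leaves any named volumes in place. If the deployment created volumes you also want to remove (for example, persisted vector store data), the `-v` flag can be added; a minimal sketch:

```bash
# Also remove named volumes created by this deployment
docker compose --profile codegen-gaudi-vllm down -v
```
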
## Next Steps

- Experiment with different models supported by TGI/vLLM on Gaudi.
- Consult [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) for microservice details.
- Refer to the main [CodeGen README](../../../../README.md) for benchmarking and Kubernetes deployment options.