Refine readme of CodeGen (#1797)

Signed-off-by: Yao, Qing <qing.yao@intel.com>
This commit is contained in:
Yao Qing
2025-04-21 17:49:15 +08:00
committed by GitHub
parent 608dc963c9
commit 262ad7d6ec
8 changed files with 1062 additions and 1243 deletions

View File

@@ -1,24 +1,38 @@
# Code Generation Application
# Code Generation Example (CodeGen)
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models are trained on datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they have a comprehensive understanding of the coding context, including elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions. Additionally, Retrieval-Augmented Generation (RAG) and Agents are part of the CodeGen example, providing an additional layer of intelligence and adaptability to ensure that the generated code is not only relevant but also accurate, efficient, and tailored to the specific needs of developers and programmers.
## Table of Contents
The capabilities of CodeGen LLMs include:
- [Overview](#overview)
- [Problem Motivation](#problem-motivation)
- [Architecture](#architecture)
- [High-Level Diagram](#high-level-diagram)
- [OPEA Microservices Diagram](#opea-microservices-diagram)
- [Deployment Options](#deployment-options)
- [Benchmarking](#benchmarking)
- [Automated Deployment using Terraform](#automated-deployment-using-terraform)
- [Contribution](#contribution)
- Code Generation: Streamline coding through Code Generation, enabling non-programmers to describe tasks for code creation.
- Code Completion: Accelerate coding by suggesting contextually relevant snippets as developers type.
- Code Translation and Modernization: Translate and modernize code across multiple programming languages, aiding interoperability and updating legacy projects.
- Code Summarization: Extract key insights from codebases, improving readability and developer productivity.
- Code Refactoring: Offer suggestions for code refactoring, enhancing code performance and efficiency.
- AI-Assisted Testing: Assist in creating test cases, ensuring code robustness and accelerating development cycles.
- Error Detection and Debugging: Detect errors in code and provide detailed descriptions and potential fixes, expediting debugging processes.
## Overview
In this example, we present a Code Copilot application to showcase how code generation can be executed on either Intel Gaudi2 platform or Intel Xeon Processor platform. This CodeGen use case involves code generation utilizing open-source models such as `m-a-p/OpenCodeInterpreter-DS-6.7B` and `deepseek-ai/deepseek-coder-33b-instruct` with Text Generation Inference (TGI) for serving deployment.
The Code Generation (CodeGen) example demonstrates an AI application designed to assist developers by generating computer code based on natural language prompts or existing code context. It leverages Large Language Models (LLMs) trained on vast datasets of repositories, documentation, and code for programming.
The workflow follows the architecture shown below:
This example showcases how developers can quickly deploy and utilize a CodeGen service, potentially integrating it into their IDEs or development workflows to accelerate tasks like code completion, translation, summarization, refactoring, and error detection.
![architecture](./assets/img/codegen_architecture.png)
## Problem Motivation
The CodeGen example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
Writing, understanding, and maintaining code can be time-consuming and complex. Developers often perform repetitive coding tasks, struggle with translating between languages, or need assistance understanding large codebases. CodeGen LLMs address this by automating code generation, providing intelligent suggestions, and assisting with various code-related tasks, thereby boosting productivity and reducing development friction. This OPEA example provides a blueprint for deploying such capabilities using optimized components.
## Architecture
### High-Level Diagram
The CodeGen application follows a microservice-based architecture enabling scalability and flexibility. User requests are processed through a gateway, which orchestrates interactions between various backend services, including the core LLM for code generation and potentially retrieval-augmented generation (RAG) components for context-aware responses.
![High-level Architecture](./assets/img/codegen_architecture.png)
### OPEA Microservices Diagram
This example utilizes several microservices from the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository. The diagram below illustrates the interaction between these components for a typical CodeGen request, potentially involving RAG using a vector database.
```mermaid
---
@@ -57,11 +71,10 @@ flowchart LR
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
LLM_gen{{LLM Serving}}
GW([CodeGen GateWay]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
Ingest[Ingest data] --> UI
UI --> DP
@@ -89,161 +102,42 @@ flowchart LR
DP <-.->VDB
```
## 🤖 Automated Deployment using Intel® Optimized Cloud Modules for **Terraform**
## Deployment Options
| Cloud Provider | Intel Architecture | Intel Optimized Cloud Module for Terraform | Comments |
| -------------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------- | -------- |
| AWS | 4th Gen Intel Xeon with Intel AMX | [AWS Deployment](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-codegen) | |
| GCP | 4th/5th Gen Intel Xeon | [GCP Deployment](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-codegen) | |
| Azure | 4th/5th Gen Intel Xeon | Work-in-progress | |
| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | |
This CodeGen example can be deployed manually on various hardware platforms using Docker Compose or Kubernetes. Select the appropriate guide based on your target environment:
## Manual Deployment of CodeGen Service
| Hardware | Deployment Mode | Guide Link |
| :-------------- | :------------------- | :----------------------------------------------------------------------- |
| Intel Xeon CPU | Single Node (Docker) | [Xeon Docker Compose Guide](./docker_compose/intel/cpu/xeon/README.md) |
| Intel Gaudi HPU | Single Node (Docker) | [Gaudi Docker Compose Guide](./docker_compose/intel/hpu/gaudi/README.md) |
| AMD ROCm GPU | Single Node (Docker) | [ROCm Docker Compose Guide](./docker_compose/amd/gpu/rocm/README.md) |
| Intel Xeon CPU | Kubernetes (Helm) | [Kubernetes Helm Guide](./kubernetes/helm/README.md) |
| Intel Gaudi HPU | Kubernetes (Helm) | [Kubernetes Helm Guide](./kubernetes/helm/README.md) |
| Intel Xeon CPU | Kubernetes (GMC) | [Kubernetes GMC Guide](./kubernetes/gmc/README.md) |
| Intel Gaudi HPU | Kubernetes (GMC) | [Kubernetes GMC Guide](./kubernetes/gmc/README.md) |
The CodeGen service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processor.
_Note: Building custom microservice images can be done using the resources in [GenAIComps](https://github.com/opea-project/GenAIComps)._
Currently we support two ways of deploying CodeGen services with docker compose:
## Benchmarking
1. Start services using the docker image on `docker hub`:
Guides for evaluating the performance and accuracy of this CodeGen deployment are available:
```bash
docker pull opea/codegen:latest
```
| Benchmark Type | Guide Link |
| :------------- | :--------------------------------------------------------------- |
| Accuracy | [Accuracy Benchmark Guide](./benchmark/accuracy/README.md) |
| Performance | [Performance Benchmark Guide](./benchmark/performance/README.md) |
2. Start services using the docker images built from source. See the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) or [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more information.
## Automated Deployment using Terraform
### Required Models
Intel® Optimized Cloud Modules for Terraform provide an automated way to deploy this CodeGen example on various Cloud Service Providers (CSPs).
By default, the LLM model is set to the value listed below:
| Cloud Provider | Intel Architecture | Intel Optimized Cloud Module for Terraform | Comments |
| :------------------- | :-------------------------------- | :------------------------------------------------------------------------------------------------------------ | :---------- |
| AWS | 4th Gen Intel Xeon with Intel AMX | [AWS Deployment](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-codegen) | Available |
| GCP | 4th/5th Gen Intel Xeon | [GCP Deployment](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-codegen) | Available |
| Azure | 4th/5th Gen Intel Xeon | Work-in-progress | Coming Soon |
| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Coming Soon |
| Service | Model |
| ------------ | ----------------------------------------------------------------------------------------- |
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
## Contribution
[Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model if needed.
Change the `LLM_MODEL_ID` below to suit your needs, for example: [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct), [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
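For example, a minimal sketch for overriding the default model (export the variable after sourcing `set_env.sh`, or edit the value inside the script directly):

```bash
# Hypothetical override: use the smaller 7B coder model instead of the 32B default
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
```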
If you choose to use `meta-llama/CodeLlama-7b-hf` as the LLM model, you will need to visit its [model page](https://huggingface.co/meta-llama/CodeLlama-7b-hf) and click the `Expand to review and access` button to request model access.
### Setup Environment Variable
To set up environment variables for deploying CodeGen services, follow these steps:
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./docker_compose/set_env.sh
```
### Deploy CodeGen using Docker
#### Deploy CodeGen on Gaudi
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml). Users can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
```
TGI service:
```bash
docker compose --profile codegen-gaudi-tgi up -d
```
vLLM service:
```bash
docker compose --profile codegen-gaudi-vllm up -d
```
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
#### Deploy CodeGen on Xeon
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml). Users can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
```
TGI service:
```bash
docker compose --profile codegen-xeon-tgi up -d
```
vLLM service:
```bash
docker compose --profile codegen-xeon-vllm up -d
```
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
### Deploy CodeGen on Kubernetes using Helm Chart
Refer to the [CodeGen helm chart](./kubernetes/helm/README.md) for instructions on deploying CodeGen on Kubernetes.
## Consume CodeGen Service
Two ways of consuming CodeGen Service:
1. Use cURL command on terminal
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
To use a CodeGen service with RAG and Agents grounded in dedicated documentation, run:
```bash
curl http://localhost:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
2. Access via frontend
To access the frontend, open the following URL in your browser: http://{host_ip}:5173.
By default, the UI runs on port 5173 internally.
## Troubleshooting
1. If you get errors like "Access Denied", [validate micro service](https://github.com/opea-project/GenAIExamples/tree/main/CodeGen/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
```bash
http_proxy=""
curl http://${host_ip}:8028/generate \
-X POST \
-d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_tokens":256, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
2. If you get errors like "aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host xx.xx.xx.xx:8028", check the `tgi service` first. If the `tgi service` reports an error such as "Cannot access gated repo for url https://huggingface.co/meta-llama/CodeLlama-7b-hf/resolve/main/config.json", you need to request model access first. Follow the instructions in the [Required Models](#required-models) section for more information.
3. (Docker only) If all microservices work well but the gateway is still not reachable, check whether port 7778 on ${host_ip} is already in use by another process (see the port-check sketch after this list); if so, modify the port mapping in `compose.yaml`.
4. (Docker only) If you get errors like "The container name is in use", change container name in `compose.yaml`.
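A quick way to check whether the default gateway port is already taken (adjust the port number if you changed it in `compose.yaml`):

```bash
# Show any process already listening on the default gateway port 7778
ss -ltnp | grep ':7778' || echo "port 7778 is free"
```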
We welcome contributions to the OPEA project. Please refer to the contribution guidelines for more information.

View File

@@ -1,52 +1,69 @@
# CodeGen Accuracy
# CodeGen Accuracy Benchmark
## Table of Contents
- [Purpose](#purpose)
- [Evaluation Framework](#evaluation-framework)
- [Prerequisites](#prerequisites)
- [Environment Setup](#environment-setup)
- [Running the Accuracy Benchmark](#running-the-accuracy-benchmark)
- [Understanding the Results](#understanding-the-results)
## Purpose
This guide explains how to evaluate the accuracy of a deployed CodeGen service using standardized code generation benchmarks. It helps quantify the model's ability to generate correct and functional code based on prompts.
## Evaluation Framework
We evaluate accuracy with the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness), a framework for the evaluation of code generation models.
We utilize the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness), a framework specifically designed for evaluating code generation models. It supports various standard benchmarks such as [HumanEval](https://huggingface.co/datasets/openai_humaneval), [MBPP](https://huggingface.co/datasets/mbpp), and others.
## Evaluation FAQs
## Prerequisites
### Launch CodeGen microservice
- A running CodeGen service accessible via an HTTP endpoint. Refer to the main [CodeGen README](../../README.md) for deployment options.
- Python 3.8+ environment.
- Git installed.
Please refer to the [CodeGen Examples](https://github.com/opea-project/GenAIExamples/tree/main/CodeGen/README.md) and follow the guide to deploy the CodeGen megaservice.
## Environment Setup
Use the `curl` command to test the CodeGen service and ensure that it has started properly:
```bash
export CODEGEN_ENDPOINT="http://${your_ip}:7778/v1/codegen"
curl $CODEGEN_ENDPOINT \
  -H "Content-Type: application/json" \
  -d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
1. **Clone the Evaluation Repository:**
```shell
git clone https://github.com/opea-project/GenAIEval
cd GenAIEval
```
2. **Install Dependencies:**
```shell
pip install -r requirements.txt
pip install -e .
```
### Generation and Evaluation
## Running the Accuracy Benchmark
For evaluating models on coding tasks, or coding LLMs specifically, we follow the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness) and provide both command-line and function-call usage. [HumanEval](https://huggingface.co/datasets/openai_humaneval), [HumanEval+](https://huggingface.co/datasets/evalplus/humanevalplus), [InstructHumanEval](https://huggingface.co/datasets/codeparrot/instructhumaneval), [APPS](https://huggingface.co/datasets/codeparrot/apps), [MBPP](https://huggingface.co/datasets/mbpp), [MBPP+](https://huggingface.co/datasets/evalplus/mbppplus), and [DS-1000](https://github.com/HKUNLP/DS-1000/) are available in both completion (left-to-right) and insertion (FIM) modes.
1. **Set Environment Variables:**
Replace `{your_ip}` with the IP address of your deployed CodeGen service and `{your_model_identifier}` with the identifier of the model being tested (e.g., `Qwen/CodeQwen1.5-7B-Chat`).
#### Environment
```shell
export CODEGEN_ENDPOINT="http://{your_ip}:7778/v1/codegen"
export CODEGEN_MODEL="{your_model_identifier}"
```
```shell
git clone https://github.com/opea-project/GenAIEval
cd GenAIEval
pip install -r requirements.txt
pip install -e .
```
_Note: Port `7778` is the default for the CodeGen gateway; adjust if you customized it._
2. **Execute the Benchmark Script:**
The script will run the evaluation tasks (e.g., HumanEval by default) against the specified endpoint.
#### Evaluation
```shell
bash run_acc.sh $CODEGEN_MODEL $CODEGEN_ENDPOINT
```
```
export CODEGEN_ENDPOINT="http://${your_ip}:7778/v1/codegen"
export CODEGEN_MODEL=your_model
bash run_acc.sh $CODEGEN_MODEL $CODEGEN_ENDPOINT
```
_Note: Currently, the framework runs the full task set by default. Using 'limit' parameters might affect result comparability._
**_Note:_** Currently, our framework is designed to execute tasks in full. To ensure the accuracy of results, we advise against using the 'limit' or 'limit_start' parameters to restrict the number of test samples.
## Understanding the Results
### Accuracy Result
The results will be printed to the console and saved in `evaluation_results.json`. A key metric is `pass@k`, which represents the percentage of problems solved correctly within `k` generated attempts (e.g., `pass@1` means solved on the first try).
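For reference, `pass@k` is commonly computed with the unbiased estimator from the HumanEval paper, where `n` samples are generated per problem and `c` of them pass the unit tests:

```math
\text{pass@}k = \mathbb{E}_{\text{problems}}\left[ 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \right]
```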
Here is the tested result for your reference
Example output snippet:
```json
{
@@ -54,20 +71,7 @@ Here is the tested result for your reference
"pass@1": 0.7195121951219512
},
"config": {
"prefix": "",
"do_sample": true,
"temperature": 0.2,
"top_k": 0,
"top_p": 0.95,
"n_samples": 1,
"eos": "<|endoftext|>",
"seed": 0,
"model": "Qwen/CodeQwen1.5-7B-Chat",
"modeltype": "causal",
"peft_model": null,
"revision": null,
"use_auth_token": false,
"trust_remote_code": false,
"tasks": "humaneval",
"instruction_tokens": null,
"batch_size": 1,
@@ -93,7 +97,9 @@ Here is the tested result for your reference
"prompt": "prompt",
"max_memory_per_gpu": null,
"check_references": false,
"codegen_url": "http://192.168.123.104:31234/v1/codegen"
"codegen_url": "http://192.168.123.104:7778/v1/codegen"
}
}
```
This indicates a `pass@1` score of approximately 72% on the HumanEval benchmark for the specified model via the CodeGen service endpoint.

View File

@@ -1,77 +1,73 @@
# CodeGen Benchmarking
# CodeGen Performance Benchmark
This folder contains a collection of scripts for inference benchmarking with [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md), a comprehensive benchmarking tool that provides throughput analysis for assessing inference performance.
## Table of Contents
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
- [Purpose](#purpose)
- [Benchmarking Tool](#benchmarking-tool)
- [Metrics Measured](#metrics-measured)
- [Prerequisites](#prerequisites)
- [Running the Performance Benchmark](#running-the-performance-benchmark)
- [Data Collection](#data-collection)
## Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
This guide describes how to benchmark the inference performance (throughput and latency) of a deployed CodeGen service. The results help understand the service's capacity under load and compare different deployment configurations or models. This benchmark primarily targets Kubernetes deployments but can be adapted for Docker.
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
## Benchmarking Tool
## Metrics
We use the [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) tool for performance benchmarking, which simulates concurrent users sending requests to the service endpoint.
The benchmark reports the following metrics:
## Metrics Measured
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
The benchmark reports several key performance indicators:
Results will be displayed in the terminal and saved as a CSV file named `1_testspec.yaml.csv`.
- **Concurrency:** Number of concurrent requests simulated.
- **End-to-End Latency:** Time from request submission to final response received (P50, P90, P99 in ms).
- **End-to-End First Token Latency:** Time from request submission to first token received (P50, P90, P99 in ms).
- **Average Next Token Latency:** Average time between subsequent generated tokens (in ms).
- **Average Token Latency:** Average time per generated token (in ms).
- **Requests Per Second (RPS):** Throughput of the service.
- **Output Tokens Per Second:** Rate of token generation.
- **Input Tokens Per Second:** Rate of token consumption.
## Getting Started
## Prerequisites
We recommend using Kubernetes to deploy the CodeGen service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs.
- A running CodeGen service accessible via an HTTP endpoint. Refer to the main [CodeGen README](../../README.md) for deployment options (Kubernetes recommended for load balancing/scalability).
- **If using Kubernetes:**
- A working Kubernetes cluster (refer to OPEA K8s setup guides if needed).
- `kubectl` configured to access the cluster from the node where the benchmark will run (typically the master node).
- Ensure sufficient `ulimit` for network connections on worker nodes hosting the service pods (e.g., `LimitNOFILE=65536` or higher in containerd/docker config).
- **General:**
- Python 3.8+ on the node running the benchmark script.
- Network access from the benchmark node to the CodeGen service endpoint.
### Prerequisites
## Running the Performance Benchmark
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
1. **Deploy CodeGen Service:** Ensure your CodeGen service is deployed and accessible. Note the service endpoint URL (e.g., obtained via `kubectl get svc` or your ingress configuration if using Kubernetes, or `http://{host_ip}:{port}` for Docker).
- Every node has direct internet access
- Set up kubectl on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running GenAIEval.
- Ensure all nodes have a local /mnt/models folder, which will be mounted by the pods.
- Ensure that the container's ulimit can accommodate the required number of requests.
2. **Configure Benchmark Parameters (Optional):**
Set environment variables to customize the test queries and output directory. The `USER_QUERIES` variable defines the number of concurrent requests for each test run.
```bash
# How to modify the containerd ulimit:
sudo systemctl edit containerd
# Add two lines:
[Service]
LimitNOFILE=65536:1048576
# Then apply the changes:
sudo systemctl daemon-reload; sudo systemctl restart containerd
```

```bash
# Example: Four runs with 128 concurrent requests each
export USER_QUERIES="[128, 128, 128, 128]"
# Example: Output directory
export TEST_OUTPUT_DIR="/tmp/benchmark_output"
# Set the target endpoint URL
export CODEGEN_ENDPOINT_URL="http://{your_service_ip_or_hostname}:{port}/v1/codegen"
```
_Replace `{your_service_ip_or_hostname}:{port}` with the actual accessible URL of your CodeGen gateway service._
### Test Steps
3. **Execute the Benchmark Script:**
Run the script, optionally specifying the number of Kubernetes nodes involved if relevant for reporting context (the script itself runs from one node).
```bash
# Clone GenAIExamples if you haven't already
# cd GenAIExamples/CodeGen/benchmark/performance
bash benchmark.sh # Add '-n <node_count>' if desired for logging purposes
```
_Ensure the `benchmark.sh` script is adapted to use `CODEGEN_ENDPOINT_URL` and potentially `USER_QUERIES`, `TEST_OUTPUT_DIR`._
Please deploy CodeGen service before benchmarking.
## Data Collection
#### Run Benchmark Test
Before the benchmark, we can configure the number of test queries and test output directory by:
```bash
export USER_QUERIES="[128, 128, 128, 128]"
export TEST_OUTPUT_DIR="/tmp/benchmark_output"
```
And then run the benchmark by:
```bash
bash benchmark.sh -n <node_count>
```
The argument `-n` refers to the number of test nodes.
#### Data collection
All test results are written to the folder `/tmp/benchmark_output`, as configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
Benchmark results will be displayed in the terminal upon completion. Detailed results, typically including raw data and summary statistics, will be saved in the directory specified by `TEST_OUTPUT_DIR` (defaulting to `/tmp/benchmark_output`). CSV files (e.g., `1_testspec.yaml.csv`) containing metrics for each run are usually generated here.
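A quick way to inspect the collected results after a run (a sketch using the default paths and file names mentioned above):

```bash
# List everything the benchmark wrote
ls -lh "${TEST_OUTPUT_DIR:-/tmp/benchmark_output}"
# Pretty-print one of the per-run CSV summaries (the exact file name may differ per run)
column -s, -t < "${TEST_OUTPUT_DIR:-/tmp/benchmark_output}/1_testspec.yaml.csv" | head
```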

View File

@@ -1,8 +1,145 @@
# Build and Deploy CodeGen Application on AMD GPU (ROCm)
# Deploy CodeGen Application on AMD GPU (ROCm) with Docker Compose
## Build Docker Images
This README provides instructions for deploying the CodeGen application using Docker Compose on a system equipped with AMD GPUs supporting ROCm, detailing the steps to configure, run, and validate the services. This guide defaults to using the **vLLM** backend for LLM serving.
### 1. Build Docker Image
## Table of Contents
- [Steps to Run with Docker Compose (Default vLLM)](#steps-to-run-with-docker-compose-default-vllm)
- [Service Overview](#service-overview)
- [Available Deployment Options](#available-deployment-options)
- [compose_vllm.yaml (vLLM - Default)](#compose_vllmyaml-vllm---default)
- [compose.yaml (TGI)](#composeyaml-tgi)
- [Configuration Parameters and Usage](#configuration-parameters-and-usage)
- [Docker Compose GPU Configuration](#docker-compose-gpu-configuration)
- [Environment Variables (`set_env*.sh`)](#environment-variables-set_envsh)
- [Building Docker Images Locally (Optional)](#building-docker-images-locally-optional)
- [1. Setup Build Environment](#1-setup-build-environment)
- [2. Clone Repositories](#2-clone-repositories)
- [3. Select Services and Build](#3-select-services-and-build)
- [Validate Service Health](#validate-service-health)
- [1. Validate the vLLM/TGI Service](#1-validate-the-vllmtgi-service)
- [2. Validate the LLM Service](#2-validate-the-llm-service)
- [3. Validate the MegaService (Backend)](#3-validate-the-megaservice-backend)
- [4. Validate the Frontend (UI)](#4-validate-the-frontend-ui)
- [How to Open the UI](#how-to-open-the-ui)
- [Troubleshooting](#troubleshooting)
- [Stopping the Application](#stopping-the-application)
- [Next Steps](#next-steps)
## Steps to Run with Docker Compose (Default vLLM)
_This section assumes you are using pre-built images and targets the default vLLM deployment._
1. **Set Deploy Environment Variables:**
- Go to the Docker Compose directory:
```bash
# Adjust path if your GenAIExamples clone is located elsewhere
cd GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
```
- Setting variables in the operating system environment:
- Set variable `HUGGINGFACEHUB_API_TOKEN`:
```bash
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
```
- Edit the environment script for the **vLLM** deployment (`set_env_vllm.sh`):
```bash
nano set_env_vllm.sh
```
- Configure `HOST_IP`, `EXTERNAL_HOST_IP`, `*_PORT` variables, and proxies (`http_proxy`, `https_proxy`, `no_proxy`) as described in the Configuration section below.
- Source the environment variables:
```bash
. set_env_vllm.sh
```
2. **Start the Services (vLLM):**
```bash
docker compose -f compose_vllm.yaml up -d
```
3. **Verify:** Proceed to the [Validate Service Health](#validate-service-health) section after allowing time for services to start.
## Service Overview
When using the default `compose_vllm.yaml` (vLLM-based), the following services are deployed:
| Service Name | Default Port (Host) | Internal Port | Purpose |
| :--------------------- | :--------------------------------------------- | :------------ | :-------------------------- |
| codegen-vllm-service | `${CODEGEN_VLLM_SERVICE_PORT}` (e.g., 8028) | 8000 | LLM Serving (vLLM on ROCm) |
| codegen-llm-server | `${CODEGEN_LLM_SERVICE_PORT}` (e.g., 9000) | 80 | LLM Microservice Wrapper |
| codegen-backend-server | `${CODEGEN_BACKEND_SERVICE_PORT}` (e.g., 7778) | 80 | CodeGen MegaService/Gateway |
| codegen-ui-server | `${CODEGEN_UI_SERVICE_PORT}` (e.g., 5173) | 80 | Frontend User Interface |
_(Note: Ports are configurable via `set_env_vllm.sh`. Check the script for actual defaults used.)_
_(Note: The TGI deployment (`compose.yaml`) uses `codegen-tgi-service` instead of `codegen-vllm-service`)_
## Available Deployment Options
This directory provides different Docker Compose files:
### compose_vllm.yaml (vLLM - Default)
- **Description:** Deploys the CodeGen application using vLLM optimized for ROCm as the backend LLM service. This is the default setup.
- **Services Deployed:** `codegen-vllm-service`, `codegen-llm-server`, `codegen-backend-server`, `codegen-ui-server`. Requires `set_env_vllm.sh`.
### compose.yaml (TGI)
- **Description:** Deploys the CodeGen application using Text Generation Inference (TGI) optimized for ROCm as the backend LLM service.
- **Services Deployed:** `codegen-tgi-service`, `codegen-llm-server`, `codegen-backend-server`, `codegen-ui-server`. Requires `set_env.sh`.
## Configuration Parameters and Usage
### Docker Compose GPU Configuration
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose files (`compose.yaml`, `compose_vllm.yaml`) for the LLM serving container:
```yaml
# Example for vLLM service in compose_vllm.yaml
# Note: Modern docker compose might use deploy.resources syntax instead.
# Check your docker version and compose file.
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
# - /dev/dri/render128:/dev/dri/render128
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```
This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs (e.g., `/dev/dri/card0:/dev/dri/card0`, `/dev/dri/render128:/dev/dri/render128`). Use AMD GPU driver utilities to identify device IDs.
### Environment Variables (`set_env*.sh`)
These scripts (`set_env_vllm.sh` for vLLM, `set_env.sh` for TGI) configure crucial parameters passed to the containers.
| Environment Variable | Description | Example Value (Edit in Script) |
| :----------------------------- | :------------------------------------------------------------------------------------------------------- | :------------------------------- |
| `HUGGINGFACEHUB_API_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingfacehub_token` |
| `HOST_IP` | Internal/Primary IP address of the host machine. Used for inter-service communication. **Required.** | `192.168.1.100` |
| `EXTERNAL_HOST_IP` | External IP/hostname used to access the UI from outside. Same as `HOST_IP` if no proxy/LB. **Required.** | `192.168.1.100` |
| `CODEGEN_LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
| `CODEGEN_VLLM_SERVICE_PORT` | Host port mapping for the vLLM serving endpoint (in `set_env_vllm.sh`). | `8028` |
| `CODEGEN_TGI_SERVICE_PORT` | Host port mapping for the TGI serving endpoint (in `set_env.sh`). | `8028` |
| `CODEGEN_LLM_SERVICE_PORT` | Host port mapping for the LLM Microservice wrapper. | `9000` |
| `CODEGEN_BACKEND_SERVICE_PORT` | Host port mapping for the CodeGen MegaService/Gateway. | `7778` |
| `CODEGEN_UI_SERVICE_PORT` | Host port mapping for the UI service. | `5173` |
| `http_proxy` | Network HTTP Proxy URL (if required). | `Your_HTTP_Proxy` |
| `https_proxy` | Network HTTPS Proxy URL (if required). | `Your_HTTPs_Proxy` |
| `no_proxy` | Comma-separated list of hosts to bypass proxy. Should include `localhost,127.0.0.1,$HOST_IP`. | `localhost,127.0.0.1` |
**How to Use:** Edit the relevant `set_env*.sh` file (`set_env_vllm.sh` for the default) with your values, then source it (`. ./set_env*.sh`) before running `docker compose`.
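For example, for the default vLLM deployment (run from the `docker_compose/amd/gpu/rocm` directory):

```bash
# Edit HOST_IP, EXTERNAL_HOST_IP, ports, token, and proxies, then source and deploy
nano set_env_vllm.sh
. set_env_vllm.sh
docker compose -f compose_vllm.yaml up -d
```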
## Building Docker Images Locally (Optional)
Follow these steps if you need to build the Docker images from source instead of using pre-built ones.
### 1. Setup Build Environment
- #### Create application install directory and go to it:
@@ -10,6 +147,8 @@
mkdir ~/codegen-install && cd ~/codegen-install
```
### 2. Clone Repositories
- #### Clone the repository GenAIExamples (the default repository branch "main" is used here):
```bash
@@ -22,7 +161,7 @@
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```
We remind you that when using a specific version of the code, you need to use the README from this version:
We remind you that when using a specific version of the code, you need to use the README from this version.
- #### Go to build directory:
@@ -52,23 +191,25 @@
We remind you that when using a specific version of the code, you need to use the README from this version.
### 3. Select Services and Build
- #### Setting the list of images for the build (from the `build.yaml` file)
If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows:
Select the services corresponding to your desired deployment (vLLM is the default):
#### vLLM-based application
##### vLLM-based application (Default)
```bash
service_list="vllm-rocm llm-textgen codegen codegen-ui"
```
#### TGI-based application
##### TGI-based application
```bash
service_list="llm-textgen codegen codegen-ui"
```
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
- #### Optional. Pull TGI Docker Image (Do this if you plan to build/use the TGI variant)
```bash
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
@@ -76,350 +217,197 @@
- #### Build Docker Images
_Ensure you are in the `~/codegen-install/GenAIExamples/CodeGen/docker_image_build` directory._
```bash
docker compose -f build.yaml build ${service_list} --no-cache
```
After the build, we check the list of images with the command:
After the build, check the list of images with the command:
```bash
docker image ls
```
The list of images should include:
The list of images should include (depending on `service_list`):
##### vLLM-based application:
###### vLLM-based application:
- opea/vllm-rocm:latest
- opea/llm-textgen:latest
- opea/codegen:latest
- opea/codegen-ui:latest
##### TGI-based application:
###### TGI-based application:
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm (if pulled)
- opea/llm-textgen:latest
- opea/codegen:latest
- opea/codegen-ui:latest
---
_After building, ensure the `image:` tags in the main `compose_vllm.yaml` or `compose.yaml` (in the `amd/gpu/rocm` directory) match these built images (e.g., `opea/vllm-rocm:latest`)._
## Deploy the CodeGen Application
## Validate Service Health
### Docker Compose Configuration for AMD GPUs
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:
- compose_vllm.yaml - for vLLM-based application
- compose.yaml - for TGI-based
```yaml
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```
This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example:
```yaml
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/render128:/dev/dri/render128
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```
**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.
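For example, these host-side commands can help map GPUs to device nodes (assuming the ROCm utilities are installed on the host):

```bash
# List the DRM device nodes (cardN / renderDN) exposed by the kernel
ls -l /dev/dri/
# Show the AMD GPUs detected by the ROCm stack
rocm-smi
```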
### Set deploy environment variables
#### Setting variables in the operating system environment:
##### Set variable HUGGINGFACEHUB_API_TOKEN:
```bash
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
```
#### Set variable values in the `set_env*.sh` file:
Go to Docker Compose directory:
```bash
cd ~/codegen-install/GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
```
The example uses the Nano text editor. You can use any convenient text editor:
#### If you use vLLM
```bash
nano set_env_vllm.sh
```
#### If you use TGI
```bash
nano set_env.sh
```
If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
Set the values of the variables:
- **HOST_IP, EXTERNAL_HOST_IP** - These variables configure the name/address of the service in the operating system environment so that the application services can interact with each other and with the outside world.
If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address.
If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address.
If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located.
We set these values in the `set_env*.sh` file.
- **Variables with names like `*_PORT`** - These variables set the IP port numbers for establishing network connections to the application services.
The values shown in `set_env.sh` or `set_env_vllm.sh` are the values used for development and testing of the application and are configured for the environment in which that development was performed. Configure these values in accordance with the network access rules of your environment's server, and make sure they do not overlap with the IP ports of other applications that are already in use.
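A quick way to look up the host's primary internal IP when filling in `HOST_IP` (a sketch assuming a single default route):

```bash
hostname -I | awk '{print $1}'
# Or inspect which source address the default route uses
ip route get 1.1.1.1
```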
#### Set variables with the `set_env*.sh` script
#### If you use vLLM
```bash
. set_env_vllm.sh
```
#### If you use TGI
```bash
. set_env.sh
```
### Start the services:
#### If you use vLLM
```bash
docker compose -f compose_vllm.yaml up -d
```
#### If you use TGI
```bash
docker compose -f compose.yaml up -d
```
All containers should be running and should not restart:
##### If you use vLLM:
- codegen-vllm-service
- codegen-llm-server
- codegen-backend-server
- codegen-ui-server
##### If you use TGI:
- codegen-tgi-service
- codegen-llm-server
- codegen-backend-server
- codegen-ui-server
---
## Validate the Services
Run these checks after starting the services to ensure they are operational. Focus on the vLLM checks first as it's the default.
### 1. Validate the vLLM/TGI Service
#### If you use vLLM:
#### If you use vLLM (Default - using `compose_vllm.yaml` and `set_env_vllm.sh`)
- **How Tested:** Send a POST request with a sample prompt to the vLLM endpoint.
- **CURL Command:**

```bash
DATA='{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", '\
'"messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. '\
'The API takes as input an operation request and updates the TODO list in place. '\
'If the request is invalid, raise an exception."}], "max_tokens": 256}'

curl http://${HOST_IP}:${CODEGEN_VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```

Checking the response from the service. The response should be similar to JSON:
````json
{
"id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
"object": "chat.completion",
"created": 1742270316,
"model": "Qwen/Qwen2.5-Coder-7B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "```python\nfrom typing import Optional, List, Dict, Union\nfrom pydantic import BaseModel, validator\n\nclass OperationRequest(BaseModel):\n # Assuming OperationRequest is already defined as per the given text\n pass\n\nclass UpdateOperation(OperationRequest):\n new_items: List[str]\n\n def apply_and_maybe_raise(self, updatable_item: \"Updatable todo list\") -> None:\n # Assuming updatable_item is an instance of Updatable todo list\n self.validate()\n updatable_item.add_items(self.new_items)\n\nclass Updatable:\n # Abstract class for items that can be updated\n pass\n\nclass TodoList(Updatable):\n # Class that represents a todo list\n items: List[str]\n\n def add_items(self, new_items: List[str]) -> None:\n self.items.extend(new_items)\n\ndef handle_request(operation_request: OperationRequest) -> None:\n # Function to handle an operation request\n if isinstance(operation_request, UpdateOperation):\n operation_request.apply_and_maybe_raise(get_todo_list_for_update())\n else:\n raise ValueError(\"Invalid operation request\")\n\ndef get_todo_list_for_update() -> TodoList:\n # Function to get the todo list for update\n # Assuming this function returns the",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "length",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 66, "total_tokens": 322, "completion_tokens": 256, "prompt_tokens_details": null },
"prompt_logprobs": null
}
````
- **Sample Output:**
```json
{
"id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
"object": "chat.completion"
// ... (rest of output) ...
}
```
- **Expected Result:** A JSON response with a `choices[0].message.content` field containing meaningful generated code.
If the response contains meaningful generated content under the `choices[0].message.content` key, the vLLM service is considered successfully launched.
#### If you use TGI (using `compose.yaml` and `set_env.sh`)
#### If you use TGI:
- **How Tested:** Send a POST request with a sample prompt to the TGI endpoint.
- **CURL Command:**
```bash
DATA='{"inputs":"Implement a high-level API for a TODO list application. '\
'The API takes as input an operation request and updates the TODO list in place. '\
'If the request is invalid, raise an exception.",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'

curl http://${HOST_IP}:${CODEGEN_TGI_SERVICE_PORT}/generate \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```
Checking the response from the service. The response should be similar to JSON:
````json
{
"generated_text": " The supported operations are \"add_task\", \"complete_task\", and \"remove_task\". Each operation can be defined with a corresponding function in the API.\n\nAdd your API in the following format:\n\n```\nTODO App API\n\nsupported operations:\n\noperation name description\n----------------------- ------------------------------------------------\n<operation_name> <operation description>\n```\n\nUse type hints for function parameters and return values. Specify a text description of the API's supported operations.\n\nUse the following code snippet as a starting point for your high-level API function:\n\n```\nclass TodoAPI:\n def __init__(self, tasks: List[str]):\n self.tasks = tasks # List of tasks to manage\n\n def add_task(self, task: str) -> None:\n self.tasks.append(task)\n\n def complete_task(self, task: str) -> None:\n self.tasks = [t for t in self.tasks if t != task]\n\n def remove_task(self, task: str) -> None:\n self.tasks = [t for t in self.tasks if t != task]\n\n def handle_request(self, request: Dict[str, str]) -> None:\n operation = request.get('operation')\n if operation == 'add_task':\n self.add_task(request.get('task'))\n elif"
}
````
If the response contains meaningful generated text under the `generated_text` key, the TGI service is considered successfully launched.
- **Sample Output:**
```json
{
"generated_text": " The supported operations are \"add_task\", \"complete_task\", and \"remove_task\". # ... (generated code) ..."
}
```
- **Expected Result:** A JSON response with a `generated_text` field containing meaningful generated code.
### 2. Validate the LLM Service
- **Service Name:** `codegen-llm-server`
- **How Tested:** Send a POST request to the LLM microservice wrapper endpoint.
- **CURL Command:**

```bash
DATA='{"query":"Implement a high-level API for a TODO list application. '\
'The API takes as input an operation request and updates the TODO list in place. '\
'If the request is invalid, raise an exception.",'\
'"max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,'\
'"repetition_penalty":1.03,"stream":false}'

curl http://${HOST_IP}:${CODEGEN_LLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```

Checking the response from the service. The response should be similar to JSON:
````json
{
"id": "cmpl-4e89a590b1af46bfb37ce8f12b2996f8",
"choices": [
{
"finish_reason": "length",
"index": 0,
"logprobs": null,
"text": " The API should support the following operations:\n\n1. Add a new task to the TODO list.\n2. Remove a task from the TODO list.\n3. Mark a task as completed.\n4. Retrieve the list of all tasks.\n\nThe API should also support the following features:\n\n1. The ability to filter tasks based on their completion status.\n2. The ability to sort tasks based on their priority.\n3. The ability to search for tasks based on their description.\n\nHere is an example of how the API can be used:\n\n```python\ntodo_list = []\napi = TodoListAPI(todo_list)\n\n# Add tasks\napi.add_task(\"Buy groceries\")\napi.add_task(\"Finish homework\")\n\n# Mark a task as completed\napi.mark_task_completed(\"Buy groceries\")\n\n# Retrieve the list of all tasks\nprint(api.get_all_tasks())\n\n# Filter tasks based on completion status\nprint(api.filter_tasks(completed=True))\n\n# Sort tasks based on priority\napi.sort_tasks(priority=\"high\")\n\n# Search for tasks based on description\nprint(api.search_tasks(description=\"homework\"))\n```\n\nIn this example, the `TodoListAPI` class is used to manage the TODO list. The `add_task` method adds a new task to the list, the `mark_task_completed` method",
"stop_reason": null,
"prompt_logprobs": null
}
],
"created": 1742270567,
"model": "Qwen/Qwen2.5-Coder-7B-Instruct",
"object": "text_completion",
"system_fingerprint": null,
"usage": {
"completion_tokens": 256,
"prompt_tokens": 37,
"total_tokens": 293,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
````

- **Sample Output:** (Structure may vary slightly depending on whether vLLM or TGI is the backend)

```json
{
"id": "cmpl-4e89a590b1af46bfb37ce8f12b2996f8"
// ... (output structure depends on backend, check original validation) ...
}
```
- **Expected Result:** A JSON response containing meaningful generated code within the `choices` array.
If the response contains meaningful generated text under the `choices[0].text` key, the LLM microservice is considered successfully launched.
### 3. Validate the MegaService (Backend)
### 3. Validate the MegaService
- **Service Name:** `codegen-backend-server`
- **How Tested:** Send a POST request to the main CodeGen gateway endpoint.
- **CURL Command:**
```bash
DATA='{"messages": "Implement a high-level API for a TODO list application. '\
'The API takes as input an operation request and updates the TODO list in place. '\
'If the request is invalid, raise an exception."}'

curl http://${HOST_IP}:${CODEGEN_BACKEND_SERVICE_PORT}/v1/codegen \
  -H "Content-Type: application/json" \
  -d "$DATA"
```
Checking the response from the service. The response should be similar to text:
```textmate
data: {"id":"cmpl-cc5dc73819c640469f7c7c7424fe57e6","choices":[{"finish_reason":null,"index":0,"logprobs":null,"text":" of","stop_reason":null}],"created":1742270725,"model":"Qwen/Qwen2.5-Coder-7B-Instruct","object":"text_completion","system_fingerprint":null,"usage":null}
...........
data: {"id":"cmpl-cc5dc73819c640469f7c7c7424fe57e6","choices":[{"finish_reason":null,"index":0,"logprobs":null,"text":" all","stop_reason":null}],"created":1742270725,"model":"Qwen/Qwen2.5-Coder-7B-Instruct","object":"text_completion","system_fingerprint":null,"usage":null}
data: {"id":"cmpl-cc5dc73819c640469f7c7c7424fe57e6","choices":[{"finish_reason":null,"index":0,"logprobs":null,"text":" tasks","stop_reason":null}],"created":1742270725,"model":"Qwen/Qwen2.5-Coder-7B-Instruct","object":"text_completion","system_fingerprint":null,"usage":null}
data: {"id":"cmpl-cc5dc73819c640469f7c7c7424fe57e6","choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":",","stop_reason":null}],"created":1742270725,"model":"Qwen/Qwen2.5-Coder-7B-Instruct","object":"text_completion","system_fingerprint":null,"usage":null}
data: [DONE]
```
If the `choices.text` values in the streamed output contain meaningful words (tokens), the service is considered successfully launched.
- **Sample Output:**
```textmate
data: {"id":"cmpl-...", ...}
# ... more data chunks ...
data: [DONE]
```
- **Expected Result:** A stream of server-sent events (SSE) containing JSON data with generated code tokens, ending with `data: [DONE]`.
### 4. Validate the Frontend (UI)
To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${CODEGEN_UI_SERVICE_PORT}
A page should open when you click through to this address:
- **Service Name:** `codegen-ui-server`
- **How Tested:** Access the UI URL in a web browser and perform a test query.
- **Steps:** See [How to Open the UI](#how-to-open-the-ui).
- **Expected Result:** The UI loads correctly, and submitting a prompt results in generated code displayed on the page.
![UI start page](../../../../assets/img/ui-starting-page.png)
## How to Open the UI
If a page of this type has opened, then we believe that the service is running and responding,
and we can proceed to functional UI testing.
1. Determine the UI access URL using the `EXTERNAL_HOST_IP` and `CODEGEN_UI_SERVICE_PORT` variables defined in your sourced `set_env*.sh` file (use `set_env_vllm.sh` for the default vLLM deployment). The default URL format is:
`http://${EXTERNAL_HOST_IP}:${CODEGEN_UI_SERVICE_PORT}`
(e.g., `http://192.168.1.100:5173`)
Let's enter the task for the service in the "Enter prompt here" field.
For example, "Write a Python code that returns the current time and date" and press Enter.
After that, a page with the result of the task should open:
2. Open this URL in your web browser.
![UI result page](../../../../assets/img/ui-result-page.png)
3. You should see the CodeGen starting page:
![UI start page](../../../../assets/img/ui-starting-page.png)
If the result shown on the page is correct, then we consider the verification of the UI service to be successful.
4. Enter a prompt in the input field (e.g., "Write a Python code that returns the current time and date") and press Enter or click the submit button.
### 5. Stop application
5. Verify that the generated code appears correctly:
![UI result page](../../../../assets/img/ui-result-page.png)
#### If you use vLLM
## Troubleshooting
_(No specific troubleshooting steps provided in the original content for this file. Add common issues if known.)_
- Check container logs (`docker compose -f <file> logs <service_name>`), especially for `codegen-vllm-service` or `codegen-tgi-service`.
- Ensure `HUGGINGFACEHUB_API_TOKEN` is correct.
- Verify ROCm drivers and Docker setup for GPU access.
- Confirm network connectivity and proxy settings.
- Ensure `HOST_IP` and `EXTERNAL_HOST_IP` are correctly set and accessible.
- If building locally, ensure build steps completed without error and image tags match compose file.
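For example, for the default vLLM deployment (container names as listed earlier; availability of `rocm-smi` inside the image is an assumption):

```bash
# Follow the logs of the LLM serving container (use codegen-tgi-service for the TGI variant)
docker compose -f compose_vllm.yaml logs -f codegen-vllm-service
# Check that the GPUs are visible inside the container
docker exec codegen-vllm-service rocm-smi
```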
## Stopping the Application
### If you use vLLM (Default)
```bash
cd ~/codegen-install/GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
# Ensure you are in the correct directory
# cd GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use TGI
### If you use TGI
```bash
cd ~/codegen-install/GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
# Ensure you are in the correct directory
# cd GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```
## Next Steps
- Explore the alternative TGI deployment option if needed.
- Refer to the main [CodeGen README](../../../../README.md) for architecture details and links to other deployment methods (Kubernetes, Xeon).
- Consult the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository for details on individual microservices.

@@ -1,382 +1,239 @@
# Deploy CodeGen Application on Intel Xeon CPU with Docker Compose

This README provides instructions for deploying the CodeGen application, built from the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline, using Docker Compose on systems equipped with Intel Xeon CPUs. The default pipeline uses vLLM as the LLM serving component, with the option to use a TGI backend for the LLM microservice.
## Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Available Deployment Options](#available-deployment-options)
- [Default: vLLM-based Deployment (`--profile codegen-xeon-vllm`)](#default-vllm-based-deployment---profile-codegen-xeon-vllm)
- [TGI-based Deployment (`--profile codegen-xeon-tgi`)](#tgi-based-deployment---profile-codegen-xeon-tgi)
- [Configuration Parameters](#configuration-parameters)
- [Environment Variables](#environment-variables)
- [Compose Profiles](#compose-profiles)
- [Building Custom Images (Optional)](#building-custom-images-optional)
- [Validate Services](#validate-services)
- [Check Container Status](#check-container-status)
- [Run Validation Script/Commands](#run-validation-scriptcommands)
- [Accessing the User Interface (UI)](#accessing-the-user-interface-ui)
- [Gradio UI (Default)](#gradio-ui-default)
- [Svelte UI (Optional)](#svelte-ui-optional)
- [React UI (Optional)](#react-ui-optional)
- [VS Code Extension (Optional)](#vs-code-extension-optional)
- [Troubleshooting](#troubleshooting)
- [Stopping the Application](#stopping-the-application)
- [Next Steps](#next-steps)
## Overview

This guide focuses on running the pre-configured CodeGen service using Docker Compose on Intel Xeon CPUs. It leverages containers optimized for Intel architecture for the CodeGen gateway, LLM serving (vLLM or TGI), RAG components (Embedding, Retriever, Vector DB), and UI.

To run the example on an AWS Xeon instance, create an AWS account if you don't already have one and start from the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home). AWS EC2 M7i, C7i, C7i-flex, and M7i-flex instances are based on 4th Generation Intel Xeon Scalable processors and are suitable for this task; see [m7i](https://aws.amazon.com/ec2/instance-types/m7i/) for details on these instance types. After configuring your instance settings (network, security groups, storage) and launching the instance, connect to it using SSH (Linux) or RDP (Windows). From there, you have full access to the Xeon server to install, configure, and manage the application.
## Prerequisites

- Docker and Docker Compose installed.
- Intel Xeon CPU.
- Git installed (for cloning repository).
- Hugging Face Hub API Token (for downloading models).
- Access to the internet (or a private model cache).
- Clone the `GenAIExamples` repository:

  ```bash
  git clone https://github.com/opea-project/GenAIExamples.git
  cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
  ```

The CodeGen MegaService orchestrates several microservices, including the Embedding, Retrieval, and LLM microservices, within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice generates code snippets based on the user's input query, while the TGI (or vLLM) service provides the RESTful text-generation API it relies on. Data Preparation lets users save or update documents and online resources in the vector database; users can upload files or provide URLs and manage their saved resources. The CodeGen Gateway serves as the entry point of the application and invokes the MegaService to generate code in response to the user's query.

The mega flow of the CodeGen application, from the user's input query to the application's output response, is as follows:
```mermaid
---
config:
  flowchart:
    nodeSpacing: 400
    rankSpacing: 100
    curve: linear
  themeVariables:
    fontSize: 25px
---
flowchart LR
    %% Colors %%
    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef invisible fill:transparent,stroke:transparent;
    style CodeGen-MegaService stroke:#000000

    %% Subgraphs %%
    subgraph CodeGen-MegaService["CodeGen-MegaService"]
        direction LR
        EM([Embedding<br>MicroService]):::blue
        RET([Retrieval<br>MicroService]):::blue
        RER([Agents]):::blue
        LLM([LLM<br>MicroService]):::blue
    end
    subgraph User Interface
        direction LR
        a([Submit Query Tab]):::orchid
        UI([UI server]):::orchid
        Ingest([Manage Resources]):::orchid
    end

    CLIP_EM{{Embedding<br>service}}
    VDB{{Vector DB}}
    V_RET{{Retriever<br>service}}
    Ingest{{Ingest data}}
    DP([Data Preparation]):::blue
    LLM_gen{{TGI Service}}
    GW([CodeGen GateWay]):::orange

    %% Data Preparation flow
    %% Ingest data flow
    direction LR
    Ingest[Ingest data] --> UI
    UI --> DP
    DP <-.-> CLIP_EM

    %% Questions interaction
    direction LR
    a[User Input Query] --> UI
    UI --> GW
    GW <==> CodeGen-MegaService
    EM ==> RET
    RET ==> RER
    RER ==> LLM

    %% Embedding service flow
    direction LR
    EM <-.-> CLIP_EM
    RET <-.-> V_RET
    LLM <-.-> LLM_gen

    direction TB
    %% Vector DB interaction
    V_RET <-.-> VDB
    DP <-.-> VDB
```

## Quick Start

This uses the default vLLM-based deployment profile (`codegen-xeon-vllm`).

1. **Configure Environment:**
   Set required environment variables in your shell:

   ```bash
   # Replace with your host's external IP address (do not use localhost or 127.0.0.1)
   export HOST_IP="your_external_ip_address"
   # Replace with your Hugging Face Hub API token
   export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

   # Optional: Configure proxy if needed
   # export http_proxy="your_http_proxy"
   # export https_proxy="your_https_proxy"
   # export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
   source ../../../set_env.sh
   ```

   _Note: The compose file might read additional variables from a `.env` file or expect them defined elsewhere. Ensure all required variables, such as ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.), are set if not using defaults from the compose file._

2. **Start Services (vLLM Profile):**

   ```bash
   docker compose --profile codegen-xeon-vllm up -d
   ```

3. **Validate:**
   Wait several minutes for models to download (especially the first time) and services to initialize. Check container logs (`docker compose logs -f <service_name>`) or proceed to the validation steps below.
## Available Deployment Options
The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend.
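To preview which services a profile will start without launching anything, Compose can resolve the configuration, for example:

```bash
# List the services activated by the default vLLM profile (no containers are started)
docker compose --profile codegen-xeon-vllm config --services
```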
### Default: vLLM-based Deployment (`--profile codegen-xeon-vllm`)
- **Profile:** `codegen-xeon-vllm`
- **Description:** Uses vLLM optimized for Intel CPUs as the LLM serving engine. This is the default profile used in the Quick Start.
- **Services Deployed:** `codegen-vllm-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
### TGI-based Deployment (`--profile codegen-xeon-tgi`)
- **Profile:** `codegen-xeon-tgi`
- **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel CPUs as the LLM serving engine.
- **Services Deployed:** `codegen-tgi-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
- **To Run:**
```bash
# Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
docker compose --profile codegen-xeon-tgi up -d
```
## Configuration Parameters
### Environment Variables
Key parameters are configured via environment variables set before running `docker compose up`.
| Environment Variable | Description | Default (Set Externally) |
| :-------------------------------------- | :------------------------------------------------------------------------------------------------------------------ | :----------------------------------------------------------------------------------------------- |
| `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
| `HUGGINGFACEHUB_API_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
| `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within `compose.yaml` environment. | `BAAI/bge-base-en-v1.5` |
| `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`. | `http://codegen-tgi-server:80/generate` or `http://codegen-vllm-server:8000/v1/chat/completions` |
| `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in `compose.yaml`. | `http://codegen-tei-embedding-server:80/embed` |
| `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in `compose.yaml`. | `http://codegen-dataprep-server:80/dataprep` |
| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
| `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`. | N/A |
| `http_proxy` / `https_proxy`/`no_proxy` | Network proxy settings (if required). | `""` |
Most of these parameters are set in `set_env.sh`; you can either modify this file or override the variables by exporting them in your shell.
```shell
source CodeGen/docker_compose/set_env.sh
```
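For example, to override the served model while keeping the other defaults (a sketch; any substituted model must be supported by the chosen vLLM or TGI backend and fit your hardware):

```bash
# Source the defaults, then overwrite selected variables before starting the stack
source CodeGen/docker_compose/set_env.sh
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"   # example override
docker compose --profile codegen-xeon-vllm up -d
```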
### Compose Profiles

Docker Compose profiles (`codegen-xeon-vllm`, `codegen-xeon-tgi`) control which LLM serving backend (vLLM or TGI) and its associated dependencies are started. Only one profile should typically be active.

## Building Custom Images (Optional)
If you need to modify the microservices:
1. Clone the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository.
2. Follow build instructions in the respective component directories (e.g., `comps/llms/text-generation`, `comps/codegen`, `comps/ui/gradio`, etc.). Use the provided Dockerfiles (e.g., `CodeGen/Dockerfile`, `CodeGen/ui/docker/Dockerfile.gradio`).
3. Tag your custom images appropriately (e.g., `my-custom-codegen:latest`).
4. Update the `image:` fields in the `compose.yaml` file to use your custom image tags.
_Refer to the main [CodeGen README](../../../../README.md) for links to relevant GenAIComps components._
## Validate Services
### Check Container Status
Ensure all containers associated with the chosen profile are running:
```bash
docker compose --profile <profile_name> ps
# Example: docker compose --profile codegen-xeon-vllm ps
```

Check logs for specific services: `docker compose logs <service_name>`

### Run Validation Script/Commands

Use `curl` commands to test the main service endpoints. Ensure `HOST_IP` is correctly set in your environment. If you access the public network through a proxy, also export `http_proxy`, `https_proxy`, and `no_proxy` (including `${HOST_IP}`).
### Start the Docker Containers for All Services

Find the corresponding [compose.yaml](./compose.yaml). You can start CodeGen with either the TGI or vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
```
#### TGI service:
```bash
docker compose --profile codegen-xeon-tgi up -d
```
Then run the command `docker images`, you will have the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
#### vLLM service:
```bash
docker compose --profile codegen-xeon-vllm up -d
```
Then run the command `docker images`, you will have the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
- `opea/vllm`
### Building the Docker image locally
If the Docker image you need is not yet available on Docker Hub, you can build it locally by following the instructions below.
#### Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
#### Build the UI Gradio Image
Build the frontend Gradio image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
#### Dataprep Microservice with Redis
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
#### Embedding Microservice with TEI
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
#### LLM text generation Microservice
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
#### Retriever Microservice
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
#### Start Redis server
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
### Validate the MicroServices and MegaService
1. **Validate the LLM Serving Endpoint (vLLM example, default internal port 8000):**

   ```bash
   # This command targets the OpenAI-compatible vLLM endpoint; for the TGI profile the host port is 8028
   curl http://${HOST_IP}:8000/v1/chat/completions \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
   ```
   - **Expected Output:** A JSON response with generated code in `choices[0].message.content` (see the extraction example below).
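If `jq` is available, the generated text can be pulled straight out of that response (a convenience sketch, not required for validation):

```bash
curl -s http://${HOST_IP}:8000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}' \
  | jq -r '.choices[0].message.content'
```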
2. **Validate the LLM Microservice (default port 9000):**

   ```bash
   curl http://${HOST_IP}:9000/v1/chat/completions \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
   ```

3. **Validate the CodeGen Gateway (MegaService on default port 7778):**

   ```bash
   curl http://${HOST_IP}:7778/v1/codegen \
     -H "Content-Type: application/json" \
     -d '{"messages": "Write a Python function that adds two numbers."}'
   ```

   - **Expected Output:** A stream of JSON data chunks containing generated code, ending with `data: [DONE]`.
4. **Validate the Dataprep Microservice (default port 6007):** Replace the file name placeholders with your own file names.

   ```bash
   curl http://${HOST_IP}:6007/v1/dataprep/ingest \
     -X POST \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./file1.pdf" \
     -F "files=@./file2.txt" \
     -F "index_name=my_API_document"
   ```

5. **Validate RAG and Agents via the Gateway:** After ingesting an index (e.g., `my_API_document`), the gateway can be queried with agents and retrieval enabled:

   ```bash
   curl http://${HOST_IP}:7778/v1/codegen \
     -H "Content-Type: application/json" \
     -d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
   ```

## Accessing the User Interface (UI)

Multiple UI options can be configured via the `compose.yaml`.

### Gradio UI (Default)

Access the default Gradio UI by navigating to:
`http://{HOST_IP}:8080`
_(Port `8080` is the default host mapping for `codegen-gradio-ui-server`)_

![Gradio UI - Code Generation](../../../../assets/img/codegen_gradio_ui_main.png)
![Gradio UI - Resource Management](../../../../assets/img/codegen_gradio_ui_dataprep.png)
### Svelte UI (Optional)
1. Modify `compose.yaml`: Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `"- 5173:5173"`).
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5173` (or the host port you mapped).

![Svelte UI Init](../../../../assets/img/codeGen_ui_init.jpg)
## 🚀 Launch the Gradio Based UI (Recommended)
### React UI (Optional)
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
1. Modify `compose.yaml`: Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring correct port mapping (e.g., `"- 5174:80"`).
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5174` (or the host port you mapped).
Code Generation Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
![React UI](../../../../assets/img/codegen_react.png)
Resource Management Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
### VS Code Extension (Optional)
Uploading a Knowledge Index
Users can interact with the backend service using the `Neural Copilot` VS Code extension.
![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)
1. **Install:** Find and install `Neural Copilot` from the VS Code Marketplace.
![Install Copilot](../../../../assets/img/codegen_copilot.png)
2. **Configure:** Set the "Service URL" in the extension settings to your CodeGen backend endpoint: `http://${HOST_IP}:7778/v1/codegen` (use the correct port if changed).
![Configure Endpoint](../../../../assets/img/codegen_endpoint.png)
3. **Usage:**
- **Inline Suggestion:** Type a comment describing the code you want (e.g., `# Python function to read a file`) and wait for suggestions.
![Code Suggestion](../../../../assets/img/codegen_suggestion.png)
- **Chat:** Use the Neural Copilot panel to chat with the AI assistant about code.
![Chat Dialog](../../../../assets/img/codegen_dialog.png)
Here is an example of running a query in the Gradio UI using an Index:
## Troubleshooting
![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)
- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`. Ensure internet connectivity or correct proxy settings. Check logs of `tgi-service`/`vllm-service` and `tei-embedding-server`. Gated models need prior Hugging Face access.
- **Connection Errors:** Verify `HOST_IP` is correct and accessible. Check `docker ps` for port mappings. Ensure `no_proxy` includes `HOST_IP` if using a proxy. Check logs of the service failing to connect (e.g., `codegen-backend-server` logs if it can't reach `codegen-llm-server`).
- **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in `compose.yaml`.
- **Resource Issues:** CodeGen models can be memory-intensive. Monitor host RAM usage. Increase Docker resources if needed.
## 🚀 Launch the Svelte Based UI (Optional)
## Stopping the Application
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
  codegen-xeon-ui-server:
    image: opea/codegen-ui:latest
    ...
    ports:
      - "80:5173"
```

```bash
docker compose --profile <profile_name> down
# Example: docker compose --profile codegen-xeon-vllm down
```
![project-screenshot](../../../../assets/img/codeGen_ui_init.jpg)
## Next Steps
Here is an example of running CodeGen in the UI:
- Consult the [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) repository for details on individual microservices.
- Refer to the main [CodeGen README](../../../../README.md) for links to benchmarking and Kubernetes deployment options.
![project-screenshot](../../../../assets/img/codeGen_ui_response.png)
## 🚀 Launch the React Based UI (Optional)
To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `codegen-xeon-ui-server` service with the `codegen-xeon-react-ui-server` service as per the config below:
```yaml
codegen-xeon-react-ui-server:
image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
container_name: codegen-xeon-react-ui-server
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT}
depends_on:
- codegen-xeon-backend-server
ports:
- "5174:80"
ipc: host
restart: always
```
![project-screenshot](../../../../assets/img/codegen_react.png)
## Install Copilot VSCode extension from Plugin Marketplace as the frontend
In addition to the Svelte UI, users can also install the Copilot VSCode extension from the Plugin Marketplace as the frontend.
Install `Neural Copilot` in VSCode as below.
![Install-screenshot](../../../../assets/img/codegen_copilot.png)
### How to Use
#### Service URL Setting
Please adjust the service URL in the extension settings based on the endpoint of the code generation backend service.
![Setting-screenshot](../../../../assets/img/codegen_settings.png)
![Setting-screenshot](../../../../assets/img/codegen_endpoint.png)
#### Customize
The Copilot enables users to input their corresponding sensitive information and tokens in the user settings according to their own needs. This customization enhances the accuracy and output content to better meet individual requirements.
![Customize](../../../../assets/img/codegen_customize.png)
#### Code Suggestion
To trigger inline completion, type a comment such as `# {your keyword}`, starting with your programming language's comment keyword (for example, `//` in C++ and `#` in Python). Make sure `Inline Suggest` is enabled in the VS Code settings.
For example:
![code suggestion](../../../../assets/img/codegen_suggestion.png)
To provide programmers with a smooth experience, the Copilot supports multiple ways to trigger inline code suggestions. If you are interested in the details, they are summarized as follows:
- Generate code from single-line comments: The simplest way introduced before.
- Generate code from consecutive single-line comments:
![codegen from single-line comments](../../../../assets/img/codegen_single_line.png)
- Generate code from multi-line comments (this will not be triggered until there is at least one `space` outside the multi-line comment):
![codegen from multi-line comments](../../../../assets/img/codegen_multi_line.png)
- Automatically complete multi-line comments:
![auto complete](../../../../assets/img/codegen_auto_complete.jpg)
### Chat with AI assistant
You can start a conversation with the AI programming assistant by clicking on the robot icon in the plugin bar on the left:
![icon](../../../../assets/img/codegen_icon.png)
Then you can see the conversation window on the left, where you can chat with AI assistant:
![dialog](../../../../assets/img/codegen_dialog.png)
There are 4 areas worth noting as shown in the screenshot above:
1. Enter and submit your question
2. Your previous questions
3. Answers from AI assistant (Code will be highlighted properly according to the programming language it is written in, also support stream output)
4. Copy or replace code with one click (Note that you need to select the code in the editor first and then click "replace", otherwise the code will be inserted)
You can also select the code in the editor and ask the AI assistant questions about the code directly.
For example:
- Select code
![select code](../../../../assets/img/codegen_select_code.png)
- Ask question and get answer
![qna](../../../../assets/img/codegen_qna.png)

@@ -1,373 +1,246 @@
# Deploy CodeGen Application on Intel Gaudi HPU with Docker Compose

This README provides instructions for deploying the CodeGen application, built from the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline, using Docker Compose on systems equipped with Intel Gaudi HPUs. The default pipeline uses vLLM as the LLM serving component, with the option to use a TGI backend for the LLM microservice.
## Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Available Deployment Options](#available-deployment-options)
- [Default: vLLM-based Deployment (`--profile codegen-gaudi-vllm`)](#default-vllm-based-deployment---profile-codegen-gaudi-vllm)
- [TGI-based Deployment (`--profile codegen-gaudi-tgi`)](#tgi-based-deployment---profile-codegen-gaudi-tgi)
- [Configuration Parameters](#configuration-parameters)
- [Environment Variables](#environment-variables)
- [Compose Profiles](#compose-profiles)
- [Docker Compose Gaudi Configuration](#docker-compose-gaudi-configuration)
- [Building Custom Images (Optional)](#building-custom-images-optional)
- [Validate Services](#validate-services)
- [Check Container Status](#check-container-status)
- [Run Validation Script/Commands](#run-validation-scriptcommands)
- [Accessing the User Interface (UI)](#accessing-the-user-interface-ui)
- [Gradio UI (Default)](#gradio-ui-default)
- [Svelte UI (Optional)](#svelte-ui-optional)
- [React UI (Optional)](#react-ui-optional)
- [VS Code Extension (Optional)](#vs-code-extension-optional)
- [Troubleshooting](#troubleshooting)
- [Stopping the Application](#stopping-the-application)
- [Next Steps](#next-steps)
## Overview

This guide focuses on running the pre-configured CodeGen service using Docker Compose on Intel Gaudi HPUs. It leverages containers optimized for Gaudi for the LLM serving component (vLLM or TGI), along with CPU-based containers for the other microservices, such as embedding, retrieval, data preparation, the main gateway, and the UI.

The CodeGen MegaService orchestrates several microservices, including the Embedding, Retrieval, and LLM microservices, within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice generates code snippets based on the user's input query, while the TGI (or vLLM) service provides the RESTful text-generation API it relies on. Data Preparation lets users save or update documents and online resources in the vector database; users can upload files or provide URLs and manage their saved resources. The CodeGen Gateway serves as the entry point of the application and invokes the MegaService to generate code in response to the user's query.

The mega flow of the CodeGen application, from the user's input query to the application's output response, is as follows:
```mermaid
---
config:
  flowchart:
    nodeSpacing: 400
    rankSpacing: 100
    curve: linear
  themeVariables:
    fontSize: 25px
---
flowchart LR
    %% Colors %%
    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef invisible fill:transparent,stroke:transparent;
    style CodeGen-MegaService stroke:#000000

    %% Subgraphs %%
    subgraph CodeGen-MegaService["CodeGen-MegaService"]
        direction LR
        EM([Embedding<br>MicroService]):::blue
        RET([Retrieval<br>MicroService]):::blue
        RER([Agents]):::blue
        LLM([LLM<br>MicroService]):::blue
    end
    subgraph User Interface
        direction LR
        a([Submit Query Tab]):::orchid
        UI([UI server]):::orchid
        Ingest([Manage Resources]):::orchid
    end

    CLIP_EM{{Embedding<br>service}}
    VDB{{Vector DB}}
    V_RET{{Retriever<br>service}}
    Ingest{{Ingest data}}
    DP([Data Preparation]):::blue
    LLM_gen{{TGI Service}}
    GW([CodeGen GateWay]):::orange

    %% Data Preparation flow
    %% Ingest data flow
    direction LR
    Ingest[Ingest data] --> UI
    UI --> DP
    DP <-.-> CLIP_EM

    %% Questions interaction
    direction LR
    a[User Input Query] --> UI
    UI --> GW
    GW <==> CodeGen-MegaService
    EM ==> RET
    RET ==> RER
    RER ==> LLM

    %% Embedding service flow
    direction LR
    EM <-.-> CLIP_EM
    RET <-.-> V_RET
    LLM <-.-> LLM_gen

    direction TB
    %% Vector DB interaction
    V_RET <-.-> VDB
    DP <-.-> VDB
```

## Prerequisites

- Docker and Docker Compose installed.
- Intel Gaudi HPU(s) with the necessary drivers and software stack installed on the host system. (Refer to [Intel Gaudi Documentation](https://docs.habana.ai/en/latest/)).
- Git installed (for cloning repository).
- Hugging Face Hub API Token (for downloading models).
- Access to the internet (or a private model cache).
- Clone the `GenAIExamples` repository:

  ```bash
  git clone https://github.com/opea-project/GenAIExamples.git
  cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
  ```

## Quick Start

This uses the default vLLM-based deployment profile (`codegen-gaudi-vllm`).

1. **Configure Environment:**
   Set required environment variables in your shell:

   ```bash
   # Replace with your host's external IP address (do not use localhost or 127.0.0.1)
   export HOST_IP="your_external_ip_address"
   # Replace with your Hugging Face Hub API token
   export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

   # Optional: Configure proxy if needed
   # export http_proxy="your_http_proxy"
   # export https_proxy="your_https_proxy"
   # export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
   source ../../../set_env.sh
   ```

   _Note: Ensure all required variables, such as ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.), are set if not using defaults from the compose file._

2. **Start Services (vLLM Profile):**

   ```bash
   docker compose --profile codegen-gaudi-vllm up -d
   ```

3. **Validate:**
   Wait several minutes for models to download and services to initialize (Gaudi initialization can take time). Check container logs (`docker compose logs -f <service_name>`, especially `codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server`) or proceed to the validation steps below.
## Available Deployment Options
The `compose.yaml` file uses Docker Compose profiles to select the LLM serving backend accelerated on Gaudi.
### Default: vLLM-based Deployment (`--profile codegen-gaudi-vllm`)
- **Profile:** `codegen-gaudi-vllm`
- **Description:** Uses vLLM optimized for Intel Gaudi HPUs as the LLM serving engine. This is the default profile used in the Quick Start.
- **Gaudi Service:** `codegen-vllm-gaudi-server`
- **Other Services:** `codegen-llm-server`, `codegen-tei-embedding-server` (CPU), `codegen-retriever-server` (CPU), `redis-vector-db` (CPU), `codegen-dataprep-server` (CPU), `codegen-backend-server` (CPU), `codegen-gradio-ui-server` (CPU).
### TGI-based Deployment (`--profile codegen-gaudi-tgi`)
- **Profile:** `codegen-gaudi-tgi`
- **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel Gaudi HPUs as the LLM serving engine.
- **Gaudi Service:** `codegen-tgi-gaudi-server`
- **Other Services:** Same CPU-based services as the vLLM profile.
- **To Run:**
```bash
# Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
docker compose --profile codegen-gaudi-tgi up -d
```
## Configuration Parameters
### Environment Variables
Key parameters are configured via environment variables set before running `docker compose up`.
| Environment Variable | Description | Default (Set Externally) |
| :-------------------------------------- | :------------------------------------------------------------------------------------------------------------------ | :----------------------------------------------------------------------------------------------- |
| `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
| `HUGGINGFACEHUB_API_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
| `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within `compose.yaml` environment. | `BAAI/bge-base-en-v1.5` |
| `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`. | `http://codegen-tgi-server:80/generate` or `http://codegen-vllm-server:8000/v1/chat/completions` |
| `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in `compose.yaml`. | `http://codegen-tei-embedding-server:80/embed` |
| `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in `compose.yaml`. | `http://codegen-dataprep-server:80/dataprep` |
| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
| `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`. | N/A |
| `http_proxy` / `https_proxy`/`no_proxy` | Network proxy settings (if required). | `""` |
Most of these parameters are set in `set_env.sh`; you can either modify this file or override the variables by exporting them in your shell.
```shell
source CodeGen/docker_compose/set_env.sh
```
### Compose Profiles

Docker Compose profiles (`codegen-gaudi-vllm`, `codegen-gaudi-tgi`) select the Gaudi-accelerated LLM serving backend (vLLM or TGI). CPU-based services run under both profiles.

### Docker Compose Gaudi Configuration
The `compose.yaml` file includes specific configurations for Gaudi services (`codegen-vllm-gaudi-server`, `codegen-tgi-gaudi-server`):
```yaml
# Example snippet for codegen-vllm-gaudi-server
runtime: habana # Specifies the Habana runtime for Docker
volumes:
- /dev/vfio:/dev/vfio # Mount necessary device files
cap_add:
- SYS_NICE # Add capabilities needed by Gaudi drivers/runtime
ipc: host # Use host IPC namespace
environment:
HABANA_VISIBLE_DEVICES: all # Make all Gaudi devices visible
# Other model/service specific env vars
```
This setup grants the container access to Gaudi devices. Ensure the host system has the Habana Docker runtime correctly installed and configured.
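A quick host-side sanity check (a sketch assuming the standard Gaudi tooling is installed on the host):

```bash
# Confirm the Habana runtime is registered with Docker
docker info | grep -i habana
# Confirm Gaudi devices are visible on the host
hl-smi
```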
## Building Custom Images (Optional)
If you need to modify microservices:
1. **For Gaudi Services (TGI/vLLM):** Refer to specific build instructions for TGI-Gaudi or vLLM-Gaudi within [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) or their respective upstream projects. Building Gaudi-optimized images often requires a specific build environment.
2. **For CPU Services:** Follow instructions in `GenAIComps` component directories (e.g., `comps/codegen`, `comps/ui/gradio`). Use the provided Dockerfiles.
3. Tag your custom images.
4. Update the `image:` fields in the `compose.yaml` file.
## Validate Services
### Check Container Status
Ensure all containers are running, especially the Gaudi-accelerated LLM service:
```bash
docker compose --profile <profile_name> ps
# Example: docker compose --profile codegen-gaudi-vllm ps
```

Check logs: `docker compose logs <service_name>`. Pay attention to `vllm-gaudi-server` or `tgi-gaudi-server` logs for initialization status and errors.

### Run Validation Script/Commands

Use `curl` commands targeting the main service endpoints. Ensure `HOST_IP` is correctly set. If you access the public network through a proxy, also export `http_proxy`, `https_proxy`, and `no_proxy` (including `${HOST_IP}`).
### Start the Docker Containers for All Services

Find the corresponding [compose.yaml](./compose.yaml). You can start CodeGen with either the TGI or vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
```
#### TGI service:
```bash
docker compose --profile codegen-gaudi-tgi up -d
```
Then run the command `docker images`, you will have the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
#### vLLM service:
```bash
docker compose --profile codegen-gaudi-vllm up -d
```
Then run the command `docker images`, you will have the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
- `opea/vllm`
Refer to the [Gaudi Guide](./README.md) to build docker images from source.
### Building the Docker image locally
If the Docker image you need is not yet available on Docker Hub, you can build it locally by following the instructions below.
#### Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
#### Build the UI Gradio Image
Build the frontend Gradio image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
#### Dataprep Microservice with Redis
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
#### Embedding Microservice with TEI
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
#### LLM text generation Microservice
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
#### Retriever Microservice
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
#### Start Redis server
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
### Validate the MicroServices and MegaService
1. **Validate the LLM Serving Endpoint (vLLM example, default internal port 8000):**

   ```bash
   # This command targets the OpenAI-compatible vLLM endpoint; for the TGI profile the host port is 8028
   curl http://${HOST_IP}:8000/v1/chat/completions \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
   ```
2. **Validate CodeGen Gateway (MegaService, default host port 7778):**
```bash
curl http://${HOST_IP}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a sorting algorithm in Python."}'
```
- **Expected Output:** Stream of JSON data chunks with generated code, ending in `data: [DONE]`.
2. LLM Microservices
## Accessing the User Interface (UI)
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-H 'Content-Type: application/json' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```
UI options are similar to the Xeon deployment.
3. Dataprep Microservice
### Gradio UI (Default)
Make sure to replace the file name placeholders with your correct file name
Access the default Gradio UI:
`http://{HOST_IP}:8080`
_(Port `8080` is the default host mapping)_
```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
-X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.pdf" \
-F "files=@./file2.txt" \
-F "index_name=my_API_document"
```
![Gradio UI](../../../../assets/img/codegen_gradio_ui_main.png)
4. MegaService
### Svelte UI (Optional)
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
1. Modify `compose.yaml`: Swap Gradio service for Svelte (`codegen-gaudi-ui-server`), check port map (e.g., `5173:5173`).
2. Restart: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5173`
CodeGen service with RAG and Agents activated based on an index.
### React UI (Optional)
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
1. Modify `compose.yaml`: Swap Gradio service for React (`codegen-gaudi-react-ui-server`), check port map (e.g., `5174:80`).
2. Restart: `docker compose --profile <profile_name> up -d`
3. Access: `http://{HOST_IP}:5174`
## 🚀 Launch the Gradio Based UI (Recommended)
### VS Code Extension (Optional)
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
Use the `Neural Copilot` extension configured with the CodeGen backend URL: `http://${HOST_IP}:7778/v1/codegen`. (See Xeon README for detailed setup screenshots).
Code Generation Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
## Troubleshooting
Resource Management Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
- **Gaudi Service Issues:**
- Check logs (`codegen-vllm-gaudi-server` or `codegen-tgi-gaudi-server`) for Habana/Gaudi specific errors.
- Ensure host drivers and Habana Docker runtime are installed and working (`habana-container-runtime`).
- Verify `runtime: habana` and volume mounts in `compose.yaml`.
- Gaudi initialization can take significant time and memory. Monitor resource usage.
- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`, internet access, proxy settings. Check LLM service logs.
- **Connection Errors:** Verify `HOST_IP`, ports, and proxy settings. Use `docker ps` and check service logs.
Uploading a Knowledge Index
## Stopping the Application
![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)
Here is an example of running a query in the Gradio UI using an Index:
![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)
## 🚀 Launch the Svelte Based UI (Optional)
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
  codegen-gaudi-ui-server:
    image: opea/codegen-ui:latest
    ...
    ports:
      - "80:5173"
```

```bash
docker compose --profile <profile_name> down
# Example: docker compose --profile codegen-gaudi-vllm down
```
![project-screenshot](../../../../assets/img/codeGen_ui_init.jpg)
## Next Steps
## 🚀 Launch the React Based UI (Optional)
- Experiment with different models supported by TGI/vLLM on Gaudi.
- Consult [OPEA GenAIComps](https://github.com/opea-project/GenAIComps) for microservice details.
- Refer to the main [CodeGen README](../../../../README.md) for benchmarking and Kubernetes deployment options.
To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `codegen-gaudi-ui-server` service with the `codegen-gaudi-react-ui-server` service as per the config below:
```yaml
codegen-gaudi-react-ui-server:
image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
container_name: codegen-gaudi-react-ui-server
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT}
depends_on:
- codegen-gaudi-backend-server
ports:
- "5174:80"
ipc: host
restart: always
```
![project-screenshot](../../../../assets/img/codegen_react.png)
## Install Copilot VSCode extension from Plugin Marketplace as the frontend
In addition to the Svelte UI, users can also install the Copilot VSCode extension from the Plugin Marketplace as the frontend.
Install `Neural Copilot` in VSCode as below.
![Install-screenshot](../../../../assets/img/codegen_copilot.png)
### How to Use
#### Service URL Setting
Please adjust the service URL in the extension settings based on the endpoint of the CodeGen backend service.
![Setting-screenshot](../../../../assets/img/codegen_settings.png)
![Setting-screenshot](../../../../assets/img/codegen_endpoint.png)
#### Customize
The Copilot enables users to input their corresponding sensitive information and tokens in the user settings according to their own needs. This customization enhances the accuracy and output content to better meet individual requirements.
![Customize](../../../../assets/img/codegen_customize.png)
#### Code Suggestion
To trigger inline completion, type a comment such as `# {your keyword}`, starting with your programming language's comment keyword (for example, `//` in C++ and `#` in Python). Make sure `Inline Suggest` is enabled in the VS Code settings.
For example:
![code suggestion](../../../../assets/img/codegen_suggestion.png)
To provide programmers with a smooth experience, the Copilot supports multiple ways to trigger inline code suggestions. If you are interested in the details, they are summarized as follows:
- Generate code from single-line comments: The simplest way introduced before.
- Generate code from consecutive single-line comments:
![codegen from single-line comments](../../../../assets/img/codegen_single_line.png)
- Generate code from multi-line comments (this will not be triggered until there is at least one `space` outside the multi-line comment):
![codegen from multi-line comments](../../../../assets/img/codegen_multi_line.png)
- Automatically complete multi-line comments:
![auto complete](../../../../assets/img/codegen_auto_complete.jpg)
### Chat with AI assistant
You can start a conversation with the AI programming assistant by clicking on the robot icon in the plugin bar on the left:
![icon](../../../../assets/img/codegen_icon.png)
Then you can see the conversation window on the left, where you can chat with the AI assistant:
![dialog](../../../../assets/img/codegen_dialog.png)
There are 4 areas worth noting as shown in the screenshot above:
1. Enter and submit your question
2. Your previous questions
3. Answers from AI assistant (Code will be highlighted properly according to the programming language it is written in, also support stream output)
4. Copy or replace code with one click (Note that you need to select the code in the editor first and then click "replace", otherwise the code will be inserted)
You can also select the code in the editor and ask the AI assistant questions about the code directly.
For example:
- Select code
![select code](../../../../assets/img/codegen_select_code.png)
- Ask question and get answer
![qna](../../../../assets/img/codegen_qna.png)

@@ -1,40 +1,130 @@
# Deploy CodeGen using Kubernetes Microservices Connector (GMC)

This document guides you through deploying the CodeGen application on a Kubernetes cluster using the OPEA Microservices Connector (GMC). If you have not already done so, install GMC in your cluster by following the steps in the "Getting Started" section of the [GMC install guide](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/README.md).

## Table of Contents

- [Purpose](#purpose)
- [Prerequisites](#prerequisites)
- [Deployment Steps](#deployment-steps)
  - [1. Choose Configuration](#1-choose-configuration)
  - [2. Prepare Namespace and Deploy](#2-prepare-namespace-and-deploy)
  - [3. Verify Pod Status](#3-verify-pod-status)
- [Validation Steps](#validation-steps)
  - [1. Deploy Test Client](#1-deploy-test-client)
  - [2. Get Service URL](#2-get-service-url)
  - [3. Send Test Request](#3-send-test-request)
- [Cleanup](#cleanup)

## Purpose

To deploy the multi-component CodeGen application on Kubernetes, leveraging GMC to manage the connections and routing between microservices based on a declarative configuration file.

## Prerequisites

- A running Kubernetes cluster.
- `kubectl` installed and configured to interact with your cluster.
- [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/README.md) installed in your cluster. Follow the GMC installation guide if you haven't already.
- Access to the container images specified in the GMC configuration files (`codegen_xeon.yaml` or `codegen_gaudi.yaml`). These may be on Docker Hub or a private registry.

## Deployment Steps
### 1. Choose Configuration
Two GMC configuration files are provided based on the target hardware for the LLM serving component:
- `codegen_xeon.yaml`: Deploys CodeGen using CPU-optimized components (suitable for Intel Xeon clusters).
- `codegen_gaudi.yaml`: Deploys CodeGen using Gaudi-optimized LLM serving components (suitable for clusters with Intel Gaudi nodes).
Select the file appropriate for your cluster hardware. The following steps use `codegen_xeon.yaml` as an example.
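Optionally, validate the selected manifest on the client side before applying it (no resources are created):

```bash
# Client-side dry run of the chosen GMC configuration
kubectl apply --dry-run=client -f ./codegen_xeon.yaml
```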
### 2. Prepare Namespace and Deploy
Choose a namespace for the deployment (e.g., `codegen-app`).
```bash
# Set the desired namespace
export APP_NAMESPACE=codegen-app
# Create the namespace if it doesn't exist
kubectl create ns $APP_NAMESPACE || true
# (Optional) Update the namespace within the chosen YAML file if it's not parameterized
# sed -i "s|namespace: codegen|namespace: $APP_NAMESPACE|g" ./codegen_xeon.yaml
# Apply the GMC configuration file to the chosen namespace
kubectl apply -f ./codegen_xeon.yaml -n $APP_NAMESPACE
```
*Note: If the YAML file uses a hardcoded namespace, ensure you either modify the file or deploy to that specific namespace.*
### 3. Verify Pod Status
Check that all the pods defined in the GMC configuration are successfully created and running.
```bash
kubectl get pods -n $APP_NAMESPACE
```
Wait until all pods are in the `Running` state. This might take some time for image pulling and initialization.
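Rather than polling manually, you can block until all pods report Ready (a sketch; adjust the timeout to your cluster and model download speed):

```bash
# Wait up to 15 minutes for every pod in the namespace to become Ready
kubectl wait --for=condition=Ready pod --all -n $APP_NAMESPACE --timeout=900s
```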
## Validation Steps
### 1. Deploy Test Client
Deploy a simple pod within the same namespace to use as a client for sending requests.
```bash
kubectl create deployment client-test -n $APP_NAMESPACE --image=curlimages/curl -- sleep infinity
```
Verify the client pod is running:
```bash
kubectl get pods -n $APP_NAMESPACE -l app=client-test
```
### 2. Get Service URL
Retrieve the access URL exposed by the GMC for the CodeGen application.
```bash
# Get the client pod name
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath='{.items[0].metadata.name}')
# Get the access URL provided by the 'codegen' GMC resource
# Adjust 'codegen' if the metadata.name in your YAML is different
export ACCESS_URL=$(kubectl get gmc codegen -n $APP_NAMESPACE -o jsonpath='{.status.accessUrl}')
# Display the URL (optional)
echo "Access URL: $ACCESS_URL"
```
*Note: The `accessUrl` typically points to the internal Kubernetes service endpoint for the gateway service defined in the GMC configuration.*
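If `ACCESS_URL` comes back empty, inspecting the full GMC resource can show what has (or has not) been reconciled yet:

```bash
# Dump the 'codegen' GMC resource, including its status fields
kubectl get gmc codegen -n $APP_NAMESPACE -o yaml
```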
### 3. Send Test Request
Use the test client pod to send a `curl` request to the CodeGen service endpoint.
```bash
# Define the payload
PAYLOAD='{"messages": "def print_hello_world():"}'
# Execute curl from the client pod
kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl -s --no-buffer "$ACCESS_URL" \
-X POST \
-d "$PAYLOAD" \
-H 'Content-Type: application/json'
```
**Expected Output:** A stream of JSON data containing the generated code, similar to the Docker Compose validation, ending with a `[DONE]` marker if streaming is enabled.
## Cleanup
To remove the deployed application and the test client:
```bash
# Delete the GMC deployment
kubectl delete -f ./codegen_xeon.yaml -n $APP_NAMESPACE
# Delete the test client deployment
kubectl delete deployment client-test -n $APP_NAMESPACE
# Optionally delete the namespace if no longer needed
# kubectl delete ns $APP_NAMESPACE
```

# Deploy CodeGen on Kubernetes using Helm
This guide explains how to deploy the CodeGen application to a Kubernetes cluster using the official OPEA Helm chart.
## Table of Contents
- [Purpose](#purpose)
- [Prerequisites](#prerequisites)
- [Deployment Steps](#deployment-steps)
- [1. Set Hugging Face Token](#1-set-hugging-face-token)
- [2. Choose Hardware Configuration](#2-choose-hardware-configuration)
- [3. Install Helm Chart](#3-install-helm-chart)
- [Verify Deployment](#verify-deployment)
- [Accessing the Service](#accessing-the-service)
- [Customization](#customization)
- [Uninstalling the Chart](#uninstalling-the-chart)
## Purpose
To provide a standardized and configurable method for deploying the CodeGen application and its microservice dependencies onto Kubernetes using Helm.
## Prerequisites
- A running Kubernetes cluster.
- `kubectl` installed and configured to interact with your cluster.
- Helm (version >= 3.15) installed. Refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) if needed.
- Network access from your cluster nodes to download container images (from `ghcr.io/opea-project` and Hugging Face) and models.
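A quick way to confirm the tooling and cluster access before proceeding:

```bash
# Helm should report version >= 3.15
helm version --short
# kubectl should reach the cluster and list nodes
kubectl get nodes
```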
## Deployment Steps
### 1. Set Hugging Face Token
The chart requires your Hugging Face Hub API token to download models. Set it as an environment variable:
```bash
export HFTOKEN="your-huggingface-api-token-here"
```
export HFTOKEN="insert-your-huggingface-token-here"
helm install codegen oci://ghcr.io/opea-project/charts/codegen --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f cpu-values.yaml
Replace `your-huggingface-api-token-here` with your actual token.
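If you want to sanity-check the token before deploying, the Hugging Face Hub exposes a `whoami` endpoint (path assumed from the current Hub API; adjust if it changes):

```bash
# Returns your account details if the token is valid, an error otherwise
curl -s -H "Authorization: Bearer ${HFTOKEN}" https://huggingface.co/api/whoami-v2
```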
### 2. Choose Hardware Configuration
The CodeGen Helm chart supports different hardware configurations using values files:
- **Intel Xeon CPU:** Use `cpu-values.yaml` (located within the chart structure, or provide your own). This is suitable for general Kubernetes clusters without specific accelerators.
- **Intel Gaudi HPU:** Use `gaudi-values.yaml` (located within the chart structure, or provide your own). This requires nodes with Gaudi devices and the appropriate Kubernetes device plugins configured.
### 3. Install Helm Chart
Install the CodeGen chart from the OPEA OCI registry, providing your Hugging Face token and selecting the appropriate values file.
**Deploy on Xeon (CPU):**
```bash
helm install codegen oci://ghcr.io/opea-project/charts/codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
-f cpu-values.yaml \
--namespace codegen --create-namespace
```
*Note: `-f cpu-values.yaml` assumes a file named `cpu-values.yaml` exists locally or you are referencing one within the chart structure accessible to Helm. You might need to download it first or customize parameters directly using `--set`.*
**Deploy on Gaudi (HPU):**
```bash
helm install codegen oci://ghcr.io/opea-project/charts/codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
-f gaudi-values.yaml \
--namespace codegen --create-namespace
```
*Note: `-f gaudi-values.yaml` has the same assumption as above. Ensure your cluster meets Gaudi prerequisites.*
*The command installs the chart into the `codegen` namespace, creating it if necessary. Change `--namespace` if desired.*
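If the values files are not present locally, a standard Helm workflow is to pull the chart (or just dump its default values) first; this is generic Helm usage rather than anything specific to this chart:

```bash
# Download and unpack the chart locally (the values files, if shipped with the chart, will be in the unpacked directory)
helm pull oci://ghcr.io/opea-project/charts/codegen --untar
# Or inspect the default values without unpacking
helm show values oci://ghcr.io/opea-project/charts/codegen > default-values.yaml
```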
## Verify Deployment
Check the status of the pods created by the Helm release:
```bash
kubectl get pods -n codegen
```
Wait for all pods (e.g., codegen-gateway, codegen-llm, codegen-embedding, redis) to reach the `Running` state. Check the logs if any pod encounters issues:
```bash
kubectl logs -n codegen <pod-name>
```
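Helm can also summarize the release as a whole, which is handy when it is not obvious which pod to inspect:

```bash
# Overall status of the Helm release
helm status codegen -n codegen
# All Kubernetes objects created by the release
kubectl get all -n codegen
```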
## Accessing the Service
By default, the Helm chart typically exposes the CodeGen gateway service via a Kubernetes `Service` of type `ClusterIP` or `LoadBalancer`, depending on the chart's values.
- **If `ClusterIP`:** Access is typically internal to the cluster or requires port-forwarding:
```bash
# Find the service name (e.g., codegen-gateway)
kubectl get svc -n codegen
# Forward local port 7778 to the service port (usually 7778)
kubectl port-forward svc/<service-name> -n codegen 7778:7778
# Access via curl on localhost:7778
curl http://localhost:7778/v1/codegen -H "Content-Type: application/json" -d '{"messages": "Test"}'
```
- **If `LoadBalancer`:** Obtain the external IP address assigned by your cloud provider:
```bash
# Get the external IP assigned by your cloud provider
export EXTERNAL_IP=$(kubectl get svc -n codegen <service-name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Access using the external IP and service port (e.g., 7778)
curl http://${EXTERNAL_IP}:7778/v1/codegen -H "Content-Type: application/json" -d '{"messages": "Test"}'
```
Refer to the chart's documentation or `values.yaml` for specifics on service exposure. The UI service might also be exposed similarly (check for a UI-related service).
## Customization
You can customize the deployment by:
- Modifying the `cpu-values.yaml` or `gaudi-values.yaml` file before installation.
- Overriding parameters using the `--set` flag during `helm install`. Example:
```bash
helm install codegen oci://ghcr.io/opea-project/charts/codegen \
--set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
--namespace codegen --create-namespace
# Add other --set overrides or -f <your-custom-values.yaml>
```
- Refer to the [OPEA Helm Charts README](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts#readme) for detailed information on available configuration options within the charts.
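If you change values after the initial install, the release can be updated in place with `helm upgrade`. A minimal sketch, assuming your overrides live in a local file named `my-values.yaml` (a hypothetical name):

```bash
# Apply updated values to the existing release without reinstalling
helm upgrade codegen oci://ghcr.io/opea-project/charts/codegen \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  -f my-values.yaml \
  --namespace codegen
```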
## Uninstalling the Chart
To remove the CodeGen deployment installed by Helm:
```bash
helm uninstall codegen -n codegen
```
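To confirm the release is gone and nothing is left running before removing the namespace:

```bash
# The release should no longer be listed
helm list -n codegen
# Any remaining pods should be terminating or absent
kubectl get pods -n codegen
```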
export HFTOKEN="insert-your-huggingface-token-here"
helm install codegen oci://ghcr.io/opea-project/charts/codegen --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
Optionally, delete the namespace if it's no longer needed and empty:
```bash
# kubectl delete ns codegen
```