Enable vllm for CodeTrans (#1626)
Set vllm as default llm serving, and add related docker compose files, readmes, and test scripts. Issue: https://github.com/opea-project/GenAIExamples/issues/1436 Signed-off-by: letonghan <letong.han@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@@ -2,6 +2,8 @@

This document outlines the deployment process for a CodeTrans application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution using the `llm` microservice. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.

The default pipeline deploys with vLLM as the LLM serving component. It also provides the option of using the TGI backend for the LLM microservice; please refer to the [start-microservice-docker-containers](#start-microservice-docker-containers) section of this page.

## 🚀 Create an AWS Xeon Instance

To run the example on an AWS Xeon instance, start by creating an AWS account if you don't have one already. Then, get started with the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home). AWS EC2 M7i, C7i, C7i-flex and M7i-flex instances are based on Intel Xeon Scalable processors (code named Sapphire Rapids) and are suitable for this task.

@@ -63,6 +65,37 @@ By default, the LLM model is set to a default value as listed below:

Change the `LLM_MODEL_ID` below for your needs.

For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. vLLM/TGI can load the models either online or offline as described below:

1. Online

```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="mistralai/Mistral-7B-Instruct-v0.3"
# Start vLLM LLM Service
docker run -p 8008:80 -v ./data:/data --name vllm-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 128g opea/vllm:latest --model $model_name --host 0.0.0.0 --port 80
# Start TGI LLM Service
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id $model_name
```

2. Offline

- Search your model name in ModelScope. For example, check [this page](https://www.modelscope.cn/models/rubraAI/Mistral-7B-Instruct-v0.3/files) for the model `mistralai/Mistral-7B-Instruct-v0.3`.

- Click the `Download this model` button, and choose a way to download the model to your local path `/path/to/model`.

- Run the following command to start the LLM service.

```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
# Start vLLM LLM Service
docker run -p 8008:80 -v $model_path:/data --name vllm-service --shm-size 128g opea/vllm:latest --model /data --host 0.0.0.0 --port 80
# Start TGI LLM Service
docker run -p 8008:80 -v $model_path:/data --name tgi-service --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id /data
```

### Setup Environment Variables

1. Set the required environment variables:
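
   For reference, a minimal sketch of the exports this deployment expects is shown below, mirroring the values used in this PR's test scripts; `your_hf_token` is a placeholder and `host_ip` is assumed to be an address reachable by the containers.

   ```bash
   # Host address the services bind their endpoints to (assumption: first local IP is reachable by the containers)
   export host_ip=$(hostname -I | awk '{print $1}')
   export HUGGINGFACEHUB_API_TOKEN=${your_hf_token}
   export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
   export LLM_ENDPOINT="http://${host_ip}:8008"
   export LLM_COMPONENT_NAME="OpeaTextGenService"
   export MEGA_SERVICE_HOST_IP=${host_ip}
   export LLM_SERVICE_HOST_IP=${host_ip}
   export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
   ```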
@@ -95,15 +128,47 @@ Change the `LLM_MODEL_ID` below for your needs.

```bash
cd GenAIExamples/CodeTrans/docker_compose/intel/cpu/xeon
docker compose up -d
```

If using vLLM as the LLM serving backend:

```bash
docker compose -f compose.yaml up -d
```

If using TGI as the LLM serving backend:

```bash
docker compose -f compose_tgi.yaml up -d
```

### Validate Microservices

1. TGI Service
1. LLM backend Service

In the first startup, this service will take more time to download, load, and warm up the model. After it finishes, the service will be ready.

Try the command below to check whether the LLM serving is ready.

```bash
curl http://${host_ip}:8008/generate \
# vLLM service
docker logs codetrans-xeon-vllm-service 2>&1 | grep complete
# If the service is ready, you will get the response like below.
INFO: Application startup complete.
```

```bash
# TGI service
docker logs codetrans-xeon-tgi-service | grep Connected
# If the service is ready, you will get the response like below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```

Then try the `cURL` command below to validate services.

```bash
# either vLLM or TGI service
curl http://${host_ip}:8008/v1/chat/completions \
  -X POST \
  -d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

@@ -2,9 +2,9 @@
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
services:
|
||||
tgi-service:
|
||||
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
|
||||
container_name: codetrans-tgi-service
|
||||
vllm-service:
|
||||
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
|
||||
container_name: codetrans-xeon-vllm-service
|
||||
ports:
|
||||
- "8008:80"
|
||||
volumes:
|
||||
@@ -15,18 +15,19 @@ services:
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
host_ip: ${host_ip}
|
||||
LLM_MODEL_ID: ${LLM_MODEL_ID}
|
||||
VLLM_TORCH_PROFILER_DIR: "/mnt"
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "curl -f http://$host_ip:8008/health || exit 1"]
|
||||
interval: 10s
|
||||
timeout: 10s
|
||||
retries: 100
|
||||
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
|
||||
command: --model $LLM_MODEL_ID --host 0.0.0.0 --port 80
|
||||
llm:
|
||||
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
|
||||
container_name: llm-textgen-server
|
||||
container_name: codetrans-xeon-llm-server
|
||||
depends_on:
|
||||
tgi-service:
|
||||
vllm-service:
|
||||
condition: service_healthy
|
||||
ports:
|
||||
- "9000:9000"
|
||||
@@ -35,18 +36,19 @@ services:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
|
||||
LLM_ENDPOINT: ${LLM_ENDPOINT}
|
||||
LLM_MODEL_ID: ${LLM_MODEL_ID}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME}
|
||||
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
restart: unless-stopped
|
||||
codetrans-xeon-backend-server:
|
||||
image: ${REGISTRY:-opea}/codetrans:${TAG:-latest}
|
||||
container_name: codetrans-xeon-backend-server
|
||||
depends_on:
|
||||
- tgi-service
|
||||
- vllm-service
|
||||
- llm
|
||||
ports:
|
||||
- "7777:7777"
|
||||
- "${BACKEND_SERVICE_PORT:-7777}:7777"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
@@ -61,7 +63,7 @@ services:
|
||||
depends_on:
|
||||
- codetrans-xeon-backend-server
|
||||
ports:
|
||||
- "5173:5173"
|
||||
- "${FRONTEND_SERVICE_PORT:-5173}:5173"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
|
||||
CodeTrans/docker_compose/intel/cpu/xeon/compose_tgi.yaml (new file, 95 lines)
@@ -0,0 +1,95 @@
|
||||
# Copyright (C) 2024 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
services:
|
||||
tgi-service:
|
||||
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
|
||||
container_name: codetrans-xeon-tgi-service
|
||||
ports:
|
||||
- "8008:80"
|
||||
volumes:
|
||||
- "${MODEL_CACHE}:/data"
|
||||
shm_size: 1g
|
||||
environment:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
host_ip: ${host_ip}
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "curl -f http://$host_ip:8008/health || exit 1"]
|
||||
interval: 10s
|
||||
timeout: 10s
|
||||
retries: 100
|
||||
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
|
||||
llm:
|
||||
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
|
||||
container_name: codetrans-xeon-llm-server
|
||||
depends_on:
|
||||
tgi-service:
|
||||
condition: service_healthy
|
||||
ports:
|
||||
- "9000:9000"
|
||||
ipc: host
|
||||
environment:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
LLM_ENDPOINT: ${LLM_ENDPOINT}
|
||||
LLM_MODEL_ID: ${LLM_MODEL_ID}
|
||||
LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
restart: unless-stopped
|
||||
codetrans-xeon-backend-server:
|
||||
image: ${REGISTRY:-opea}/codetrans:${TAG:-latest}
|
||||
container_name: codetrans-xeon-backend-server
|
||||
depends_on:
|
||||
- tgi-service
|
||||
- llm
|
||||
ports:
|
||||
- "${BACKEND_SERVICE_PORT:-7777}:7777"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
- http_proxy=${http_proxy}
|
||||
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
|
||||
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
|
||||
ipc: host
|
||||
restart: always
|
||||
codetrans-xeon-ui-server:
|
||||
image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest}
|
||||
container_name: codetrans-xeon-ui-server
|
||||
depends_on:
|
||||
- codetrans-xeon-backend-server
|
||||
ports:
|
||||
- "${FRONTEND_SERVICE_PORT:-5173}:5173"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
- http_proxy=${http_proxy}
|
||||
- BASE_URL=${BACKEND_SERVICE_ENDPOINT}
|
||||
ipc: host
|
||||
restart: always
|
||||
codetrans-xeon-nginx-server:
|
||||
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
|
||||
container_name: codetrans-xeon-nginx-server
|
||||
depends_on:
|
||||
- codetrans-xeon-backend-server
|
||||
- codetrans-xeon-ui-server
|
||||
ports:
|
||||
- "${NGINX_PORT:-80}:80"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
- http_proxy=${http_proxy}
|
||||
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
|
||||
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
|
||||
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
|
||||
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
|
||||
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
|
||||
ipc: host
|
||||
restart: always
|
||||
|
||||
networks:
|
||||
default:
|
||||
driver: bridge
|
||||
@@ -2,6 +2,8 @@

This document outlines the deployment process for a CodeTrans application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution using the `llm` microservice. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.

The default pipeline deploys with vLLM as the LLM serving component. It also provides the option of using the TGI backend for the LLM microservice; please refer to the [start-microservice-docker-containers](#start-microservice-docker-containers) section of this page.

## 🚀 Build Docker Images

First, you need to build the Docker images locally and install the required Python package. This step can be skipped after the Docker images are published to Docker Hub.

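As a rough reference, the image build flow used by this PR's Gaudi test script looks like the sketch below; the `GenAIExamples/CodeTrans` checkout path is an assumption, while the vllm-fork tag and service list are taken from the test script and may change over time.

```bash
# Clone the build dependencies next to the build.yaml used by this example (path is an assumption)
cd GenAIExamples/CodeTrans/docker_image_build
git clone --depth 1 https://github.com/opea-project/GenAIComps.git
git clone --depth 1 --branch v0.6.4.post2+Gaudi-1.19.0 https://github.com/HabanaAI/vllm-fork.git
# Build the CodeTrans images (see build.yaml for the full service definitions)
docker compose -f build.yaml build codetrans codetrans-ui llm-textgen vllm-gaudi nginx --no-cache
```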
@@ -55,6 +57,37 @@ By default, the LLM model is set to a default value as listed below:

Change the `LLM_MODEL_ID` below for your needs.

For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. vLLM/TGI can load the models either online or offline as described below:

1. Online

```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="mistralai/Mistral-7B-Instruct-v0.3"
# Start vLLM LLM Service
docker run -p 8008:80 -v ./data:/data --name vllm-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 128g opea/vllm:latest --model $model_name --host 0.0.0.0 --port 80
# Start TGI LLM Service
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id $model_name
```

2. Offline

- Search your model name in ModelScope. For example, check [this page](https://www.modelscope.cn/models/rubraAI/Mistral-7B-Instruct-v0.3/files) for the model `mistralai/Mistral-7B-Instruct-v0.3`.

- Click the `Download this model` button, and choose a way to download the model to your local path `/path/to/model`.

- Run the following command to start the LLM service.

```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
# Start vLLM LLM Service
docker run -p 8008:80 -v $model_path:/data --name vllm-service --shm-size 128g opea/vllm:latest --model /data --host 0.0.0.0 --port 80
# Start TGI LLM Service
docker run -p 8008:80 -v $model_path:/data --name tgi-service --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id /data
```

### Setup Environment Variables

1. Set the required environment variables:
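
   For reference, a minimal sketch of the exports the Gaudi pipeline expects, mirroring the set_env changes in this PR; adjust the token, `host_ip`, and the vLLM tuning values for your environment.

   ```bash
   # Host address the services bind their endpoints to (assumption: first local IP is reachable by the containers)
   export host_ip=$(hostname -I | awk '{print $1}')
   export HUGGINGFACEHUB_API_TOKEN=${your_hf_token}
   export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
   export LLM_ENDPOINT="http://${host_ip}:8008"
   export LLM_COMPONENT_NAME="OpeaTextGenService"
   export NUM_CARDS=1
   export BLOCK_SIZE=128
   export MAX_NUM_SEQS=256
   export MAX_SEQ_LEN_TO_CAPTURE=2048
   export MEGA_SERVICE_HOST_IP=${host_ip}
   export LLM_SERVICE_HOST_IP=${host_ip}
   export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
   ```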
@@ -87,12 +120,43 @@ Change the `LLM_MODEL_ID` below for your needs.

```bash
cd GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi
docker compose up -d
```

If using vLLM as the LLM serving backend:

```bash
docker compose -f compose.yaml up -d
```

If using TGI as the LLM serving backend:

```bash
docker compose -f compose_tgi.yaml up -d
```

### Validate Microservices

1. TGI Service
1. LLM backend Service

In the first startup, this service will take more time to download, load, and warm up the model. After it finishes, the service will be ready.

Try the command below to check whether the LLM serving is ready.

```bash
# vLLM service
docker logs codetrans-gaudi-vllm-service 2>&1 | grep complete
# If the service is ready, you will get the response like below.
INFO: Application startup complete.
```

```bash
# TGI service
docker logs codetrans-gaudi-tgi-service | grep Connected
# If the service is ready, you will get the response like below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```

Then try the `cURL` command below to validate services.

```bash
curl http://${host_ip}:8008/generate \
```

@@ -2,9 +2,9 @@
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
services:
|
||||
tgi-service:
|
||||
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
|
||||
container_name: codetrans-tgi-service
|
||||
vllm-service:
|
||||
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
|
||||
container_name: codetrans-gaudi-vllm-service
|
||||
ports:
|
||||
- "8008:80"
|
||||
volumes:
|
||||
@@ -13,28 +13,27 @@ services:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
HABANA_VISIBLE_DEVICES: all
|
||||
OMPI_MCA_btl_vader_single_copy_mechanism: none
|
||||
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
ENABLE_HPU_GRAPH: true
|
||||
LIMIT_HPU_GRAPH: true
|
||||
USE_FLASH_ATTENTION: true
|
||||
FLASH_ATTENTION_RECOMPUTE: true
|
||||
LLM_MODEL_ID: ${LLM_MODEL_ID}
|
||||
NUM_CARDS: ${NUM_CARDS}
|
||||
VLLM_TORCH_PROFILER_DIR: "/mnt"
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "sleep 500 && exit 0"]
|
||||
interval: 1s
|
||||
timeout: 505s
|
||||
retries: 1
|
||||
test: ["CMD-SHELL", "curl -f http://$host_ip:8008/health || exit 1"]
|
||||
interval: 10s
|
||||
timeout: 10s
|
||||
retries: 100
|
||||
runtime: habana
|
||||
cap_add:
|
||||
- SYS_NICE
|
||||
ipc: host
|
||||
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
|
||||
command: --model $LLM_MODEL_ID --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
|
||||
llm:
|
||||
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
|
||||
container_name: llm-textgen-gaudi-server
|
||||
container_name: codetrans-xeon-llm-server
|
||||
depends_on:
|
||||
tgi-service:
|
||||
vllm-service:
|
||||
condition: service_healthy
|
||||
ports:
|
||||
- "9000:9000"
|
||||
@@ -43,18 +42,19 @@ services:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
|
||||
LLM_ENDPOINT: ${LLM_ENDPOINT}
|
||||
LLM_MODEL_ID: ${LLM_MODEL_ID}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME}
|
||||
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
restart: unless-stopped
|
||||
codetrans-gaudi-backend-server:
|
||||
image: ${REGISTRY:-opea}/codetrans:${TAG:-latest}
|
||||
container_name: codetrans-gaudi-backend-server
|
||||
depends_on:
|
||||
- tgi-service
|
||||
- vllm-service
|
||||
- llm
|
||||
ports:
|
||||
- "7777:7777"
|
||||
- "${BACKEND_SERVICE_PORT:-7777}:7777"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
@@ -69,7 +69,7 @@ services:
|
||||
depends_on:
|
||||
- codetrans-gaudi-backend-server
|
||||
ports:
|
||||
- "5173:5173"
|
||||
- "${FRONTEND_SERVICE_PORT:-5173}:5173"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
|
||||
CodeTrans/docker_compose/intel/hpu/gaudi/compose_tgi.yaml (new file, 99 lines)
@@ -0,0 +1,99 @@
|
||||
# Copyright (C) 2024 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
services:
|
||||
tgi-service:
|
||||
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
|
||||
container_name: codetrans-gaudi-tgi-service
|
||||
ports:
|
||||
- "8008:80"
|
||||
volumes:
|
||||
- "${MODEL_CACHE}:/data"
|
||||
environment:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
HF_HUB_DISABLE_PROGRESS_BARS: 1
|
||||
HF_HUB_ENABLE_HF_TRANSFER: 0
|
||||
HABANA_VISIBLE_DEVICES: all
|
||||
OMPI_MCA_btl_vader_single_copy_mechanism: none
|
||||
ENABLE_HPU_GRAPH: true
|
||||
LIMIT_HPU_GRAPH: true
|
||||
USE_FLASH_ATTENTION: true
|
||||
FLASH_ATTENTION_RECOMPUTE: true
|
||||
runtime: habana
|
||||
cap_add:
|
||||
- SYS_NICE
|
||||
ipc: host
|
||||
command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096
|
||||
llm:
|
||||
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
|
||||
container_name: codetrans-gaudi-llm-server
|
||||
depends_on:
|
||||
- tgi-service
|
||||
ports:
|
||||
- "9000:9000"
|
||||
ipc: host
|
||||
environment:
|
||||
no_proxy: ${no_proxy}
|
||||
http_proxy: ${http_proxy}
|
||||
https_proxy: ${https_proxy}
|
||||
LLM_ENDPOINT: ${LLM_ENDPOINT}
|
||||
LLM_MODEL_ID: ${LLM_MODEL_ID}
|
||||
LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
restart: unless-stopped
|
||||
codetrans-gaudi-backend-server:
|
||||
image: ${REGISTRY:-opea}/codetrans:${TAG:-latest}
|
||||
container_name: codetrans-gaudi-backend-server
|
||||
depends_on:
|
||||
- tgi-service
|
||||
- llm
|
||||
ports:
|
||||
- "${BACKEND_SERVICE_PORT:-7777}:7777"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
- http_proxy=${http_proxy}
|
||||
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
|
||||
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
|
||||
ipc: host
|
||||
restart: always
|
||||
codetrans-gaudi-ui-server:
|
||||
image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest}
|
||||
container_name: codetrans-gaudi-ui-server
|
||||
depends_on:
|
||||
- codetrans-gaudi-backend-server
|
||||
ports:
|
||||
- "${FRONTEND_SERVICE_PORT:-5173}:5173"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
- http_proxy=${http_proxy}
|
||||
- BASE_URL=${BACKEND_SERVICE_ENDPOINT}
|
||||
ipc: host
|
||||
restart: always
|
||||
codetrans-gaudi-nginx-server:
|
||||
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
|
||||
container_name: codetrans-gaudi-nginx-server
|
||||
depends_on:
|
||||
- codetrans-gaudi-backend-server
|
||||
- codetrans-gaudi-ui-server
|
||||
ports:
|
||||
- "${NGINX_PORT:-80}:80"
|
||||
environment:
|
||||
- no_proxy=${no_proxy}
|
||||
- https_proxy=${https_proxy}
|
||||
- http_proxy=${http_proxy}
|
||||
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
|
||||
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
|
||||
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
|
||||
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
|
||||
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
|
||||
ipc: host
|
||||
restart: always
|
||||
|
||||
networks:
|
||||
default:
|
||||
driver: bridge
|
||||
@@ -8,7 +8,12 @@ popd > /dev/null
|
||||
|
||||
|
||||
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
|
||||
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
|
||||
export LLM_ENDPOINT="http://${host_ip}:8008"
|
||||
export LLM_COMPONENT_NAME="OpeaTextGenService"
|
||||
export NUM_CARDS=1
|
||||
export BLOCK_SIZE=128
|
||||
export MAX_NUM_SEQS=256
|
||||
export MAX_SEQ_LEN_TO_CAPTURE=2048
|
||||
export MEGA_SERVICE_HOST_IP=${host_ip}
|
||||
export LLM_SERVICE_HOST_IP=${host_ip}
|
||||
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
|
||||
|
||||
@@ -23,6 +23,18 @@ services:
|
||||
dockerfile: comps/llms/src/text-generation/Dockerfile
|
||||
extends: codetrans
|
||||
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
|
||||
vllm:
|
||||
build:
|
||||
context: vllm
|
||||
dockerfile: Dockerfile.cpu
|
||||
extends: codetrans
|
||||
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
|
||||
vllm-gaudi:
|
||||
build:
|
||||
context: vllm-fork
|
||||
dockerfile: Dockerfile.hpu
|
||||
extends: codetrans
|
||||
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
|
||||
nginx:
|
||||
build:
|
||||
context: GenAIComps
|
||||
|
||||
@@ -30,12 +30,12 @@ function build_docker_images() {
|
||||
|
||||
cd $WORKPATH/docker_image_build
|
||||
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
|
||||
git clone --depth 1 --branch v0.6.4.post2+Gaudi-1.19.0 https://github.com/HabanaAI/vllm-fork.git
|
||||
|
||||
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
|
||||
service_list="codetrans codetrans-ui llm-textgen nginx"
|
||||
service_list="codetrans codetrans-ui llm-textgen vllm-gaudi nginx"
|
||||
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
|
||||
|
||||
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
|
||||
docker images && sleep 1s
|
||||
}
|
||||
|
||||
@@ -45,7 +45,12 @@ function start_services() {
|
||||
export http_proxy=${http_proxy}
|
||||
export https_proxy=${http_proxy}
|
||||
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
|
||||
export TGI_LLM_ENDPOINT="http://${ip_address}:8008"
|
||||
export LLM_ENDPOINT="http://${ip_address}:8008"
|
||||
export LLM_COMPONENT_NAME="OpeaTextGenService"
|
||||
export NUM_CARDS=1
|
||||
export BLOCK_SIZE=128
|
||||
export MAX_NUM_SEQS=256
|
||||
export MAX_SEQ_LEN_TO_CAPTURE=2048
|
||||
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
|
||||
export MEGA_SERVICE_HOST_IP=${ip_address}
|
||||
export LLM_SERVICE_HOST_IP=${ip_address}
|
||||
@@ -65,13 +70,15 @@ function start_services() {
|
||||
|
||||
n=0
|
||||
until [[ "$n" -ge 100 ]]; do
|
||||
docker logs codetrans-tgi-service > ${LOG_PATH}/tgi_service_start.log
|
||||
if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
|
||||
docker logs codetrans-gaudi-vllm-service > ${LOG_PATH}/vllm_service_start.log 2>&1
|
||||
if grep -q complete ${LOG_PATH}/vllm_service_start.log; then
|
||||
break
|
||||
fi
|
||||
sleep 5s
|
||||
n=$((n+1))
|
||||
done
|
||||
|
||||
sleep 1m
|
||||
}
|
||||
|
||||
function validate_services() {
|
||||
@@ -103,27 +110,19 @@ function validate_services() {
|
||||
}
|
||||
|
||||
function validate_microservices() {
|
||||
# tgi for embedding service
|
||||
validate_services \
|
||||
"${ip_address}:8008/generate" \
|
||||
"generated_text" \
|
||||
"tgi" \
|
||||
"codetrans-tgi-service" \
|
||||
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
|
||||
|
||||
# llm microservice
|
||||
validate_services \
|
||||
"${ip_address}:9000/v1/chat/completions" \
|
||||
"data: " \
|
||||
"llm" \
|
||||
"llm-textgen-gaudi-server" \
|
||||
"codetrans-xeon-llm-server" \
|
||||
'{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}'
|
||||
}
|
||||
|
||||
function validate_megaservice() {
|
||||
# Curl the Mega Service
|
||||
validate_services \
|
||||
"${ip_address}:7777/v1/codetrans" \
|
||||
"${ip_address}:${BACKEND_SERVICE_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans" \
|
||||
"codetrans-gaudi-backend-server" \
|
||||
@@ -131,7 +130,7 @@ function validate_megaservice() {
|
||||
|
||||
# test the megaservice via nginx
|
||||
validate_services \
|
||||
"${ip_address}:80/v1/codetrans" \
|
||||
"${ip_address}:${NGINX_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans-nginx" \
|
||||
"codetrans-gaudi-nginx-server" \
|
||||
@@ -170,7 +169,7 @@ function validate_frontend() {
|
||||
|
||||
function stop_docker() {
|
||||
cd $WORKPATH/docker_compose/intel/hpu/gaudi
|
||||
docker compose stop && docker compose rm -f
|
||||
docker compose -f compose.yaml stop && docker compose rm -f
|
||||
}
|
||||
|
||||
function main() {
|
||||
|
||||
@@ -30,12 +30,16 @@ function build_docker_images() {
|
||||
|
||||
cd $WORKPATH/docker_image_build
|
||||
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
|
||||
git clone https://github.com/vllm-project/vllm.git && cd vllm
|
||||
VLLM_VER="$(git describe --tags "$(git rev-list --tags --max-count=1)" )"
|
||||
echo "Check out vLLM tag ${VLLM_VER}"
|
||||
git checkout ${VLLM_VER} &> /dev/null
|
||||
cd ../
|
||||
|
||||
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
|
||||
service_list="codetrans codetrans-ui llm-textgen nginx"
|
||||
service_list="codetrans codetrans-ui llm-textgen vllm nginx"
|
||||
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
|
||||
|
||||
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
|
||||
docker images && sleep 1s
|
||||
}
|
||||
|
||||
@@ -44,7 +48,8 @@ function start_services() {
|
||||
export http_proxy=${http_proxy}
|
||||
export https_proxy=${http_proxy}
|
||||
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
|
||||
export TGI_LLM_ENDPOINT="http://${ip_address}:8008"
|
||||
export LLM_ENDPOINT="http://${ip_address}:8008"
|
||||
export LLM_COMPONENT_NAME="OpeaTextGenService"
|
||||
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
|
||||
export MEGA_SERVICE_HOST_IP=${ip_address}
|
||||
export LLM_SERVICE_HOST_IP=${ip_address}
|
||||
@@ -60,17 +65,19 @@ function start_services() {
|
||||
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
|
||||
|
||||
# Start Docker Containers
|
||||
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
|
||||
docker compose -f compose.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
|
||||
|
||||
n=0
|
||||
until [[ "$n" -ge 100 ]]; do
|
||||
docker logs codetrans-tgi-service > ${LOG_PATH}/tgi_service_start.log
|
||||
if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
|
||||
docker logs codetrans-xeon-vllm-service > ${LOG_PATH}/vllm_service_start.log 2>&1
|
||||
if grep -q complete ${LOG_PATH}/vllm_service_start.log; then
|
||||
break
|
||||
fi
|
||||
sleep 5s
|
||||
n=$((n+1))
|
||||
done
|
||||
|
||||
sleep 1m
|
||||
}
|
||||
|
||||
function validate_services() {
|
||||
@@ -102,20 +109,12 @@ function validate_services() {
|
||||
}
|
||||
|
||||
function validate_microservices() {
|
||||
# tgi for embedding service
|
||||
validate_services \
|
||||
"${ip_address}:8008/generate" \
|
||||
"generated_text" \
|
||||
"tgi" \
|
||||
"codetrans-tgi-service" \
|
||||
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
|
||||
|
||||
# llm microservice
|
||||
validate_services \
|
||||
"${ip_address}:9000/v1/chat/completions" \
|
||||
"data: " \
|
||||
"llm" \
|
||||
"llm-textgen-server" \
|
||||
"codetrans-xeon-llm-server" \
|
||||
'{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}'
|
||||
|
||||
}
|
||||
@@ -123,7 +122,7 @@ function validate_microservices() {
|
||||
function validate_megaservice() {
|
||||
# Curl the Mega Service
|
||||
validate_services \
|
||||
"${ip_address}:7777/v1/codetrans" \
|
||||
"${ip_address}:${BACKEND_SERVICE_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans" \
|
||||
"codetrans-xeon-backend-server" \
|
||||
@@ -131,7 +130,7 @@ function validate_megaservice() {
|
||||
|
||||
# test the megaservice via nginx
|
||||
validate_services \
|
||||
"${ip_address}:80/v1/codetrans" \
|
||||
"${ip_address}:${NGINX_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans-nginx" \
|
||||
"codetrans-xeon-nginx-server" \
|
||||
@@ -169,7 +168,7 @@ function validate_frontend() {
|
||||
|
||||
function stop_docker() {
|
||||
cd $WORKPATH/docker_compose/intel/cpu/xeon/
|
||||
docker compose stop && docker compose rm -f
|
||||
docker compose -f compose.yaml stop && docker compose rm -f
|
||||
}
|
||||
|
||||
function main() {
|
||||
|
||||
CodeTrans/tests/test_compose_tgi_on_gaudi.sh (new file, 194 lines)
@@ -0,0 +1,194 @@
|
||||
#!/bin/bash
|
||||
# Copyright (C) 2024 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
set -xe
|
||||
IMAGE_REPO=${IMAGE_REPO:-"opea"}
|
||||
IMAGE_TAG=${IMAGE_TAG:-"latest"}
|
||||
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
|
||||
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
|
||||
export REGISTRY=${IMAGE_REPO}
|
||||
export TAG=${IMAGE_TAG}
|
||||
export MODEL_CACHE=${model_cache:-"./data"}
|
||||
|
||||
WORKPATH=$(dirname "$PWD")
|
||||
LOG_PATH="$WORKPATH/tests"
|
||||
ip_address=$(hostname -I | awk '{print $1}')
|
||||
|
||||
function build_docker_images() {
|
||||
opea_branch=${opea_branch:-"main"}
|
||||
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
|
||||
if [[ "${opea_branch}" != "main" ]]; then
|
||||
cd $WORKPATH
|
||||
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
|
||||
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
|
||||
find . -type f -name "Dockerfile*" | while read -r file; do
|
||||
echo "Processing file: $file"
|
||||
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
|
||||
done
|
||||
fi
|
||||
|
||||
cd $WORKPATH/docker_image_build
|
||||
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
|
||||
|
||||
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
|
||||
service_list="codetrans codetrans-ui llm-textgen nginx"
|
||||
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
|
||||
|
||||
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
|
||||
docker images && sleep 1s
|
||||
}
|
||||
|
||||
function start_services() {
|
||||
cd $WORKPATH/docker_compose/intel/hpu/gaudi/
|
||||
export http_proxy=${http_proxy}
|
||||
export https_proxy=${http_proxy}
|
||||
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
|
||||
export LLM_ENDPOINT="http://${ip_address}:8008"
|
||||
export LLM_COMPONENT_NAME="OpeaTextGenService"
|
||||
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
|
||||
export MEGA_SERVICE_HOST_IP=${ip_address}
|
||||
export LLM_SERVICE_HOST_IP=${ip_address}
|
||||
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7777/v1/codetrans"
|
||||
export FRONTEND_SERVICE_IP=${ip_address}
|
||||
export FRONTEND_SERVICE_PORT=5173
|
||||
export BACKEND_SERVICE_NAME=codetrans
|
||||
export BACKEND_SERVICE_IP=${ip_address}
|
||||
export BACKEND_SERVICE_PORT=7777
|
||||
export NGINX_PORT=80
|
||||
export host_ip=${ip_address}
|
||||
|
||||
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
|
||||
|
||||
# Start Docker Containers
|
||||
docker compose -f compose_tgi.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
|
||||
|
||||
n=0
|
||||
until [[ "$n" -ge 100 ]]; do
|
||||
docker logs codetrans-gaudi-tgi-service > ${LOG_PATH}/tgi_service_start.log
|
||||
if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
|
||||
break
|
||||
fi
|
||||
sleep 5s
|
||||
n=$((n+1))
|
||||
done
|
||||
|
||||
sleep 1m
|
||||
}
|
||||
|
||||
function validate_services() {
|
||||
local URL="$1"
|
||||
local EXPECTED_RESULT="$2"
|
||||
local SERVICE_NAME="$3"
|
||||
local DOCKER_NAME="$4"
|
||||
local INPUT_DATA="$5"
|
||||
|
||||
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
|
||||
if [ "$HTTP_STATUS" -eq 200 ]; then
|
||||
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
|
||||
|
||||
local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
|
||||
|
||||
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
|
||||
echo "[ $SERVICE_NAME ] Content is as expected."
|
||||
else
|
||||
echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
|
||||
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
|
||||
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
|
||||
exit 1
|
||||
fi
|
||||
sleep 5s
|
||||
}
|
||||
|
||||
function validate_microservices() {
|
||||
# tgi llm serving service
|
||||
validate_services \
|
||||
"${ip_address}:8008/generate" \
|
||||
"generated_text" \
|
||||
"tgi" \
|
||||
"codetrans-gaudi-tgi-service" \
|
||||
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
|
||||
|
||||
# llm microservice
|
||||
validate_services \
|
||||
"${ip_address}:9000/v1/chat/completions" \
|
||||
"data: " \
|
||||
"llm" \
|
||||
"codetrans-gaudi-llm-server" \
|
||||
'{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}'
|
||||
|
||||
}
|
||||
|
||||
function validate_megaservice() {
|
||||
# Curl the Mega Service
|
||||
validate_services \
|
||||
"${ip_address}:${BACKEND_SERVICE_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans" \
|
||||
"codetrans-gaudi-backend-server" \
|
||||
'{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
|
||||
|
||||
# test the megaservice via nginx
|
||||
validate_services \
|
||||
"${ip_address}:${NGINX_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans-nginx" \
|
||||
"codetrans-gaudi-nginx-server" \
|
||||
'{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
|
||||
|
||||
}
|
||||
|
||||
function validate_frontend() {
|
||||
cd $WORKPATH/ui/svelte
|
||||
local conda_env_name="OPEA_e2e"
|
||||
export PATH=${HOME}/miniforge3/bin/:$PATH
|
||||
if conda info --envs | grep -q "$conda_env_name"; then
|
||||
echo "$conda_env_name exist!"
|
||||
else
|
||||
conda create -n ${conda_env_name} python=3.12 -y
|
||||
fi
|
||||
source activate ${conda_env_name}
|
||||
|
||||
sed -i "s/localhost/$ip_address/g" playwright.config.ts
|
||||
|
||||
conda install -c conda-forge nodejs=22.6.0 -y
|
||||
npm install && npm ci && npx playwright install --with-deps
|
||||
node -v && npm -v && pip list
|
||||
|
||||
exit_status=0
|
||||
npx playwright test || exit_status=$?
|
||||
|
||||
if [ $exit_status -ne 0 ]; then
|
||||
echo "[TEST INFO]: ---------frontend test failed---------"
|
||||
exit $exit_status
|
||||
else
|
||||
echo "[TEST INFO]: ---------frontend test passed---------"
|
||||
fi
|
||||
}
|
||||
|
||||
function stop_docker() {
|
||||
cd $WORKPATH/docker_compose/intel/hpu/gaudi/
|
||||
docker compose -f compose_tgi.yaml stop && docker compose rm -f
|
||||
}
|
||||
|
||||
function main() {
|
||||
|
||||
stop_docker
|
||||
|
||||
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
|
||||
start_services
|
||||
|
||||
validate_microservices
|
||||
validate_megaservice
|
||||
validate_frontend
|
||||
|
||||
stop_docker
|
||||
echo y | docker system prune
|
||||
|
||||
}
|
||||
|
||||
main
|
||||
CodeTrans/tests/test_compose_tgi_on_xeon.sh (new file, 194 lines)
@@ -0,0 +1,194 @@
|
||||
#!/bin/bash
|
||||
# Copyright (C) 2024 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
set -xe
|
||||
IMAGE_REPO=${IMAGE_REPO:-"opea"}
|
||||
IMAGE_TAG=${IMAGE_TAG:-"latest"}
|
||||
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
|
||||
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
|
||||
export REGISTRY=${IMAGE_REPO}
|
||||
export TAG=${IMAGE_TAG}
|
||||
export MODEL_CACHE=${model_cache:-"./data"}
|
||||
|
||||
WORKPATH=$(dirname "$PWD")
|
||||
LOG_PATH="$WORKPATH/tests"
|
||||
ip_address=$(hostname -I | awk '{print $1}')
|
||||
|
||||
function build_docker_images() {
|
||||
opea_branch=${opea_branch:-"main"}
|
||||
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
|
||||
if [[ "${opea_branch}" != "main" ]]; then
|
||||
cd $WORKPATH
|
||||
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
|
||||
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
|
||||
find . -type f -name "Dockerfile*" | while read -r file; do
|
||||
echo "Processing file: $file"
|
||||
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
|
||||
done
|
||||
fi
|
||||
|
||||
cd $WORKPATH/docker_image_build
|
||||
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
|
||||
|
||||
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
|
||||
service_list="codetrans codetrans-ui llm-textgen nginx"
|
||||
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
|
||||
|
||||
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
|
||||
docker images && sleep 1s
|
||||
}
|
||||
|
||||
function start_services() {
|
||||
cd $WORKPATH/docker_compose/intel/cpu/xeon/
|
||||
export http_proxy=${http_proxy}
|
||||
export https_proxy=${http_proxy}
|
||||
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
|
||||
export LLM_ENDPOINT="http://${ip_address}:8008"
|
||||
export LLM_COMPONENT_NAME="OpeaTextGenService"
|
||||
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
|
||||
export MEGA_SERVICE_HOST_IP=${ip_address}
|
||||
export LLM_SERVICE_HOST_IP=${ip_address}
|
||||
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7777/v1/codetrans"
|
||||
export FRONTEND_SERVICE_IP=${ip_address}
|
||||
export FRONTEND_SERVICE_PORT=5173
|
||||
export BACKEND_SERVICE_NAME=codetrans
|
||||
export BACKEND_SERVICE_IP=${ip_address}
|
||||
export BACKEND_SERVICE_PORT=7777
|
||||
export NGINX_PORT=80
|
||||
export host_ip=${ip_address}
|
||||
|
||||
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
|
||||
|
||||
# Start Docker Containers
|
||||
docker compose -f compose_tgi.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
|
||||
|
||||
n=0
|
||||
until [[ "$n" -ge 100 ]]; do
|
||||
docker logs codetrans-xeon-tgi-service > ${LOG_PATH}/tgi_service_start.log
|
||||
if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
|
||||
break
|
||||
fi
|
||||
sleep 5s
|
||||
n=$((n+1))
|
||||
done
|
||||
|
||||
sleep 1m
|
||||
}
|
||||
|
||||
function validate_services() {
|
||||
local URL="$1"
|
||||
local EXPECTED_RESULT="$2"
|
||||
local SERVICE_NAME="$3"
|
||||
local DOCKER_NAME="$4"
|
||||
local INPUT_DATA="$5"
|
||||
|
||||
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
|
||||
if [ "$HTTP_STATUS" -eq 200 ]; then
|
||||
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
|
||||
|
||||
local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
|
||||
|
||||
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
|
||||
echo "[ $SERVICE_NAME ] Content is as expected."
|
||||
else
|
||||
echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
|
||||
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
|
||||
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
|
||||
exit 1
|
||||
fi
|
||||
sleep 5s
|
||||
}
|
||||
|
||||
function validate_microservices() {
|
||||
# tgi llm serving service
|
||||
validate_services \
|
||||
"${ip_address}:8008/generate" \
|
||||
"generated_text" \
|
||||
"tgi" \
|
||||
"codetrans-xeon-tgi-service" \
|
||||
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
|
||||
|
||||
# llm microservice
|
||||
validate_services \
|
||||
"${ip_address}:9000/v1/chat/completions" \
|
||||
"data: " \
|
||||
"llm" \
|
||||
"codetrans-xeon-llm-server" \
|
||||
'{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}'
|
||||
|
||||
}
|
||||
|
||||
function validate_megaservice() {
|
||||
# Curl the Mega Service
|
||||
validate_services \
|
||||
"${ip_address}:${BACKEND_SERVICE_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans" \
|
||||
"codetrans-xeon-backend-server" \
|
||||
'{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
|
||||
|
||||
# test the megaservice via nginx
|
||||
validate_services \
|
||||
"${ip_address}:${NGINX_PORT}/v1/codetrans" \
|
||||
"print" \
|
||||
"mega-codetrans-nginx" \
|
||||
"codetrans-xeon-nginx-server" \
|
||||
'{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
|
||||
|
||||
}
|
||||
|
||||
function validate_frontend() {
|
||||
cd $WORKPATH/ui/svelte
|
||||
local conda_env_name="OPEA_e2e"
|
||||
export PATH=${HOME}/miniforge3/bin/:$PATH
|
||||
if conda info --envs | grep -q "$conda_env_name"; then
|
||||
echo "$conda_env_name exist!"
|
||||
else
|
||||
conda create -n ${conda_env_name} python=3.12 -y
|
||||
fi
|
||||
source activate ${conda_env_name}
|
||||
|
||||
sed -i "s/localhost/$ip_address/g" playwright.config.ts
|
||||
|
||||
conda install -c conda-forge nodejs=22.6.0 -y
|
||||
npm install && npm ci && npx playwright install --with-deps
|
||||
node -v && npm -v && pip list
|
||||
|
||||
exit_status=0
|
||||
npx playwright test || exit_status=$?
|
||||
|
||||
if [ $exit_status -ne 0 ]; then
|
||||
echo "[TEST INFO]: ---------frontend test failed---------"
|
||||
exit $exit_status
|
||||
else
|
||||
echo "[TEST INFO]: ---------frontend test passed---------"
|
||||
fi
|
||||
}
|
||||
|
||||
function stop_docker() {
|
||||
cd $WORKPATH/docker_compose/intel/cpu/xeon/
|
||||
docker compose -f compose_tgi.yaml stop && docker compose rm -f
|
||||
}
|
||||
|
||||
function main() {
|
||||
|
||||
stop_docker
|
||||
|
||||
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
|
||||
start_services
|
||||
|
||||
validate_microservices
|
||||
validate_megaservice
|
||||
validate_frontend
|
||||
|
||||
stop_docker
|
||||
echo y | docker system prune
|
||||
|
||||
}
|
||||
|
||||
main