[ Translation ] Refine documents (#1795)
Signed-off-by: ZePan110 <ze.pan@intel.com>
@@ -1,8 +1,15 @@
|
|||||||
# Translation Application
|
# Translation Application
|
||||||
|
|
||||||
Language Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text.
|
The Translation example demonstrates the implementation of language translation using OPEA component-level microservices.
|
||||||
|
|
||||||
Translation architecture shows below:
|
## Table of contents
|
||||||
|
|
||||||
|
1. [Architecture](#architecture)
|
||||||
|
2. [Deployment Options](#deployment-options)
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
The architecture of the Translation Application is illustrated below:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
@@ -60,14 +67,12 @@ flowchart LR
|
|||||||
|
|
||||||
This Translation use case performs Language Translation Inference across multiple platforms. Currently, we provide the example for [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand OPEA ecosystem.
|
This Translation use case performs Language Translation Inference across multiple platforms. Currently, we provide examples for [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand the OPEA ecosystem.
|
||||||
|
|
||||||
## Deploy Translation Service
|
## Deployment Options
|
||||||
|
|
||||||
The Translation service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors.
|
The table below lists the available deployment options and their implementation details for different hardware platforms.
|
||||||
|
|
||||||
### Deploy Translation on Gaudi
|
| Platform | Deployment Method | Link |
|
||||||
|
| ------------ | ----------------- | ----------------------------------------------------------------- |
|
||||||
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for instructions on deploying Translation on Gaudi.
|
| Intel Xeon | Docker compose | [Deployment on Xeon](./docker_compose/intel/cpu/xeon/README.md) |
|
||||||
|
| Intel Gaudi2 | Docker compose | [Deployment on Gaudi](./docker_compose/intel/hpu/gaudi/README.md) |
|
||||||
### Deploy Translation on Xeon
|
| AMD ROCm     | Docker compose    | [Deployment on AMD ROCm](./docker_compose/amd/gpu/rocm/README.md) |
|
||||||
|
|
||||||
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for instructions on deploying Translation on Xeon.
|
|
||||||
|
|||||||
@@ -1,116 +1,149 @@
|
|||||||
# Build and deploy Translation Application on AMD GPU (ROCm)
|
# Example Translation Deployment on AMD GPU (ROCm)
|
||||||
|
|
||||||
## Build Docker Images
|
This document outlines the deployment process for a Translation service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD GPU (ROCm). This example includes the following sections:
|
||||||
|
|
||||||
### 1. Build Docker Image
|
- [Translation Quick Start Deployment](#translation-quick-start-deployment): Demonstrates how to quickly deploy a Translation service/pipeline on AMD GPU (ROCm).
|
||||||
|
- [Translation Docker Compose Files](#translation-docker-compose-files): Describes some example deployments and their docker compose files.
|
||||||
|
- [Translation Service Configuration](#translation-service-configuration): Describes the service and possible configuration changes.
|
||||||
|
|
||||||
- #### Create application install directory and go to it:
|
## Translation Quick Start Deployment
|
||||||
|
|
||||||
```bash
|
This section describes how to quickly deploy and test the Translation service manually on AMD GPU (ROCm). The basic steps are:
|
||||||
mkdir ~/translation-install && cd translation-install
|
|
||||||
```
|
|
||||||
|
|
||||||
- #### Clone the repository GenAIExamples (the default repository branch "main" is used here):
|
1. [Access the Code](#access-the-code)
|
||||||
|
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
|
||||||
|
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
|
||||||
|
4. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
|
||||||
|
5. [Check the Deployment Status](#check-the-deployment-status)
|
||||||
|
6. [Test the Pipeline](#test-the-pipeline)
|
||||||
|
7. [Cleanup the Deployment](#cleanup-the-deployment)
|
||||||
|
|
||||||
```bash
|
### Access the Code
|
||||||
git clone https://github.com/opea-project/GenAIExamples.git
|
|
||||||
```
|
|
||||||
|
|
||||||
If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value):
|
Clone the GenAIExamples repository and access the Translation AMD GPU (ROCm) Docker Compose files and supporting scripts:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
|
git clone https://github.com/opea-project/GenAIExamples.git
|
||||||
```
|
cd GenAIExamples/Translation/docker_compose/amd/gpu/rocm/
|
||||||
|
```
|
||||||
|
|
||||||
We remind you that when using a specific version of the code, you need to use the README from this version:
|
Check out a released version, such as v1.2:
|
||||||
|
|
||||||
- #### Go to build directory:
|
```
|
||||||
|
git checkout v1.2
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
### Generate a HuggingFace Access Token
|
||||||
cd ~/translation-install/GenAIExamples/Translation/docker_image_build
|
|
||||||
```
|
|
||||||
|
|
||||||
- Cleaning up the GenAIComps repository if it was previously cloned in this directory.
|
Some HuggingFace resources, such as certain models, are only accessible with an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
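Once generated, the token is typically made available to the deployment through an environment variable before sourcing the setup scripts. A minimal sketch, assuming the _set_env.sh_/_set_env_vllm.sh_ scripts read the token from the environment:

```bash
# Replace the placeholder with your own HuggingFace access token
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
```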
|
||||||
This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty:
|
|
||||||
|
|
||||||
```bash
|
### Configure the Deployment Environment
|
||||||
echo Y | rm -R GenAIComps
|
|
||||||
```
|
|
||||||
|
|
||||||
- #### Clone the repository GenAIComps (the default repository branch "main" is used here):
|
To set up environment variables for deploying the Translation service, source the _set_env.sh_ or _set_env_vllm.sh_ script in this directory:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
git clone https://github.com/opea-project/GenAIComps.git
|
# With TGI:
|
||||||
```
|
source ./set_env.sh
|
||||||
|
```
|
||||||
|
|
||||||
If you use a specific tag of the GenAIExamples repository,
|
```
|
||||||
then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value):
|
# With vLLM:
|
||||||
|
source ./set_env_vllm.sh
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
The _set_env.sh_ script prompts for the required and optional environment variables used to configure the TGI-based Translation service, and the _set_env_vllm.sh_ script does the same for the vLLM-based service. If a value is not entered, the script falls back to a default. Each script also generates a _.env_ file defining the desired configuration. Consult the section on [Translation Service configuration](#translation-service-configuration) for information on how service-specific configuration parameters affect deployments.
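As a quick sanity check before deploying, you can review the generated configuration. A minimal sketch, assuming the _.env_ file was written to the current directory:

```bash
# Review the configuration generated by set_env.sh / set_env_vllm.sh
cat .env
```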
|
||||||
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3
|
|
||||||
```
|
|
||||||
|
|
||||||
We remind you that when using a specific version of the code, you need to use the README from this version.
|
### Deploy the Service Using Docker Compose
|
||||||
|
|
||||||
- #### Setting the list of images for the build (from the build file.yaml)
|
To deploy the Translation service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
|
||||||
|
|
||||||
If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows:
|
```bash
|
||||||
|
# With TGI:
|
||||||
|
docker compose -f compose.yaml up -d
|
||||||
|
```
|
||||||
|
|
||||||
#### vLLM-based application
|
```bash
|
||||||
|
# With vLLM:
|
||||||
|
docker compose -f compose_vllm.yaml up -d
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
The Translation docker images should automatically be downloaded from the `OPEA registry` and deployed on the AMD GPU (ROCm) platform.
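If you prefer to fetch the images ahead of time, they can be pulled explicitly before starting the service. A minimal sketch (use `compose_vllm.yaml` instead for the vLLM-based deployment):

```bash
# Pre-pull the images referenced by the TGI-based compose file
docker compose -f compose.yaml pull
```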
|
||||||
service_list="vllm-rocm translation translation-ui llm-textgen nginx"
|
|
||||||
```
|
|
||||||
|
|
||||||
#### TGI-based application
|
### Check the Deployment Status
|
||||||
|
|
||||||
```bash
|
After running docker compose, check if all the containers launched via docker compose have started:
|
||||||
service_list="translation translation-ui llm-textgen nginx"
|
|
||||||
```
|
|
||||||
|
|
||||||
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
|
```
|
||||||
|
docker ps -a
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
For the default deployment, the following 5 containers should be running.
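One way to confirm this at a glance is to filter the container list by name; the sketch below assumes the container names from the compose files in this directory, which all start with `translation`:

```bash
# Show only the Translation-related containers and their status
docker ps --filter "name=translation" --format "table {{.Names}}\t{{.Status}}"
```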
|
||||||
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
|
|
||||||
```
|
|
||||||
|
|
||||||
- #### Build Docker Images
|
### Test the Pipeline
|
||||||
|
|
||||||
```bash
|
Once the Translation service is running, test the pipeline using the following command:
|
||||||
docker compose -f build.yaml build ${service_list} --no-cache
|
|
||||||
```
|
|
||||||
|
|
||||||
After the build, we check the list of images with the command:
|
```bash
|
||||||
|
DATA='{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
||||||
|
|
||||||
```bash
|
curl http://${HOST_IP}:${TRANSLATION_LLM_SERVICE_PORT}/v1/translation \
|
||||||
docker image ls
|
-d "$DATA" \
|
||||||
```
|
-H 'Content-Type: application/json'
|
||||||
|
```
|
||||||
|
|
||||||
The list of images should include:
|
Check the response from the service. The response should be similar to the following:
|
||||||
|
|
||||||
##### vLLM-based application:
|
```textmate
|
||||||
|
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" I"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
||||||
|
|
||||||
- opea/vllm-rocm:latest
|
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" love"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
||||||
- opea/llm-textgen:latest
|
|
||||||
- opea/nginx:latest
|
|
||||||
- opea/translation:latest
|
|
||||||
- opea/translation-ui:latest
|
|
||||||
|
|
||||||
##### TGI-based application:
|
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" machine"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
||||||
|
|
||||||
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
|
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" translation"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
||||||
- opea/llm-textgen:latest
|
|
||||||
- opea/nginx:latest
|
|
||||||
- opea/translation:latest
|
|
||||||
- opea/translation-ui:latest
|
|
||||||
|
|
||||||
---
|
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"."}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
||||||
|
|
||||||
### Docker Compose Configuration for AMD GPUs
|
data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":"</s>"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":{"completion_tokens":6,"prompt_tokens":3071,"total_tokens":3077,"completion_tokens_details":null,"prompt_tokens_details":null}}
|
||||||
|
|
||||||
|
data: [DONE]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note:** The value of _host_ip_ is set by the _set_env.sh_ or _set_env_vllm.sh_ script and can be found in the generated _.env_ file.
|
||||||
|
|
||||||
|
### Cleanup the Deployment
|
||||||
|
|
||||||
|
To stop the containers associated with the deployment, execute one of the following commands:
|
||||||
|
|
||||||
|
```
|
||||||
|
# With TGI:
|
||||||
|
docker compose -f compose.yaml down
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# With vLLM:
|
||||||
|
docker compose -f compose_vllm.yaml down
|
||||||
|
```
|
||||||
|
|
||||||
|
All the Translation containers will be stopped and then removed on completion of the "down" command.
|
||||||
|
|
||||||
|
## Translation Docker Compose Files
|
||||||
|
|
||||||
|
The compose.yaml file is the default compose file, using TGI as the serving framework.
|
||||||
|
|
||||||
|
| Service Name | Image Name |
|
||||||
|
| -------------------------- | -------------------------------------------------------- |
|
||||||
|
| translation-tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.1-rocm |
|
||||||
|
| translation-llm | opea/llm-textgen:latest |
|
||||||
|
| translation-backend-server | opea/translation:latest |
|
||||||
|
| translation-ui-server | opea/translation-ui:latest |
|
||||||
|
| translation-nginx-server | opea/nginx:latest |
|
||||||
|
|
||||||
|
## Translation Service Configuration for AMD GPUs
|
||||||
|
|
||||||
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:
|
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:
|
||||||
|
|
||||||
- compose_vllm.yaml - for vLLM-based application
|
- compose_vllm.yaml - for vLLM-based service
|
||||||
- compose.yaml - for TGI-based
|
- compose.yaml - for TGI-based service
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -142,305 +175,16 @@ security_opt:
|
|||||||
- seccomp:unconfined
|
- seccomp:unconfined
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The table below provides a comprehensive overview of the services that make up the Translation deployments illustrated in the example Docker Compose files. Each row represents a distinct service, detailing the possible images used to enable it and a concise description of its function within the deployment architecture.
|
||||||
|
|
||||||
|
| Service Name | Possible Image Names | Optional | Description |
|
||||||
|
| -------------------------- | -------------------------------------------------------- | -------- | --------------------------------------------------------------------------------------------------- |
|
||||||
|
| translation-tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.1-rocm | No | Specific to the TGI deployment, focuses on text generation inference using AMD GPU (ROCm) hardware. |
|
||||||
|
| translation-vllm-service | opea/vllm-rocm:latest | No | Handles large language model (LLM) tasks, utilizing AMD GPU (ROCm) hardware. |
|
||||||
|
| translation-llm | opea/llm-textgen:latest | No | Handles large language model (LLM) tasks |
|
||||||
|
| translation-backend-server | opea/translation:latest | No | Serves as the backend for the Translation service, with variations depending on the deployment. |
|
||||||
|
| translation-ui-server | opea/translation-ui:latest | No | Provides the user interface for the Translation service. |
|
||||||
|
| translation-nginx-server   | opea/nginx:latest                                         | No       | Acts as a reverse proxy, managing traffic between the UI and backend services.                       |
|
||||||
|
|
||||||
**How to Identify GPU Device IDs:**
|
**How to Identify GPU Device IDs:**
|
||||||
Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.
|
Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.
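For example, on a typical ROCm installation the device nodes and the GPUs known to the driver can be inspected as follows (a sketch; availability of these tools depends on your driver installation):

```bash
# List the DRM device nodes (cardN / renderDN) exposed by the amdgpu driver
ls /dev/dri/
# Show the GPUs visible to ROCm and their product names
rocm-smi --showproductname
```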
|
||||||
|
|
||||||
### Set deploy environment variables
|
|
||||||
|
|
||||||
#### Setting variables in the operating system environment:
|
|
||||||
|
|
||||||
##### Set variable HUGGINGFACEHUB_API_TOKEN:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
|
|
||||||
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Set variables value in set_env\*\*\*\*.sh file:
|
|
||||||
|
|
||||||
Go to Docker Compose directory:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ~/translation-install/GenAIExamples/Translation/docker_compose/amd/gpu/rocm
|
|
||||||
```
|
|
||||||
|
|
||||||
The example uses the Nano text editor. You can use any convenient text editor:
|
|
||||||
|
|
||||||
#### If you use vLLM
|
|
||||||
|
|
||||||
```bash
|
|
||||||
nano set_env_vllm.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
#### If you use TGI
|
|
||||||
|
|
||||||
```bash
|
|
||||||
nano set_env.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
If you are in a proxy environment, also set the proxy-related environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export http_proxy="Your_HTTP_Proxy"
|
|
||||||
export https_proxy="Your_HTTPs_Proxy"
|
|
||||||
```
|
|
||||||
|
|
||||||
Set the values of the variables:
|
|
||||||
|
|
||||||
- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world.
|
|
||||||
|
|
||||||
If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address.
|
|
||||||
|
|
||||||
If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address.
|
|
||||||
|
|
||||||
If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located.
|
|
||||||
|
|
||||||
We set these values in the file set_env\*\*\*\*.sh
|
|
||||||
|
|
||||||
- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services.
|
|
||||||
The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use.
|
|
||||||
|
|
||||||
#### Set variables with script set_env\*\*\*\*.sh
|
|
||||||
|
|
||||||
#### If you use vLLM
|
|
||||||
|
|
||||||
```bash
|
|
||||||
. set_env_vllm.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
#### If you use TGI
|
|
||||||
|
|
||||||
```bash
|
|
||||||
. set_env.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Start the services:
|
|
||||||
|
|
||||||
#### If you use vLLM
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker compose -f compose_vllm.yaml up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
#### If you use TGI
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker compose -f compose.yaml up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
All containers should be running and should not restart:
|
|
||||||
|
|
||||||
##### If you use vLLM:
|
|
||||||
|
|
||||||
- translation-vllm-service
|
|
||||||
- translation-tgi-service
|
|
||||||
- translation-llm
|
|
||||||
- translation-backend-server
|
|
||||||
- translation-ui-server
|
|
||||||
- translation-nginx-server
|
|
||||||
|
|
||||||
##### If you use TGI:
|
|
||||||
|
|
||||||
- translation-tgi-service
|
|
||||||
- translation-llm
|
|
||||||
- translation-backend-server
|
|
||||||
- translation-ui-server
|
|
||||||
- translation-nginx-server
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Validate the Services
|
|
||||||
|
|
||||||
### 1. Validate the vLLM/TGI Service
|
|
||||||
|
|
||||||
#### If you use vLLM:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DATA='{"model": "haoranxu/ALMA-13B", "prompt": "What is Deep Learning?", "max_tokens": 100, "temperature": 0}'
|
|
||||||
|
|
||||||
curl http://${HOST_IP}:${TRANSLATION_VLLM_SERVICE_PORT}/v1/chat/completions \
|
|
||||||
-X POST \
|
|
||||||
-d "$DATA" \
|
|
||||||
-H 'Content-Type: application/json'
|
|
||||||
```
|
|
||||||
|
|
||||||
Checking the response from the service. The response should be similar to JSON:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"id": "cmpl-059dd7fb311a46c2b807e0b3315e730c",
|
|
||||||
"object": "text_completion",
|
|
||||||
"created": 1743063706,
|
|
||||||
"model": "haoranxu/ALMA-13B",
|
|
||||||
"choices": [
|
|
||||||
{
|
|
||||||
"index": 0,
|
|
||||||
"text": " Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning",
|
|
||||||
"logprobs": null,
|
|
||||||
"finish_reason": "length",
|
|
||||||
"stop_reason": null,
|
|
||||||
"prompt_logprobs": null
|
|
||||||
}
|
|
||||||
],
|
|
||||||
|
|
||||||
"usage": {
|
|
||||||
"prompt_tokens": 6,
|
|
||||||
"total_tokens": 106,
|
|
||||||
"completion_tokens": 100,
|
|
||||||
"prompt_tokens_details": null
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
If the service response has a meaningful response in the value of the "choices.message.content" key,
|
|
||||||
then we consider the vLLM service to be successfully launched
|
|
||||||
|
|
||||||
#### If you use TGI:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DATA='{"inputs":"What is Deep Learning?",'\
|
|
||||||
'"parameters":{"max_new_tokens":256,"do_sample": true}}'
|
|
||||||
|
|
||||||
curl http://${HOST_IP}:${TRANSLATION_TGI_SERVICE_PORT}/generate \
|
|
||||||
-X POST \
|
|
||||||
-d "$DATA" \
|
|
||||||
-H 'Content-Type: application/json'
|
|
||||||
```
|
|
||||||
|
|
||||||
Checking the response from the service. The response should be similar to JSON:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"generated_text": "\n\n What can it Do? What's the Hype? What Should You Do If"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
If the service response has a meaningful response in the value of the "generated_text" key,
|
|
||||||
then we consider the TGI service to be successfully launched
|
|
||||||
|
|
||||||
### 2. Validate the LLM Service
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DATA='{"query":"What is Deep Learning?",'\
|
|
||||||
'"max_tokens":32,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,'\
|
|
||||||
'"repetition_penalty":1.03,"stream":false}'
|
|
||||||
|
|
||||||
curl http://${HOST_IP}:${TRANSLATION_LLM_SERVICE_PORT}/v1/chat/completions \
|
|
||||||
-X POST \
|
|
||||||
-d "$DATA" \
|
|
||||||
-H 'Content-Type: application/json'
|
|
||||||
```
|
|
||||||
|
|
||||||
Checking the response from the service. The response should be similar to JSON:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"id": "",
|
|
||||||
"choices": [
|
|
||||||
{
|
|
||||||
"finish_reason": "length",
|
|
||||||
"index": 0,
|
|
||||||
"logprobs": null,
|
|
||||||
"text": " Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning."
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"created": 1742978568,
|
|
||||||
"model": "haoranxu/ALMA-13B",
|
|
||||||
"object": "text_completion",
|
|
||||||
"system_fingerprint": "2.3.1-sha-a094729-rocm",
|
|
||||||
"usage": {
|
|
||||||
"completion_tokens": 32,
|
|
||||||
"prompt_tokens": 6,
|
|
||||||
"total_tokens": 38,
|
|
||||||
"completion_tokens_details": null,
|
|
||||||
"prompt_tokens_details": null
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Validate Nginx Service
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DATA='{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
|
||||||
|
|
||||||
curl http://${HOST_IP}:${TRANSLATION_LLM_SERVICE_PORT}/v1/translation \
|
|
||||||
-d "$DATA" \
|
|
||||||
-H 'Content-Type: application/json'
|
|
||||||
```
|
|
||||||
|
|
||||||
Checking the response from the service. The response should be similar to JSON:
|
|
||||||
|
|
||||||
```textmate
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" I"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" love"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" machine"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" translation"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"."}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":"</s>"}],"created":1743062099,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":{"completion_tokens":6,"prompt_tokens":3071,"total_tokens":3077,"completion_tokens_details":null,"prompt_tokens_details":null}}
|
|
||||||
|
|
||||||
data: [DONE]
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. Validate MegaService
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DATA='{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
|
||||||
|
|
||||||
curl http://${HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d "$DATA"
|
|
||||||
```
|
|
||||||
|
|
||||||
Checking the response from the service. The response should be similar to JSON:
|
|
||||||
|
|
||||||
```textmate
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" I"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" love"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" machine"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" translation"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"."}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null}
|
|
||||||
|
|
||||||
data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":"</s>"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":{"completion_tokens":6,"prompt_tokens":3071,"total_tokens":3077,"completion_tokens_details":null,"prompt_tokens_details":null}}
|
|
||||||
|
|
||||||
data: [DONE]
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
If the response text is similar to the one above, then we consider the service verification successful.
|
|
||||||
|
|
||||||
### 5. Validate Frontend
|
|
||||||
|
|
||||||
To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${TRANSLATION_FRONTEND_SERVICE_PORT} A page should open when you click through to this address:
|
|
||||||

|
|
||||||
|
|
||||||
If a page of this type has opened, then we believe that the service is running and responding, and we can proceed to functional UI testing.
|
|
||||||
|
|
||||||
Let's enter the task for the service in the "Input" field. For example, "我爱机器翻译" with selected "German" as language source and press Enter. After that, a page with the result of the task should open:
|
|
||||||
|
|
||||||

|
|
||||||
If the result shown on the page is correct, then we consider the verification of the UI service to be successful.
|
|
||||||
|
|
||||||
### 6. Stop application
|
|
||||||
|
|
||||||
#### If you use vLLM
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ~/translation-install/GenAIExamples/Translation/docker_compose/amd/gpu/rocm
|
|
||||||
docker compose -f compose_vllm.yaml down
|
|
||||||
```
|
|
||||||
|
|
||||||
#### If you use TGI
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ~/translation-install/GenAIExamples/Translation/docker_compose/amd/gpu/rocm
|
|
||||||
docker compose -f compose.yaml down
|
|
||||||
```
|
|
||||||
|
|||||||
@@ -1,169 +1,144 @@
|
|||||||
# Build Mega Service of Translation on Xeon
|
# Example Translation Deployment on Intel® Xeon® Platform
|
||||||
|
|
||||||
This document outlines the deployment process for a Translation application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `llm`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service.
|
This document outlines the deployment process for a Translation service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel® Xeon® server. This example includes the following sections:
|
||||||
|
|
||||||
## 🚀 Apply Xeon Server on AWS
|
- [Translation Quick Start Deployment](#translation-quick-start-deployment): Demonstrates how to quickly deploy a Translation service/pipeline on Intel® Xeon® platform.
|
||||||
|
- [Translation Docker Compose Files](#translation-docker-compose-files): Describes some example deployments and their docker compose files.
|
||||||
|
- [Translation Service Configuration](#translation-service-configuration): Describes the service and possible configuration changes.
|
||||||
|
|
||||||
To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.
|
## Translation Quick Start Deployment
|
||||||
|
|
||||||
For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.
|
This section describes how to quickly deploy and test the Translation service manually on Intel® Xeon® platform. The basic steps are:
|
||||||
|
|
||||||
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
|
1. [Access the Code](#access-the-code)
|
||||||
|
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
|
||||||
|
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
|
||||||
|
4. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
|
||||||
|
5. [Check the Deployment Status](#check-the-deployment-status)
|
||||||
|
6. [Test the Pipeline](#test-the-pipeline)
|
||||||
|
7. [Cleanup the Deployment](#cleanup-the-deployment)
|
||||||
|
|
||||||
## 🚀 Prepare Docker Images
|
### Access the Code
|
||||||
|
|
||||||
For Docker Images, you have two options to prepare them.
|
Clone the GenAIExamples repository and access the Translation Intel® Xeon® platform Docker Compose files and supporting scripts:
|
||||||
|
|
||||||
1. Pull the docker images from docker hub.
|
```
|
||||||
|
git clone https://github.com/opea-project/GenAIExamples.git
|
||||||
- More stable to use.
|
cd GenAIExamples/Translation/docker_compose/intel/cpu/xeon/
|
||||||
- Will be automatically downloaded when using docker compose command.
|
|
||||||
|
|
||||||
2. Build the docker images from source.
|
|
||||||
|
|
||||||
- Contain the latest new features.
|
|
||||||
|
|
||||||
- Need to be manually build.
|
|
||||||
|
|
||||||
If you choose to pull docker images form docker hub, skip this section and go to [Start Microservices](#start-microservices) part directly.
|
|
||||||
|
|
||||||
Follow the instructions below to build the docker images from source.
|
|
||||||
|
|
||||||
### 1. Build LLM Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/opea-project/GenAIComps.git
|
|
||||||
cd GenAIComps
|
|
||||||
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Build MegaService Docker Image
|
Check out a released version, such as v1.2:
|
||||||
|
|
||||||
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `translation.py` Python script. Build MegaService Docker image via below command:
|
```
|
||||||
|
git checkout v1.2
|
||||||
```bash
|
|
||||||
git clone https://github.com/opea-project/GenAIExamples
|
|
||||||
cd GenAIExamples/Translation/
|
|
||||||
docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3. Build UI Docker Image
|
### Generate a HuggingFace Access Token
|
||||||
|
|
||||||
Build frontend Docker image via below command:
|
Some HuggingFace resources, such as certain models, are only accessible with an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
|
||||||
|
|
||||||
```bash
|
### Configure the Deployment Environment
|
||||||
cd GenAIExamples/Translation/ui
|
|
||||||
docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
|
To set up environment variables for deploying the Translation service, source the set_env.sh script in this directory:
|
||||||
|
|
||||||
|
```
|
||||||
|
cd ../../../
|
||||||
|
source set_env.sh
|
||||||
|
cd intel/cpu/xeon
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. Build Nginx Docker Image
|
The set_env.sh script will prompt for required and optional environment variables used to configure the Translation service. If a value is not entered, the script will use a default value. It will also generate a _.env_ file defining the desired configuration. Consult the section on [Translation Service configuration](#translation-service-configuration) for information on how service-specific configuration parameters affect deployments.
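If you want to change a default, such as the LLM model, one option is to export the variable before sourcing the script. This is a sketch only; it assumes set_env.sh keeps values already present in the environment, so check the script if in doubt:

```bash
# Optionally override the default model before sourcing set_env.sh
export LLM_MODEL_ID="haoranxu/ALMA-13B"
source set_env.sh
```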
|
||||||
|
|
||||||
```bash
|
### Deploy the Service Using Docker Compose
|
||||||
cd GenAIComps
|
|
||||||
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
Then run the command `docker images`, you will have the following Docker Images:
|
To deploy the Translation service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
|
||||||
|
|
||||||
1. `opea/llm-textgen:latest`
|
|
||||||
2. `opea/translation:latest`
|
|
||||||
3. `opea/translation-ui:latest`
|
|
||||||
4. `opea/nginx:latest`
|
|
||||||
|
|
||||||
## 🚀 Start Microservices
|
|
||||||
|
|
||||||
### Required Models
|
|
||||||
|
|
||||||
By default, the LLM model is set to a default value as listed below:
|
|
||||||
|
|
||||||
| Service | Model |
|
|
||||||
| ------- | ----------------- |
|
|
||||||
| LLM | haoranxu/ALMA-13B |
|
|
||||||
|
|
||||||
Change the `LLM_MODEL_ID` below for your needs.
|
|
||||||
|
|
||||||
### Setup Environment Variables
|
|
||||||
|
|
||||||
1. Set the required environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Example: host_ip="192.168.1.1"
|
|
||||||
export host_ip="External_Public_IP"
|
|
||||||
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
|
|
||||||
export no_proxy="Your_No_Proxy"
|
|
||||||
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
|
|
||||||
# Example: NGINX_PORT=80
|
|
||||||
export NGINX_PORT=${your_nginx_port}
|
|
||||||
```
|
|
||||||
|
|
||||||
2. If you are in a proxy environment, also set the proxy-related environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export http_proxy="Your_HTTP_Proxy"
|
|
||||||
export https_proxy="Your_HTTPs_Proxy"
|
|
||||||
```
|
|
||||||
|
|
||||||
3. Set up other environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ../../../
|
|
||||||
source set_env.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Start Microservice Docker Containers
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
docker compose up -d
|
docker compose up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
> Note: The docker images will be automatically downloaded from `docker hub`:
|
The Translation docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Xeon® Platform:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
docker pull opea/llm-textgen:latest
|
[+] Running 6/6
|
||||||
docker pull opea/translation:latest
|
✔ Network xeon_default Created 0.1s
|
||||||
docker pull opea/translation-ui:latest
|
✔ Container tgi-service Healthy 328.1s
|
||||||
docker pull opea/nginx:latest
|
✔ Container llm-textgen-server Started 323.5s
|
||||||
|
✔ Container translation-xeon-backend-server Started 323.7s
|
||||||
|
✔ Container translation-xeon-ui-server Started 324.0s
|
||||||
|
✔ Container translation-xeon-nginx-server Started 324.2s
|
||||||
```
|
```
|
||||||
|
|
||||||
### Validate Microservices
|
### Check the Deployment Status
|
||||||
|
|
||||||
1. TGI Service
|
After running docker compose, check if all the containers launched via docker compose have started:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
curl http://${host_ip}:8008/generate \
|
docker ps -a
|
||||||
-X POST \
|
```
|
||||||
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
|
|
||||||
-H 'Content-Type: application/json'
|
|
||||||
```
|
|
||||||
|
|
||||||
2. LLM Microservice
|
For the default deployment, the following 5 containers should be running:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
curl http://${host_ip}:9000/v1/chat/completions \
|
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||||
-X POST \
|
89a39f7c917f opea/nginx:latest "/docker-entrypoint.…" 7 minutes ago Up About a minute 0.0.0.0:80->80/tcp, :::80->80/tcp translation-xeon-nginx-server
|
||||||
-d '{"query":"Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"}' \
|
68b8b86a737e opea/translation-ui:latest "docker-entrypoint.s…" 7 minutes ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp translation-xeon-ui-server
|
||||||
-H 'Content-Type: application/json'
|
8400903275b5 opea/translation:latest "python translation.…" 7 minutes ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp translation-xeon-backend-server
|
||||||
```
|
2da5545cb18c opea/llm-textgen:latest "bash entrypoint.sh" 7 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-textgen-server
|
||||||
|
dee02c1fb538 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp tgi-service
|
||||||
|
```
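If any of these containers is missing or keeps restarting, inspecting its logs is usually the quickest way to find the cause (container names as listed above):

```bash
# Example: follow the logs of the model-serving container
docker logs -f tgi-service
```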
|
||||||
|
|
||||||
3. MegaService
|
### Test the Pipeline
|
||||||
|
|
||||||
```bash
|
Once the Translation service is running, test the pipeline using the following command:
|
||||||
curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{
|
|
||||||
"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
4. Nginx Service
|
```bash
|
||||||
|
curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{
|
||||||
|
"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
**Note:** The value of _host_ip_ is set by the _set_env.sh_ script and can be found in the _.env_ file.
|
||||||
curl http://${host_ip}:${NGINX_PORT}/v1/translation \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service.
|
### Cleanup the Deployment
|
||||||
|
|
||||||
## 🚀 Launch the UI
|
To stop the containers associated with the deployment, execute the following command:
|
||||||
|
|
||||||
Open this URL `http://{host_ip}:5173` in your browser to access the frontend.
|
```
|
||||||

|
docker compose -f compose.yaml down
|
||||||

|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
[+] Running 6/6
|
||||||
|
✔ Container translation-xeon-nginx-server Removed 10.4s
|
||||||
|
✔ Container translation-xeon-ui-server Removed 10.3s
|
||||||
|
✔ Container translation-xeon-backend-server Removed 10.3s
|
||||||
|
✔ Container llm-textgen-server Removed 10.3s
|
||||||
|
✔ Container tgi-service Removed 2.8s
|
||||||
|
✔ Network xeon_default Removed 0.4s
|
||||||
|
```
|
||||||
|
|
||||||
|
All the Translation containers will be stopped and then removed on completion of the "down" command.
|
||||||
|
|
||||||
|
## Translation Docker Compose Files
|
||||||
|
|
||||||
|
The compose.yaml file is the default compose file, using TGI as the serving framework.
|
||||||
|
|
||||||
|
| Service Name | Image Name |
|
||||||
|
| ------------------------------- | ------------------------------------------------------------- |
|
||||||
|
| tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu |
|
||||||
|
| llm | opea/llm-textgen:latest |
|
||||||
|
| translation-xeon-backend-server | opea/translation:latest |
|
||||||
|
| translation-xeon-ui-server | opea/translation-ui:latest |
|
||||||
|
| translation-xeon-nginx-server | opea/nginx:latest |
|
||||||
|
|
||||||
|
## Translation Service Configuration
|
||||||
|
|
||||||
|
The table below provides a comprehensive overview of the services that make up the Translation deployments illustrated in the example Docker Compose files. Each row represents a distinct service, detailing the possible images used to enable it and a concise description of its function within the deployment architecture.
|
||||||
|
|
||||||
|
| Service Name | Possible Image Names | Optional | Description |
|
||||||
|
| ------------------------------- | ------------------------------------------------------------- | -------- | ----------------------------------------------------------------------------------------------- |
|
||||||
|
| tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu | No | Specific to the TGI deployment, focuses on text generation inference using Xeon hardware. |
|
||||||
|
| llm | opea/llm-textgen:latest | No | Handles large language model (LLM) tasks |
|
||||||
|
| translation-xeon-backend-server | opea/translation:latest | No | Serves as the backend for the Translation service, with variations depending on the deployment. |
|
||||||
|
| translation-xeon-ui-server | opea/translation-ui:latest | No | Provides the user interface for the Translation service. |
|
||||||
|
| translation-xeon-nginx-server | opea/nginx:latest | No | Acts as a reverse proxy, managing traffic between the UI and backend services. |
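To see exactly which images a given compose file will use once your environment variables are applied, the resolved configuration can be printed with standard Docker Compose tooling (a sketch; run it from this deployment directory):

```bash
# Print the resolved compose configuration and list the images it references
docker compose -f compose.yaml config | grep image:
```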
|
||||||
|
|||||||
@@ -1,161 +1,143 @@
|
|||||||
# Build MegaService of Translation on Gaudi
|
# Example Translation Deployment on Intel® Gaudi® Platform
|
||||||
|
|
||||||
This document outlines the deployment process for a Translation application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service.
|
This document outlines the deployment process for a Translation service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel® Gaudi® server. This example includes the following sections:
|
||||||
|
|
||||||
## 🚀 Prepare Docker Images
|
- [Translation Quick Start Deployment](#translation-quick-start-deployment): Demonstrates how to quickly deploy a Translation service/pipeline on Intel® Gaudi® platform.
|
||||||
|
- [Translation Docker Compose Files](#translation-docker-compose-files): Describes some example deployments and their docker compose files.
|
||||||
|
- [Translation Service Configuration](#translation-service-configuration): Describes the service and possible configuration changes.
|
||||||
|
|
||||||
For Docker Images, you have two options to prepare them.
|
## Translation Quick Start Deployment
|
||||||
|
|
||||||
1. Pull the docker images from docker hub.
|
This section describes how to quickly deploy and test the Translation service manually on Intel® Gaudi® platform. The basic steps are:
|
||||||
|
|
||||||
- More stable to use.
|
1. [Access the Code](#access-the-code)
|
||||||
- Will be automatically downloaded when using docker compose command.
|
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
|
||||||
|
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
|
||||||
|
4. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
|
||||||
|
5. [Check the Deployment Status](#check-the-deployment-status)
|
||||||
|
6. [Test the Pipeline](#test-the-pipeline)
|
||||||
|
7. [Cleanup the Deployment](#cleanup-the-deployment)
|
||||||
|
|
||||||
2. Build the docker images from source.
|
### Access the Code
|
||||||
|
|
||||||
- Contain the latest new features.
|
Clone the GenAIExamples repository and access the Translation Intel® Gaudi® platform Docker Compose files and supporting scripts:
|
||||||
|
|
||||||
- Need to be manually build.
|
```
|
||||||
|
git clone https://github.com/opea-project/GenAIExamples.git
|
||||||
If you choose to pull docker images form docker hub, skip to [Start Microservices](#start-microservices) part directly.
|
cd GenAIExamples/Translation/docker_compose/intel/hpu/gaudi/
|
||||||
|
|
||||||
Follow the instructions below to build the docker images from source.
|
|
||||||
|
|
||||||
### 1. Build LLM Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/opea-project/GenAIComps.git
|
|
||||||
cd GenAIComps
|
|
||||||
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Build MegaService Docker Image
|
Check out a released version, such as v1.2:
|
||||||
|
|
||||||
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `translation.py` Python script. Build the MegaService Docker image using the command below:
|
```
|
||||||
|
git checkout v1.2
|
||||||
```bash
|
|
||||||
git clone https://github.com/opea-project/GenAIExamples
|
|
||||||
cd GenAIExamples/Translation
|
|
||||||
docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3. Build UI Docker Image
|
### Generate a HuggingFace Access Token
|
||||||
|
|
||||||
Construct the frontend Docker image using the command below:
|
Some HuggingFace resources, such as certain models, are only accessible with an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
|
||||||
|
|
||||||
```bash
|
### Configure the Deployment Environment
|
||||||
cd GenAIExamples/Translation/ui/
|
|
||||||
docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
|
To set up environment variables for deploying the Translation service, source the _set_env.sh_ script in this directory:
|
||||||
|
|
||||||
|
```
|
||||||
|
cd ../../../
|
||||||
|
source set_env.sh
|
||||||
|
cd intel/hpu/gaudi/
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. Build Nginx Docker Image
|
The set_env.sh script will prompt for required and optional environment variables used to configure the Translation service. If a value is not entered, the script will use a default value. It will also generate a _.env_ file defining the desired configuration. Consult the section on [Translation Service configuration](#translation-service-configuration) for information on how service-specific configuration parameters affect deployments.
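After sourcing the script, a quick check that the key variables were exported can save a failed deployment later. A minimal sketch using variables referenced elsewhere in this guide:

```bash
# Verify that the variables needed by the compose file are set
echo "host_ip=${host_ip}"
echo "HUGGINGFACEHUB_API_TOKEN is ${HUGGINGFACEHUB_API_TOKEN:+set}"
```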
|
||||||
|
|
||||||
```bash
|
### Deploy the Service Using Docker Compose
|
||||||
cd GenAIComps
|
|
||||||
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
Then run the command `docker images`, you will have the following four Docker Images:
|
To deploy the Translation service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
|
||||||
|
|
||||||
1. `opea/llm-textgen:latest`
|
|
||||||
2. `opea/translation:latest`
|
|
||||||
3. `opea/translation-ui:latest`
|
|
||||||
4. `opea/nginx:latest`
|
|
||||||
|
|
||||||
## 🚀 Start Microservices
|
|
||||||
|
|
||||||
### Required Models
|
|
||||||
|
|
||||||
By default, the LLM model is set to a default value as listed below:
|
|
||||||
|
|
||||||
| Service | Model |
|
|
||||||
| ------- | ----------------- |
|
|
||||||
| LLM | haoranxu/ALMA-13B |
|
|
||||||
|
|
||||||
Change the `LLM_MODEL_ID` below for your needs.
|
|
||||||
|
|
||||||
### Setup Environment Variables
|
|
||||||
|
|
||||||
1. Set the required environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Example: host_ip="192.168.1.1"
|
|
||||||
export host_ip="External_Public_IP"
|
|
||||||
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
|
|
||||||
export no_proxy="Your_No_Proxy"
|
|
||||||
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
|
|
||||||
# Example: NGINX_PORT=80
|
|
||||||
export NGINX_PORT=${your_nginx_port}
|
|
||||||
```
|
|
||||||
|
|
||||||
2. If you are in a proxy environment, also set the proxy-related environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export http_proxy="Your_HTTP_Proxy"
|
|
||||||
export https_proxy="Your_HTTPs_Proxy"
|
|
||||||
```
|
|
||||||
|
|
||||||
3. Set up other environment variables:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ../../../
|
|
||||||
source set_env.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### Start Microservice Docker Containers
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
docker compose up -d
|
docker compose up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
> Note: The docker images will be automatically downloaded from `docker hub`:
|
The Translation docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Gaudi® Platform:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
docker pull opea/llm-textgen:latest
|
[+] Running 5/5
|
||||||
docker pull opea/translation:latest
|
✔ Container tgi-gaudi-server Healthy 222.4s
|
||||||
docker pull opea/translation-ui:latest
|
✔ Container llm-textgen-gaudi-server Started 221.7s
|
||||||
docker pull opea/nginx:latest
|
✔ Container translation-gaudi-backend-server Started 222.0s
|
||||||
|
✔ Container translation-gaudi-ui-server Started 222.2s
|
||||||
|
✔ Container translation-gaudi-nginx-server Started 222.6s
|
||||||
```
|
```
|
||||||
|
|
||||||
### Validate Microservices
|
### Check the Deployment Status
|
||||||
|
|
||||||
1. TGI Service
|
After running docker compose, check if all the containers launched via docker compose have started:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
curl http://${host_ip}:8008/generate \
|
docker ps -a
|
||||||
-X POST \
|
```
|
||||||
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
|
|
||||||
-H 'Content-Type: application/json'
|
|
||||||
```
|
|
||||||
|
|
||||||
2. LLM Microservice
|
For the default deployment, the following 5 containers should be running:
|
||||||
|
|
||||||
```bash
|
```
|
||||||
curl http://${host_ip}:9000/v1/chat/completions \
|
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||||
-X POST \
|
097f577b3a53 opea/nginx:latest "/docker-entrypoint.…" 5 minutes ago Up About a minute 0.0.0.0:80->80/tcp, :::80->80/tcp translation-gaudi-nginx-server
|
||||||
-d '{"query":"Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"}' \
|
0578b7034af3 opea/translation-ui:latest "docker-entrypoint.s…" 5 minutes ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp translation-gaudi-ui-server
|
||||||
-H 'Content-Type: application/json'
|
bc23dd5b9cb0 opea/translation:latest "python translation.…" 5 minutes ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp translation-gaudi-backend-server
|
||||||
```
|
2cf6fabaa7c7 opea/llm-textgen:latest "bash entrypoint.sh" 5 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-textgen-gaudi-server
|
||||||
|
f4764d0c1817 ghcr.io/huggingface/tgi-gaudi:2.3.1 "/tgi-entrypoint.sh …" 5 minutes ago Up 5 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp tgi-gaudi-server
|
||||||
|
```
|
||||||
|
|
||||||
3. MegaService
|
### Test the Pipeline
|
||||||
|
|
||||||
```bash
|
Once the Translation service is running, test the pipeline using the following command:
|
||||||
curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{
|
|
||||||
"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
4. Nginx Service
|
```bash
|
||||||
|
curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{
|
||||||
|
"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
**Note:** The value of _host_ip_ is set by the _set_env.sh_ script and can be found in the _.env_ file.
|
||||||
curl http://${host_ip}:${NGINX_PORT}/v1/translation \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}'
|
|
||||||
```
|
|
||||||
|
|
||||||
Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service.
|
### Cleanup the Deployment
|
||||||
|
|
||||||
## 🚀 Launch the UI
|
To stop the containers associated with the deployment, execute the following command:
|
||||||
|
|
||||||
Open this URL `http://{host_ip}:5173` in your browser to access the frontend.
|
```
|
||||||

|
docker compose -f compose.yaml down
|
||||||

|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
[+] Running 6/6
|
||||||
|
✔ Container translation-gaudi-nginx-server Removed 10.5s
|
||||||
|
✔ Container translation-gaudi-ui-server Removed 10.3s
|
||||||
|
✔ Container translation-gaudi-backend-server Removed 10.4s
|
||||||
|
✔ Container llm-textgen-gaudi-server Removed 10.4s
|
||||||
|
✔ Container tgi-gaudi-server Removed 12.0s
|
||||||
|
✔ Network gaudi_default Removed 0.4s
|
||||||
|
```
|
||||||
|
|
||||||
|
All the Translation containers will be stopped and then removed on completion of the "down" command.
|
||||||
|
|
||||||
|
## Translation Docker Compose Files
|
||||||
|
|
||||||
|
The compose.yaml file is the default compose file, using TGI as the serving framework.
|
||||||
|
|
||||||
|
| Service Name | Image Name |
|
||||||
|
| -------------------------------- | ----------------------------------- |
|
||||||
|
| tgi-service | ghcr.io/huggingface/tgi-gaudi:2.3.1 |
|
||||||
|
| llm | opea/llm-textgen:latest |
|
||||||
|
| translation-gaudi-backend-server | opea/translation:latest |
|
||||||
|
| translation-gaudi-ui-server | opea/translation-ui:latest |
|
||||||
|
| translation-gaudi-nginx-server | opea/nginx:latest |
|
||||||
|
|
||||||
|
## Translation Service Configuration
|
||||||
|
|
||||||
|
The table below provides a comprehensive overview of the services that make up the Translation deployments illustrated in the example Docker Compose files. Each row represents a distinct service, detailing the possible images used to enable it and a concise description of its function within the deployment architecture.
|
||||||
|
|
||||||
|
| Service Name | Possible Image Names | Optional | Description |
|
||||||
|
| -------------------------------- | ----------------------------------- | -------- | ----------------------------------------------------------------------------------------------- |
|
||||||
|
| tgi-service | ghcr.io/huggingface/tgi-gaudi:2.3.1 | No | Specific to the TGI deployment, focuses on text generation inference using Gaudi hardware. |
|
||||||
|
| llm | opea/llm-textgen:latest | No | Handles large language model (LLM) tasks |
|
||||||
|
| translation-gaudi-backend-server | opea/translation:latest | No | Serves as the backend for the Translation service, with variations depending on the deployment. |
|
||||||
|
| translation-gaudi-ui-server | opea/translation-ui:latest | No | Provides the user interface for the Translation service. |
|
||||||
|
| translation-gaudi-nginx-server | opea/nginx:latest | No | Acts as a reverse proxy, managing traffic between the UI and backend services. |
|
||||||
|
|||||||