Signed-off-by: Mustafa <mustafa.cetin@intel.com> Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com> Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: XinyaoWa <xinyao.wang@intel.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
# Build MegaService of Document Summarization on Gaudi

This document outlines the deployment process for a Document Summarization application that uses the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Gaudi server. The steps cover building the Docker images, deploying the containers with Docker Compose, and running the services that integrate microservices such as `llm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.

## 🚀 Build Docker Images

### 1. Build MicroService Docker Image

First, clone the GenAIComps repository so that you can build the microservice Docker images locally.

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```

#### Audio to Text Service

The Audio to Text Service converts audio to text. Build its image with the following command:

```bash
docker build -t opea/dataprep-audio2text:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimedia2text/audio2text/Dockerfile .
```

#### Video to Audio Service

The Video to Audio Service extracts audio from video files. Build its image with the following command:

```bash
docker build -t opea/dataprep-video2audio:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimedia2text/video2audio/Dockerfile .
```

#### Multimedia to Text Service

The Multimedia to Text Service transforms multimedia data into text. Build its image with the following command:

```bash
docker build -t opea/dataprep-multimedia2text:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimedia2text/Dockerfile .
```

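Each build command above passes `https_proxy` and `http_proxy` as build arguments. The following is a minimal sketch of a helper that emits those arguments only for variables that are actually set. The name `proxy_build_args` is hypothetical and not part of GenAIComps, and the sketch assumes the proxy URLs contain no whitespace.

```bash
# Hypothetical helper (not part of GenAIComps): print "--build-arg" options
# for whichever proxy variables are currently set and non-empty.
proxy_build_args() {
  local args=""
  if [ -n "${https_proxy:-}" ]; then
    args="$args --build-arg https_proxy=$https_proxy"
  fi
  if [ -n "${http_proxy:-}" ]; then
    args="$args --build-arg http_proxy=$http_proxy"
  fi
  # Trim the leading space before printing.
  printf '%s\n' "${args# }"
}
```

It could then be used as `docker build -t opea/dataprep-audio2text:latest $(proxy_build_args) -f comps/dataprep/multimedia2text/audio2text/Dockerfile .`, where the command substitution is left unquoted on purpose so that its output splits into separate arguments.
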
### 2. Build MegaService Docker Image

To construct the MegaService, we use the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `docsum.py` Python script. Build the MegaService Docker image with the following commands:

```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/DocSum/
docker build -t opea/docsum:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

### 3. Build UI Docker Image

Several UI options are provided. If you need to work with multimedia documents, .doc, or .pdf files, the Gradio UI is recommended.

#### Svelte UI

Build the frontend Docker image with the following commands:

```bash
cd GenAIExamples/DocSum/ui
docker build -t opea/docsum-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
```

#### Gradio UI

Build the Gradio UI frontend Docker image with the following commands:

```bash
cd GenAIExamples/DocSum/ui
docker build -t opea/docsum-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```

#### React UI

Build the frontend Docker image with the following commands:

```bash
cd GenAIExamples/DocSum/ui
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"

# Build without proxy settings:
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT -f ./docker/Dockerfile.react .

# Or, if you are behind a proxy, pass the proxy settings as build arguments instead:
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```

## 🚀 Start Microservices and MegaService

### Required Models

The default model is `Intel/neural-chat-7b-v3-3`. Change the `LLM_MODEL_ID` environment variable in the commands below if you want to use another model.

```bash
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
```

When using gated models, you also need to provide a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens) in the `HUGGINGFACEHUB_API_TOKEN` environment variable.

### Set Up Environment Variables

To set up the environment variables for deploying the Document Summarization services, follow these steps:

1. Set the required environment variables:

```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```

2. If you are in a proxy environment, also set the proxy-related environment variables:

```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```

3. Set up the other environment variables:

```bash
source GenAIExamples/DocSum/docker_compose/set_env.sh
```

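Before starting the containers, it can help to confirm that the variables exported in the steps above are present. The following is a minimal sketch of such a check; the function name `check_docsum_env` is hypothetical, and treating `host_ip`, `HUGGINGFACEHUB_API_TOKEN`, and `LLM_MODEL_ID` as the required set is an assumption based only on the exports shown in this guide.

```bash
# Hypothetical pre-flight check (not part of GenAIExamples): report every
# variable from the assumed required set that is unset or empty.
# Returns non-zero if any variable is missing.
check_docsum_env() {
  local name value missing=0
  for name in host_ip HUGGINGFACEHUB_API_TOKEN LLM_MODEL_ID; do
    # Indirect lookup: expands to value=${host_ip:-}, and so on.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_docsum_env && echo "environment looks complete"` prints the confirmation only when all three variables are non-empty.
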
### Start Microservice Docker Containers

```bash
cd GenAIExamples/DocSum/docker_compose/intel/hpu/gaudi
docker compose -f compose.yaml up -d
```

You will have the following Docker Images:

1. `opea/docsum-ui:latest`
2. `opea/docsum:latest`
3. `opea/llm-docsum-tgi:latest`
4. `opea/whisper:latest`
5. `opea/dataprep-audio2text:latest`
6. `opea/dataprep-multimedia2text:latest`
7. `opea/dataprep-video2audio:latest`

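To confirm that the images listed above are available locally, a small loop over `docker image inspect` is one option. This is a sketch only; `missing_images` is a hypothetical helper name, not something shipped with GenAIExamples.

```bash
# Hypothetical helper: print every image name passed as an argument that is
# not present in the local Docker image store.
missing_images() {
  local image
  for image in "$@"; do
    docker image inspect "$image" >/dev/null 2>&1 || echo "$image"
  done
}
```

Calling `missing_images opea/docsum-ui:latest opea/docsum:latest opea/llm-docsum-tgi:latest opea/whisper:latest` prints nothing when all of the named images exist, and otherwise prints one missing image per line.
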
### Validate Microservices

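The containers may need some time after startup before they answer requests, for example while the model is loading. That is an assumption based on typical serving behavior rather than something this guide states. A minimal sketch of a generic retry wrapper that can be placed in front of any of the validation requests in this section follows; `retry` is a hypothetical helper name.

```bash
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts, sleeping between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For instance, `retry 30 10 curl -sf -o /dev/null http://${host_ip}:8008/generate -X POST -H 'Content-Type: application/json' -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}'` would repeat the TGI request every 10 seconds, up to 30 times, until it returns a successful HTTP status.
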
1. TGI Service

```bash
curl http://${host_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

2. LLM Microservice

```bash
curl http://${host_ip}:9000/v1/chat/docsum \
  -X POST \
  -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
  -H 'Content-Type: application/json'
```

3. Whisper Microservice

```bash
curl http://${host_ip}:7066/v1/asr \
  -X POST \
  -d '{"audio":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'
```

Expected output:

```bash
{"asr_result":"you"}
```

4. Audio2Text Microservice

```bash
curl http://${host_ip}:9199/v1/audio/transcriptions \
  -X POST \
  -d '{"byte_str":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'
```

Expected output:

```bash
{"downstream_black_list":[],"id":"--> this will be different id number for each run <--","query":"you"}
```

5. Multimedia to text Microservice

```bash
curl http://${host_ip}:7079/v1/multimedia2text \
  -X POST \
  -d '{"audio":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'
```

Expected output:

```bash
{"downstream_black_list":[],"id":"--> this will be different id number for each run <--","query":"you"}
```

6. MegaService

Text:

```bash
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'

# Use English mode (default).
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"

# Use Chinese mode.
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=2024年9月26日,北京——今日,英特尔正式发布英特尔® 至强® 6性能核处理器(代号Granite Rapids),为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
  -F "max_tokens=32" \
  -F "language=zh" \
  -F "stream=true"

# Upload file
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
```

> Audio and video file uploads are not supported by DocSum through curl requests; please use the Gradio UI instead.

Audio:

```bash
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'

curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=audio" \
  -F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
```

Video:

```bash
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"type": "video", "messages": "convert your video to base64 data type"}'

curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=video" \
  -F "messages=convert your video to base64 data type" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
```

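The audio and video requests above expect the media as base64 text in the `messages` field. The following is a minimal sketch of producing such a JSON body from a local file; `docsum_media_payload` is a hypothetical helper name, and the sketch assumes the standard `base64` command-line tool is installed.

```bash
# Hypothetical helper: print a JSON request body for the DocSum endpoint with
# the given file base64-encoded into the "messages" field.
# Usage: docsum_media_payload <audio|video> <path-to-file>
docsum_media_payload() {
  local media_type="$1" file="$2" encoded
  # Strip newlines because some base64 implementations wrap their output.
  encoded=$(base64 < "$file" | tr -d '\n')
  printf '{"type": "%s", "messages": "%s"}\n' "$media_type" "$encoded"
}
```

The output can be piped straight into curl, for example `docsum_media_payload video ./sample.mp4 | curl -X POST http://${host_ip}:8888/v1/docsum -H "Content-Type: application/json" -d @-`, where `./sample.mp4` stands in for your own file.
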
> More detailed tests can be found in `GenAIExamples/DocSum/test`.

## 🚀 Launch the UI

Several UI options are provided. If you need to work with multimedia documents, .doc, or .pdf files, the Gradio UI is recommended.

### Gradio UI

Open `http://{host_ip}:5173` in your browser to access the Gradio-based frontend.



## 🚀 Launch the Svelte UI

Open `http://{host_ip}:5173` in your browser to access the Svelte-based frontend.



Here is an example of summarizing an article.



## 🚀 Launch the React UI (Optional)

To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace the `docsum-gaudi-ui-server` service with the `docsum-gaudi-react-ui-server` service as shown in the config below:

```yaml
docsum-gaudi-react-ui-server:
  image: ${REGISTRY:-opea}/docsum-react-ui:${TAG:-latest}
  container_name: docsum-gaudi-react-ui-server
  depends_on:
    - docsum-gaudi-backend-server
  ports:
    - "5174:80"
  environment:
    - no_proxy=${no_proxy}
    - https_proxy=${https_proxy}
    - http_proxy=${http_proxy}
    - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
```

Open `http://{host_ip}:5174` in your browser to access the frontend.

