Docsum (#1095)

Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: XinyaoWa <xinyao.wang@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 01:15:42 -08:00
parent 2587179224
commit eb91d1f054
22 changed files with 1392 additions and 275 deletions
--- a/DocSum/docker_compose/intel/cpu/xeon/README.md
+++ b/DocSum/docker_compose/intel/cpu/xeon/README.md
@@ -12,17 +12,46 @@ After launching your instance, you can connect to it using SSH (for Linux instan

 ## 🚀 Build Docker Images

-First of all, you need to build Docker Images locally and install the python package of it.
+### 1. Build MicroService Docker Image

-### 1. Build LLM Image
+First, build the Docker images locally and install the corresponding Python package.

 ```bash
 git clone https://github.com/opea-project/GenAIComps.git
 cd GenAIComps
-docker build -t opea/llm-docsum-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/summarization/tgi/langchain/Dockerfile .
 ```

-Then run the command `docker images`, you will have the following four Docker Images:
+#### Whisper Service
+
+The Whisper Service converts audio files to text. Build its image with the following command:
+
+```bash
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
+```
+
+#### Audio to Text Service
+
+The Audio to Text Service also converts audio to text. Build its image with the following command:
+
+```bash
+docker build -t opea/dataprep-audio2text:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimedia2text/audio2text/Dockerfile .
+```
+
+#### Video to Audio Service
+
+The Video to Audio Service extracts audio from video files. Build its image with the following command:
+
+```bash
+docker build -t opea/dataprep-video2audio:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimedia2text/video2audio/Dockerfile .
+```
+
+#### Multimedia to Text Service
+
+The Multimedia to Text Service transforms multimedia data to text data. Build its image with the following command:
+
+```bash
+docker build -t opea/dataprep-multimedia2text:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimedia2text/Dockerfile .
+```

 ### 2. Build MegaService Docker Image

@@ -36,6 +65,10 @@ docker build -t opea/docsum:latest --build-arg https_proxy=$https_proxy --build-

 ### 3. Build UI Docker Image

+Several UI options are provided. If you need to work with multimedia documents, .doc files, or .pdf files, use the Gradio UI.
+
+#### Svelte UI
+
 Build the frontend Docker image via below command:

 ```bash
@@ -43,13 +76,16 @@ cd GenAIExamples/DocSum/ui
 docker build -t opea/docsum-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
 ```

-Then run the command `docker images`, you will have the following Docker Images:
+#### Gradio UI

-1. `opea/llm-docsum-tgi:latest`
-2. `opea/docsum:latest`
-3. `opea/docsum-ui:latest`
+Build the Gradio UI frontend Docker image using the following command:

-### 4. Build React UI Docker Image
+```bash
+cd GenAIExamples/DocSum/ui
+docker build -t opea/docsum-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
+```
+
+#### React UI

 Build the frontend Docker image via below command:

@@ -61,45 +97,62 @@ docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT
 docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy  -f ./docker/Dockerfile.react .
 ```

-Then run the command `docker images`, you will have the following Docker Images:
-
-1. `opea/llm-docsum-tgi:latest`
-2. `opea/docsum:latest`
-3. `opea/docsum-ui:latest`
-4. `opea/docsum-react-ui:latest`
-
 ## 🚀 Start Microservices and MegaService

 ### Required Models

-We set default model as "Intel/neural-chat-7b-v3-3", change "LLM_MODEL_ID" in following Environment Variables setting if you want to use other models.
-If use gated models, you also need to provide [huggingface token](https://huggingface.co/docs/hub/security-tokens) to "HUGGINGFACEHUB_API_TOKEN" environment variable.
-
-### Setup Environment Variables
-
-Since the `compose.yaml` will consume some environment variables, you need to setup them in advance as below.
+The default model is "Intel/neural-chat-7b-v3-3". To use another model, change the "LLM_MODEL_ID" environment variable in the commands below.

 ```bash
-export no_proxy=${your_no_proxy}
-export http_proxy=${your_http_proxy}
-export https_proxy=${your_http_proxy}
 export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
-export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
-export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
-export MEGA_SERVICE_HOST_IP=${host_ip}
-export LLM_SERVICE_HOST_IP=${host_ip}
-export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
 ```

-Note: Please replace with `host_ip` with your external IP address, do not use localhost.
+When using gated models, you also need to provide a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens) in the "HUGGINGFACEHUB_API_TOKEN" environment variable.
+
+### Set Up Environment Variables
+
+To set up environment variables for deploying Document Summarization services, follow these steps:
+
+1. Set the required environment variables:
+
+   ```bash
+   # Example: host_ip="192.168.1.1"
+   export host_ip="External_Public_IP"
+   # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
+   export no_proxy="Your_No_Proxy"
+   export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
+   ```
+
+2. If you are in a proxy environment, also set the proxy-related environment variables:
+
+   ```bash
+   export http_proxy="Your_HTTP_Proxy"
+   export https_proxy="Your_HTTPs_Proxy"
+   ```
+
+3. Set up other environment variables:
+
+   ```bash
+   source GenAIExamples/DocSum/docker_compose/set_env.sh
+   ```

 ### Start Microservice Docker Containers

 ```bash
 cd GenAIExamples/DocSum/docker_compose/intel/cpu/xeon
-docker compose up -d
+docker compose -f compose.yaml up -d
 ```

+You will have the following Docker Images:
+
+1. `opea/docsum-ui:latest`
+2. `opea/docsum:latest`
+3. `opea/llm-docsum-tgi:latest`
+4. `opea/whisper:latest`
+5. `opea/dataprep-audio2text:latest`
+6. `opea/dataprep-multimedia2text:latest`
+7. `opea/dataprep-video2audio:latest`
+
 ### Validate Microservices

 1. TGI Service
@@ -120,31 +173,143 @@ docker compose up -d
     -H 'Content-Type: application/json'
   ```

-3. MegaService
+3. Whisper Microservice

   ```bash
+    curl http://${host_ip}:7066/v1/asr \
+        -X POST \
+        -d '{"audio":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
+        -H 'Content-Type: application/json'
+   ```
+
+   Expected output:
+
+   ```bash
+     {"asr_result":"you"}
+   ```
+
+4. Audio2Text Microservice
+
+   ```bash
+    curl http://${host_ip}:9099/v1/audio/transcriptions \
+        -X POST \
+        -d '{"byte_str":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
+        -H 'Content-Type: application/json'
+   ```
+
+   Expected output:
+
+   ```bash
+     {"downstream_black_list":[],"id":"--> this will be different id number for each run <--","query":"you"}
+   ```
+
+5. Multimedia to text Microservice
+
+   ```bash
+    curl http://${host_ip}:7079/v1/multimedia2text \
+        -X POST \
+        -d '{"audio":"UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
+        -H 'Content-Type: application/json'
+   ```
+
+   Expected output:
+
+   ```bash
+     {"downstream_black_list":[],"id":"--> this will be different id number for each run <--","query":"you"}
+   ```
+
+6. MegaService
+
+   Text:
+
+   ```bash
+   curl -X POST http://${host_ip}:8888/v1/docsum \
+        -H "Content-Type: application/json" \
+        -d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
+
+   # Use English mode (default).
   curl http://${host_ip}:8888/v1/docsum \
       -H "Content-Type: multipart/form-data" \
+       -F "type=text" \
       -F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
       -F "max_tokens=32" \
       -F "language=en" \
-       -F "stream=false"
+       -F "stream=true"
+
+   # Use Chinese mode.
+   curl http://${host_ip}:8888/v1/docsum \
+       -H "Content-Type: multipart/form-data" \
+       -F "type=text" \
+       -F "messages=2024年9月26日，北京——今日，英特尔正式发布英特尔® 至强® 6性能核处理器（代号Granite Rapids），为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
+       -F "max_tokens=32" \
+       -F "language=zh" \
+       -F "stream=true"
+
+   # Upload file
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=text" \
+      -F "messages=" \
+      -F "files=@/path to your file (.txt, .docx, .pdf)" \
+      -F "max_tokens=32" \
+      -F "language=en" \
+      -F "stream=true"
   ```

-Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service.
+   > Audio and video file uploads are not supported by docsum via curl requests. Please use the Gradio UI instead.
+
+   Audio:
+
+   ```bash
+   curl -X POST http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: application/json" \
+      -d '{"type": "audio", "messages": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
+
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=audio" \
+      -F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
+      -F "max_tokens=32" \
+      -F "language=en" \
+      -F "stream=true"
+   ```
+
+   Video:
+
+   ```bash
+   curl -X POST http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: application/json" \
+      -d '{"type": "video", "messages": "convert your video to base64 data type"}'
+
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=video" \
+      -F "messages=convert your video to base64 data type" \
+      -F "max_tokens=32" \
+      -F "language=en" \
+      -F "stream=true"
+   ```

 ## 🚀 Launch the UI

-Open this URL `http://{host_ip}:5173` in your browser to access the svelte based frontend.
+Several UI options are provided. If you need to work with multimedia documents, .doc files, or .pdf files, use the Gradio UI.

-Open this URL `http://{host_ip}:5174` in your browser to access the React based frontend.
+### Gradio UI
+
+Open this URL `http://{host_ip}:5173` in your browser to access the Gradio-based frontend.
+
+![project-screenshot](../../../../assets/img/docSum_ui_gradio_text.png)

 ### Svelte UI

+Open this URL `http://{host_ip}:5173` in your browser to access the Svelte-based frontend.
+
 ![project-screenshot](../../../../assets/img/docSum_ui_text.png)

 ### React UI (Optional)

+Open this URL `http://{host_ip}:5174` in your browser to access the React-based frontend.
+
 To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `docsum-xeon-ui-server` service with the `docsum-xeon-react-ui-server` service as per the config below:

 ```yaml
--- a/DocSum/docker_compose/intel/cpu/xeon/compose.yaml
+++ b/DocSum/docker_compose/intel/cpu/xeon/compose.yaml
@@ -17,7 +17,8 @@ services:
      - "./data:/data"
    shm_size: 1g
    command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
-  llm:
+
+  llm-docsum-tgi:
    image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
    container_name: llm-docsum-server
    depends_on:
@@ -32,12 +33,56 @@ services:
      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
+
+  whisper:
+    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
+    container_name: whisper-service
+    ports:
+      - "7066:7066"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+
+  dataprep-audio2text:
+    image: ${REGISTRY:-opea}/dataprep-audio2text:${TAG:-latest}
+    container_name: dataprep-audio2text-service
+    ports:
+      - "9099:9099"
+    ipc: host
+    environment:
+      A2T_ENDPOINT: ${A2T_ENDPOINT}
+
+  dataprep-video2audio:
+    image: ${REGISTRY:-opea}/dataprep-video2audio:${TAG:-latest}
+    container_name: dataprep-video2audio-service
+    ports:
+      - "7078:7078"
+    ipc: host
+    environment:
+      V2A_ENDPOINT: ${V2A_ENDPOINT}
+
+  dataprep-multimedia2text:
+    image: ${REGISTRY:-opea}/dataprep-multimedia2text:${TAG:-latest}
+    container_name: dataprep-multimedia2text
+    ports:
+      - "7079:7079"
+    ipc: host
+    environment:
+      V2A_ENDPOINT: ${V2A_ENDPOINT}
+      A2T_ENDPOINT: ${A2T_ENDPOINT}
+
  docsum-xeon-backend-server:
    image: ${REGISTRY:-opea}/docsum:${TAG:-latest}
    container_name: docsum-xeon-backend-server
    depends_on:
      - tgi-service
-      - llm
+      - llm-docsum-tgi
+      - dataprep-multimedia2text
+      - dataprep-video2audio
+      - dataprep-audio2text
    ports:
      - "8888:8888"
    environment:
@@ -45,10 +90,12 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
+      - DATA_SERVICE_HOST_IP=${DATA_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
    ipc: host
    restart: always
-  docsum-xeon-ui-server:
+
+  docsum-ui:
    image: ${REGISTRY:-opea}/docsum-ui:${TAG:-latest}
    container_name: docsum-xeon-ui-server
    depends_on:
@@ -59,6 +106,7 @@ services:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
+      - BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
      - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
    ipc: host
    restart: always
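
Review note: the compose.yaml hunks above interpolate several environment variables, and an unset one is easy to miss until a container misbehaves. A minimal sketch of a pre-flight check follows. The function name `check_compose_env` is hypothetical and not part of this commit, and the variable list is taken only from the hunks visible in this diff, so it may be incomplete. The proxy variables are left out because the README treats them as optional.

```shell
# Hypothetical helper (not part of this commit): print every variable that
# compose.yaml interpolates but that is unset or empty in the current shell.
# The list covers only the variables visible in the hunks above.
check_compose_env() {
  _missing=0
  for _name in LLM_MODEL_ID TGI_LLM_ENDPOINT HUGGINGFACEHUB_API_TOKEN \
               A2T_ENDPOINT V2A_ENDPOINT MEGA_SERVICE_HOST_IP \
               DATA_SERVICE_HOST_IP LLM_SERVICE_HOST_IP BACKEND_SERVICE_ENDPOINT; do
    # Portable indirect lookup: copy the value of the variable named by $_name.
    # The names are fixed constants, so eval is safe here.
    eval "_value=\${${_name}:-}"
    if [ -z "${_value}" ]; then
      echo "missing: ${_name}"
      _missing=1
    fi
  done
  # Exit status 0 when everything is set, 1 otherwise.
  return "${_missing}"
}
```

One way to use it, assuming the function has been defined in the same shell session where the environment was set up, is `check_compose_env && docker compose -f compose.yaml up -d`, so the stack only starts when every listed variable has a value.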