Add instructions for modifying the reranking Docker image for NVGPU (#1133)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

parent 7e62175c2e
commit 2587179224
@@ -5,8 +5,9 @@ This document outlines the deployment process for a ChatQnA application utilizin
 Quick Start Deployment Steps:

 1. Set up the environment variables.
-2. Run Docker Compose.
-3. Consume the ChatQnA Service.
+2. Modify the TEI Docker Image for Reranking
+3. Run Docker Compose.
+4. Consume the ChatQnA Service.

 ## Quick Start: 1.Setup Environment Variable
@@ -35,7 +36,30 @@ To set up environment variables for deploying ChatQnA services, follow these ste
 source ./set_env.sh
 ```

-## Quick Start: 2.Run Docker Compose
+## Quick Start: 2.Modify the TEI Docker Image for Reranking
+
+> **Note:**
+> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100/A30, skip this step. For any other GPU architecture, change the `image` tag of `tei-reranking-service` to the one that matches your target CUDA compute capability, per the table below.
+
+| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
+| ------------ | ------------------------------------------ | ------------------ | -------------------------------------------------------- |
+| Volta        | V100                                       | 7.0                | NOT SUPPORTED                                            |
+| Turing       | T4, GeForce RTX 2000 Series                | 7.5                | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
+| Ampere 80    | A100, A30                                  | 8.0                | ghcr.io/huggingface/text-embeddings-inference:1.5        |
+| Ampere 86    | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6                | ghcr.io/huggingface/text-embeddings-inference:86-1.5     |
+| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series     | 8.9                | ghcr.io/huggingface/text-embeddings-inference:89-1.5     |
+| Hopper       | H100                                       | 9.0                | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
+
+For instance, if a Hopper-architecture GPU (such as the H100/H100 NVL) is the target backend:
+
+```
+# vim compose.yaml
+tei-reranking-service:
+  #image: ghcr.io/huggingface/text-embeddings-inference:1.5
+  image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
+```
+
+## Quick Start: 3.Run Docker Compose

 ```bash
 docker compose up -d
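The `image` tag swap described in the hunk above can also be scripted instead of edited by hand. A minimal sketch with GNU `sed`, assuming `compose.yaml` still references the stock `:1.5` image and that `hopper-1.5` is the tag you want (substitute the tag from the table for your GPU):

```shell
# Swap the default TEI image tag for the Hopper build (GNU sed; on
# macOS/BSD sed use: sed -i '' ...). Adjust the target tag as needed.
sed -i 's|text-embeddings-inference:1.5|text-embeddings-inference:hopper-1.5|' compose.yaml

# Confirm the change took effect.
grep 'text-embeddings-inference' compose.yaml
```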
@@ -56,7 +80,7 @@ In following cases, you could build docker image from source by yourself.

 Please refer to 'Build Docker Images' in below.

-## QuickStart: 3.Consume the ChatQnA Service
+## QuickStart: 4.Consume the ChatQnA Service

 ```bash
 curl http://${host_ip}:8888/v1/chatqna \
@@ -176,6 +200,29 @@ Change the `xxx_MODEL_ID` below for your needs.
 source ./set_env.sh
 ```

+### Modify the TEI Docker Image for Reranking
+
+> **Note:**
+> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100/A30, skip this step. For any other GPU architecture, change the `image` tag of `tei-reranking-service` to the one that matches your target CUDA compute capability, per the table below.
+
+| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
+| ------------ | ------------------------------------------ | ------------------ | -------------------------------------------------------- |
+| Volta        | V100                                       | 7.0                | NOT SUPPORTED                                            |
+| Turing       | T4, GeForce RTX 2000 Series                | 7.5                | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
+| Ampere 80    | A100, A30                                  | 8.0                | ghcr.io/huggingface/text-embeddings-inference:1.5        |
+| Ampere 86    | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6                | ghcr.io/huggingface/text-embeddings-inference:86-1.5     |
+| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series     | 8.9                | ghcr.io/huggingface/text-embeddings-inference:89-1.5     |
+| Hopper       | H100                                       | 9.0                | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
+
+For instance, if a Hopper-architecture GPU (such as the H100/H100 NVL) is the target backend:
+
+```
+# vim compose.yaml
+tei-reranking-service:
+  #image: ghcr.io/huggingface/text-embeddings-inference:1.5
+  image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
+```
+
 ### Start all the services Docker Containers

 ```bash
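The image table added by this commit can be folded into a small lookup helper that selects the right tag for a detected compute capability. A hypothetical sketch (the dictionary and function names are illustrative and not part of the repo; the mapping mirrors the table above):

```python
# Map a CUDA compute capability to the matching TEI reranking image
# (mirrors the commit's table; 7.0/Volta is intentionally absent: NOT SUPPORTED).
TEI_IMAGES = {
    "7.5": "ghcr.io/huggingface/text-embeddings-inference:turing-1.5",
    "8.0": "ghcr.io/huggingface/text-embeddings-inference:1.5",
    "8.6": "ghcr.io/huggingface/text-embeddings-inference:86-1.5",
    "8.9": "ghcr.io/huggingface/text-embeddings-inference:89-1.5",
    "9.0": "ghcr.io/huggingface/text-embeddings-inference:hopper-1.5",
}

def tei_image_for(compute_cap: str) -> str:
    """Return the TEI reranking image for a compute capability, or raise."""
    try:
        return TEI_IMAGES[compute_cap]
    except KeyError:
        raise ValueError(f"compute capability {compute_cap} is not supported")

# On a live host, the capability could come from something like:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```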