Add instructions for modifying the reranking Docker image for NVGPU (#1133)

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This commit is contained in:
Wang, Kai Lawrence
2024-11-18 15:37:32 +08:00
committed by GitHub
parent 7e62175c2e
commit 2587179224


@@ -5,8 +5,9 @@ This document outlines the deployment process for a ChatQnA application utilizin
Quick Start Deployment Steps:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
2. Modify the TEI Docker Image for Reranking
3. Run Docker Compose.
4. Consume the ChatQnA Service.
## Quick Start: 1.Setup Environment Variable
@@ -35,7 +36,30 @@ To set up environment variables for deploying ChatQnA services, follow these ste
source ./set_env.sh
```
## Quick Start: 2.Run Docker Compose
## Quick Start: 2.Modify the TEI Docker Image for Reranking
> **Note:**
> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100/A30, skip this step. For other GPU architectures, replace the `image` tag of `tei-reranking-service` with the one matching your target CUDA compute capability in the table below.
| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
| ------------ | ------------------------------------------ | ---------------- | -------------------------------------------------------- |
| Volta | V100 | 7.0 | NOT SUPPORTED |
| Turing | T4, GeForce RTX 2000 Series | 7.5 | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
| Ampere 80 | A100, A30 | 8.0 | ghcr.io/huggingface/text-embeddings-inference:1.5 |
| Ampere 86 | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6 | ghcr.io/huggingface/text-embeddings-inference:86-1.5 |
| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series | 8.9 | ghcr.io/huggingface/text-embeddings-inference:89-1.5 |
| Hopper | H100 | 9.0 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
For instance, if a Hopper-architecture GPU (such as the H100/H100 NVL) is the target backend:
```yaml
# vim compose.yaml
tei-reranking-service:
#image: ghcr.io/huggingface/text-embeddings-inference:1.5
image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```
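If you prefer to script the tag swap rather than open an editor, a `sed` one-liner can do it. A minimal, self-contained sketch follows; the heredoc stands in for the relevant lines of the real `compose.yaml`, and the temp path is purely illustrative:

```shell
# Illustration only: this heredoc stands in for the real compose.yaml.
cat > /tmp/compose-snippet.yaml <<'EOF'
  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:1.5
EOF

# Rewrite the default tag (compute capability 8.0) to the Hopper build.
sed -i 's|text-embeddings-inference:1.5|text-embeddings-inference:hopper-1.5|' /tmp/compose-snippet.yaml

# Show the updated image line.
grep 'image:' /tmp/compose-snippet.yaml
```

Note that GNU `sed -i` is assumed here (as on typical Linux deployment hosts); BSD/macOS `sed` needs `-i ''`.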
## Quick Start: 3.Run Docker Compose
```bash
docker compose up -d
@@ -56,7 +80,7 @@ In the following cases, you can build the Docker images from source yourself.
Please refer to 'Build Docker Images' below.
## QuickStart: 3.Consume the ChatQnA Service
## QuickStart: 4.Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
@@ -176,6 +200,29 @@ Change the `xxx_MODEL_ID` below for your needs.
source ./set_env.sh
```
### Modify the TEI Docker Image for Reranking
> **Note:**
> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100/A30, skip this step. For other GPU architectures, replace the `image` tag of `tei-reranking-service` with the one matching your target CUDA compute capability in the table below.
| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
| ------------ | ------------------------------------------ | ---------------- | -------------------------------------------------------- |
| Volta | V100 | 7.0 | NOT SUPPORTED |
| Turing | T4, GeForce RTX 2000 Series | 7.5 | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
| Ampere 80 | A100, A30 | 8.0 | ghcr.io/huggingface/text-embeddings-inference:1.5 |
| Ampere 86 | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6 | ghcr.io/huggingface/text-embeddings-inference:86-1.5 |
| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series | 8.9 | ghcr.io/huggingface/text-embeddings-inference:89-1.5 |
| Hopper | H100 | 9.0 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
For instance, if a Hopper-architecture GPU (such as the H100/H100 NVL) is the target backend:
```yaml
# vim compose.yaml
tei-reranking-service:
#image: ghcr.io/huggingface/text-embeddings-inference:1.5
image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```
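As a convenience, the table above can be encoded in a small shell helper that maps a compute capability to the matching image. This is a hypothetical sketch, not part of the repository; on recent drivers the capability value itself can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```shell
# Hypothetical helper: map a CUDA compute capability to the TEI image
# from the table above. Returns non-zero for unsupported architectures.
tag_for_capability() {
  case "$1" in
    7.5) echo "ghcr.io/huggingface/text-embeddings-inference:turing-1.5" ;;
    8.0) echo "ghcr.io/huggingface/text-embeddings-inference:1.5" ;;
    8.6) echo "ghcr.io/huggingface/text-embeddings-inference:86-1.5" ;;
    8.9) echo "ghcr.io/huggingface/text-embeddings-inference:89-1.5" ;;
    9.0) echo "ghcr.io/huggingface/text-embeddings-inference:hopper-1.5" ;;
    *)   echo "unsupported compute capability: $1" >&2; return 1 ;;
  esac
}

# e.g. for an H100:
tag_for_capability 9.0
```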
### Start all the services Docker Containers
```bash