Add instructions for modifying the reranking Docker image for NVGPU (#1133)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

parent 7e62175c2e
commit 2587179224
@@ -5,8 +5,9 @@ This document outlines the deployment process for a ChatQnA application utilizin
 Quick Start Deployment Steps:

 1. Set up the environment variables.
-2. Run Docker Compose.
-3. Consume the ChatQnA Service.
+2. Modify the TEI Docker Image for Reranking
+3. Run Docker Compose.
+4. Consume the ChatQnA Service.

 ## Quick Start: 1.Setup Environment Variable
@@ -35,7 +36,30 @@ To set up environment variables for deploying ChatQnA services, follow these ste
 source ./set_env.sh
 ```

-## Quick Start: 2.Run Docker Compose
+## Quick Start: 2.Modify the TEI Docker Image for Reranking
+
+> **Note:**
+> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100/A30, skip this step. For any other GPU architecture, change the `image` tag of `tei-reranking-service` to the one that matches your target CUDA compute capability, per the table below.
+
+| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
+| ------------ | ------------------------------------------ | ------------------ | -------------------------------------------------------- |
+| Volta        | V100                                       | 7.0                | NOT SUPPORTED                                            |
+| Turing       | T4, GeForce RTX 2000 Series                | 7.5                | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
+| Ampere 80    | A100, A30                                  | 8.0                | ghcr.io/huggingface/text-embeddings-inference:1.5        |
+| Ampere 86    | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6                | ghcr.io/huggingface/text-embeddings-inference:86-1.5     |
+| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series     | 8.9                | ghcr.io/huggingface/text-embeddings-inference:89-1.5     |
+| Hopper       | H100                                       | 9.0                | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
+
+For instance, if a Hopper-architecture GPU (such as the H100/H100 NVL) is the target backend:
+
+```
+# vim compose.yaml
+tei-reranking-service:
+  #image: ghcr.io/huggingface/text-embeddings-inference:1.5
+  image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
+```
+
+## Quick Start: 3.Run Docker Compose

 ```bash
 docker compose up -d
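The `image` tag swap described in the hunk above can also be scripted instead of edited by hand. A minimal sketch with GNU `sed`, assuming `compose.yaml` still references the stock `:1.5` image and that `hopper-1.5` is the tag you want (substitute the tag from the table for your GPU):

```shell
# Swap the default TEI image tag for the Hopper build (GNU sed; on
# macOS/BSD sed use: sed -i '' ...). Adjust the target tag as needed.
sed -i 's|text-embeddings-inference:1.5|text-embeddings-inference:hopper-1.5|' compose.yaml

# Confirm the change took effect.
grep 'text-embeddings-inference' compose.yaml
```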
@@ -56,7 +80,7 @@ In following cases, you could build docker image from source by yourself.

 Please refer to 'Build Docker Images' in below.

-## QuickStart: 3.Consume the ChatQnA Service
+## QuickStart: 4.Consume the ChatQnA Service

 ```bash
 curl http://${host_ip}:8888/v1/chatqna \
@@ -176,6 +200,29 @@ Change the `xxx_MODEL_ID` below for your needs.
 source ./set_env.sh
 ```

+### Modify the TEI Docker Image for Reranking
+
+> **Note:**
+> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backends (CUDA compute capability 8.0). If you are using an A100/A30, skip this step. For any other GPU architecture, change the `image` tag of `tei-reranking-service` to the one that matches your target CUDA compute capability, per the table below.
+
+| GPU Arch     | GPU                                        | Compute Capability | Image                                                    |
+| ------------ | ------------------------------------------ | ------------------ | -------------------------------------------------------- |
+| Volta        | V100                                       | 7.0                | NOT SUPPORTED                                            |
+| Turing       | T4, GeForce RTX 2000 Series                | 7.5                | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
+| Ampere 80    | A100, A30                                  | 8.0                | ghcr.io/huggingface/text-embeddings-inference:1.5        |
+| Ampere 86    | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6                | ghcr.io/huggingface/text-embeddings-inference:86-1.5     |
+| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series     | 8.9                | ghcr.io/huggingface/text-embeddings-inference:89-1.5     |
+| Hopper       | H100                                       | 9.0                | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
+
+For instance, if a Hopper-architecture GPU (such as the H100/H100 NVL) is the target backend:
+
+```
+# vim compose.yaml
+tei-reranking-service:
+  #image: ghcr.io/huggingface/text-embeddings-inference:1.5
+  image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
+```
+
 ### Start all the services Docker Containers

 ```bash
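The image table added by this commit can be folded into a small lookup helper that selects the right tag for a detected compute capability. A hypothetical sketch (the dictionary and function names are illustrative and not part of the repo; the mapping mirrors the table above):

```python
# Map a CUDA compute capability to the matching TEI reranking image
# (mirrors the commit's table; 7.0/Volta is intentionally absent: NOT SUPPORTED).
TEI_IMAGES = {
    "7.5": "ghcr.io/huggingface/text-embeddings-inference:turing-1.5",
    "8.0": "ghcr.io/huggingface/text-embeddings-inference:1.5",
    "8.6": "ghcr.io/huggingface/text-embeddings-inference:86-1.5",
    "8.9": "ghcr.io/huggingface/text-embeddings-inference:89-1.5",
    "9.0": "ghcr.io/huggingface/text-embeddings-inference:hopper-1.5",
}

def tei_image_for(compute_cap: str) -> str:
    """Return the TEI reranking image for a compute capability, or raise."""
    try:
        return TEI_IMAGES[compute_cap]
    except KeyError:
        raise ValueError(f"compute capability {compute_cap} is not supported")

# On a live host, the capability could come from something like:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```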