Compare commits

5 Commits

| Author | SHA1 | Date |
|---|---|---|
|  | 4d5972112c |  |
|  | dab0177432 |  |
|  | e7b000eca5 |  |
|  | 723fddec79 |  |
|  | f629702004 |  |
@@ -81,7 +81,7 @@ export LLM_SERVICE_PORT=3007
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/AudioQnA/docker/gaudi/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
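In these snippets, `TAG=v0.9` pins the image tag inline for a single `docker compose` invocation. A minimal sketch of the equivalent exported form, assuming the compose files interpolate a `${TAG}` variable in their `image:` fields:

```bash
# Pin the release tag for every compose command in this shell session
export TAG=v0.9
docker compose up -d

# Or set it inline for a single command (as shown above)
TAG=v0.9 docker compose up -d
```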
|
||||
|
||||
## 🚀 Test MicroServices
|
||||
|
||||
@@ -81,7 +81,7 @@ export LLM_SERVICE_PORT=3007
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/AudioQnA/docker/xeon/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
## 🚀 Test MicroServices
|
||||
|
||||
@@ -15,19 +15,19 @@ The AudioQnA application is defined as a Custom Resource (CR) file that the abov
|
||||
The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
|
||||
|
||||
- tgi-service: ghcr.io/huggingface/text-generation-inference:1.4
|
||||
- llm: opea/llm-tgi:latest
|
||||
- asr: opea/asr:latest
|
||||
- whisper: opea/whisper:latest
|
||||
- tts: opea/tts:latest
|
||||
- speecht5: opea/speecht5:latest
|
||||
- llm: opea/llm-tgi:v0.9
|
||||
- asr: opea/asr:v0.9
|
||||
- whisper: opea/whisper:v0.9
|
||||
- tts: opea/tts:v0.9
|
||||
- speecht5: opea/speecht5:v0.9
|
||||
|
||||
|
||||
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
|
||||
For Gaudi:
|
||||
|
||||
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
|
||||
- whisper-gaudi: opea/whisper-gaudi:latest
|
||||
- speecht5-gaudi: opea/speecht5-gaudi:latest
|
||||
- whisper-gaudi: opea/whisper-gaudi:v0.9
|
||||
- speecht5-gaudi: opea/speecht5-gaudi:v0.9
|
||||
|
||||
> [NOTE]
|
||||
> Please refer to [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.
|
||||
|
||||
@@ -50,7 +50,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/asr:latest
|
||||
image: opea/asr:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: asr-deploy
|
||||
args: null
|
||||
@@ -101,7 +101,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/whisper-gaudi:latest
|
||||
image: opea/whisper-gaudi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: whisper-deploy
|
||||
args: null
|
||||
@@ -164,7 +164,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/tts:latest
|
||||
image: opea/tts:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: tts-deploy
|
||||
args: null
|
||||
@@ -215,7 +215,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/speecht5-gaudi:latest
|
||||
image: opea/speecht5-gaudi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: speecht5-deploy
|
||||
args: null
|
||||
@@ -365,7 +365,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
@@ -416,7 +416,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/audioqna:latest
|
||||
image: opea/audioqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: audioqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -50,7 +50,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/asr:latest
|
||||
image: opea/asr:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: asr-deploy
|
||||
args: null
|
||||
@@ -101,7 +101,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/whisper:latest
|
||||
image: opea/whisper:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: whisper-deploy
|
||||
args: null
|
||||
@@ -152,7 +152,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/tts:latest
|
||||
image: opea/tts:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: tts-deploy
|
||||
args: null
|
||||
@@ -203,7 +203,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/speecht5:latest
|
||||
image: opea/speecht5:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: speecht5-deploy
|
||||
args: null
|
||||
@@ -321,7 +321,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
@@ -372,7 +372,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/audioqna:latest
|
||||
image: opea/audioqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: audioqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -10,7 +10,90 @@ ChatQnA architecture shows below:
|
||||
|
||||
ChatQnA is implemented on top of [GenAIComps](https://github.com/opea-project/GenAIComps), the ChatQnA Flow Chart shows below:
|
||||
|
||||

|
||||
```mermaid
|
||||
---
|
||||
config:
|
||||
flowchart:
|
||||
nodeSpacing: 100
|
||||
rankSpacing: 100
|
||||
curve: linear
|
||||
theme: base
|
||||
themeVariables:
|
||||
fontSize: 42px
|
||||
---
|
||||
flowchart LR
|
||||
%% Colors %%
|
||||
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||
classDef invisible fill:transparent,stroke:transparent;
|
||||
style ChatQnA-MegaService stroke:#000000
|
||||
%% Subgraphs %%
|
||||
subgraph ChatQnA-MegaService["ChatQnA-MegaService"]
|
||||
direction LR
|
||||
EM([Embedding <br>]):::blue
|
||||
RET([Retrieval <br>]):::blue
|
||||
RER([Rerank <br>]):::blue
|
||||
LLM([LLM <br>]):::blue
|
||||
end
|
||||
subgraph User Interface
|
||||
direction TB
|
||||
a([User Input Query]):::orchid
|
||||
Ingest([Ingest data]):::orchid
|
||||
UI([UI server<br>]):::orchid
|
||||
end
|
||||
subgraph ChatQnA GateWay
|
||||
direction LR
|
||||
invisible1[ ]:::invisible
|
||||
GW([ChatQnA GateWay<br>]):::orange
|
||||
end
|
||||
subgraph .
|
||||
X([OPEA Micsrservice]):::blue
|
||||
Y{{Open Source Service}}
|
||||
Z([OPEA Gateway]):::orange
|
||||
Z1([UI]):::orchid
|
||||
end
|
||||
|
||||
TEI_RER{{Reranking service<br>'TEI'<br>}}
|
||||
TEI_EM{{Embedding service <br>'TEI LangChain'<br>}}
|
||||
VDB{{Vector DB<br>'Redis'<br>}}
|
||||
R_RET{{Retriever service <br>'LangChain Redis'<br>}}
|
||||
DP([Data Preparation<br>'LangChain Redis'<br>]):::blue
|
||||
LLM_gen{{LLM Service <br>'TGI'<br>}}
|
||||
|
||||
%% Data Preparation flow
|
||||
%% Ingest data flow
|
||||
direction LR
|
||||
Ingest[Ingest data] -->|a| UI
|
||||
UI -->|b| DP
|
||||
DP <-.->|c| TEI_EM
|
||||
|
||||
%% Questions interaction
|
||||
direction LR
|
||||
a[User Input Query] -->|1| UI
|
||||
UI -->|2| GW
|
||||
GW <==>|3| ChatQnA-MegaService
|
||||
EM ==>|4| RET
|
||||
RET ==>|5| RER
|
||||
RER ==>|6| LLM
|
||||
|
||||
|
||||
%% Embedding service flow
|
||||
direction TB
|
||||
EM <-.->|3'| TEI_EM
|
||||
RET <-.->|4'| R_RET
|
||||
RER <-.->|5'| TEI_RER
|
||||
LLM <-.->|6'| LLM_gen
|
||||
|
||||
direction TB
|
||||
%% Vector DB interaction
|
||||
R_RET <-.->|d|VDB
|
||||
DP <-.->|d|VDB
|
||||
|
||||
|
||||
|
||||
|
||||
```
|
||||
|
||||
This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on Intel Gaudi2 or Intel XEON Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.
|
||||
|
||||
@@ -78,7 +161,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gaudi/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
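A quick way to confirm the installed driver on the Gaudi host is a sketch like the following, assuming the Habana tools (`hl-smi`) are installed alongside the driver:

```bash
# Print the Habana driver version reported by hl-smi
hl-smi | grep -i "driver version"
```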
|
||||
@@ -91,7 +174,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/xeon/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
@@ -100,7 +183,7 @@ Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on buil
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gpu/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [NVIDIA GPU Guide](./docker/gpu/README.md) for more instructions on building docker images from source.
|
||||
|
||||
ChatQnA/benchmark/README.md (new file, 546 lines)
@@ -0,0 +1,546 @@
|
||||
# ChatQnA Benchmarking
|
||||
|
||||
This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.
|
||||
|
||||
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
|
||||
|
||||
# Purpose
|
||||
|
||||
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
|
||||
|
||||
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
|
||||
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
|
||||
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
|
||||
|
||||
# Metrics
|
||||
|
||||
The benchmark reports the following metrics:
|
||||
|
||||
- Number of Concurrent Requests
|
||||
- End-to-End Latency: P50, P90, P99 (in milliseconds)
|
||||
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
|
||||
- Average Next Token Latency (in milliseconds)
|
||||
- Average Token Latency (in milliseconds)
|
||||
- Requests Per Second (RPS)
|
||||
- Output Tokens Per Second
|
||||
- Input Tokens Per Second
|
||||
|
||||
Results are displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
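For a quick look at the saved CSV from the terminal, a minimal sketch (the path is illustrative; substitute your own output location):

```bash
# Pretty-print the summary CSV as a table (hypothetical output path)
column -s, -t < /home/sdp/benchmark_output/node_1/1_stats.csv | less -S
```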
|
||||
|
||||
# Getting Started
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
|
||||
|
||||
- Ensure every node has direct internet access.
|
||||
- Set up kubectl on the master node with access to the Kubernetes cluster.
|
||||
- Install Python 3.8+ on the master node for running the stress tool.
|
||||
- Ensure all nodes have a local `/mnt/models` folder, which will be mounted by the pods (a quick pre-flight check is sketched below).
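A hypothetical pre-flight check for the last prerequisite, assuming passwordless SSH to the nodes of the example cluster below:

```bash
# Confirm /mnt/models exists on every node (node names are the example cluster's)
for node in k8s-master k8s-worker1 k8s-worker2 k8s-worker3; do
  ssh "$node" '[ -d /mnt/models ] && echo "$(hostname): ok" || echo "$(hostname): missing /mnt/models"'
done
```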
|
||||
|
||||
## Kubernetes Cluster Example
|
||||
|
||||
```bash
|
||||
$ kubectl get nodes
|
||||
NAME STATUS ROLES AGE VERSION
|
||||
k8s-master Ready control-plane 35d v1.29.6
|
||||
k8s-work1 Ready <none> 35d v1.29.5
|
||||
k8s-work2 Ready <none> 35d v1.29.6
|
||||
k8s-work3 Ready <none> 35d v1.29.6
|
||||
```
|
||||
|
||||
## Manifest preparation
|
||||
|
||||
We have created the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-node, two-node, and four-node K8s clusters. Before applying them, check out the repository and configure a few values.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
git clone https://github.com/opea-project/GenAIExamples.git
|
||||
cd GenAIExamples/ChatQnA/benchmark
|
||||
|
||||
# replace the image tag from latest to v0.9 since we want to test with v0.9 release
|
||||
IMAGE_TAG=v0.9
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g" {} \;
|
||||
|
||||
# set the huggingface token
|
||||
HUGGINGFACE_TOKEN=<your token>
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\${HF_TOKEN}#${HUGGINGFACE_TOKEN}#g" {} \;
|
||||
|
||||
# set models
|
||||
LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
|
||||
EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
|
||||
RERANK_MODEL_ID=BAAI/bge-reranker-base
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\$(LLM_MODEL_ID)#${LLM_MODEL_ID}#g" {} \;
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
|
||||
```
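As a sanity check after the substitutions, one sketch (run from `GenAIExamples/ChatQnA/benchmark`; the pattern assumes only `opea/*` images were retagged) to confirm no manifest still references `:latest`:

```bash
# Should print only the confirmation message if every opea image tag was rewritten
grep -Rn "image: opea/.*:latest" . || echo "all opea image tags updated to ${IMAGE_TAG}"
```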
|
||||
|
||||
## Benchmark tool preparation
|
||||
|
||||
The test uses the [benchmark tool](https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark) to run the performance test. Set up the benchmark tool on the Kubernetes master node, which is k8s-master.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
git clone https://github.com/opea-project/GenAIEval.git
|
||||
cd GenAIEval
|
||||
python3 -m venv stress_venv
|
||||
source stress_venv/bin/activate
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Test Configurations
|
||||
|
||||
Workload configuration:

| Key      | Value   |
| -------- | ------- |
| Workload | ChatQnA |
| Tag      | v0.9    |

Model configuration:

| Key       | Value                     |
| --------- | ------------------------- |
| Embedding | BAAI/bge-base-en-v1.5     |
| Reranking | BAAI/bge-reranker-base    |
| Inference | Intel/neural-chat-7b-v3-3 |

Benchmark parameters:

| Key               | Value |
| ----------------- | ----- |
| LLM input tokens  | 1024  |
| LLM output tokens | 128   |

Number of test requests for each scheduled node count:

| Node count | Concurrency | Query number |
| ---------- | ----------- | ------------ |
| 1          | 128         | 640          |
| 2          | 256         | 1280         |
| 4          | 512         | 2560         |

More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).
|
||||
|
||||
## Test Steps
|
||||
|
||||
### Single node test
|
||||
|
||||
#### 1. Preparation
|
||||
|
||||
Add a label to one Kubernetes node so that all pods are scheduled to it:
|
||||
|
||||
```bash
|
||||
kubectl label nodes k8s-worker1 node-type=chatqna-opea
|
||||
```
|
||||
|
||||
#### 2. Install ChatQnA
|
||||
|
||||
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/single_gaudi) and apply to K8s.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
|
||||
kubectl apply -f .
|
||||
```
|
||||
|
||||
#### 3. Run tests
|
||||
|
||||
Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
|
||||
|
||||
```bash
|
||||
export USER_QUERIES="[4, 8, 16, 640]"
|
||||
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
|
||||
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
|
||||
```
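Optionally, confirm the placeholders were substituted; a quick check, given that both keys appear verbatim in the configuration file:

```bash
grep -E "user_queries|test_output_dir" GenAIEval/evals/benchmark/benchmark.yaml
```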
|
||||
|
||||
Then run the benchmark tool:
|
||||
|
||||
```bash
|
||||
cd GenAIEval/evals/benchmark
|
||||
python benchmark.py
|
||||
```
|
||||
|
||||
#### 4. Data collection
|
||||
|
||||
All test results are written to `/home/sdp/benchmark_output/node_1`, the folder configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
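For example, to list what was produced (a trivial sketch; the directory is whatever `TEST_OUTPUT_DIR` was set to):

```bash
ls -lh /home/sdp/benchmark_output/node_1   # or: ls -lh "$TEST_OUTPUT_DIR"
```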
|
||||
|
||||
#### 5. Clean up
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
|
||||
kubectl delete -f .
|
||||
kubectl label nodes k8s-worker1 node-type-
|
||||
```
|
||||
|
||||
### Two node test
|
||||
|
||||
#### 1. Preparation
|
||||
|
||||
Add a label to two Kubernetes nodes so that all pods are scheduled to them:
|
||||
|
||||
```bash
|
||||
kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
|
||||
```
|
||||
|
||||
#### 2. Install ChatQnA
|
||||
|
||||
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/two_gaudi) and apply to K8s.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/two_gaudi
|
||||
kubectl apply -f .
|
||||
```
|
||||
|
||||
#### 3. Run tests
|
||||
|
||||
Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
|
||||
|
||||
```bash
|
||||
export USER_QUERIES="[4, 8, 16, 1280]"
|
||||
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
|
||||
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
|
||||
```
|
||||
|
||||
Then run the benchmark tool:
|
||||
|
||||
```bash
|
||||
cd GenAIEval/evals/benchmark
|
||||
python benchmark.py
|
||||
```
|
||||
|
||||
#### 4. Data collection
|
||||
|
||||
All test results are written to `/home/sdp/benchmark_output/node_2`, the folder configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
|
||||
|
||||
#### 5. Clean up
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
kubectl delete -f .
|
||||
kubectl label nodes k8s-worker1 k8s-worker2 node-type-
|
||||
```
|
||||
|
||||
### Four node test
|
||||
|
||||
#### 1. Preparation
|
||||
|
||||
Add a label to four Kubernetes nodes so that all pods are scheduled to them:
|
||||
|
||||
```bash
|
||||
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
|
||||
```
|
||||
|
||||
#### 2. Install ChatQnA
|
||||
|
||||
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/four_gaudi) and apply to K8s.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
|
||||
kubectl apply -f .
|
||||
```
|
||||
|
||||
#### 3. Run tests
|
||||
|
||||
Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
|
||||
|
||||
```bash
|
||||
export USER_QUERIES="[4, 8, 16, 2560]"
|
||||
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
|
||||
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
|
||||
```
|
||||
|
||||
Then run the benchmark tool:
|
||||
|
||||
```bash
|
||||
cd GenAIEval/evals/benchmark
|
||||
python benchmark.py
|
||||
```
|
||||
|
||||
#### 4. Data collection
|
||||
|
||||
All test results are written to `/home/sdp/benchmark_output/node_4`, the folder configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
|
||||
|
||||
#### 5. Clean up
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
|
||||
kubectl delete -f .
|
||||
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
|
||||
```
|
||||
|
||||
### Example Result
|
||||
|
||||
The following is a summary of the test result, with files saved at `TEST_OUTPUT_DIR`.
|
||||
|
||||
```statistics
|
||||
Concurrency : 512
|
||||
Max request count : 2560
|
||||
Http timeout : 60000
|
||||
|
||||
Benchmark target : chatqnafixed
|
||||
|
||||
=================Total statistics=====================
|
||||
Succeed Response: 2560 (Total 2560, 100.0% Success), Duration: 26.44s, Input Tokens: 61440, Output Tokens: 255985, RPS: 96.82, Input Tokens per Second: 2323.71, Output Tokens per Second: 9681.57
|
||||
End to End latency(ms), P50: 3576.34, P90: 4242.19, P99: 5252.23, Avg: 3581.55
|
||||
First token latency(ms), P50: 726.64, P90: 1128.27, P99: 1796.09, Avg: 769.58
|
||||
Average Next token latency(ms): 28.41
|
||||
Average token latency(ms) : 35.85
|
||||
======================================================
|
||||
```
|
||||
|
||||
```test spec
|
||||
benchmarkresult:
|
||||
Average_Next_token_latency: '28.41'
|
||||
Average_token_latency: '35.85'
|
||||
Duration: '26.44'
|
||||
End_to_End_latency_Avg: '3581.55'
|
||||
End_to_End_latency_P50: '3576.34'
|
||||
End_to_End_latency_P90: '4242.19'
|
||||
End_to_End_latency_P99: '5252.23'
|
||||
First_token_latency_Avg: '769.58'
|
||||
First_token_latency_P50: '726.64'
|
||||
First_token_latency_P90: '1128.27'
|
||||
First_token_latency_P99: '1796.09'
|
||||
Input_Tokens: '61440'
|
||||
Input_Tokens_per_Second: '2323.71'
|
||||
Onput_Tokens: '255985'
|
||||
Output_Tokens_per_Second: '9681.57'
|
||||
RPS: '96.82'
|
||||
Succeed_Response: '2560'
|
||||
locust_P50: '160'
|
||||
locust_P99: '810'
|
||||
locust_num_failures: '0'
|
||||
locust_num_requests: '2560'
|
||||
benchmarkspec:
|
||||
bench-target: chatqnafixed
|
||||
endtest_time: '2024-08-25T14:19:25.955973'
|
||||
host: http://10.110.105.197:8888
|
||||
llm-model: Intel/neural-chat-7b-v3-3
|
||||
locustfile: /home/sdp/lvl/GenAIEval/evals/benchmark/stresscli/locust/aistress.py
|
||||
max_requests: 2560
|
||||
namespace: default
|
||||
processes: 2
|
||||
run_name: benchmark
|
||||
runtime: 60m
|
||||
starttest_time: '2024-08-25T14:18:50.366514'
|
||||
stop_timeout: 120
|
||||
tool: locust
|
||||
users: 512
|
||||
hardwarespec:
|
||||
aise-gaudi-00:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.7.18
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-92-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056375272Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.3 LTS
|
||||
aise-gaudi-01:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.7.18
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-92-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056375256Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.3 LTS
|
||||
aise-gaudi-02:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.7.18
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-92-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056375260Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.3 LTS
|
||||
aise-gaudi-03:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.6.8
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-112-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056374404Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.4 LTS
|
||||
workloadspec:
|
||||
aise-gaudi-00:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
aise-gaudi-01:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
prometheus-operator:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 200Mi
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
aise-gaudi-02:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
aise-gaudi-03:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
dataprep-deploy:
|
||||
replica: 1
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
vector-db:
|
||||
replica: 1
|
||||
```
|
||||
ChatQnA/benchmark/benchmark.yaml (new file, 55 lines)
@@ -0,0 +1,55 @@
|
||||
# Copyright (C) 2024 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
test_suite_config: # Overall configuration settings for the test suite
|
||||
examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
|
||||
concurrent_level: 5 # The concurrency level, adjustable based on requirements
|
||||
user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
|
||||
random_prompt: false # Use random prompts if true, fixed prompts if false
|
||||
run_time: 60m # The max total run time for the test suite
|
||||
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
|
||||
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
|
||||
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
|
||||
test_output_dir: "${TEST_OUTPUT_DIR}" # The directory to store the test output
|
||||
|
||||
test_cases:
|
||||
chatqna:
|
||||
embedding:
|
||||
run_test: false
|
||||
service_name: "embedding-svc" # Replace with your service name
|
||||
embedserve:
|
||||
run_test: false
|
||||
service_name: "embedding-dependency-svc" # Replace with your service name
|
||||
retriever:
|
||||
run_test: false
|
||||
service_name: "retriever-svc" # Replace with your service name
|
||||
parameters:
|
||||
search_type: "similarity"
|
||||
k: 4
|
||||
fetch_k: 20
|
||||
lambda_mult: 0.5
|
||||
score_threshold: 0.2
|
||||
reranking:
|
||||
run_test: false
|
||||
service_name: "reranking-svc" # Replace with your service name
|
||||
parameters:
|
||||
top_n: 1
|
||||
rerankserve:
|
||||
run_test: false
|
||||
service_name: "reranking-dependency-svc" # Replace with your service name
|
||||
llm:
|
||||
run_test: false
|
||||
service_name: "llm-svc" # Replace with your service name
|
||||
parameters:
|
||||
max_new_tokens: 128
|
||||
temperature: 0.01
|
||||
top_k: 10
|
||||
top_p: 0.95
|
||||
repetition_penalty: 1.03
|
||||
streaming: true
|
||||
llmserve:
|
||||
run_test: false
|
||||
service_name: "llm-dependency-svc" # Replace with your service name
|
||||
e2e:
|
||||
run_test: true
|
||||
service_name: "chatqna-backend-server-svc" # Replace with your service name
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/chatqna:latest
|
||||
image: opea/chatqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: chatqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/dataprep-redis:latest
|
||||
image: opea/dataprep-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: dataprep-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/embedding-tei:latest
|
||||
image: opea/embedding-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: embedding-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
|
||||
@@ -31,7 +31,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/tei-gaudi:latest
|
||||
image: opea/tei-gaudi:v0.9
|
||||
name: reranking-dependency-deploy
|
||||
args:
|
||||
- --model-id
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/reranking-tei:latest
|
||||
image: opea/reranking-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: reranking-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/retriever-redis:latest
|
||||
image: opea/retriever-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: retriever-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/chatqna:latest
|
||||
image: opea/chatqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: chatqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/dataprep-redis:latest
|
||||
image: opea/dataprep-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: dataprep-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/embedding-tei:latest
|
||||
image: opea/embedding-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: embedding-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
|
||||
@@ -31,7 +31,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/tei-gaudi:latest
|
||||
image: opea/tei-gaudi:v0.9
|
||||
name: reranking-dependency-deploy
|
||||
args:
|
||||
- --model-id
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/reranking-tei:latest
|
||||
image: opea/reranking-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: reranking-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/retriever-redis:latest
|
||||
image: opea/retriever-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: retriever-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/chatqna:latest
|
||||
image: opea/chatqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: chatqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/dataprep-redis:latest
|
||||
image: opea/dataprep-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: dataprep-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/embedding-tei:latest
|
||||
image: opea/embedding-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: embedding-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
|
||||
@@ -31,7 +31,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/tei-gaudi:latest
|
||||
image: opea/tei-gaudi:v0.9
|
||||
name: reranking-dependency-deploy
|
||||
args:
|
||||
- --model-id
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/reranking-tei:latest
|
||||
image: opea/reranking-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: reranking-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/retriever-redis:latest
|
||||
image: opea/retriever-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: retriever-deploy
|
||||
args: null
|
||||
|
||||
@@ -160,7 +160,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/aipc/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
|
||||
# let the ollama service run
|
||||
# e.g. ollama run llama3
|
||||
|
||||
@@ -211,26 +211,26 @@ cd GenAIExamples/ChatQnA/docker/gaudi/
|
||||
If using TGI for the LLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose.yaml up -d
|
||||
```
|
||||
|
||||
If using vLLM for the LLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose_vllm.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_vllm.yaml up -d
|
||||
```
|
||||
|
||||
If using vLLM-on-Ray for the LLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose_vllm_ray.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_vllm_ray.yaml up -d
|
||||
```
|
||||
|
||||
If you want to enable guardrails microservice in the pipeline, please follow the below command instead:
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gaudi/
|
||||
docker compose -f compose_guardrails.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_guardrails.yaml up -d
|
||||
```
|
||||
|
||||
> **_NOTE:_** Users need at least two Gaudi cards to run the ChatQnA successfully.
|
||||
|
||||
@@ -17,7 +17,7 @@ start the docker containers
|
||||
|
||||
```
|
||||
cd ./GenAIExamples/ChatQnA/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Check the start up log by `docker compose -f ./docker/gaudi/compose.yaml logs`.
|
||||
@@ -149,7 +149,7 @@ Set the LLM_MODEL_ID then restart the containers.
|
||||
You can also check the overall logs with the following command, where compose.yaml is the mega-service docker-compose configuration file.
|
||||
|
||||
```
|
||||
docker compose -f ./docker-composer/gaudi/compose.yaml logs
|
||||
TAG=v0.9 docker compose -f ./docker-composer/gaudi/compose.yaml logs
|
||||
```
|
||||
|
||||
## 4. Check each micro service used by the Mega Service
|
||||
|
||||
@@ -121,7 +121,7 @@ Note: Please replace with `host_ip` with you external IP address, do **NOT** use
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gpu/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate MicroServices and MegaService
|
||||
|
||||
@@ -226,13 +226,13 @@ cd GenAIExamples/ChatQnA/docker/xeon/
|
||||
If using the TGI backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose.yaml up -d
|
||||
```
|
||||
|
||||
If using the vLLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose_vllm.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_vllm.yaml up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -205,7 +205,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/xeon/
|
||||
docker compose -f compose_qdrant.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_qdrant.yaml up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -72,6 +72,7 @@ services:
|
||||
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
|
||||
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
|
||||
LANGCHAIN_PROJECT: "opea-retriever-service"
|
||||
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
restart: unless-stopped
|
||||
tei-reranking-service:
|
||||
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
|
||||
|
||||
@@ -16,18 +16,18 @@ The ChatQnA uses the below prebuilt images if you choose a Xeon deployment
|
||||
|
||||
- redis-vector-db: redis/redis-stack:7.2.0-v9
|
||||
- tei_embedding_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
|
||||
- embedding: opea/embedding-tei:latest
|
||||
- retriever: opea/retriever-redis:latest
|
||||
- embedding: opea/embedding-tei:v0.9
|
||||
- retriever: opea/retriever-redis:v0.9
|
||||
- tei_xeon_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
|
||||
- reranking: opea/reranking-tei:latest
|
||||
- reranking: opea/reranking-tei:v0.9
|
||||
- tgi-service: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
|
||||
- llm: opea/llm-tgi:latest
|
||||
- chaqna-xeon-backend-server: opea/chatqna:latest
|
||||
- llm: opea/llm-tgi:v0.9
|
||||
- chaqna-xeon-backend-server: opea/chatqna:v0.9
|
||||
|
||||
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
|
||||
For Gaudi:
|
||||
|
||||
- tei-embedding-service: opea/tei-gaudi:latest
|
||||
- tei-embedding-service: opea/tei-gaudi:v0.9
|
||||
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
|
||||
|
||||
> [NOTE]
|
||||
|
||||
@@ -501,7 +501,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/dataprep-redis:latest"
|
||||
image: "opea/dataprep-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: data-prep
|
||||
@@ -579,7 +579,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/embedding-tei:latest"
|
||||
image: "opea/embedding-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: embedding-usvc
|
||||
@@ -657,7 +657,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -807,7 +807,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/reranking-tei:latest"
|
||||
image: "opea/reranking-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: reranking-usvc
|
||||
@@ -885,7 +885,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/retriever-redis:latest"
|
||||
image: "opea/retriever-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: retriever-usvc
|
||||
@@ -1212,7 +1212,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/chatqna:latest"
|
||||
image: "opea/chatqna:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -500,7 +500,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/dataprep-redis:latest"
|
||||
image: "opea/dataprep-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: data-prep
|
||||
@@ -578,7 +578,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/embedding-tei:latest"
|
||||
image: "opea/embedding-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: embedding-usvc
|
||||
@@ -656,7 +656,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -806,7 +806,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/reranking-tei:latest"
|
||||
image: "opea/reranking-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: reranking-usvc
|
||||
@@ -884,7 +884,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/retriever-redis:latest"
|
||||
image: "opea/retriever-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: retriever-usvc
|
||||
@@ -1209,7 +1209,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/chatqna:latest"
|
||||
image: "opea/chatqna:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -71,7 +71,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
|
||||
@@ -84,7 +84,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
|
||||
@@ -103,7 +103,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate the MicroServices and MegaService
|
||||
|
||||
@@ -106,7 +106,7 @@ Note: Please replace the `host_ip` with you external IP address, do not use `loc
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate the MicroServices and MegaService
|
||||
|
||||
@@ -6,7 +6,8 @@
|
||||
|
||||
> You can also customize the "MODEL_ID" if needed.
|
||||
|
||||
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the CodeGEn workload is running. Otherwise, you need to modify the `codegen.yaml` file to change the `model-volume` to a directory that exists on the node.
|
||||
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the CodeGen workload is running. Otherwise, you need to modify the `codegen.yaml` file to change the `model-volume` to a directory that exists on the node.
|
||||
> Alternatively, you can change the `codegen.yaml` to use a different type of volume, such as a persistent volume claim.
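A minimal sketch for creating that cache directory on the target node (the ownership and permissions here are assumptions; tighten them to match your cluster's policy):

```bash
# Run on the node that will host the CodeGen workload
sudo mkdir -p /mnt/opea-models
sudo chmod a+rwX /mnt/opea-models   # intentionally permissive for illustration only
```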
|
||||
|
||||
## Deploy On Xeon
|
||||
|
||||
@@ -30,10 +31,13 @@ kubectl apply -f codegen.yaml
|
||||
|
||||
To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
|
||||
|
||||
Then run the command `kubectl port-forward svc/codegen 7778:7778` to expose the CodeGEn service for access.
|
||||
Then run the command `kubectl port-forward svc/codegen 7778:7778` to expose the CodeGen service for access.
|
||||
|
||||
Open another terminal and run the following command to verify the service is working:
|
||||
|
||||
> Note that it may take a couple of minutes for the service to be ready. If the `curl` command below fails, you
|
||||
> can check the logs of the codegen-tgi pod to see its status or check for errors.
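A sketch of that log check (the `codegen-tgi` pod-name pattern is an assumption; adjust it to whatever `kubectl get pods` reports):

```bash
# Tail logs from the TGI pod if the curl check below fails
kubectl logs "$(kubectl get pods -o name | grep codegen-tgi | head -n1)" --tail=100
```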
|
||||
|
||||
```
|
||||
kubectl get pods
|
||||
curl http://localhost:7778/v1/codegen -H "Content-Type: application/json" -d '{
|
||||
|
||||
@@ -170,7 +170,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -271,6 +271,8 @@ spec:
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: 1
|
||||
memory: 64Gi
|
||||
hugepages-2Mi: 500Mi
|
||||
volumes:
|
||||
- name: model-volume
|
||||
hostPath:
|
||||
@@ -324,7 +326,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codegen:latest"
|
||||
image: "opea/codegen:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -169,7 +169,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -322,7 +322,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codegen:latest"
|
||||
image: "opea/codegen:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -179,7 +179,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: {}
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -230,7 +230,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/codegen:latest"
|
||||
image: "opea/codegen:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: codegen
|
||||
@@ -273,7 +273,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/codegen-react-ui:latest"
|
||||
image: "opea/codegen-react-ui:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: react-ui
|
||||
|
||||
@@ -57,7 +57,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
|
||||
@@ -70,7 +70,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
|
||||
@@ -62,7 +62,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -70,7 +70,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -170,7 +170,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -324,7 +324,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codetrans:latest"
|
||||
image: "opea/codetrans:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -169,7 +169,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -322,7 +322,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codetrans:latest"
|
||||
image: "opea/codetrans:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -59,7 +59,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8000/v1/retrievaltool"
|
||||
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
|
||||
export llm_hardware='xeon' #xeon, xpu, gaudi
|
||||
cd GenAIExamples/DocIndexRetriever/docker/${llm_hardware}/
|
||||
docker compose -f docker-compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f docker-compose.yaml up -d
|
||||
```
|
||||
|
||||
### 3. Validation
|
||||
|
||||
@@ -58,7 +58,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/gaudi/
|
||||
docker compose -f compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose.yaml up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
|
||||
@@ -71,7 +71,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/xeon/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
|
||||
@@ -86,7 +86,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -60,6 +60,8 @@ Build the frontend Docker image via below command:
|
||||
cd GenAIExamples/DocSum/docker/ui/
|
||||
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
|
||||
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT -f ./docker/Dockerfile.react .
|
||||
|
||||
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
|
||||
```
|
||||
|
||||
Then run the command `docker images`, you will have the following Docker Images:
|
||||
@@ -93,7 +95,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -170,7 +170,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-docsum-tgi:latest"
|
||||
image: "opea/llm-docsum-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -324,7 +324,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/docsum:latest"
|
||||
image: "opea/docsum:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -169,7 +169,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-docsum-tgi:latest"
|
||||
image: "opea/llm-docsum-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -322,7 +322,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/docsum:latest"
|
||||
image: "opea/docsum:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -180,7 +180,7 @@ spec:
|
||||
value:
|
||||
|
||||
securityContext: {}
|
||||
image: "opea/llm-docsum-tgi:latest"
|
||||
image: "opea/llm-docsum-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -231,7 +231,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/docsum:latest"
|
||||
image: "opea/docsum:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: docsum
|
||||
@@ -274,7 +274,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/docsum-react-ui:latest"
|
||||
image: "opea/docsum-react-ui:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: react-ui
|
||||
|
||||
@@ -86,7 +86,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/FaqGen/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```

### Validate Microservices

@@ -85,7 +85,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo

```bash
cd GenAIExamples/FaqGen/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```

### Validate Microservices

@@ -117,7 +117,7 @@ spec:
value: "http://faq-tgi-svc.default.svc.cluster.local:8010"
- name: HUGGINGFACEHUB_API_TOKEN
value: "insert-your-huggingface-token-here"
image: opea/llm-faqgen-tgi:latest
image: opea/llm-faqgen-tgi:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:
@@ -166,7 +166,7 @@ spec:
value: faq-mega-server-svc
- name: MEGA_SERVICE_PORT
value: "7777"
image: opea/faqgen:latest
image: opea/faqgen:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

@@ -24,7 +24,7 @@ spec:
env:
- name: DOC_BASE_URL
value: http://{insert_your_ip_here}:7779/v1/faqgen
image: opea/faqgen-ui:latest
image: opea/faqgen-ui:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

@@ -96,7 +96,7 @@ spec:
value: "http://faq-tgi-cpu-svc.default.svc.cluster.local:8011"
- name: HUGGINGFACEHUB_API_TOKEN
value: "insert-your-huggingface-token-here"
image: opea/llm-faqgen-tgi:latest
image: opea/llm-faqgen-tgi:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:
@@ -145,7 +145,7 @@ spec:
value: faq-mega-server-cpu-svc
- name: MEGA_SERVICE_PORT
value: "7777"
image: opea/faqgen:latest
image: opea/faqgen:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

@@ -179,7 +179,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "opea/llm-faqgen-tgi:latest"
image: "opea/llm-faqgen-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -230,7 +230,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/faqgen:latest"
image: "opea/faqgen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: faqgen
@@ -273,7 +273,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/faqgen-react-ui:latest"
image: "opea/faqgen-react-ui:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

@@ -195,7 +195,7 @@ cd GenAIExamples/ProductivitySuite/docker/xeon/
```

```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
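
Rather than prefixing every command, the tag can also be persisted in a `.env` file next to `compose.yaml`, which Docker Compose reads automatically for variable substitution. This is standard Compose behaviour, not something specific to ProductivitySuite, and it still assumes the compose file references `${TAG}` in its image tags:

```bash
# Pin the release once; later `docker compose` invocations pick it up from .env.
cd GenAIExamples/ProductivitySuite/docker/xeon/
echo "TAG=v0.9" > .env
docker compose -f compose.yaml up -d
```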

### Setup Keycloak

@@ -65,7 +65,7 @@ spec:
- configMapRef:
name: chat-history-config
securityContext: null
image: "opea/chathistory-mongo-server:latest"
image: "opea/chathistory-mongo-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: chat-history

@@ -499,7 +499,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
image: "opea/dataprep-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: data-prep
@@ -557,7 +557,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/embedding-tei:latest"
image: "opea/embedding-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: embedding-usvc
@@ -615,7 +615,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -753,7 +753,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/reranking-tei:latest"
image: "opea/reranking-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: reranking-usvc
@@ -811,7 +811,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
image: "opea/retriever-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: retriever-usvc
@@ -1069,7 +1069,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
image: "opea/chatqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp
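
After applying an updated manifest like the ChatQnA one above, it is worth confirming that every container in the namespace now references the `v0.9` tag rather than `latest`. A generic check that assumes nothing OPEA-specific:

```bash
# Print each pod's declared images and flag any still referencing :latest.
kubectl get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
  | grep ':latest' || echo "all images pinned"
```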

@@ -171,7 +171,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -301,7 +301,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codegen:latest"
image: "opea/codegen:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

@@ -171,7 +171,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-docsum-tgi:latest"
image: "opea/llm-docsum-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -301,7 +301,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/docsum:latest"
image: "opea/docsum:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

@@ -183,7 +183,7 @@ spec:
- configMapRef:
name: faqgen-llm-uservice-config
securityContext: {}
image: "opea/llm-faqgen-tgi:latest"
image: "opea/llm-faqgen-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -234,7 +234,7 @@ spec:
- name: no_proxy
value: ""
securityContext: null
image: "opea/faqgen:latest"
image: "opea/faqgen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: faqgen

@@ -82,7 +82,7 @@ spec:
- name: APP_KEYCLOAK_SERVICE_ENDPOINT
value: ""
securityContext: null
image: "opea/productivity-suite-react-ui-server:latest"
image: "opea/productivity-suite-react-ui-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

@@ -65,7 +65,7 @@ spec:
- configMapRef:
name: prompt-registry-config
securityContext: null
image: "opea/promptregistry-mongo-server:latest"
image: "opea/promptregistry-mongo-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: prompt-registry
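
If these ProductivitySuite manifests were already deployed with the `latest` images, re-applying them and watching the rollout is enough to move the pods onto `v0.9`. The manifest filename and deployment name below are placeholders, not names taken from this change:

```bash
# Re-apply the pinned manifest and wait for the replacement pods to become ready.
kubectl apply -f promptregistry-mongo.yaml                                      # placeholder filename
kubectl rollout status deployment/promptregistry-mongo-server --timeout=120s   # placeholder name
```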

@@ -69,7 +69,7 @@ If your version of `Habana Driver` < 1.16.0 (check with `hl-smi`), run the follo

```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```

> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -82,7 +82,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).

```bash
cd GenAIExamples/SearchQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.

@@ -109,7 +109,7 @@ export LLM_SERVICE_PORT=3007

```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```

## 🚀 Test MicroServices

@@ -88,7 +88,7 @@ export LLM_SERVICE_PORT=3007

```bash
cd GenAIExamples/SearchQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

## 🚀 Test MicroServices

@@ -64,7 +64,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
### Start Microservice Docker Containers

```bash
docker compose up -d
TAG=v0.9 docker compose up -d
```

### Validate Microservices

@@ -72,7 +72,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
### Start Microservice Docker Containers

```bash
docker compose up -d
TAG=v0.9 docker compose up -d
```

### Validate Microservices

@@ -63,7 +63,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).

```bash
cd GenAIExamples/VisualQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```

> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -76,5 +76,5 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).

```bash
cd GenAIExamples/VisualQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

@@ -85,7 +85,7 @@ cd GenAIExamples/VisualQnA/docker/gaudi/
```

```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```

> **_NOTE:_** Users need at least one Gaudi cards to run the VisualQnA successfully.

@@ -124,7 +124,7 @@ cd GenAIExamples/VisualQnA/docker/xeon/
```

```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```

### Validate Microservices

@@ -165,7 +165,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/lvm-tgi:latest"
image: "opea/lvm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: lvm-uservice
@@ -215,7 +215,7 @@ spec:
name: visualqna-tgi-config
securityContext:
{}
image: "opea/llava-tgi:latest"
image: "opea/llava-tgi:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
@@ -282,7 +282,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/visualqna:latest"
image: "opea/visualqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

@@ -166,7 +166,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/lvm-tgi:latest"
image: "opea/lvm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: lvm-uservice
@@ -282,7 +282,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/visualqna:latest"
image: "opea/visualqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp