Compare commits

...

5 Commits
v1.2rc ... v0.9

Author SHA1 Message Date
xiguiw
4d5972112c [Doc] Update ChatQnA flow chart (#542)
* Update flow chart

Signed-off-by: Wang, Xigui <xigui.wang@intel.com>

* Updated Flowchart

Signed-off-by: srinarayan-srikanthan <srinarayan.srikanthan@intel.com>

---------

Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
Signed-off-by: srinarayan-srikanthan <srinarayan.srikanthan@intel.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
(cherry picked from commit dad8eb4b82)
2024-08-27 11:07:03 +08:00
lvliang-intel
dab0177432 Add benchmark README for ChatQnA (#662)
* Add benchmark README for ChatQnA

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add benchmark.yaml

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml path

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* fix preci issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update title

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

---------

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Yingchun Guo <yingchun.guo@intel.com>
2024-08-27 11:06:36 +08:00
NeuralChatBot
e7b000eca5 Freeze OPEA images tag
Signed-off-by: NeuralChatBot <grp_neural_chat_bot@intel.com>
2024-08-25 16:28:59 +00:00
chen, suyue
723fddec79 add env for chatqna vllm (#655)
Signed-off-by: chensuyue <suyue.chen@intel.com>
(cherry picked from commit f78aa9ee2f)
2024-08-23 22:11:32 +08:00
Dina Suehiro Jones
f629702004 Minor fixes for CodeGen Xeon and Gaudi Kubernetes codegen.yaml and doc updates (#613)
* Minor fixes for CodeGen Xeon and Gaudi Kubernetes codegen.yaml and doc updates

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
(cherry picked from commit c25063f4bb)
2024-08-23 22:11:31 +08:00
82 changed files with 842 additions and 149 deletions

View File

@@ -81,7 +81,7 @@ export LLM_SERVICE_PORT=3007
```bash
cd GenAIExamples/AudioQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```
## 🚀 Test MicroServices

View File

@@ -81,7 +81,7 @@ export LLM_SERVICE_PORT=3007
```bash
cd GenAIExamples/AudioQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```
## 🚀 Test MicroServices

View File

@@ -15,19 +15,19 @@ The AudioQnA application is defined as a Custom Resource (CR) file that the abov
The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
- tgi-service: ghcr.io/huggingface/text-generation-inference:1.4
- llm: opea/llm-tgi:latest
- asr: opea/asr:latest
- whisper: opea/whisper:latest
- tts: opea/tts:latest
- speecht5: opea/speecht5:latest
- llm: opea/llm-tgi:v0.9
- asr: opea/asr:v0.9
- whisper: opea/whisper:v0.9
- tts: opea/tts:v0.9
- speecht5: opea/speecht5:v0.9
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
- whisper-gaudi: opea/whisper-gaudi:latest
- speecht5-gaudi: opea/speecht5-gaudi:latest
- whisper-gaudi: opea/whisper-gaudi:v0.9
- speecht5-gaudi: opea/speecht5-gaudi:v0.9
> [NOTE]
> Please refer to [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.

View File

@@ -50,7 +50,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/asr:latest
image: opea/asr:v0.9
imagePullPolicy: IfNotPresent
name: asr-deploy
args: null
@@ -101,7 +101,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/whisper-gaudi:latest
image: opea/whisper-gaudi:v0.9
imagePullPolicy: IfNotPresent
name: whisper-deploy
args: null
@@ -164,7 +164,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/tts:latest
image: opea/tts:v0.9
imagePullPolicy: IfNotPresent
name: tts-deploy
args: null
@@ -215,7 +215,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/speecht5-gaudi:latest
image: opea/speecht5-gaudi:v0.9
imagePullPolicy: IfNotPresent
name: speecht5-deploy
args: null
@@ -365,7 +365,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/llm-tgi:latest
image: opea/llm-tgi:v0.9
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null
@@ -416,7 +416,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/audioqna:latest
image: opea/audioqna:v0.9
imagePullPolicy: IfNotPresent
name: audioqna-backend-server-deploy
args: null

View File

@@ -50,7 +50,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/asr:latest
image: opea/asr:v0.9
imagePullPolicy: IfNotPresent
name: asr-deploy
args: null
@@ -101,7 +101,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/whisper:latest
image: opea/whisper:v0.9
imagePullPolicy: IfNotPresent
name: whisper-deploy
args: null
@@ -152,7 +152,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/tts:latest
image: opea/tts:v0.9
imagePullPolicy: IfNotPresent
name: tts-deploy
args: null
@@ -203,7 +203,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/speecht5:latest
image: opea/speecht5:v0.9
imagePullPolicy: IfNotPresent
name: speecht5-deploy
args: null
@@ -321,7 +321,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/llm-tgi:latest
image: opea/llm-tgi:v0.9
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null
@@ -372,7 +372,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/audioqna:latest
image: opea/audioqna:v0.9
imagePullPolicy: IfNotPresent
name: audioqna-backend-server-deploy
args: null

View File

@@ -10,7 +10,90 @@ ChatQnA architecture shows below:
ChatQnA is implemented on top of [GenAIComps](https://github.com/opea-project/GenAIComps), the ChatQnA Flow Chart shows below:
![Flow Chart](./assets/img/chatqna_flow_chart.png)
```mermaid
---
config:
flowchart:
nodeSpacing: 100
rankSpacing: 100
curve: linear
theme: base
themeVariables:
fontSize: 42px
---
flowchart LR
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style ChatQnA-MegaService stroke:#000000
%% Subgraphs %%
subgraph ChatQnA-MegaService["ChatQnA-MegaService"]
direction LR
EM([Embedding <br>]):::blue
RET([Retrieval <br>]):::blue
RER([Rerank <br>]):::blue
LLM([LLM <br>]):::blue
end
subgraph User Interface
direction TB
a([User Input Query]):::orchid
Ingest([Ingest data]):::orchid
UI([UI server<br>]):::orchid
end
subgraph ChatQnA GateWay
direction LR
invisible1[ ]:::invisible
GW([ChatQnA GateWay<br>]):::orange
end
subgraph .
X([OPEA Microservice]):::blue
Y{{Open Source Service}}
Z([OPEA Gateway]):::orange
Z1([UI]):::orchid
end
TEI_RER{{Reranking service<br>'TEI'<br>}}
TEI_EM{{Embedding service <br>'TEI LangChain'<br>}}
VDB{{Vector DB<br>'Redis'<br>}}
R_RET{{Retriever service <br>'LangChain Redis'<br>}}
DP([Data Preparation<br>'LangChain Redis'<br>]):::blue
LLM_gen{{LLM Service <br>'TGI'<br>}}
%% Data Preparation flow
%% Ingest data flow
direction LR
Ingest[Ingest data] -->|a| UI
UI -->|b| DP
DP <-.->|c| TEI_EM
%% Questions interaction
direction LR
a[User Input Query] -->|1| UI
UI -->|2| GW
GW <==>|3| ChatQnA-MegaService
EM ==>|4| RET
RET ==>|5| RER
RER ==>|6| LLM
%% Embedding service flow
direction TB
EM <-.->|3'| TEI_EM
RET <-.->|4'| R_RET
RER <-.->|5'| TEI_RER
LLM <-.->|6'| LLM_gen
direction TB
%% Vector DB interaction
R_RET <-.->|d|VDB
DP <-.->|d|VDB
```
This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on Intel Gaudi2 or Intel XEON Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.
@@ -78,7 +161,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
```bash
cd GenAIExamples/ChatQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -91,7 +174,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
```bash
cd GenAIExamples/ChatQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
@@ -100,7 +183,7 @@ Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on buil
```bash
cd GenAIExamples/ChatQnA/docker/gpu/
docker compose up -d
TAG=v0.9 docker compose up -d
```
Refer to the [NVIDIA GPU Guide](./docker/gpu/README.md) for more instructions on building docker images from source.

ChatQnA/benchmark/README.md (new file, 546 lines)
View File

@@ -0,0 +1,546 @@
# ChatQnA Benchmarking
This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
# Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
# Metrics
The benchmark reports the following metrics:
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
Results will be displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
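For a quick look at the CSV without leaving the terminal, a minimal sketch (the file lands under the `test_output_dir` configured later in this guide; adjust the path to your setup):
```bash
# Pretty-print the comma-separated stats as an aligned table
column -s, -t < 1_stats.csv | less -S
```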
# Getting Started
## Prerequisites
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
- Every node has direct internet access.
- Set up kubectl on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running the stress tool.
- Ensure all nodes have a local `/mnt/models` folder, which will be mounted by the pods (a sketch for creating it is shown below).
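A minimal sketch for pre-creating that folder from the master node, assuming passwordless SSH and that the worker node names below match your cluster:
```bash
# Hypothetical worker list; replace with the node names from `kubectl get nodes`
for node in k8s-worker1 k8s-worker2 k8s-worker3; do
  ssh "${node}" 'sudo mkdir -p /mnt/models'
done
```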
## Kubernetes Cluster Example
```bash
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 35d v1.29.6
k8s-work1 Ready <none> 35d v1.29.5
k8s-work2 Ready <none> 35d v1.29.6
k8s-work3 Ready <none> 35d v1.29.6
```
## Manifest preparation
We have created the [BKC manifests](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-node, two-node, and four-node K8s clusters. Before applying them, we need to check out the repository and configure some values.
```bash
# on k8s-master node
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/benchmark
# replace the image tag from latest to v0.9 since we want to test with the v0.9 release
IMAGE_TAG=v0.9
find . -name '*.yaml' -type f -exec sed -i "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g" {} \;
# set the huggingface token
HUGGINGFACE_TOKEN=<your token>
find . -name '*.yaml' -type f -exec sed -i "s#\${HF_TOKEN}#${HUGGINGFACE_TOKEN}#g" {} \;
# set models
LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
RERANK_MODEL_ID=BAAI/bge-reranker-base
find . -name '*.yaml' -type f -exec sed -i "s#\$(LLM_MODEL_ID)#${LLM_MODEL_ID}#g" {} \;
find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;
find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
```
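Before applying the manifests, you can optionally spot-check that the substitutions took effect; a minimal sketch:
```bash
# Every opea/* image reference should now carry the v0.9 tag
grep -rh "image: opea/" . | sort -u
```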
## Benchmark tool preparation
The test uses the [benchmark tool](https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark) to run the performance tests. We need to set up the benchmark tool on the Kubernetes master node, which is k8s-master.
```bash
# on k8s-master node
git clone https://github.com/opea-project/GenAIEval.git
cd GenAIEval
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
```
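As a quick sanity check of the environment, the sketch below imports the load generator that the tool reports in its test spec output (locust); treat the exact dependency as an assumption and rely on `requirements.txt` as the source of truth:
```bash
source stress_venv/bin/activate
python -c "import locust; print(locust.__version__)"
```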
## Test Configurations
Workload configuration:
| Key | Value |
| -------- | ------- |
| Workload | ChatQnA |
| Tag | V0.9 |
Model configuration:
| Key | Value |
| ---------- | ------------------ |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| Inference | Intel/neural-chat-7b-v3-3 |
Benchmark parameters:
| Key | Value |
| ---------- | ------------------ |
| LLM input tokens | 1024 |
| LLM output tokens | 128 |
Number of test requests for each scheduled node count:
| Node count | Concurrency | Query number |
| ----- | -------- | -------- |
| 1 | 128 | 640 |
| 2 | 256 | 1280 |
| 4 | 512 | 2560 |
More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).
## Test Steps
### Single node test
#### 1. Preparation
We add a label to one Kubernetes node to make sure all pods are scheduled to that node:
```bash
kubectl label nodes k8s-worker1 node-type=chatqna-opea
```
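To confirm the label landed on the intended node (a quick check, not required):
```bash
kubectl get nodes -l node-type=chatqna-opea
```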
#### 2. Install ChatQnA
Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/single_gaudi) directory and apply it to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
kubectl apply -f .
```
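Before running the tests, it helps to wait until all pods report Ready; a minimal sketch (the timeout is an assumption, since model downloads can take a while):
```bash
# Wait for all ChatQnA pods in the current namespace to become Ready
kubectl wait --for=condition=Ready pods --all --timeout=30m
kubectl get pods -o wide
```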
#### 3. Run tests
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export USER_QUERIES="[4, 8, 16, 640]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
Then run the benchmark tool:
```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```
#### 4. Data collection
All test results are written to the folder `/home/sdp/benchmark_output/node_1`, configured by the `TEST_OUTPUT_DIR` environment variable in the previous step.
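A minimal sketch for browsing the collected output (the path follows the `TEST_OUTPUT_DIR` value set above):
```bash
# List everything the benchmark run produced
ls -lR /home/sdp/benchmark_output/node_1
```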
#### 5. Clean up
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-worker1 node-type-
```
### Two node test
#### 1. Preparation
We add a label to two Kubernetes nodes to make sure all pods are scheduled to these nodes:
```bash
kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
```
#### 2. Install ChatQnA
Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/two_gaudi) directory and apply it to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/two_gaudi
kubectl apply -f .
```
#### 3. Run tests
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export USER_QUERIES="[4, 8, 16, 1280]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
Then run the benchmark tool:
```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```
#### 4. Data collection
All test results are written to the folder `/home/sdp/benchmark_output/node_2`, configured by the `TEST_OUTPUT_DIR` environment variable in the previous step.
#### 5. Clean up
```bash
# on k8s-master node
kubectl delete -f .
kubectl label nodes k8s-worker1 k8s-worker2 node-type-
```
### Four node test
#### 1. Preparation
We add a label to four Kubernetes nodes to make sure all pods are scheduled to these nodes:
```bash
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
```
#### 2. Install ChatQnA
Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/four_gaudi) directory and apply it to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
kubectl apply -f .
```
#### 3. Run tests
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export USER_QUERIES="[4, 8, 16, 2560]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
Then run the benchmark tool:
```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```
#### 4. Data collection
All test results are written to the folder `/home/sdp/benchmark_output/node_4`, configured by the `TEST_OUTPUT_DIR` environment variable in the previous step.
#### 5. Clean up
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
kubectl delete -f .
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
```
### Example Result
The following is a summary of the test results; the full output files are saved under `TEST_OUTPUT_DIR`.
```statistics
Concurrency : 512
Max request count : 2560
Http timeout : 60000
Benchmark target : chatqnafixed
=================Total statistics=====================
Succeed Response: 2560 (Total 2560, 100.0% Success), Duration: 26.44s, Input Tokens: 61440, Output Tokens: 255985, RPS: 96.82, Input Tokens per Second: 2323.71, Output Tokens per Second: 9681.57
End to End latency(ms), P50: 3576.34, P90: 4242.19, P99: 5252.23, Avg: 3581.55
First token latency(ms), P50: 726.64, P90: 1128.27, P99: 1796.09, Avg: 769.58
Average Next token latency(ms): 28.41
Average token latency(ms) : 35.85
======================================================
```
```test spec
benchmarkresult:
Average_Next_token_latency: '28.41'
Average_token_latency: '35.85'
Duration: '26.44'
End_to_End_latency_Avg: '3581.55'
End_to_End_latency_P50: '3576.34'
End_to_End_latency_P90: '4242.19'
End_to_End_latency_P99: '5252.23'
First_token_latency_Avg: '769.58'
First_token_latency_P50: '726.64'
First_token_latency_P90: '1128.27'
First_token_latency_P99: '1796.09'
Input_Tokens: '61440'
Input_Tokens_per_Second: '2323.71'
Onput_Tokens: '255985'
Output_Tokens_per_Second: '9681.57'
RPS: '96.82'
Succeed_Response: '2560'
locust_P50: '160'
locust_P99: '810'
locust_num_failures: '0'
locust_num_requests: '2560'
benchmarkspec:
bench-target: chatqnafixed
endtest_time: '2024-08-25T14:19:25.955973'
host: http://10.110.105.197:8888
llm-model: Intel/neural-chat-7b-v3-3
locustfile: /home/sdp/lvl/GenAIEval/evals/benchmark/stresscli/locust/aistress.py
max_requests: 2560
namespace: default
processes: 2
run_name: benchmark
runtime: 60m
starttest_time: '2024-08-25T14:18:50.366514'
stop_timeout: 120
tool: locust
users: 512
hardwarespec:
aise-gaudi-00:
architecture: amd64
containerRuntimeVersion: containerd://1.7.18
cpu: '160'
habana.ai/gaudi: '8'
kernelVersion: 5.15.0-92-generic
kubeProxyVersion: v1.29.7
kubeletVersion: v1.29.7
memory: 1056375272Ki
operatingSystem: linux
osImage: Ubuntu 22.04.3 LTS
aise-gaudi-01:
architecture: amd64
containerRuntimeVersion: containerd://1.7.18
cpu: '160'
habana.ai/gaudi: '8'
kernelVersion: 5.15.0-92-generic
kubeProxyVersion: v1.29.7
kubeletVersion: v1.29.7
memory: 1056375256Ki
operatingSystem: linux
osImage: Ubuntu 22.04.3 LTS
aise-gaudi-02:
architecture: amd64
containerRuntimeVersion: containerd://1.7.18
cpu: '160'
habana.ai/gaudi: '8'
kernelVersion: 5.15.0-92-generic
kubeProxyVersion: v1.29.7
kubeletVersion: v1.29.7
memory: 1056375260Ki
operatingSystem: linux
osImage: Ubuntu 22.04.3 LTS
aise-gaudi-03:
architecture: amd64
containerRuntimeVersion: containerd://1.6.8
cpu: '160'
habana.ai/gaudi: '8'
kernelVersion: 5.15.0-112-generic
kubeProxyVersion: v1.29.7
kubeletVersion: v1.29.7
memory: 1056374404Ki
operatingSystem: linux
osImage: Ubuntu 22.04.4 LTS
workloadspec:
aise-gaudi-00:
chatqna-backend-server-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 4000Mi
requests:
cpu: '8'
memory: 4000Mi
embedding-dependency-deploy:
replica: 1
resources:
limits:
cpu: '80'
memory: 20000Mi
requests:
cpu: '80'
memory: 20000Mi
embedding-deploy:
replica: 1
llm-dependency-deploy:
replica: 8
resources:
limits:
habana.ai/gaudi: '1'
requests:
habana.ai/gaudi: '1'
llm-deploy:
replica: 1
retriever-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 2500Mi
requests:
cpu: '8'
memory: 2500Mi
aise-gaudi-01:
chatqna-backend-server-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 4000Mi
requests:
cpu: '8'
memory: 4000Mi
embedding-dependency-deploy:
replica: 1
resources:
limits:
cpu: '80'
memory: 20000Mi
requests:
cpu: '80'
memory: 20000Mi
embedding-deploy:
replica: 1
llm-dependency-deploy:
replica: 8
resources:
limits:
habana.ai/gaudi: '1'
requests:
habana.ai/gaudi: '1'
llm-deploy:
replica: 1
prometheus-operator:
replica: 1
resources:
limits:
cpu: 200m
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
retriever-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 2500Mi
requests:
cpu: '8'
memory: 2500Mi
aise-gaudi-02:
chatqna-backend-server-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 4000Mi
requests:
cpu: '8'
memory: 4000Mi
embedding-dependency-deploy:
replica: 1
resources:
limits:
cpu: '80'
memory: 20000Mi
requests:
cpu: '80'
memory: 20000Mi
embedding-deploy:
replica: 1
llm-dependency-deploy:
replica: 8
resources:
limits:
habana.ai/gaudi: '1'
requests:
habana.ai/gaudi: '1'
llm-deploy:
replica: 1
retriever-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 2500Mi
requests:
cpu: '8'
memory: 2500Mi
aise-gaudi-03:
chatqna-backend-server-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 4000Mi
requests:
cpu: '8'
memory: 4000Mi
dataprep-deploy:
replica: 1
embedding-dependency-deploy:
replica: 1
resources:
limits:
cpu: '80'
memory: 20000Mi
requests:
cpu: '80'
memory: 20000Mi
embedding-deploy:
replica: 1
llm-dependency-deploy:
replica: 8
resources:
limits:
habana.ai/gaudi: '1'
requests:
habana.ai/gaudi: '1'
llm-deploy:
replica: 1
retriever-deploy:
replica: 1
resources:
limits:
cpu: '8'
memory: 2500Mi
requests:
cpu: '8'
memory: 2500Mi
vector-db:
replica: 1
```

View File

@@ -0,0 +1,55 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
test_suite_config: # Overall configuration settings for the test suite
examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
concurrent_level: 5 # The concurrency level, adjustable based on requirements
user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
random_prompt: false # Use random prompts if true, fixed prompts if false
run_time: 60m # The max total run time for the test suite
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
test_output_dir: "${TEST_OUTPUT_DIR}" # The directory to store the test output
test_cases:
chatqna:
embedding:
run_test: false
service_name: "embedding-svc" # Replace with your service name
embedserve:
run_test: false
service_name: "embedding-dependency-svc" # Replace with your service name
retriever:
run_test: false
service_name: "retriever-svc" # Replace with your service name
parameters:
search_type: "similarity"
k: 4
fetch_k: 20
lambda_mult: 0.5
score_threshold: 0.2
reranking:
run_test: false
service_name: "reranking-svc" # Replace with your service name
parameters:
top_n: 1
rerankserve:
run_test: false
service_name: "reranking-dependency-svc" # Replace with your service name
llm:
run_test: false
service_name: "llm-svc" # Replace with your service name
parameters:
max_new_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95
repetition_penalty: 1.03
streaming: true
llmserve:
run_test: false
service_name: "llm-dependency-svc" # Replace with your service name
e2e:
run_test: true
service_name: "chatqna-backend-server-svc" # Replace with your service name

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/chatqna:latest
image: opea/chatqna:v0.9
imagePullPolicy: IfNotPresent
name: chatqna-backend-server-deploy
args: null

View File

@@ -40,7 +40,7 @@ spec:
configMapKeyRef:
name: qna-config
key: INDEX_NAME
image: opea/dataprep-redis:latest
image: opea/dataprep-redis:v0.9
imagePullPolicy: IfNotPresent
name: dataprep-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/embedding-tei:latest
image: opea/embedding-tei:v0.9
imagePullPolicy: IfNotPresent
name: embedding-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/llm-tgi:latest
image: opea/llm-tgi:v0.9
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null

View File

@@ -31,7 +31,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/tei-gaudi:latest
image: opea/tei-gaudi:v0.9
name: reranking-dependency-deploy
args:
- --model-id

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/reranking-tei:latest
image: opea/reranking-tei:v0.9
imagePullPolicy: IfNotPresent
name: reranking-deploy
args: null

View File

@@ -40,7 +40,7 @@ spec:
configMapKeyRef:
name: qna-config
key: INDEX_NAME
image: opea/retriever-redis:latest
image: opea/retriever-redis:v0.9
imagePullPolicy: IfNotPresent
name: retriever-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/chatqna:latest
image: opea/chatqna:v0.9
imagePullPolicy: IfNotPresent
name: chatqna-backend-server-deploy
args: null

View File

@@ -40,7 +40,7 @@ spec:
configMapKeyRef:
name: qna-config
key: INDEX_NAME
image: opea/dataprep-redis:latest
image: opea/dataprep-redis:v0.9
imagePullPolicy: IfNotPresent
name: dataprep-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/embedding-tei:latest
image: opea/embedding-tei:v0.9
imagePullPolicy: IfNotPresent
name: embedding-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/llm-tgi:latest
image: opea/llm-tgi:v0.9
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null

View File

@@ -31,7 +31,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/tei-gaudi:latest
image: opea/tei-gaudi:v0.9
name: reranking-dependency-deploy
args:
- --model-id

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/reranking-tei:latest
image: opea/reranking-tei:v0.9
imagePullPolicy: IfNotPresent
name: reranking-deploy
args: null

View File

@@ -40,7 +40,7 @@ spec:
configMapKeyRef:
name: qna-config
key: INDEX_NAME
image: opea/retriever-redis:latest
image: opea/retriever-redis:v0.9
imagePullPolicy: IfNotPresent
name: retriever-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/chatqna:latest
image: opea/chatqna:v0.9
imagePullPolicy: IfNotPresent
name: chatqna-backend-server-deploy
args: null

View File

@@ -40,7 +40,7 @@ spec:
configMapKeyRef:
name: qna-config
key: INDEX_NAME
image: opea/dataprep-redis:latest
image: opea/dataprep-redis:v0.9
imagePullPolicy: IfNotPresent
name: dataprep-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/embedding-tei:latest
image: opea/embedding-tei:v0.9
imagePullPolicy: IfNotPresent
name: embedding-deploy
args: null

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/llm-tgi:latest
image: opea/llm-tgi:v0.9
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null

View File

@@ -31,7 +31,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/tei-gaudi:latest
image: opea/tei-gaudi:v0.9
name: reranking-dependency-deploy
args:
- --model-id

View File

@@ -32,7 +32,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/reranking-tei:latest
image: opea/reranking-tei:v0.9
imagePullPolicy: IfNotPresent
name: reranking-deploy
args: null

View File

@@ -40,7 +40,7 @@ spec:
configMapKeyRef:
name: qna-config
key: INDEX_NAME
image: opea/retriever-redis:latest
image: opea/retriever-redis:v0.9
imagePullPolicy: IfNotPresent
name: retriever-deploy
args: null

View File

@@ -160,7 +160,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
```bash
cd GenAIExamples/ChatQnA/docker/aipc/
docker compose up -d
TAG=v0.9 docker compose up -d
# let ollama service runs
# e.g. ollama run llama3

View File

@@ -211,26 +211,26 @@ cd GenAIExamples/ChatQnA/docker/gaudi/
If use tgi for llm backend.
```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
If use vllm for llm backend.
```bash
docker compose -f compose_vllm.yaml up -d
TAG=v0.9 docker compose -f compose_vllm.yaml up -d
```
If use vllm-on-ray for llm backend.
```bash
docker compose -f compose_vllm_ray.yaml up -d
TAG=v0.9 docker compose -f compose_vllm_ray.yaml up -d
```
If you want to enable guardrails microservice in the pipeline, please follow the below command instead:
```bash
cd GenAIExamples/ChatQnA/docker/gaudi/
docker compose -f compose_guardrails.yaml up -d
TAG=v0.9 docker compose -f compose_guardrails.yaml up -d
```
> **_NOTE:_** Users need at least two Gaudi cards to run the ChatQnA successfully.

View File

@@ -17,7 +17,7 @@ start the docker containers
```
cd ./GenAIExamples/ChatQnA/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
Check the start up log by `docker compose -f ./docker/gaudi/compose.yaml logs`.
@@ -149,7 +149,7 @@ Set the LLM_MODEL_ID then restart the containers.
Also you can check overall logs with the following command, where the compose.yaml is the mega service docker-compose configuration file.
```
docker compose -f ./docker-composer/gaudi/compose.yaml logs
TAG=v0.9 docker compose -f ./docker-composer/gaudi/compose.yaml logs
```
## 4. Check each micro service used by the Mega Service

View File

@@ -121,7 +121,7 @@ Note: Please replace with `host_ip` with you external IP address, do **NOT** use
```bash
cd GenAIExamples/ChatQnA/docker/gpu/
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate MicroServices and MegaService

View File

@@ -226,13 +226,13 @@ cd GenAIExamples/ChatQnA/docker/xeon/
If use TGI backend.
```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
If use vLLM backend.
```bash
docker compose -f compose_vllm.yaml up -d
TAG=v0.9 docker compose -f compose_vllm.yaml up -d
```
### Validate Microservices

View File

@@ -205,7 +205,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
```bash
cd GenAIExamples/ChatQnA/docker/xeon/
docker compose -f compose_qdrant.yaml up -d
TAG=v0.9 docker compose -f compose_qdrant.yaml up -d
```
### Validate Microservices

View File

@@ -72,6 +72,7 @@ services:
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-retriever-service"
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2

View File

@@ -16,18 +16,18 @@ The ChatQnA uses the below prebuilt images if you choose a Xeon deployment
- redis-vector-db: redis/redis-stack:7.2.0-v9
- tei_embedding_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- embedding: opea/embedding-tei:latest
- retriever: opea/retriever-redis:latest
- embedding: opea/embedding-tei:v0.9
- retriever: opea/retriever-redis:v0.9
- tei_xeon_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- reranking: opea/reranking-tei:latest
- reranking: opea/reranking-tei:v0.9
- tgi-service: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
- llm: opea/llm-tgi:latest
- chaqna-xeon-backend-server: opea/chatqna:latest
- llm: opea/llm-tgi:v0.9
- chaqna-xeon-backend-server: opea/chatqna:v0.9
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tei-embedding-service: opea/tei-gaudi:latest
- tei-embedding-service: opea/tei-gaudi:v0.9
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
> [NOTE]

View File

@@ -501,7 +501,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
image: "opea/dataprep-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: data-prep
@@ -579,7 +579,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/embedding-tei:latest"
image: "opea/embedding-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: embedding-usvc
@@ -657,7 +657,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -807,7 +807,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/reranking-tei:latest"
image: "opea/reranking-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: reranking-usvc
@@ -885,7 +885,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
image: "opea/retriever-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: retriever-usvc
@@ -1212,7 +1212,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
image: "opea/chatqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -500,7 +500,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
image: "opea/dataprep-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: data-prep
@@ -578,7 +578,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/embedding-tei:latest"
image: "opea/embedding-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: embedding-usvc
@@ -656,7 +656,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -806,7 +806,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/reranking-tei:latest"
image: "opea/reranking-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: reranking-usvc
@@ -884,7 +884,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
image: "opea/retriever-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: retriever-usvc
@@ -1209,7 +1209,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
image: "opea/chatqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -71,7 +71,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
```bash
cd GenAIExamples/CodeGen/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -84,7 +84,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
```bash
cd GenAIExamples/CodeGen/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.

View File

@@ -103,7 +103,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"
```bash
cd GenAIExamples/CodeGen/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate the MicroServices and MegaService

View File

@@ -106,7 +106,7 @@ Note: Please replace the `host_ip` with you external IP address, do not use `loc
```bash
cd GenAIExamples/CodeGen/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate the MicroServices and MegaService

View File

@@ -6,7 +6,8 @@
> You can also customize the "MODEL_ID" if needed.
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the CodeGEn workload is running. Otherwise, you need to modify the `codegen.yaml` file to change the `model-volume` to a directory that exists on the node.
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the CodeGen workload is running. Otherwise, you need to modify the `codegen.yaml` file to change the `model-volume` to a directory that exists on the node.
> Alternatively, you can change the `codegen.yaml` to use a different type of volume, such as a persistent volume claim.
## Deploy On Xeon
@@ -30,10 +31,13 @@ kubectl apply -f codegen.yaml
To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
Then run the command `kubectl port-forward svc/codegen 7778:7778` to expose the CodeGEn service for access.
Then run the command `kubectl port-forward svc/codegen 7778:7778` to expose the CodeGen service for access.
Open another terminal and run the following command to verify the service is working:
> Note that it may take a couple of minutes for the service to be ready. If the `curl` command below fails, you
> can check the logs of the codegen-tgi pod to see its status or check for errors.
```
kubectl get pods
curl http://localhost:7778/v1/codegen -H "Content-Type: application/json" -d '{

View File

@@ -170,7 +170,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -271,6 +271,8 @@ spec:
resources:
limits:
habana.ai/gaudi: 1
memory: 64Gi
hugepages-2Mi: 500Mi
volumes:
- name: model-volume
hostPath:
@@ -324,7 +326,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codegen:latest"
image: "opea/codegen:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -169,7 +169,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -322,7 +322,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codegen:latest"
image: "opea/codegen:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -179,7 +179,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -230,7 +230,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/codegen:latest"
image: "opea/codegen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: codegen
@@ -273,7 +273,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/codegen-react-ui:latest"
image: "opea/codegen-react-ui:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

View File

@@ -57,7 +57,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
```bash
cd GenAIExamples/CodeTrans/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -70,7 +70,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
```bash
cd GenAIExamples/CodeTrans/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.

View File

@@ -62,7 +62,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
```bash
cd GenAIExamples/CodeTrans/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -70,7 +70,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
```bash
cd GenAIExamples/CodeTrans/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -170,7 +170,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -324,7 +324,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codetrans:latest"
image: "opea/codetrans:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -169,7 +169,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -322,7 +322,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codetrans:latest"
image: "opea/codetrans:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -59,7 +59,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8000/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export llm_hardware='xeon' #xeon, xpu, gaudi
cd GenAIExamples/DocIndexRetriever/docker/${llm_hardware}/
docker compose -f docker-compose.yaml up -d
TAG=v0.9 docker compose -f docker-compose.yaml up -d
```
### 3. Validation

View File

@@ -58,7 +58,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
```bash
cd GenAIExamples/DocSum/docker/gaudi/
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -71,7 +71,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
```bash
cd GenAIExamples/DocSum/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.

View File

@@ -86,7 +86,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
```bash
cd GenAIExamples/DocSum/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -60,6 +60,8 @@ Build the frontend Docker image via below command:
cd GenAIExamples/DocSum/docker/ui/
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT -f ./docker/Dockerfile.react .
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, you will have the following Docker Images:
@@ -93,7 +95,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
```bash
cd GenAIExamples/DocSum/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -170,7 +170,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-docsum-tgi:latest"
image: "opea/llm-docsum-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -324,7 +324,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/docsum:latest"
image: "opea/docsum:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -169,7 +169,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-docsum-tgi:latest"
image: "opea/llm-docsum-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -322,7 +322,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/docsum:latest"
image: "opea/docsum:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -180,7 +180,7 @@ spec:
value:
securityContext: {}
image: "opea/llm-docsum-tgi:latest"
image: "opea/llm-docsum-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -231,7 +231,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/docsum:latest"
image: "opea/docsum:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: docsum
@@ -274,7 +274,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/docsum-react-ui:latest"
image: "opea/docsum-react-ui:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

View File

@@ -86,7 +86,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
```bash
cd GenAIExamples/FaqGen/docker/gaudi
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -85,7 +85,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
```bash
cd GenAIExamples/FaqGen/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -117,7 +117,7 @@ spec:
value: "http://faq-tgi-svc.default.svc.cluster.local:8010"
- name: HUGGINGFACEHUB_API_TOKEN
value: "insert-your-huggingface-token-here"
image: opea/llm-faqgen-tgi:latest
image: opea/llm-faqgen-tgi:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:
@@ -166,7 +166,7 @@ spec:
value: faq-mega-server-svc
- name: MEGA_SERVICE_PORT
value: "7777"
image: opea/faqgen:latest
image: opea/faqgen:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

View File

@@ -24,7 +24,7 @@ spec:
env:
- name: DOC_BASE_URL
value: http://{insert_your_ip_here}:7779/v1/faqgen
image: opea/faqgen-ui:latest
image: opea/faqgen-ui:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

View File

@@ -96,7 +96,7 @@ spec:
value: "http://faq-tgi-cpu-svc.default.svc.cluster.local:8011"
- name: HUGGINGFACEHUB_API_TOKEN
value: "insert-your-huggingface-token-here"
image: opea/llm-faqgen-tgi:latest
image: opea/llm-faqgen-tgi:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:
@@ -145,7 +145,7 @@ spec:
value: faq-mega-server-cpu-svc
- name: MEGA_SERVICE_PORT
value: "7777"
image: opea/faqgen:latest
image: opea/faqgen:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

View File

@@ -179,7 +179,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "opea/llm-faqgen-tgi:latest"
image: "opea/llm-faqgen-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -230,7 +230,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/faqgen:latest"
image: "opea/faqgen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: faqgen
@@ -273,7 +273,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/faqgen-react-ui:latest"
image: "opea/faqgen-react-ui:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

View File

@@ -195,7 +195,7 @@ cd GenAIExamples/ProductivitySuite/docker/xeon/
```
```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
### Setup Keycloak

View File

@@ -65,7 +65,7 @@ spec:
- configMapRef:
name: chat-history-config
securityContext: null
image: "opea/chathistory-mongo-server:latest"
image: "opea/chathistory-mongo-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: chat-history

View File

@@ -499,7 +499,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
image: "opea/dataprep-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: data-prep
@@ -557,7 +557,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/embedding-tei:latest"
image: "opea/embedding-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: embedding-usvc
@@ -615,7 +615,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -753,7 +753,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/reranking-tei:latest"
image: "opea/reranking-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: reranking-usvc
@@ -811,7 +811,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
image: "opea/retriever-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: retriever-usvc
@@ -1069,7 +1069,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
image: "opea/chatqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -171,7 +171,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -301,7 +301,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codegen:latest"
image: "opea/codegen:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -171,7 +171,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-docsum-tgi:latest"
image: "opea/llm-docsum-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -301,7 +301,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/docsum:latest"
image: "opea/docsum:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -183,7 +183,7 @@ spec:
- configMapRef:
name: faqgen-llm-uservice-config
securityContext: {}
image: "opea/llm-faqgen-tgi:latest"
image: "opea/llm-faqgen-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -234,7 +234,7 @@ spec:
- name: no_proxy
value: ""
securityContext: null
image: "opea/faqgen:latest"
image: "opea/faqgen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: faqgen

View File

@@ -82,7 +82,7 @@ spec:
- name: APP_KEYCLOAK_SERVICE_ENDPOINT
value: ""
securityContext: null
image: "opea/productivity-suite-react-ui-server:latest"
image: "opea/productivity-suite-react-ui-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

View File

@@ -65,7 +65,7 @@ spec:
- configMapRef:
name: prompt-registry-config
securityContext: null
image: "opea/promptregistry-mongo-server:latest"
image: "opea/promptregistry-mongo-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: prompt-registry

View File

@@ -69,7 +69,7 @@ If your version of `Habana Driver` < 1.16.0 (check with `hl-smi`), run the follo
```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -82,7 +82,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
```bash
cd GenAIExamples/SearchQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.

View File

@@ -109,7 +109,7 @@ export LLM_SERVICE_PORT=3007
```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```
## 🚀 Test MicroServices

View File

@@ -88,7 +88,7 @@ export LLM_SERVICE_PORT=3007
```bash
cd GenAIExamples/SearchQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```
## 🚀 Test MicroServices

View File

@@ -64,7 +64,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
### Start Microservice Docker Containers
```bash
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -72,7 +72,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
### Start Microservice Docker Containers
```bash
docker compose up -d
TAG=v0.9 docker compose up -d
```
### Validate Microservices

View File

@@ -63,7 +63,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
```bash
cd GenAIExamples/VisualQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -76,5 +76,5 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
```bash
cd GenAIExamples/VisualQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

View File

@@ -85,7 +85,7 @@ cd GenAIExamples/VisualQnA/docker/gaudi/
```
```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
> **_NOTE:_** Users need at least one Gaudi cards to run the VisualQnA successfully.

View File

@@ -124,7 +124,7 @@ cd GenAIExamples/VisualQnA/docker/xeon/
```
```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
### Validate Microservices

View File

@@ -165,7 +165,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/lvm-tgi:latest"
image: "opea/lvm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: lvm-uservice
@@ -215,7 +215,7 @@ spec:
name: visualqna-tgi-config
securityContext:
{}
image: "opea/llava-tgi:latest"
image: "opea/llava-tgi:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
@@ -282,7 +282,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/visualqna:latest"
image: "opea/visualqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

View File

@@ -166,7 +166,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/lvm-tgi:latest"
image: "opea/lvm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: lvm-uservice
@@ -282,7 +282,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/visualqna:latest"
image: "opea/visualqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp