Compare commits

5 Commits

| Author | SHA1 | Date |
|---|---|---|
|  | 4d5972112c |  |
|  | dab0177432 |  |
|  | e7b000eca5 |  |
|  | 723fddec79 |  |
|  | f629702004 |  |
@@ -81,7 +81,7 @@ export LLM_SERVICE_PORT=3007
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/AudioQnA/docker/gaudi/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
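In these snippets, `TAG=v0.9` pins the image tag inline for a single `docker compose` invocation. A minimal sketch of the equivalent exported form, assuming the compose files interpolate a `${TAG}` variable in their `image:` fields:

```bash
# Pin the release tag for every compose command in this shell session
export TAG=v0.9
docker compose up -d

# Or set it inline for a single command (as shown above)
TAG=v0.9 docker compose up -d
```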
|
||||
|
||||
## 🚀 Test MicroServices
|
||||
|
||||
@@ -81,7 +81,7 @@ export LLM_SERVICE_PORT=3007
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/AudioQnA/docker/xeon/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
## 🚀 Test MicroServices
|
||||
|
||||
@@ -15,19 +15,19 @@ The AudioQnA application is defined as a Custom Resource (CR) file that the abov
|
||||
The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
|
||||
|
||||
- tgi-service: ghcr.io/huggingface/text-generation-inference:1.4
|
||||
- llm: opea/llm-tgi:latest
|
||||
- asr: opea/asr:latest
|
||||
- whisper: opea/whisper:latest
|
||||
- tts: opea/tts:latest
|
||||
- speecht5: opea/speecht5:latest
|
||||
- llm: opea/llm-tgi:v0.9
|
||||
- asr: opea/asr:v0.9
|
||||
- whisper: opea/whisper:v0.9
|
||||
- tts: opea/tts:v0.9
|
||||
- speecht5: opea/speecht5:v0.9
|
||||
|
||||
|
||||
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
|
||||
For Gaudi:
|
||||
|
||||
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
|
||||
- whisper-gaudi: opea/whisper-gaudi:latest
|
||||
- speecht5-gaudi: opea/speecht5-gaudi:latest
|
||||
- whisper-gaudi: opea/whisper-gaudi:v0.9
|
||||
- speecht5-gaudi: opea/speecht5-gaudi:v0.9
|
||||
|
||||
> [NOTE]
|
||||
> Please refer to [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.
|
||||
|
||||
@@ -50,7 +50,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/asr:latest
|
||||
image: opea/asr:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: asr-deploy
|
||||
args: null
|
||||
@@ -101,7 +101,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/whisper-gaudi:latest
|
||||
image: opea/whisper-gaudi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: whisper-deploy
|
||||
args: null
|
||||
@@ -164,7 +164,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/tts:latest
|
||||
image: opea/tts:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: tts-deploy
|
||||
args: null
|
||||
@@ -215,7 +215,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/speecht5-gaudi:latest
|
||||
image: opea/speecht5-gaudi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: speecht5-deploy
|
||||
args: null
|
||||
@@ -365,7 +365,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
@@ -416,7 +416,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/audioqna:latest
|
||||
image: opea/audioqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: audioqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -50,7 +50,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/asr:latest
|
||||
image: opea/asr:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: asr-deploy
|
||||
args: null
|
||||
@@ -101,7 +101,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/whisper:latest
|
||||
image: opea/whisper:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: whisper-deploy
|
||||
args: null
|
||||
@@ -152,7 +152,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/tts:latest
|
||||
image: opea/tts:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: tts-deploy
|
||||
args: null
|
||||
@@ -203,7 +203,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/speecht5:latest
|
||||
image: opea/speecht5:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: speecht5-deploy
|
||||
args: null
|
||||
@@ -321,7 +321,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
@@ -372,7 +372,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: audio-qna-config
|
||||
image: opea/audioqna:latest
|
||||
image: opea/audioqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: audioqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -10,7 +10,90 @@ ChatQnA architecture shows below:
|
||||
|
||||
ChatQnA is implemented on top of [GenAIComps](https://github.com/opea-project/GenAIComps), the ChatQnA Flow Chart shows below:
|
||||
|
||||

|
||||
```mermaid
|
||||
---
|
||||
config:
|
||||
flowchart:
|
||||
nodeSpacing: 100
|
||||
rankSpacing: 100
|
||||
curve: linear
|
||||
theme: base
|
||||
themeVariables:
|
||||
fontSize: 42px
|
||||
---
|
||||
flowchart LR
|
||||
%% Colors %%
|
||||
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||
classDef invisible fill:transparent,stroke:transparent;
|
||||
style ChatQnA-MegaService stroke:#000000
|
||||
%% Subgraphs %%
|
||||
subgraph ChatQnA-MegaService["ChatQnA-MegaService"]
|
||||
direction LR
|
||||
EM([Embedding <br>]):::blue
|
||||
RET([Retrieval <br>]):::blue
|
||||
RER([Rerank <br>]):::blue
|
||||
LLM([LLM <br>]):::blue
|
||||
end
|
||||
subgraph User Interface
|
||||
direction TB
|
||||
a([User Input Query]):::orchid
|
||||
Ingest([Ingest data]):::orchid
|
||||
UI([UI server<br>]):::orchid
|
||||
end
|
||||
subgraph ChatQnA GateWay
|
||||
direction LR
|
||||
invisible1[ ]:::invisible
|
||||
GW([ChatQnA GateWay<br>]):::orange
|
||||
end
|
||||
subgraph .
|
||||
X([OPEA Micsrservice]):::blue
|
||||
Y{{Open Source Service}}
|
||||
Z([OPEA Gateway]):::orange
|
||||
Z1([UI]):::orchid
|
||||
end
|
||||
|
||||
TEI_RER{{Reranking service<br>'TEI'<br>}}
|
||||
TEI_EM{{Embedding service <br>'TEI LangChain'<br>}}
|
||||
VDB{{Vector DB<br>'Redis'<br>}}
|
||||
R_RET{{Retriever service <br>'LangChain Redis'<br>}}
|
||||
DP([Data Preparation<br>'LangChain Redis'<br>]):::blue
|
||||
LLM_gen{{LLM Service <br>'TGI'<br>}}
|
||||
|
||||
%% Data Preparation flow
|
||||
%% Ingest data flow
|
||||
direction LR
|
||||
Ingest[Ingest data] -->|a| UI
|
||||
UI -->|b| DP
|
||||
DP <-.->|c| TEI_EM
|
||||
|
||||
%% Questions interaction
|
||||
direction LR
|
||||
a[User Input Query] -->|1| UI
|
||||
UI -->|2| GW
|
||||
GW <==>|3| ChatQnA-MegaService
|
||||
EM ==>|4| RET
|
||||
RET ==>|5| RER
|
||||
RER ==>|6| LLM
|
||||
|
||||
|
||||
%% Embedding service flow
|
||||
direction TB
|
||||
EM <-.->|3'| TEI_EM
|
||||
RET <-.->|4'| R_RET
|
||||
RER <-.->|5'| TEI_RER
|
||||
LLM <-.->|6'| LLM_gen
|
||||
|
||||
direction TB
|
||||
%% Vector DB interaction
|
||||
R_RET <-.->|d|VDB
|
||||
DP <-.->|d|VDB
|
||||
|
||||
|
||||
|
||||
|
||||
```
|
||||
|
||||
This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on Intel Gaudi2 or Intel XEON Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.
|
||||
|
||||
@@ -78,7 +161,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gaudi/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
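A quick way to confirm the installed driver on the Gaudi host is a sketch like the following, assuming the Habana tools (`hl-smi`) are installed alongside the driver:

```bash
# Print the Habana driver version reported by hl-smi
hl-smi | grep -i "driver version"
```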
|
||||
@@ -91,7 +174,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/xeon/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
@@ -100,7 +183,7 @@ Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on buil
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gpu/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [NVIDIA GPU Guide](./docker/gpu/README.md) for more instructions on building docker images from source.
|
||||
|
||||
ChatQnA/benchmark/README.md (new file, 546 lines)
@@ -0,0 +1,546 @@
|
||||
# ChatQnA Benchmarking
|
||||
|
||||
This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.
|
||||
|
||||
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
|
||||
|
||||
# Purpose
|
||||
|
||||
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
|
||||
|
||||
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
|
||||
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
|
||||
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
|
||||
|
||||
# Metrics
|
||||
|
||||
The benchmark reports the following metrics:
|
||||
|
||||
- Number of Concurrent Requests
|
||||
- End-to-End Latency: P50, P90, P99 (in milliseconds)
|
||||
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
|
||||
- Average Next Token Latency (in milliseconds)
|
||||
- Average Token Latency (in milliseconds)
|
||||
- Requests Per Second (RPS)
|
||||
- Output Tokens Per Second
|
||||
- Input Tokens Per Second
|
||||
|
||||
Results are displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
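For a quick look at the saved CSV from the terminal, a minimal sketch (the path is illustrative; substitute your own output location):

```bash
# Pretty-print the summary CSV as a table (hypothetical output path)
column -s, -t < /home/sdp/benchmark_output/node_1/1_stats.csv | less -S
```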
|
||||
|
||||
# Getting Started
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
|
||||
|
||||
- Ensure every node has direct internet access.
|
||||
- Set up kubectl on the master node with access to the Kubernetes cluster.
|
||||
- Install Python 3.8+ on the master node for running the stress tool.
|
||||
- Ensure all nodes have a local `/mnt/models` folder, which will be mounted by the pods (a quick pre-flight check is sketched below).
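A hypothetical pre-flight check for the last prerequisite, assuming passwordless SSH to the nodes of the example cluster below:

```bash
# Confirm /mnt/models exists on every node (node names are the example cluster's)
for node in k8s-master k8s-worker1 k8s-worker2 k8s-worker3; do
  ssh "$node" '[ -d /mnt/models ] && echo "$(hostname): ok" || echo "$(hostname): missing /mnt/models"'
done
```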
|
||||
|
||||
## Kubernetes Cluster Example
|
||||
|
||||
```bash
|
||||
$ kubectl get nodes
|
||||
NAME STATUS ROLES AGE VERSION
|
||||
k8s-master Ready control-plane 35d v1.29.6
|
||||
k8s-work1 Ready <none> 35d v1.29.5
|
||||
k8s-work2 Ready <none> 35d v1.29.6
|
||||
k8s-work3 Ready <none> 35d v1.29.6
|
||||
```
|
||||
|
||||
## Manifest preparation
|
||||
|
||||
We have created the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-node, two-node, and four-node K8s clusters. Before applying them, check out the repository and configure a few values.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
git clone https://github.com/opea-project/GenAIExamples.git
|
||||
cd GenAIExamples/ChatQnA/benchmark
|
||||
|
||||
# replace the image tag from latest to v0.9 since we want to test with v0.9 release
|
||||
IMAGE_TAG=v0.9
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g" {} \;
|
||||
|
||||
# set the huggingface token
|
||||
HUGGINGFACE_TOKEN=<your token>
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\${HF_TOKEN}#${HUGGINGFACE_TOKEN}#g" {} \;
|
||||
|
||||
# set models
|
||||
LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
|
||||
EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
|
||||
RERANK_MODEL_ID=BAAI/bge-reranker-base
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\$(LLM_MODEL_ID)#${LLM_MODEL_ID}#g" {} \;
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;
|
||||
find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
|
||||
```
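As a sanity check after the substitutions, one sketch (run from `GenAIExamples/ChatQnA/benchmark`; the pattern assumes only `opea/*` images were retagged) to confirm no manifest still references `:latest`:

```bash
# Should print only the confirmation message if every opea image tag was rewritten
grep -Rn "image: opea/.*:latest" . || echo "all opea image tags updated to ${IMAGE_TAG}"
```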
|
||||
|
||||
## Benchmark tool preparation
|
||||
|
||||
The test uses the [benchmark tool](https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark) to run the performance test. Set up the benchmark tool on the Kubernetes master node, which is k8s-master.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
git clone https://github.com/opea-project/GenAIEval.git
|
||||
cd GenAIEval
|
||||
python3 -m venv stress_venv
|
||||
source stress_venv/bin/activate
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Test Configurations
|
||||
|
||||
Workload configuration:

| Key      | Value   |
| -------- | ------- |
| Workload | ChatQnA |
| Tag      | v0.9    |

Model configuration:

| Key       | Value                     |
| --------- | ------------------------- |
| Embedding | BAAI/bge-base-en-v1.5     |
| Reranking | BAAI/bge-reranker-base    |
| Inference | Intel/neural-chat-7b-v3-3 |

Benchmark parameters:

| Key               | Value |
| ----------------- | ----- |
| LLM input tokens  | 1024  |
| LLM output tokens | 128   |

Number of test requests for each scheduled node count:

| Node count | Concurrency | Query number |
| ---------- | ----------- | ------------ |
| 1          | 128         | 640          |
| 2          | 256         | 1280         |
| 4          | 512         | 2560         |

More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).
|
||||
|
||||
## Test Steps
|
||||
|
||||
### Single node test
|
||||
|
||||
#### 1. Preparation
|
||||
|
||||
Add a label to one Kubernetes node so that all pods are scheduled to it:
|
||||
|
||||
```bash
|
||||
kubectl label nodes k8s-worker1 node-type=chatqna-opea
|
||||
```
|
||||
|
||||
#### 2. Install ChatQnA
|
||||
|
||||
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/single_gaudi) and apply to K8s.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
|
||||
kubectl apply -f .
|
||||
```
|
||||
|
||||
#### 3. Run tests
|
||||
|
||||
Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
|
||||
|
||||
```bash
|
||||
export USER_QUERIES="[4, 8, 16, 640]"
|
||||
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
|
||||
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
|
||||
```
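Optionally, confirm the placeholders were substituted; a quick check, given that both keys appear verbatim in the configuration file:

```bash
grep -E "user_queries|test_output_dir" GenAIEval/evals/benchmark/benchmark.yaml
```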
|
||||
|
||||
Then run the benchmark tool:
|
||||
|
||||
```bash
|
||||
cd GenAIEval/evals/benchmark
|
||||
python benchmark.py
|
||||
```
|
||||
|
||||
#### 4. Data collection
|
||||
|
||||
All test results are written to `/home/sdp/benchmark_output/node_1`, the folder configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
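For example, to list what was produced (a trivial sketch; the directory is whatever `TEST_OUTPUT_DIR` was set to):

```bash
ls -lh /home/sdp/benchmark_output/node_1   # or: ls -lh "$TEST_OUTPUT_DIR"
```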
|
||||
|
||||
#### 5. Clean up
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
|
||||
kubectl delete -f .
|
||||
kubectl label nodes k8s-worker1 node-type-
|
||||
```
|
||||
|
||||
### Two node test
|
||||
|
||||
#### 1. Preparation
|
||||
|
||||
Add a label to two Kubernetes nodes so that all pods are scheduled to them:
|
||||
|
||||
```bash
|
||||
kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
|
||||
```
|
||||
|
||||
#### 2. Install ChatQnA
|
||||
|
||||
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/two_gaudi) and apply to K8s.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/two_gaudi
|
||||
kubectl apply -f .
|
||||
```
|
||||
|
||||
#### 3. Run tests
|
||||
|
||||
Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
|
||||
|
||||
```bash
|
||||
export USER_QUERIES="[4, 8, 16, 1280]"
|
||||
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
|
||||
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
|
||||
```
|
||||
|
||||
Then run the benchmark tool:
|
||||
|
||||
```bash
|
||||
cd GenAIEval/evals/benchmark
|
||||
python benchmark.py
|
||||
```
|
||||
|
||||
#### 4. Data collection
|
||||
|
||||
All test results are written to `/home/sdp/benchmark_output/node_2`, the folder configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
|
||||
|
||||
#### 5. Clean up
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
kubectl delete -f .
|
||||
kubectl label nodes k8s-worker1 k8s-worker2 node-type-
|
||||
```
|
||||
|
||||
### Four node test
|
||||
|
||||
#### 1. Preparation
|
||||
|
||||
Add a label to four Kubernetes nodes so that all pods are scheduled to them:
|
||||
|
||||
```bash
|
||||
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
|
||||
```
|
||||
|
||||
#### 2. Install ChatQnA
|
||||
|
||||
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/four_gaudi) and apply to K8s.
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
|
||||
kubectl apply -f .
|
||||
```
|
||||
|
||||
#### 3. Run tests
|
||||
|
||||
Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
|
||||
|
||||
```bash
|
||||
export USER_QUERIES="[4, 8, 16, 2560]"
|
||||
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
|
||||
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
|
||||
```
|
||||
|
||||
Then run the benchmark tool:
|
||||
|
||||
```bash
|
||||
cd GenAIEval/evals/benchmark
|
||||
python benchmark.py
|
||||
```
|
||||
|
||||
#### 4. Data collection
|
||||
|
||||
All test results are written to `/home/sdp/benchmark_output/node_4`, the folder configured by the `TEST_OUTPUT_DIR` environment variable in the previous steps.
|
||||
|
||||
#### 5. Clean up
|
||||
|
||||
```bash
|
||||
# on k8s-master node
|
||||
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
|
||||
kubectl delete -f .
|
||||
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
|
||||
```
|
||||
|
||||
### Example Result
|
||||
|
||||
The following is a summary of the test result, with files saved at `TEST_OUTPUT_DIR`.
|
||||
|
||||
```statistics
|
||||
Concurrency : 512
|
||||
Max request count : 2560
|
||||
Http timeout : 60000
|
||||
|
||||
Benchmark target : chatqnafixed
|
||||
|
||||
=================Total statistics=====================
|
||||
Succeed Response: 2560 (Total 2560, 100.0% Success), Duration: 26.44s, Input Tokens: 61440, Output Tokens: 255985, RPS: 96.82, Input Tokens per Second: 2323.71, Output Tokens per Second: 9681.57
|
||||
End to End latency(ms), P50: 3576.34, P90: 4242.19, P99: 5252.23, Avg: 3581.55
|
||||
First token latency(ms), P50: 726.64, P90: 1128.27, P99: 1796.09, Avg: 769.58
|
||||
Average Next token latency(ms): 28.41
|
||||
Average token latency(ms) : 35.85
|
||||
======================================================
|
||||
```
|
||||
|
||||
```test spec
|
||||
benchmarkresult:
|
||||
Average_Next_token_latency: '28.41'
|
||||
Average_token_latency: '35.85'
|
||||
Duration: '26.44'
|
||||
End_to_End_latency_Avg: '3581.55'
|
||||
End_to_End_latency_P50: '3576.34'
|
||||
End_to_End_latency_P90: '4242.19'
|
||||
End_to_End_latency_P99: '5252.23'
|
||||
First_token_latency_Avg: '769.58'
|
||||
First_token_latency_P50: '726.64'
|
||||
First_token_latency_P90: '1128.27'
|
||||
First_token_latency_P99: '1796.09'
|
||||
Input_Tokens: '61440'
|
||||
Input_Tokens_per_Second: '2323.71'
|
||||
Onput_Tokens: '255985'
|
||||
Output_Tokens_per_Second: '9681.57'
|
||||
RPS: '96.82'
|
||||
Succeed_Response: '2560'
|
||||
locust_P50: '160'
|
||||
locust_P99: '810'
|
||||
locust_num_failures: '0'
|
||||
locust_num_requests: '2560'
|
||||
benchmarkspec:
|
||||
bench-target: chatqnafixed
|
||||
endtest_time: '2024-08-25T14:19:25.955973'
|
||||
host: http://10.110.105.197:8888
|
||||
llm-model: Intel/neural-chat-7b-v3-3
|
||||
locustfile: /home/sdp/lvl/GenAIEval/evals/benchmark/stresscli/locust/aistress.py
|
||||
max_requests: 2560
|
||||
namespace: default
|
||||
processes: 2
|
||||
run_name: benchmark
|
||||
runtime: 60m
|
||||
starttest_time: '2024-08-25T14:18:50.366514'
|
||||
stop_timeout: 120
|
||||
tool: locust
|
||||
users: 512
|
||||
hardwarespec:
|
||||
aise-gaudi-00:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.7.18
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-92-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056375272Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.3 LTS
|
||||
aise-gaudi-01:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.7.18
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-92-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056375256Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.3 LTS
|
||||
aise-gaudi-02:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.7.18
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-92-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056375260Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.3 LTS
|
||||
aise-gaudi-03:
|
||||
architecture: amd64
|
||||
containerRuntimeVersion: containerd://1.6.8
|
||||
cpu: '160'
|
||||
habana.ai/gaudi: '8'
|
||||
kernelVersion: 5.15.0-112-generic
|
||||
kubeProxyVersion: v1.29.7
|
||||
kubeletVersion: v1.29.7
|
||||
memory: 1056374404Ki
|
||||
operatingSystem: linux
|
||||
osImage: Ubuntu 22.04.4 LTS
|
||||
workloadspec:
|
||||
aise-gaudi-00:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
aise-gaudi-01:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
prometheus-operator:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 200Mi
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 100Mi
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
aise-gaudi-02:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
aise-gaudi-03:
|
||||
chatqna-backend-server-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 4000Mi
|
||||
dataprep-deploy:
|
||||
replica: 1
|
||||
embedding-dependency-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
requests:
|
||||
cpu: '80'
|
||||
memory: 20000Mi
|
||||
embedding-deploy:
|
||||
replica: 1
|
||||
llm-dependency-deploy:
|
||||
replica: 8
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: '1'
|
||||
requests:
|
||||
habana.ai/gaudi: '1'
|
||||
llm-deploy:
|
||||
replica: 1
|
||||
retriever-deploy:
|
||||
replica: 1
|
||||
resources:
|
||||
limits:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
requests:
|
||||
cpu: '8'
|
||||
memory: 2500Mi
|
||||
vector-db:
|
||||
replica: 1
|
||||
```
|
||||
ChatQnA/benchmark/benchmark.yaml (new file, 55 lines)
@@ -0,0 +1,55 @@
|
||||
# Copyright (C) 2024 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
test_suite_config: # Overall configuration settings for the test suite
|
||||
examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
|
||||
concurrent_level: 5 # The concurrency level, adjustable based on requirements
|
||||
user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
|
||||
random_prompt: false # Use random prompts if true, fixed prompts if false
|
||||
run_time: 60m # The max total run time for the test suite
|
||||
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
|
||||
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
|
||||
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
|
||||
test_output_dir: "${TEST_OUTPUT_DIR}" # The directory to store the test output
|
||||
|
||||
test_cases:
|
||||
chatqna:
|
||||
embedding:
|
||||
run_test: false
|
||||
service_name: "embedding-svc" # Replace with your service name
|
||||
embedserve:
|
||||
run_test: false
|
||||
service_name: "embedding-dependency-svc" # Replace with your service name
|
||||
retriever:
|
||||
run_test: false
|
||||
service_name: "retriever-svc" # Replace with your service name
|
||||
parameters:
|
||||
search_type: "similarity"
|
||||
k: 4
|
||||
fetch_k: 20
|
||||
lambda_mult: 0.5
|
||||
score_threshold: 0.2
|
||||
reranking:
|
||||
run_test: false
|
||||
service_name: "reranking-svc" # Replace with your service name
|
||||
parameters:
|
||||
top_n: 1
|
||||
rerankserve:
|
||||
run_test: false
|
||||
service_name: "reranking-dependency-svc" # Replace with your service name
|
||||
llm:
|
||||
run_test: false
|
||||
service_name: "llm-svc" # Replace with your service name
|
||||
parameters:
|
||||
max_new_tokens: 128
|
||||
temperature: 0.01
|
||||
top_k: 10
|
||||
top_p: 0.95
|
||||
repetition_penalty: 1.03
|
||||
streaming: true
|
||||
llmserve:
|
||||
run_test: false
|
||||
service_name: "llm-dependency-svc" # Replace with your service name
|
||||
e2e:
|
||||
run_test: true
|
||||
service_name: "chatqna-backend-server-svc" # Replace with your service name
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/chatqna:latest
|
||||
image: opea/chatqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: chatqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/dataprep-redis:latest
|
||||
image: opea/dataprep-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: dataprep-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/embedding-tei:latest
|
||||
image: opea/embedding-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: embedding-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
|
||||
@@ -31,7 +31,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/tei-gaudi:latest
|
||||
image: opea/tei-gaudi:v0.9
|
||||
name: reranking-dependency-deploy
|
||||
args:
|
||||
- --model-id
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/reranking-tei:latest
|
||||
image: opea/reranking-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: reranking-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/retriever-redis:latest
|
||||
image: opea/retriever-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: retriever-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/chatqna:latest
|
||||
image: opea/chatqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: chatqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/dataprep-redis:latest
|
||||
image: opea/dataprep-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: dataprep-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/embedding-tei:latest
|
||||
image: opea/embedding-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: embedding-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
|
||||
@@ -31,7 +31,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/tei-gaudi:latest
|
||||
image: opea/tei-gaudi:v0.9
|
||||
name: reranking-dependency-deploy
|
||||
args:
|
||||
- --model-id
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/reranking-tei:latest
|
||||
image: opea/reranking-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: reranking-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/retriever-redis:latest
|
||||
image: opea/retriever-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: retriever-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/chatqna:latest
|
||||
image: opea/chatqna:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: chatqna-backend-server-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/dataprep-redis:latest
|
||||
image: opea/dataprep-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: dataprep-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/embedding-tei:latest
|
||||
image: opea/embedding-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: embedding-deploy
|
||||
args: null
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/llm-tgi:latest
|
||||
image: opea/llm-tgi:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: llm-deploy
|
||||
args: null
|
||||
|
||||
@@ -31,7 +31,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/tei-gaudi:latest
|
||||
image: opea/tei-gaudi:v0.9
|
||||
name: reranking-dependency-deploy
|
||||
args:
|
||||
- --model-id
|
||||
|
||||
@@ -32,7 +32,7 @@ spec:
|
||||
- envFrom:
|
||||
- configMapRef:
|
||||
name: qna-config
|
||||
image: opea/reranking-tei:latest
|
||||
image: opea/reranking-tei:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: reranking-deploy
|
||||
args: null
|
||||
|
||||
@@ -40,7 +40,7 @@ spec:
|
||||
configMapKeyRef:
|
||||
name: qna-config
|
||||
key: INDEX_NAME
|
||||
image: opea/retriever-redis:latest
|
||||
image: opea/retriever-redis:v0.9
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: retriever-deploy
|
||||
args: null
|
||||
|
||||
@@ -160,7 +160,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/aipc/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
|
||||
# let the ollama service run
|
||||
# e.g. ollama run llama3
|
||||
|
||||
@@ -211,26 +211,26 @@ cd GenAIExamples/ChatQnA/docker/gaudi/
|
||||
If using TGI for the LLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose.yaml up -d
|
||||
```
|
||||
|
||||
If using vLLM for the LLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose_vllm.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_vllm.yaml up -d
|
||||
```
|
||||
|
||||
If using vLLM-on-Ray for the LLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose_vllm_ray.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_vllm_ray.yaml up -d
|
||||
```
|
||||
|
||||
If you want to enable guardrails microservice in the pipeline, please follow the below command instead:
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gaudi/
|
||||
docker compose -f compose_guardrails.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_guardrails.yaml up -d
|
||||
```
|
||||
|
||||
> **_NOTE:_** Users need at least two Gaudi cards to run the ChatQnA successfully.
|
||||
|
||||
@@ -17,7 +17,7 @@ start the docker containers
|
||||
|
||||
```
|
||||
cd ./GenAIExamples/ChatQnA/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Check the start up log by `docker compose -f ./docker/gaudi/compose.yaml logs`.
|
||||
@@ -149,7 +149,7 @@ Set the LLM_MODEL_ID then restart the containers.
|
||||
You can also check the overall logs with the following command, where compose.yaml is the mega-service docker-compose configuration file.
|
||||
|
||||
```
|
||||
docker compose -f ./docker-composer/gaudi/compose.yaml logs
|
||||
TAG=v0.9 docker compose -f ./docker-composer/gaudi/compose.yaml logs
|
||||
```
|
||||
|
||||
## 4. Check each micro service used by the Mega Service
|
||||
|
||||
@@ -121,7 +121,7 @@ Note: Please replace with `host_ip` with you external IP address, do **NOT** use
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/gpu/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate MicroServices and MegaService
|
||||
|
||||
@@ -226,13 +226,13 @@ cd GenAIExamples/ChatQnA/docker/xeon/
|
||||
If using the TGI backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose.yaml up -d
|
||||
```
|
||||
|
||||
If using the vLLM backend:
|
||||
|
||||
```bash
|
||||
docker compose -f compose_vllm.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_vllm.yaml up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -205,7 +205,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/ChatQnA/docker/xeon/
|
||||
docker compose -f compose_qdrant.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose_qdrant.yaml up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -72,6 +72,7 @@ services:
|
||||
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
|
||||
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
|
||||
LANGCHAIN_PROJECT: "opea-retriever-service"
|
||||
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||
restart: unless-stopped
|
||||
tei-reranking-service:
|
||||
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
|
||||
|
||||
@@ -16,18 +16,18 @@ The ChatQnA uses the below prebuilt images if you choose a Xeon deployment
|
||||
|
||||
- redis-vector-db: redis/redis-stack:7.2.0-v9
|
||||
- tei_embedding_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
|
||||
- embedding: opea/embedding-tei:latest
|
||||
- retriever: opea/retriever-redis:latest
|
||||
- embedding: opea/embedding-tei:v0.9
|
||||
- retriever: opea/retriever-redis:v0.9
|
||||
- tei_xeon_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
|
||||
- reranking: opea/reranking-tei:latest
|
||||
- reranking: opea/reranking-tei:v0.9
|
||||
- tgi-service: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
|
||||
- llm: opea/llm-tgi:latest
|
||||
- chaqna-xeon-backend-server: opea/chatqna:latest
|
||||
- llm: opea/llm-tgi:v0.9
|
||||
- chaqna-xeon-backend-server: opea/chatqna:v0.9
|
||||
|
||||
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
|
||||
For Gaudi:
|
||||
|
||||
- tei-embedding-service: opea/tei-gaudi:latest
|
||||
- tei-embedding-service: opea/tei-gaudi:v0.9
|
||||
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
|
||||
|
||||
> [NOTE]
|
||||
|
||||
@@ -501,7 +501,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/dataprep-redis:latest"
|
||||
image: "opea/dataprep-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: data-prep
|
||||
@@ -579,7 +579,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/embedding-tei:latest"
|
||||
image: "opea/embedding-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: embedding-usvc
|
||||
@@ -657,7 +657,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -807,7 +807,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/reranking-tei:latest"
|
||||
image: "opea/reranking-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: reranking-usvc
|
||||
@@ -885,7 +885,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/retriever-redis:latest"
|
||||
image: "opea/retriever-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: retriever-usvc
|
||||
@@ -1212,7 +1212,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/chatqna:latest"
|
||||
image: "opea/chatqna:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -500,7 +500,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/dataprep-redis:latest"
|
||||
image: "opea/dataprep-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: data-prep
|
||||
@@ -578,7 +578,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/embedding-tei:latest"
|
||||
image: "opea/embedding-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: embedding-usvc
|
||||
@@ -656,7 +656,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -806,7 +806,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/reranking-tei:latest"
|
||||
image: "opea/reranking-tei:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: reranking-usvc
|
||||
@@ -884,7 +884,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/retriever-redis:latest"
|
||||
image: "opea/retriever-redis:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: retriever-usvc
|
||||
@@ -1209,7 +1209,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/chatqna:latest"
|
||||
image: "opea/chatqna:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -71,7 +71,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
|
||||
@@ -84,7 +84,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
|
||||
@@ -103,7 +103,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate the MicroServices and MegaService
|
||||
|
||||
@@ -106,7 +106,7 @@ Note: Please replace the `host_ip` with you external IP address, do not use `loc
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeGen/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate the MicroServices and MegaService
|
||||
|
||||
@@ -6,7 +6,8 @@
|
||||
|
||||
> You can also customize the "MODEL_ID" if needed.
|
||||
|
||||
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the CodeGEn workload is running. Otherwise, you need to modify the `codegen.yaml` file to change the `model-volume` to a directory that exists on the node.
|
||||
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the CodeGen workload is running. Otherwise, you need to modify the `codegen.yaml` file to change the `model-volume` to a directory that exists on the node.
|
||||
> Alternatively, you can change the `codegen.yaml` to use a different type of volume, such as a persistent volume claim.
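A minimal sketch for creating that cache directory on the target node (the ownership and permissions here are assumptions; tighten them to match your cluster's policy):

```bash
# Run on the node that will host the CodeGen workload
sudo mkdir -p /mnt/opea-models
sudo chmod a+rwX /mnt/opea-models   # intentionally permissive for illustration only
```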
|
||||
|
||||
## Deploy On Xeon
|
||||
|
||||
@@ -30,10 +31,13 @@ kubectl apply -f codegen.yaml
|
||||
|
||||
To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
|
||||
|
||||
Then run the command `kubectl port-forward svc/codegen 7778:7778` to expose the CodeGEn service for access.
|
||||
Then run the command `kubectl port-forward svc/codegen 7778:7778` to expose the CodeGen service for access.
|
||||
|
||||
Open another terminal and run the following command to verify the service is working:
|
||||
|
||||
> Note that it may take a couple of minutes for the service to be ready. If the `curl` command below fails, you
|
||||
> can check the logs of the codegen-tgi pod to see its status or check for errors.
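A sketch of that log check (the `codegen-tgi` pod-name pattern is an assumption; adjust it to whatever `kubectl get pods` reports):

```bash
# Tail logs from the TGI pod if the curl check below fails
kubectl logs "$(kubectl get pods -o name | grep codegen-tgi | head -n1)" --tail=100
```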
|
||||
|
||||
```
|
||||
kubectl get pods
|
||||
curl http://localhost:7778/v1/codegen -H "Content-Type: application/json" -d '{
|
||||
|
||||
@@ -170,7 +170,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -271,6 +271,8 @@ spec:
|
||||
resources:
|
||||
limits:
|
||||
habana.ai/gaudi: 1
|
||||
memory: 64Gi
|
||||
hugepages-2Mi: 500Mi
|
||||
volumes:
|
||||
- name: model-volume
|
||||
hostPath:
|
||||
@@ -324,7 +326,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codegen:latest"
|
||||
image: "opea/codegen:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -169,7 +169,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -322,7 +322,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codegen:latest"
|
||||
image: "opea/codegen:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -179,7 +179,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: {}
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -230,7 +230,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/codegen:latest"
|
||||
image: "opea/codegen:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: codegen
|
||||
@@ -273,7 +273,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/codegen-react-ui:latest"
|
||||
image: "opea/codegen-react-ui:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: react-ui
|
||||
|
||||
@@ -57,7 +57,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
|
||||
@@ -70,7 +70,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
|
||||
@@ -62,7 +62,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -70,7 +70,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/CodeTrans/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -170,7 +170,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -324,7 +324,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codetrans:latest"
|
||||
image: "opea/codetrans:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -169,7 +169,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-tgi:latest"
|
||||
image: "opea/llm-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -322,7 +322,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/codetrans:latest"
|
||||
image: "opea/codetrans:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -59,7 +59,7 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8000/v1/retrievaltool"
|
||||
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
|
||||
export llm_hardware='xeon' #xeon, xpu, gaudi
|
||||
cd GenAIExamples/DocIndexRetriever/docker/${llm_hardware}/
|
||||
docker compose -f docker-compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f docker-compose.yaml up -d
|
||||
```
|
||||
|
||||
### 3. Validation
|
||||
|
||||
@@ -58,7 +58,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/gaudi/
|
||||
docker compose -f compose.yaml up -d
|
||||
TAG=v0.9 docker compose -f compose.yaml up -d
|
||||
```
|
||||
|
||||
> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
|
||||
@@ -71,7 +71,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/xeon/
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.
|
||||
|
||||
@@ -86,7 +86,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -60,6 +60,8 @@ Build the frontend Docker image via below command:
|
||||
cd GenAIExamples/DocSum/docker/ui/
|
||||
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
|
||||
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT -f ./docker/Dockerfile.react .
|
||||
|
||||
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
|
||||
```
|
||||
|
||||
Then run the command `docker images`, you will have the following Docker Images:
|
||||
@@ -93,7 +95,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/DocSum/docker/xeon
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```
|
||||
|
||||
### Validate Microservices
|
||||
|
||||
@@ -170,7 +170,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-docsum-tgi:latest"
|
||||
image: "opea/llm-docsum-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -324,7 +324,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/docsum:latest"
|
||||
image: "opea/docsum:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -169,7 +169,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/llm-docsum-tgi:latest"
|
||||
image: "opea/llm-docsum-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -322,7 +322,7 @@ spec:
|
||||
runAsUser: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
image: "opea/docsum:latest"
|
||||
image: "opea/docsum:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
volumeMounts:
|
||||
- mountPath: /tmp
|
||||
|
||||
@@ -180,7 +180,7 @@ spec:
|
||||
value:
|
||||
|
||||
securityContext: {}
|
||||
image: "opea/llm-docsum-tgi:latest"
|
||||
image: "opea/llm-docsum-tgi:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: llm-uservice
|
||||
@@ -231,7 +231,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/docsum:latest"
|
||||
image: "opea/docsum:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: docsum
|
||||
@@ -274,7 +274,7 @@ spec:
|
||||
- name: no_proxy
|
||||
value:
|
||||
securityContext: null
|
||||
image: "opea/docsum-react-ui:latest"
|
||||
image: "opea/docsum-react-ui:v0.9"
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: react-ui
|
||||
|
||||
@@ -86,7 +86,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo
|
||||
|
||||
```bash
|
||||
cd GenAIExamples/FaqGen/docker/gaudi
|
||||
docker compose up -d
|
||||
TAG=v0.9 docker compose up -d
|
||||
```

### Validate Microservices

@@ -85,7 +85,7 @@ Note: Please replace with `host_ip` with your external IP address, do not use lo

```bash
cd GenAIExamples/FaqGen/docker/xeon
docker compose up -d
TAG=v0.9 docker compose up -d
```

### Validate Microservices

@@ -117,7 +117,7 @@ spec:
value: "http://faq-tgi-svc.default.svc.cluster.local:8010"
- name: HUGGINGFACEHUB_API_TOKEN
value: "insert-your-huggingface-token-here"
image: opea/llm-faqgen-tgi:latest
image: opea/llm-faqgen-tgi:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:
@@ -166,7 +166,7 @@ spec:
value: faq-mega-server-svc
- name: MEGA_SERVICE_PORT
value: "7777"
image: opea/faqgen:latest
image: opea/faqgen:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

@@ -24,7 +24,7 @@ spec:
env:
- name: DOC_BASE_URL
value: http://{insert_your_ip_here}:7779/v1/faqgen
image: opea/faqgen-ui:latest
image: opea/faqgen-ui:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

@@ -96,7 +96,7 @@ spec:
value: "http://faq-tgi-cpu-svc.default.svc.cluster.local:8011"
- name: HUGGINGFACEHUB_API_TOKEN
value: "insert-your-huggingface-token-here"
image: opea/llm-faqgen-tgi:latest
image: opea/llm-faqgen-tgi:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:
@@ -145,7 +145,7 @@ spec:
value: faq-mega-server-cpu-svc
- name: MEGA_SERVICE_PORT
value: "7777"
image: opea/faqgen:latest
image: opea/faqgen:v0.9
imagePullPolicy: IfNotPresent
args: null
ports:

@@ -179,7 +179,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "opea/llm-faqgen-tgi:latest"
image: "opea/llm-faqgen-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -230,7 +230,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/faqgen:latest"
image: "opea/faqgen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: faqgen
@@ -273,7 +273,7 @@ spec:
- name: no_proxy
value:
securityContext: null
image: "opea/faqgen-react-ui:latest"
image: "opea/faqgen-react-ui:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

@@ -195,7 +195,7 @@ cd GenAIExamples/ProductivitySuite/docker/xeon/
```

```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```
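
Rather than prefixing every command, the tag can also be persisted in a `.env` file next to `compose.yaml`, which Docker Compose reads automatically for variable substitution. This is standard Compose behaviour, not something specific to ProductivitySuite, and it still assumes the compose file references `${TAG}` in its image tags:

```bash
# Pin the release once; later `docker compose` invocations pick it up from .env.
cd GenAIExamples/ProductivitySuite/docker/xeon/
echo "TAG=v0.9" > .env
docker compose -f compose.yaml up -d
```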

### Setup Keycloak

@@ -65,7 +65,7 @@ spec:
- configMapRef:
name: chat-history-config
securityContext: null
image: "opea/chathistory-mongo-server:latest"
image: "opea/chathistory-mongo-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: chat-history

@@ -499,7 +499,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
image: "opea/dataprep-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: data-prep
@@ -557,7 +557,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/embedding-tei:latest"
image: "opea/embedding-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: embedding-usvc
@@ -615,7 +615,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -753,7 +753,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/reranking-tei:latest"
image: "opea/reranking-tei:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: reranking-usvc
@@ -811,7 +811,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
image: "opea/retriever-redis:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: retriever-usvc
@@ -1069,7 +1069,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
image: "opea/chatqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp
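
After applying an updated manifest like the ChatQnA one above, it is worth confirming that every container in the namespace now references the `v0.9` tag rather than `latest`. A generic check that assumes nothing OPEA-specific:

```bash
# Print each pod's declared images and flag any still referencing :latest.
kubectl get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
  | grep ':latest' || echo "all images pinned"
```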

@@ -171,7 +171,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-tgi:latest"
image: "opea/llm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -301,7 +301,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/codegen:latest"
image: "opea/codegen:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

@@ -171,7 +171,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/llm-docsum-tgi:latest"
image: "opea/llm-docsum-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -301,7 +301,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/docsum:latest"
image: "opea/docsum:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

@@ -183,7 +183,7 @@ spec:
- configMapRef:
name: faqgen-llm-uservice-config
securityContext: {}
image: "opea/llm-faqgen-tgi:latest"
image: "opea/llm-faqgen-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: llm-uservice
@@ -234,7 +234,7 @@ spec:
- name: no_proxy
value: ""
securityContext: null
image: "opea/faqgen:latest"
image: "opea/faqgen:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: faqgen

@@ -82,7 +82,7 @@ spec:
- name: APP_KEYCLOAK_SERVICE_ENDPOINT
value: ""
securityContext: null
image: "opea/productivity-suite-react-ui-server:latest"
image: "opea/productivity-suite-react-ui-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: react-ui

@@ -65,7 +65,7 @@ spec:
- configMapRef:
name: prompt-registry-config
securityContext: null
image: "opea/promptregistry-mongo-server:latest"
image: "opea/promptregistry-mongo-server:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: prompt-registry
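
If these ProductivitySuite manifests were already deployed with the `latest` images, re-applying them and watching the rollout is enough to move the pods onto `v0.9`. The manifest filename and deployment name below are placeholders, not names taken from this change:

```bash
# Re-apply the pinned manifest and wait for the replacement pods to become ready.
kubectl apply -f promptregistry-mongo.yaml                                      # placeholder filename
kubectl rollout status deployment/promptregistry-mongo-server --timeout=120s   # placeholder name
```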

@@ -69,7 +69,7 @@ If your version of `Habana Driver` < 1.16.0 (check with `hl-smi`), run the follo

```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```

> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -82,7 +82,7 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).

```bash
cd GenAIExamples/SearchQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

Refer to the [Xeon Guide](./docker/xeon/README.md) for more instructions on building docker images from source.

@@ -109,7 +109,7 @@ export LLM_SERVICE_PORT=3007

```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```

## 🚀 Test MicroServices

@@ -88,7 +88,7 @@ export LLM_SERVICE_PORT=3007

```bash
cd GenAIExamples/SearchQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

## 🚀 Test MicroServices

@@ -64,7 +64,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
### Start Microservice Docker Containers

```bash
docker compose up -d
TAG=v0.9 docker compose up -d
```

### Validate Microservices

@@ -72,7 +72,7 @@ Note: Please replace with `host_ip` with you external IP address, do not use loc
### Start Microservice Docker Containers

```bash
docker compose up -d
TAG=v0.9 docker compose up -d
```

### Validate Microservices

@@ -63,7 +63,7 @@ Find the corresponding [compose.yaml](./docker/gaudi/compose.yaml).

```bash
cd GenAIExamples/VisualQnA/docker/gaudi/
docker compose up -d
TAG=v0.9 docker compose up -d
```

> Notice: Currently only the **Habana Driver 1.16.x** is supported for Gaudi.
@@ -76,5 +76,5 @@ Find the corresponding [compose.yaml](./docker/xeon/compose.yaml).

```bash
cd GenAIExamples/VisualQnA/docker/xeon/
docker compose up -d
TAG=v0.9 docker compose up -d
```

@@ -85,7 +85,7 @@ cd GenAIExamples/VisualQnA/docker/gaudi/
```

```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```

> **_NOTE:_** Users need at least one Gaudi cards to run the VisualQnA successfully.

@@ -124,7 +124,7 @@ cd GenAIExamples/VisualQnA/docker/xeon/
```

```bash
docker compose -f compose.yaml up -d
TAG=v0.9 docker compose -f compose.yaml up -d
```

### Validate Microservices

@@ -165,7 +165,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/lvm-tgi:latest"
image: "opea/lvm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: lvm-uservice
@@ -215,7 +215,7 @@ spec:
name: visualqna-tgi-config
securityContext:
{}
image: "opea/llava-tgi:latest"
image: "opea/llava-tgi:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
@@ -282,7 +282,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/visualqna:latest"
image: "opea/visualqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp

@@ -166,7 +166,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/lvm-tgi:latest"
image: "opea/lvm-tgi:v0.9"
imagePullPolicy: IfNotPresent
ports:
- name: lvm-uservice
@@ -282,7 +282,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/visualqna:latest"
image: "opea/visualqna:v0.9"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp