audioqna trigger ut

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Update CodeTrans for GenAIComps Refactor (#1309)
2025-01-02 08:53:10 +08:00 · 2025-01-02 09:33:26 +08:00 · 2025-01-02 09:32:12 +08:00 · 2025-01-01 19:05:52 +08:00 · 2025-01-01 03:17:33 +00:00 · 2025-01-01 11:17:05 +08:00
488 changed files with 17160 additions and 11401 deletions
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -1,17 +1,23 @@
-/AgentQnA/ kaokao.lv@intel.com
-/AudioQnA/ sihan.chen@intel.com
-/ChatQnA/ liang1.lv@intel.com
-/CodeGen/ liang1.lv@intel.com
-/CodeTrans/ sihan.chen@intel.com
-/DocSum/ letong.han@intel.com
+/.github/ suyue.chen@intel.com ze.pan@intel.com
+/AgentQnA/ kaokao.lv@intel.com minmin.hou@intel.com
+/AudioQnA/ sihan.chen@intel.com wenjiao.yue@intel.com
+/AvatarChatbot/ chun.tao@intel.com kaokao.lv@intel.com
+/ChatQnA/ liang1.lv@intel.com letong.han@intel.com
+/CodeGen/ liang1.lv@intel.com xinyao.wang@intel.com
+/CodeTrans/ sihan.chen@intel.com xinyao.wang@intel.com
+/DBQnA/ supriya.krishnamurthi@intel.com liang1.lv@intel.com
 /DocIndexRetriever/ kaokao.lv@intel.com chendi.xue@intel.com
-/InstructionTuning xinyu.ye@intel.com
-/RerankFinetuning xinyu.ye@intel.com
-/MultimodalQnA tiep.le@intel.com
-/FaqGen/ xinyao.wang@intel.com
-/SearchQnA/ sihan.chen@intel.com
-/Translation/ liang1.lv@intel.com
-/VisualQnA/ liang1.lv@intel.com
-/ProductivitySuite/ hoong.tee.yeoh@intel.com
-/VideoQnA huiling.bao@intel.com
-/*/ liang1.lv@intel.com
+/DocSum/ letong.han@intel.com xinyao.wang@intel.com
+/EdgeCraftRAG/ yongbo.zhu@intel.com mingyuan.qi@intel.com
+/FaqGen/ yogesh.pandey@intel.com xinyao.wang@intel.com
+/GraphRAG/ rita.brugarolas.brufau@intel.com abolfazl.shahbazi@intel.com
+/InstructionTuning/ xinyu.ye@intel.com kaokao.lv@intel.com
+/MultimodalQnA/ melanie.h.buehler@intel.com tiep.le@intel.com
+/ProductivitySuite/ jaswanth.karani@intel.com hoong.tee.yeoh@intel.com
+/RerankFinetuning/ xinyu.ye@intel.com kaokao.lv@intel.com
+/SearchQnA/ sihan.chen@intel.com letong.han@intel.com
+/Text2Image/ wenjiao.yue@intel.com xinyu.ye@intel.com
+/Translation/ liang1.lv@intel.com sihan.chen@intel.com
+/VideoQnA/ huiling.bao@intel.com xinyao.wang@intel.com
+/VisualQnA/ liang1.lv@intel.com sihan.chen@intel.com
+/*/ liang1.lv@intel.com feng.tian@intel.com suyue.chen@intel.com
--- a/.github/ISSUE_TEMPLATE/1_bug_template.yml
+++ b/.github/ISSUE_TEMPLATE/1_bug_template.yml
@@ -4,6 +4,7 @@
 name: Report Bug
 description: Used to report bug
 title: "[Bug]"
+labels: ["bug"]
 body:
  - type: dropdown
    id: priority
--- a/.github/ISSUE_TEMPLATE/2_feature_template.yml
+++ b/.github/ISSUE_TEMPLATE/2_feature_template.yml
@@ -4,6 +4,7 @@
 name: Report Feature
 description: Used to report feature
 title: "[Feature]"
+labels: ["feature"]
 body:
  - type: dropdown
    id: priority
--- a/.github/code_spell_ignore.txt
+++ b/.github/code_spell_ignore.txt
@@ -1,2 +1,2 @@
 ModelIn
-modelin
+modelin
--- a/.github/license_template.txt
+++ b/.github/license_template.txt
@@ -1,2 +1,2 @@
 Copyright (C) 2024 Intel Corporation
-SPDX-License-Identifier: Apache-2.0
+SPDX-License-Identifier: Apache-2.0
--- a/.github/workflows/_example-workflow.yml
+++ b/.github/workflows/_example-workflow.yml
@@ -77,9 +77,9 @@ jobs:
              git clone https://github.com/vllm-project/vllm.git
              cd vllm && git rev-parse HEAD && cd ../
          fi
-          if [[ $(grep -c "vllm-hpu:" ${docker_compose_path}) != 0 ]]; then
+          if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then
               git clone https://github.com/HabanaAI/vllm-fork.git
-               cd vllm-fork && git rev-parse HEAD && cd ../
+               cd vllm-fork && git checkout 3c39626 && cd ../
          fi
          git clone https://github.com/opea-project/GenAIComps.git
          cd GenAIComps && git checkout ${{ inputs.opea_branch }} && git rev-parse HEAD && cd ../
--- a/.github/workflows/_get-test-matrix.yml
+++ b/.github/workflows/_get-test-matrix.yml
@@ -14,7 +14,7 @@ on:
      test_mode:
        required: false
        type: string
-        default: 'docker_compose'
+        default: 'compose'
    outputs:
      run_matrix:
        description: "The matrix string"
@@ -42,6 +42,12 @@ jobs:
          ref: ${{ env.CHECKOUT_REF }}
          fetch-depth: 0

+      - name: Check Dangerous Command Injection
+        if: github.event_name == 'pull_request' || github.event_name == 'pull_request_target'
+        uses: opea-project/validation/actions/check-cmd@main
+        with:
+          work_dir: ${{ github.workspace }}
+
      - name: Get test matrix
        id: get-test-matrix
        run: |
--- a/.github/workflows/_gmc-workflow.yml
+++ b/.github/workflows/_gmc-workflow.yml
@@ -67,36 +67,6 @@ jobs:
          make docker.build
          make docker.push

-      - name: Scan gmcmanager
-        if: ${{ inputs.node == 'gaudi' }}
-        uses: opea-project/validation/actions/trivy-scan@main
-        with:
-          image-ref: ${{ env.DOCKER_REGISTRY }}/gmcmanager:${{ env.VERSION }}
-          output: gmcmanager-scan.txt
-
-      - name: Upload gmcmanager scan result
-        if: ${{ inputs.node == 'gaudi' }}
-        uses: actions/upload-artifact@v4.3.4
-        with:
-          name: gmcmanager-scan
-          path: gmcmanager-scan.txt
-          overwrite: true
-
-      - name: Scan gmcrouter
-        if: ${{ inputs.node == 'gaudi' }}
-        uses: opea-project/validation/actions/trivy-scan@main
-        with:
-          image-ref: ${{ env.DOCKER_REGISTRY }}/gmcrouter:${{ env.VERSION }}
-          output: gmcrouter-scan.txt
-
-      - name: Upload gmcrouter scan result
-        if: ${{ inputs.node == 'gaudi' }}
-        uses: actions/upload-artifact@v4.3.4
-        with:
-          name: gmcrouter-scan
-          path: gmcrouter-scan.txt
-          overwrite: true
-
      - name: Clean up images
        if: always()
        run: |
--- a/.github/workflows/_manifest-e2e.yml
+++ b/.github/workflows/_manifest-e2e.yml
@@ -22,7 +22,72 @@ on:
        type: string

 jobs:
+  get-test-case:
+    runs-on: ubuntu-latest
+    outputs:
+      test_cases: ${{ steps.test-case-matrix.outputs.test_cases }}
+      CHECKOUT_REF: ${{ steps.get-checkout-ref.outputs.CHECKOUT_REF }}
+    steps:
+      - name: Get checkout ref
+        id: get-checkout-ref
+        run: |
+          if [ "${{ github.event_name }}" == "pull_request" ] || [ "${{ github.event_name }}" == "pull_request_target" ]; then
+            CHECKOUT_REF=refs/pull/${{ github.event.number }}/merge
+          else
+            CHECKOUT_REF=${{ github.ref }}
+          fi
+          echo "CHECKOUT_REF=${CHECKOUT_REF}" >> $GITHUB_OUTPUT
+          echo "checkout ref ${CHECKOUT_REF}"
+
+      - name: Checkout Repo
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ steps.get-checkout-ref.outputs.CHECKOUT_REF }}
+          fetch-depth: 0
+
+      - name: Get test matrix
+        shell: bash
+        id: test-case-matrix
+        run: |
+          example_l=$(echo ${{ inputs.example }} | tr '[:upper:]' '[:lower:]')
+          cd ${{ github.workspace }}/${{ inputs.example }}/tests
+          run_test_cases=""
+
+          default_test_case=$(find . -type f -name "test_manifest_on_${{ inputs.hardware }}.sh" | cut -d/ -f2)
+          if [ "$default_test_case" ]; then run_test_cases="$default_test_case"; fi
+          other_test_cases=$(find . -type f -name "test_manifest_*_on_${{ inputs.hardware }}.sh" | cut -d/ -f2)
+          echo "default_test_case=$default_test_case"
+          echo "other_test_cases=$other_test_cases"
+
+          if [ "${{ inputs.tag }}" == "ci" ]; then
+            base_commit=$(curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
+            "https://api.github.com/repos/opea-project/GenAIExamples/commits?sha=${{ github.event.pull_request.base.ref }}" | jq -r '.[0].sha')
+            merged_commit=$(git log -1 --format='%H')
+            changed_files="$(git diff --name-only ${base_commit} ${merged_commit} | grep -vE '${{ inputs.diff_excluded_files }}')" || true
+          fi
+
+          for test_case in $other_test_cases; do
+            if [ "${{ inputs.tag }}" == "ci" ]; then
+              flag=${test_case%_on_*}
+              flag=${flag#test_manifest_}
+              if [[ $(printf '%s\n' "${changed_files[@]}" | grep ${{ inputs.example }} | grep ${flag}) ]]; then
+                run_test_cases="$run_test_cases $test_case"
+              fi
+            else
+              run_test_cases="$run_test_cases $test_case"
+            fi
+          done
+
+          test_cases=$(echo $run_test_cases | tr ' ' '\n' | sort -u | jq -R '.' | jq -sc '.')
+          echo "test_cases=$test_cases"
+          echo "test_cases=$test_cases" >> $GITHUB_OUTPUT
+
  manifest-test:
+    needs: [get-test-case]
+    strategy:
+      matrix:
+        test_case: ${{ fromJSON(needs.get-test-case.outputs.test_cases) }}
+      fail-fast: false
    runs-on: "k8s-${{ inputs.hardware }}"
    continue-on-error: true
    steps:
@@ -45,11 +110,14 @@ jobs:
          fetch-depth: 0

      - name: Set variables
+        env:
+          test_case: ${{ matrix.test_case }}
        run: |
          echo "IMAGE_REPO=${OPEA_IMAGE_REPO}opea" >> $GITHUB_ENV
          echo "IMAGE_TAG=${{ inputs.tag }}" >> $GITHUB_ENV
          lower_example=$(echo "${{ inputs.example }}" | tr '[:upper:]' '[:lower:]')
-          echo "NAMESPACE=$lower_example-$(tr -dc a-z0-9 </dev/urandom | head -c 16)" >> $GITHUB_ENV
+          name=$(echo "$test_case" | cut -d/ -f2 | cut -d'_' -f3- | cut -d'_' -f1 | grep -vx 'on' | sed 's/^/-/')
+          echo "NAMESPACE=$lower_example$name-$(tr -dc a-z0-9 </dev/urandom | head -c 16)" >> $GITHUB_ENV
          echo "ROLLOUT_TIMEOUT_SECONDS=1800s" >> $GITHUB_ENV
          echo "KUBECTL_TIMEOUT_SECONDS=60s" >> $GITHUB_ENV
          echo "continue_test=true" >> $GITHUB_ENV
@@ -59,15 +127,19 @@ jobs:

      - name: Kubectl install
        id: install
+        env:
+          test_case: ${{ matrix.test_case }}
        run: |
-          if [[ ! -f ${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh ]]; then
+          set -x
+          echo "test_case=$test_case"
+          if [[ ! -f ${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} ]]; then
            echo "No test script found, exist test!"
            exit 0
          else
-            ${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh init_${{ inputs.example }}
+            ${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} init_${{ inputs.example }}
            echo "should_cleanup=true" >> $GITHUB_ENV
            kubectl create ns $NAMESPACE
-            ${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh install_${{ inputs.example }} $NAMESPACE
+            ${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} install_${{ inputs.example }} $NAMESPACE
            echo "Testing ${{ inputs.example }}, waiting for pod ready..."
            if kubectl rollout status deployment --namespace "$NAMESPACE" --timeout "$ROLLOUT_TIMEOUT_SECONDS"; then
              echo "Testing manifests ${{ inputs.example }}, waiting for pod ready done!"
@@ -82,14 +154,16 @@ jobs:

      - name: Validate e2e test
        if: always()
+        env:
+          test_case: ${{ matrix.test_case }}
        run: |
          if $skip_validate; then
            echo "Skip validate"
          else
-            if ${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh validate_${{ inputs.example }} $NAMESPACE ; then
-              echo "Validate ${{ inputs.example }} successful!"
+            if ${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} validate_${{ inputs.example }} $NAMESPACE ; then
+              echo "Validate ${test_case} successful!"
            else
-              echo "Validate ${{ inputs.example }} failure!!!"
+              echo "Validate ${test_case} failure!!!"
              echo "Check the logs in 'Dump logs when e2e test failed' step!!!"
              exit 1
            fi
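The `get-test-case` job and the `Set variables` step above encode two naming conventions: which `test_manifest_*` scripts run for a hardware target, and how each script's Kubernetes namespace is named. The Python below is a minimal sketch of that logic for reasoning about it offline; it is not code the workflow runs. `select_test_cases` and `namespace_for` are hypothetical names, and the sketch assumes the prefix stripped from a variant script is `test_manifest_` (matching the `find` pattern) and that only a third name field of exactly `on` marks the default script.

```python
import fnmatch
import json
import secrets
import string


def select_test_cases(files, hardware, example, changed_files=None):
    """Return the JSON matrix of manifest test scripts to run.

    files: script file names found in <example>/tests.
    changed_files: paths changed by the PR; None means "run every case".
    """
    selected = set()
    default = f"test_manifest_on_{hardware}.sh"
    if default in files:
        selected.add(default)
    pattern = f"test_manifest_*_on_{hardware}.sh"
    for case in (f for f in files if fnmatch.fnmatchcase(f, pattern)):
        if changed_files is None:
            selected.add(case)
            continue
        # Like ${case%_on_*}: drop everything from the last "_on_", then the prefix.
        flag = case.rsplit("_on_", 1)[0].removeprefix("test_manifest_")
        if any(example in path and flag in path for path in changed_files):
            selected.add(case)
    # Same shape as `sort -u | jq -R '.' | jq -sc '.'`: a compact, sorted JSON array.
    return json.dumps(sorted(selected), separators=(",", ":"))


def namespace_for(example, test_case):
    """Build <example>[-<variant>]-<16 random chars> for one test case."""
    # Like `cut -d/ -f2`: take the part after "/" when there is one.
    name = test_case.split("/")[1] if "/" in test_case else test_case
    fields = name.split("_")
    variant = fields[2] if len(fields) > 2 else ""
    # The default script's third field is "on", which means "no variant".
    suffix = f"-{variant}" if variant and variant != "on" else ""
    alphabet = string.ascii_lowercase + string.digits
    rand = "".join(secrets.choice(alphabet) for _ in range(16))
    return f"{example.lower()}{suffix}-{rand}"
```

For example, with `ChatQnA/kubernetes/vllm/...` among the changed files, only the default script and `test_manifest_vllm_on_xeon.sh` are selected for `xeon`, and the vLLM case gets a namespace starting with `chatqna-vllm-`.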
--- a/.github/workflows/_run-docker-compose.yml
+++ b/.github/workflows/_run-docker-compose.yml
@@ -111,6 +111,17 @@ jobs:
          ref: ${{ needs.get-test-case.outputs.CHECKOUT_REF }}
          fetch-depth: 0

+      - name: Clean up container before test
+        shell: bash
+        run: |
+          docker ps
+          cd ${{ github.workspace }}/${{ inputs.example }}
+          export test_case=${{ matrix.test_case }}
+          export hardware=${{ inputs.hardware }}
+          bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "containers"
+          bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "ports"
+          docker ps
+
      - name: Run test
        shell: bash
        env:
@@ -123,6 +134,7 @@ jobs:
          SERVING_TOKEN: ${{ secrets.SERVING_TOKEN }}
          IMAGE_REPO: ${{ inputs.registry }}
          IMAGE_TAG: ${{ inputs.tag }}
+          opea_branch: "refactor_comps"
          example: ${{ inputs.example }}
          hardware: ${{ inputs.hardware }}
          test_case: ${{ matrix.test_case }}
@@ -131,21 +143,14 @@ jobs:
          if [[ "$IMAGE_REPO" == "" ]]; then export IMAGE_REPO="${OPEA_IMAGE_REPO}opea"; fi
          if [ -f ${test_case} ]; then timeout 30m bash ${test_case}; else echo "Test script {${test_case}} not found, skip test!"; fi

-      - name: Clean up container
+      - name: Clean up container after test
        shell: bash
        if: cancelled() || failure()
        run: |
-          cd ${{ github.workspace }}/${{ inputs.example }}/docker_compose
-          test_case=${{ matrix.test_case }}
-          flag=${test_case%_on_*}
-          flag=${flag#test_}
-          yaml_file=$(find . -type f -wholename "*${{ inputs.hardware }}/${flag}.yaml")
-          echo $yaml_file
-          container_list=$(cat $yaml_file | grep container_name | cut -d':' -f2)
-          for container_name in $container_list; do
-              cid=$(docker ps -aq --filter "name=$container_name")
-              if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
-          done
+          cd ${{ github.workspace }}/${{ inputs.example }}
+          export test_case=${{ matrix.test_case }}
+          export hardware=${{ inputs.hardware }}
+          bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "containers"
          docker system prune -f
          docker rmi $(docker images --filter reference="*:5000/*/*" -q) || true

--- a/.github/workflows/manual-docker-clean.yaml
+++ b/.github/workflows/manual-docker-clean.yaml
@@ -0,0 +1,31 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+name: Clean up container on manual event
+on:
+  workflow_dispatch:
+    inputs:
+      node:
+        default: "rocm"
+        description: "Hardware to clean"
+        required: true
+        type: string
+      clean_list:
+        default: ""
+        description: "docker command to clean"
+        required: false
+        type: string
+
+jobs:
+  clean:
+    runs-on: "${{ inputs.node }}"
+    steps:
+      - name: Clean up container
+        run: |
+          docker ps
+          if [ "${{ inputs.clean_list }}" ]; then
+            echo "----------stop and remove containers----------"
+            docker stop ${{ inputs.clean_list }} && docker rm ${{ inputs.clean_list }}
+            echo "----------container removed----------"
+            docker ps
+          fi
--- a/.github/workflows/manual-example-workflow.yml
+++ b/.github/workflows/manual-example-workflow.yml
@@ -12,7 +12,7 @@ on:
        type: string
      examples:
        default: "ChatQnA"
-        description: 'List of examples to test [AudioQnA,ChatQnA,CodeGen,CodeTrans,DocSum,FaqGen,SearchQnA,Translation]'
+        description: 'List of examples to test [AgentQnA,AudioQnA,ChatQnA,CodeGen,CodeTrans,DocIndexRetriever,DocSum,FaqGen,InstructionTuning,MultimodalQnA,ProductivitySuite,RerankFinetuning,SearchQnA,Translation,VideoQnA,VisualQnA,AvatarChatbot,Text2Image,WorkflowExecAgent,DBQnA,EdgeCraftRAG,GraphRAG]'
        required: true
        type: string
      tag:
@@ -51,7 +51,7 @@ on:
        required: false
        type: string
      inject_commit:
-        default: true
+        default: false
        description: "inject commit to docker images true or false"
        required: false
        type: string
--- a/.github/workflows/manual-freeze-tag.yml
+++ b/.github/workflows/manual-freeze-tag.yml
@@ -1,13 +1,13 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-name: Freeze OPEA images release tag in readme on manual event
+name: Freeze OPEA images release tag

 on:
  workflow_dispatch:
    inputs:
      tag:
-        default: "latest"
+        default: "1.1.0"
        description: "Tag to apply to images"
        required: true
        type: string
@@ -23,10 +23,6 @@ jobs:
          fetch-depth: 0
          ref: ${{ github.ref }}

-      - uses: actions/setup-python@v5
-        with:
-          python-version: "3.10"
-
      - name: Set up Git
        run: |
          git config --global user.name "NeuralChatBot"
@@ -35,9 +31,10 @@ jobs:

      - name: Run script
        run: |
-          find . -name "*.md" | xargs sed -i "s|^docker\ compose|TAG=${{ github.event.inputs.tag }}\ docker\ compose|g"
-          find . -type f -name "*.yaml" \( -path "*/benchmark/*" -o -path "*/kubernetes/*" \) | xargs sed -i -E 's/(opea\/[A-Za-z0-9\-]*:)latest/\1${{ github.event.inputs.tag }}/g'
-          find . -type f -name "*.md" \( -path "*/benchmark/*" -o -path "*/kubernetes/*" \) | xargs sed -i -E 's/(opea\/[A-Za-z0-9\-]*:)latest/\1${{ github.event.inputs.tag }}/g'
+          IFS='.' read -r major minor patch <<< "${{ github.event.inputs.tag }}"
+          echo "VERSION_MAJOR ${major}"  > version.txt
+          echo "VERSION_MINOR ${minor}" >> version.txt
+          echo "VERSION_PATCH ${patch}" >> version.txt

      - name: Commit changes
        run: |
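The reworked `Run script` step no longer rewrites image tags in Markdown and YAML; it splits the release tag on `.` and writes the three parts to `version.txt`. A minimal Python sketch of the same transformation, useful for checking what a given tag produces; `version_txt` is a hypothetical helper, not part of the repository.

```python
def version_txt(tag):
    """Render version.txt the way the freeze-tag step's shell does."""
    # `IFS='.' read -r major minor patch` leaves any extra fields in `patch`
    # and empty strings for fields that are missing.
    parts = tag.split(".", 2)
    parts += [""] * (3 - len(parts))
    major, minor, patch = parts
    return (
        f"VERSION_MAJOR {major}\n"
        f"VERSION_MINOR {minor}\n"
        f"VERSION_PATCH {patch}\n"
    )
```

With the new default tag `1.1.0`, this yields `VERSION_MAJOR 1`, `VERSION_MINOR 1` and `VERSION_PATCH 0`; a tag such as `1.2.3.4` puts `3.4` into the patch line.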
--- a/.github/workflows/manual-image-build.yml
+++ b/.github/workflows/manual-image-build.yml
@@ -12,7 +12,7 @@ on:
        type: string
      example:
        default: "ChatQnA"
-        description: 'Build images belong to which example?'
+        description: 'Example whose images to build [AgentQnA,AudioQnA,ChatQnA,CodeGen,CodeTrans,DocIndexRetriever,DocSum,FaqGen,InstructionTuning,MultimodalQnA,ProductivitySuite,RerankFinetuning,SearchQnA,Translation,VideoQnA,VisualQnA,AvatarChatbot,Text2Image,WorkflowExecAgent,DBQnA,EdgeCraftRAG,GraphRAG]'
        required: true
        type: string
      services:
@@ -31,7 +31,7 @@ on:
        required: false
        type: string
      inject_commit:
-        default: true
+        default: false
        description: "inject commit to docker images true or false"
        required: false
        type: string
--- a/.github/workflows/manual-reset-local-registry.yml
+++ b/.github/workflows/manual-reset-local-registry.yml
@@ -0,0 +1,59 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+name: Clean up Local Registry on manual event
+on:
+  workflow_dispatch:
+    inputs:
+      nodes:
+        default: "gaudi,xeon"
+        description: "Hardware to clean up"
+        required: true
+        type: string
+
+env:
+  EXAMPLES: ${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
+
+jobs:
+  get-build-matrix:
+    runs-on: ubuntu-latest
+    outputs:
+      examples: ${{ steps.get-matrix.outputs.examples }}
+      nodes: ${{ steps.get-matrix.outputs.nodes }}
+    steps:
+    - name: Create Matrix
+      id: get-matrix
+      run: |
+        examples=($(echo ${EXAMPLES} | tr ',' ' '))
+        examples_json=$(printf '%s\n' "${examples[@]}" | sort -u | jq -R '.' | jq -sc '.')
+        echo "examples=$examples_json" >> $GITHUB_OUTPUT
+        nodes=($(echo ${{ inputs.nodes }} | tr ',' ' '))
+        nodes_json=$(printf '%s\n' "${nodes[@]}" | sort -u | jq -R '.' | jq -sc '.')
+        echo "nodes=$nodes_json" >> $GITHUB_OUTPUT
+
+  clean-up:
+    needs: get-build-matrix
+    strategy:
+      matrix:
+        node: ${{ fromJson(needs.get-build-matrix.outputs.nodes) }}
+      fail-fast: false
+    runs-on: "docker-build-${{ matrix.node }}"
+    steps:
+      - name: Clean Up Local Registry
+        run: |
+          echo "Cleaning up local registry on ${{ matrix.node }}"
+          bash /home/sdp/workspace/fully_registry_cleanup.sh
+          docker ps | grep registry
+
+  build:
+    needs: [get-build-matrix, clean-up]
+    strategy:
+      matrix:
+        example: ${{ fromJson(needs.get-build-matrix.outputs.examples) }}
+        node: ${{ fromJson(needs.get-build-matrix.outputs.nodes) }}
+      fail-fast: false
+    uses: ./.github/workflows/_example-workflow.yml
+    with:
+      node: ${{ matrix.node }}
+      example: ${{ matrix.example }}
+    secrets: inherit
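The `Create Matrix` step turns a comma-separated list into the JSON array that `fromJson` expands into a job matrix. A minimal Python sketch of that conversion, with `csv_to_matrix` as a hypothetical name; it sorts in code-point order (like `LC_ALL=C sort`), and unlike the shell pipeline, which turns an empty input into `[""]`, it returns `[]`.

```python
import json


def csv_to_matrix(csv):
    """Turn "gaudi,xeon" into the compact JSON array fed to fromJson()."""
    # `tr ',' ' '` plus shell word splitting: commas and whitespace both separate.
    items = csv.replace(",", " ").split()
    # `sort -u | jq -R '.' | jq -sc '.'` yields a sorted, de-duplicated array.
    return json.dumps(sorted(set(items)), separators=(",", ":"))
```

For the default `nodes` input `gaudi,xeon`, this produces `["gaudi","xeon"]`, so `clean-up` runs once per node and `build` once per example and node pair.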
--- a/.github/workflows/nightly-docker-build-publish.yml
+++ b/.github/workflows/nightly-docker-build-publish.yml
@@ -5,11 +5,11 @@ name: Nightly build/publish latest docker images

 on:
  schedule:
-    - cron: "30 13 * * *" # UTC time
+    - cron: "30 14 * * *" # UTC time
  workflow_dispatch:

 env:
-  EXAMPLES: "AgentQnA,AudioQnA,ChatQnA,CodeGen,CodeTrans,DocIndexRetriever,DocSum,FaqGen,InstructionTuning,MultimodalQnA,ProductivitySuite,RerankFinetuning,SearchQnA,Translation,VideoQnA,VisualQnA"
+  EXAMPLES: ${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
  TAG: "latest"
  PUBLISH_TAGS: "latest"

@@ -32,7 +32,7 @@ jobs:
          echo "TAG=$TAG" >> $GITHUB_OUTPUT
          echo "PUBLISH_TAGS=$PUBLISH_TAGS" >> $GITHUB_OUTPUT

-  build:
+  build-and-test:
    needs: get-build-matrix
    strategy:
      matrix:
@@ -42,6 +42,7 @@ jobs:
    with:
      node: gaudi
      example: ${{ matrix.example }}
+      test_compose: true
    secrets: inherit

  get-image-list:
@@ -51,7 +52,7 @@ jobs:
      examples: ${{ needs.get-build-matrix.outputs.EXAMPLES }}

  publish:
-    needs: [get-build-matrix, get-image-list, build]
+    needs: [get-build-matrix, get-image-list, build-and-test]
    strategy:
      matrix:
        image: ${{ fromJSON(needs.get-image-list.outputs.matrix) }}
--- a/.github/workflows/pr-check-duplicated-image.yml
+++ b/.github/workflows/pr-check-duplicated-image.yml
@@ -0,0 +1,40 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+name: Check Duplicated Images
+
+on:
+  pull_request:
+    branches: [main, genaicomps_refactor]
+    types: [opened, reopened, ready_for_review, synchronize]
+    paths:
+      - "**/docker_image_build/*.yaml"
+      - ".github/workflows/pr-check-duplicated-image.yml"
+      - ".github/workflows/scripts/check_duplicated_image.py"
+  workflow_dispatch:
+
+# If there is a new commit, the previous jobs will be canceled
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  check-duplicated-image:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo
+        uses: actions/checkout@v4
+
+      - name: Check all the docker image build files
+        run: |
+          pip install PyYAML
+          cd ${{github.workspace}}
+          build_files=""
+          for f in `find . -path "*/docker_image_build/build.yaml"`; do
+              build_files="$build_files $f"
+          done
+          python3 .github/workflows/scripts/check_duplicated_image.py $build_files
+        shell: bash
--- a/.github/workflows/pr-code-scan.yml
+++ b/.github/workflows/pr-code-scan.yml
@@ -34,6 +34,11 @@ jobs:
      - name: Checkout out Repo
        uses: actions/checkout@v4

+      - name: Check Dangerous Command Injection
+        uses: opea-project/validation/actions/check-cmd@main
+        with:
+          work_dir: ${{ github.workspace }}
+
      - name: Docker Build
        run: |
          docker build -f ${{ github.workspace }}/.github/workflows/docker/${{ env.DOCKER_FILE_NAME }}.dockerfile -t ${{ env.REPO_NAME }}:${{ env.REPO_TAG }} .
--- a/.github/workflows/pr-dependency-review.yml
+++ b/.github/workflows/pr-dependency-review.yml
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: Apache-2.0

 name: "Dependency Review"
-on: [pull_request]
+on: [pull_request_target]

 permissions:
  contents: read
--- a/.github/workflows/pr-docker-compose-e2e.yml
+++ b/.github/workflows/pr-docker-compose-e2e.yml
@@ -4,8 +4,8 @@
 name: E2E test with docker compose

 on:
-  pull_request_target:
-    branches: ["main", "*rc"]
+  pull_request:
+    branches: ["main", "*rc", "genaicomps_refactor"]
    types: [opened, reopened, ready_for_review, synchronize] # added `ready_for_review` since draft is skipped
    paths:
      - "**/Dockerfile**"
--- a/.github/workflows/pr-dockerfile-path-and-build-yaml-scan.yml
+++ b/.github/workflows/pr-dockerfile-path-and-build-yaml-scan.yml
@@ -0,0 +1,110 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+name: Compose file and dockerfile path checking
+
+on:
+  pull_request:
+    branches: [main, genaicomps_refactor]
+    types: [opened, reopened, ready_for_review, synchronize]
+
+jobs:
+  check-dockerfile-paths-in-README:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIExamples
+        uses: actions/checkout@v4
+
+      - name: Clone Repo GenAIComps
+        run: |
+          cd ..
+          git clone https://github.com/opea-project/GenAIComps.git
+          cd GenAIComps && git checkout refactor_comps
+
+      - name: Check for Missing Dockerfile Paths in GenAIComps
+        run: |
+          cd ${{github.workspace}}
+          miss="FALSE"
+          while IFS=: read -r file line content; do
+              dockerfile_path=$(echo "$content" | awk -F '-f ' '{print $2}' | awk '{print $1}')
+              if [[ ! -f "../GenAIComps/${dockerfile_path}" ]]; then
+                  miss="TRUE"
+                  echo "Missing Dockerfile: GenAIComps/${dockerfile_path} (Referenced in GenAIExamples/${file}:${line})"
+              fi
+          done < <(grep -Ern 'docker build .* -f comps/.+/Dockerfile' --include='*.md' .)
+
+
+          if [[ "$miss" == "TRUE" ]]; then
+            exit 1
+          fi
+
+        shell: bash
+
+  check-Dockerfile-in-build-yamls:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIExamples
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Check Dockerfile path included in image build yaml
+        if: always()
+        run: |
+          set -e
+          shopt -s globstar
+          no_add="FALSE"
+          cd ${{github.workspace}}
+          Dockerfiles=$(realpath $(find ./ -name '*Dockerfile*'))
+          if [ -n "$Dockerfiles" ]; then
+            for dockerfile in $Dockerfiles; do
+              service=$(echo "$dockerfile" | awk -F '/GenAIExamples/' '{print $2}' | awk -F '/' '{print $2}')
+              cd ${{github.workspace}}/$service/docker_image_build
+              all_paths=$(realpath $(awk '  /context:/ { context = $2 }  /dockerfile:/ { dockerfile = $2; combined = context "/" dockerfile; gsub(/\/+/, "/", combined); if  (index(context, ".") > 0) {print combined}}' build.yaml) 2> /dev/null || true  )
+              if ! echo "$all_paths" | grep -q "$dockerfile"; then
+                echo "AR: Add $dockerfile to GenAIExamples/$service/docker_image_build/build.yaml. This yaml is used to build the release images."
+                no_add="TRUE"
+              fi
+            done
+          fi
+
+          if [[ "$no_add" == "TRUE" ]]; then
+            exit 1
+          fi
+
+  check-image-and-service-names-in-build-yaml:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIExamples
+        uses: actions/checkout@v4
+
+      - name: Check name agreement in build.yaml
+        run: |
+          pip install ruamel.yaml
+          cd ${{github.workspace}}
+          consistency="TRUE"
+          build_yamls=$(find . -name 'build.yaml')
+          for build_yaml in $build_yamls; do
+            message=$(python3 .github/workflows/scripts/check-name-agreement.py "$build_yaml")
+            if [[ "$message" != *"consistent"* ]]; then
+              consistency="FALSE"
+              echo "Inconsistent service name and image name found in file $build_yaml."
+              echo "$message"
+            fi
+          done
+
+          if [[ "$consistency" == "FALSE" ]]; then
+            echo "Please ensure that the service and image names are consistent in build.yaml, otherwise we cannot guarantee that your image will be published correctly."
+            exit 1
+          fi
+
+        shell: bash
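The `check-dockerfile-paths-in-README` job greps Markdown for `docker build ... -f comps/.../Dockerfile` and checks that each path exists in a GenAIComps checkout. A minimal Python sketch of the same check, handy for running it locally before pushing; `missing_dockerfiles` is a hypothetical helper and assumes both repositories are checked out side by side.

```python
import re
from pathlib import Path

# Same expression the workflow passes to `grep -E`.
BUILD_CMD = re.compile(r"docker build .* -f comps/.+/Dockerfile")


def missing_dockerfiles(examples_root, comps_root):
    """List (dockerfile, markdown file, line) for -f paths absent from GenAIComps."""
    examples_root, comps_root = Path(examples_root), Path(comps_root)
    missing = []
    for md in sorted(examples_root.rglob("*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if not BUILD_CMD.search(line):
                continue
            # awk -F '-f ' '{print $2}' | awk '{print $1}': first token after "-f ".
            dockerfile = line.split("-f ", 1)[1].split()[0]
            if not (comps_root / dockerfile).is_file():
                rel = md.relative_to(examples_root).as_posix()
                missing.append((dockerfile, rel, lineno))
    return missing
```

An empty result corresponds to the job passing; any entry corresponds to one `Missing Dockerfile: ...` line followed by `exit 1`.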
--- a/.github/workflows/pr-link-path-scan.yml
+++ b/.github/workflows/pr-link-path-scan.yml
@@ -1,47 +1,14 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-name: Check Paths and Hyperlinks
+name: Check hyperlinks and relative path validity

 on:
  pull_request:
-    branches: [main]
+    branches: [main, genaicomps_refactor]
    types: [opened, reopened, ready_for_review, synchronize]

 jobs:
-  check-dockerfile-paths:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clean Up Working Directory
-        run: sudo rm -rf ${{github.workspace}}/*
-
-      - name: Checkout Repo GenAIExamples
-        uses: actions/checkout@v4
-
-      - name: Clone Repo GenAIComps
-        run: |
-          cd ..
-          git clone https://github.com/opea-project/GenAIComps.git
-
-      - name: Check for Missing Dockerfile Paths in GenAIComps
-        run: |
-          cd ${{github.workspace}}
-          miss="FALSE"
-          while IFS=: read -r file line content; do
-              dockerfile_path=$(echo "$content" | awk -F '-f ' '{print $2}' | awk '{print $1}')
-              if [[ ! -f "../GenAIComps/${dockerfile_path}" ]]; then
-                  miss="TRUE"
-                  echo "Missing Dockerfile: GenAIComps/${dockerfile_path} (Referenced in GenAIExamples/${file}:${line})"
-              fi
-          done < <(grep -Ern 'docker build .* -f comps/.+/Dockerfile' --include='*.md' .)
-
-
-          if [[ "$miss" == "TRUE" ]]; then
-            exit 1
-          fi
-
-        shell: bash
-
  check-the-validity-of-hyperlinks-in-README:
    runs-on: ubuntu-latest
    steps:
--- a/.github/workflows/pr-manifest-e2e.yml.disabled
+++ b/.github/workflows/pr-manifest-e2e.yml.disabled
--- a/.github/workflows/push-image-build.yml
+++ b/.github/workflows/push-image-build.yml
@@ -8,7 +8,9 @@ on:
    branches: [ 'main' ]
    paths:
      - "**.py"
-      - "**Dockerfile"
+      - "**Dockerfile*"
+      - "**docker_image_build/build.yaml"
+      - "**/ui/**"

 concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-on-push
@@ -18,7 +20,7 @@ jobs:
  job1:
    uses: ./.github/workflows/_get-test-matrix.yml
    with:
-      test_mode: "docker_image_build/build.yaml"
+      test_mode: "docker_image_build"

  image-build:
    needs: job1
--- a/.github/workflows/scripts/check-name-agreement.py
+++ b/.github/workflows/scripts/check-name-agreement.py
@@ -0,0 +1,46 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import argparse
+
+from ruamel.yaml import YAML
+
+
+def parse_yaml_file(file_path):
+    yaml = YAML()
+    with open(file_path, "r") as file:
+        data = yaml.load(file)
+    return data
+
+
+def check_service_image_consistency(data):
+    inconsistencies = []
+    for service_name, service_details in data.get("services", {}).items():
+        image_name = service_details.get("image", "")
+        # Extract the image name part after the last '/'
+        image_name_part = image_name.split("/")[-1].split(":")[0]
+        # Check if the service name is a substring of the image name part
+        if service_name not in image_name_part:
+            # Get the line number of the service name
+            line_number = service_details.lc.line + 1
+            inconsistencies.append((service_name, image_name, line_number))
+    return inconsistencies
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Check service name and image name consistency in a YAML file.")
+    parser.add_argument("file_path", type=str, help="The path to the YAML file.")
+    args = parser.parse_args()
+
+    data = parse_yaml_file(args.file_path)
+
+    inconsistencies = check_service_image_consistency(data)
+    if inconsistencies:
+        for service_name, image_name, line_number in inconsistencies:
+            print(f"Service name: {service_name}, Image name: {image_name}, Line number: {line_number}")
+    else:
+        print("All consistent")
+
+
+if __name__ == "__main__":
+    main()
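You can try the rule that `check-name-agreement.py` enforces without `ruamel.yaml`. This minimal sketch assumes the compose file has already been loaded into a plain `dict`, so line numbers are not available. It applies the same test: strip the registry path and the tag from the image reference, then require the service name to be a substring of what remains. The helper names are hypothetical.

```python
def image_basename(image: str) -> str:
    """Drop the registry/namespace prefix and the tag, like split('/')[-1].split(':')[0]."""
    return image.split("/")[-1].split(":")[0]


def find_name_mismatches(services: dict) -> list:
    """Return (service_name, image) pairs whose service name is not part of the image basename."""
    mismatches = []
    for service_name, details in services.items():
        image = details.get("image", "")
        if service_name not in image_basename(image):
            mismatches.append((service_name, image))
    return mismatches


compose = {
    "services": {
        "agent-langchain": {"image": "opea/agent-langchain:latest"},
        "tgi-server": {"image": "ghcr.io/huggingface/tgi-gaudi:2.0.6"},
    }
}
print(find_name_mismatches(compose["services"]))
# [('tgi-server', 'ghcr.io/huggingface/tgi-gaudi:2.0.6')]
```

This rule would also report a service such as `worker-rag-agent` that is built from `opea/agent-langchain:latest`, as in the ROCm compose file in this change, because `worker-rag-agent` is not a substring of `agent-langchain`.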
--- a/.github/workflows/scripts/check_duplicated_image.py
+++ b/.github/workflows/scripts/check_duplicated_image.py
@@ -0,0 +1,63 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import argparse
+import os.path
+import subprocess
+import sys
+
+import yaml
+
+images = {}
+
+
+def check_docker_compose_build_definition(file_path):
+    with open(file_path, "r") as f:
+        data = yaml.load(f, Loader=yaml.FullLoader)
+        for service in data["services"]:
+            if "build" in data["services"][service] and "image" in data["services"][service]:
+                bash_command = "echo " + data["services"][service]["image"]
+                image = (
+                    subprocess.run(["bash", "-c", bash_command], check=True, capture_output=True)
+                    .stdout.decode("utf-8")
+                    .strip()
+                )
+                build = data["services"][service]["build"]
+                context = build.get("context", "")
+                dockerfile = os.path.normpath(
+                    os.path.join(os.path.dirname(file_path), context, build.get("dockerfile", ""))
+                )
+                if not os.path.isfile(dockerfile):
+                    # Dockerfile does not exist in the current repo context; assume it is in a third-party context
+                    dockerfile = os.path.normpath(os.path.join(context, build.get("dockerfile", "")))
+                item = {"file_path": file_path, "service": service, "dockerfile": dockerfile}
+                if image in images and dockerfile != images[image]["dockerfile"]:
+                    print("ERROR: !!! Found Conflicts !!!")
+                    print(f"Image: {image}, Dockerfile: {dockerfile}, defined in Service: {service}, File: {file_path}")
+                    print(
+                        f"Image: {image}, Dockerfile: {images[image]['dockerfile']}, defined in Service: {images[image]['service']}, File: {images[image]['file_path']}"
+                    )
+                    sys.exit(1)
+                else:
+                    # print(f"Add Image: {image} Dockerfile: {dockerfile}")
+                    images[image] = item
+
+
+def parse_arg():
+    parser = argparse.ArgumentParser(
+        description="Check for conflicts in image build definition in docker-compose.yml files"
+    )
+    parser.add_argument("files", nargs="+", help="list of files to be checked")
+    return parser.parse_args()
+
+
+def main():
+    args = parse_arg()
+    for file_path in args.files:
+        check_docker_compose_build_definition(file_path)
+    print("SUCCESS: No Conflicts Found.")
+    return 0
+
+
+if __name__ == "__main__":
+    main()
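At its core, `check_duplicated_image.py` maps each image tag to the Dockerfile that builds it. The sketch below pulls that logic out into plain Python. It assumes the image names are already expanded. The script does the expansion by running `echo` through `bash`, which also resolves `${VAR:-default}` forms that `os.path.expandvars` would not. The script exits on the first conflict, but this hypothetical `find_image_conflicts` helper collects all of them.

```python
import os.path


def find_image_conflicts(definitions):
    """Return every image tag that is built from more than one distinct Dockerfile.

    `definitions` is an iterable of (compose_file, service, image, dockerfile) tuples.
    Each conflict is reported as (image, first_definition, conflicting_definition).
    """
    seen = {}  # image tag -> (compose_file, service, normalized Dockerfile) of its first definition
    conflicts = []
    for compose_file, service, image, dockerfile in definitions:
        current = (compose_file, service, os.path.normpath(dockerfile))
        first = seen.setdefault(image, current)
        if first[2] != current[2]:
            conflicts.append((image, first, current))
    return conflicts


definitions = [
    ("a/compose.yaml", "agent", "opea/agent-langchain:latest", "comps/agent/langchain/Dockerfile"),
    ("b/compose.yaml", "agent", "opea/agent-langchain:latest", "comps/agent/langchain/./Dockerfile"),
    ("c/compose.yaml", "agent", "opea/agent-langchain:latest", "comps/agent/other/Dockerfile"),
]
for image, first, other in find_image_conflicts(definitions):
    print(f"Conflict for {image}: {first} vs {other}")
```

Normalizing the paths first means that `comps/agent/langchain/./Dockerfile` counts as the same Dockerfile as `comps/agent/langchain/Dockerfile`. This matches the `os.path.normpath` call in the script.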
--- a/.github/workflows/scripts/docker_compose_clean_up.sh
+++ b/.github/workflows/scripts/docker_compose_clean_up.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# The test machine is shared by several OPEA projects, so the test scripts can't use `docker compose down` to clean up
+# all the containers, ports and networks directly.
+# Instead, the following script is used to minimize the impact of the cleanup.
+
+test_case=${test_case:-"test_compose_on_gaudi.sh"}
+hardware=${hardware:-"gaudi"}
+flag=${test_case%_on_*}
+flag=${flag#test_}
+yaml_file=$(find . -type f -wholename "*${hardware}/${flag}.yaml")
+echo $yaml_file
+
+case "$1" in
+    containers)
+        echo "Stop and remove all containers used by the services in $yaml_file ..."
+        containers=$(cat $yaml_file | grep container_name | cut -d':' -f2)
+        for container_name in $containers; do
+            cid=$(docker ps -aq --filter "name=$container_name")
+            if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
+        done
+        ;;
+    ports)
+        echo "Release all ports used by the services in $yaml_file ..."
+        pip install jq yq
+        ports=$(yq '.services[].ports[] | split(":")[0]' $yaml_file | grep -o '[0-9a-zA-Z_-]\+')
+        echo "$ports"
+        for port in $ports; do
+          if [[ $port =~ [a-zA-Z_-] ]]; then
+            port=$(grep -E "export $port=" tests/$test_case | cut -d'=' -f2)
+          fi
+          echo $port
+          cid=$(docker ps --filter "publish=${port}" --format "{{.ID}}")
+          if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
+        done
+        ;;
+    *)
+        echo "Unknown function: $1"
+        ;;
+esac
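`docker_compose_clean_up.sh` picks its compose file using only `test_case` and `hardware`. It applies two bash parameter expansions and then a `find -wholename` glob. The Python sketch below reproduces that derivation so you can check it on its own. The helper names and sample paths are illustrative assumptions.

```python
import fnmatch


def compose_flag(test_case: str) -> str:
    """Mimic flag=${test_case%_on_*} followed by flag=${flag#test_}."""
    # '%' strips the shortest suffix matching '_on_*', i.e. from the last '_on_' onward.
    flag = test_case.rsplit("_on_", 1)[0]
    # '#' strips the prefix 'test_' only if it is present.
    if flag.startswith("test_"):
        flag = flag[len("test_"):]
    return flag


def select_yaml(paths, test_case: str, hardware: str):
    """Mimic find -wholename "*${hardware}/${flag}.yaml"; '*' may match '/' here, as in find."""
    pattern = f"*{hardware}/{compose_flag(test_case)}.yaml"
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


paths = [
    "./docker_compose/intel/hpu/gaudi/compose.yaml",
    "./docker_compose/intel/hpu/gaudi/compose_tgi.yaml",
    "./docker_compose/intel/cpu/xeon/compose.yaml",
]
print(select_yaml(paths, "test_compose_on_gaudi.sh", "gaudi"))
print(select_yaml(paths, "test_compose_tgi_on_gaudi.sh", "gaudi"))
```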
--- a/.github/workflows/scripts/get_test_matrix.sh
+++ b/.github/workflows/scripts/get_test_matrix.sh
@@ -16,8 +16,13 @@ for example in ${examples}; do
    if [[ ! $(find . -type f | grep ${test_mode}) ]]; then continue; fi
    cd tests
    ls -l
-    hardware_list=$(find . -type f -name "test_compose*_on_*.sh" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}'| sort -u)
-    echo "Test supported hardware list = ${hardware_list}"
+    if [[ "$test_mode" == "docker_image_build" ]]; then
+        find_name="test_manifest_on_*.sh"
+    else
+        find_name="test_${test_mode}*_on_*.sh"
+    fi
+    hardware_list=$(find . -type f -name "${find_name}" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}'| sort -u)
+    echo -e "Test supported hardware list: \n${hardware_list}"

    run_hardware=""
    if [[ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | cut -d'/' -f2 | grep -E '*.py|Dockerfile*|ui|docker_image_build' ) ]]; then
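The hardware-list pipeline in `get_test_matrix.sh` packs several steps into one line. The Python sketch below mirrors it, assuming the input is a list of test-script basenames, which is what `find -name` matches against. `hardware_list` is a hypothetical helper and is not part of the workflow scripts.

```python
import fnmatch


def hardware_list(test_files, test_mode):
    """Mirror: find -name "$find_name" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}' | sort -u."""
    if test_mode == "docker_image_build":
        find_name = "test_manifest_on_*.sh"
    else:
        find_name = f"test_{test_mode}*_on_*.sh"
    hardware = set()
    for name in test_files:
        if not fnmatch.fnmatchcase(name, find_name):
            continue
        stem = name.split(".")[0]  # cut -d. -f1
        fields = stem.split("_on_")  # awk -F'_on_'
        if len(fields) > 1:  # awk would print an empty field otherwise; skip it here
            hardware.add(fields[1])  # '{print $2}'
    return sorted(hardware)  # sort -u


files = [
    "test_compose_on_gaudi.sh",
    "test_compose_on_xeon.sh",
    "test_compose_on_rocm.sh",
    "test_manifest_on_gaudi.sh",
    "step4a_launch_and_validate_agent_tgi_on_rocm.sh",
]
print(hardware_list(files, "compose"))
print(hardware_list(files, "docker_image_build"))
```

In `docker_image_build` mode, the hardware list now comes from the `test_manifest_on_*.sh` scripts and no longer from the compose tests.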
--- a/.gitignore
+++ b/.gitignore
@@ -5,4 +5,4 @@
 **/playwright/.cache/
 **/test-results/

-__pycache__/
+__pycache__/
--- a/.prettierignore
+++ b/.prettierignore
@@ -1 +1 @@
-**/kubernetes/
+**/kubernetes/
--- a/.set_env.sh
+++ b/.set_env.sh
@@ -0,0 +1,16 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+#To announce the version of the code, create a version.txt with the following format.
+#VERSION_MAJOR 1
+#VERSION_MINOR 0
+#VERSION_PATCH 0
+
+VERSION_FILE="version.txt"
+if [ -f $VERSION_FILE ]; then
+    VER_OPEA_MAJOR=$(grep "VERSION_MAJOR" $VERSION_FILE | cut -d " " -f 2)
+    VER_OPEA_MINOR=$(grep "VERSION_MINOR" $VERSION_FILE | cut -d " " -f 2)
+    VER_OPEA_PATCH=$(grep "VERSION_PATCH" $VERSION_FILE | cut -d " " -f 2)
+    export TAG=$VER_OPEA_MAJOR.$VER_OPEA_MINOR
+    echo OPEA Version:$TAG
+fi
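For reference, here is a small Python sketch of how `.set_env.sh` turns `version.txt` into `TAG`. It assumes the space-separated `KEY VALUE` format shown in the comment. The shell version uses `grep`, which matches the key anywhere in a line. This hypothetical `parse_version_tag` helper only accepts lines that start with the key. Like the script, it ignores `VERSION_PATCH` when building the tag.

```python
def parse_version_tag(text: str):
    """Return "MAJOR.MINOR" from version.txt content, or None if either field is missing."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(" ")
        if len(parts) >= 2 and parts[0].startswith("VERSION_"):
            fields[parts[0]] = parts[1]  # equivalent of: cut -d " " -f 2
    if "VERSION_MAJOR" not in fields or "VERSION_MINOR" not in fields:
        return None
    return f"{fields['VERSION_MAJOR']}.{fields['VERSION_MINOR']}"


print(parse_version_tag("VERSION_MAJOR 1\nVERSION_MINOR 0\nVERSION_PATCH 0\n"))  # 1.0
```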
--- a/AgentQnA/README.md
+++ b/AgentQnA/README.md
@@ -83,29 +83,32 @@ flowchart LR

 ## Deployment with docker

-1. Build agent docker image
+1. Build agent docker image [Optional]

-   Note: this is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling images failed.
+> [!NOTE]
+> This step is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling the images fails.

-   First, clone the opea GenAIComps repo.
+First, clone the opea GenAIComps repo.

-   ```
-   export WORKDIR=<your-work-directory>
-   cd $WORKDIR
-   git clone https://github.com/opea-project/GenAIComps.git
-   ```
+```
+export WORKDIR=<your-work-directory>
+cd $WORKDIR
+git clone https://github.com/opea-project/GenAIComps.git
+```

-   Then build the agent docker image. Both the supervisor agent and the worker agent will use the same docker image, but when we launch the two agents we will specify different strategies and register different tools.
+Then build the agent docker image. Both the supervisor agent and the worker agent will use the same docker image, but when we launch the two agents we will specify different strategies and register different tools.

-   ```
-   cd GenAIComps
-   docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
-   ```
+```
+cd GenAIComps
+docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
+```

 2. Set up environment for this example </br>
+
   First, clone this repo.

   ```
+   export WORKDIR=<your-work-directory>
   cd $WORKDIR
   git clone https://github.com/opea-project/GenAIExamples.git
   ```
@@ -113,6 +116,14 @@ flowchart LR
   Second, set up env vars.

   ```
+   # Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
+   export host_ip=$(hostname -I | awk '{print $1}')
+   # if you are in a proxy environment, also set the proxy-related environment variables
+   export http_proxy="Your_HTTP_Proxy"
+   export https_proxy="Your_HTTPs_Proxy"
+   # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
+   export no_proxy="Your_No_Proxy"
+
   export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
   # for using open-source llms
   export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
@@ -147,6 +158,12 @@ flowchart LR
 5. Launch agent services</br>
   We provide two options for `llm_engine` of the agents: 1. open-source LLMs, 2. OpenAI models via API calls.

+   Deploy it on Gaudi or Xeon respectively
+
+   ::::{tab-set}
+   :::{tab-item} Gaudi
+   :sync: Gaudi
+
   To use open-source LLMs on Gaudi2, run commands below.

   ```
@@ -155,6 +172,10 @@ flowchart LR
   bash launch_agent_service_tgi_gaudi.sh
   ```

+   :::
+   :::{tab-item} Xeon
+   :sync: Xeon
+
   To use OpenAI models, run commands below.

   ```
@@ -162,6 +183,9 @@ flowchart LR
   bash launch_agent_service_openai.sh
   ```

+   :::
+   ::::
+
 ## Validate services

 First look at logs of the agent docker containers:
@@ -181,7 +205,7 @@ You should see something like "HTTP server setup successful" if the docker conta
 Second, validate worker agent:

 ```
-curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
     "query": "Most recent album by Taylor Swift"
    }'
 ```
@@ -189,7 +213,7 @@ curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: app
 Third, validate supervisor agent:

 ```
-curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
     "query": "Most recent album by Taylor Swift"
    }'
 ```
--- a/AgentQnA/docker_compose/amd/gpu/rocm/compose.yaml
+++ b/AgentQnA/docker_compose/amd/gpu/rocm/compose.yaml
@@ -0,0 +1,97 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+services:
+  agent-tgi-server:
+    image: ${AGENTQNA_TGI_IMAGE}
+    container_name: agent-tgi-server
+    ports:
+      - "${AGENTQNA_TGI_SERVICE_PORT-8085}:80"
+    volumes:
+      - /var/opea/agent-service/:/data
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${AGENTQNA_TGI_SERVICE_PORT}"
+      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+    shm_size: 1g
+    devices:
+      - /dev/kfd:/dev/kfd
+      - /dev/dri/${AGENTQNA_CARD_ID}:/dev/dri/${AGENTQNA_CARD_ID}
+      - /dev/dri/${AGENTQNA_RENDER_ID}:/dev/dri/${AGENTQNA_RENDER_ID}
+    cap_add:
+      - SYS_PTRACE
+    group_add:
+      - video
+    security_opt:
+      - seccomp:unconfined
+    ipc: host
+    command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192
+
+  worker-rag-agent:
+    image: opea/agent-langchain:latest
+    container_name: rag-agent-endpoint
+    volumes:
+      # - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
+      - ${TOOLSET_PATH}:/home/user/tools/
+    ports:
+      - "9095:9095"
+    ipc: host
+    environment:
+      ip_address: ${ip_address}
+      strategy: rag_agent_llama
+      recursion_limit: ${recursion_limit_worker}
+      llm_engine: tgi
+      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      llm_endpoint_url: ${LLM_ENDPOINT_URL}
+      model: ${LLM_MODEL_ID}
+      temperature: ${temperature}
+      max_new_tokens: ${max_new_tokens}
+      streaming: false
+      tools: /home/user/tools/worker_agent_tools.yaml
+      require_human_feedback: false
+      RETRIEVAL_TOOL_URL: ${RETRIEVAL_TOOL_URL}
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
+      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
+      LANGCHAIN_PROJECT: "opea-worker-agent-service"
+      port: 9095
+
+  supervisor-react-agent:
+    image: opea/agent-langchain:latest
+    container_name: react-agent-endpoint
+    depends_on:
+      - agent-tgi-server
+      - worker-rag-agent
+    volumes:
+      # - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
+      - ${TOOLSET_PATH}:/home/user/tools/
+    ports:
+      - "${AGENTQNA_FRONTEND_PORT}:9090"
+    ipc: host
+    environment:
+      ip_address: ${ip_address}
+      strategy: react_langgraph
+      recursion_limit: ${recursion_limit_supervisor}
+      llm_engine: tgi
+      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      llm_endpoint_url: ${LLM_ENDPOINT_URL}
+      model: ${LLM_MODEL_ID}
+      temperature: ${temperature}
+      max_new_tokens: ${max_new_tokens}
+      streaming: false
+      tools: /home/user/tools/supervisor_agent_tools.yaml
+      require_human_feedback: false
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
+      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
+      LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
+      CRAG_SERVER: $CRAG_SERVER
+      WORKER_AGENT_URL: $WORKER_AGENT_URL
+      port: 9090
--- a/AgentQnA/docker_compose/amd/gpu/rocm/launch_agent_service_tgi_rocm.sh
+++ b/AgentQnA/docker_compose/amd/gpu/rocm/launch_agent_service_tgi_rocm.sh
@@ -0,0 +1,47 @@
+# Copyright (C) 2024 Advanced Micro Devices, Inc.
+# SPDX-License-Identifier: Apache-2.0
+
+WORKPATH=$(dirname "$PWD")/..
+export ip_address=${host_ip}
+export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
+export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
+export AGENTQNA_TGI_SERVICE_PORT="8085"
+
+# LLM related environment variables
+export AGENTQNA_CARD_ID="card1"
+export AGENTQNA_RENDER_ID="renderD136"
+export HF_CACHE_DIR=${HF_CACHE_DIR}
+ls $HF_CACHE_DIR
+export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
+#export NUM_SHARDS=4
+export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
+export temperature=0.01
+export max_new_tokens=512
+
+# agent related environment variables
+export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
+export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
+echo "TOOLSET_PATH=${TOOLSET_PATH}"
+export recursion_limit_worker=12
+export recursion_limit_supervisor=10
+export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
+export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
+export CRAG_SERVER=http://${ip_address}:18881
+
+export AGENTQNA_FRONTEND_PORT="9090"
+
+#retrieval_tool
+export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
+export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
+export REDIS_URL="redis://${host_ip}:26379"
+export INDEX_NAME="rag-redis"
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export EMBEDDING_SERVICE_HOST_IP=${host_ip}
+export RETRIEVER_SERVICE_HOST_IP=${host_ip}
+export RERANK_SERVICE_HOST_IP=${host_ip}
+export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
+export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
+export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
+export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
+
+docker compose -f compose.yaml up -d
--- a/AgentQnA/docker_compose/amd/gpu/rocm/set_env.sh
+++ b/AgentQnA/docker_compose/amd/gpu/rocm/set_env.sh
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+
+# Copyright (C) 2024 Advanced Micro Devices, Inc.
+# SPDX-License-Identifier: Apache-2.0
+
+WORKPATH=$(dirname "$PWD")/..
+export ip_address=${host_ip}
+export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
+export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
+export AGENTQNA_TGI_SERVICE_PORT="19001"
+
+# LLM related environment variables
+export AGENTQNA_CARD_ID="card1"
+export AGENTQNA_RENDER_ID="renderD136"
+export HF_CACHE_DIR=${HF_CACHE_DIR}
+ls $HF_CACHE_DIR
+export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
+export NUM_SHARDS=4
+export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
+export temperature=0.01
+export max_new_tokens=512
+
+# agent related environment variables
+export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
+export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
+export recursion_limit_worker=12
+export recursion_limit_supervisor=10
+export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
+export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
+export CRAG_SERVER=http://${ip_address}:18881
+
+export AGENTQNA_FRONTEND_PORT="15557"
+
+#retrieval_tool
+export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
+export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
+export REDIS_URL="redis://${host_ip}:26379"
+export INDEX_NAME="rag-redis"
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export EMBEDDING_SERVICE_HOST_IP=${host_ip}
+export RETRIEVER_SERVICE_HOST_IP=${host_ip}
+export RERANK_SERVICE_HOST_IP=${host_ip}
+export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
+export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
+export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
+export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
--- a/AgentQnA/docker_compose/intel/cpu/xeon/README.md
+++ b/AgentQnA/docker_compose/intel/cpu/xeon/README.md
@@ -1,3 +1,100 @@
-# Deployment on Xeon
+# Single node on-prem deployment with Docker Compose on Xeon Scalable processors

-We deploy the retrieval tool on Xeon. For LLMs, we support OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
+This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Xeon. For LLMs, we use OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
+
+## Deployment with docker
+
+1. First, clone this repo.
+   ```
+   export WORKDIR=<your-work-directory>
+   cd $WORKDIR
+   git clone https://github.com/opea-project/GenAIExamples.git
+   ```
+2. Set up environment for this example </br>
+
+   ```
+   # Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
+   export host_ip=$(hostname -I | awk '{print $1}')
+   # if you are in a proxy environment, also set the proxy-related environment variables
+   export http_proxy="Your_HTTP_Proxy"
+   export https_proxy="Your_HTTPs_Proxy"
+   # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
+   export no_proxy="Your_No_Proxy"
+
+   export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
+   # Set OPENAI_API_KEY if you want to use OpenAI models
+   export OPENAI_API_KEY=<your-openai-key>
+   ```
+
+3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
+
+   First, launch the mega-service.
+
+   ```
+   cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
+   bash launch_retrieval_tool.sh
+   ```
+
+   Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.
+
+   ```
+   bash run_ingest_data.sh
+   ```
+
+4. Launch Tool service
+   In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
+   ```
+   docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
+   ```
+5. Launch `Agent` service
+
+   The configurations of the supervisor agent and the worker agent are defined in the docker-compose yaml file. We currently use OpenAI GPT-4o-mini as the LLM here, and llama3.1-70B-instruct (served by TGI-Gaudi) in the Gaudi example. To use the OpenAI LLM, run the command below.
+
+   ```
+   cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
+   bash launch_agent_service_openai.sh
+   ```
+
+6. [Optional] Build `Agent` docker image if pulling images failed.
+
+   ```
+   git clone https://github.com/opea-project/GenAIComps.git
+   cd GenAIComps
+   docker build -t opea/agent-langchain:latest -f comps/agent/langchain/Dockerfile .
+   ```
+
+## Validate services
+
+First look at logs of the agent docker containers:
+
+```
+# worker agent
+docker logs rag-agent-endpoint
+```
+
+```
+# supervisor agent
+docker logs react-agent-endpoint
+```
+
+You should see something like "HTTP server setup successful" if the docker containers are started successfully.
+
+Second, validate worker agent:
+
+```
+curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+     "query": "Most recent album by Taylor Swift"
+    }'
+```
+
+Third, validate supervisor agent:
+
+```
+curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+     "query": "Most recent album by Taylor Swift"
+    }'
+```
+
+## How to register your own tools with agent
+
+You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).
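You can also reproduce the curl validation commands from Python using only the standard library. This sketch only builds the request. It assumes the worker agent listens on port 9095, as in the commands above. `build_agent_request` is a hypothetical helper. The code that actually sends the request is commented out, because it needs the services to be running.

```python
import json
import urllib.request


def build_agent_request(host_ip: str, port: int, query: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl validation commands."""
    url = f"http://{host_ip}:{port}/v1/chat/completions"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_agent_request("192.168.1.1", 9095, "Most recent album by Taylor Swift")
print(req.get_method(), req.full_url)
# To actually send it (requires the worker agent to be running):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.read().decode("utf-8"))
```

`urllib`, like curl, reads `http_proxy` and `no_proxy` from the environment. Behind a proxy, `host_ip` should therefore be listed in `no_proxy`.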
--- a/AgentQnA/docker_compose/intel/cpu/xeon/launch_agent_service_openai.sh
+++ b/AgentQnA/docker_compose/intel/cpu/xeon/launch_agent_service_openai.sh
@@ -1,6 +1,9 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

+pushd "../../../../../" > /dev/null
+source .set_env.sh
+popd > /dev/null
 export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
 export ip_address=$(hostname -I | awk '{print $1}')
 export recursion_limit_worker=12
--- a/AgentQnA/docker_compose/intel/hpu/gaudi/README.md
+++ b/AgentQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -0,0 +1,105 @@
+# Single node on-prem deployment of AgentQnA on Gaudi
+
+This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Gaudi using open-source LLMs.
+For more details, please refer to the deployment guide [here](../../../../README.md).
+
+## Deployment with docker
+
+1. First, clone this repo.
+   ```
+   export WORKDIR=<your-work-directory>
+   cd $WORKDIR
+   git clone https://github.com/opea-project/GenAIExamples.git
+   ```
+2. Set up environment for this example </br>
+
+   ```
+   # Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
+   export host_ip=$(hostname -I | awk '{print $1}')
+   # if you are in a proxy environment, also set the proxy-related environment variables
+   export http_proxy="Your_HTTP_Proxy"
+   export https_proxy="Your_HTTPs_Proxy"
+   # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
+   export no_proxy="Your_No_Proxy"
+
+   export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
+   # for using open-source llms
+   export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
+   # Example: export HF_CACHE_DIR=$WORKDIR so that models do not need to be re-downloaded every time
+   export HF_CACHE_DIR=<directory-where-llms-are-downloaded>
+
+   ```
+
+3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
+
+   First, launch the mega-service.
+
+   ```
+   cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
+   bash launch_retrieval_tool.sh
+   ```
+
+   Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.
+
+   ```
+   bash run_ingest_data.sh
+   ```
+
+4. Launch Tool service
+   In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
+   ```
+   docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
+   ```
+5. Launch `Agent` service
+
+   To use open-source LLMs on Gaudi2, run commands below.
+
+   ```
+   cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi
+   bash launch_tgi_gaudi.sh
+   bash launch_agent_service_tgi_gaudi.sh
+   ```
+
+6. [Optional] Build `Agent` docker image if pulling images failed.
+
+   ```
+   git clone https://github.com/opea-project/GenAIComps.git
+   cd GenAIComps
+   docker build -t opea/agent-langchain:latest -f comps/agent/langchain/Dockerfile .
+   ```
+
+## Validate services
+
+First look at logs of the agent docker containers:
+
+```
+# worker agent
+docker logs rag-agent-endpoint
+```
+
+```
+# supervisor agent
+docker logs react-agent-endpoint
+```
+
+You should see something like "HTTP server setup successful" if the docker containers are started successfully.
+
+Second, validate worker agent:
+
+```
+curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+     "query": "Most recent album by Taylor Swift"
+    }'
+```
+
+Third, validate supervisor agent:
+
+```
+curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+     "query": "Most recent album by Taylor Swift"
+    }'
+```
+
+## How to register your own tools with agent
+
+You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).
--- a/AgentQnA/docker_compose/intel/hpu/gaudi/launch_agent_service_tgi_gaudi.sh
+++ b/AgentQnA/docker_compose/intel/hpu/gaudi/launch_agent_service_tgi_gaudi.sh
@@ -1,6 +1,9 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

+pushd "../../../../../" > /dev/null
+source .set_env.sh
+popd > /dev/null
 WORKPATH=$(dirname "$PWD")/..
 # export WORKDIR=$WORKPATH/../../
 echo "WORKDIR=${WORKDIR}"
--- a/AgentQnA/docker_compose/intel/hpu/gaudi/tgi_gaudi.yaml
+++ b/AgentQnA/docker_compose/intel/hpu/gaudi/tgi_gaudi.yaml
@@ -3,7 +3,7 @@

 services:
  tgi-server:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
    container_name: tgi-server
    ports:
      - "8085:80"
--- a/AgentQnA/tests/step4a_launch_and_validate_agent_tgi_on_rocm.sh
+++ b/AgentQnA/tests/step4a_launch_and_validate_agent_tgi_on_rocm.sh
@@ -0,0 +1,76 @@
+#!/bin/bash
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+set -ex
+
+WORKPATH=$(dirname "$PWD")
+export WORKDIR=$WORKPATH/../../
+echo "WORKDIR=${WORKDIR}"
+export ip_address=$(hostname -I | awk '{print $1}')
+export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+
+export HF_CACHE_DIR=$WORKDIR/hf_cache
+if [ ! -d "$HF_CACHE_DIR" ]; then
+    mkdir -p "$HF_CACHE_DIR"
+fi
+ls $HF_CACHE_DIR
+
+
+function start_agent_and_api_server() {
+    echo "Starting CRAG server"
+    docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
+
+    echo "Starting Agent services"
+    cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
+    bash launch_agent_service_tgi_rocm.sh
+}
+
+function validate() {
+    local CONTENT="$1"
+    local EXPECTED_RESULT="$2"
+    local SERVICE_NAME="$3"
+
+    if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
+        echo "[ $SERVICE_NAME ] Content is as expected: $CONTENT" >&2
+        echo 0
+    else
+        echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT" >&2
+        echo 1
+    fi
+}
+
+function validate_agent_service() {
+    echo "----------------Test agent ----------------"
+    local CONTENT=$(http_proxy="" curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+     "query": "Tell me about Michael Jackson song thriller"
+    }')
+    local EXIT_CODE=$(validate "$CONTENT" "Thriller" "rag-agent-endpoint")
+    docker logs rag-agent-endpoint
+    if [ "$EXIT_CODE" == "1" ]; then
+        exit 1
+    fi
+
+    local CONTENT=$(http_proxy="" curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
+     "query": "Tell me about Michael Jackson song thriller"
+    }')
+    local EXIT_CODE=$(validate "$CONTENT" "Thriller" "react-agent-endpoint")
+    docker logs react-agent-endpoint
+    if [ "$EXIT_CODE" == "1" ]; then
+        exit 1
+    fi
+
+}
+
+function main() {
+    echo "==================== Start agent ===================="
+    start_agent_and_api_server
+    echo "==================== Agent started ===================="
+
+    echo "==================== Validate agent service ===================="
+    validate_agent_service
+    echo "==================== Agent service validated ===================="
+}
+
+main
--- a/AgentQnA/tests/test_compose_on_rocm.sh
+++ b/AgentQnA/tests/test_compose_on_rocm.sh
@@ -0,0 +1,75 @@
+#!/bin/bash
+# Copyright (C) 2024 Advanced Micro Devices, Inc.
+# SPDX-License-Identifier: Apache-2.0
+
+set -e
+
+WORKPATH=$(dirname "$PWD")
+export WORKDIR=$WORKPATH/../../
+echo "WORKDIR=${WORKDIR}"
+export ip_address=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
+
+function stop_crag() {
+    cid=$(docker ps -aq --filter "name=kdd-cup-24-crag-service")
+    echo "Stopping container kdd-cup-24-crag-service with cid $cid"
+    if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
+}
+
+function stop_agent_docker() {
+    cd $WORKPATH/docker_compose/amd/gpu/rocm
+    # docker compose -f compose.yaml down
+    container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
+    for container_name in $container_list; do
+        cid=$(docker ps -aq --filter "name=$container_name")
+        echo "Stopping container $container_name"
+        if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
+    done
+}
+
+function stop_retrieval_tool() {
+    echo "Stopping Retrieval tool"
+    local RETRIEVAL_TOOL_PATH=$WORKPATH/../DocIndexRetriever
+    cd $RETRIEVAL_TOOL_PATH/docker_compose/intel/cpu/xeon/
+    # docker compose -f compose.yaml down
+    container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
+    for container_name in $container_list; do
+        cid=$(docker ps -aq --filter "name=$container_name")
+        echo "Stopping container $container_name"
+        if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
+    done
+}
+echo "workpath: $WORKPATH"
+echo "=================== Stop containers ===================="
+stop_crag
+stop_agent_docker
+stop_retrieval_tool
+
+cd $WORKPATH/tests
+
+echo "=================== #1 Building docker images===================="
+bash step1_build_images.sh
+echo "=================== #1 Building docker images completed===================="
+
+echo "=================== #2 Start retrieval tool===================="
+bash step2_start_retrieval_tool.sh
+echo "=================== #2 Retrieval tool started===================="
+
+echo "=================== #3 Ingest data and validate retrieval===================="
+bash step3_ingest_data_and_validate_retrieval.sh
+echo "=================== #3 Data ingestion and validation completed===================="
+
+echo "=================== #4 Start agent and API server===================="
+bash step4a_launch_and_validate_agent_tgi_on_rocm.sh
+echo "=================== #4 Agent test passed ===================="
+
+echo "=================== #5 Stop agent and API server===================="
+stop_crag
+stop_agent_docker
+stop_retrieval_tool
+echo "=================== #5 Agent and API server stopped===================="
+
+echo y | docker system prune
+
+echo "ALL DONE!"
--- a/AudioQnA/Dockerfile
+++ b/AudioQnA/Dockerfile
@@ -16,9 +16,8 @@ RUN useradd -m -s /bin/bash user && \

 WORKDIR /home/user/
 RUN git clone https://github.com/opea-project/GenAIComps.git
-
 WORKDIR /home/user/GenAIComps
-RUN pip install --no-cache-dir --upgrade pip && \
+RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt

 COPY ./audioqna.py /home/user/audioqna.py
--- a/AudioQnA/Dockerfile.multilang
+++ b/AudioQnA/Dockerfile.multilang
@@ -18,7 +18,7 @@ WORKDIR /home/user/
 RUN git clone https://github.com/opea-project/GenAIComps.git

 WORKDIR /home/user/GenAIComps
-RUN pip install --no-cache-dir --upgrade pip && \
+RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt

 COPY ./audioqna_multilang.py /home/user/audioqna_multilang.py
--- a/AudioQnA/audioqna.py
+++ b/AudioQnA/audioqna.py
@@ -1,58 +1,124 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-import asyncio
 import os

-from comps import AudioQnAGateway, MicroService, ServiceOrchestrator, ServiceType
+from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
+from comps.cores.proto.api_protocol import AudioChatCompletionRequest, ChatCompletionResponse
+from comps.cores.proto.docarray import LLMParams
+from fastapi import Request

-MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
 MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
-ASR_SERVICE_HOST_IP = os.getenv("ASR_SERVICE_HOST_IP", "0.0.0.0")
-ASR_SERVICE_PORT = int(os.getenv("ASR_SERVICE_PORT", 9099))
-LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
-LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
-TTS_SERVICE_HOST_IP = os.getenv("TTS_SERVICE_HOST_IP", "0.0.0.0")
-TTS_SERVICE_PORT = int(os.getenv("TTS_SERVICE_PORT", 9088))
+
+WHISPER_SERVER_HOST_IP = os.getenv("WHISPER_SERVER_HOST_IP", "0.0.0.0")
+WHISPER_SERVER_PORT = int(os.getenv("WHISPER_SERVER_PORT", 7066))
+SPEECHT5_SERVER_HOST_IP = os.getenv("SPEECHT5_SERVER_HOST_IP", "0.0.0.0")
+SPEECHT5_SERVER_PORT = int(os.getenv("SPEECHT5_SERVER_PORT", 7055))
+LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
+LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 3006))
+
+
+def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
+    if self.services[cur_node].service_type == ServiceType.LLM:
+        # convert TGI/vLLM to unified OpenAI /v1/chat/completions format
+        next_inputs = {}
+        next_inputs["model"] = "tgi"  # specifically clarify the fake model to make the format unified
+        next_inputs["messages"] = [{"role": "user", "content": inputs["asr_result"]}]
+        next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
+        next_inputs["top_p"] = llm_parameters_dict["top_p"]
+        next_inputs["stream"] = inputs["streaming"]  # False as default
+        next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
+        # next_inputs["presence_penalty"] = inputs["presence_penalty"]
+        # next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
+        next_inputs["temperature"] = inputs["temperature"]
+        inputs = next_inputs
+    elif self.services[cur_node].service_type == ServiceType.TTS:
+        next_inputs = {}
+        next_inputs["text"] = inputs["choices"][0]["message"]["content"]
+        next_inputs["voice"] = kwargs["voice"]
+        inputs = next_inputs
+    return inputs


 class AudioQnAService:
    def __init__(self, host="0.0.0.0", port=8000):
        self.host = host
        self.port = port
+        ServiceOrchestrator.align_inputs = align_inputs
        self.megaservice = ServiceOrchestrator()

+        self.endpoint = str(MegaServiceEndpoint.AUDIO_QNA)
+
    def add_remote_service(self):
        asr = MicroService(
            name="asr",
-            host=ASR_SERVICE_HOST_IP,
-            port=ASR_SERVICE_PORT,
-            endpoint="/v1/audio/transcriptions",
+            host=WHISPER_SERVER_HOST_IP,
+            port=WHISPER_SERVER_PORT,
+            endpoint="/v1/asr",
            use_remote_service=True,
            service_type=ServiceType.ASR,
        )
        llm = MicroService(
            name="llm",
-            host=LLM_SERVICE_HOST_IP,
-            port=LLM_SERVICE_PORT,
+            host=LLM_SERVER_HOST_IP,
+            port=LLM_SERVER_PORT,
            endpoint="/v1/chat/completions",
            use_remote_service=True,
            service_type=ServiceType.LLM,
        )
        tts = MicroService(
            name="tts",
-            host=TTS_SERVICE_HOST_IP,
-            port=TTS_SERVICE_PORT,
-            endpoint="/v1/audio/speech",
+            host=SPEECHT5_SERVER_HOST_IP,
+            port=SPEECHT5_SERVER_PORT,
+            endpoint="/v1/tts",
            use_remote_service=True,
            service_type=ServiceType.TTS,
        )
        self.megaservice.add(asr).add(llm).add(tts)
        self.megaservice.flow_to(asr, llm)
        self.megaservice.flow_to(llm, tts)
-        self.gateway = AudioQnAGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
+
+    async def handle_request(self, request: Request):
+        data = await request.json()
+
+        chat_request = AudioChatCompletionRequest.parse_obj(data)
+        parameters = LLMParams(
+            # relatively lower max_tokens for audio conversation
+            max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128,
+            top_k=chat_request.top_k if chat_request.top_k else 10,
+            top_p=chat_request.top_p if chat_request.top_p else 0.95,
+            temperature=chat_request.temperature if chat_request.temperature else 0.01,
+            frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0,
+            presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
+            repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
+            streaming=False,  # TODO add streaming LLM output as input to TTS
+        )
+        result_dict, runtime_graph = await self.megaservice.schedule(
+            initial_inputs={"audio": chat_request.audio},
+            llm_parameters=parameters,
+            voice=chat_request.voice if hasattr(chat_request, "voice") else "default",
+        )
+
+        last_node = runtime_graph.all_leaves()[-1]
+        response = result_dict[last_node]["tts_result"]
+
+        return response
+
+    def start(self):
+        self.service = MicroService(
+            self.__class__.__name__,
+            service_role=ServiceRoleType.MEGASERVICE,
+            host=self.host,
+            port=self.port,
+            endpoint=self.endpoint,
+            input_datatype=AudioChatCompletionRequest,
+            output_datatype=ChatCompletionResponse,
+        )
+        self.service.add_route(self.endpoint, self.handle_request, methods=["POST"])
+        self.service.start()


 if __name__ == "__main__":
-    audioqna = AudioQnAService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
+    audioqna = AudioQnAService(port=MEGA_SERVICE_PORT)
    audioqna.add_remote_service()
+    audioqna.start()
--- a/AudioQnA/audioqna_multilang.py
+++ b/AudioQnA/audioqna_multilang.py
@@ -1,13 +1,14 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-import asyncio
 import base64
 import os

-from comps import AudioQnAGateway, MicroService, ServiceOrchestrator, ServiceType
+from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
+from comps.cores.proto.api_protocol import AudioChatCompletionRequest, ChatCompletionResponse
+from comps.cores.proto.docarray import LLMParams
+from fastapi import Request

-MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
 MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))

 WHISPER_SERVER_HOST_IP = os.getenv("WHISPER_SERVER_HOST_IP", "0.0.0.0")
@@ -19,12 +20,8 @@ LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 8888))


 def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
-    print(inputs)
-    if self.services[cur_node].service_type == ServiceType.ASR:
-        # {'byte_str': 'UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA'}
-        inputs["audio"] = inputs["byte_str"]
-        del inputs["byte_str"]
-    elif self.services[cur_node].service_type == ServiceType.LLM:
+
+    if self.services[cur_node].service_type == ServiceType.LLM:
        # convert TGI/vLLM to unified OpenAI /v1/chat/completions format
        next_inputs = {}
        next_inputs["model"] = "tgi"  # specifically clarify the fake model to make the format unified
@@ -60,6 +57,8 @@ class AudioQnAService:
        ServiceOrchestrator.align_outputs = align_outputs
        self.megaservice = ServiceOrchestrator()

+        self.endpoint = str(MegaServiceEndpoint.AUDIO_QNA)
+
    def add_remote_service(self):
        asr = MicroService(
            name="asr",
@@ -90,9 +89,46 @@ class AudioQnAService:
        self.megaservice.add(asr).add(llm).add(tts)
        self.megaservice.flow_to(asr, llm)
        self.megaservice.flow_to(llm, tts)
-        self.gateway = AudioQnAGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
+
+    async def handle_request(self, request: Request):
+        data = await request.json()
+
+        chat_request = AudioChatCompletionRequest.parse_obj(data)
+        parameters = LLMParams(
+            # relatively lower max_tokens for audio conversation
+            max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128,
+            top_k=chat_request.top_k if chat_request.top_k else 10,
+            top_p=chat_request.top_p if chat_request.top_p else 0.95,
+            temperature=chat_request.temperature if chat_request.temperature else 0.01,
+            frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0,
+            presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
+            repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
+            streaming=False,  # TODO add streaming LLM output as input to TTS
+        )
+        result_dict, runtime_graph = await self.megaservice.schedule(
+            initial_inputs={"audio": chat_request.audio}, llm_parameters=parameters
+        )
+
+        last_node = runtime_graph.all_leaves()[-1]
+        response = result_dict[last_node]["byte_str"]
+
+        return response
+
+    def start(self):
+        self.service = MicroService(
+            self.__class__.__name__,
+            service_role=ServiceRoleType.MEGASERVICE,
+            host=self.host,
+            port=self.port,
+            endpoint=self.endpoint,
+            input_datatype=AudioChatCompletionRequest,
+            output_datatype=ChatCompletionResponse,
+        )
+        self.service.add_route(self.endpoint, self.handle_request, methods=["POST"])
+        self.service.start()


 if __name__ == "__main__":
-    audioqna = AudioQnAService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
+    audioqna = AudioQnAService(port=MEGA_SERVICE_PORT)
    audioqna.add_remote_service()
+    audioqna.start()
--- a/AudioQnA/benchmark/accuracy/README.md
+++ b/AudioQnA/benchmark/accuracy/README.md
@@ -14,12 +14,12 @@ We evaluate the WER (Word Error Rate) metric of the ASR microservice.

 ### Launch ASR microservice

-Launch the ASR microserice with the following commands. For more details please refer to [doc](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/whisper/README.md).
+Launch the ASR microservice with the following commands. For more details, refer to the [doc](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/src/README.md).

 ```bash
 git clone https://github.com/opea-project/GenAIComps
 cd GenAIComps
-docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/Dockerfile .
 # change the name of model by editing model_name_or_path you want to evaluate
 docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper:latest --model_name_or_path "openai/whisper-tiny"
 ```
--- a/AudioQnA/benchmark/performance/README.md
+++ b/AudioQnA/benchmark/performance/README.md
@@ -0,0 +1,77 @@
+# AudioQnA Benchmarking
+
+This folder contains scripts for inference benchmarking with [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md), a comprehensive benchmarking tool that performs throughput analysis to assess inference performance.
+
+By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
+
+## Purpose
+
+We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
+
+- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
+- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
+- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
+
+## Metrics
+
+The benchmark reports the following metrics:
+
+- Number of Concurrent Requests
+- End-to-End Latency: P50, P90, P99 (in milliseconds)
+- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
+- Average Next Token Latency (in milliseconds)
+- Average Token Latency (in milliseconds)
+- Requests Per Second (RPS)
+- Output Tokens Per Second
+- Input Tokens Per Second
+
+Results will be displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
+
+## Getting Started
+
+We recommend using Kubernetes to deploy the AudioQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs.
+
+### Prerequisites
+
+- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
+
+- Ensure every node has direct internet access.
+- Set up kubectl on the master node with access to the Kubernetes cluster.
+- Install Python 3.8+ on the master node for running GenAIEval.
+- Ensure all nodes have a local `/mnt/models` folder, which will be mounted by the pods.
+- Ensure that the container's ulimit can accommodate the number of concurrent requests.
+
+```bash
+# Modify the containerd ulimit:
+sudo systemctl edit containerd
+# Add the following two lines in the editor:
+[Service]
+LimitNOFILE=65536:1048576
+
+sudo systemctl daemon-reload; sudo systemctl restart containerd
+```
+
+## Test Steps
+
+Deploy the AudioQnA service before benchmarking.
+
+### Run Benchmark Test
+
+Before running the benchmark, you can configure the number of test queries and the test output directory:
+
+```bash
+export USER_QUERIES="[128, 128, 128, 128]"
+export TEST_OUTPUT_DIR="/tmp/benchmark_output"
+```
+
+Then run the benchmark:
+
+```bash
+bash benchmark.sh -n <node_count>
+```
+
+The argument `-n` refers to the number of test nodes.
+
+### Data collection
+
+All test results are written to `/tmp/benchmark_output`, the directory set by the `TEST_OUTPUT_DIR` environment variable in the previous step.
--- a/AudioQnA/benchmark/performance/benchmark.sh
+++ b/AudioQnA/benchmark/performance/benchmark.sh
@@ -0,0 +1,99 @@
+#!/bin/bash
+
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+deployment_type="k8s"
+node_number=1
+service_port=8888
+query_per_node=128
+
+benchmark_tool_path="$(pwd)/GenAIEval"
+
+usage() {
+    echo "Usage: $0 [-d deployment_type] [-n node_number] [-i service_ip] [-p service_port]"
+    echo "  -d deployment_type    AudioQnA deployment type, select between k8s and docker (default: k8s)"
+    echo "  -n node_number        Test node number, required only for k8s deployment_type, (default: 1)"
+    echo "  -i service_ip         AudioQnA service ip, required only for docker deployment_type"
+    echo "  -p service_port       AudioQnA service port, required only for docker deployment_type, (default: 8888)"
+    exit 1
+}
+
+while getopts ":d:n:i:p:" opt; do
+    case ${opt} in
+        d )
+            deployment_type=$OPTARG
+            ;;
+        n )
+            node_number=$OPTARG
+            ;;
+        i )
+            service_ip=$OPTARG
+            ;;
+        p )
+            service_port=$OPTARG
+            ;;
+        \? )
+            echo "Invalid option: -$OPTARG" 1>&2
+            usage
+            ;;
+        : )
+            echo "Invalid option: -$OPTARG requires an argument" 1>&2
+            usage
+            ;;
+    esac
+done
+
+if [[ "$deployment_type" == "docker" && -z "$service_ip" ]]; then
+    echo "Error: service_ip is required for docker deployment_type" 1>&2
+    usage
+fi
+
+if [[ "$deployment_type" == "k8s" && ( -n "$service_ip" || "$service_port" != "8888" ) ]]; then
+    echo "Warning: service_ip and service_port are ignored for k8s deployment_type" 1>&2
+fi
+
+function main() {
+    if [[ ! -d ${benchmark_tool_path} ]]; then
+        echo "Benchmark tool not found, setting up..."
+        setup_env
+    fi
+    run_benchmark
+}
+
+function setup_env() {
+    git clone https://github.com/opea-project/GenAIEval.git
+    pushd ${benchmark_tool_path}
+    python3 -m venv stress_venv
+    source stress_venv/bin/activate
+    pip install -r requirements.txt
+    popd
+}
+
+function run_benchmark() {
+    source ${benchmark_tool_path}/stress_venv/bin/activate
+    export DEPLOYMENT_TYPE=${deployment_type}
+    export SERVICE_IP=${service_ip:-"None"}
+    export SERVICE_PORT=${service_port:-"None"}
+    if [[ -z $USER_QUERIES ]]; then
+        user_query=$((query_per_node*node_number))
+        export USER_QUERIES="[${user_query}, ${user_query}, ${user_query}, ${user_query}]"
+        echo "USER_QUERIES not configured, setting to: ${USER_QUERIES}."
+    fi
+    export WARMUP=$(echo $USER_QUERIES | sed -e 's/[][]//g' -e 's/,.*//')
+    if [[ -z $WARMUP ]]; then export WARMUP=0; fi
+    if [[ -z $TEST_OUTPUT_DIR ]]; then
+        if [[ $DEPLOYMENT_TYPE == "k8s" ]]; then
+            export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/node_${node_number}"
+        else
+            export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/docker"
+        fi
+        echo "TEST_OUTPUT_DIR not configured, setting to: ${TEST_OUTPUT_DIR}."
+    fi
+
+    envsubst < ./benchmark.yaml > ${benchmark_tool_path}/evals/benchmark/benchmark.yaml
+    cd ${benchmark_tool_path}/evals/benchmark
+    python benchmark.py
+}
+
+main
--- a/AudioQnA/benchmark/performance/benchmark.yaml
+++ b/AudioQnA/benchmark/performance/benchmark.yaml
@@ -0,0 +1,52 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+test_suite_config: # Overall configuration settings for the test suite
+  examples: ["audioqna"]  # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
+  deployment_type: ${DEPLOYMENT_TYPE}  # Default is "k8s", can also be "docker"
+  service_ip: ${SERVICE_IP}  # Leave as None for k8s, specify for Docker
+  service_port: ${SERVICE_PORT}  # Leave as None for k8s, specify for Docker
+  warm_ups: ${WARMUP}  # Number of test requests for warm-up
+  run_time: 60m  # The max total run time for the test suite
+  seed:  # The seed for all RNGs
+  user_queries: ${USER_QUERIES}  # Number of test requests at each concurrency level
+  query_timeout: 120  # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
+  random_prompt: false  # Use random prompts if true, fixed prompts if false
+  collect_service_metric: false  # Collect service metrics if true, do not collect service metrics if false
+  data_visualization: false # Generate data visualization if true, do not generate data visualization if false
+  llm_model: "Intel/neural-chat-7b-v3-3"  # The LLM model used for the test
+  test_output_dir: "${TEST_OUTPUT_DIR}"  # The directory to store the test output
+  load_shape:              # Tenant concurrency pattern
+    name: constant           # poisson or constant(locust default load shape)
+    params:                  # Loadshape-specific parameters
+      constant:                # Constant load shape specific parameters, activate only if load_shape is constant
+        concurrent_level: 4      # If user_queries is specified, concurrent_level is target number of requests per user. If not, it is the number of simulated users
+      poisson:                 # Poisson load shape specific parameters, activate only if load_shape is poisson
+        arrival-rate: 1.0        # Request arrival rate
+  namespace: "" # Fill the user-defined namespace. Otherwise, it will be default.
+
+test_cases:
+  audioqna:
+    asr:
+      run_test: true
+      service_name: "asr-svc"  # Replace with your service name
+    llm:
+      run_test: true
+      service_name: "llm-svc"  # Replace with your service name
+      parameters:
+        model_name: "Intel/neural-chat-7b-v3-3"
+        max_new_tokens: 128
+        temperature: 0.01
+        top_k: 10
+        top_p: 0.95
+        repetition_penalty: 1.03
+        streaming: true
+    llmserve:
+      run_test: true
+      service_name: "llm-svc"  # Replace with your service name
+    tts:
+      run_test: true
+      service_name: "tts-svc"  # Replace with your service name
+    e2e:
+      run_test: true
+      service_name: "audioqna-backend-server-svc"  # Replace with your service name
--- a/AudioQnA/docker_compose/amd/gpu/rocm/README.md
+++ b/AudioQnA/docker_compose/amd/gpu/rocm/README.md
@@ -0,0 +1,137 @@
+# Build Mega Service of AudioQnA on AMD ROCm GPU
+
+This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice
+pipeline on a server with AMD ROCm GPUs.
+
+## 🚀 Build Docker images
+
+### 1. Source Code install GenAIComps
+
+```bash
+git clone https://github.com/opea-project/GenAIComps.git
+cd GenAIComps
+```
+
+### 2. Build ASR Image
+
+```bash
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
+```
+
+### 3. Build LLM Image
+
+For the ROCm compose example, the AMD-optimized TGI image `ghcr.io/huggingface/text-generation-inference:2.3.1-rocm`, hosted by Hugging Face (https://github.com/huggingface/text-generation-inference), is used for the TGI service.
+
+### 4. Build TTS Image
+
+```bash
+docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
+```
+
+### 5. Build MegaService Docker Image
+
+To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
+
+```bash
+git clone https://github.com/opea-project/GenAIExamples.git
+cd GenAIExamples/AudioQnA/
+docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
+```
+
+Then run `docker images`; you should have the following images ready:
+
+1. `opea/whisper:latest`
+2. `opea/speecht5:latest`
+3. `opea/audioqna:latest`
+
+## 🚀 Set the environment variables
+
+Before starting the services with `docker compose`, check that the following environment variables are set correctly.
+
+```bash
+export host_ip=<your External Public IP>    # export host_ip=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN=<your HF token>
+
+export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_HOST_IP=${host_ip}
+
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_PORT=3006
+
+export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
+```
+
+Alternatively, use the `set_env.sh` file to set up the environment variables.
+
+Note: Replace `host_ip` with your external IP address; do not use `localhost`.
+
+Note: To restrict access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` options, where `<node>` is the render node index, starting from 128. (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus)
+
+Example of isolating 1 GPU:
+
+      - /dev/dri/card0:/dev/dri/card0
+      - /dev/dri/renderD128:/dev/dri/renderD128
+
+Example of isolating 2 GPUs:
+
+      - /dev/dri/card0:/dev/dri/card0
+      - /dev/dri/renderD128:/dev/dri/renderD128
+      - /dev/dri/card1:/dev/dri/card1
+      - /dev/dri/renderD129:/dev/dri/renderD129
+
+For more information about accessing and restricting AMD GPUs, see the ROCm documentation (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).
+
+## 🚀 Start the MegaService
+
+```bash
+cd GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm/
+docker compose up -d
+```
+
+In the following cases, you can build the Docker images from source yourself:
+
+- The Docker image fails to download.
+- You want to use a specific version of a Docker image.
+
+Refer to the 'Build Docker images' section above.
+
+## 🚀 Consume the AudioQnA Service
+
+Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
+base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
+to the response, decode the base64 string and save it as a .wav file.
+
+```bash
+# voice can be "default" or "male"
+curl http://${host_ip}:3008/v1/audioqna \
+  -X POST \
+  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
+  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
+```
+
+## 🚀 Test MicroServices
+
+```bash
+# whisper service
+curl http://${host_ip}:7066/v1/asr \
+  -X POST \
+  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
+  -H 'Content-Type: application/json'
+
+# tgi service
+curl http://${host_ip}:3006/generate \
+  -X POST \
+  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
+  -H 'Content-Type: application/json'
+
+# speecht5 service
+curl http://${host_ip}:7055/v1/tts \
+  -X POST \
+  -d '{"text": "Who are you?"}' \
+  -H 'Content-Type: application/json'
+```
--- a/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml
+++ b/AudioQnA/docker_compose/amd/gpu/rocm/compose.yaml
@@ -0,0 +1,85 @@
+# Copyright (C) 2024 Advanced Micro Devices, Inc.
+# SPDX-License-Identifier: Apache-2.0
+
+services:
+  whisper-service:
+    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
+    container_name: whisper-service
+    ports:
+      - "7066:7066"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+  speecht5-service:
+    image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
+    container_name: speecht5-service
+    ports:
+      - "7055:7055"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+  tgi-service:
+    image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
+    container_name: tgi-service
+    ports:
+      - "3006:80"
+    volumes:
+     - "./data:/data"
+    shm_size: 1g
+    devices:
+      - /dev/kfd:/dev/kfd
+      - /dev/dri/card1:/dev/dri/card1
+      - /dev/dri/renderD136:/dev/dri/renderD136
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HF_HUB_DISABLE_PROGRESS_BARS: 1
+      HF_HUB_ENABLE_HF_TRANSFER: 0
+      host_ip: ${host_ip}
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://$host_ip:3006/health || exit 1"]
+      interval: 10s
+      timeout: 10s
+      retries: 100
+    command: --model-id ${LLM_MODEL_ID}
+    cap_add:
+      - SYS_PTRACE
+    group_add:
+      - video
+    security_opt:
+      - seccomp:unconfined
+    ipc: host
+  audioqna-backend-server:
+    image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
+    container_name: audioqna-xeon-backend-server
+    depends_on:
+      - whisper-service
+      - tgi-service
+      - speecht5-service
+    ports:
+      - "3008:8888"
+    environment:
+      - no_proxy=${no_proxy}
+      - https_proxy=${https_proxy}
+      - http_proxy=${http_proxy}
+      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
+      - WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
+      - WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
+      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
+      - LLM_SERVER_PORT=${LLM_SERVER_PORT}
+      - SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
+      - SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
+    ipc: host
+    restart: always
+
+networks:
+  default:
+    driver: bridge
--- a/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh
+++ b/AudioQnA/docker_compose/amd/gpu/rocm/set_env.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+# Copyright (C) 2024 Advanced Micro Devices, Inc.
+# SPDX-License-Identifier: Apache-2.0
+
+
+# export host_ip=<your External Public IP>    # export host_ip=$(hostname -I | awk '{print $1}')
+
+export host_ip=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+# <token>
+
+export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_HOST_IP=${host_ip}
+
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_PORT=3006
+
+export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
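
This change renames the backend variables: `ASR_SERVICE_*`, `TTS_SERVICE_*` and `LLM_SERVICE_*` become `WHISPER_SERVER_*`, `SPEECHT5_SERVER_*` and `LLM_SERVER_*`. A shell that still exports only the old names will therefore pass empty values to `docker compose`. The following minimal bash sketch shows one possible pre-flight check. `check_env` is a hypothetical helper and is not part of OPEA or this PR.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of OPEA): report every named
# variable that is unset or empty, and return non-zero if any are missing.
check_env() {
  local var missing=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing or empty: ${var}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, after `source set_env.sh`, you could run `check_env host_ip HUGGINGFACEHUB_API_TOKEN LLM_MODEL_ID WHISPER_SERVER_PORT SPEECHT5_SERVER_PORT LLM_SERVER_PORT BACKEND_SERVICE_ENDPOINT` before `docker compose up -d`.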
--- a/AudioQnA/docker_compose/intel/cpu/xeon/README.md
+++ b/AudioQnA/docker_compose/intel/cpu/xeon/README.md
@@ -14,27 +14,20 @@ cd GenAIComps
 ### 2. Build ASR Image

 ```bash
-docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
-
-
-docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
 ```

 ### 3. Build LLM Image

-```bash
-docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
-```
+No LLM image needs to be built: the TGI service uses the Intel Xeon-optimized image `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu` published by Hugging Face (https://github.com/huggingface/text-generation-inference).

 ### 4. Build TTS Image

 ```bash
-docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile .
-
-docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
+docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
 ```

-### 6. Build MegaService Docker Image
+### 5. Build MegaService Docker Image

 To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:

@@ -47,11 +40,8 @@ docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_p
 Then run the command `docker images`, you will have following images ready:

 1. `opea/whisper:latest`
-2. `opea/asr:latest`
-3. `opea/llm-tgi:latest`
-4. `opea/speecht5:latest`
-5. `opea/tts:latest`
-6. `opea/audioqna:latest`
+2. `opea/speecht5:latest`
+3. `opea/audioqna:latest`

 ## 🚀 Set the environment variables

@@ -61,22 +51,24 @@ Before starting the services with `docker compose`, you have to recheck the foll
 export host_ip=<your External Public IP>    # export host_ip=$(hostname -I | awk '{print $1}')
 export HUGGINGFACEHUB_API_TOKEN=<your HF token>

-export TGI_LLM_ENDPOINT=http://$host_ip:3006
 export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-export ASR_ENDPOINT=http://$host_ip:7066
-export TTS_ENDPOINT=http://$host_ip:7055
-
 export MEGA_SERVICE_HOST_IP=${host_ip}
-export ASR_SERVICE_HOST_IP=${host_ip}
-export TTS_SERVICE_HOST_IP=${host_ip}
-export LLM_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_HOST_IP=${host_ip}

-export ASR_SERVICE_PORT=3001
-export TTS_SERVICE_PORT=3002
-export LLM_SERVICE_PORT=3007
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_PORT=3006
+
+export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
 ```

+Alternatively, source the `set_env.sh` file in this directory to set these environment variables.
+
+Note: Set `host_ip` to your external IP address; do not use `localhost`.
+
 ## 🚀 Start the MegaService

 ```bash
@@ -93,36 +85,18 @@ curl http://${host_ip}:7066/v1/asr \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

-# asr microservice
-curl http://${host_ip}:3001/v1/audio/transcriptions \
-  -X POST \
-  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-  -H 'Content-Type: application/json'
-
 # tgi service
 curl http://${host_ip}:3006/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

-# llm microservice
-curl http://${host_ip}:3007/v1/chat/completions\
-  -X POST \
-  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-  -H 'Content-Type: application/json'
-
 # speecht5 service
 curl http://${host_ip}:7055/v1/tts \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

-# tts microservice
-curl http://${host_ip}:3002/v1/audio/speech \
-  -X POST \
-  -d '{"text": "Who are you?"}' \
-  -H 'Content-Type: application/json'
-
 ```

 ## 🚀 Test MegaService
@@ -132,8 +106,9 @@ base64 string to the megaservice endpoint. The megaservice will return a spoken
 to the response, decode the base64 string and save it as a .wav file.

 ```bash
+# voice can be "default" or "male"
 curl http://${host_ip}:3008/v1/audioqna \
  -X POST \
-  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \
+  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
 ```
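
The MegaService test above sends a tiny hard-coded WAV clip encoded as base64. To try a real recording, you can build the request body from a local file. Below is a minimal bash sketch. It assumes the `audio`, `max_tokens` and `voice` fields shown in the README example and a WAV input. `make_audioqna_payload` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for POST /v1/audioqna built from a
# local audio file. Fields mirror the README example: audio, max_tokens, voice.
make_audioqna_payload() {
  local wav="$1" voice="${2:-default}" max_tokens="${3:-64}"
  local b64
  # base64 wraps long output on some platforms; strip newlines so the JSON
  # string stays on one line.
  b64=$(base64 < "${wav}" | tr -d '\n')
  printf '{"audio": "%s", "max_tokens": %d, "voice": "%s"}' "${b64}" "${max_tokens}" "${voice}"
}
```

For example, run `make_audioqna_payload question.wav male > payload.json`, then pass `-d @payload.json` to the same `curl ... | sed 's/^"//;s/"$//' | base64 -d > output.wav` pipeline. Reading the body from a file avoids command-line length limits for longer recordings.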
--- a/AudioQnA/docker_compose/intel/cpu/xeon/compose.yaml
+++ b/AudioQnA/docker_compose/intel/cpu/xeon/compose.yaml
@@ -13,14 +13,6 @@ services:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped
-  asr:
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
-    container_name: asr-service
-    ports:
-      - "3001:9099"
-    ipc: host
-    environment:
-      ASR_ENDPOINT: ${ASR_ENDPOINT}
  speecht5-service:
    image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
    container_name: speecht5-service
@@ -32,14 +24,6 @@ services:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped
-  tts:
-    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
-    container_name: tts-service
-    ports:
-      - "3002:9088"
-    ipc: host
-    environment:
-      TTS_ENDPOINT: ${TTS_ENDPOINT}
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
    container_name: tgi-service
@@ -53,29 +37,20 @@ services:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      host_ip: ${host_ip}
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://$host_ip:3006/health || exit 1"]
+      interval: 10s
+      timeout: 10s
+      retries: 100
    command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
-  llm:
-    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
-    container_name: llm-tgi-server
-    depends_on:
-      - tgi-service
-    ports:
-      - "3007:9000"
-    ipc: host
-    environment:
-      no_proxy: ${no_proxy}
-      http_proxy: ${http_proxy}
-      https_proxy: ${https_proxy}
-      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
-      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
-    restart: unless-stopped
  audioqna-xeon-backend-server:
    image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
    container_name: audioqna-xeon-backend-server
    depends_on:
-      - asr
-      - llm
-      - tts
+      - whisper-service
+      - tgi-service
+      - speecht5-service
    ports:
      - "3008:8888"
    environment:
@@ -83,12 +58,26 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
-      - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
-      - ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
-      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
-      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
-      - TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
-      - TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
+      - WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
+      - WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
+      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
+      - LLM_SERVER_PORT=${LLM_SERVER_PORT}
+      - SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
+      - SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
+    ipc: host
+    restart: always
+  audioqna-xeon-ui-server:
+    image: ${REGISTRY:-opea}/audioqna-ui:${TAG:-latest}
+    container_name: audioqna-xeon-ui-server
+    depends_on:
+      - audioqna-xeon-backend-server
+    ports:
+      - "5173:5173"
+    environment:
+      - no_proxy=${no_proxy}
+      - https_proxy=${https_proxy}
+      - http_proxy=${http_proxy}
+      - CHAT_URL=${BACKEND_SERVICE_ENDPOINT}
    ipc: host
    restart: always

--- a/AudioQnA/docker_compose/intel/cpu/xeon/set_env.sh
+++ b/AudioQnA/docker_compose/intel/cpu/xeon/set_env.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# export host_ip=<your External Public IP>
+export host_ip=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+# <token>
+
+export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_HOST_IP=${host_ip}
+
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_PORT=3006
+
+export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
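
The compose files in this change gate TGI readiness on its `/health` endpoint (`curl -f http://$host_ip:3006/health`). The Gaudi test script now relies on a fixed `sleep 20s` after `docker compose up -d`. When testing by hand, a small polling loop against the same endpoint avoids sending requests before the model has loaded. Below is a minimal bash sketch. `wait_for_http` is a hypothetical helper, and its defaults are illustrative: they mirror the compose `retries: 100` and `interval: 10s`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll URL until `curl -f` succeeds.
# Usage: wait_for_http URL [MAX_TRIES] [DELAY_SECONDS]
# Returns 0 on the first success, 1 after MAX_TRIES failed attempts.
wait_for_http() {
  local url="$1" tries="${2:-100}" delay="${3:-10}"
  local i
  for ((i = 1; i <= tries; i++)); do
    # -s: silent, -f: treat HTTP >= 400 as failure, -o: discard the body
    if curl -sf -o /dev/null "${url}"; then
      return 0
    fi
    sleep "${delay}"
  done
  echo "gave up on ${url} after ${tries} attempts" >&2
  return 1
}
```

For example: `wait_for_http "http://${host_ip}:3006/health" && echo "TGI is ready"`.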
--- a/AudioQnA/docker_compose/intel/hpu/gaudi/README.md
+++ b/AudioQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -14,27 +14,20 @@ cd GenAIComps
 ### 2. Build ASR Image

 ```bash
-docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile.intel_hpu .
-
-
-docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu .
 ```

 ### 3. Build LLM Image

-```bash
-docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
-```
+No LLM image needs to be built: the TGI service uses the Intel Gaudi-optimized image `ghcr.io/huggingface/tgi-gaudi:2.0.6` published by Hugging Face (https://github.com/huggingface/tgi-gaudi).

 ### 4. Build TTS Image

 ```bash
-docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile.intel_hpu .
-
-docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
+docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu .
 ```

-### 6. Build MegaService Docker Image
+### 5. Build MegaService Docker Image

 To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:

@@ -47,11 +40,8 @@ docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_p
 Then run the command `docker images`, you will have following images ready:

 1. `opea/whisper-gaudi:latest`
-2. `opea/asr:latest`
-3. `opea/llm-tgi:latest`
-4. `opea/speecht5-gaudi:latest`
-5. `opea/tts:latest`
-6. `opea/audioqna:latest`
+2. `opea/speecht5-gaudi:latest`
+3. `opea/audioqna:latest`

 ## 🚀 Set the environment variables

@@ -61,20 +51,18 @@ Before starting the services with `docker compose`, you have to recheck the foll
 export host_ip=<your External Public IP>    # export host_ip=$(hostname -I | awk '{print $1}')
 export HUGGINGFACEHUB_API_TOKEN=<your HF token>

-export TGI_LLM_ENDPOINT=http://$host_ip:3006
 export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-export ASR_ENDPOINT=http://$host_ip:7066
-export TTS_ENDPOINT=http://$host_ip:7055
-
 export MEGA_SERVICE_HOST_IP=${host_ip}
-export ASR_SERVICE_HOST_IP=${host_ip}
-export TTS_SERVICE_HOST_IP=${host_ip}
-export LLM_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_HOST_IP=${host_ip}

-export ASR_SERVICE_PORT=3001
-export TTS_SERVICE_PORT=3002
-export LLM_SERVICE_PORT=3007
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_PORT=3006
+
+export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
 ```

 ## 🚀 Start the MegaService
@@ -95,36 +83,18 @@ curl http://${host_ip}:7066/v1/asr \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

-# asr microservice
-curl http://${host_ip}:3001/v1/audio/transcriptions \
-  -X POST \
-  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-  -H 'Content-Type: application/json'
-
 # tgi service
 curl http://${host_ip}:3006/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

-# llm microservice
-curl http://${host_ip}:3007/v1/chat/completions\
-  -X POST \
-  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-  -H 'Content-Type: application/json'
-
 # speecht5 service
 curl http://${host_ip}:7055/v1/tts \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

-# tts microservice
-curl http://${host_ip}:3002/v1/audio/speech \
-  -X POST \
-  -d '{"text": "Who are you?"}' \
-  -H 'Content-Type: application/json'
-
 ```

 ## 🚀 Test MegaService
@@ -134,8 +104,9 @@ base64 string to the megaservice endpoint. The megaservice will return a spoken
 to the response, decode the base64 string and save it as a .wav file.

 ```bash
+# voice can be "default" or "male"
 curl http://${host_ip}:3008/v1/audioqna \
  -X POST \
-  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \
+  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
 ```
--- a/AudioQnA/docker_compose/intel/hpu/gaudi/compose.yaml
+++ b/AudioQnA/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -18,14 +18,6 @@ services:
    cap_add:
      - SYS_NICE
    restart: unless-stopped
-  asr:
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
-    container_name: asr-service
-    ports:
-      - "3001:9099"
-    ipc: host
-    environment:
-      ASR_ENDPOINT: ${ASR_ENDPOINT}
  speecht5-service:
    image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
    container_name: speecht5-service
@@ -42,16 +34,8 @@ services:
    cap_add:
      - SYS_NICE
    restart: unless-stopped
-  tts:
-    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
-    container_name: tts-service
-    ports:
-      - "3002:9088"
-    ipc: host
-    environment:
-      TTS_ENDPOINT: ${TTS_ENDPOINT}
  tgi-service:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
    container_name: tgi-gaudi-server
    ports:
      - "3006:80"
@@ -74,29 +58,19 @@ services:
    cap_add:
      - SYS_NICE
    ipc: host
+    healthcheck:
+      test: ["CMD-SHELL", "sleep 500 && exit 0"]
+      interval: 1s
+      timeout: 505s
+      retries: 1
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
-  llm:
-    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
-    container_name: llm-tgi-gaudi-server
-    depends_on:
-      - tgi-service
-    ports:
-      - "3007:9000"
-    ipc: host
-    environment:
-      no_proxy: ${no_proxy}
-      http_proxy: ${http_proxy}
-      https_proxy: ${https_proxy}
-      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
-      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
-    restart: unless-stopped
  audioqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
    container_name: audioqna-gaudi-backend-server
    depends_on:
-      - asr
-      - llm
-      - tts
+      - whisper-service
+      - tgi-service
+      - speecht5-service
    ports:
      - "3008:8888"
    environment:
@@ -104,12 +78,26 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
-      - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
-      - ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
-      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
-      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
-      - TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
-      - TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
+      - WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
+      - WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
+      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
+      - LLM_SERVER_PORT=${LLM_SERVER_PORT}
+      - SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
+      - SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
+    ipc: host
+    restart: always
+  audioqna-gaudi-ui-server:
+    image: ${REGISTRY:-opea}/audioqna-ui:${TAG:-latest}
+    container_name: audioqna-gaudi-ui-server
+    depends_on:
+      - audioqna-gaudi-backend-server
+    ports:
+      - "5173:5173"
+    environment:
+      - no_proxy=${no_proxy}
+      - https_proxy=${https_proxy}
+      - http_proxy=${http_proxy}
+      - CHAT_URL=${BACKEND_SERVICE_ENDPOINT}
    ipc: host
    restart: always

--- a/AudioQnA/docker_compose/intel/hpu/gaudi/set_env.sh
+++ b/AudioQnA/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# export host_ip=<your External Public IP>
+export host_ip=$(hostname -I | awk '{print $1}')
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+# <token>
+
+export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+export MEGA_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_HOST_IP=${host_ip}
+
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_PORT=3006
+
+export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
--- a/AudioQnA/docker_image_build/build.yaml
+++ b/AudioQnA/docker_image_build/build.yaml
@@ -11,51 +11,63 @@ services:
      context: ../
      dockerfile: ./Dockerfile
    image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
+  audioqna-ui:
+    build:
+      context: ../ui
+      dockerfile: ./docker/Dockerfile
+    extends: audioqna
+    image: ${REGISTRY:-opea}/audioqna-ui:${TAG:-latest}
+  audioqna-multilang:
+    build:
+      context: ../
+      dockerfile: ./Dockerfile.multilang
+    extends: audioqna
+    image: ${REGISTRY:-opea}/audioqna-multilang:${TAG:-latest}
  whisper-gaudi:
    build:
      context: GenAIComps
-      dockerfile: comps/asr/whisper/dependency/Dockerfile.intel_hpu
+      dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu
    extends: audioqna
    image: ${REGISTRY:-opea}/whisper-gaudi:${TAG:-latest}
  whisper:
    build:
      context: GenAIComps
-      dockerfile: comps/asr/whisper/dependency/Dockerfile
+      dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
    extends: audioqna
    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
  asr:
    build:
      context: GenAIComps
-      dockerfile: comps/asr/whisper/Dockerfile
+      dockerfile: comps/asr/src/Dockerfile
    extends: audioqna
    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
  llm-tgi:
    build:
      context: GenAIComps
-      dockerfile: comps/llms/text-generation/tgi/Dockerfile
+      dockerfile: comps/llms/src/text-generation/Dockerfile
    extends: audioqna
    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
  speecht5-gaudi:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/speecht5/dependency/Dockerfile.intel_hpu
+      dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu
    extends: audioqna
    image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
  speecht5:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/speecht5/dependency/Dockerfile
+      dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile
    extends: audioqna
    image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
  tts:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/speecht5/Dockerfile
+      dockerfile: comps/tts/src/Dockerfile
    extends: audioqna
    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
  gpt-sovits:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/gpt-sovits/Dockerfile
+      dockerfile: comps/tts/src/integrations/dependency/gpt-sovits/Dockerfile
    extends: audioqna
    image: ${REGISTRY:-opea}/gpt-sovits:${TAG:-latest}
--- a/AudioQnA/kubernetes/intel/README_gmc.md
+++ b/AudioQnA/kubernetes/intel/README_gmc.md
@@ -25,7 +25,7 @@ The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
 Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
 For Gaudi:

- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
+- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.6
 - whisper-gaudi: opea/whisper-gaudi:latest
 - speecht5-gaudi: opea/speecht5-gaudi:latest

--- a/AudioQnA/kubernetes/intel/cpu/xeon/manifest/audioqna.yaml
+++ b/AudioQnA/kubernetes/intel/cpu/xeon/manifest/audioqna.yaml
@@ -7,69 +7,17 @@ metadata:
  name: audio-qna-config
  namespace: default
 data:
-  ASR_ENDPOINT: http://whisper-svc.default.svc.cluster.local:7066
-  TTS_ENDPOINT: http://speecht5-svc.default.svc.cluster.local:7055
  LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
-  TGI_LLM_ENDPOINT: http://llm-dependency-svc.default.svc.cluster.local:3006
  MEGA_SERVICE_HOST_IP: audioqna-backend-server-svc
-  ASR_SERVICE_HOST_IP: asr-svc
-  ASR_SERVICE_PORT: "3001"
-  LLM_SERVICE_HOST_IP: llm-svc
-  LLM_SERVICE_PORT: "3007"
-  TTS_SERVICE_HOST_IP: tts-svc
-  TTS_SERVICE_PORT: "3002"

---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: asr-deploy
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: asr-deploy
-  template:
-    metadata:
-      annotations:
-        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-      labels:
-        app: asr-deploy
-    spec:
-      topologySpreadConstraints:
-      - maxSkew: 1
-        topologyKey: kubernetes.io/hostname
-        whenUnsatisfiable: ScheduleAnyway
-        labelSelector:
-          matchLabels:
-            app: asr-deploy
-      hostIPC: true
-      containers:
-      - envFrom:
-        - configMapRef:
-            name: audio-qna-config
-        image: opea/asr:latest
-        imagePullPolicy: IfNotPresent
-        name: asr-deploy
-        args: null
-        ports:
-        - containerPort: 9099
-      serviceAccountName: default
---
-kind: Service
-apiVersion: v1
-metadata:
-  name: asr-svc
-spec:
-  type: ClusterIP
-  selector:
-    app: asr-deploy
-  ports:
-  - name: service
-    port: 3001
-    targetPort: 9099
+  WHISPER_SERVER_HOST_IP: whisper-svc
+  WHISPER_SERVER_PORT: "7066"
+  SPEECHT5_SERVER_HOST_IP: speecht5-svc
+  SPEECHT5_SERVER_PORT: "7055"
+  LLM_SERVER_HOST_IP: llm-dependency-svc
+  LLM_SERVER_PORT: "3006"
+
 ---

 apiVersion: apps/v1
@@ -122,57 +70,6 @@ spec:
    port: 7066
    targetPort: 7066

---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: tts-deploy
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: tts-deploy
-  template:
-    metadata:
-      annotations:
-        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-      labels:
-        app: tts-deploy
-    spec:
-      topologySpreadConstraints:
-      - maxSkew: 1
-        topologyKey: kubernetes.io/hostname
-        whenUnsatisfiable: ScheduleAnyway
-        labelSelector:
-          matchLabels:
-            app: tts-deploy
-      hostIPC: true
-      containers:
-      - envFrom:
-        - configMapRef:
-            name: audio-qna-config
-        image: opea/tts:latest
-        imagePullPolicy: IfNotPresent
-        name: tts-deploy
-        args: null
-        ports:
-        - containerPort: 9088
-      serviceAccountName: default
---
-kind: Service
-apiVersion: v1
-metadata:
-  name: tts-svc
-spec:
-  type: ClusterIP
-  selector:
-    app: tts-deploy
-  ports:
-  - name: service
-    port: 3002
-    targetPort: 9088
-
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -291,57 +188,6 @@ spec:
    port: 3006
    targetPort: 80

---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: llm-deploy
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: llm-deploy
-  template:
-    metadata:
-      annotations:
-        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-      labels:
-        app: llm-deploy
-    spec:
-      topologySpreadConstraints:
-      - maxSkew: 1
-        topologyKey: kubernetes.io/hostname
-        whenUnsatisfiable: ScheduleAnyway
-        labelSelector:
-          matchLabels:
-            app: llm-deploy
-      hostIPC: true
-      containers:
-      - envFrom:
-        - configMapRef:
-            name: audio-qna-config
-        image: opea/llm-tgi:latest
-        imagePullPolicy: IfNotPresent
-        name: llm-deploy
-        args: null
-        ports:
-        - containerPort: 9000
-      serviceAccountName: default
---
-kind: Service
-apiVersion: v1
-metadata:
-  name: llm-svc
-spec:
-  type: ClusterIP
-  selector:
-    app: llm-deploy
-  ports:
-  - name: service
-    port: 3007
-    targetPort: 9000
-
 ---
 apiVersion: apps/v1
 kind: Deployment
--- a/AudioQnA/kubernetes/intel/hpu/gaudi/manifest/audioqna.yaml
+++ b/AudioQnA/kubernetes/intel/hpu/gaudi/manifest/audioqna.yaml
@@ -7,69 +7,17 @@ metadata:
  name: audio-qna-config
  namespace: default
 data:
-  ASR_ENDPOINT: http://whisper-svc.default.svc.cluster.local:7066
-  TTS_ENDPOINT: http://speecht5-svc.default.svc.cluster.local:7055
  LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
-  TGI_LLM_ENDPOINT: http://llm-dependency-svc.default.svc.cluster.local:3006
  MEGA_SERVICE_HOST_IP: audioqna-backend-server-svc
-  ASR_SERVICE_HOST_IP: asr-svc
-  ASR_SERVICE_PORT: "3001"
-  LLM_SERVICE_HOST_IP: llm-svc
-  LLM_SERVICE_PORT: "3007"
-  TTS_SERVICE_HOST_IP: tts-svc
-  TTS_SERVICE_PORT: "3002"

---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: asr-deploy
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: asr-deploy
-  template:
-    metadata:
-      annotations:
-        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-      labels:
-        app: asr-deploy
-    spec:
-      topologySpreadConstraints:
-      - maxSkew: 1
-        topologyKey: kubernetes.io/hostname
-        whenUnsatisfiable: ScheduleAnyway
-        labelSelector:
-          matchLabels:
-            app: asr-deploy
-      hostIPC: true
-      containers:
-      - envFrom:
-        - configMapRef:
-            name: audio-qna-config
-        image: opea/asr:latest
-        imagePullPolicy: IfNotPresent
-        name: asr-deploy
-        args: null
-        ports:
-        - containerPort: 9099
-      serviceAccountName: default
---
-kind: Service
-apiVersion: v1
-metadata:
-  name: asr-svc
-spec:
-  type: ClusterIP
-  selector:
-    app: asr-deploy
-  ports:
-  - name: service
-    port: 3001
-    targetPort: 9099
+  WHISPER_SERVER_HOST_IP: whisper-svc
+  WHISPER_SERVER_PORT: "7066"
+  SPEECHT5_SERVER_HOST_IP: speecht5-svc
+  SPEECHT5_SERVER_PORT: "7055"
+  LLM_SERVER_HOST_IP: llm-dependency-svc
+  LLM_SERVER_PORT: "3006"
+
 ---

 apiVersion: apps/v1
@@ -134,57 +82,6 @@ spec:
    port: 7066
    targetPort: 7066

---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: tts-deploy
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: tts-deploy
-  template:
-    metadata:
-      annotations:
-        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-      labels:
-        app: tts-deploy
-    spec:
-      topologySpreadConstraints:
-      - maxSkew: 1
-        topologyKey: kubernetes.io/hostname
-        whenUnsatisfiable: ScheduleAnyway
-        labelSelector:
-          matchLabels:
-            app: tts-deploy
-      hostIPC: true
-      containers:
-      - envFrom:
-        - configMapRef:
-            name: audio-qna-config
-        image: opea/tts:latest
-        imagePullPolicy: IfNotPresent
-        name: tts-deploy
-        args: null
-        ports:
-        - containerPort: 9088
-      serviceAccountName: default
---
-kind: Service
-apiVersion: v1
-metadata:
-  name: tts-svc
-spec:
-  type: ClusterIP
-  selector:
-    app: tts-deploy
-  ports:
-  - name: service
-    port: 3002
-    targetPort: 9088
-
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -271,7 +168,7 @@ spec:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
-        image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+        image: ghcr.io/huggingface/tgi-gaudi:2.0.6
        name: llm-dependency-deploy-demo
        securityContext:
          capabilities:
@@ -343,57 +240,6 @@ spec:
    port: 3006
    targetPort: 80

---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: llm-deploy
-  namespace: default
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: llm-deploy
-  template:
-    metadata:
-      annotations:
-        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
-      labels:
-        app: llm-deploy
-    spec:
-      topologySpreadConstraints:
-      - maxSkew: 1
-        topologyKey: kubernetes.io/hostname
-        whenUnsatisfiable: ScheduleAnyway
-        labelSelector:
-          matchLabels:
-            app: llm-deploy
-      hostIPC: true
-      containers:
-      - envFrom:
-        - configMapRef:
-            name: audio-qna-config
-        image: opea/llm-tgi:latest
-        imagePullPolicy: IfNotPresent
-        name: llm-deploy
-        args: null
-        ports:
-        - containerPort: 9000
-      serviceAccountName: default
---
-kind: Service
-apiVersion: v1
-metadata:
-  name: llm-svc
-spec:
-  type: ClusterIP
-  selector:
-    app: llm-deploy
-  ports:
-  - name: service
-    port: 3007
-    targetPort: 9000
-
 ---
 apiVersion: apps/v1
 kind: Deployment
--- a/AudioQnA/tests/test_compose_on_gaudi.sh
+++ b/AudioQnA/tests/test_compose_on_gaudi.sh
@@ -2,7 +2,7 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-set -e
+set -xe
 IMAGE_REPO=${IMAGE_REPO:-"opea"}
 IMAGE_TAG=${IMAGE_TAG:-"latest"}
 echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
@@ -19,71 +19,48 @@ function build_docker_images() {
    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
-    service_list="audioqna whisper-gaudi asr llm-tgi speecht5-gaudi tts"
+    service_list="audioqna audioqna-ui whisper-gaudi speecht5-gaudi"
    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
    docker images && sleep 1s
 }

 function start_services() {
    cd $WORKPATH/docker_compose/intel/hpu/gaudi
    export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
-
-    export TGI_LLM_ENDPOINT=http://$ip_address:3006
    export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-    export ASR_ENDPOINT=http://$ip_address:7066
-    export TTS_ENDPOINT=http://$ip_address:7055
-
    export MEGA_SERVICE_HOST_IP=${ip_address}
-    export ASR_SERVICE_HOST_IP=${ip_address}
-    export TTS_SERVICE_HOST_IP=${ip_address}
-    export LLM_SERVICE_HOST_IP=${ip_address}
+    export WHISPER_SERVER_HOST_IP=${ip_address}
+    export SPEECHT5_SERVER_HOST_IP=${ip_address}
+    export LLM_SERVER_HOST_IP=${ip_address}

-    export ASR_SERVICE_PORT=3001
-    export TTS_SERVICE_PORT=3002
-    export LLM_SERVICE_PORT=3007
+    export WHISPER_SERVER_PORT=7066
+    export SPEECHT5_SERVER_PORT=7055
+    export LLM_SERVER_PORT=3006

+    export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
    # sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env

    # Start Docker Containers
    docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
-    n=0
-    until [[ "$n" -ge 100 ]]; do
-       docker logs tgi-gaudi-server > $LOG_PATH/tgi_service_start.log
-       if grep -q Connected $LOG_PATH/tgi_service_start.log; then
-           break
-       fi
-       sleep 5s
-       n=$((n+1))
-    done
-
-    n=0
-    until [[ "$n" -ge 100 ]]; do
-       docker logs whisper-service > $LOG_PATH/whisper_service_start.log
-       if grep -q "Uvicorn server setup on port" $LOG_PATH/whisper_service_start.log; then
-           break
-       fi
-       sleep 5s
-       n=$((n+1))
-    done
+    sleep 20s
 }


 function validate_megaservice() {
-    result=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
-    echo "result is === $result"
-    if [[ $result == *"AAA"* ]]; then
+    response=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
+    # always save the service logs, whether or not the check passes
+    docker logs whisper-service > $LOG_PATH/whisper-service.log
+    docker logs speecht5-service > $LOG_PATH/tts-service.log
+    docker logs tgi-gaudi-server > $LOG_PATH/tgi-gaudi-server.log
+    docker logs audioqna-gaudi-backend-server > $LOG_PATH/audioqna-gaudi-backend-server.log
+    echo "$response" | sed 's/^"//;s/"$//' | base64 -d > speech.mp3
+
+    if [[ $(file speech.mp3) == *"RIFF"* ]]; then
        echo "Result correct."
    else
-        docker logs whisper-service > $LOG_PATH/whisper-service.log
-        docker logs asr-service > $LOG_PATH/asr-service.log
-        docker logs speecht5-service > $LOG_PATH/tts-service.log
-        docker logs tts-service > $LOG_PATH/tts-service.log
-        docker logs tgi-gaudi-server > $LOG_PATH/tgi-gaudi-server.log
-        docker logs llm-tgi-gaudi-server > $LOG_PATH/llm-tgi-gaudi-server.log
-
        echo "Result wrong."
        exit 1
    fi
@@ -100,7 +77,7 @@ function validate_megaservice() {
 #
 #    sed -i "s/localhost/$ip_address/g" playwright.config.ts
 #
-##    conda install -c conda-forge nodejs -y
+##    conda install -c conda-forge nodejs=22.6.0 -y
 #    npm install && npm ci && npx playwright install --with-deps
 #    node -v && npm -v && pip list
 #
@@ -126,7 +103,6 @@ function main() {
    if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
    start_services

-    # validate_microservices
    validate_megaservice
    # validate_frontend

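The `validate_megaservice` check works in three steps: it strips the surrounding quotes from the response, base64-decodes it into `speech.mp3`, and passes if `file` reports `RIFF`. The Python sketch below runs the same check without the `file` utility. It assumes, as the `sed` pipeline implies, that the backend returns a JSON string literal holding base64-encoded WAV audio. The function name `looks_like_wav` is hypothetical. The sketch also checks for the `WAVE` tag at byte offset 8, which is slightly stricter than the script's `RIFF` match.

```python
import base64
import json


def looks_like_wav(response_body: str) -> bool:
    """Return True if a JSON-string response decodes to RIFF/WAVE audio bytes.

    Assumption: the megaservice replies with a JSON string literal containing
    base64 audio, mirroring `sed 's/^"//;s/"$//' | base64 -d` in the test.
    """
    try:
        b64_audio = json.loads(response_body)  # removes the surrounding quotes
        if not isinstance(b64_audio, str):
            return False
        audio = base64.b64decode(b64_audio, validate=True)
    except ValueError:  # covers json.JSONDecodeError and binascii.Error
        return False
    # A WAV file starts with "RIFF", a 4-byte chunk size, then "WAVE".
    return len(audio) >= 12 and audio[:4] == b"RIFF" and audio[8:12] == b"WAVE"


if __name__ == "__main__":
    # The tiny WAV clip the test scripts send as input, reused as sample data.
    sample = "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"
    print(looks_like_wav(json.dumps(sample)))  # True
    print(looks_like_wav('"not base64!"'))  # False
```

Because the check reads the header bytes directly, it does not depend on the output format of the `file` command.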
--- a/AudioQnA/tests/test_compose_on_rocm.sh
+++ b/AudioQnA/tests/test_compose_on_rocm.sh
@@ -0,0 +1,116 @@
+#!/bin/bash
+# Copyright (C) 2024 Advanced Micro Devices, Inc.
+# SPDX-License-Identifier: Apache-2.0
+
+set -xe
+IMAGE_REPO=${IMAGE_REPO:-"opea"}
+IMAGE_TAG=${IMAGE_TAG:-"latest"}
+echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
+echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
+export REGISTRY=${IMAGE_REPO}
+export TAG=${IMAGE_TAG}
+
+WORKPATH=$(dirname "$PWD")
+LOG_PATH="$WORKPATH/tests"
+ip_address=$(hostname -I | awk '{print $1}')
+export PATH="$HOME/miniconda3/bin:$PATH"
+
+function build_docker_images() {
+    cd $WORKPATH/docker_image_build
+    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
+
+    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
+    service_list="audioqna audioqna-ui whisper speecht5"
+    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
+    echo "docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
+    docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
+    docker images && sleep 1s
+}
+
+function start_services() {
+    cd $WORKPATH/docker_compose/amd/gpu/rocm/
+    export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+    export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
+
+    export MEGA_SERVICE_HOST_IP=${ip_address}
+    export WHISPER_SERVER_HOST_IP=${ip_address}
+    export SPEECHT5_SERVER_HOST_IP=${ip_address}
+    export LLM_SERVER_HOST_IP=${ip_address}
+
+    export WHISPER_SERVER_PORT=7066
+    export SPEECHT5_SERVER_PORT=7055
+    export LLM_SERVER_PORT=3006
+
+    export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
+
+    # sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
+
+    # Start Docker Containers
+    docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
+    sleep 24s
+}
+function validate_megaservice() {
+    response=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
+    # always save the service logs, whether or not the check passes
+    docker logs whisper-service > $LOG_PATH/whisper-service.log
+    docker logs speecht5-service > $LOG_PATH/tts-service.log
+    docker logs tgi-service > $LOG_PATH/tgi-service.log
+    docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log
+    echo "$response" | sed 's/^"//;s/"$//' | base64 -d > speech.mp3
+
+    if [[ $(file speech.mp3) == *"RIFF"* ]]; then
+        echo "Result correct."
+    else
+        echo "Result wrong."
+        exit 1
+    fi
+
+}
+
+#function validate_frontend() {
+# Frontend tests are currently disabled
+#    cd $WORKPATH/ui/svelte
+#    local conda_env_name="OPEA_e2e"
+#    export PATH=${HOME}/miniforge3/bin/:$PATH
+##    conda remove -n ${conda_env_name} --all -y
+##    conda create -n ${conda_env_name} python=3.12 -y
+#    source activate ${conda_env_name}
+#
+#    sed -i "s/localhost/$ip_address/g" playwright.config.ts
+#
+##    conda install -c conda-forge nodejs -y
+#    npm install && npm ci && npx playwright install --with-deps
+#    node -v && npm -v && pip list
+#
+#    exit_status=0
+#    npx playwright test || exit_status=$?
+#
+#    if [ $exit_status -ne 0 ]; then
+#        echo "[TEST INFO]: ---------frontend test failed---------"
+#        exit $exit_status
+#    else
+#        echo "[TEST INFO]: ---------frontend test passed---------"
+#    fi
+#}
+
+function stop_docker() {
+    cd $WORKPATH/docker_compose/amd/gpu/rocm/
+    docker compose stop && docker compose rm -f
+}
+
+function main() {
+
+    stop_docker
+    if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
+    start_services
+
+    validate_megaservice
+    # Frontend tests are currently disabled
+    # validate_frontend
+
+    stop_docker
+    echo y | docker system prune
+
+}
+
+main
--- a/AudioQnA/tests/test_compose_on_xeon.sh
+++ b/AudioQnA/tests/test_compose_on_xeon.sh
@@ -2,7 +2,7 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

-set -e
+set -xe
 IMAGE_REPO=${IMAGE_REPO:-"opea"}
 IMAGE_TAG=${IMAGE_TAG:-"latest"}
 echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
@@ -19,61 +19,49 @@ function build_docker_images() {
    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
-    service_list="audioqna whisper asr llm-tgi speecht5 tts"
+    service_list="audioqna audioqna-ui whisper speecht5"
    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
    docker images && sleep 1s
 }

 function start_services() {
    cd $WORKPATH/docker_compose/intel/cpu/xeon/
    export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
-    export TGI_LLM_ENDPOINT=http://$ip_address:3006
    export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-    export ASR_ENDPOINT=http://$ip_address:7066
-    export TTS_ENDPOINT=http://$ip_address:7055
-
    export MEGA_SERVICE_HOST_IP=${ip_address}
-    export ASR_SERVICE_HOST_IP=${ip_address}
-    export TTS_SERVICE_HOST_IP=${ip_address}
-    export LLM_SERVICE_HOST_IP=${ip_address}
+    export WHISPER_SERVER_HOST_IP=${ip_address}
+    export SPEECHT5_SERVER_HOST_IP=${ip_address}
+    export LLM_SERVER_HOST_IP=${ip_address}

-    export ASR_SERVICE_PORT=3001
-    export TTS_SERVICE_PORT=3002
-    export LLM_SERVICE_PORT=3007
+    export WHISPER_SERVER_PORT=7066
+    export SPEECHT5_SERVER_PORT=7055
+    export LLM_SERVER_PORT=3006
+
+    export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna

    # sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env

    # Start Docker Containers
    docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
-    n=0
-    until [[ "$n" -ge 100 ]]; do
-       docker logs tgi-service > $LOG_PATH/tgi_service_start.log
-       if grep -q Connected $LOG_PATH/tgi_service_start.log; then
-           break
-       fi
-       sleep 5s
-       n=$((n+1))
-    done
+    sleep 20s
 }


 function validate_megaservice() {
-    result=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
-    echo $result
-    if [[ $result == *"AAA"* ]]; then
+    response=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
+    # always save the service logs, whether or not the check passes
+    docker logs whisper-service > $LOG_PATH/whisper-service.log
+    docker logs speecht5-service > $LOG_PATH/tts-service.log
+    docker logs tgi-service > $LOG_PATH/tgi-service.log
+    docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log
+    echo "$response" | sed 's/^"//;s/"$//' | base64 -d > speech.mp3
+
+    if [[ $(file speech.mp3) == *"RIFF"* ]]; then
        echo "Result correct."
    else
-        docker logs whisper-service > $LOG_PATH/whisper-service.log
-        docker logs asr-service > $LOG_PATH/asr-service.log
-        docker logs speecht5-service > $LOG_PATH/tts-service.log
-        docker logs tts-service > $LOG_PATH/tts-service.log
-        docker logs tgi-service > $LOG_PATH/tgi-service.log
-        docker logs llm-tgi-server > $LOG_PATH/llm-tgi-server.log
-        docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log
-
        echo "Result wrong."
        exit 1
    fi
@@ -90,7 +78,7 @@ function validate_megaservice() {
 #
 #    sed -i "s/localhost/$ip_address/g" playwright.config.ts
 #
-##    conda install -c conda-forge nodejs -y
+##    conda install -c conda-forge nodejs=22.6.0 -y
 #    npm install && npm ci && npx playwright install --with-deps
 #    node -v && npm -v && pip list
 #
--- a/AudioQnA/ui/docker/Dockerfile
+++ b/AudioQnA/ui/docker/Dockerfile
@@ -23,4 +23,4 @@ RUN npm run build
 EXPOSE 5173

 # Run the front-end application in preview mode
-CMD ["npm", "run", "preview", "--", "--port", "5173", "--host", "0.0.0.0"]
+CMD ["npm", "run", "preview", "--", "--port", "5173", "--host", "0.0.0.0"]
--- a/AudioQnA/ui/svelte/package.json
+++ b/AudioQnA/ui/svelte/package.json
@@ -1,5 +1,5 @@
 {
-  "name": "sveltekit-auth-example",
+  "name": "audio-qna",
  "version": "0.0.1",
  "private": true,
  "scripts": {
@@ -11,38 +11,38 @@
    "lint": "prettier --check . && eslint .",
    "format": "prettier --write ."
  },
+  "peerDependencies": {
+    "svelte": "^4.0.0"
+  },
  "devDependencies": {
    "@fortawesome/free-solid-svg-icons": "6.2.0",
-    "@sveltejs/adapter-auto": "1.0.0-next.75",
-    "@sveltejs/kit": "^1.30.4",
+    "@playwright/test": "^1.45.2",
+    "@sveltejs/adapter-auto": "^3.0.0",
+    "@sveltejs/kit": "^2.0.0",
+    "@sveltejs/vite-plugin-svelte": "^3.0.0",
    "@tailwindcss/typography": "0.5.7",
    "@types/debug": "4.1.7",
    "@typescript-eslint/eslint-plugin": "^5.27.0",
    "@typescript-eslint/parser": "^5.27.0",
-    "autoprefixer": "^10.4.7",
+    "autoprefixer": "^10.4.16",
    "daisyui": "^3.5.0",
    "debug": "4.3.4",
-    "eslint": "^8.16.0",
-    "eslint-config-prettier": "^8.3.0",
-    "eslint-plugin-neverthrow": "1.1.4",
-    "eslint-plugin-svelte3": "^4.0.0",
    "neverthrow": "5.0.0",
    "pocketbase": "0.7.0",
-    "postcss": "^8.4.23",
+    "postcss": "^8.4.31",
    "postcss-load-config": "^4.0.1",
    "postcss-preset-env": "^8.3.2",
    "prettier": "^2.8.8",
    "prettier-plugin-svelte": "^2.7.0",
    "prettier-plugin-tailwindcss": "^0.3.0",
-    "svelte": "^3.59.1",
-    "svelte-check": "^2.7.1",
+    "svelte": "^4.2.7",
+    "svelte-check": "^3.6.0",
    "svelte-fa": "3.0.3",
-    "svelte-preprocess": "^4.10.7",
-    "tailwindcss": "^3.1.5",
+    "tailwindcss": "^3.3.6",
    "ts-pattern": "4.0.5",
-    "tslib": "^2.3.1",
-    "typescript": "^4.7.4",
-    "vite": "^4.3.9"
+    "tslib": "^2.4.1",
+    "typescript": "^5.0.0",
+    "vite": "^5.0.11"
  },
  "type": "module",
  "dependencies": {
--- a/AudioQnA/ui/svelte/src/app.postcss
+++ b/AudioQnA/ui/svelte/src/app.postcss
@@ -79,4 +79,4 @@ a.btn {

 .w-12\/12 {
 	width: 100%
-}
+}
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/1.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/1.svg
@@ -89,4 +89,4 @@
            <stop offset="1" stop-color="#3300FF" stop-opacity="0.2" />
        </linearGradient>
    </defs>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/2.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/2.svg
@@ -89,4 +89,4 @@
            <stop offset="1" stop-color="#f3f4f6" stop-opacity="0" />
        </linearGradient>
    </defs>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/3.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/3.svg
@@ -76,4 +76,4 @@
            <stop offset="1" stop-color="#9CFFED" stop-opacity="0" />
        </linearGradient>
    </defs>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/4.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/4.svg
@@ -76,4 +76,4 @@
            <stop offset="1" stop-color="#6141E1" stop-opacity="0" />
        </linearGradient>
    </defs>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/5.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/5.svg
@@ -89,4 +89,4 @@
            <stop offset="1" stop-color="#3300FF" stop-opacity="0" />
        </linearGradient>
    </defs>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/stop-recording.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/stop-recording.svg
@@ -3,4 +3,4 @@
    <path
        d="M512 1024a512 512 0 1 1 512-512 512 512 0 0 1-512 512z m0-896a384 384 0 1 0 384 384A384 384 0 0 0 512 128z m128 576h-256a64 64 0 0 1-64-64v-256a64 64 0 0 1 64-64h256a64 64 0 0 1 64 64v256a64 64 0 0 1-64 64z"
        fill="#d81e06" p-id="3104"></path>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/upload.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/upload.svg
@@ -1 +1 @@
-<svg t="1713431562066" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="6399" width="32" height="32"><path d="M592 768h-160c-26.6 0-48-21.4-48-48V384h-175.4c-35.6 0-53.4-43-28.2-68.2L484.6 11.4c15-15 39.6-15 54.6 0l304.4 304.4c25.2 25.2 7.4 68.2-28.2 68.2H640v336c0 26.6-21.4 48-48 48z m432-16v224c0 26.6-21.4 48-48 48H48c-26.6 0-48-21.4-48-48V752c0-26.6 21.4-48 48-48h272v16c0 61.8 50.2 112 112 112h160c61.8 0 112-50.2 112-112v-16h272c26.6 0 48 21.4 48 48z m-248 176c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z m128 0c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z" p-id="6400" fill="#ffffff"></path></svg>
+<svg t="1713431562066" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="6399" width="32" height="32"><path d="M592 768h-160c-26.6 0-48-21.4-48-48V384h-175.4c-35.6 0-53.4-43-28.2-68.2L484.6 11.4c15-15 39.6-15 54.6 0l304.4 304.4c25.2 25.2 7.4 68.2-28.2 68.2H640v336c0 26.6-21.4 48-48 48z m432-16v224c0 26.6-21.4 48-48 48H48c-26.6 0-48-21.4-48-48V752c0-26.6 21.4-48 48-48h272v16c0 61.8 50.2 112 112 112h160c61.8 0 112-50.2 112-112v-16h272c26.6 0 48 21.4 48 48z m-248 176c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z m128 0c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z" p-id="6400" fill="#ffffff"></path></svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/voice.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/voice.svg
@@ -6,4 +6,4 @@
    <path
        d="M864 479.776 864 352c0-17.664-14.304-32-32-32s-32 14.336-32 32l0 127.776c0 160.16-129.184 290.464-288 290.464-158.784 0-288-130.304-288-290.464L224 352c0-17.664-14.336-32-32-32s-32 14.336-32 32l0 127.776c0 184.608 140.864 336.48 320 352.832L480 896 288 896c-17.664 0-32 14.304-32 32s14.336 32 32 32l448 0c17.696 0 32-14.304 32-32s-14.304-32-32-32l-192 0 0-63.36C723.136 816.256 864 664.384 864 479.776z"
        fill="#707070" p-id="2962"></path>
-</svg>
+</svg>
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/voiceOff.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/voiceOff.svg
--- a/AudioQnA/ui/svelte/src/lib/assets/icons/svg/voiceOn.svg
+++ b/AudioQnA/ui/svelte/src/lib/assets/icons/svg/voiceOn.svg
--- a/AvatarChatbot/.gitignore
+++ b/AvatarChatbot/.gitignore
@@ -5,4 +5,4 @@
 docker_compose/intel/cpu/xeon/data
 docker_compose/intel/hpu/gaudi/data
 inputs/
-outputs/
+outputs/
--- a/AvatarChatbot/avatarchatbot.py
+++ b/AvatarChatbot/avatarchatbot.py
@@ -5,20 +5,48 @@ import asyncio
 import os
 import sys

-from comps import AvatarChatbotGateway, MicroService, ServiceOrchestrator, ServiceType
+from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
+from comps.cores.proto.api_protocol import AudioChatCompletionRequest, ChatCompletionResponse
+from comps.cores.proto.docarray import LLMParams
+from fastapi import Request

-MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
 MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
-ASR_SERVICE_HOST_IP = os.getenv("ASR_SERVICE_HOST_IP", "0.0.0.0")
-ASR_SERVICE_PORT = int(os.getenv("ASR_SERVICE_PORT", 9099))
-LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
-LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
-TTS_SERVICE_HOST_IP = os.getenv("TTS_SERVICE_HOST_IP", "0.0.0.0")
-TTS_SERVICE_PORT = int(os.getenv("TTS_SERVICE_PORT", 9088))
+WHISPER_SERVER_HOST_IP = os.getenv("WHISPER_SERVER_HOST_IP", "0.0.0.0")
+WHISPER_SERVER_PORT = int(os.getenv("WHISPER_SERVER_PORT", 7066))
+LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
+LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 3006))
+SPEECHT5_SERVER_HOST_IP = os.getenv("SPEECHT5_SERVER_HOST_IP", "0.0.0.0")
+SPEECHT5_SERVER_PORT = int(os.getenv("SPEECHT5_SERVER_PORT", 7055))
 ANIMATION_SERVICE_HOST_IP = os.getenv("ANIMATION_SERVICE_HOST_IP", "0.0.0.0")
 ANIMATION_SERVICE_PORT = int(os.getenv("ANIMATION_SERVICE_PORT", 9066))


+def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
+    if self.services[cur_node].service_type == ServiceType.LLM:
+        # convert TGI/vLLM to unified OpenAI /v1/chat/completions format
+        next_inputs = {}
+        next_inputs["model"] = "tgi"  # placeholder model name so the request matches the unified OpenAI format
+        next_inputs["messages"] = [{"role": "user", "content": inputs["asr_result"]}]
+        next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
+        next_inputs["top_p"] = llm_parameters_dict["top_p"]
+        next_inputs["stream"] = inputs["streaming"]  # False as default
+        next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
+        # next_inputs["presence_penalty"] = inputs["presence_penalty"]
+        # next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
+        next_inputs["temperature"] = inputs["temperature"]
+        inputs = next_inputs
+    elif self.services[cur_node].service_type == ServiceType.TTS:
+        next_inputs = {}
+        next_inputs["text"] = inputs["choices"][0]["message"]["content"]
+        next_inputs["voice"] = kwargs["voice"]
+        inputs = next_inputs
+    elif self.services[cur_node].service_type == ServiceType.ANIMATION:
+        next_inputs = {}
+        next_inputs["byte_str"] = inputs["tts_result"]
+        inputs = next_inputs
+    return inputs
+
+
 def check_env_vars(env_var_list):
    for var in env_var_list:
        if os.getenv(var) is None:
@@ -31,30 +59,32 @@ class AvatarChatbotService:
    def __init__(self, host="0.0.0.0", port=8000):
        self.host = host
        self.port = port
+        ServiceOrchestrator.align_inputs = align_inputs
        self.megaservice = ServiceOrchestrator()
+        self.endpoint = str(MegaServiceEndpoint.AVATAR_CHATBOT)

    def add_remote_service(self):
        asr = MicroService(
            name="asr",
-            host=ASR_SERVICE_HOST_IP,
-            port=ASR_SERVICE_PORT,
-            endpoint="/v1/audio/transcriptions",
+            host=WHISPER_SERVER_HOST_IP,
+            port=WHISPER_SERVER_PORT,
+            endpoint="/v1/asr",
            use_remote_service=True,
            service_type=ServiceType.ASR,
        )
        llm = MicroService(
            name="llm",
-            host=LLM_SERVICE_HOST_IP,
-            port=LLM_SERVICE_PORT,
+            host=LLM_SERVER_HOST_IP,
+            port=LLM_SERVER_PORT,
            endpoint="/v1/chat/completions",
            use_remote_service=True,
            service_type=ServiceType.LLM,
        )
        tts = MicroService(
            name="tts",
-            host=TTS_SERVICE_HOST_IP,
-            port=TTS_SERVICE_PORT,
-            endpoint="/v1/audio/speech",
+            host=SPEECHT5_SERVER_HOST_IP,
+            port=SPEECHT5_SERVER_PORT,
+            endpoint="/v1/tts",
            use_remote_service=True,
            service_type=ServiceType.TTS,
        )
@@ -70,7 +100,44 @@ class AvatarChatbotService:
        self.megaservice.flow_to(asr, llm)
        self.megaservice.flow_to(llm, tts)
        self.megaservice.flow_to(tts, animation)
-        self.gateway = AvatarChatbotGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
+
+    async def handle_request(self, request: Request):
+        data = await request.json()
+
+        chat_request = AudioChatCompletionRequest.model_validate(data)
+        parameters = LLMParams(
+            # relatively lower max_tokens for audio conversation
+            max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128,
+            top_k=chat_request.top_k if chat_request.top_k else 10,
+            top_p=chat_request.top_p if chat_request.top_p else 0.95,
+            temperature=chat_request.temperature if chat_request.temperature else 0.01,
+            repetition_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 1.03,
+            streaming=False,  # TODO add streaming LLM output as input to TTS
+        )
+        # print(parameters)
+
+        result_dict, runtime_graph = await self.megaservice.schedule(
+            initial_inputs={"audio": chat_request.audio},
+            llm_parameters=parameters,
+            voice=chat_request.voice if hasattr(chat_request, "voice") else "default",
+        )
+
+        last_node = runtime_graph.all_leaves()[-1]
+        response = result_dict[last_node]["video_path"]
+        return response
+
+    def start(self):
+        self.service = MicroService(
+            self.__class__.__name__,
+            service_role=ServiceRoleType.MEGASERVICE,
+            host=self.host,
+            port=self.port,
+            endpoint=self.endpoint,
+            input_datatype=AudioChatCompletionRequest,
+            output_datatype=ChatCompletionResponse,
+        )
+        self.service.add_route(self.endpoint, self.handle_request, methods=["POST"])
+        self.service.start()


 if __name__ == "__main__":
@@ -78,16 +145,17 @@ if __name__ == "__main__":
        [
            "MEGA_SERVICE_HOST_IP",
            "MEGA_SERVICE_PORT",
-            "ASR_SERVICE_HOST_IP",
-            "ASR_SERVICE_PORT",
-            "LLM_SERVICE_HOST_IP",
-            "LLM_SERVICE_PORT",
-            "TTS_SERVICE_HOST_IP",
-            "TTS_SERVICE_PORT",
+            "WHISPER_SERVER_HOST_IP",
+            "WHISPER_SERVER_PORT",
+            "LLM_SERVER_HOST_IP",
+            "LLM_SERVER_PORT",
+            "SPEECHT5_SERVER_HOST_IP",
+            "SPEECHT5_SERVER_PORT",
            "ANIMATION_SERVICE_HOST_IP",
            "ANIMATION_SERVICE_PORT",
        ]
    )

-    avatarchatbot = AvatarChatbotService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
+    avatarchatbot = AvatarChatbotService(port=MEGA_SERVICE_PORT)
    avatarchatbot.add_remote_service()
+    avatarchatbot.start()
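The new `handle_request` and `align_inputs` code chains three payload shapes:

- The request's sampling fields are filled in with defaults.
- The ASR transcript becomes an OpenAI-style chat-completions body for TGI.
- The first choice's message becomes the TTS `text`.

The standalone sketch below reproduces those transformations with plain dictionaries, so they can be checked without starting any service. It is a simplification: it does not import `comps`, and the helper names are hypothetical. The `frequency_penalty` default of 0.0 is an assumption about `LLMParams` and does not appear in this diff.

```python
from typing import Any, Dict

# Assumption: LLMParams.frequency_penalty defaults to 0.0 (not shown in this diff).
DEFAULT_FREQUENCY_PENALTY = 0.0


def resolve_llm_params(request: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror handle_request(): any missing or falsy field falls back to its default."""
    return {
        "max_tokens": request.get("max_tokens") or 128,  # relatively short for audio replies
        "top_k": request.get("top_k") or 10,
        "top_p": request.get("top_p") or 0.95,
        "temperature": request.get("temperature") or 0.01,
        # handle_request() maps presence_penalty onto repetition_penalty
        "repetition_penalty": request.get("presence_penalty") or 1.03,
        "frequency_penalty": DEFAULT_FREQUENCY_PENALTY,
        "streaming": False,  # streaming LLM output into TTS is still a TODO upstream
    }


def asr_to_chat_completion(asr_result: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Build the OpenAI-style /v1/chat/completions body sent to the LLM server."""
    return {
        "model": "tgi",  # placeholder name, as in align_inputs()
        "messages": [{"role": "user", "content": asr_result}],
        "max_tokens": params["max_tokens"],
        "top_p": params["top_p"],
        "stream": params["streaming"],
        "frequency_penalty": params["frequency_penalty"],
        "temperature": params["temperature"],
    }


def chat_completion_to_tts(completion: Dict[str, Any], voice: str = "default") -> Dict[str, Any]:
    """Turn the first chat-completion choice into the /v1/tts request body."""
    return {"text": completion["choices"][0]["message"]["content"], "voice": voice}


if __name__ == "__main__":
    params = resolve_llm_params({"max_tokens": 64})
    body = asr_to_chat_completion("What is deep learning?", params)
    print(body["max_tokens"], body["top_p"])  # 64 0.95
    reply = {"choices": [{"message": {"content": "Deep learning is a branch of ML."}}]}
    print(chat_completion_to_tts(reply)["text"])
```

The `x or default` idiom matches the `x if x else default` fallbacks in `handle_request`. In both, an explicit `0` (for example `temperature=0`) is also replaced by the default.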
--- a/AvatarChatbot/docker_compose/intel/cpu/xeon/README.md
+++ b/AvatarChatbot/docker_compose/intel/cpu/xeon/README.md
@@ -14,32 +14,25 @@ cd GenAIComps
 ### 2. Build ASR Image

 ```bash
-docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
-
-
-docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
 ```

 ### 3. Build LLM Image

-```bash
-docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
-```
+No LLM image needs to be built: the TGI service runs the Intel Xeon-optimized image `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu` published by Hugging Face (https://github.com/huggingface/text-generation-inference).

 ### 4. Build TTS Image

 ```bash
-docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile .
-
-docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
+docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
 ```

 ### 5. Build Animation Image

 ```bash
-docker build -t opea/wav2lip:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/dependency/Dockerfile .
+docker build -t opea/wav2lip:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/integration/dependency/Dockerfile .

-docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/Dockerfile .
+docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/Dockerfile .
 ```

 ### 6. Build MegaService Docker Image
@@ -55,13 +48,10 @@ docker build --no-cache -t opea/avatarchatbot:latest --build-arg https_proxy=$ht
 Then run the command `docker images`, you will have following images ready:

 1. `opea/whisper:latest`
-2. `opea/asr:latest`
-3. `opea/llm-tgi:latest`
-4. `opea/speecht5:latest`
-5. `opea/tts:latest`
-6. `opea/wav2lip:latest`
-7. `opea/animation:latest`
-8. `opea/avatarchatbot:latest`
+2. `opea/speecht5:latest`
+3. `opea/wav2lip:latest`
+4. `opea/animation:latest`
+5. `opea/avatarchatbot:latest`

 ## 🚀 Set the environment variables

@@ -71,24 +61,21 @@ Before starting the services with `docker compose`, you have to recheck the foll
 export HUGGINGFACEHUB_API_TOKEN=<your_hf_token>
 export host_ip=$(hostname -I | awk '{print $1}')

-export TGI_LLM_ENDPOINT=http://$host_ip:3006
 export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-export ASR_ENDPOINT=http://$host_ip:7066
-export TTS_ENDPOINT=http://$host_ip:7055
 export WAV2LIP_ENDPOINT=http://$host_ip:7860

 export MEGA_SERVICE_HOST_IP=${host_ip}
-export ASR_SERVICE_HOST_IP=${host_ip}
-export TTS_SERVICE_HOST_IP=${host_ip}
-export LLM_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_PORT=3006
 export ANIMATION_SERVICE_HOST_IP=${host_ip}
+export ANIMATION_SERVICE_PORT=3008

 export MEGA_SERVICE_PORT=8888
-export ASR_SERVICE_PORT=3001
-export TTS_SERVICE_PORT=3002
-export LLM_SERVICE_PORT=3007
-export ANIMATION_SERVICE_PORT=3008
 ```

 - Xeon CPU
@@ -124,36 +111,18 @@ curl http://${host_ip}:7066/v1/asr \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

-# asr microservice
-curl http://${host_ip}:3001/v1/audio/transcriptions \
-  -X POST \
-  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-  -H 'Content-Type: application/json'
-
 # tgi service
 curl http://${host_ip}:3006/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

-# llm microservice
-curl http://${host_ip}:3007/v1/chat/completions\
-  -X POST \
-  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-  -H 'Content-Type: application/json'
-
 # speecht5 service
 curl http://${host_ip}:7055/v1/tts \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

-# tts microservice
-curl http://${host_ip}:3002/v1/audio/speech \
-  -X POST \
-  -d '{"text": "Who are you?"}' \
-  -H 'Content-Type: application/json'
-
 # wav2lip service
 cd ../../../..
 curl http://${host_ip}:7860/v1/wav2lip \
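After this change, the megaservice reaches its backends directly through `*_SERVER_HOST_IP` / `*_SERVER_PORT` pairs instead of through separate ASR, LLM and TTS wrapper microservices. The rough sketch below shows how the exported variables resolve to request URLs. It assumes the defaults and routes defined in `avatarchatbot.py` (`/v1/asr`, `/v1/chat/completions`, `/v1/tts`). `service_urls` is a hypothetical helper. The animation service is left out because its route is not shown here.

```python
import os
from typing import Dict, Mapping, Optional

# Service name -> (env var prefix, default port, route). The defaults and routes
# mirror avatarchatbot.py; the table itself is only an illustration.
REMOTE_SERVICES = {
    "asr": ("WHISPER_SERVER", 7066, "/v1/asr"),
    "llm": ("LLM_SERVER", 3006, "/v1/chat/completions"),
    "tts": ("SPEECHT5_SERVER", 7055, "/v1/tts"),
}


def service_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Combine each <PREFIX>_HOST_IP / <PREFIX>_PORT pair into a full endpoint URL."""
    env = os.environ if env is None else env
    urls = {}
    for name, (prefix, default_port, route) in REMOTE_SERVICES.items():
        host = env.get(f"{prefix}_HOST_IP", "0.0.0.0")
        port = int(env.get(f"{prefix}_PORT", default_port))
        urls[name] = f"http://{host}:{port}{route}"
    return urls


if __name__ == "__main__":
    # The ports exported in the README, with a sample host IP.
    env = {
        "WHISPER_SERVER_HOST_IP": "192.168.1.10", "WHISPER_SERVER_PORT": "7066",
        "LLM_SERVER_HOST_IP": "192.168.1.10", "LLM_SERVER_PORT": "3006",
        "SPEECHT5_SERVER_HOST_IP": "192.168.1.10", "SPEECHT5_SERVER_PORT": "7055",
    }
    for name, url in service_urls(env).items():
        print(name, url)
```

Passing an explicit mapping keeps the helper testable. Called with no argument, it reads the live environment, as the `os.getenv` calls in `avatarchatbot.py` do.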
--- a/AvatarChatbot/docker_compose/intel/cpu/xeon/compose.yaml
+++ b/AvatarChatbot/docker_compose/intel/cpu/xeon/compose.yaml
@@ -14,14 +14,6 @@ services:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped
-  asr:
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
-    container_name: asr-service
-    ports:
-      - "3001:9099"
-    ipc: host
-    environment:
-      ASR_ENDPOINT: ${ASR_ENDPOINT}
  speecht5-service:
    image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
    container_name: speecht5-service
@@ -33,14 +25,6 @@ services:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped
-  tts:
-    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
-    container_name: tts-service
-    ports:
-      - "3002:9088"
-    ipc: host
-    environment:
-      TTS_ENDPOINT: ${TTS_ENDPOINT}
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
    container_name: tgi-service
@@ -54,22 +38,13 @@ services:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      host_ip: ${host_ip}
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://$host_ip:3006/health || exit 1"]
+      interval: 10s
+      timeout: 10s
+      retries: 100
    command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
-  llm:
-    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
-    container_name: llm-tgi-server
-    depends_on:
-      - tgi-service
-    ports:
-      - "3007:9000"
-    ipc: host
-    environment:
-      no_proxy: ${no_proxy}
-      http_proxy: ${http_proxy}
-      https_proxy: ${https_proxy}
-      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
-      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
-    restart: unless-stopped
  wav2lip-service:
    image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
    container_name: wav2lip-service
@@ -110,9 +85,6 @@ services:
    image: ${REGISTRY:-opea}/avatarchatbot:${TAG:-latest}
    container_name: avatarchatbot-xeon-backend-server
    depends_on:
-      - asr
-      - llm
-      - tts
      - animation
    ports:
      - "3009:8888"
@@ -122,12 +94,12 @@ services:
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
-      - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
-      - ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
-      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
-      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
-      - TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
-      - TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
+      - WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
+      - WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
+      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
+      - LLM_SERVER_PORT=${LLM_SERVER_PORT}
+      - SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
+      - SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
      - ANIMATION_SERVICE_HOST_IP=${ANIMATION_SERVICE_HOST_IP}
      - ANIMATION_SERVICE_PORT=${ANIMATION_SERVICE_PORT}
    ipc: host
--- a/AvatarChatbot/docker_compose/intel/cpu/xeon/set_env.sh
+++ b/AvatarChatbot/docker_compose/intel/cpu/xeon/set_env.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+pushd "../../../../../" > /dev/null
+source .set_env.sh
+popd > /dev/null
--- a/AvatarChatbot/docker_compose/intel/hpu/gaudi/README.md
+++ b/AvatarChatbot/docker_compose/intel/hpu/gaudi/README.md
@@ -14,32 +14,29 @@ cd GenAIComps
 ### 2. Build ASR Image

 ```bash
-docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile.intel_hpu .
-
-
-docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu .
 ```

 ### 3. Build LLM Image

 ```bash
-docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
+docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
 ```

+The Intel Gaudi-optimized TGI image hosted in the Hugging Face repository will be used for the TGI service: ghcr.io/huggingface/tgi-gaudi:2.0.6 (https://github.com/huggingface/tgi-gaudi)
+
 ### 4. Build TTS Image

 ```bash
-docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile.intel_hpu .
-
-docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
+docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu .
 ```

 ### 5. Build Animation Image

 ```bash
-docker build -t opea/wav2lip-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/dependency/Dockerfile.intel_hpu .
+docker build -t opea/wav2lip-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/integration/dependency/Dockerfile.intel_hpu .

-docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/Dockerfile .
+docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/Dockerfile .
 ```

 ### 6. Build MegaService Docker Image
@@ -55,13 +52,10 @@ docker build --no-cache -t opea/avatarchatbot:latest --build-arg https_proxy=$ht
 Then run the command `docker images`, you will have following images ready:

 1. `opea/whisper-gaudi:latest`
-2. `opea/asr:latest`
-3. `opea/llm-tgi:latest`
-4. `opea/speecht5-gaudi:latest`
-5. `opea/tts:latest`
-6. `opea/wav2lip-gaudi:latest`
-7. `opea/animation:latest`
-8. `opea/avatarchatbot:latest`
+2. `opea/speecht5-gaudi:latest`
+3. `opea/wav2lip-gaudi:latest`
+4. `opea/animation:latest`
+5. `opea/avatarchatbot:latest`

 ## 🚀 Set the environment variables

@@ -71,24 +65,21 @@ Before starting the services with `docker compose`, you have to recheck the foll
 export HUGGINGFACEHUB_API_TOKEN=<your_hf_token>
 export host_ip=$(hostname -I | awk '{print $1}')

-export TGI_LLM_ENDPOINT=http://$host_ip:3006
 export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-export ASR_ENDPOINT=http://$host_ip:7066
-export TTS_ENDPOINT=http://$host_ip:7055
 export WAV2LIP_ENDPOINT=http://$host_ip:7860

 export MEGA_SERVICE_HOST_IP=${host_ip}
-export ASR_SERVICE_HOST_IP=${host_ip}
-export TTS_SERVICE_HOST_IP=${host_ip}
-export LLM_SERVICE_HOST_IP=${host_ip}
+export WHISPER_SERVER_HOST_IP=${host_ip}
+export WHISPER_SERVER_PORT=7066
+export SPEECHT5_SERVER_HOST_IP=${host_ip}
+export SPEECHT5_SERVER_PORT=7055
+export LLM_SERVER_HOST_IP=${host_ip}
+export LLM_SERVER_PORT=3006
 export ANIMATION_SERVICE_HOST_IP=${host_ip}
+export ANIMATION_SERVICE_PORT=3008

 export MEGA_SERVICE_PORT=8888
-export ASR_SERVICE_PORT=3001
-export TTS_SERVICE_PORT=3002
-export LLM_SERVICE_PORT=3007
-export ANIMATION_SERVICE_PORT=3008
 ```

 - Gaudi2 HPU
@@ -124,36 +115,18 @@ curl http://${host_ip}:7066/v1/asr \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

-# asr microservice
-curl http://${host_ip}:3001/v1/audio/transcriptions \
-  -X POST \
-  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-  -H 'Content-Type: application/json'
-
 # tgi service
 curl http://${host_ip}:3006/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

-# llm microservice
-curl http://${host_ip}:3007/v1/chat/completions\
-  -X POST \
-  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-  -H 'Content-Type: application/json'
-
 # speecht5 service
 curl http://${host_ip}:7055/v1/tts \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

-# tts microservice
-curl http://${host_ip}:3002/v1/audio/speech \
-  -X POST \
-  -d '{"text": "Who are you?"}' \
-  -H 'Content-Type: application/json'
-
 # wav2lip service
 cd ../../../..
 curl http://${host_ip}:7860/v1/wav2lip \
--- a/AvatarChatbot/docker_compose/intel/hpu/gaudi/compose.yaml
+++ b/AvatarChatbot/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -15,20 +15,12 @@ services:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
-      HABANA_VISIBLE_MODULES: all
+      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
    runtime: habana
    cap_add:
      - SYS_NICE
    restart: unless-stopped
-  asr:
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
-    container_name: asr-service
-    ports:
-      - "3001:9099"
-    ipc: host
-    environment:
-      ASR_ENDPOINT: ${ASR_ENDPOINT}
  speecht5-service:
    image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
    container_name: speecht5-service
@@ -39,22 +31,14 @@ services:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
-      HABANA_VISIBLE_MODULES: all
+      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
    runtime: habana
    cap_add:
      - SYS_NICE
    restart: unless-stopped
-  tts:
-    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
-    container_name: tts-service
-    ports:
-      - "3002:9088"
-    ipc: host
-    environment:
-      TTS_ENDPOINT: ${TTS_ENDPOINT}
  tgi-service:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
    container_name: tgi-gaudi-server
    ports:
      - "3006:80"
@@ -67,7 +51,7 @@ services:
      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
-      HABANA_VISIBLE_MODULES: all
+      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      ENABLE_HPU_GRAPH: true
      LIMIT_HPU_GRAPH: true
@@ -77,22 +61,12 @@ services:
    cap_add:
      - SYS_NICE
    ipc: host
-    command: --model-id ${LLM_MODEL_ID} --max-input-length 128 --max-total-tokens 256
-  llm:
-    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
-    container_name: llm-tgi-gaudi-server
-    depends_on:
-      - tgi-service
-    ports:
-      - "3007:9000"
-    ipc: host
-    environment:
-      no_proxy: ${no_proxy}
-      http_proxy: ${http_proxy}
-      https_proxy: ${https_proxy}
-      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
-      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
-    restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "sleep 500 && exit 0"]
+      interval: 1s
+      timeout: 505s
+      retries: 1
+    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
  wav2lip-service:
    image: ${REGISTRY:-opea}/wav2lip-gaudi:${TAG:-latest}
    container_name: wav2lip-service
@@ -105,7 +79,7 @@ services:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
-      HABANA_VISIBLE_MODULES: all
+      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      DEVICE: ${DEVICE}
      INFERENCE_MODE: ${INFERENCE_MODE}
@@ -132,7 +106,7 @@ services:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
-      HABANA_VISIBLE_MODULES: all
+      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      WAV2LIP_ENDPOINT: ${WAV2LIP_ENDPOINT}
    runtime: habana
@@ -143,9 +117,6 @@ services:
    image: ${REGISTRY:-opea}/avatarchatbot:${TAG:-latest}
    container_name: avatarchatbot-gaudi-backend-server
    depends_on:
-      - asr
-      - llm
-      - tts
      - animation
    ports:
      - "3009:8888"
@@ -155,12 +126,12 @@ services:
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
-      - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
-      - ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
-      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
-      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
-      - TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
-      - TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
+      - WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
+      - WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
+      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
+      - LLM_SERVER_PORT=${LLM_SERVER_PORT}
+      - SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
+      - SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
      - ANIMATION_SERVICE_HOST_IP=${ANIMATION_SERVICE_HOST_IP}
      - ANIMATION_SERVICE_PORT=${ANIMATION_SERVICE_PORT}
    ipc: host
--- a/AvatarChatbot/docker_compose/intel/hpu/gaudi/set_env.sh
+++ b/AvatarChatbot/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+pushd "../../../../../" > /dev/null
+source .set_env.sh
+popd > /dev/null
--- a/AvatarChatbot/docker_image_build/build.yaml
+++ b/AvatarChatbot/docker_image_build/build.yaml
@@ -14,60 +14,60 @@ services:
  whisper-gaudi:
    build:
      context: GenAIComps
-      dockerfile: comps/asr/whisper/dependency/Dockerfile.intel_hpu
+      dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/whisper-gaudi:${TAG:-latest}
  whisper:
    build:
      context: GenAIComps
-      dockerfile: comps/asr/whisper/dependency/Dockerfile
+      dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
  asr:
    build:
      context: GenAIComps
-      dockerfile: comps/asr/whisper/Dockerfile
+      dockerfile: comps/asr/src/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
  llm-tgi:
    build:
      context: GenAIComps
-      dockerfile: comps/llms/text-generation/tgi/Dockerfile
+      dockerfile: comps/llms/src/text-generation/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
  speecht5-gaudi:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/speecht5/dependency/Dockerfile.intel_hpu
+      dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
  speecht5:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/speecht5/dependency/Dockerfile
+      dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
  tts:
    build:
      context: GenAIComps
-      dockerfile: comps/tts/speecht5/Dockerfile
+      dockerfile: comps/tts/src/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/tts:${TAG:-latest}
  wav2lip-gaudi:
    build:
      context: GenAIComps
-      dockerfile: comps/animation/wav2lip/dependency/Dockerfile.intel_hpu
+      dockerfile: comps/animation/src/integration/dependency/Dockerfile.intel_hpu
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/wav2lip-gaudi:${TAG:-latest}
  wav2lip:
    build:
      context: GenAIComps
-      dockerfile: comps/animation/wav2lip/dependency/Dockerfile
+      dockerfile: comps/animation/src/integration/dependency/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
  animation:
    build:
      context: GenAIComps
-      dockerfile: comps/animation/wav2lip/Dockerfile
+      dockerfile: comps/animation/src/Dockerfile
    extends: avatarchatbot
    image: ${REGISTRY:-opea}/animation:${TAG:-latest}
--- a/AvatarChatbot/tests/test_compose_on_gaudi.sh
+++ b/AvatarChatbot/tests/test_compose_on_gaudi.sh
@@ -26,10 +26,10 @@ function build_docker_images() {
    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
-    service_list="avatarchatbot whisper-gaudi asr llm-tgi speecht5-gaudi tts wav2lip-gaudi animation"
+    service_list="avatarchatbot whisper-gaudi speecht5-gaudi wav2lip-gaudi animation"
    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6

    docker images && sleep 1s
 }
@@ -41,30 +41,27 @@ function start_services() {
    export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN
    export host_ip=$(hostname -I | awk '{print $1}')

-    export TGI_LLM_ENDPOINT=http://$host_ip:3006
    export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-    export ASR_ENDPOINT=http://$host_ip:7066
-    export TTS_ENDPOINT=http://$host_ip:7055
    export WAV2LIP_ENDPOINT=http://$host_ip:7860

    export MEGA_SERVICE_HOST_IP=${host_ip}
-    export ASR_SERVICE_HOST_IP=${host_ip}
-    export TTS_SERVICE_HOST_IP=${host_ip}
-    export LLM_SERVICE_HOST_IP=${host_ip}
+    export WHISPER_SERVER_HOST_IP=${host_ip}
+    export WHISPER_SERVER_PORT=7066
+    export SPEECHT5_SERVER_HOST_IP=${host_ip}
+    export SPEECHT5_SERVER_PORT=7055
+    export LLM_SERVER_HOST_IP=${host_ip}
+    export LLM_SERVER_PORT=3006
    export ANIMATION_SERVICE_HOST_IP=${host_ip}
+    export ANIMATION_SERVICE_PORT=3008

    export MEGA_SERVICE_PORT=8888
-    export ASR_SERVICE_PORT=3001
-    export TTS_SERVICE_PORT=3002
-    export LLM_SERVICE_PORT=3007
-    export ANIMATION_SERVICE_PORT=3008

    export DEVICE="hpu"
    export WAV2LIP_PORT=7860
    export INFERENCE_MODE='wav2lip+gfpgan'
    export CHECKPOINT_PATH='/usr/local/lib/python3.10/dist-packages/Wav2Lip/checkpoints/wav2lip_gan.pth'
-    export FACE="assets/img/avatar1.jpg"
+    export FACE="/home/user/comps/animation/src/assets/img/avatar1.jpg"
    # export AUDIO='assets/audio/eg3_ref.wav' # audio file path is optional, will use base64str in the post request as input if is 'None'
    export AUDIO='None'
    export FACESIZE=96
@@ -74,21 +71,10 @@ function start_services() {
    export FPS=10

    # Start Docker Containers
-    docker compose up -d
-
-    n=0
-    until [[ "$n" -ge 100 ]]; do
-       docker logs tgi-gaudi-server > $LOG_PATH/tgi_service_start.log
-       if grep -q Connected $LOG_PATH/tgi_service_start.log; then
-           break
-       fi
-       sleep 5s
-       n=$((n+1))
-    done
-
-    # sleep 5m
+    docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
+    sleep 60s
    echo "All services are up and running"
-    sleep 5s
+
 }


@@ -99,12 +85,10 @@ function validate_megaservice() {
    if [[ $result == *"mp4"* ]]; then
        echo "Result correct."
    else
+        echo "Result wrong, print docker logs."
        docker logs whisper-service > $LOG_PATH/whisper-service.log
-        docker logs asr-service > $LOG_PATH/asr-service.log
        docker logs speecht5-service > $LOG_PATH/speecht5-service.log
-        docker logs tts-service > $LOG_PATH/tts-service.log
        docker logs tgi-gaudi-server > $LOG_PATH/tgi-gaudi-server.log
-        docker logs llm-tgi-gaudi-server > $LOG_PATH/llm-tgi-gaudi-server.log
        docker logs wav2lip-service > $LOG_PATH/wav2lip-service.log
        docker logs animation-gaudi-server > $LOG_PATH/animation-gaudi-server.log

@@ -115,11 +99,6 @@ function validate_megaservice() {
 }


-#function validate_frontend() {
-
-#}
-
-
 function stop_docker() {
    cd $WORKPATH/docker_compose/intel/hpu/gaudi
    docker compose down
--- a/AvatarChatbot/tests/test_compose_on_xeon.sh
+++ b/AvatarChatbot/tests/test_compose_on_xeon.sh
@@ -26,10 +26,10 @@ function build_docker_images() {
    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
-    service_list="avatarchatbot whisper asr llm-tgi speecht5 tts wav2lip animation"
+    service_list="avatarchatbot whisper speecht5 wav2lip animation"
    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu

    docker images && sleep 1s
 }
@@ -41,30 +41,27 @@ function start_services() {
    export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN
    export host_ip=$(hostname -I | awk '{print $1}')

-    export TGI_LLM_ENDPOINT=http://$host_ip:3006
    export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

-    export ASR_ENDPOINT=http://$host_ip:7066
-    export TTS_ENDPOINT=http://$host_ip:7055
    export WAV2LIP_ENDPOINT=http://$host_ip:7860

    export MEGA_SERVICE_HOST_IP=${host_ip}
-    export ASR_SERVICE_HOST_IP=${host_ip}
-    export TTS_SERVICE_HOST_IP=${host_ip}
-    export LLM_SERVICE_HOST_IP=${host_ip}
+    export WHISPER_SERVER_HOST_IP=${host_ip}
+    export WHISPER_SERVER_PORT=7066
+    export SPEECHT5_SERVER_HOST_IP=${host_ip}
+    export SPEECHT5_SERVER_PORT=7055
+    export LLM_SERVER_HOST_IP=${host_ip}
+    export LLM_SERVER_PORT=3006
    export ANIMATION_SERVICE_HOST_IP=${host_ip}
+    export ANIMATION_SERVICE_PORT=3008

    export MEGA_SERVICE_PORT=8888
-    export ASR_SERVICE_PORT=3001
-    export TTS_SERVICE_PORT=3002
-    export LLM_SERVICE_PORT=3007
-    export ANIMATION_SERVICE_PORT=3008

    export DEVICE="cpu"
    export WAV2LIP_PORT=7860
    export INFERENCE_MODE='wav2lip+gfpgan'
    export CHECKPOINT_PATH='/usr/local/lib/python3.11/site-packages/Wav2Lip/checkpoints/wav2lip_gan.pth'
-    export FACE="assets/img/avatar5.png"
+    export FACE="/home/user/comps/animation/src/assets/img/avatar5.png"
    # export AUDIO='assets/audio/eg3_ref.wav' # audio file path is optional, will use base64str in the post request as input if is 'None'
    export AUDIO='None'
    export FACESIZE=96
@@ -75,17 +72,8 @@ function start_services() {

    # Start Docker Containers
    docker compose up -d
-    n=0
-    until [[ "$n" -ge 100 ]]; do
-       docker logs tgi-service > $LOG_PATH/tgi_service_start.log
-       if grep -q Connected $LOG_PATH/tgi_service_start.log; then
-           break
-       fi
-       sleep 5s
-       n=$((n+1))
-    done
+    sleep 20s
    echo "All services are up and running"
-    sleep 5s
 }


@@ -97,11 +85,8 @@ function validate_megaservice() {
        echo "Result correct."
    else
        docker logs whisper-service > $LOG_PATH/whisper-service.log
-        docker logs asr-service > $LOG_PATH/asr-service.log
        docker logs speecht5-service > $LOG_PATH/speecht5-service.log
-        docker logs tts-service > $LOG_PATH/tts-service.log
        docker logs tgi-service > $LOG_PATH/tgi-service.log
-        docker logs llm-tgi-server > $LOG_PATH/llm-tgi-server.log
        docker logs wav2lip-service > $LOG_PATH/wav2lip-service.log
        docker logs animation-server > $LOG_PATH/animation-server.log

--- a/ChatQnA/Dockerfile
+++ b/ChatQnA/Dockerfile
@@ -18,7 +18,7 @@ WORKDIR /home/user/
 RUN git clone https://github.com/opea-project/GenAIComps.git

 WORKDIR /home/user/GenAIComps
-RUN pip install --no-cache-dir --upgrade pip && \
+RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
    pip install --no-cache-dir langchain_core

--- a/ChatQnA/Dockerfile.guardrails
+++ b/ChatQnA/Dockerfile.guardrails
@@ -18,7 +18,7 @@ WORKDIR /home/user/
 RUN git clone https://github.com/opea-project/GenAIComps.git

 WORKDIR /home/user/GenAIComps
-RUN pip install --no-cache-dir --upgrade pip && \
+RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
    pip install --no-cache-dir langchain_core

--- a/ChatQnA/Dockerfile.without_rerank
+++ b/ChatQnA/Dockerfile.without_rerank
@@ -18,7 +18,7 @@ WORKDIR /home/user/
 RUN git clone https://github.com/opea-project/GenAIComps.git

 WORKDIR /home/user/GenAIComps
-RUN pip install --no-cache-dir --upgrade pip && \
+RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
    pip install --no-cache-dir langchain_core

--- a/ChatQnA/Dockerfile.wrapper
+++ b/ChatQnA/Dockerfile.wrapper
@@ -0,0 +1,32 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+FROM python:3.11-slim
+
+RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
+    libgl1-mesa-glx \
+    libjemalloc-dev \
+    git
+
+RUN useradd -m -s /bin/bash user && \
+    mkdir -p /home/user && \
+    chown -R user /home/user/
+
+WORKDIR /home/user/
+RUN git clone https://github.com/opea-project/GenAIComps.git
+
+WORKDIR /home/user/GenAIComps
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
+
+COPY ./chatqna_wrapper.py /home/user/chatqna.py
+
+ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps
+
+USER user
+
+WORKDIR /home/user
+
+RUN echo 'ulimit -S -n 999999' >> ~/.bashrc
+
+ENTRYPOINT ["python", "chatqna.py"]
--- a/ChatQnA/README.md
+++ b/ChatQnA/README.md
@@ -4,7 +4,26 @@ Chatbots are the most widely adopted use case for leveraging the powerful chat a

 RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that responses generated remain factual and current. The core of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.

-## Deploy ChatQnA Service
+## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
+
+| Cloud Provider       | Intel Architecture                | Intel Optimized Cloud Module for Terraform                                                                                         | Comments                                                             |
+| -------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
+| AWS                  | 4th Gen Intel Xeon with Intel AMX | [AWS Module](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna)                          | Uses Intel/neural-chat-7b-v3-3 by default                            |
+| AWS Falcon2-11B      | 4th Gen Intel Xeon with Intel AMX | [AWS Module with Falcon11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) | Uses TII Falcon2-11B LLM Model                                       |
+| GCP                  | 5th Gen Intel Xeon with Intel AMX | [GCP Module](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna)                          | Also supports Confidential AI by using Intel® TDX with 4th Gen Xeon |
+| Azure                | 5th Gen Intel Xeon with Intel AMX | Work-in-progress                                                                                                                   | Work-in-progress                                                     |
+| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress                                                                                                                   | Work-in-progress                                                     |
+
+## Automated Deployment to an Ubuntu-based System (if not using Terraform) using Intel® Optimized Cloud Modules for **Ansible**
+
+To deploy to an existing Xeon Ubuntu-based system, use the Intel Optimized Cloud Modules for Ansible. This is the same Ansible playbook that the Terraform modules use.
+Use this option if you are not using Terraform and have provisioned your system manually or with another tool, including bare-metal systems.
+| Operating System | Intel Optimized Cloud Module for Ansible                                                                          |
+| ---------------- | ----------------------------------------------------------------------------------------------------------------- |
+| Ubuntu 20.04     | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) |
+| Ubuntu 22.04     | Work-in-progress                                                                                                  |
+
+## Manually Deploy ChatQnA Service

 The ChatQnA service can be effortlessly deployed on Intel Gaudi2, Intel Xeon Scalable Processors and Nvidia GPU.

@@ -41,15 +60,18 @@ To set up environment variables for deploying ChatQnA services, follow these ste

 3. Set up other environment variables:

-   > Notice that you can only choose **one** command below to set up envs according to your hardware. Other that the port numbers may be set incorrectly.
+   > Notice that you can only choose **one** hardware option below to set up envs according to your hardware. Make sure port numbers are set correctly as well.

   ```bash
   # on Gaudi
   source ./docker_compose/intel/hpu/gaudi/set_env.sh
+   export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails
   # on Xeon
   source ./docker_compose/intel/cpu/xeon/set_env.sh
+   export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service
   # on Nvidia GPU
   source ./docker_compose/nvidia/gpu/set_env.sh
+   export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service
   ```

 ### Quick Start: 2.Run Docker Compose
@@ -177,7 +199,7 @@ In the below, we provide a table that describes for each microservice component
 Gaudi default compose.yaml
 | MicroService | Open Source Project | HW | Port | Endpoint |
 | ------------ | ------------------- | ----- | ---- | -------------------- |
-| Embedding | Langchain | Xeon | 6000 | /v1/embaddings |
+| Embedding | Langchain | Xeon | 6000 | /v1/embeddings |
 | Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval |
 | Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking |
 | LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions |
--- a/ChatQnA/benchmark/accuracy/README.md
+++ b/ChatQnA/benchmark/accuracy/README.md
@@ -48,7 +48,7 @@ To setup a LLM model, we can use [tgi-gaudi](https://github.com/huggingface/tgi-
 docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2

 # for better performance, set `PREFILL_BATCH_BUCKET_SIZE`, `BATCH_BUCKET_SIZE`, `max-batch-total-tokens`, `max-batch-prefill-tokens`
-docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
+docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
 ```

 ### Prepare Dataset
--- a/ChatQnA/benchmark/performance/README.md
+++ b/ChatQnA/benchmark/performance/README.md
@@ -1,378 +0,0 @@
-# ChatQnA Benchmarking
-
-This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service on single-node and multi-node clusters. It uses a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that measures throughput to assess inference performance.
-
-By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
-
-## Purpose
-
-We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
-
-- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
-- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
-- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, and so on.
-
-## Metrics
-
-The benchmark reports the following metrics:
-
-- Number of Concurrent Requests
-- End-to-End Latency: P50, P90, P99 (in milliseconds)
-- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
-- Average Next Token Latency (in milliseconds)
-- Average Token Latency (in milliseconds)
-- Requests Per Second (RPS)
-- Output Tokens Per Second
-- Input Tokens Per Second
-
-Results are displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
-
-## Getting Started
-
-We recommend using Kubernetes to deploy the ChatQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs. Below is a description of Kubernetes deployment and benchmarking. For instructions on deploying and benchmarking with Docker, please refer to [this section](#benchmark-with-docker).
-
-### Prerequisites
-
-- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
-
-- Ensure every node has direct internet access.
-- Set up kubectl on the master node with access to the Kubernetes cluster.
-- Install Python 3.8+ on the master node for running the stress tool.
-- Ensure all nodes have a local `/mnt/models` folder, which will be mounted by the pods.
-- Ensure that the container's ulimit is high enough for the number of concurrent requests.
-
-```bash
-# Raise the open-file limit (ulimit) for containerd:
-sudo systemctl edit containerd
-# In the editor that opens, add the two lines below (without the leading '# '):
-# [Service]
-# LimitNOFILE=65536:1048576
-
-sudo systemctl daemon-reload; sudo systemctl restart containerd
-```
-
-### Kubernetes Cluster Example
-
-```bash
-$ kubectl get nodes
-NAME                STATUS   ROLES           AGE   VERSION
-k8s-master          Ready    control-plane   35d   v1.29.6
-k8s-worker1         Ready    <none>          35d   v1.29.5
-k8s-worker2         Ready    <none>          35d   v1.29.6
-k8s-worker3         Ready    <none>          35d   v1.29.6
-```
-
-### Manifest preparation
-
-We provide [BKC (best known configuration) manifests](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-node, two-node, and four-node K8s clusters. Before applying them, clone the repository and configure a few values.
-
-```bash
-# on k8s-master node
-git clone https://github.com/opea-project/GenAIExamples.git
-cd GenAIExamples/ChatQnA/benchmark/performance
-
-# replace the image tag `latest` with v0.9 to test the v0.9 release
-IMAGE_TAG=v0.9
-find . -name '*.yaml' -type f -exec sed -i "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g" {} \;
-
-# set the Hugging Face token
-HUGGINGFACE_TOKEN="<your token>"
-find . -name '*.yaml' -type f -exec sed -i "s#\${HF_TOKEN}#${HUGGINGFACE_TOKEN}#g" {} \;
-
-# set models
-LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
-EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
-RERANK_MODEL_ID=BAAI/bge-reranker-base
-find . -name '*.yaml' -type f -exec sed -i "s#\$(LLM_MODEL_ID)#${LLM_MODEL_ID}#g" {} \;
-find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;
-find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
-```
-
-### Test Configurations
-
-By default, the workload and benchmark configuration is as follows:
-
-| Key      | Value   |
-| -------- | ------- |
-| Workload | ChatQnA |
-| Tag      | v0.9    |
-
-Model configuration:
-| Key       | Value                     |
-| --------- | ------------------------- |
-| Embedding | BAAI/bge-base-en-v1.5     |
-| Reranking | BAAI/bge-reranker-base    |
-| Inference | Intel/neural-chat-7b-v3-3 |
-
-Benchmark parameters:
-| Key               | Value |
-| ----------------- | ----- |
-| LLM input tokens  | 1024  |
-| LLM output tokens | 128   |
-
-Number of test requests for each node count:
-| Node count | Concurrency | Query number |
-| ---------- | ----------- | ------------ |
-| 1          | 128         | 640          |
-| 2          | 256         | 1280         |
-| 4          | 512         | 2560         |
-
-More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).
-
-### Test Steps
-
-#### Single node test
-
-##### 1. Preparation
-
-Label one Kubernetes node so that all pods are scheduled onto it:
-
-```bash
-kubectl label nodes k8s-worker1 node-type=chatqna-opea
-```
-
-##### 2. Install ChatQnA
-
-Go to the [BKC manifest](./tuned/with_rerank/single_gaudi) directory and apply the manifests to K8s.
-
-```bash
-# on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
-kubectl apply -f .
-```
-
-##### 3. Run tests
-
-###### 3.1 Upload Retrieval File
-
-Before running tests, upload the specified file so that the LLM input is 1k tokens long.
-
-Run the following command to find the cluster IP of the dataprep service.
-
-```bash
-kubectl get svc
-```
-
-Replace `${cluster_ip}` with the actual cluster IP of the dataprep microservice, found in a line like the following:
-
-```log
-dataprep-svc   ClusterIP   xx.xx.xx.xx    <none>   6007/TCP   5m   app=dataprep-deploy
-```
-
-Run the cURL command that matches your pipeline to upload the file:
-
-```bash
-cd GenAIEval/evals/benchmark/data  # assumes a local clone of https://github.com/opea-project/GenAIEval
-# RAG with Rerank
-curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-     -H "Content-Type: multipart/form-data" \
-     -F "files=@./upload_file.txt" \
-     -F "chunk_size=3800"
-# RAG without Rerank
-curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-     -H "Content-Type: multipart/form-data" \
-     -F "files=@./upload_file_no_rerank.txt"
-```
-
-###### 3.2 Run Benchmark Test
-
-Before running the benchmark, configure the number of test queries and the test output directory:
-
-```bash
-export USER_QUERIES="[640, 640, 640, 640]"
-export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
-```
-
-Then run the benchmark:
-
-```bash
-bash benchmark.sh -n 1
-```
-
-The `-n` argument sets the number of test nodes. The necessary dependencies are installed automatically the first time you run the benchmark.
-
-##### 4. Data collection
-
-All test results are written to `/home/sdp/benchmark_output/node_1`, the folder set by the `TEST_OUTPUT_DIR` environment variable in the previous step.
-
-##### 5. Clean up
-
-```bash
-# on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
-kubectl delete -f .
-kubectl label nodes k8s-worker1 node-type-
-```
-
-#### Two node test
-
-##### 1. Preparation
-
-Label two Kubernetes nodes so that all pods are scheduled onto them:
-
-```bash
-kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
-```
-
-##### 2. Install ChatQnA
-
-Go to the [BKC manifest](./tuned/with_rerank/two_gaudi) directory and apply the manifests to K8s.
-
-```bash
-# on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/two_gaudi
-kubectl apply -f .
-```
-
-##### 3. Run tests
-
-Before running the benchmark, configure the number of test queries and the test output directory:
-
-```bash
-export USER_QUERIES="[1280, 1280, 1280, 1280]"
-export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
-```
-
-Then run the benchmark:
-
-```bash
-bash benchmark.sh -n 2
-```
-
-The `-n` argument sets the number of test nodes. The necessary dependencies are installed automatically the first time you run the benchmark.
-
-##### 4. Data collection
-
-All test results are written to `/home/sdp/benchmark_output/node_2`, the folder set by the `TEST_OUTPUT_DIR` environment variable in the previous step.
-
-##### 5. Clean up
-
-```bash
-# on k8s-master node, in GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/two_gaudi
-kubectl delete -f .
-kubectl label nodes k8s-worker1 k8s-worker2 node-type-
-```
-
-#### Four node test
-
-##### 1. Preparation
-
-Label four Kubernetes nodes so that all pods are scheduled onto them:
-
-```bash
-kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
-```
-
-##### 2. Install ChatQnA
-
-Go to the [BKC manifest](./tuned/with_rerank/four_gaudi) directory and apply the manifests to K8s.
-
-```bash
-# on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/four_gaudi
-kubectl apply -f .
-```
-
-##### 3. Run tests
-
-Before running the benchmark, configure the number of test queries and the test output directory:
-
-```bash
-export USER_QUERIES="[2560, 2560, 2560, 2560]"
-export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
-```
-
-Then run the benchmark:
-
-```bash
-bash benchmark.sh -n 4
-```
-
-The `-n` argument sets the number of test nodes. The necessary dependencies are installed automatically the first time you run the benchmark.
-
-##### 4. Data collection
-
-All test results are written to `/home/sdp/benchmark_output/node_4`, the folder set by the `TEST_OUTPUT_DIR` environment variable in the previous step.
-
-##### 5. Clean up
-
-```bash
-# on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/four_gaudi
-kubectl delete -f .
-kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
-```
-
-## Benchmark with Docker
-
-### Deploy ChatQnA service with Docker
-
-To set up the environment correctly, configure the essential environment variables and, if applicable, the proxy-related variables.
-
-```bash
-# Example: host_ip="192.168.1.1"
-export host_ip="External_Public_IP"
-# Example: no_proxy="localhost,127.0.0.1,192.168.1.1"
-export no_proxy="Your_No_Proxy"
-export http_proxy="Your_HTTP_Proxy"
-export https_proxy="Your_HTTPS_Proxy"
-export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
-```
-
-#### Deploy ChatQnA on Gaudi
-
-```bash
-cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
-docker compose up -d
-```
-
-Refer to the [Gaudi Guide](../../docker_compose/intel/hpu/gaudi/README.md) for more instructions on building Docker images from source.
-
-#### Deploy ChatQnA on Xeon
-
-```bash
-cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
-docker compose up -d
-```
-
-Refer to the [Xeon Guide](../../docker_compose/intel/cpu/xeon/README.md) for more instructions on building Docker images from source.
-
-#### Deploy ChatQnA on NVIDIA GPU
-
-```bash
-cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/
-docker compose up -d
-```
-
-Refer to the [NVIDIA GPU Guide](../../docker_compose/nvidia/gpu/README.md) for more instructions on building Docker images from source.
-
-### Run tests
-
-Before running the benchmark, configure the number of test queries and the test output directory:
-
-```bash
-export USER_QUERIES="[640, 640, 640, 640]"
-export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/docker"
-```
-
-Then run the benchmark:
-
-```bash
-bash benchmark.sh -d docker -i <service-ip> -p <service-port>
-```
-
-The `-i` and `-p` arguments specify the IP address and port of the deployed ChatQnA service. The necessary dependencies are installed automatically the first time you run the benchmark.
-
-### Data collection
-
-All test results are written to `/home/sdp/benchmark_output/docker`, the folder set by the `TEST_OUTPUT_DIR` environment variable in the previous step.
-
-### Clean up
-
-Taking Gaudi as an example, use the following commands to clean up the system:
-
-```bash
-cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
-docker compose stop && docker compose rm -f
-docker system prune -f
-```
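The manifest-preparation step in the performance README above rewrites every `image: opea/<name>:latest` reference in place with `find ... -exec sed -i`. Before mutating the manifests, the same substitution can be previewed on sample lines. This is a minimal sketch, assuming GNU sed. The helper name `preview_tag_rewrite` and the sample image lines are hypothetical; the sed expression is the README's own, without `-i`.

```bash
#!/usr/bin/env bash
# Preview the README's image-tag substitution without editing any file.
# preview_tag_rewrite is a hypothetical helper; it reads YAML lines on stdin.
IMAGE_TAG=v0.9

preview_tag_rewrite() {
  # Same expression as the manifest-preparation step, minus `-i`.
  sed "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g"
}

# OPEA images tagged `latest` are rewritten; other images are left untouched.
rewritten=$(printf '        image: opea/chatqna:latest\n' | preview_tag_rewrite)
untouched=$(printf '        image: ghcr.io/huggingface/tgi-gaudi:2.0.6\n' | preview_tag_rewrite)
echo "$rewritten"   # prints "        image: opea/chatqna:v0.9"
echo "$untouched"   # prints "        image: ghcr.io/huggingface/tgi-gaudi:2.0.6"
```

After running the real `find ... -exec sed -i` command, `grep -rn 'image: opea/.*:latest' --include='*.yaml' .` should print nothing if every OPEA image was retagged.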
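Step 3.1 of the performance README has the reader copy the dataprep cluster IP from `kubectl get svc` output by hand. A small sketch that extracts it instead is shown below. It assumes the default column order (NAME, TYPE, CLUSTER-IP, ...) and the `dataprep-svc` name from the README's sample output. The function `get_cluster_ip` is hypothetical, and the IP `10.96.12.34` is a made-up stand-in for the README's `xx.xx.xx.xx`.

```bash
#!/usr/bin/env bash
# Print the CLUSTER-IP column (3rd field) for the named service,
# reading `kubectl get svc`-style lines on stdin.
get_cluster_ip() {
  local svc_name="$1"
  awk -v name="$svc_name" '$1 == name { print $3 }'
}

# Demo on the README's sample line (placeholder IP) instead of a live cluster.
sample='dataprep-svc   ClusterIP   10.96.12.34    <none>   6007/TCP   5m   app=dataprep-deploy'
cluster_ip=$(printf '%s\n' "$sample" | get_cluster_ip dataprep-svc)
echo "$cluster_ip"   # prints 10.96.12.34
```

On a live cluster the equivalent would be `cluster_ip=$(kubectl get svc | get_cluster_ip dataprep-svc)`. Without any text parsing, `kubectl get svc dataprep-svc -o jsonpath='{.spec.clusterIP}'` reads the field directly from the Service object, which is less sensitive to column layout.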
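The one-, two-, and four-node sections of the performance README differ only in the node count, `USER_QUERIES`, and `TEST_OUTPUT_DIR`. The Test Configurations table also follows a fixed ratio: concurrency is 128 per node, and the query number is five times the concurrency (640/128 = 1280/256 = 2560/512 = 5). The hypothetical helper `set_benchmark_env` below derives both variables from the node count. It only reproduces the README's values for 1, 2, and 4 nodes. It computes the concurrency solely to obtain the query count and does not pass it to `benchmark.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive per-run settings from the node count using the
# ratios in the README's Test Configurations table:
#   concurrency = 128 x nodes, query number = 5 x concurrency.
set_benchmark_env() {
  local nodes="$1"
  case "$nodes" in
    1 | 2 | 4) ;;  # the README only documents 1-, 2-, and 4-node runs
    *) echo "unsupported node count: $nodes" >&2; return 1 ;;
  esac
  local concurrency=$((128 * nodes))
  local queries=$((5 * concurrency))
  # The README repeats the same query number four times in USER_QUERIES.
  export USER_QUERIES="[${queries}, ${queries}, ${queries}, ${queries}]"
  export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_${nodes}"
}

set_benchmark_env 2
echo "$USER_QUERIES"     # prints [1280, 1280, 1280, 1280]
echo "$TEST_OUTPUT_DIR"  # prints /home/sdp/benchmark_output/node_2
# next step per the README: bash benchmark.sh -n 2
```

Keeping the node count in one variable avoids mismatches between `-n`, the query list, and the output folder when switching between runs.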
--- a/ChatQnA/benchmark/performance/helm_charts/.helmignore
+++ b/ChatQnA/benchmark/performance/helm_charts/.helmignore
@@ -1,23 +0,0 @@
-# Patterns to ignore when building packages.
-# This supports shell glob matching, relative path matching, and
-# negation (prefixed with !). Only one pattern per line.
-.DS_Store
-# Common VCS dirs
-.git/
-.gitignore
-.bzr/
-.bzrignore
-.hg/
-.hgignore
-.svn/
-# Common backup files
-*.swp
-*.bak
-*.tmp
-*.orig
-*~
-# Various IDEs
-.project
-.idea/
-*.tmproj
-.vscode/
--- a/ChatQnA/benchmark/performance/helm_charts/Chart.yaml
+++ b/ChatQnA/benchmark/performance/helm_charts/Chart.yaml
@@ -1,27 +0,0 @@
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-apiVersion: v2
-name: chatqna-charts
-description: A Helm chart for Kubernetes
-
-# A chart can be either an 'application' or a 'library' chart.
-#
-# Application charts are a collection of templates that can be packaged into versioned archives
-# to be deployed.
-#
-# Library charts provide useful utilities or functions for the chart developer. They're included as
-# a dependency of application charts to inject those utilities and functions into the rendering
-# pipeline. Library charts do not define any templates and therefore cannot be deployed.
-type: application
-
-# This is the chart version. This version number should be incremented each time you make changes
-# to the chart and its templates, including the app version.
-# Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 1.0
-
-# This is the version number of the application being deployed. This version number should be
-# incremented each time you make changes to the application. Versions are not expected to
-# follow Semantic Versioning. They should reflect the version the application is using.
-# It is recommended to use it with quotes.
-appVersion: "1.16.0"
--- a/ChatQnA/benchmark/performance/helm_charts/README.md
+++ b/ChatQnA/benchmark/performance/helm_charts/README.md
@@ -1,36 +0,0 @@
-# ChatQnA Deployment
-
-This document guides you through deploying ChatQnA pipelines using Helm charts. Helm charts simplify managing Kubernetes applications by packaging configuration and resources.
-
-## Getting Started
-
-### Preparation
-
-```bash
-# on k8s-master node
-cd GenAIExamples/ChatQnA/benchmark/performance/helm_charts
-
-# Set HUGGINGFACEHUB_API_TOKEN in customize.yaml to your actual Hugging Face token:
-# vim customize.yaml
-# HUGGINGFACEHUB_API_TOKEN: hf_xxxxx
-```
-
-### Deploy your ChatQnA
-
-```bash
-# Deploy a ChatQnA pipeline using the specified YAML configuration.
-# To deploy with different configurations, simply provide a different YAML file.
-helm install chatqna . -f customize.yaml
-```
-
-Note: The provided [BKC manifests](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-, two-, and four-node Kubernetes clusters were generated using this tool.
-
-## Customize Your Own ChatQnA Pipelines (Optional)
-
-You can specify two YAML configuration files:
-
-- `customize.yaml`
-  This file specifies the image names, the number of replicas, and the CPU cores for your pods.
-
-- `values.yaml`
-  This file contains the default microservice configurations for ChatQnA. Review and understand each parameter before making any changes.
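The Helm README above has the reader edit `customize.yaml` in vim to set `HUGGINGFACEHUB_API_TOKEN`. For scripted setups, a minimal non-interactive sketch with GNU `sed -i` follows. It assumes the key sits at the start of a line as `HUGGINGFACEHUB_API_TOKEN: <value>`, as the README shows. It also assumes the token contains no `#`, `&`, or `\`, which are special in this sed expression. The helper `set_hf_token` and the demo token are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: set the HUGGINGFACEHUB_API_TOKEN value in a YAML file in place.
# Assumes GNU sed (`-i` without a backup suffix) and a top-level key.
set_hf_token() {
  local file="$1" token="$2"
  # '#' is the sed delimiter, so the token must not contain it.
  sed -i "s#^HUGGINGFACEHUB_API_TOKEN:.*#HUGGINGFACEHUB_API_TOKEN: ${token}#" "$file"
}

# Demo on a temporary file instead of the real customize.yaml.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf 'HUGGINGFACEHUB_API_TOKEN: hf_xxxxx\n' > "$demo"
set_hf_token "$demo" "hf_example_token"
cat "$demo"   # prints HUGGINGFACEHUB_API_TOKEN: hf_example_token
```

In the `helm_charts` directory, `set_hf_token customize.yaml "$HF_TOKEN"` (with `HF_TOKEN` holding your token) would then replace the manual vim step before `helm install`.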