Set no wrapper ChatQnA as default (#891)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
lvliang-intel
2024-10-11 13:30:45 +08:00
committed by GitHub
parent b71a12d424
commit 619d941047
66 changed files with 649 additions and 4796 deletions


@@ -70,73 +70,19 @@ curl http://${host_ip}:8888/v1/chatqna \
First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
### 1. Build Embedding Image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile .
```
### 2. Build Retriever Image
### 1. Build Retriever Image
```bash
docker build --no-cache -t opea/retriever-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile .
```
### 3. Build Rerank Image
> Skip for ChatQnA without Rerank pipeline
```bash
docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile .
```
### 4. Build LLM Image
You can use different LLM serving solutions; choose one of the following options.
#### 4.1 Use TGI
```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```
#### 4.2 Use VLLM
Build the vLLM Docker image.
```bash
docker build --no-cache -t opea/llm-vllm-hpu:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/langchain/dependency/Dockerfile.intel_hpu .
```
Build the microservice Docker image.
```bash
docker build --no-cache -t opea/llm-vllm:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/langchain/Dockerfile .
```
#### 4.3 Use VLLM-on-Ray
Build the vLLM-on-Ray Docker image.
```bash
docker build --no-cache -t opea/llm-vllm-ray-hpu:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/ray/dependency/Dockerfile .
```
Build the microservice Docker image.
```bash
docker build --no-cache -t opea/llm-vllm-ray:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/ray/Dockerfile .
```
### 5. Build Dataprep Image
### 2. Build Dataprep Image
```bash
docker build --no-cache -t opea/dataprep-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile .
```
### 6. Build Guardrails Docker Image (Optional)
### 3. Build Guardrails Docker Image (Optional)
To fortify AI initiatives in production, the Guardrails microservice can secure model inputs and outputs, helping you build trustworthy, safe, and secure LLM-based applications.
@@ -144,7 +90,7 @@ To fortify AI initiatives in production, Guardrails microservice can secure mode
docker build -t opea/guardrails-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/llama_guard/langchain/Dockerfile .
```
### 7. Build MegaService Docker Image
### 4. Build MegaService Docker Image
1. MegaService with Rerank
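The default MegaService image is built from the ChatQnA example directory, roughly as follows (a sketch under the assumption that the Dockerfile sits at the ChatQnA example root, as in the GenAIExamples layout):

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA
# Build the default ChatQnA megaservice image (with rerank)
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```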
@@ -176,7 +122,7 @@ docker build -t opea/guardrails-tgi:latest --build-arg https_proxy=$https_proxy
docker build --no-cache -t opea/chatqna-without-rerank:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.without_rerank .
```
### 8. Build UI Docker Image
### 5. Build UI Docker Image
Construct the frontend Docker image using the command below:
@@ -185,7 +131,7 @@ cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 9. Build Conversational React UI Docker Image (Optional)
### 6. Build Conversational React UI Docker Image (Optional)
Build the frontend Docker image that enables a conversational experience with the ChatQnA megaservice using the command below:
@@ -196,21 +142,18 @@ cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
### 10. Build Nginx Docker Image
### 7. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile .
```
Then run the command `docker images`; you will have the following 8 Docker images:
Then run the command `docker images`; you will have the following 5 Docker images:
- `opea/embedding-tei:latest`
- `opea/retriever-redis:latest`
- `opea/reranking-tei:latest`
- `opea/llm-tgi:latest` or `opea/llm-vllm:latest` or `opea/llm-vllm-ray:latest`
- `opea/dataprep-redis:latest`
- `opea/chatqna:latest` or `opea/chatqna-guardrails:latest` or `opea/chatqna-without-rerank:latest`
- `opea/chatqna:latest`
- `opea/chatqna-ui:latest`
- `opea/nginx:latest`
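A quick way to confirm the images were built is to filter the `docker images` output by the `opea` repository prefix:

```bash
docker images | grep opea
```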
@@ -338,16 +281,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
-H 'Content-Type: application/json'
```
2. Embedding Microservice
```bash
curl http://${host_ip}:6000/v1/embeddings \
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
```
3. Retriever Microservice
2. Retriever Microservice
To consume the retriever microservice, you need to generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model.
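For example, a query against the retriever can be assembled like this (a minimal sketch; the 768-dimension mock vector assumes a BAAI/bge-base class embedding model, and port 7000 is the retriever port used by the compose files in this change):

```bash
# Generate a mock 768-dimension embedding (assumes a bge-base class model)
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'
```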
@@ -363,7 +297,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
-H 'Content-Type: application/json'
```
4. TEI Reranking Service
3. TEI Reranking Service
> Skip for ChatQnA without Rerank pipeline
@@ -374,18 +308,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
-H 'Content-Type: application/json'
```
5. Reranking Microservice
> Skip for ChatQnA without Rerank pipeline
```bash
curl http://${host_ip}:8000/v1/reranking \
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
```
6. LLM backend Service
4. LLM backend Service
On the first startup, this service takes more time to download the model files. After the download is finished, the service will be ready.
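One way to check readiness is to watch the serving container's logs for the ready message (a sketch; `tgi-gaudi-server` is the TGI container name used in the compose files of this change, substitute the vLLM or vLLM-on-Ray container name if you chose those backends):

```bash
# Wait until the TGI router reports it is connected and ready to serve requests
docker logs tgi-gaudi-server 2>&1 | grep Connected
```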
@@ -430,39 +353,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
-d '{"model": "${LLM_MODEL_ID}", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```
7. LLM Microservice
```bash
# TGI service
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that we rename `max_new_tokens` to `max_tokens`).
```bash
# vLLM Service
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
-H 'Content-Type: application/json'
```
For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
```bash
# vLLM-on-Ray Service
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03","streaming":false}' \
-H 'Content-Type: application/json'
```
For parameters in vLLM-on-Ray mode, refer to the [LangChain ChatOpenAI API](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).
8. MegaService
5. MegaService
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
@@ -470,7 +361,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
}'
```
9. Nginx Service
6. Nginx Service
```bash
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
@@ -478,7 +369,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
10. Dataprep Microservice (Optional)
7. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
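For example, a local file can be ingested roughly as follows (a minimal sketch; the endpoint matches `DATAPREP_SERVICE_ENDPOINT` in this change, and `./your_file.pdf` is a placeholder for your own document):

```bash
# Upload a document to the dataprep service to index it into the knowledge base
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./your_file.pdf"
```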
@@ -547,7 +438,7 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-H "Content-Type: application/json"
```
10. Guardrails (Optional)
8. Guardrails (Optional)
```bash
curl http://${host_ip}:9090/v1/guardrails\


@@ -39,26 +39,12 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_DEVICES: ${tei_embedding_devices}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
MAX_WARMUP_SEQUENCE_LENGTH: 512
INIT_HCCL_ON_ACQUIRE: 0
ENABLE_EXPERIMENTAL_FLAGS: true
command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
embedding:
image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
container_name: embedding-tei-server
depends_on:
- tei-embedding-service
ports:
- "6000:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
restart: unless-stopped
retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: retriever-redis-server
@@ -90,23 +76,6 @@ services:
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
reranking:
image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
container_name: reranking-tei-gaudi-server
depends_on:
- tei-reranking-service
ports:
- "8000:8000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
@@ -121,7 +90,7 @@ services:
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: ${llm_service_devices}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
@@ -131,36 +100,16 @@ services:
cap_add:
- SYS_NICE
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-gaudi-server
depends_on:
- tgi-service
ports:
- "9000:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096
chaqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-gaudi-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- embedding
- retriever
- tei-reranking-service
- reranking
- tgi-service
- llm
ports:
- "8888:8888"
environment:
@@ -168,10 +117,14 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
- EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
- RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
- RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
- LOGFLAG=${LOGFLAG}
ipc: host
restart: always
chaqna-gaudi-ui-server:
@@ -191,25 +144,6 @@ services:
- DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
ipc: host
restart: always
chaqna-gaudi-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-gaudi-nginx-server
depends_on:
- chaqna-gaudi-backend-server
- chaqna-gaudi-ui-server
ports:
- "${NGINX_PORT:-80}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
ipc: host
restart: always
networks:
default:


@@ -82,20 +82,6 @@ services:
OMPI_MCA_btl_vader_single_copy_mechanism: none
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
embedding:
image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
container_name: embedding-tei-server
depends_on:
- tei-embedding-service
ports:
- "6000:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
restart: unless-stopped
retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: retriever-redis-server
@@ -127,23 +113,6 @@ services:
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
reranking:
image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
container_name: reranking-tei-gaudi-server
depends_on:
- tei-reranking-service
ports:
- "8000:8000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
@@ -169,23 +138,6 @@ services:
- SYS_NICE
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-gaudi-server
depends_on:
- tgi-service
ports:
- "9000:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
chaqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/chatqna-guardrails:${TAG:-latest}
container_name: chatqna-gaudi-guardrails-server
@@ -194,12 +146,9 @@ services:
- tgi-guardrails-service
- guardrails
- tei-embedding-service
- embedding
- retriever
- tei-reranking-service
- reranking
- tgi-service
- llm
ports:
- "8888:8888"
environment:
@@ -208,10 +157,15 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- GUARDRAIL_SERVICE_HOST_IP=${GUARDRAIL_SERVICE_HOST_IP}
- EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
- GUARDRAIL_SERVICE_PORT=${GUARDRAIL_SERVICE_PORT:-9090}
- EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
- RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
- RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
- LOGFLAG=${LOGFLAG}
ipc: host
restart: always
chaqna-gaudi-ui-server:


@@ -1,201 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "6379:6379"
- "8001:8001"
dataprep-redis-service:
image: ${REGISTRY:-opea}/dataprep-redis:${TAG:-latest}
container_name: dataprep-redis-server
depends_on:
- redis-vector-db
- tei-embedding-service
ports:
- "6007:6007"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
INDEX_NAME: ${INDEX_NAME}
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
volumes:
- "./data:/data"
runtime: habana
cap_add:
- SYS_NICE
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
MAX_WARMUP_SEQUENCE_LENGTH: 512
INIT_HCCL_ON_ACQUIRE: 0
ENABLE_EXPERIMENTAL_FLAGS: true
command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
# embedding:
# image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
# container_name: embedding-tei-server
# depends_on:
# - tei-embedding-service
# ports:
# - "6000:6000"
# ipc: host
# environment:
# no_proxy: ${no_proxy}
# http_proxy: ${http_proxy}
# https_proxy: ${https_proxy}
# TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
# restart: unless-stopped
retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: retriever-redis-server
depends_on:
- redis-vector-db
ports:
- "7000:7000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
INDEX_NAME: ${INDEX_NAME}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-gaudi-server
ports:
- "8808:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
# reranking:
# image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
# container_name: reranking-tei-gaudi-server
# depends_on:
# - tei-reranking-service
# ports:
# - "8000:8000"
# ipc: host
# environment:
# no_proxy: ${no_proxy}
# http_proxy: ${http_proxy}
# https_proxy: ${https_proxy}
# TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
# HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
# HF_HUB_DISABLE_PROGRESS_BARS: 1
# HF_HUB_ENABLE_HF_TRANSFER: 0
# restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8005:80"
volumes:
- "./data:/data"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096
# llm:
# image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
# container_name: llm-tgi-gaudi-server
# depends_on:
# - tgi-service
# ports:
# - "9000:9000"
# ipc: host
# environment:
# no_proxy: ${no_proxy}
# http_proxy: ${http_proxy}
# https_proxy: ${https_proxy}
# TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
# HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
# HF_HUB_DISABLE_PROGRESS_BARS: 1
# HF_HUB_ENABLE_HF_TRANSFER: 0
# restart: unless-stopped
chaqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/chatqna-no-wrapper:${TAG:-latest}
container_name: chatqna-gaudi-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
# - embedding
- retriever
- tei-reranking-service
# - reranking
- tgi-service
# - llm
ports:
- "8888:8888"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
- RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
- RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
- LOGFLAG=${LOGFLAG}
ipc: host
restart: always
chaqna-gaudi-ui-server:
image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest}
container_name: chatqna-gaudi-ui-server
depends_on:
- chaqna-gaudi-backend-server
ports:
- "5173:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- CHAT_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
- UPLOAD_FILE_BASE_URL=${DATAPREP_SERVICE_ENDPOINT}
- GET_FILE=${DATAPREP_GET_FILE_ENDPOINT}
- DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
ipc: host
restart: always
networks:
default:
driver: bridge


@@ -43,20 +43,6 @@ services:
OMPI_MCA_btl_vader_single_copy_mechanism: none
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${EMBEDDING_MODEL_ID}
embedding:
image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
container_name: embedding-tei-server
depends_on:
- tei-embedding-service
ports:
- "6000:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
restart: unless-stopped
retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: retriever-redis-server
@@ -88,23 +74,6 @@ services:
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
reranking:
image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
container_name: reranking-tei-gaudi-server
depends_on:
- tei-reranking-service
ports:
- "8000:8000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
vllm-service:
image: ${REGISTRY:-opea}/llm-vllm-hpu:${TAG:-latest}
container_name: vllm-gaudi-server
@@ -125,34 +94,15 @@ services:
- SYS_NICE
ipc: host
command: /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $LLM_MODEL_ID --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048"
llm:
image: ${REGISTRY:-opea}/llm-vllm:${TAG:-latest}
container_name: llm-vllm-gaudi-server
depends_on:
- vllm-service
ports:
- "9000:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
vLLM_ENDPOINT: ${vLLM_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LLM_MODEL: ${LLM_MODEL_ID}
restart: unless-stopped
chaqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-gaudi-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- embedding
- retriever
- tei-reranking-service
- reranking
- vllm-service
- llm
ports:
- "8888:8888"
environment:
@@ -160,11 +110,14 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
- EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
- RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
- RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
- RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-8007}
- LOGFLAG=${LOGFLAG}
ipc: host
restart: always
chaqna-gaudi-ui-server:


@@ -43,20 +43,6 @@ services:
OMPI_MCA_btl_vader_single_copy_mechanism: none
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${EMBEDDING_MODEL_ID}
embedding:
image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
container_name: embedding-tei-server
depends_on:
- tei-embedding-service
ports:
- "6000:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
restart: unless-stopped
retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: retriever-redis-server
@@ -88,23 +74,6 @@ services:
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
reranking:
image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
container_name: reranking-tei-gaudi-server
depends_on:
- tei-reranking-service
ports:
- "8000:8000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
vllm-ray-service:
image: ${REGISTRY:-opea}/llm-vllm-ray-hpu:${TAG:-latest}
container_name: vllm-ray-gaudi-server
@@ -125,34 +94,15 @@ services:
- SYS_NICE
ipc: host
command: /bin/bash -c "ray start --head && python vllm_ray_openai.py --port_number 8000 --model_id_or_path $LLM_MODEL_ID --tensor_parallel_size 2 --enforce_eager True"
llm:
image: ${REGISTRY:-opea}/llm-vllm-ray:${TAG:-latest}
container_name: llm-vllm-ray-gaudi-server
depends_on:
- vllm-ray-service
ports:
- "9000:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
vLLM_RAY_ENDPOINT: ${vLLM_RAY_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LLM_MODEL: ${LLM_MODEL_ID}
restart: unless-stopped
chaqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-gaudi-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- embedding
- retriever
- tei-reranking-service
- reranking
- vllm-ray-service
- llm
ports:
- "8888:8888"
environment:
@@ -160,11 +110,14 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
- EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
- RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
- RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
- RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-8006}
- LOGFLAG=${LOGFLAG}
ipc: host
restart: always
chaqna-gaudi-ui-server:


@@ -45,20 +45,6 @@ services:
INIT_HCCL_ON_ACQUIRE: 0
ENABLE_EXPERIMENTAL_FLAGS: true
command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
embedding:
image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
container_name: embedding-tei-server
depends_on:
- tei-embedding-service
ports:
- "6000:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
restart: unless-stopped
retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: retriever-redis-server
@@ -99,33 +85,14 @@ services:
- SYS_NICE
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-gaudi-server
depends_on:
- tgi-service
ports:
- "9000:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
chaqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/chatqna-without-rerank:${TAG:-latest}
container_name: chatqna-gaudi-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- embedding
- retriever
- tgi-service
- llm
ports:
- "8888:8888"
environment:
@@ -133,9 +100,12 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
- EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
- RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
- LOGFLAG=${LOGFLAG}
ipc: host
restart: always
chaqna-gaudi-ui-server:


@@ -26,14 +26,6 @@ The warning messages point out that the variables are **NOT** set.
```
ubuntu@gaudi-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml: `version` is obsolete
```
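These warnings are harmless; the LangChain variables are only needed for optional tracing. If you prefer to silence them, export explicit values before starting the stack, for example (a minimal sketch):

```
# Optional: set the LangChain tracing variables so Docker Compose stops warning about them
export LANGCHAIN_TRACING_V2=false
export LANGCHAIN_API_KEY=""
docker compose -f ./compose.yaml up -d
```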
@@ -172,24 +164,7 @@ This test the embedding service. It sends "What is Deep Learning?" to the embedd
**Note**: The vector dimension is decided by the embedding model, and the output values depend on the model and the input data.
### 2 Embedding Microservice
```
curl http://${host_ip}:6000/v1/embeddings\
-X POST \
-d '{"text":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
This tests the embedding microservice. In this test, it sends `What is Deep Learning?` to the embedding microservice.
The embedding microservice receives the input data and calls the embedding service to embed it.
The embedding server is stateless, while the microservice keeps state; note the `id` field in the output of the `Embedding Microservice`.
```
{"id":"e8c85e588a235a4bc4747a23b3a71d8f","text":"What is Deep Learning?","embedding":[0.00030903306,-0.06356524,0.0025720573,-0.012404448,0.050649878, ..., 0.02776986,-0.0246678,0.03999176,0.037477136,-0.006806653,0.02261455,-0.04570737,-0.033122733,0.022785513,0.0160026,-0.021343587,-0.029969815,-0.0049176104]}
```
### 3 Retriever Microservice
### 2 Retriever Microservice
To consume the retriever microservice, you need to generate a mock embedding vector with a Python script.
The length of the embedding vector is determined by the embedding model.
@@ -212,7 +187,7 @@ The output is retrieved text that relevant to the input data:
```
### 4 TEI Reranking Service
### 3 TEI Reranking Service
Reranking service
@@ -228,24 +203,7 @@ Output is:
It scores the input
### 5 Reranking Microservice
```
curl http://${host_ip}:8000/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
```
Here is the output:
```
{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}
```
You may notice that the reranking microservice keeps state (`id` and other metadata), while the reranking service does not.
### 6 TGI Service
### 4 TGI Service
```
curl http://${host_ip}:8008/generate \
@@ -277,56 +235,7 @@ and the log shows model warm up, please wait for a while and try it later.
2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model
```
### 7 LLM Microservice
```
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
You will get the generated text from the LLM:
```
data: b'\n'
data: b'\n'
data: b'Deep'
data: b' learning'
data: b' is'
data: b' a'
data: b' subset'
data: b' of'
data: b' machine'
data: b' learning'
data: b' that'
data: b' uses'
data: b' algorithms'
data: b' to'
data: b' learn'
data: b' from'
data: b' data'
data: [DONE]
```
### 8 MegaService
### 5 MegaService
```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{


@@ -8,15 +8,13 @@ export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:8005"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVER_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"