Adding files to deploy AgentQnA application on ROCm vLLM (#1613)

Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
This commit is contained in:
chyundunovDatamonsters
2025-04-02 10:17:07 +07:00
committed by GitHub
parent 46a29cc253
commit 5cc047ce34
15 changed files with 1185 additions and 153 deletions

View File

@@ -1,101 +1,334 @@
# Build Mega Service of AgentQnA on AMD ROCm GPU

This example showcases a hierarchical multi-agent system for question-answering applications, deployed here on AMD ROCm GPUs with the LLM served by vLLM or TGI. For instructions on other deployment options, such as using OpenAI models via API calls, please refer to the deployment guide [here](../../../../README.md).

## Deployment with Docker

### 1. Build Docker Images
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up environment for this example:

```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# OPENAI_API_KEY if you want to use OpenAI models
export OPENAI_API_KEY=<your-openai-key>
# Set AMD GPU settings
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
```

- #### Create the application install directory and go to it:

```bash
mkdir ~/agentqna-install && cd agentqna-install
```
- #### Clone the GenAIExamples repository (the default branch "main" is used here):

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

If you need a specific branch or tag of the GenAIExamples repository, clone and check it out explicitly (replace v1.3 with the desired version):

```bash
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```

Keep in mind that when using a specific version of the code, you also need to follow the README from that version.

3. Deploy the retrieval tool (i.e., the DocIndexRetriever mega-service)

First, launch the mega-service.

```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```

Then, ingest data into the vector database. Here we provide an example; you can also ingest your own data, as shown after this step.

```
bash run_ingest_data.sh
```
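If you would rather ingest your own documents, a minimal sketch using the DocIndexRetriever dataprep endpoint could look like the following; the port 6007 matches the `DATAPREP_SERVICE_ENDPOINT` used later in this guide, and `my_doc.pdf` is just a placeholder file name:

```bash
# Upload a local document to the dataprep service so it is indexed into the vector database.
# Assumes the retrieval tool services are already running on ${host_ip}.
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./my_doc.pdf"
```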
4. Launch Tool service

In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.

```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```

5. Launch `Agent` service

```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```

6. [Optional] Build `Agent` docker image if pulling images failed.

```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent:latest -f comps/agent/src/Dockerfile .
```

- #### Go to the build directory:

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_image_build
```

- Clean up the GenAIComps repository if it was previously cloned in this directory.
  This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty:

```bash
echo Y | rm -R GenAIComps
```
- #### Clone the GenAIComps repository (the default branch "main" is used here):

```bash
git clone https://github.com/opea-project/GenAIComps.git
```

Keep in mind that when using a specific version of the code, you also need to follow the README from that version.

- #### Set the list of images for the build (from the build.yaml file)

Depending on whether you deploy the vLLM-based or the TGI-based application, set the service list as follows:

#### vLLM-based application

```bash
service_list="vllm-rocm agent agent-ui"
```

#### TGI-based application

```bash
service_list="agent agent-ui"
```
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
```bash
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
```
- #### Build Docker Images
```bash
docker compose -f build.yaml build ${service_list} --no-cache
```
- #### Build DocIndexRetriever Docker Images
```bash
cd ~/agentqna-install/GenAIExamples/DocIndexRetriever/docker_image_build/
git clone https://github.com/opea-project/GenAIComps.git
service_list="doc-index-retriever dataprep embedding retriever reranking"
docker compose -f build.yaml build ${service_list} --no-cache
```
- #### Pull DocIndexRetriever Docker Images
```bash
docker pull redis/redis-stack:7.2.0-v9
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
```
After the build, we check the list of images with the command:
```bash
docker image ls
```
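If the image list on your host is long, a simple filter (adjust the pattern as needed) makes the check easier:

```bash
# Show only the images relevant to this example
docker image ls | grep -E 'opea|redis-stack|text-embeddings-inference|text-generation-inference'
```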
The list of images should include:
##### vLLM-based application:
- opea/vllm-rocm:latest
- opea/agent:latest
- redis/redis-stack:7.2.0-v9
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/embedding:latest
- opea/retriever:latest
- opea/reranking:latest
- opea/doc-index-retriever:latest
##### TGI-based application:
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- opea/agent:latest
- redis/redis-stack:7.2.0-v9
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/embedding:latest
- opea/retriever:latest
- opea/reranking:latest
- opea/doc-index-retriever:latest
---

## Deploy the AgentQnA Application

### Docker Compose Configuration for AMD GPUs

To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose files:

- compose_vllm.yaml - for the vLLM-based application
- compose.yaml - for the TGI-based application

```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```

This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderDN` device IDs. For example:

```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```

**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderDN` IDs for your GPU, for example as shown below.
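A quick way to see which `cardN`/`renderDN` nodes exist on the host (assuming the amdgpu driver and ROCm utilities are installed) is:

```bash
# Each AMD GPU exposes a cardN and a renderDN node under /dev/dri
ls -l /dev/dri
# Map detected GPUs to PCI bus IDs with the ROCm utility
rocm-smi --showbus
```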
### Set deploy environment variables
#### Setting variables in the operating system environment:
```bash
### Replace the string 'server_address' with your local server IP address
export host_ip='server_address'
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
### Replace the string 'your_langchain_api_key' with your LANGCHAIN API KEY.
export LANGCHAIN_API_KEY='your_langchain_api_key'
export LANGCHAIN_TRACING_V2=""
```
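Before launching, a quick sanity check that the key variables are set can save a failed start (a minimal sketch):

```bash
# Check that the required variables are present (LANGCHAIN_TRACING_V2 may legitimately be empty)
printenv | grep -E '^(host_ip|HUGGINGFACEHUB_API_TOKEN|LANGCHAIN_API_KEY|LANGCHAIN_TRACING_V2)='
```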
### Start the services:
#### If you use vLLM
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_vllm_rocm.sh
```
#### If you use TGI
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```
All containers should be running and should not restart (a quick status check is shown after the lists below):
##### If you use vLLM:
- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- vllm-service
##### If you use TGI:
- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- tgi-service
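One way to confirm that nothing is restart-looping is to watch the container status for a minute or so:

```bash
# Show container names and status; re-run (or wrap in `watch`) to catch containers that keep restarting
docker ps --format 'table {{.Names}}\t{{.Status}}'
```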
---
## Validate the Services
### 1. Validate the vLLM/TGI Service
#### If you use vLLM:
```bash
DATA='{"model": "Intel/neural-chat-7b-v3-3", '\
'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}'
curl http://${host_ip}:${VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```
Check the response from the service. It should be similar to the following JSON:
```json
{
"id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
"object": "chat.completion",
"created": 1742270316,
"model": "Intel/neural-chat-7b-v3-3",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "length",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 66, "total_tokens": 322, "completion_tokens": 256, "prompt_tokens_details": null },
"prompt_logprobs": null
}
```
If the "choices.message.content" field contains a meaningful answer, the vLLM service is considered successfully launched. A quick way to extract just the answer is shown below.
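For example, assuming `jq` is installed, the answer text can be extracted directly from the same request:

```bash
curl -s http://${host_ip}:${VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.choices[0].message.content'
```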
#### If you use TGI:
```bash
DATA='{"inputs":"What is Deep Learning?",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'
curl http://${host_ip}:${TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service. It should be similar to the following JSON:
```json
{
"generated_text": " "
}
```
If the "generated_text" field contains a meaningful answer, the TGI service is considered successfully launched.
### 2. Validate the Supervisor Agent (MegaService)

First, look at the logs of the agent docker containers:

```bash
# worker RAG agent
docker logs rag-agent-endpoint
# worker SQL agent
docker logs sql-agent-endpoint
# supervisor agent
docker logs react-agent-endpoint
```

You should see something like "HTTP server setup successful" if the containers started successfully.

Then validate the supervisor agent, which orchestrates the worker agents:

```bash
curl http://${host_ip}:${SUPERVISOR_REACT_AGENT_PORT}/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "Most recent album by Taylor Swift"
    }'
```
### 3. Validate the Worker Agents (MicroServices)

```bash
# RAG worker agent
curl http://${host_ip}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "Most recent album by Taylor Swift"
    }'

# SQL worker agent
curl http://${host_ip}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "How many employees are there in the company?"
    }'
```
### 4. Stop application
#### If you use vLLM
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use TGI
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```

---

## How to register your own tools with agent

You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md). A rough sketch of what a tool definition looks like is shown below.
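As an illustration only, a tool entry in the tools yaml roughly follows the sketch below; the exact schema is defined in the GenAIComps agent README linked above, and the names used here (`search_knowledge_base`, `tools.py`) are placeholders:

```yaml
# Hypothetical tool definition - adapt names and schema to the agent README referenced above
search_knowledge_base:
  description: Search the knowledge base for a given query and return relevant context.
  callable_api: tools.py:search_knowledge_base
  args_schema:
    query:
      type: str
      description: The user question to look up.
  return_output: retrieved_data
```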

View File

@@ -1,26 +1,24 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Copyright (C) 2025 Advanced Micro Devices, Inc.
services:
agent-tgi-server:
image: ${AGENTQNA_TGI_IMAGE}
container_name: agent-tgi-server
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:3.0.0-rocm
container_name: tgi-service
ports:
- "${AGENTQNA_TGI_SERVICE_PORT-8085}:80"
- "${TGI_SERVICE_PORT-8085}:80"
volumes:
- ${HF_CACHE_DIR:-/var/opea/agent-service/}:/data
- "${MODEL_CACHE:-./data}:/data"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: "http://${HOST_IP}:${AGENTQNA_TGI_SERVICE_PORT}"
TGI_LLM_ENDPOINT: "http://${ip_address}:${TGI_SERVICE_PORT}"
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
shm_size: 1g
shm_size: 32g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/${AGENTQNA_CARD_ID}:/dev/dri/${AGENTQNA_CARD_ID}
- /dev/dri/${AGENTQNA_RENDER_ID}:/dev/dri/${AGENTQNA_RENDER_ID}
- /dev/dri:/dev/dri
cap_add:
- SYS_PTRACE
group_add:
@@ -34,14 +32,14 @@ services:
image: opea/agent:latest
container_name: rag-agent-endpoint
volumes:
# - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
- ${TOOLSET_PATH}:/home/user/tools/
- "${TOOLSET_PATH}:/home/user/tools/"
ports:
- "9095:9095"
- "${WORKER_RAG_AGENT_PORT:-9095}:9095"
ipc: host
environment:
ip_address: ${ip_address}
strategy: rag_agent_llama
with_memory: false
recursion_limit: ${recursion_limit_worker}
llm_engine: tgi
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -61,21 +59,49 @@ services:
LANGCHAIN_PROJECT: "opea-worker-agent-service"
port: 9095
worker-sql-agent:
image: opea/agent:latest
container_name: sql-agent-endpoint
volumes:
- "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
ports:
- "${WORKER_SQL_AGENT_PORT:-9096}:9096"
ipc: host
environment:
ip_address: ${ip_address}
strategy: sql_agent_llama
with_memory: false
db_name: ${db_name}
db_path: ${db_path}
use_hints: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
port: 9096
supervisor-react-agent:
image: opea/agent:latest
container_name: react-agent-endpoint
depends_on:
- agent-tgi-server
- worker-rag-agent
volumes:
# - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
- ${TOOLSET_PATH}:/home/user/tools/
- "${TOOLSET_PATH}:/home/user/tools/"
ports:
- "${AGENTQNA_FRONTEND_PORT}:9090"
- "${SUPERVISOR_REACT_AGENT_PORT:-9090}:9090"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_langgraph
strategy: react_llama
with_memory: true
recursion_limit: ${recursion_limit_supervisor}
llm_engine: tgi
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -83,7 +109,7 @@ services:
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
stream: true
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
no_proxy: ${no_proxy}
@@ -92,6 +118,7 @@ services:
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
CRAG_SERVER: $CRAG_SERVER
WORKER_AGENT_URL: $WORKER_AGENT_URL
CRAG_SERVER: ${CRAG_SERVER}
WORKER_AGENT_URL: ${WORKER_AGENT_URL}
SQL_AGENT_URL: ${SQL_AGENT_URL}
port: 9090

View File

@@ -0,0 +1,128 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.
services:
vllm-service:
image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
container_name: vllm-service
ports:
- "${VLLM_SERVICE_PORT:-8081}:8011"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
VLLM_USE_TRITON_FLASH_ATTENTION: 0
PYTORCH_JIT: 0
volumes:
- "${MODEL_CACHE:-./data}:/data"
shm_size: 20G
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
- apparmor=unconfined
command: "--model ${VLLM_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\""
ipc: host
worker-rag-agent:
image: opea/agent:latest
container_name: rag-agent-endpoint
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
ports:
- "${WORKER_RAG_AGENT_PORT:-9095}:9095"
ipc: host
environment:
ip_address: ${ip_address}
strategy: rag_agent_llama
with_memory: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
tools: /home/user/tools/worker_agent_tools.yaml
require_human_feedback: false
RETRIEVAL_TOOL_URL: ${RETRIEVAL_TOOL_URL}
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-worker-agent-service"
port: 9095
worker-sql-agent:
image: opea/agent:latest
container_name: sql-agent-endpoint
volumes:
- "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
ports:
- "${WORKER_SQL_AGENT_PORT:-9096}:9096"
ipc: host
environment:
ip_address: ${ip_address}
strategy: sql_agent_llama
with_memory: false
db_name: ${db_name}
db_path: ${db_path}
use_hints: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
port: 9096
supervisor-react-agent:
image: opea/agent:latest
container_name: react-agent-endpoint
depends_on:
- worker-rag-agent
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
ports:
- "${SUPERVISOR_REACT_AGENT_PORT:-9090}:9090"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_llama
with_memory: true
recursion_limit: ${recursion_limit_supervisor}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: true
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
CRAG_SERVER: ${CRAG_SERVER}
WORKER_AGENT_URL: ${WORKER_AGENT_URL}
SQL_AGENT_URL: ${SQL_AGENT_URL}
port: 9090

View File

@@ -1,47 +1,87 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
WORKPATH=$(dirname "$PWD")/..
export ip_address=${host_ip}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.4.1-rocm
export AGENTQNA_TGI_SERVICE_PORT="8085"
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""
# LLM related environment variables
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
#export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
# Set server hostname or IP address
export ip_address=${host_ip}
# Set services IP ports
export TGI_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${TGI_SERVICE_PORT}
export temperature=0.01
export max_new_tokens=512
# agent related environment variables
export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
export TOOLSET_PATH=/home/huggingface/datamonsters/amd-opea/GenAIExamples/AgentQnA/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:18881
export AGENTQNA_FRONTEND_PORT="9090"
#retrieval_tool
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:26379"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp
echo "Downloading chinook data..."
echo Y | rm -R chinook-database
git clone https://github.com/lerocha/chinook-database.git
echo Y | rm -R ../../../../../AgentQnA/tests/Chinook_Sqlite.sqlite
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite ../../../../../AgentQnA/tests
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml up -d
docker compose -f compose.yaml up -d
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-service > ${WORKPATH}/tgi_service_start.log
if grep -q Connected ${WORKPATH}/tgi_service_start.log; then
break
fi
sleep 10s
n=$((n+1))
done
echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=${CRAG_SERVER_PORT}:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0

View File

@@ -0,0 +1,88 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""
# Set server hostname or IP address
export ip_address=${host_ip}
# Set services IP ports
export VLLM_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export VLLM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${VLLM_SERVICE_PORT}
export LLM_MODEL_ID=${VLLM_LLM_MODEL_ID}
export temperature=0.01
export max_new_tokens=512
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp
echo "Downloading chinook data..."
echo Y | rm -R chinook-database
git clone https://github.com/lerocha/chinook-database.git
echo Y | rm -R ../../../../../AgentQnA/tests/Chinook_Sqlite.sqlite
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite ../../../../../AgentQnA/tests
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml up -d
docker compose -f compose_vllm.yaml up -d
n=0
until [[ "$n" -ge 500 ]]; do
docker logs vllm-service >& "${WORKPATH}"/vllm-service_start.log
if grep -q "Application startup complete" "${WORKPATH}"/vllm-service_start.log; then
break
fi
sleep 20s
n=$((n+1))
done
echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=${CRAG_SERVER_PORT}:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0

View File

@@ -14,7 +14,7 @@ export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
export temperature=0.01
@@ -44,3 +44,19 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete"
echo "Removing chinook data..."
echo Y | rm -R chinook-database
if [ -d "chinook-database" ]; then
rm -rf chinook-database
fi
echo "Chinook data removed!"
echo "Stopping CRAG server"
docker rm kdd-cup-24-crag-service --force
echo "Stopping Agent services"
docker compose -f compose.yaml down
echo "Stopping Retrieval services"
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down

View File

@@ -0,0 +1,84 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""
# Set server hostname or IP address
export ip_address=${host_ip}
# Set services IP ports
export VLLM_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export VLLM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${VLLM_SERVICE_PORT}
export LLM_MODEL_ID=${VLLM_LLM_MODEL_ID}
export temperature=0.01
export max_new_tokens=512
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp
echo "Removing chinook data..."
echo Y | rm -R chinook-database
if [ -d "chinook-database" ]; then
rm -rf chinook-database
fi
echo "Chinook data removed!"
echo "Stopping CRAG server"
docker rm kdd-cup-24-crag-service --force
echo "Stopping Agent services"
docker compose -f compose_vllm.yaml down
echo "Stopping Retrieval services"
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down