update AgentQnA (#1790)

Signed-off-by: minmin-intel <minmin.hou@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-11 13:33:19 -07:00
parent 8d421b7912
commit 58b47c15c6
6 changed files with 58 additions and 45 deletions
--- a/AgentQnA/README.md
+++ b/AgentQnA/README.md
@@ -4,7 +4,7 @@

 1. [Overview](#overview)
 2. [Deploy with Docker](#deploy-with-docker)
-3. [Launch the UI](#launch-the-ui)
+3. [How to interact with the agent system with UI](#how-to-interact-with-the-agent-system-with-ui)
 4. [Validate Services](#validate-services)
 5. [Register Tools](#how-to-register-other-tools-with-the-ai-agent)

@@ -144,21 +144,19 @@ source $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon/set_env.sh

 ### 2. Launch the multi-agent system. </br>

-Two options are provided for the `llm_engine` of the agents: 1. open-source LLMs on Gaudi, 2. OpenAI models via API calls.
+We make it convenient to launch the whole system with docker compose, which includes microservices for LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. There are 3 docker compose files, which make it easy for users to pick and choose. Users can choose a different retrieval tool other than the `DocIndexRetriever` example provided in our GenAIExamples repo. Users can choose not to launch the telemetry containers.

-#### Gaudi
+#### Launch on Gaudi

-On Gaudi, `meta-llama/Meta-Llama-3.1-70B-Instruct` will be served using vllm.
-By default, both the RAG agent and SQL agent will be launched to support the React Agent.  
-The React Agent requires the DocIndexRetriever's [`compose.yaml`](../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml) file, so two `compose.yaml` files need to be run with docker compose to start the multi-agent system.
-
-> **Note**: To enable the web search tool, skip this step and proceed to the "[Optional] Web Search Tool Support" section.
+On Gaudi, `meta-llama/Meta-Llama-3.3-70B-Instruct` will be served using vllm. The command below will launch the multi-agent system with the `DocIndexRetriever` as the retrieval tool for the Worker RAG agent.

 ```bash
 cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/
 docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml up -d
 ```

+> **Note**: To enable the web search tool, skip this step and proceed to the "[Optional] Web Search Tool Support" section.
+
 To enable Open Telemetry Tracing, compose.telemetry.yaml file need to be merged along with default compose.yaml file.
 Gaudi example with Open Telemetry feature:

@@ -183,11 +181,9 @@ docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/

 </details>

-#### Xeon
+#### Launch on Xeon

-On Xeon, only OpenAI models are supported.
-By default, both the RAG Agent and SQL Agent will be launched to support the React Agent.  
-The React Agent requires the DocIndexRetriever's [`compose.yaml`](../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml) file, so two `compose yaml` files need to be run with docker compose to start the multi-agent system.
+On Xeon, only OpenAI models are supported. The command below will launch the multi-agent system with the `DocIndexRetriever` as the retrieval tool for the Worker RAG agent.

 ```bash
 export OPENAI_API_KEY=<your-openai-key>
@@ -206,9 +202,10 @@ bash run_ingest_data.sh

 > **Note**: This is a one-time operation.

-## Launch the UI
+## How to interact with the agent system with UI

-Open a web browser to http://localhost:5173 to access the UI.
+The UI microservice is launched in the previous step with the other microservices.
+To see the UI, open a web browser to `http://${ip_address}:5173` to access the UI. Note the `ip_address` here is the host IP of the UI microservice.

 1. `create Admin Account` with a random value
 2. add opea agent endpoint `http://$ip_address:9090/v1` which is a openai compatible api
--- a/AgentQnA/docker_compose/intel/hpu/gaudi/compose.yaml
+++ b/AgentQnA/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -104,7 +104,7 @@ services:
      - "8080:8000"
    ipc: host
  agent-ui:
-    image: opea/agent-ui
+    image: opea/agent-ui:latest
    container_name: agent-ui
    environment:
      host_ip: ${host_ip}
@@ -138,4 +138,4 @@ services:
    cap_add:
      - SYS_NICE
    ipc: host
-    command: --model $LLM_MODEL_ID --tensor-parallel-size 4 --host 0.0.0.0 --port 8000 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 16384
+    command: --model $LLM_MODEL_ID --tensor-parallel-size 4 --host 0.0.0.0 --port 8000 --block-size 128 --max-num-seqs 256 --max-seq-len-to-capture 16384
--- a/AgentQnA/retrieval_tool/run_ingest_data.sh
+++ b/AgentQnA/retrieval_tool/run_ingest_data.sh
@@ -1,7 +1,9 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0

+host_ip=$(hostname -I | awk '{print $1}')
+port=6007
 FILEDIR=${WORKDIR}/GenAIExamples/AgentQnA/example_data/
 FILENAME=test_docs_music.jsonl

-python3 index_data.py --filedir ${FILEDIR} --filename ${FILENAME} --host_ip $host_ip
+python3 index_data.py --filedir ${FILEDIR} --filename ${FILENAME} --host_ip $host_ip --port $port
--- a/AgentQnA/tests/step4_launch_and_validate_agent_gaudi.sh
+++ b/AgentQnA/tests/step4_launch_and_validate_agent_gaudi.sh
@@ -8,6 +8,8 @@ WORKPATH=$(dirname "$PWD")
 export WORKDIR=$WORKPATH/../../
 echo "WORKDIR=${WORKDIR}"
 export ip_address=$(hostname -I | awk '{print $1}')
+export host_ip=$ip_address
+echo "ip_address=${ip_address}"
 export TOOLSET_PATH=$WORKPATH/tools/
 export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
 HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
@@ -24,12 +26,12 @@ ls $HF_CACHE_DIR
 vllm_port=8086
 vllm_volume=${HF_CACHE_DIR}

-function start_tgi(){
-    echo "Starting tgi-gaudi server"
+
+function start_agent_service() {
+    echo "Starting agent service"
    cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi
    source set_env.sh
-    docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml tgi_gaudi.yaml -f compose.telemetry.yaml up -d
-
+    docker compose -f compose.yaml up -d
 }

 function start_all_services() {
@@ -69,7 +71,6 @@ function download_chinook_data(){
    cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite $WORKDIR/GenAIExamples/AgentQnA/tests/
 }

-
 function validate() {
    local CONTENT="$1"
    local EXPECTED_RESULT="$2"
@@ -138,24 +139,6 @@ function remove_chinook_data(){
    echo "Chinook data removed!"
 }

-export host_ip=$ip_address
-echo "ip_address=${ip_address}"
-
-
-function validate() {
-    local CONTENT="$1"
-    local EXPECTED_RESULT="$2"
-    local SERVICE_NAME="$3"
-
-    if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
-        echo "[ $SERVICE_NAME ] Content is as expected: $CONTENT"
-        echo 0
-    else
-        echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
-        echo 1
-    fi
-}
-
 function ingest_data_and_validate() {
    echo "Ingesting data"
    cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool/
--- a/AgentQnA/tests/test_compose_on_gaudi.sh
+++ b/AgentQnA/tests/test_compose_on_gaudi.sh
@@ -26,15 +26,39 @@ function build_agent_docker_image() {
    docker compose -f build.yaml build --no-cache
 }

+function build_retrieval_docker_image() {
+    cd $WORKDIR/GenAIExamples/DocIndexRetriever/docker_image_build/
+    get_genai_comps
+    echo "Build retrieval image with --no-cache..."
+    docker compose -f build.yaml build --no-cache
+}
+
 function stop_crag() {
    cid=$(docker ps -aq --filter "name=kdd-cup-24-crag-service")
    echo "Stopping container kdd-cup-24-crag-service with cid $cid"
    if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
 }

-function stop_agent_docker() {
+function stop_agent_containers() {
    cd $WORKPATH/docker_compose/intel/hpu/gaudi/
-    docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml down
+    container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
+    for container_name in $container_list; do
+        cid=$(docker ps -aq --filter "name=$container_name")
+        echo "Stopping container $container_name"
+        if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
+    done
+}
+
+function stop_telemetry_containers(){
+    cd $WORKPATH/docker_compose/intel/hpu/gaudi/
+    container_list=$(cat compose.telemetry.yaml | grep container_name | cut -d':' -f2)
+    for container_name in $container_list; do
+        cid=$(docker ps -aq --filter "name=$container_name")
+        echo "Stopping container $container_name"
+        if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
+    done
+    container_list=$(cat compose.telemetry.yaml | grep container_name | cut -d':' -f2)
+
 }

 function stop_llm(){
@@ -69,12 +93,16 @@ function stop_retrieval_tool() {
 }
 echo "workpath: $WORKPATH"
 echo "=================== Stop containers ===================="
+stop_llm
 stop_crag
-stop_agent_docker
+stop_agent_containers
+stop_retrieval_tool
+stop_telemetry_containers

 cd $WORKPATH/tests

 echo "=================== #1 Building docker images===================="
+build_retrieval_docker_image
 build_agent_docker_image
 echo "=================== #1 Building docker images completed===================="

@@ -83,8 +111,11 @@ bash $WORKPATH/tests/step4_launch_and_validate_agent_gaudi.sh
 echo "=================== #4 Agent, retrieval test passed ===================="

 echo "=================== #5 Stop agent and API server===================="
+stop_llm
 stop_crag
-stop_agent_docker
+stop_agent_containers
+stop_retrieval_tool
+stop_telemetry_containers
 echo "=================== #5 Agent and API server stopped===================="

 echo y | docker system prune
--- a/AgentQnA/tools/worker_agent_tools.py
+++ b/AgentQnA/tools/worker_agent_tools.py
@@ -12,7 +12,7 @@ def search_knowledge_base(query: str) -> str:
    print(url)
    proxies = {"http": ""}
    payload = {
-        "text": query,
+        "messages": query,
    }
    response = requests.post(url, json=payload, proxies=proxies)
    print(response)