Integrate Docker images into the compose YAML file to simplify the run instructions; fix the UI IP issue and add web search tool support (#1656)

Integrate Docker images into the compose YAML file to simplify the run instructions; fix the UI IP issue and add web search tool support

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: alexsin368 <alex.sin@intel.com>
Louie Tsai
2025-03-20 18:42:20 -07:00
committed by GitHub
parent 6d24c1c77a
commit e8f2313e07
15 changed files with 466 additions and 512 deletions

View File

@@ -1,121 +1,3 @@
# Single node on-prem deployment with Docker Compose on Xeon Scalable processors
This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Xeon. For LLMs, we use OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
## Deployment with Docker
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up the environment for this example
```
# host_ip is auto-detected below; alternatively, set it manually, e.g., export host_ip="192.168.1.1" or your external public IP
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# OPENAI_API_KEY is needed if you want to use OpenAI models
export OPENAI_API_KEY=<your-openai-key>
```
3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
First, launch the mega-service.
```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```
Then, ingest data into the vector database. Here we provide an example; you can also ingest your own data, as shown in the sketch after this step's commands.
```
bash run_ingest_data.sh
```
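To ingest your own documents, post them to the DocIndexRetriever data-prep service. A minimal sketch, assuming the dataprep service listens on port 6007 as configured by the launch scripts and accepts multipart uploads on a `files` field (the document name is illustrative):
```
# Upload a local document to the dataprep ingest endpoint
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./your_document.pdf"
```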
4. Prepare SQL database
In this example, we will use the Chinook SQLite database. Run the commands below.
```
# Download data
cd $WORKDIR
git clone https://github.com/lerocha/chinook-database.git
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite $WORKDIR/GenAIExamples/AgentQnA/tests/
```
5. Launch Tool service
In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs. A quick reachability check follows the command below.
```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```
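To verify the mock API container is reachable, any HTTP response from the published port indicates the server is up (the concrete routes are defined by the CRAG mock API itself):
```
# Print the HTTP status code returned by the mock API server; any response means it is up
curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:8080/
```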
6. Launch multi-agent system
The configurations of the supervisor agent and the worker agents are defined in the Docker Compose YAML file. We currently use OpenAI `gpt-4o-mini` as the LLM.
```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
bash launch_agent_service_openai.sh
```
7. [Optional] Build the `Agent` Docker image if pulling images fails; a quick verification follows the build commands.
```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent:latest -f comps/agent/src/Dockerfile .
```
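After the build completes, you can confirm the image exists locally before retrying Step 6:
```
# The freshly built image should appear in the local image list
docker images opea/agent:latest
```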
## Validate services
First, look at the logs of the agent Docker containers:
```
# worker RAG agent
docker logs rag-agent-endpoint
# worker SQL agent
docker logs sql-agent-endpoint
```
```
# supervisor agent
docker logs react-agent-endpoint
```
You should see something like "HTTP server setup successful" if the Docker containers started successfully.
Second, validate worker RAG agent:
```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"messages": "Michael Jackson song Thriller"
}'
```
Third, validate worker SQL agent:
```
curl http://${host_ip}:9096/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"messages": "How many employees are in the company?"
}'
```
Finally, validate supervisor agent:
```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"messages": "How many albums does Iron Maiden have?"
}'
```
## How to register your own tools with the agent
Take a look at the tool YAML and Python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md).
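As a rough illustration only (the field names below are hypothetical; the linked README is the authoritative reference), a custom tool is typically a YAML entry pointing at a Python callable placed under `$TOOLSET_PATH`:
```
# Hypothetical sketch of registering a custom tool; field names are illustrative,
# see the GenAIComps agent README for the real schema.
cat > ${TOOLSET_PATH}/my_tools.yaml <<'EOF'
get_weather:
  description: Look up the current weather for a city.
  callable_api: my_tools.py:get_weather
  args_schema:
    city:
      type: str
      description: Name of the city to query.
  return_output: weather_info
EOF

cat > ${TOOLSET_PATH}/my_tools.py <<'EOF'
def get_weather(city: str) -> str:
    # Placeholder implementation; replace with a real weather API call.
    return f"Sunny in {city} (placeholder result)."
EOF
```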
This example showcases a hierarchical multi-agent system for question-answering applications. The Xeon deployment uses OpenAI models via API calls. For instructions, refer to the deployment guide [here](../../../../README.md).

View File

@@ -94,6 +94,20 @@ services:
      WORKER_AGENT_URL: $WORKER_AGENT_URL
      SQL_AGENT_URL: $SQL_AGENT_URL
      port: 9090
  mock-api:
    image: docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
    container_name: mock-api
    ports:
      - "8080:8000"
    ipc: host
  agent-ui:
    image: opea/agent-ui
    container_name: agent-ui
    volumes:
      - ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env:/home/user/svelte/.env
    ports:
      - "5173:5173"
    ipc: host
networks:
  default:
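With the UI folded into the compose file, a quick smoke test after `docker compose up` is to request the published UI port from the host (assuming the Svelte server answers plain HTTP):
```
# Expect an HTTP status line if the agent UI container is serving on port 5173
curl -sI http://${host_ip}:5173 | head -n 1
```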

View File

@@ -1,22 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export ip_address=$(hostname -I | awk '{print $1}')
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export model="gpt-4o-mini-2024-07-18"
export temperature=0
export max_new_tokens=4096
export OPENAI_API_KEY=${OPENAI_API_KEY}
export WORKER_AGENT_URL="http://${ip_address}:9095/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:9096/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:8080
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
docker compose -f compose_openai.yaml up -d

View File

@@ -0,0 +1,57 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
if [[ -z "${WORKDIR}" ]]; then
echo "Please set WORKDIR environment variable"
exit 0
fi
echo "WORKDIR=${WORKDIR}"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export ip_address=$(hostname -I | awk '{print $1}')
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export model="gpt-4o-mini-2024-07-18"
export temperature=0
export max_new_tokens=4096
export OPENAI_API_KEY=${OPENAI_API_KEY}
export WORKER_AGENT_URL="http://${ip_address}:9095/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:9096/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:8080
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
if [ ! -f $WORKDIR/GenAIExamples/AgentQnA/tests/Chinook_Sqlite.sqlite ]; then
    echo "Downloading Chinook_Sqlite.sqlite ..."
    wget -O $WORKDIR/GenAIExamples/AgentQnA/tests/Chinook_Sqlite.sqlite https://github.com/lerocha/chinook-database/releases/download/v1.4.5/Chinook_Sqlite.sqlite
fi
# retriever
export host_ip=$(hostname -I | awk '{print $1}')
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
export no_proxy="$no_proxy,rag-agent-endpoint,sql-agent-endpoint,react-agent-endpoint,agent-ui"

View File

@@ -1,149 +1,3 @@
# Single node on-prem deployment of AgentQnA on Gaudi
This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Gaudi using open-source LLMs.
For more details, please refer to the deployment guide [here](../../../../README.md).
## Deployment with Docker
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up the environment for this example
```
# host_ip is auto-detected below; alternatively, set it manually, e.g., export host_ip="192.168.1.1" or your external public IP
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# for using open-source llms
export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
# Example: export HF_CACHE_DIR=$WORKDIR so models do not need to be re-downloaded every time
export HF_CACHE_DIR=<directory-where-llms-are-downloaded>
```
3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
First, launch the mega-service.
```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```
Then, ingest data into the vector database. Here we provide an example; you can also ingest your own data.
```
bash run_ingest_data.sh
```
4. Prepare SQL database
In this example, we will use the Chinook SQLite database. Run the commands below.
```
# Download data
cd $WORKDIR
git clone https://github.com/lerocha/chinook-database.git
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite $WORKDIR/GenAIExamples/AgentQnA/tests/
```
5. Launch Tool service
In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```
6. Launch multi-agent system
On Gaudi2 we will serve `meta-llama/Meta-Llama-3.1-70B-Instruct` using vLLM.
First, build the vllm-gaudi Docker image.
```bash
cd $WORKDIR
git clone https://github.com/vllm-project/vllm.git
cd ./vllm
git checkout v0.6.6
docker build --no-cache -f Dockerfile.hpu -t opea/vllm-gaudi:latest --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
```
Then launch vLLM on Gaudi2 with the command below. A readiness check follows.
```bash
vllm_port=8086
vllm_volume=$HF_CACHE_DIR # you should have set this env var in previous step
model="meta-llama/Meta-Llama-3.1-70B-Instruct"
docker run -d --runtime=habana --rm --name "vllm-gaudi-server" -e HABANA_VISIBLE_DEVICES=0,1,2,3 -p $vllm_port:8000 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host opea/vllm-gaudi:latest --model ${model} --max-seq-len-to-capture 16384 --tensor-parallel-size 4
```
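Loading a 70B model can take a while. A sketch for waiting until vLLM reports healthy before launching the agents (vLLM's OpenAI-compatible server exposes a `/health` endpoint):
```bash
# Poll vLLM's health endpoint until the model server is ready
until curl -sf http://localhost:8086/health > /dev/null; do
    echo "Waiting for vLLM to become ready..."
    sleep 10
done
echo "vLLM is ready."
```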
Then launch the agent microservices.
```bash
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/
bash launch_agent_service_gaudi.sh
```
7. [Optional] Build the `Agent` Docker image if pulling images fails.
If image pulling failed in Step 6 above, build the agent Docker image with the commands below, then retry Step 6.
```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent:latest -f comps/agent/src/Dockerfile .
```
## Validate services
First, look at the logs of the agent Docker containers:
```
# worker RAG agent
docker logs rag-agent-endpoint
# worker SQL agent
docker logs sql-agent-endpoint
```
```
# supervisor agent
docker logs react-agent-endpoint
```
You should see something like "HTTP server setup successful" if the Docker containers started successfully.
Second, validate worker RAG agent:
```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"messages": "Michael Jackson song Thriller"
}'
```
Third, validate worker SQL agent:
```
curl http://${host_ip}:9096/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"messages": "How many employees are in the company?"
}'
```
Finally, validate supervisor agent:
```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"messages": "How many albums does Iron Maiden have?"
}'
```
## How to register your own tools with the agent
Take a look at the tool YAML and Python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md).
This example showcases a hierarchical multi-agent system for question-answering applications. To deploy the example on Gaudi using open-source LLMs, refer to the deployment guide [here](../../../../README.md).

View File

@@ -0,0 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
  supervisor-react-agent:
    environment:
      - tools=/home/user/tools/supervisor_agent_webtools.yaml
      - GOOGLE_CSE_ID=${GOOGLE_CSE_ID}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
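This override enables the supervisor agent's Google web-search tool. A sketch of how it might be applied on top of the base compose file; the file names below are assumptions based on the tools path in this snippet:
```
export GOOGLE_CSE_ID=<your-google-cse-id>
export GOOGLE_API_KEY=<your-google-api-key>
# compose_openai_webtools.yaml is an assumed name for the override file above
docker compose -f compose_openai.yaml -f compose_openai_webtools.yaml up -d
```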

View File

@@ -97,3 +97,47 @@ services:
      WORKER_AGENT_URL: $WORKER_AGENT_URL
      SQL_AGENT_URL: $SQL_AGENT_URL
      port: 9090
  mock-api:
    image: docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
    container_name: mock-api
    ports:
      - "8080:8000"
    ipc: host
  agent-ui:
    image: opea/agent-ui
    container_name: agent-ui
    volumes:
      - ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env:/home/user/svelte/.env
    environment:
      host_ip: ${host_ip}
    ports:
      - "5173:5173"
    ipc: host
  vllm-service:
    image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
    container_name: vllm-gaudi-server
    ports:
      - "8086:8000"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      VLLM_TORCH_PROFILER_DIR: "/mnt"
      VLLM_SKIP_WARMUP: "true"
      PT_HPU_ENABLE_LAZY_COLLECTIVES: "true"
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://$host_ip:8086/health || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 100
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    command: --model $LLM_MODEL_ID --tensor-parallel-size 4 --host 0.0.0.0 --port 8000 --block-size 128 --max-num-seqs 256 --max-seq-len-to-capture 16384
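Once the stack is up, you can confirm the vLLM container finished loading before exercising the agents; a minimal check, assuming vLLM's usual uvicorn startup message:
```
# Block until vLLM logs that its HTTP server has started (message text is an assumption)
docker logs -f vllm-gaudi-server 2>&1 | grep -m1 "Application startup complete"
```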

View File

@@ -1,36 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
WORKPATH=$(dirname "$PWD")/..
# export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
# LLM related environment variables
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct" #"meta-llama/Meta-Llama-3.1-70B-Instruct"
export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:8086"
export temperature=0
export max_new_tokens=4096
# agent related environment variables
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:9095/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:9096/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:8080
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
docker compose -f compose.yaml up -d

View File

@@ -1,25 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# LLM related environment variables
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-70B-Instruct"
export NUM_SHARDS=4
docker compose -f tgi_gaudi.yaml up -d
sleep 5s
echo "Waiting tgi gaudi ready"
n=0
until [[ "$n" -ge 100 ]] || [[ $ready == true ]]; do
docker logs tgi-server &> tgi-gaudi-service.log
n=$((n+1))
if grep -q Connected tgi-gaudi-service.log; then
break
fi
sleep 5s
done
sleep 5s
echo "Service started successfully"

View File

@@ -0,0 +1,69 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
WORKPATH=$(dirname "$PWD")/..
# export WORKDIR=$WORKPATH/../../
if [[ -z "${WORKDIR}" ]]; then
echo "Please set WORKDIR environment variable"
exit 0
fi
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
# LLM related environment variables
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct"
export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:8086"
export temperature=0
export max_new_tokens=4096
# agent related environment variables
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:9095/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:9096/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:8080
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
if [ ! -f $WORKDIR/GenAIExamples/AgentQnA/tests/Chinook_Sqlite.sqlite ]; then
    echo "Downloading Chinook_Sqlite.sqlite ..."
    wget -O $WORKDIR/GenAIExamples/AgentQnA/tests/Chinook_Sqlite.sqlite https://github.com/lerocha/chinook-database/releases/download/v1.4.5/Chinook_Sqlite.sqlite
fi
# configure agent ui
echo "AGENT_URL = 'http://$ip_address:9090/v1/chat/completions'" | tee ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env
# retriever
export host_ip=$(hostname -I | awk '{print $1}')
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
export no_proxy="$no_proxy,rag-agent-endpoint,sql-agent-endpoint,react-agent-endpoint,agent-ui,vllm-gaudi-server,jaeger,grafana,prometheus,127.0.0.1,localhost,0.0.0.0,$host_ip"