Adding files to deploy AgentQnA application on ROCm vLLM (#1613)

Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
This commit is contained in:
chyundunovDatamonsters
2025-04-02 10:17:07 +07:00
committed by GitHub
parent 46a29cc253
commit 5cc047ce34
15 changed files with 1185 additions and 153 deletions

View File

@@ -1,101 +1,334 @@
# Build Mega Service of AgentQnA on AMD ROCm GPU

This example showcases a hierarchical multi-agent system for question-answering applications, deployed here on AMD ROCm GPUs with the LLM served by vLLM or TGI. For instructions on other deployment options, such as using OpenAI models via API calls, please refer to the deployment guide [here](../../../../README.md).

## Deployment with Docker

### 1. Build Docker Images
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up environment for this example:

```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# OPENAI_API_KEY if you want to use OpenAI models
export OPENAI_API_KEY=<your-openai-key>
# Set AMD GPU settings
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
```

- #### Create the application install directory and go to it:

```bash
mkdir ~/agentqna-install && cd agentqna-install
```
- #### Clone the GenAIExamples repository (the default branch "main" is used here):

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

If you need a specific branch or tag of the GenAIExamples repository, clone and check it out explicitly (replace v1.3 with the desired version):

```bash
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```

Keep in mind that when using a specific version of the code, you also need to follow the README from that version.

3. Deploy the retrieval tool (i.e., the DocIndexRetriever mega-service)

First, launch the mega-service.

```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```

Then, ingest data into the vector database. Here we provide an example; you can also ingest your own data, as shown after this step.

```
bash run_ingest_data.sh
```
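If you would rather ingest your own documents, a minimal sketch using the DocIndexRetriever dataprep endpoint could look like the following; the port 6007 matches the `DATAPREP_SERVICE_ENDPOINT` used later in this guide, and `my_doc.pdf` is just a placeholder file name:

```bash
# Upload a local document to the dataprep service so it is indexed into the vector database.
# Assumes the retrieval tool services are already running on ${host_ip}.
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./my_doc.pdf"
```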
4. Launch Tool service

In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.

```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```

5. Launch `Agent` service

```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```

6. [Optional] Build `Agent` docker image if pulling images failed.

```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent:latest -f comps/agent/src/Dockerfile .
```

- #### Go to the build directory:

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_image_build
```

- Clean up the GenAIComps repository if it was previously cloned in this directory.
  This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty:

```bash
echo Y | rm -R GenAIComps
```
- #### Clone the GenAIComps repository (the default branch "main" is used here):

```bash
git clone https://github.com/opea-project/GenAIComps.git
```

Keep in mind that when using a specific version of the code, you also need to follow the README from that version.

- #### Set the list of images for the build (from the build.yaml file)

Depending on whether you deploy the vLLM-based or the TGI-based application, set the service list as follows:

#### vLLM-based application

```bash
service_list="vllm-rocm agent agent-ui"
```

#### TGI-based application

```bash
service_list="agent agent-ui"
```
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
```bash
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
```
- #### Build Docker Images
```bash
docker compose -f build.yaml build ${service_list} --no-cache
```
- #### Build DocIndexRetriever Docker Images
```bash
cd ~/agentqna-install/GenAIExamples/DocIndexRetriever/docker_image_build/
git clone https://github.com/opea-project/GenAIComps.git
service_list="doc-index-retriever dataprep embedding retriever reranking"
docker compose -f build.yaml build ${service_list} --no-cache
```
- #### Pull DocIndexRetriever Docker Images
```bash
docker pull redis/redis-stack:7.2.0-v9
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
```
After the build, we check the list of images with the command:
```bash
docker image ls
```
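If the image list on your host is long, a simple filter (adjust the pattern as needed) makes the check easier:

```bash
# Show only the images relevant to this example
docker image ls | grep -E 'opea|redis-stack|text-embeddings-inference|text-generation-inference'
```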
The list of images should include:
##### vLLM-based application:
- opea/vllm-rocm:latest
- opea/agent:latest
- redis/redis-stack:7.2.0-v9
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/embedding:latest
- opea/retriever:latest
- opea/reranking:latest
- opea/doc-index-retriever:latest
##### TGI-based application:
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- opea/agent:latest
- redis/redis-stack:7.2.0-v9
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/embedding:latest
- opea/retriever:latest
- opea/reranking:latest
- opea/doc-index-retriever:latest
---

## Deploy the AgentQnA Application

### Docker Compose Configuration for AMD GPUs

To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose files:

- compose_vllm.yaml - for the vLLM-based application
- compose.yaml - for the TGI-based application

```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```

This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderDN` device IDs. For example:

```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```

**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderDN` IDs for your GPU, for example as shown below.
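A quick way to see which `cardN`/`renderDN` nodes exist on the host (assuming the amdgpu driver and ROCm utilities are installed) is:

```bash
# Each AMD GPU exposes a cardN and a renderDN node under /dev/dri
ls -l /dev/dri
# Map detected GPUs to PCI bus IDs with the ROCm utility
rocm-smi --showbus
```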
### Set deploy environment variables
#### Setting variables in the operating system environment:
```bash
### Replace the string 'server_address' with your local server IP address
export host_ip='server_address'
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
### Replace the string 'your_langchain_api_key' with your LANGCHAIN API KEY.
export LANGCHAIN_API_KEY='your_langchain_api_key'
export LANGCHAIN_TRACING_V2=""
```
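Before launching, a quick sanity check that the key variables are set can save a failed start (a minimal sketch):

```bash
# Check that the required variables are present (LANGCHAIN_TRACING_V2 may legitimately be empty)
printenv | grep -E '^(host_ip|HUGGINGFACEHUB_API_TOKEN|LANGCHAIN_API_KEY|LANGCHAIN_TRACING_V2)='
```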
### Start the services:
#### If you use vLLM
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_vllm_rocm.sh
```
#### If you use TGI
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```
All containers should be running and should not restart (a quick status check is shown after the lists below):
##### If you use vLLM:
- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- vllm-service
##### If you use TGI:
- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- tgi-service
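One way to confirm that nothing is restart-looping is to watch the container status for a minute or so:

```bash
# Show container names and status; re-run (or wrap in `watch`) to catch containers that keep restarting
docker ps --format 'table {{.Names}}\t{{.Status}}'
```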
---
## Validate the Services
### 1. Validate the vLLM/TGI Service
#### If you use vLLM:
```bash
DATA='{"model": "Intel/neural-chat-7b-v3-3", '\
'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}'
curl http://${host_ip}:${VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```
Check the response from the service. It should be similar to the following JSON:
```json
{
"id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
"object": "chat.completion",
"created": 1742270316,
"model": "Intel/neural-chat-7b-v3-3",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "length",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 66, "total_tokens": 322, "completion_tokens": 256, "prompt_tokens_details": null },
"prompt_logprobs": null
}
```
If the "choices.message.content" field contains a meaningful answer, the vLLM service is considered successfully launched. A quick way to extract just the answer is shown below.
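For example, assuming `jq` is installed, the answer text can be extracted directly from the same request:

```bash
curl -s http://${host_ip}:${VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.choices[0].message.content'
```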
#### If you use TGI:
```bash
DATA='{"inputs":"What is Deep Learning?",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'
curl http://${host_ip}:${TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service. It should be similar to the following JSON:
```json
{
"generated_text": " "
}
```
If the "generated_text" field contains a meaningful answer, the TGI service is considered successfully launched.
### 2. Validate the Supervisor Agent (MegaService)

First, look at the logs of the agent docker containers:

```bash
# worker RAG agent
docker logs rag-agent-endpoint
# worker SQL agent
docker logs sql-agent-endpoint
# supervisor agent
docker logs react-agent-endpoint
```

You should see something like "HTTP server setup successful" if the containers started successfully.

Then validate the supervisor agent, which orchestrates the worker agents:

```bash
curl http://${host_ip}:${SUPERVISOR_REACT_AGENT_PORT}/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "Most recent album by Taylor Swift"
    }'
```
### 3. Validate the Worker Agents (MicroServices)

```bash
# RAG worker agent
curl http://${host_ip}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "Most recent album by Taylor Swift"
    }'

# SQL worker agent
curl http://${host_ip}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "How many employees are there in the company?"
    }'
```
### 4. Stop application
#### If you use vLLM
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use TGI
```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```

---

## How to register your own tools with agent

You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md). A rough sketch of what a tool definition looks like is shown below.
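As an illustration only, a tool entry in the tools yaml roughly follows the sketch below; the exact schema is defined in the GenAIComps agent README linked above, and the names used here (`search_knowledge_base`, `tools.py`) are placeholders:

```yaml
# Hypothetical tool definition - adapt names and schema to the agent README referenced above
search_knowledge_base:
  description: Search the knowledge base for a given query and return relevant context.
  callable_api: tools.py:search_knowledge_base
  args_schema:
    query:
      type: str
      description: The user question to look up.
  return_output: retrieved_data
```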

View File

@@ -1,26 +1,24 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Copyright (C) 2025 Advanced Micro Devices, Inc.
services:
agent-tgi-server:
image: ${AGENTQNA_TGI_IMAGE}
container_name: agent-tgi-server
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:3.0.0-rocm
container_name: tgi-service
ports:
- "${AGENTQNA_TGI_SERVICE_PORT-8085}:80"
- "${TGI_SERVICE_PORT-8085}:80"
volumes:
- ${HF_CACHE_DIR:-/var/opea/agent-service/}:/data
- "${MODEL_CACHE:-./data}:/data"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: "http://${HOST_IP}:${AGENTQNA_TGI_SERVICE_PORT}"
TGI_LLM_ENDPOINT: "http://${ip_address}:${TGI_SERVICE_PORT}"
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
shm_size: 1g
shm_size: 32g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/${AGENTQNA_CARD_ID}:/dev/dri/${AGENTQNA_CARD_ID}
- /dev/dri/${AGENTQNA_RENDER_ID}:/dev/dri/${AGENTQNA_RENDER_ID}
- /dev/dri:/dev/dri
cap_add:
- SYS_PTRACE
group_add:
@@ -34,14 +32,14 @@ services:
image: opea/agent:latest
container_name: rag-agent-endpoint
volumes:
# - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
- ${TOOLSET_PATH}:/home/user/tools/
- "${TOOLSET_PATH}:/home/user/tools/"
ports:
- "9095:9095"
- "${WORKER_RAG_AGENT_PORT:-9095}:9095"
ipc: host
environment:
ip_address: ${ip_address}
strategy: rag_agent_llama
with_memory: false
recursion_limit: ${recursion_limit_worker}
llm_engine: tgi
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -61,21 +59,49 @@ services:
LANGCHAIN_PROJECT: "opea-worker-agent-service"
port: 9095
worker-sql-agent:
image: opea/agent:latest
container_name: sql-agent-endpoint
volumes:
- "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
ports:
- "${WORKER_SQL_AGENT_PORT:-9096}:9096"
ipc: host
environment:
ip_address: ${ip_address}
strategy: sql_agent_llama
with_memory: false
db_name: ${db_name}
db_path: ${db_path}
use_hints: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
port: 9096
supervisor-react-agent:
image: opea/agent:latest
container_name: react-agent-endpoint
depends_on:
- agent-tgi-server
- worker-rag-agent
volumes:
# - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
- ${TOOLSET_PATH}:/home/user/tools/
- "${TOOLSET_PATH}:/home/user/tools/"
ports:
- "${AGENTQNA_FRONTEND_PORT}:9090"
- "${SUPERVISOR_REACT_AGENT_PORT:-9090}:9090"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_langgraph
strategy: react_llama
with_memory: true
recursion_limit: ${recursion_limit_supervisor}
llm_engine: tgi
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -83,7 +109,7 @@ services:
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
stream: true
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
no_proxy: ${no_proxy}
@@ -92,6 +118,7 @@ services:
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
CRAG_SERVER: $CRAG_SERVER
WORKER_AGENT_URL: $WORKER_AGENT_URL
CRAG_SERVER: ${CRAG_SERVER}
WORKER_AGENT_URL: ${WORKER_AGENT_URL}
SQL_AGENT_URL: ${SQL_AGENT_URL}
port: 9090

View File

@@ -0,0 +1,128 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.
services:
vllm-service:
image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
container_name: vllm-service
ports:
- "${VLLM_SERVICE_PORT:-8081}:8011"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
VLLM_USE_TRITON_FLASH_ATTENTION: 0
PYTORCH_JIT: 0
volumes:
- "${MODEL_CACHE:-./data}:/data"
shm_size: 20G
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
- apparmor=unconfined
command: "--model ${VLLM_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\""
ipc: host
worker-rag-agent:
image: opea/agent:latest
container_name: rag-agent-endpoint
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
ports:
- "${WORKER_RAG_AGENT_PORT:-9095}:9095"
ipc: host
environment:
ip_address: ${ip_address}
strategy: rag_agent_llama
with_memory: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
tools: /home/user/tools/worker_agent_tools.yaml
require_human_feedback: false
RETRIEVAL_TOOL_URL: ${RETRIEVAL_TOOL_URL}
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-worker-agent-service"
port: 9095
worker-sql-agent:
image: opea/agent:latest
container_name: sql-agent-endpoint
volumes:
- "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
ports:
- "${WORKER_SQL_AGENT_PORT:-9096}:9096"
ipc: host
environment:
ip_address: ${ip_address}
strategy: sql_agent_llama
with_memory: false
db_name: ${db_name}
db_path: ${db_path}
use_hints: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: false
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
port: 9096
supervisor-react-agent:
image: opea/agent:latest
container_name: react-agent-endpoint
depends_on:
- worker-rag-agent
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
ports:
- "${SUPERVISOR_REACT_AGENT_PORT:-9090}:9090"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_llama
with_memory: true
recursion_limit: ${recursion_limit_supervisor}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
stream: true
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
CRAG_SERVER: ${CRAG_SERVER}
WORKER_AGENT_URL: ${WORKER_AGENT_URL}
SQL_AGENT_URL: ${SQL_AGENT_URL}
port: 9090

View File

@@ -1,47 +1,87 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
WORKPATH=$(dirname "$PWD")/..
export ip_address=${host_ip}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.4.1-rocm
export AGENTQNA_TGI_SERVICE_PORT="8085"
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""
# LLM related environment variables
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
#export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
# Set server hostname or IP address
export ip_address=${host_ip}
# Set services IP ports
export TGI_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${TGI_SERVICE_PORT}
export temperature=0.01
export max_new_tokens=512
# agent related environment variables
export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
export TOOLSET_PATH=/home/huggingface/datamonsters/amd-opea/GenAIExamples/AgentQnA/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:18881
export AGENTQNA_FRONTEND_PORT="9090"
#retrieval_tool
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:26379"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp
echo "Downloading chinook data..."
echo Y | rm -R chinook-database
git clone https://github.com/lerocha/chinook-database.git
echo Y | rm -R ../../../../../AgentQnA/tests/Chinook_Sqlite.sqlite
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite ../../../../../AgentQnA/tests
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml up -d
docker compose -f compose.yaml up -d
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-service > ${WORKPATH}/tgi_service_start.log
if grep -q Connected ${WORKPATH}/tgi_service_start.log; then
break
fi
sleep 10s
n=$((n+1))
done
echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=${CRAG_SERVER_PORT}:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0

View File

@@ -0,0 +1,88 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""
# Set server hostname or IP address
export ip_address=${host_ip}
# Set services IP ports
export VLLM_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export VLLM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${VLLM_SERVICE_PORT}
export LLM_MODEL_ID=${VLLM_LLM_MODEL_ID}
export temperature=0.01
export max_new_tokens=512
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp
echo "Downloading chinook data..."
echo Y | rm -R chinook-database
git clone https://github.com/lerocha/chinook-database.git
echo Y | rm -R ../../../../../AgentQnA/tests/Chinook_Sqlite.sqlite
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite ../../../../../AgentQnA/tests
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml up -d
docker compose -f compose_vllm.yaml up -d
n=0
until [[ "$n" -ge 500 ]]; do
docker logs vllm-service >& "${WORKPATH}"/vllm-service_start.log
if grep -q "Application startup complete" "${WORKPATH}"/vllm-service_start.log; then
break
fi
sleep 20s
n=$((n+1))
done
echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=${CRAG_SERVER_PORT}:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0

View File

@@ -14,7 +14,7 @@ export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
export temperature=0.01
@@ -44,3 +44,19 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete"
echo "Removing chinook data..."
echo Y | rm -R chinook-database
if [ -d "chinook-database" ]; then
rm -rf chinook-database
fi
echo "Chinook data removed!"
echo "Stopping CRAG server"
docker rm kdd-cup-24-crag-service --force
echo "Stopping Agent services"
docker compose -f compose.yaml down
echo "Stopping Retrieval services"
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down

View File

@@ -0,0 +1,84 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""
# Set server hostname or IP address
export ip_address=${host_ip}
# Set services IP ports
export VLLM_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export VLLM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${VLLM_SERVICE_PORT}
export LLM_MODEL_ID=${VLLM_LLM_MODEL_ID}
export temperature=0.01
export max_new_tokens=512
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"
echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp
echo "Removing chinook data..."
echo Y | rm -R chinook-database
if [ -d "chinook-database" ]; then
rm -rf chinook-database
fi
echo "Chinook data removed!"
echo "Stopping CRAG server"
docker rm kdd-cup-24-crag-service --force
echo "Stopping Agent services"
docker compose -f compose_vllm.yaml down
echo "Stopping Retrieval services"
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down