Adding files to deploy AgentQnA application on ROCm vLLM (#1613)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
Committed via GitHub
Parent: 46a29cc253
Commit: 5cc047ce34
@@ -1,101 +1,334 @@
# Single node on-prem deployment with Docker Compose on AMD GPU
# Build Mega Service of AgentQnA on AMD ROCm GPU

This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Xeon. For LLMs, we use OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).

## Build Docker Images

## Deployment with docker

### 1. Build Docker Image

1. First, clone this repo.

```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```

2. Set up the environment for this example
- #### Create application install directory and go to it:

```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
```

```bash
mkdir ~/agentqna-install && cd ~/agentqna-install
```

```
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# OPENAI_API_KEY, if you want to use OpenAI models
export OPENAI_API_KEY=<your-openai-key>
# Set AMD GPU settings
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
```

- #### Clone the repository GenAIExamples (the default repository branch "main" is used here):

3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

First, launch the mega-service.

To use a specific branch or tag of the GenAIExamples repository, run the following (replace v1.3 with the branch or tag you need):

```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```

```bash
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```

Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.

Note: when using a specific version of the code, use the README from that same version.

```
bash run_ingest_data.sh
```

- #### Go to build directory:

4. Launch Tool service

In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.

```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```

5. Launch `Agent` service

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_image_build
```

```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```

- Clean up the GenAIComps repository if it was previously cloned into this directory.
This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty:

6. [Optional] Build `Agent` docker image if pulling images failed.

```bash
rm -rf GenAIComps
```

```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent:latest -f comps/agent/src/Dockerfile .
```

- #### Clone the repository GenAIComps (the default repository branch "main" is used here):

## Validate services

First, look at the logs of the agent docker containers:

```
# worker agent
docker logs rag-agent-endpoint
```

```bash
git clone https://github.com/opea-project/GenAIComps.git
```

```
# supervisor agent
docker logs react-agent-endpoint
```

Note: when using a specific version of the code, use the README from that same version.

- #### Set the list of images to build (from the build.yaml file)

Depending on whether you deploy a vLLM-based or a TGI-based application, set the list of services as follows:

#### vLLM-based application

```bash
service_list="vllm-rocm agent agent-ui"
```

#### TGI-based application

```bash
service_list="agent agent-ui"
```

- #### Optional. Pull the TGI Docker image (do this if you want to use TGI)

```bash
docker pull ghcr.io/huggingface/text-generation-inference:3.0.0-rocm
```

- #### Build Docker Images

```bash
docker compose -f build.yaml build ${service_list} --no-cache
```

- #### Build DocIndexRetriever Docker Images

```bash
cd ~/agentqna-install/GenAIExamples/DocIndexRetriever/docker_image_build/
git clone https://github.com/opea-project/GenAIComps.git
service_list="doc-index-retriever dataprep embedding retriever reranking"
docker compose -f build.yaml build ${service_list} --no-cache
```

- #### Pull DocIndexRetriever Docker Images

```bash
docker pull redis/redis-stack:7.2.0-v9
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
```

After the build, check the list of images:

```bash
docker image ls
```

The list of images should include:

##### vLLM-based application:

- opea/vllm-rocm:latest
- opea/agent:latest
- redis/redis-stack:7.2.0-v9
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/embedding:latest
- opea/retriever:latest
- opea/reranking:latest
- opea/doc-index-retriever:latest

##### TGI-based application:

- ghcr.io/huggingface/text-generation-inference:3.0.0-rocm
- opea/agent:latest
- redis/redis-stack:7.2.0-v9
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/embedding:latest
- opea/retriever:latest
- opea/reranking:latest
- opea/doc-index-retriever:latest
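
Checking the `docker image ls` output against these lists by eye is error-prone. Below is a minimal sketch that assumes bash and grep. `missing_images` is a hypothetical helper name and is not part of the repository. It reads `repository:tag` lines on stdin (the `--format` template is standard `docker image ls` syntax) and prints each expected image that is absent.

```bash
# Hypothetical helper: print each expected image that is NOT present.
# Reads "repository:tag" lines on stdin, e.g. from:
#   docker image ls --format '{{.Repository}}:{{.Tag}}'
missing_images() {
  local available img
  available=$(cat)
  for img in "$@"; do
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$img" <<<"$available"; then
      echo "$img"
    fi
  done
}

# Usage for the vLLM-based application:
# docker image ls --format '{{.Repository}}:{{.Tag}}' | missing_images \
#   opea/vllm-rocm:latest opea/agent:latest redis/redis-stack:7.2.0-v9 \
#   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 \
#   opea/embedding:latest opea/retriever:latest opea/reranking:latest \
#   opea/doc-index-retriever:latest
```

An empty result means every listed image is available locally.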

---

## Deploy the AgentQnA Application

### Docker Compose Configuration for AMD GPUs

To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose files:

- compose_vllm.yaml - for the vLLM-based application
- compose.yaml - for the TGI-based application

```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```

You should see something like "HTTP server setup successful" if the docker containers are started successfully.

This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderDN` device nodes. For example:

Second, validate worker agent:

```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "Most recent album by Taylor Swift"
    }'
```

```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```

Third, validate supervisor agent:

**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderDN` device IDs for your GPU.
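
One way to find a matching pair on Linux is the `/dev/dri/by-path/` directory. There, udev usually creates `pci-<address>-card` and `pci-<address>-render` symlinks for each GPU. The sketch below assumes that layout, which is not guaranteed on every distribution. `list_gpu_nodes` is a hypothetical helper. Each output line gives a PCI address and the `cardN`/`renderDN` names you would put into `AGENTQNA_CARD_ID` and `AGENTQNA_RENDER_ID`.

```bash
# Hypothetical helper: print "<pci-address> <cardN> <renderDN>" for each GPU.
# Assumes udev created /dev/dri/by-path/pci-<address>-card and
# pci-<address>-render symlinks; pass another directory to inspect a copy.
list_gpu_nodes() {
  local dir=${1:-/dev/dri/by-path}
  local card_link pci card render
  for card_link in "$dir"/pci-*-card; do
    # If nothing matches, bash returns the literal pattern; skip it.
    [ -e "$card_link" ] || [ -L "$card_link" ] || continue
    pci=${card_link##*/}   # pci-0000:03:00.0-card
    pci=${pci#pci-}        # 0000:03:00.0-card
    pci=${pci%-card}       # 0000:03:00.0
    card=$(basename "$(readlink "$card_link")")
    render=$(basename "$(readlink "$dir/pci-$pci-render")")
    echo "$pci $card $render"
  done
}

# Usage on the host: list_gpu_nodes
```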

```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "query": "Most recent album by Taylor Swift"
    }'
```

### Set deploy environment variables

#### Setting variables in the operating system environment:

```bash
### Replace the string 'server_address' with your local server IP address
export host_ip='server_address'
### Replace the string 'your_huggingfacehub_token' with your Hugging Face Hub access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
### Replace the string 'your_langchain_api_key' with your LangChain API key.
export LANGCHAIN_API_KEY='your_langchain_api_key'
export LANGCHAIN_TRACING_V2=""
```
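
The launch scripts read these variables from the environment, so a missing value usually surfaces later as a failing container. Below is a small pre-flight check that assumes bash. `check_required_vars` is a hypothetical helper name. `LANGCHAIN_TRACING_V2` is left out on purpose because it is expected to be empty.

```bash
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 if all are set, 1 otherwise.
check_required_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_required_vars host_ip HUGGINGFACEHUB_API_TOKEN || echo "export the variables above first"
```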

## How to register your own tools with agent

### Start the services:

You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md).

#### If you use vLLM

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_vllm_rocm.sh
```

#### If you use TGI

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
```
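
Both launch scripts wait for the LLM server by polling `docker logs`:

- The vLLM script looks for `Application startup complete` in `vllm-service`, checking up to 500 times at 20 s intervals.
- The TGI script looks for `Connected` in `tgi-service`, checking up to 100 times at 10 s intervals.

If you start containers by hand, the same wait can be written as a reusable function. This is a sketch. `wait_for_log` is a hypothetical name, and it merges stdout and stderr because a container may log to either stream.

```bash
# Hypothetical helper: poll a container's logs until a fixed string appears.
# Arguments: container, text to wait for, max attempts (default 100), delay in seconds (default 10).
wait_for_log() {
  local container=$1 pattern=$2 attempts=${3:-100} delay=${4:-10}
  local n=0
  while [ "$n" -lt "$attempts" ]; do
    if docker logs "$container" 2>&1 | grep -qF -- "$pattern"; then
      return 0
    fi
    sleep "$delay"
    n=$((n + 1))
  done
  echo "timed out waiting for '$pattern' in $container logs" >&2
  return 1
}

# Usage, mirroring the two launch scripts:
# wait_for_log vllm-service "Application startup complete" 500 20
# wait_for_log tgi-service "Connected" 100 10
```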

All containers should be running and should not restart:

##### If you use vLLM:

- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- vllm-service

##### If you use TGI:

- dataprep-redis-server
- doc-index-retriever-server
- embedding-server
- rag-agent-endpoint
- react-agent-endpoint
- redis-vector-db
- reranking-tei-xeon-server
- retriever-redis-server
- sql-agent-endpoint
- tei-embedding-server
- tei-reranking-server
- tgi-service
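
To spot a container that has exited or is stuck in a restart loop, you can filter `docker ps -a` output. Below is a minimal sketch. `containers_not_up` is a hypothetical helper that reads `name status` lines, as produced by the `--format '{{.Names}} {{.Status}}'` template. It prints every container whose status does not start with `Up`.

```bash
# Hypothetical helper: print "<name>: <status>" for containers that are not up.
# Input lines come from: docker ps -a --format '{{.Names}} {{.Status}}'
containers_not_up() {
  local name status
  while read -r name status; do
    case "$status" in
      Up*) ;;                      # running normally
      *) echo "$name: $status" ;;  # e.g. "Restarting (1) ...", "Exited (1) ...", "Created"
    esac
  done
}

# Usage:
# docker ps -a --format '{{.Names}} {{.Status}}' | containers_not_up
```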

---

## Validate the Services

### 1. Validate the vLLM/TGI Service

#### If you use vLLM:

The launch script `launch_agent_service_vllm_rocm.sh` publishes the vLLM service on host port 18110 (`VLLM_SERVICE_PORT`).

```bash
DATA='{"model": "Intel/neural-chat-7b-v3-3", '\
'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}'

curl http://${host_ip}:${VLLM_SERVICE_PORT:-18110}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```

Check the response from the service. It should be JSON similar to the following:

```json
{
  "id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
  "object": "chat.completion",
  "created": 1742270316,
  "model": "Intel/neural-chat-7b-v3-3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": { "prompt_tokens": 66, "total_tokens": 322, "completion_tokens": 256, "prompt_tokens_details": null },
  "prompt_logprobs": null
}
```

If the `choices[0].message.content` field of the response contains meaningful text, the vLLM service has started successfully.

#### If you use TGI:

The launch script `launch_agent_service_tgi_rocm.sh` publishes the TGI service on host port 18110 (`TGI_SERVICE_PORT`).

```bash
DATA='{"inputs":"What is Deep Learning?",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'

curl http://${host_ip}:${TGI_SERVICE_PORT:-18110}/generate \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```

Check the response from the service. It should be JSON similar to the following:

```json
{
  "generated_text": " "
}
```

If the `generated_text` field of the response contains meaningful text, the TGI service has started successfully.
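
The two success criteria above can also be checked automatically. The sketch below assumes `python3` is installed on the host for JSON parsing. `llm_response_has_text` is a hypothetical helper. It returns 0 when `choices[0].message.content` (vLLM) or `generated_text` (TGI) contains non-whitespace text. It returns non-zero otherwise, including when the input is not valid JSON.

```bash
# Hypothetical helper: read an LLM response (JSON) on stdin; exit 0 if it carries text.
# Handles vLLM chat completions and TGI /generate responses.
llm_response_has_text() {
  python3 -c '
import json, sys
r = json.load(sys.stdin)
if "choices" in r:
    text = r["choices"][0]["message"].get("content") or ""
else:
    text = r.get("generated_text") or ""
sys.exit(0 if text.strip() else 1)
'
}

# Usage (vLLM):
# curl -s http://${host_ip}:${VLLM_SERVICE_PORT:-18110}/v1/chat/completions -X POST \
#   -d "$DATA" -H 'Content-Type: application/json' | llm_response_has_text && echo "LLM is ready"
```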

### 2. Validate MegaServices

Test the AgentQnA megaservice by sending a question to the supervisor agent (container `react-agent-endpoint`). Both launch scripts publish it on host port 18113 (`SUPERVISOR_REACT_AGENT_PORT`):

```bash
curl http://${host_ip}:18113/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query": "Most recent album by Taylor Swift"}'
```

### 3. Validate MicroServices

The worker agents are published on the following host ports:

- 18111 (`WORKER_RAG_AGENT_PORT`, container `rag-agent-endpoint`)
- 18112 (`WORKER_SQL_AGENT_PORT`, container `sql-agent-endpoint`)

```bash
# worker RAG agent
curl http://${host_ip}:18111/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query": "Most recent album by Taylor Swift"}'

# worker SQL agent (queries the Chinook database)
curl http://${host_ip}:18112/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query": "How many albums are in the database?"}'
```
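
Hand-writing the JSON body breaks as soon as a question contains a double quote or a backslash. The sketch below builds the `{"query": ...}` body used in the agent examples in this document with Python's `json.dumps`, assuming `python3` is available. `agent_query_json` is a hypothetical helper name.

```bash
# Hypothetical helper: print {"query": "<question>"} with proper JSON escaping.
agent_query_json() {
  python3 -c 'import json, sys; print(json.dumps({"query": sys.argv[1]}))' "$1"
}

# Usage:
# curl http://${host_ip}:18113/v1/chat/completions -X POST \
#   -H 'Content-Type: application/json' \
#   -d "$(agent_query_json 'Which album did "Taylor Swift" release most recently?')"
```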

### 4. Stop application

#### If you use vLLM

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```

#### If you use TGI

```bash
cd ~/agentqna-install/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```
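
The `docker compose ... down` commands above stop only the AgentQnA compose project. Both launch scripts also start two things that keep running afterwards:

- the DocIndexRetriever stack, from `DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml`
- the CRAG mock API container `kdd-cup-24-crag-service`, started with `docker run`

Below is a sketch of a fuller teardown, meant to be run from `AgentQnA/docker_compose/amd/gpu/rocm`. `stop_agentqna` and its `DRY_RUN` switch are hypothetical conveniences.

```bash
# Hypothetical helper: stop everything the launch scripts start.
# Run from AgentQnA/docker_compose/amd/gpu/rocm; DRY_RUN=1 prints commands instead.
stop_agentqna() {
  local backend=${1:-} compose_file run=""
  case "$backend" in
    vllm) compose_file=compose_vllm.yaml ;;
    tgi) compose_file=compose.yaml ;;
    *) echo "usage: stop_agentqna vllm|tgi" >&2; return 2 ;;
  esac
  # With DRY_RUN=1 every command below is prefixed with echo, so nothing runs.
  if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
  $run docker compose -f "$compose_file" down
  $run docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down
  # The CRAG mock API was started with `docker run`, not by compose.
  $run docker rm -f kdd-cup-24-crag-service
}

# Usage: DRY_RUN=1 stop_agentqna vllm   # preview
#        stop_agentqna vllm             # execute
```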

@@ -1,26 +1,24 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Copyright (C) 2025 Advanced Micro Devices, Inc.

services:
  agent-tgi-server:
    image: ${AGENTQNA_TGI_IMAGE}
    container_name: agent-tgi-server
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:3.0.0-rocm
    container_name: tgi-service
    ports:
      - "${AGENTQNA_TGI_SERVICE_PORT-8085}:80"
      - "${TGI_SERVICE_PORT-8085}:80"
    volumes:
      - ${HF_CACHE_DIR:-/var/opea/agent-service/}:/data
      - "${MODEL_CACHE:-./data}:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${AGENTQNA_TGI_SERVICE_PORT}"
      TGI_LLM_ENDPOINT: "http://${ip_address}:${TGI_SERVICE_PORT}"
      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    shm_size: 1g
    shm_size: 32g
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/${AGENTQNA_CARD_ID}:/dev/dri/${AGENTQNA_CARD_ID}
      - /dev/dri/${AGENTQNA_RENDER_ID}:/dev/dri/${AGENTQNA_RENDER_ID}
      - /dev/dri:/dev/dri
    cap_add:
      - SYS_PTRACE
    group_add:
@@ -34,14 +32,14 @@ services:
    image: opea/agent:latest
    container_name: rag-agent-endpoint
    volumes:
      # - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
      - ${TOOLSET_PATH}:/home/user/tools/
      - "${TOOLSET_PATH}:/home/user/tools/"
    ports:
      - "9095:9095"
      - "${WORKER_RAG_AGENT_PORT:-9095}:9095"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: rag_agent_llama
      with_memory: false
      recursion_limit: ${recursion_limit_worker}
      llm_engine: tgi
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -61,21 +59,49 @@ services:
      LANGCHAIN_PROJECT: "opea-worker-agent-service"
      port: 9095

  worker-sql-agent:
    image: opea/agent:latest
    container_name: sql-agent-endpoint
    volumes:
      - "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
    ports:
      - "${WORKER_SQL_AGENT_PORT:-9096}:9096"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: sql_agent_llama
      with_memory: false
      db_name: ${db_name}
      db_path: ${db_path}
      use_hints: false
      recursion_limit: ${recursion_limit_worker}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${temperature}
      max_new_tokens: ${max_new_tokens}
      stream: false
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      port: 9096

  supervisor-react-agent:
    image: opea/agent:latest
    container_name: react-agent-endpoint
    depends_on:
      - agent-tgi-server
      - worker-rag-agent
    volumes:
      # - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
      - ${TOOLSET_PATH}:/home/user/tools/
      - "${TOOLSET_PATH}:/home/user/tools/"
    ports:
      - "${AGENTQNA_FRONTEND_PORT}:9090"
      - "${SUPERVISOR_REACT_AGENT_PORT:-9090}:9090"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_langgraph
      strategy: react_llama
      with_memory: true
      recursion_limit: ${recursion_limit_supervisor}
      llm_engine: tgi
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -83,7 +109,7 @@ services:
      model: ${LLM_MODEL_ID}
      temperature: ${temperature}
      max_new_tokens: ${max_new_tokens}
      stream: false
      stream: true
      tools: /home/user/tools/supervisor_agent_tools.yaml
      require_human_feedback: false
      no_proxy: ${no_proxy}
@@ -92,6 +118,7 @@ services:
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
      LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
      CRAG_SERVER: $CRAG_SERVER
      WORKER_AGENT_URL: $WORKER_AGENT_URL
      CRAG_SERVER: ${CRAG_SERVER}
      WORKER_AGENT_URL: ${WORKER_AGENT_URL}
      SQL_AGENT_URL: ${SQL_AGENT_URL}
      port: 9090

128
AgentQnA/docker_compose/amd/gpu/rocm/compose_vllm.yaml
Normal file
@@ -0,0 +1,128 @@
# Copyright (C) 2025 Advanced Micro Devices, Inc.

services:
  vllm-service:
    image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}
    container_name: vllm-service
    ports:
      - "${VLLM_SERVICE_PORT:-8081}:8011"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      VLLM_USE_TRITON_FLASH_ATTENTION: 0
      PYTORCH_JIT: 0
    volumes:
      - "${MODEL_CACHE:-./data}:/data"
    shm_size: 20G
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/:/dev/dri/
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
      - apparmor=unconfined
    command: "--model ${VLLM_LLM_MODEL_ID} --swap-space 16 --disable-log-requests --dtype float16 --tensor-parallel-size 4 --host 0.0.0.0 --port 8011 --num-scheduler-steps 1 --distributed-executor-backend \"mp\""
    ipc: host

  worker-rag-agent:
    image: opea/agent:latest
    container_name: rag-agent-endpoint
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
    ports:
      - "${WORKER_RAG_AGENT_PORT:-9095}:9095"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: rag_agent_llama
      with_memory: false
      recursion_limit: ${recursion_limit_worker}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${temperature}
      max_new_tokens: ${max_new_tokens}
      stream: false
      tools: /home/user/tools/worker_agent_tools.yaml
      require_human_feedback: false
      RETRIEVAL_TOOL_URL: ${RETRIEVAL_TOOL_URL}
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
      LANGCHAIN_PROJECT: "opea-worker-agent-service"
      port: 9095

  worker-sql-agent:
    image: opea/agent:latest
    container_name: sql-agent-endpoint
    volumes:
      - "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
    ports:
      - "${WORKER_SQL_AGENT_PORT:-9096}:9096"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: sql_agent_llama
      with_memory: false
      db_name: ${db_name}
      db_path: ${db_path}
      use_hints: false
      recursion_limit: ${recursion_limit_worker}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${temperature}
      max_new_tokens: ${max_new_tokens}
      stream: false
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      port: 9096

  supervisor-react-agent:
    image: opea/agent:latest
    container_name: react-agent-endpoint
    depends_on:
      - worker-rag-agent
    volumes:
      - ${TOOLSET_PATH}:/home/user/tools/
    ports:
      - "${SUPERVISOR_REACT_AGENT_PORT:-9090}:9090"
    ipc: host
    environment:
      ip_address: ${ip_address}
      strategy: react_llama
      with_memory: true
      recursion_limit: ${recursion_limit_supervisor}
      llm_engine: vllm
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      llm_endpoint_url: ${LLM_ENDPOINT_URL}
      model: ${LLM_MODEL_ID}
      temperature: ${temperature}
      max_new_tokens: ${max_new_tokens}
      stream: true
      tools: /home/user/tools/supervisor_agent_tools.yaml
      require_human_feedback: false
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
      LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
      CRAG_SERVER: ${CRAG_SERVER}
      WORKER_AGENT_URL: ${WORKER_AGENT_URL}
      SQL_AGENT_URL: ${SQL_AGENT_URL}
      port: 9090
@@ -1,47 +1,87 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

WORKPATH=$(dirname "$PWD")/..
export ip_address=${host_ip}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.4.1-rocm
export AGENTQNA_TGI_SERVICE_PORT="8085"
# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""

# LLM related environment variables
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
#export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
# Set server hostname or IP address
export ip_address=${host_ip}

# Set services IP ports
export TGI_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"

export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${TGI_SERVICE_PORT}
export temperature=0.01
export max_new_tokens=512

# agent related environment variables
export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
export TOOLSET_PATH=/home/huggingface/datamonsters/amd-opea/GenAIExamples/AgentQnA/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:18881

export AGENTQNA_FRONTEND_PORT="9090"

#retrieval_tool
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:26379"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"

echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp

echo "Downloading chinook data..."
echo Y | rm -R chinook-database
git clone https://github.com/lerocha/chinook-database.git
echo Y | rm -R ../../../../../AgentQnA/tests/Chinook_Sqlite.sqlite
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite ../../../../../AgentQnA/tests

docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml up -d
docker compose -f compose.yaml up -d

n=0
until [[ "$n" -ge 100 ]]; do
    docker logs tgi-service > ${WORKPATH}/tgi_service_start.log
    if grep -q Connected ${WORKPATH}/tgi_service_start.log; then
        break
    fi
    sleep 10s
    n=$((n+1))
done

echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=${CRAG_SERVER_PORT}:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0

@@ -0,0 +1,88 @@
|
||||
# Copyright (C) 2024 Advanced Micro Devices, Inc.
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
# Before start script:
|
||||
# export host_ip="your_host_ip_or_host_name"
|
||||
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
|
||||
# export LANGCHAIN_API_KEY="your_langchain_api_key"
|
||||
# export LANGCHAIN_TRACING_V2=""
|
||||
|
||||
# Set server hostname or IP address
|
||||
export ip_address=${host_ip}
|
||||
|
||||
# Set services IP ports
|
||||
export VLLM_SERVICE_PORT="18110"
|
||||
export WORKER_RAG_AGENT_PORT="18111"
|
||||
export WORKER_SQL_AGENT_PORT="18112"
|
||||
export SUPERVISOR_REACT_AGENT_PORT="18113"
|
||||
export CRAG_SERVER_PORT="18114"
|
||||
|
||||
export WORKPATH=$(dirname "$PWD")
|
||||
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export VLLM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${VLLM_SERVICE_PORT}
export LLM_MODEL_ID=${VLLM_LLM_MODEL_ID}
export temperature=0.01
export max_new_tokens=512
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"

echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp

echo "Downloading chinook data..."
echo Y | rm -R chinook-database
git clone https://github.com/lerocha/chinook-database.git
echo Y | rm -R ../../../../../AgentQnA/tests/Chinook_Sqlite.sqlite
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite ../../../../../AgentQnA/tests

docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml up -d
docker compose -f compose_vllm.yaml up -d

n=0
until [[ "$n" -ge 500 ]]; do
    docker logs vllm-service >& "${WORKPATH}"/vllm-service_start.log
    if grep -q "Application startup complete" "${WORKPATH}"/vllm-service_start.log; then
        break
    fi
    sleep 20s
    n=$((n+1))
done

echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=${CRAG_SERVER_PORT}:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
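The launch script above waits for vLLM by copying `docker logs vllm-service` into `vllm-service_start.log` every 20 seconds, for at most 500 attempts (10,000 s, roughly 2 h 47 min), until the log contains `Application startup complete`. If that line never appears, the loop simply ends and the script goes on to start the CRAG server anyway. The following is a minimal sketch of the same polling pattern as a reusable Bash function that returns a non-zero status on timeout. The name `wait_for_pattern` and its argument order are hypothetical and not part of the AgentQnA scripts; the log source is passed in as a command so the function can be exercised without Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the AgentQnA scripts): run a command
# repeatedly until its combined stdout/stderr contains a fixed string.
#
# Usage: wait_for_pattern <pattern> <max_attempts> <interval> <command> [args...]
# Returns 0 as soon as <pattern> is seen, 1 if every attempt misses.
wait_for_pattern() {
  local pattern="$1" max_attempts="$2" interval="$3"
  shift 3
  local attempt output
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # Capture stdout and stderr together, like '>&' in the original loop.
    # '|| true' keeps a failing command from aborting under 'set -e'.
    output="$("$@" 2>&1)" || true
    # -F: match a fixed string, not a regex; -q: report via exit status only.
    if grep -qF -- "$pattern" <<<"$output"; then
      return 0
    fi
    # Sleep only when another attempt remains.
    if (( attempt < max_attempts )); then
      sleep "$interval"
    fi
  done
  return 1
}
```

With the values used in the launch script, a call might look like `wait_for_pattern "Application startup complete" 500 20s docker logs vllm-service || exit 1` (the `20s` suffix assumes GNU `sleep`). Matching the captured output in memory, rather than piping the command straight into `grep -q`, avoids a spurious failure under `set -o pipefail` when `grep` exits early and the log producer receives `SIGPIPE`.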
@@ -14,7 +14,7 @@ export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
export temperature=0.01
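In this hunk, `LLM_ENDPOINT_URL` is assembled from `ip_address` and `AGENTQNA_TGI_SERVICE_PORT`, neither of which is assigned in the lines shown. If either is unset, Bash expands it to an empty string and the URL silently degrades (to `http://:` when both are missing). The following is a minimal sketch of a fail-fast check that could run before such URLs are built; `require_vars` is a hypothetical helper, not something the AgentQnA scripts define.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if any of the named variables is unset or empty,
# reporting every missing name instead of stopping at the first one.
# Usage: require_vars ip_address AGENTQNA_TGI_SERVICE_PORT || exit 1
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is Bash indirect expansion: the value of the variable whose
    # name is held in $name, or an empty string if that variable is unset.
    if [[ -z "${!name:-}" ]]; then
      echo "error: required variable '$name' is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Placing `require_vars ip_address AGENTQNA_TGI_SERVICE_PORT || exit 1` before the `LLM_ENDPOINT_URL` export would stop the script with one error line per missing variable. Collecting all missing names in one pass, rather than writing `${VAR:?}` for each variable, is a design choice that saves repeated runs when several variables are absent.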
@@ -44,3 +44,19 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete"

echo "Removing chinook data..."
echo Y | rm -R chinook-database
if [ -d "chinook-database" ]; then
    rm -rf chinook-database
fi
echo "Chinook data removed!"

echo "Stopping CRAG server"
docker rm kdd-cup-24-crag-service --force

echo "Stopping Agent services"
docker compose -f compose.yaml down

echo "Stopping Retrieval services"
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down
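The teardown added in this hunk prints a label and then runs one command per step. Two details are worth noting. First, `rm` only prompts when given `-i` or, for a write-protected file, when standard input is a terminal, so piping `Y` into `rm -R` has no real effect; with `-f`, `rm` exits successfully without a diagnostic when the path does not exist, which makes `rm -rf -- chinook-database` safe to repeat on its own. Second, the lines shown do not enable `set -e`, so a failing step does not abort the script, but if errexit were turned on, one missing container would skip every later step. The following is a minimal sketch of a hypothetical `run_step` helper that keeps the label-then-command pattern, reports a failed step on standard error, and carries on.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a progress label, run one teardown command, and
# continue even if that command fails, so later steps still run.
# Usage: run_step "Stopping CRAG server" docker rm kdd-cup-24-crag-service --force
run_step() {
  local label="$1"
  shift
  echo "$label"
  # On failure, $? is expanded before echo runs, so it holds the exit status
  # of "$@". The warning goes to stderr and the function itself returns 0.
  "$@" || echo "warning: '$*' exited with status $?; continuing" >&2
}
```

For example, the chinook cleanup could become `run_step "Removing chinook data..." rm -rf -- chinook-database`, and the container removal `run_step "Stopping CRAG server" docker rm kdd-cup-24-crag-service --force`.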
@@ -0,0 +1,84 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0


# Before start script:
# export host_ip="your_host_ip_or_host_name"
# export HUGGINGFACEHUB_API_TOKEN="your_huggingface_api_token"
# export LANGCHAIN_API_KEY="your_langchain_api_key"
# export LANGCHAIN_TRACING_V2=""

# Set server hostname or IP address
export ip_address=${host_ip}

# Set services IP ports
export VLLM_SERVICE_PORT="18110"
export WORKER_RAG_AGENT_PORT="18111"
export WORKER_SQL_AGENT_PORT="18112"
export SUPERVISOR_REACT_AGENT_PORT="18113"
export CRAG_SERVER_PORT="18114"

export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../../
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export VLLM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HF_CACHE_DIR="./data"
export MODEL_CACHE="./data"
export TOOLSET_PATH=${WORKPATH}/../../../tools/
export recursion_limit_worker=12
export LLM_ENDPOINT_URL=http://${ip_address}:${VLLM_SERVICE_PORT}
export LLM_MODEL_ID=${VLLM_LLM_MODEL_ID}
export temperature=0.01
export max_new_tokens=512
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export LANGCHAIN_API_KEY=${LANGCHAIN_API_KEY}
export LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2}
export db_name=Chinook
export db_path="sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export CRAG_SERVER=http://${ip_address}:${CRAG_SERVER_PORT}
export WORKER_AGENT_URL="http://${ip_address}:${WORKER_RAG_AGENT_PORT}/v1/chat/completions"
export SQL_AGENT_URL="http://${ip_address}:${WORKER_SQL_AGENT_PORT}/v1/chat/completions"
export HF_CACHE_DIR=${HF_CACHE_DIR}
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export no_proxy=${no_proxy}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export RERANK_TYPE="tei"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/ingest"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete"

echo ${WORKER_RAG_AGENT_PORT} > ${WORKPATH}/WORKER_RAG_AGENT_PORT_tmp
echo ${WORKER_SQL_AGENT_PORT} > ${WORKPATH}/WORKER_SQL_AGENT_PORT_tmp
echo ${SUPERVISOR_REACT_AGENT_PORT} > ${WORKPATH}/SUPERVISOR_REACT_AGENT_PORT_tmp
echo ${CRAG_SERVER_PORT} > ${WORKPATH}/CRAG_SERVER_PORT_tmp

echo "Removing chinook data..."
echo Y | rm -R chinook-database
if [ -d "chinook-database" ]; then
    rm -rf chinook-database
fi
echo "Chinook data removed!"

echo "Stopping CRAG server"
docker rm kdd-cup-24-crag-service --force

echo "Stopping Agent services"
docker compose -f compose_vllm.yaml down

echo "Stopping Retrieval services"
docker compose -f ../../../../../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml down
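Both the launch script and this new stop script write the agent and CRAG ports to files named `${WORKPATH}/<VARIABLE>_tmp`, for example `CRAG_SERVER_PORT_tmp`, presumably so that other scripts can recover the ports later; which scripts read them is not shown in this diff. The following is a minimal sketch of reading one of these files back and rejecting anything that is not a valid TCP port, using a hypothetical `read_port_file` helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the port stored in one of the *_tmp files
# written by the AgentQnA scripts, or fail if it is missing or invalid.
# Usage: read_port_file "${WORKPATH}/CRAG_SERVER_PORT_tmp"
read_port_file() {
  local file="$1" port=""
  if [[ ! -r "$file" ]]; then
    echo "error: cannot read '$file'" >&2
    return 1
  fi
  # Read the first line; 'read' drops the newline that 'echo' appended.
  IFS= read -r port < "$file" || true
  # Accept only decimal digits in the TCP port range 1-65535.
  # 10# forces base 10 so that a value such as "08" is not parsed as octal.
  if [[ "$port" =~ ^[0-9]+$ ]] && (( 10#$port >= 1 && 10#$port <= 65535 )); then
    printf '%s\n' "$port"
  else
    echo "error: '$file' does not contain a valid port: '$port'" >&2
    return 1
  fi
}
```

A caller might use `CRAG_SERVER_PORT=$(read_port_file "${WORKPATH}/CRAG_SERVER_PORT_tmp") || exit 1`. Validating the value matters because `echo ${VAR} > file` still creates the file, containing only a newline, when the variable is empty.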