Compare commits

...

52 Commits

Author SHA1 Message Date
NeuralChatBot
bbb4e231d0 Freeze OPEA images tag
Signed-off-by: NeuralChatBot <grp_neural_chat_bot@intel.com>
2024-11-21 14:24:16 +00:00
bjzhjing
da10068964 Adjustments for helm release change (#1173)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
(cherry picked from commit ef2047b070)
2024-11-21 16:57:30 +08:00
Letong Han
188b568467 Fix Translation Manifest CI with MODEL_ID (#1169)
Signed-off-by: letonghan <letong.han@intel.com>
(cherry picked from commit 94231584aa)
2024-11-21 16:57:29 +08:00
minmin-intel
9e9af9766f Fix DocIndexRetriever CI error on Xeon (#1167)
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit c5177c5e2f)
2024-11-21 16:57:28 +08:00
chen, suyue
cc108b5a18 Fix DBQnA image build (#1165)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-20 10:56:49 +08:00
chen, suyue
f70d9c3853 chatqna benchmark for v1.1 release (#1120)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-19 22:57:25 +08:00
ZePan110
8808b51e42 Rename image name XXX-hpu to XXX-gaudi (#1154)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-11-19 22:18:41 +08:00
chen, suyue
17d4b0c97f freeze nodejs version in CI test (#1162)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-19 13:22:56 +08:00
Sun, Xuehao
3a03d31f8f Update manual-freeze-tag workflow (#1161)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
2024-11-19 11:00:36 +08:00
dependabot[bot]
179fd84362 Bump gradio from 4.44.0 to 5.5.0 in /DocSum/ui/gradio (#1157)
Signed-off-by: dependabot[bot] <support@github.com>
2024-11-18 23:50:56 +08:00
chen, suyue
9ba034b22d fix the docker image name for release image build (#1152)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-18 23:48:01 +08:00
jotpalch
c3e6f43ece Fix command in README for deploying ChatQnA application (#1156) 2024-11-18 22:59:22 +08:00
Theresa
1ac756a1c7 Rename the GraphRAG UI image (#1155)
Signed-off-by: ichbinblau <theresa.shan@intel.com>
2024-11-18 20:07:22 +08:00
sgurunat
56f770cb28 ChatQnA with Remote Inference Endpoints (Kubernetes) (#1149)
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 20:06:17 +08:00
XinyaoWa
0cdeb946e4 DocSum Manifest support multimedia (#1158)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 18:46:01 +08:00
Artem Astafev
5648839411 Add compose example for FaqGen AMD ROCm (#1126)
Signed-off-by: artem-astafev <a.astafev@datamonsters.com>
2024-11-18 17:38:21 +08:00
Mustafa
eb91d1f054 Docsum (#1095)
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: XinyaoWa <xinyao.wang@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 17:15:42 +08:00
Wang, Kai Lawrence
2587179224 Add instructions of modifying reranking docker image for NVGPU (#1133)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 15:37:32 +08:00
chyundunovDatamonsters
7e62175c2e Adding files to deploy CodeTrans application on AMD GPU (#1138)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
2024-11-18 14:58:38 +08:00
Louie Tsai
152adf8012 maintain a version info for docker_compose yaml files among release (#1141)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2024-11-17 22:39:41 -08:00
chyundunovDatamonsters
83172e9a99 Adding files to deploy CodeGen application on AMD GPU (#1130)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 14:36:23 +08:00
Liang Lv
fb514bb8ba Add chatqna wrapper for multiple model selection (#1144)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Ying Hu <ying.hu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 10:48:09 +08:00
Artem Astafev
b1bb6db52d Add compose example for DocSum amd rocm deployment (#1125)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 09:09:12 +08:00
rui2zhang
7949045176 EdgeCraftRAG: Add E2E test cases for EdgeCraftRAG - local LLM and vllm (#1137)
Signed-off-by: Zhang, Rui <rui2.zhang@intel.com>
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mingyuan Qi <mingyuan.qi@intel.com>
2024-11-17 18:22:32 +08:00
Lianhao Lu
cbe952ec5e Fail CI manifest test if response content is not expected (#1145)
Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-11-17 12:46:31 +08:00
chen, suyue
3b1a9fe9e1 optimize hardware list for test (#1151)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-15 22:46:02 +08:00
chen, suyue
e66d7fe381 fix typo involved in ci workflow (#1150)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-15 21:19:29 +08:00
Artem Astafev
6d3a017609 Add compose example for ChatQnA AMD ROCm deployment (#1122)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-15 17:24:06 +08:00
Ying Hu
dbf4ba03fa Update AgentQnA README.md for refactor doc structure (#1146)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-15 16:30:13 +08:00
XinyaoWa
4f96d9e605 vllm hpu fix version for bug fix (#1142)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-11-15 15:12:53 +08:00
Ying Hu
a8f4245384 Update README.md for usage experience (#1135)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-11-15 14:23:12 +08:00
Mingyuan Qi
096a37aacc EdgeCraftRAG: Fix multiple issues (#1143)
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-15 14:01:27 +08:00
rbrugaro
6f8fa6a689 Grag ex1.1 (#1123)
Signed-off-by: Rita Brugarolas <rita.brugarolas.brufau@intel.com>
Signed-off-by: theresa <theresa.shan@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: theresa <theresa.shan@intel.com>
2024-11-15 13:17:06 +08:00
Letong Han
39f68d5d6b Fix SearchQnA CI Issue (#1134)
Signed-off-by: letonghan <letong.han@intel.com>
2024-11-15 10:01:27 +08:00
Louie Tsai
00d9bb6128 Enable vLLM Profiling for ChatQnA on Gaudi (#1128)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2024-11-14 15:46:33 -08:00
Abolfazl Shahbazi
59b624c677 Fix minor documentation build issue (#1139)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-11-14 15:29:50 -08:00
chen, suyue
2b2c7ee2f5 upgrade setuptools version to fix CVE-2024-6345 (#999)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-14 14:57:16 +08:00
Hoong Tee, Yeoh
6b9a27dd83 DBQnA: Include workflow in README (#956)
Signed-off-by: Yeoh, Hoong Tee <hoong.tee.yeoh@intel.com>
2024-11-14 14:05:28 +08:00
Yi Yao
5720cd45c0 Add benchmark launcher for AudioQnA (#981)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-14 13:58:51 +08:00
XinyaoWa
73879d3cec fix faq ui bug (#1118)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-14 10:00:30 +08:00
Lucas Melo
7c9ed04132 ChatQnA - Add Terraform and Ansible Modules information (#970)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: lucasmelogithub <lucas.melo@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
2024-11-13 11:42:12 -08:00
lvliang-intel
9ff7df9202 Use fixed version of TEI Gaudi for stability (#1101)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
2024-11-13 10:45:50 -08:00
Abolfazl Shahbazi
b5f95f735e Fix missing end of file chars (#1106)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-13 09:40:53 -08:00
chen, suyue
393367e9f1 Fix left issue of tgi version update (#1121)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-13 15:42:42 +08:00
Louie Tsai
7adbba6add Enable vLLM Profiling for ChatQnA (#1124) 2024-11-13 11:26:31 +08:00
pallavijaini0525
0d52c2f003 Pinecone update to Readme and docker compose for ChatQnA (#540)
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
Signed-off-by: AI Workloads <aigoldrush1@g2-r3-2.iind.intel.com>
Signed-off-by: Pallavi Jaini <pallavi,jaini@intel.com>
Signed-off-by: Pallavi Jaini <pallavi.jaini@intel.com>
Signed-off-by: root <root@test-pjaini.535545281608.us-region-2.idcservice.net>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: AI Workloads <aigoldrush1@g2-r3-2.iind.intel.com>
Co-authored-by: Pallavi Jaini <pallavi,jaini@intel.com>
Co-authored-by: root <root@test-pjaini.535545281608.us-region-2.idcservice.net>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-13 09:32:37 +08:00
lvliang-intel
1ff85f6a85 Upgrade TGI Gaudi version to v2.0.6 (#1088)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-12 14:38:22 +08:00
bjzhjing
f7a7f8aa3f Fix typo (#1117)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-12 09:54:05 +08:00
lvliang-intel
e3187be819 Update ChatQnA manifests using always pull image policy (#1100)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2024-11-11 14:37:14 +08:00
Sihan Chen
abd9d12937 Fix non stream case (#1115)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-11 14:18:42 +08:00
bjzhjing
a7353bbaa4 Refine performance directory (#1017)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-11 13:58:46 +08:00
Letong Han
aa314f6757 [Readme] Update ChatQnA Readme for LLM Endpoint (#1086)
Signed-off-by: letonghan <letong.han@intel.com>
2024-11-11 13:53:06 +08:00
373 changed files with 11431 additions and 1088 deletions

View File

@@ -77,9 +77,9 @@ jobs:
git clone https://github.com/vllm-project/vllm.git
cd vllm && git rev-parse HEAD && cd ../
fi
if [[ $(grep -c "vllm-hpu:" ${docker_compose_path}) != 0 ]]; then
if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork && git rev-parse HEAD && cd ../
cd vllm-fork && git checkout 3c39626 && cd ../
fi
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps && git checkout ${{ inputs.opea_branch }} && git rev-parse HEAD && cd ../

View File

@@ -14,7 +14,7 @@ on:
test_mode:
required: false
type: string
default: 'docker_compose'
default: 'compose'
outputs:
run_matrix:
description: "The matrix string"

View File

@@ -1,13 +1,13 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Freeze OPEA images release tag in readme on manual event
name: Freeze OPEA images release tag
on:
workflow_dispatch:
inputs:
tag:
default: "latest"
default: "1.1.0"
description: "Tag to apply to images"
required: true
type: string
@@ -23,10 +23,6 @@ jobs:
fetch-depth: 0
ref: ${{ github.ref }}
- uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Set up Git
run: |
git config --global user.name "NeuralChatBot"
@@ -35,9 +31,10 @@ jobs:
- name: Run script
run: |
find . -name "*.md" | xargs sed -i "s|^docker\ compose|TAG=${{ github.event.inputs.tag }}\ docker\ compose|g"
find . -type f -name "*.yaml" \( -path "*/benchmark/*" -o -path "*/kubernetes/*" \) | xargs sed -i -E 's/(opea\/[A-Za-z0-9\-]*:)latest/\1${{ github.event.inputs.tag }}/g'
find . -type f -name "*.md" \( -path "*/benchmark/*" -o -path "*/kubernetes/*" \) | xargs sed -i -E 's/(opea\/[A-Za-z0-9\-]*:)latest/\1${{ github.event.inputs.tag }}/g'
IFS='.' read -r major minor patch <<< "${{ github.event.inputs.tag }}"
echo "VERSION_MAJOR ${major}" > version.txt
echo "VERSION_MINOR ${minor}" >> version.txt
echo "VERSION_PATCH ${patch}" >> version.txt
- name: Commit changes
run: |
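For reference, a minimal sketch of what the tag-splitting step above produces, assuming the workflow is dispatched with tag `1.1.0` (the value is illustrative):
```bash
# Illustrative only: mirrors the IFS split used in the workflow step above
IFS='.' read -r major minor patch <<< "1.1.0"
printf 'VERSION_MAJOR %s\nVERSION_MINOR %s\nVERSION_PATCH %s\n' "$major" "$minor" "$patch" > version.txt
cat version.txt
# VERSION_MAJOR 1
# VERSION_MINOR 1
# VERSION_PATCH 0
```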

View File

@@ -8,7 +8,8 @@ on:
branches: [ 'main' ]
paths:
- "**.py"
- "**Dockerfile"
- "**Dockerfile*"
- "**docker_image_build/build.yaml"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-on-push
@@ -18,7 +19,7 @@ jobs:
job1:
uses: ./.github/workflows/_get-test-matrix.yml
with:
test_mode: "docker_image_build/build.yaml"
test_mode: "docker_image_build"
image-build:
needs: job1

View File

@@ -16,8 +16,13 @@ for example in ${examples}; do
if [[ ! $(find . -type f | grep ${test_mode}) ]]; then continue; fi
cd tests
ls -l
hardware_list=$(find . -type f -name "test_compose*_on_*.sh" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}'| sort -u)
echo "Test supported hardware list = ${hardware_list}"
if [[ "$test_mode" == "docker_image_build" ]]; then
find_name="test_manifest_on_*.sh"
else
find_name="test_${test_mode}*_on_*.sh"
fi
hardware_list=$(find . -type f -name "${find_name}" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}'| sort -u)
echo -e "Test supported hardware list: \n${hardware_list}"
run_hardware=""
if [[ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | cut -d'/' -f2 | grep -E '*.py|Dockerfile*|ui|docker_image_build' ) ]]; then
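A minimal sketch of how the pipeline above derives the hardware list from test script names (the file names below are assumptions for illustration):
```bash
# Illustrative only: the same cut/awk pipeline applied to assumed file names
printf '%s\n' ./test_compose_on_xeon.sh ./test_compose_vllm_on_gaudi.sh ./test_compose_on_gaudi.sh \
  | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}' | sort -u
# gaudi
# xeon
```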

16
.set_env.sh Normal file
View File

@@ -0,0 +1,16 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
#To announce the version of the code, create a version.txt with the following format.
#VERSION_MAJOR 1
#VERSION_MINOR 0
#VERSION_PATCH 0
VERSION_FILE="version.txt"
if [ -f $VERSION_FILE ]; then
VER_OPEA_MAJOR=$(grep "VERSION_MAJOR" $VERSION_FILE | cut -d " " -f 2)
VER_OPEA_MINOR=$(grep "VERSION_MINOR" $VERSION_FILE | cut -d " " -f 2)
VER_OPEA_PATCH=$(grep "VERSION_PATCH" $VERSION_FILE | cut -d " " -f 2)
export TAG=$VER_OPEA_MAJOR.$VER_OPEA_MINOR
echo OPEA Version:$TAG
fi
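A brief usage sketch, assuming a version.txt such as the one described above sits next to the script; the exported TAG can then be passed to docker compose commands that reference it:
```bash
# Illustrative only: consume the exported TAG after sourcing the script
source .set_env.sh                # prints "OPEA Version:1.1" for VERSION_MAJOR 1 / VERSION_MINOR 1
echo "Using image tag: ${TAG}"
TAG=${TAG} docker compose up -d   # assumes the compose file references ${TAG}
```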

View File

@@ -83,29 +83,32 @@ flowchart LR
## Deployment with docker
1. Build agent docker image
1. Build agent docker image [Optional]
Note: this is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling images failed.
> [!NOTE]
> the step is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling images failed.
First, clone the opea GenAIComps repo.
First, clone the opea GenAIComps repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIComps.git
```
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIComps.git
```
Then build the agent docker image. Both the supervisor agent and the worker agent will use the same docker image, but when we launch the two agents we will specify different strategies and register different tools.
Then build the agent docker image. Both the supervisor agent and the worker agent will use the same docker image, but when we launch the two agents we will specify different strategies and register different tools.
```
cd GenAIComps
docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
```
```
cd GenAIComps
docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
```
2. Set up environment for this example </br>
First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
@@ -113,6 +116,14 @@ flowchart LR
Second, set up env vars.
```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# for using open-source llms
export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
@@ -147,6 +158,12 @@ flowchart LR
5. Launch agent services</br>
We provide two options for `llm_engine` of the agents: 1. open-source LLMs, 2. OpenAI models via API calls.
Deploy it on Gaudi or Xeon respectively
::::{tab-set}
:::{tab-item} Gaudi
:sync: Gaudi
To use open-source LLMs on Gaudi2, run commands below.
```
@@ -155,6 +172,10 @@ flowchart LR
bash launch_agent_service_tgi_gaudi.sh
```
:::
:::{tab-item} Xeon
:sync: Xeon
To use OpenAI models, run commands below.
```
@@ -162,6 +183,9 @@ flowchart LR
bash launch_agent_service_openai.sh
```
:::
::::
## Validate services
First look at logs of the agent docker containers:
@@ -181,7 +205,7 @@ You should see something like "HTTP server setup successful" if the docker conta
Second, validate worker agent:
```
curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
@@ -189,7 +213,7 @@ curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: app
Third, validate supervisor agent:
```
curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```

View File

@@ -1,3 +1,100 @@
# Deployment on Xeon
# Single node on-prem deployment with Docker Compose on Xeon Scalable processors
We deploy the retrieval tool on Xeon. For LLMs, we support OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Xeon. For LLMs, we use OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
## Deployment with docker
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up environment for this example </br>
```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
#OPENAI_API_KEY if you want to use OpenAI models
export OPENAI_API_KEY=<your-openai-key>
```
3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
First, launch the mega-service.
```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```
Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.
```
bash run_ingest_data.sh
```
4. Launch Tool service
In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```
5. Launch `Agent` service
The configurations of the supervisor agent and the worker agent are defined in the docker-compose yaml file. We currently use OpenAI GPT-4o-mini as the LLM here, and llama3.1-70B-instruct (served by TGI-Gaudi) in the Gaudi example. To use the OpenAI LLM, run the command below.
```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
bash launch_agent_service_openai.sh
```
6. [Optional] Build `Agent` docker image if pulling images failed.
```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent-langchain:latest -f comps/agent/langchain/Dockerfile .
```
## Validate services
First look at logs of the agent docker containers:
```
# worker agent
docker logs rag-agent-endpoint
```
```
# supervisor agent
docker logs react-agent-endpoint
```
You should see something like "HTTP server setup successful" if the docker containers are started successfully.</p>
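A quick, optional check that both agent containers report a healthy startup (container names taken from the commands above; the exact log wording may differ slightly):
```bash
# Optional sanity check: look for the startup message in both agent containers
docker logs rag-agent-endpoint 2>&1 | grep -i "HTTP server setup successful"
docker logs react-agent-endpoint 2>&1 | grep -i "HTTP server setup successful"
```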
Second, validate worker agent:
```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
Third, validate supervisor agent:
```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
## How to register your own tools with agent
You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).

View File

@@ -1,6 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export ip_address=$(hostname -I | awk '{print $1}')
export recursion_limit_worker=12

View File

@@ -0,0 +1,105 @@
# Single node on-prem deployment of AgentQnA on Gaudi
This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Gaudi using open-source LLMs.
For more details, please refer to the deployment guide [here](../../../../README.md).
## Deployment with docker
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up environment for this example </br>
```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# for using open-source llms
export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
# Example: export HF_CACHE_DIR=$WORKDIR so that models do not need to be re-downloaded every time
export HF_CACHE_DIR=<directory-where-llms-are-downloaded>
```
3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
First, launch the mega-service.
```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```
Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.
```
bash run_ingest_data.sh
```
4. Launch Tool service
In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```
5. Launch `Agent` service
To use open-source LLMs on Gaudi2, run commands below.
```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi
bash launch_tgi_gaudi.sh
bash launch_agent_service_tgi_gaudi.sh
```
6. [Optional] Build `Agent` docker image if pulling images failed.
```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent-langchain:latest -f comps/agent/langchain/Dockerfile .
```
## Validate services
First look at logs of the agent docker containers:
```
# worker agent
docker logs rag-agent-endpoint
```
```
# supervisor agent
docker logs react-agent-endpoint
```
You should see something like "HTTP server setup successful" if the docker containers are started successfully.</p>
Second, validate worker agent:
```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
Third, validate supervisor agent:
```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
## How to register your own tools with agent
You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).

View File

@@ -1,6 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
WORKPATH=$(dirname "$PWD")/..
# export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"

View File

@@ -3,7 +3,7 @@
services:
tgi-server:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-server
ports:
- "8085:80"

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./audioqna.py /home/user/audioqna.py

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./audioqna_multilang.py /home/user/audioqna_multilang.py

View File

@@ -0,0 +1,77 @@
# AudioQnA Benchmarking
This folder contains a collection of scripts for inference benchmarking, leveraging the comprehensive benchmarking tool [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md), which enables throughput analysis to assess inference performance.
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
## Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
## Metrics
The benchmark reports the following metrics:
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
Results will be displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
## Getting Started
We recommend using Kubernetes to deploy the AudioQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs.
### Prerequisites
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
- Every node has direct internet access
- Set up kubectl on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running GenAIEval.
- Ensure all nodes have a local /mnt/models folder, which will be mounted by the pods.
- Ensure that the container's ulimit can accommodate the number of requests.
```bash
# How to modify the containerd ulimit:
sudo systemctl edit containerd
# Add two lines:
[Service]
LimitNOFILE=65536:1048576
sudo systemctl daemon-reload; sudo systemctl restart containerd
```
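To verify the new limit after restarting containerd, one quick check (image name is an assumption) is to read the open-file limit from inside a freshly started pod:
```bash
# Illustrative only: confirm the soft open-file limit seen by new containers
kubectl run ulimit-check --rm -it --restart=Never --image=busybox -- sh -c 'ulimit -n'
```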
## Test Steps
Please deploy the AudioQnA service before benchmarking.
### Run Benchmark Test
Before running the benchmark, configure the number of test queries and the test output directory:
```bash
export USER_QUERIES="[128, 128, 128, 128]"
export TEST_OUTPUT_DIR="/tmp/benchmark_output"
```
And then run the benchmark by:
```bash
bash benchmark.sh -n <node_count>
```
The argument `-n` refers to the number of test nodes.
### Data collection
All test results are written to the folder `/tmp/benchmark_output`, as configured by the environment variable `TEST_OUTPUT_DIR` in the previous step.
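A quick way to locate and skim the summary CSV once a run has finished (the exact sub-directory layout under the output folder may vary):
```bash
# Illustrative only: list the output folder and preview the first rows of the stats CSV
ls /tmp/benchmark_output
find /tmp/benchmark_output -name "1_stats.csv" -exec sh -c 'column -s, -t < "$1" | head' _ {} \;
```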

View File

@@ -0,0 +1,99 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
deployment_type="k8s"
node_number=1
service_port=8888
query_per_node=128
benchmark_tool_path="$(pwd)/GenAIEval"
usage() {
echo "Usage: $0 [-d deployment_type] [-n node_number] [-i service_ip] [-p service_port]"
echo " -d deployment_type AudioQnA deployment type, select between k8s and docker (default: k8s)"
echo " -n node_number Test node number, required only for k8s deployment_type, (default: 1)"
echo " -i service_ip AudioQnA service ip, required only for docker deployment_type"
echo " -p service_port AudioQnA service port, required only for docker deployment_type, (default: 8888)"
exit 1
}
while getopts ":d:n:i:p:" opt; do
case ${opt} in
d )
deployment_type=$OPTARG
;;
n )
node_number=$OPTARG
;;
i )
service_ip=$OPTARG
;;
p )
service_port=$OPTARG
;;
\? )
echo "Invalid option: -$OPTARG" 1>&2
usage
;;
: )
echo "Invalid option: -$OPTARG requires an argument" 1>&2
usage
;;
esac
done
if [[ "$deployment_type" == "docker" && -z "$service_ip" ]]; then
echo "Error: service_ip is required for docker deployment_type" 1>&2
usage
fi
if [[ "$deployment_type" == "k8s" && ( -n "$service_ip" || -n "$service_port" ) ]]; then
echo "Warning: service_ip and service_port are ignored for k8s deployment_type" 1>&2
fi
function main() {
if [[ ! -d ${benchmark_tool_path} ]]; then
echo "Benchmark tool not found, setting up..."
setup_env
fi
run_benchmark
}
function setup_env() {
git clone https://github.com/opea-project/GenAIEval.git
pushd ${benchmark_tool_path}
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
popd
}
function run_benchmark() {
source ${benchmark_tool_path}/stress_venv/bin/activate
export DEPLOYMENT_TYPE=${deployment_type}
export SERVICE_IP=${service_ip:-"None"}
export SERVICE_PORT=${service_port:-"None"}
if [[ -z $USER_QUERIES ]]; then
user_query=$((query_per_node*node_number))
export USER_QUERIES="[${user_query}, ${user_query}, ${user_query}, ${user_query}]"
echo "USER_QUERIES not configured, setting to: ${USER_QUERIES}."
fi
export WARMUP=$(echo $USER_QUERIES | sed -e 's/[][]//g' -e 's/,.*//')
if [[ -z $WARMUP ]]; then export WARMUP=0; fi
if [[ -z $TEST_OUTPUT_DIR ]]; then
if [[ $DEPLOYMENT_TYPE == "k8s" ]]; then
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/node_${node_number}"
else
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/docker"
fi
echo "TEST_OUTPUT_DIR not configured, setting to: ${TEST_OUTPUT_DIR}."
fi
envsubst < ./benchmark.yaml > ${benchmark_tool_path}/evals/benchmark/benchmark.yaml
cd ${benchmark_tool_path}/evals/benchmark
python benchmark.py
}
main
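Typical invocations of the script above, following its usage text (the IP address is a placeholder):
```bash
# Kubernetes deployment, 2 test nodes
bash benchmark.sh -d k8s -n 2
# Docker deployment, explicit service endpoint (placeholder values)
bash benchmark.sh -d docker -i 192.168.1.1 -p 8888
```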

View File

@@ -0,0 +1,52 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
test_suite_config: # Overall configuration settings for the test suite
examples: ["audioqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
deployment_type: "k8s" # Default is "k8s", can also be "docker"
service_ip: None # Leave as None for k8s, specify for Docker
service_port: None # Leave as None for k8s, specify for Docker
warm_ups: 0 # Number of test requests for warm-up
run_time: 60m # The max total run time for the test suite
seed: # The seed for all RNGs
user_queries: [1, 2, 4, 8, 16, 32, 64, 128] # Number of test requests at each concurrency level
query_timeout: 120 # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
random_prompt: false # Use random prompts if true, fixed prompts if false
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
test_output_dir: "/tmp/benchmark_output" # The directory to store the test output
load_shape: # Tenant concurrency pattern
name: constant # poisson or constant(locust default load shape)
params: # Loadshape-specific parameters
constant: # Constant load shape specific parameters, activate only if load_shape is constant
concurrent_level: 4 # If user_queries is specified, concurrent_level is target number of requests per user. If not, it is the number of simulated users
poisson: # Poisson load shape specific parameters, activate only if load_shape is poisson
arrival-rate: 1.0 # Request arrival rate
namespace: "" # Fill the user-defined namespace. Otherwise, it will be default.
test_cases:
audioqna:
asr:
run_test: true
service_name: "asr-svc" # Replace with your service name
llm:
run_test: true
service_name: "llm-svc" # Replace with your service name
parameters:
model_name: "Intel/neural-chat-7b-v3-3"
max_new_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95
repetition_penalty: 1.03
streaming: true
llmserve:
run_test: true
service_name: "llm-svc" # Replace with your service name
tts:
run_test: true
service_name: "tts-svc" # Replace with your service name
e2e:
run_test: true
service_name: "audioqna-backend-server-svc" # Replace with your service name

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

View File

@@ -51,7 +51,7 @@ services:
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "3006:80"

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

View File

@@ -25,7 +25,7 @@ The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.6
- whisper-gaudi: opea/whisper-gaudi:latest
- speecht5-gaudi: opea/speecht5-gaudi:latest

View File

@@ -271,7 +271,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
name: llm-dependency-deploy-demo
securityContext:
capabilities:

View File

@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="audioqna whisper-gaudi asr llm-tgi speecht5-gaudi tts"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
@@ -100,7 +100,7 @@ function validate_megaservice() {
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#

View File

@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="audioqna whisper asr llm-tgi speecht5 tts"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
@@ -90,7 +90,7 @@ function validate_megaservice() {
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

View File

@@ -15,7 +15,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
runtime: habana
cap_add:
@@ -39,7 +39,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
runtime: habana
cap_add:
@@ -54,7 +54,7 @@ services:
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "3006:80"
@@ -67,7 +67,7 @@ services:
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
@@ -105,7 +105,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
DEVICE: ${DEVICE}
INFERENCE_MODE: ${INFERENCE_MODE}
@@ -132,7 +132,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
WAV2LIP_ENDPOINT: ${WAV2LIP_ENDPOINT}
runtime: habana

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

View File

@@ -29,7 +29,7 @@ function build_docker_images() {
service_list="avatarchatbot whisper-gaudi asr llm-tgi speecht5-gaudi tts wav2lip-gaudi animation"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
@@ -74,7 +74,7 @@ function start_services() {
export FPS=10
# Start Docker Containers
docker compose up -d
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
n=0
until [[ "$n" -ge 100 ]]; do
@@ -86,7 +86,6 @@ function start_services() {
n=$((n+1))
done
# sleep 5m
echo "All services are up and running"
sleep 5s
}
@@ -99,6 +98,7 @@ function validate_megaservice() {
if [[ $result == *"mp4"* ]]; then
echo "Result correct."
else
echo "Result wrong, print docker logs."
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs asr-service > $LOG_PATH/asr-service.log
docker logs speecht5-service > $LOG_PATH/speecht5-service.log
@@ -107,19 +107,13 @@ function validate_megaservice() {
docker logs llm-tgi-gaudi-server > $LOG_PATH/llm-tgi-gaudi-server.log
docker logs wav2lip-service > $LOG_PATH/wav2lip-service.log
docker logs animation-gaudi-server > $LOG_PATH/animation-gaudi-server.log
echo "Result wrong."
echo "Exit test."
exit 1
fi
}
#function validate_frontend() {
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi
docker compose down

View File

@@ -29,7 +29,7 @@ function build_docker_images() {
service_list="avatarchatbot whisper asr llm-tgi speecht5 tts wav2lip animation"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

View File

@@ -0,0 +1,32 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
FROM python:3.11-slim
RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
libgl1-mesa-glx \
libjemalloc-dev \
git
RUN useradd -m -s /bin/bash user && \
mkdir -p /home/user && \
chown -R user /home/user/
WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./chatqna_wrapper.py /home/user/chatqna.py
ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps
USER user
WORKDIR /home/user
RUN echo 'ulimit -S -n 999999' >> ~/.bashrc
ENTRYPOINT ["python", "chatqna.py"]
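A build sketch for this wrapper image; the image name and tag are assumptions, not taken from the diff:
```bash
# Illustrative only: build the ChatQnA wrapper image from this Dockerfile
docker build -t opea/chatqna-wrapper:latest -f Dockerfile .
```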

View File

@@ -4,7 +4,26 @@ Chatbots are the most widely adopted use case for leveraging the powerful chat a
RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that generated responses remain factual and current. At the core of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
## Deploy ChatQnA Service
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
| Cloud Provider | Intel Architecture | Intel Optimized Cloud Module for Terraform | Comments |
| -------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| AWS | 4th Gen Intel Xeon with Intel AMX | [AWS Module](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Uses Intel/neural-chat-7b-v3-3 by default |
| AWS Falcon2-11B | 4th Gen Intel Xeon with Intel AMX | [AWS Module with Falcon11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) | Uses TII Falcon2-11B LLM Model |
| GCP | 5th Gen Intel Xeon with Intel AMX | [GCP Module](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Also supports Confidential AI by using Intel® TDX with 4th Gen Xeon |
| Azure | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress |
| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress |
## Automated Deployment to Ubuntu based system (if not using Terraform) using Intel® Optimized Cloud Modules for **Ansible**
To deploy to an existing Xeon Ubuntu-based system, use our Intel Optimized Cloud Modules for Ansible. This is the same Ansible playbook used by Terraform.
Use this if you are not using Terraform and have provisioned your system with another tool or manually, including bare metal.
| Operating System | Intel Optimized Cloud Module for Ansible |
|------------------|------------------------------------------|
| Ubuntu 20.04 | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) |
| Ubuntu 22.04 | Work-in-progress |
## Manually Deploy ChatQnA Service
The ChatQnA service can be effortlessly deployed on Intel Gaudi2, Intel Xeon Scalable Processors and Nvidia GPU.

View File

@@ -48,7 +48,7 @@ To setup a LLM model, we can use [tgi-gaudi](https://github.com/huggingface/tgi-
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2
# for better performance, set `PREFILL_BATCH_BUCKET_SIZE`, `BATCH_BUCKET_SIZE`, `max-batch-total-tokens`, `max-batch-prefill-tokens`
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
```
### Prepare Dataset

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -327,7 +327,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -327,7 +327,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -327,7 +327,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -327,7 +327,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -237,7 +237,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -345,7 +345,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -345,7 +345,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -345,7 +345,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:
@@ -345,7 +345,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
imagePullPolicy: IfNotPresent
name: reranking-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -255,7 +255,7 @@ spec:
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -0,0 +1,196 @@
# ChatQnA Benchmarking
This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
## Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
## Metrics
The benchmark reports the following metrics:
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
Results will be displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
## Table of Contents
- [Deployment](#deployment)
- [Prerequisites](#prerequisites)
- [Deployment Scenarios](#deployment-scenarios)
- [Case 1: Baseline Deployment with Rerank](#case-1-baseline-deployment-with-rerank)
- [Case 2: Baseline Deployment without Rerank](#case-2-baseline-deployment-without-rerank)
- [Case 3: Tuned Deployment with Rerank](#case-3-tuned-deployment-with-rerank)
- [Benchmark](#benchmark)
- [Test Configurations](#test-configurations)
- [Test Steps](#test-steps)
- [Upload Retrieval File](#upload-retrieval-file)
- [Run Benchmark Test](#run-benchmark-test)
- [Data collection](#data-collection)
- [Teardown](#teardown)
## Deployment
### Prerequisites
- Kubernetes installation: Use [kubespray](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md) or other official Kubernetes installation guides.
- Helm installation: Follow the [Helm documentation](https://helm.sh/docs/intro/install/#helm) to install Helm.
- Set up Hugging Face Token
To access models and APIs from Hugging Face, set your token as an environment variable.
```bash
export HF_TOKEN="insert-your-huggingface-token-here"
```
- Prepare Shared Models (Optional but Strongly Recommended)
Downloading models simultaneously to multiple nodes in your cluster can overload resources such as network bandwidth, memory and storage. To prevent resource exhaustion, it's recommended to preload the models in advance.
```bash
pip install -U "huggingface_hub[cli]"
sudo mkdir -p /mnt/models
sudo chmod 777 /mnt/models
huggingface-cli download --cache-dir /mnt/models Intel/neural-chat-7b-v3-3
export MODEL_DIR=/mnt/models
```
Once the models are downloaded, you can consider the following methods for sharing them across nodes:
- Persistent Volume Claim (PVC): This is the recommended approach for production setups. For more details on using PVC, refer to [PVC](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume).
- Local Host Path: For simpler testing, ensure that each node involved in the deployment follows the steps above to locally prepare the models. After preparing the models, use `--set global.modelUseHostPath=${MODEL_DIR}` in the deployment command.
- Label Nodes
```bash
python deploy.py --add-label --num-nodes 2
```
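To confirm the labels were applied, you can list the labeled nodes (the default label used by `deploy.py` is `node-type=opea-benchmark`):
```bash
# Show only the nodes carrying the benchmark label
kubectl get nodes -l node-type=opea-benchmark
```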
### Deployment Scenarios
The examples below are based on a two-node setup. You can adjust the number of nodes with the `--num-nodes` option.
By default, these commands use the `default` namespace. To specify a different namespace, use the `--namespace` flag with the deploy and uninstall commands, as well as with any related `kubectl` commands. Additionally, update the `namespace` field in `benchmark.yaml` before running the benchmark test.
For additional configuration options, run `python deploy.py --help`.
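For example, to deploy into a dedicated namespace (here, a hypothetical `benchmark` namespace), you could run the command below; remember to also update the `namespace` field in `benchmark.yaml` to match:
```bash
# Deploy into the "benchmark" namespace instead of "default"
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2 --with-rerank --namespace benchmark
```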
#### Case 1: Baseline Deployment with Rerank
Deploy Command (with the node number, Hugging Face token, and model directory specified):
```bash
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2 --with-rerank
```
Uninstall Command:
```bash
python deploy.py --uninstall
```
#### Case 2: Baseline Deployment without Rerank
```bash
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2
```
#### Case 3: Tuned Deployment with Rerank
```bash
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2 --with-rerank --tuned
```
## Benchmark
### Test Configurations
| Key | Value |
| -------- | ------- |
| Workload | ChatQnA |
| Tag | V1.1 |
Model configuration:
| Key | Value |
| ---------- | ------------------ |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| Inference | Intel/neural-chat-7b-v3-3 |
Benchmark parameters:
| Key | Value |
| ---------- | ------------------ |
| LLM input tokens | 1024 |
| LLM output tokens | 128 |
Number of test requests for different scheduled node counts:
| Node count | Concurrency | Query number |
| ----- | -------- | -------- |
| 1 | 128 | 640 |
| 2 | 256 | 1280 |
| 4 | 512 | 2560 |
More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).
### Test Steps
Use `kubectl get pods` to confirm that all pods are `READY` before starting the test.
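For example, you can block until every pod reports `Ready` (a sketch; adjust the namespace and timeout to your environment):
```bash
# Wait until all pods in the target namespace pass their readiness checks
kubectl wait --for=condition=Ready pod --all --namespace default --timeout=15m
```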
#### Upload Retrieval File
Before testing, upload the specified file to ensure that the LLM input has a token length of 1K.
Download the files:
```bash
wget https://raw.githubusercontent.com/opea-project/GenAIEval/main/evals/benchmark/data/upload_file_no_rerank.txt
wget https://raw.githubusercontent.com/opea-project/GenAIEval/main/evals/benchmark/data/upload_file.txt
```
Retrieve the `ClusterIP` of the `chatqna-data-prep` service.
```bash
kubectl get svc
```
Expected output:
```log
chatqna-data-prep ClusterIP xx.xx.xx.xx <none> 6007/TCP 51m
```
Use the following `cURL` commands to upload the files:
```bash
cd GenAIEval/evals/benchmark/data
# RAG with Rerank
curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./upload_file.txt"
# RAG without Rerank
curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./upload_file_no_rerank.txt"
```
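To verify the upload, you can query the file list from the data-prep service, assuming the data-prep image exposes the same `get_file` endpoint as in the Docker Compose deployments described later in this PR:
```bash
# List the files currently held by the data-prep service
curl -X POST "http://${cluster_ip}:6007/v1/dataprep/get_file" \
     -H "Content-Type: application/json"
```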
#### Run Benchmark Test
Run the benchmark test using:
```bash
bash benchmark.sh -n 2
```
The `-n` argument specifies the number of test nodes. Required dependencies will be automatically installed when running the benchmark for the first time.
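The script also reads a few optional environment variables (`USER_QUERIES`, `load_shape`, `concurrent_level`, `arrival_rate`; see `benchmark.sh` later in this PR), so the load profile can be overridden without editing any files. A sketch:
```bash
# Run with a Poisson arrival pattern and a custom per-level query count
USER_QUERIES="[1280, 1280, 1280, 1280]" load_shape="poisson" arrival_rate=2.0 bash benchmark.sh -n 2
```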
#### Data collection
All test results are saved under the folder `GenAIEval/evals/benchmark/benchmark_output`.
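For a k8s run, the output lands in a per-node-count subfolder (derived from `TEST_OUTPUT_DIR` in `benchmark.sh`); you can locate the CSV result files like this:
```bash
# List result CSVs for a two-node run
find GenAIEval/evals/benchmark/benchmark_output/node_2 -name "*.csv"
```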
## Teardown
After completing the benchmark, use the following command to clean up the environment:
Remove Node Labels:
```bash
python deploy.py --delete-label
```
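If the Helm release is still installed, a full teardown (a sketch, assuming the default release name and namespace) looks like this:
```bash
# Uninstall the ChatQnA Helm release (add --namespace <ns> for a custom namespace)
python deploy.py --uninstall
# Then remove the benchmark labels from the nodes
python deploy.py --delete-label
```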

View File

@@ -0,0 +1,102 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
deployment_type="k8s"
node_number=1
service_port=8888
query_per_node=640
benchmark_tool_path="$(pwd)/GenAIEval"
usage() {
echo "Usage: $0 [-d deployment_type] [-n node_number] [-i service_ip] [-p service_port]"
echo " -d deployment_type ChatQnA deployment type, select between k8s and docker (default: k8s)"
echo " -n node_number Test node number, required only for k8s deployment_type, (default: 1)"
echo " -i service_ip chatqna service ip, required only for docker deployment_type"
echo " -p service_port chatqna service port, required only for docker deployment_type, (default: 8888)"
exit 1
}
while getopts ":d:n:i:p:" opt; do
case ${opt} in
d )
deployment_type=$OPTARG
;;
n )
node_number=$OPTARG
;;
i )
service_ip=$OPTARG
;;
p )
service_port=$OPTARG
;;
\? )
echo "Invalid option: -$OPTARG" 1>&2
usage
;;
: )
echo "Invalid option: -$OPTARG requires an argument" 1>&2
usage
;;
esac
done
if [[ "$deployment_type" == "docker" && -z "$service_ip" ]]; then
echo "Error: service_ip is required for docker deployment_type" 1>&2
usage
fi
if [[ "$deployment_type" == "k8s" && ( -n "$service_ip" || -n "$service_port" ) ]]; then
echo "Warning: service_ip and service_port are ignored for k8s deployment_type" 1>&2
fi
function main() {
if [[ ! -d ${benchmark_tool_path} ]]; then
echo "Benchmark tool not found, setting up..."
setup_env
fi
run_benchmark
}
function setup_env() {
git clone https://github.com/opea-project/GenAIEval.git
pushd ${benchmark_tool_path}
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
popd
}
function run_benchmark() {
source ${benchmark_tool_path}/stress_venv/bin/activate
export DEPLOYMENT_TYPE=${deployment_type}
export SERVICE_IP=${service_ip:-"None"}
export SERVICE_PORT=${service_port:-"None"}
export LOAD_SHAPE=${load_shape:-"constant"}
export CONCURRENT_LEVEL=${concurrent_level:-5}
export ARRIVAL_RATE=${arrival_rate:-1.0}
if [[ -z $USER_QUERIES ]]; then
user_query=$((query_per_node*node_number))
export USER_QUERIES="[${user_query}, ${user_query}, ${user_query}, ${user_query}]"
echo "USER_QUERIES not configured, setting to: ${USER_QUERIES}."
fi
export WARMUP=$(echo $USER_QUERIES | sed -e 's/[][]//g' -e 's/,.*//')
if [[ -z $WARMUP ]]; then export WARMUP=0; fi
if [[ -z $TEST_OUTPUT_DIR ]]; then
if [[ $DEPLOYMENT_TYPE == "k8s" ]]; then
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/node_${node_number}"
else
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/docker"
fi
echo "TEST_OUTPUT_DIR not configured, setting to: ${TEST_OUTPUT_DIR}."
fi
envsubst < ./benchmark.yaml > ${benchmark_tool_path}/evals/benchmark/benchmark.yaml
cd ${benchmark_tool_path}/evals/benchmark
python benchmark.py
}
main

View File

@@ -0,0 +1,67 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
test_suite_config: # Overall configuration settings for the test suite
examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
deployment_type: ${DEPLOYMENT_TYPE} # Default is "k8s", can also be "docker"
service_ip: ${SERVICE_IP} # Leave as None for k8s, specify for Docker
service_port: ${SERVICE_PORT} # Leave as None for k8s, specify for Docker
warm_ups: ${WARMUP} # Number of test requests for warm-up
run_time: 60m # The max total run time for the test suite
seed: # The seed for all RNGs
user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
query_timeout: 120 # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
random_prompt: false # Use random prompts if true, fixed prompts if false
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
test_output_dir: "${TEST_OUTPUT_DIR}" # The directory to store the test output
load_shape: # Tenant concurrency pattern
name: ${LOAD_SHAPE} # poisson or constant (locust default load shape)
params: # Loadshape-specific parameters
constant: # Constant load shape specific parameters, activate only if load_shape.name is constant
concurrent_level: ${CONCURRENT_LEVEL} # If user_queries is specified, concurrent_level is target number of requests per user. If not, it is the number of simulated users
poisson: # Poisson load shape specific parameters, activate only if load_shape.name is poisson
arrival_rate: ${ARRIVAL_RATE} # Request arrival rate
test_cases:
chatqna:
embedding:
run_test: false
service_name: "chatqna-embedding-usvc" # Replace with your service name
embedserve:
run_test: false
service_name: "chatqna-tei" # Replace with your service name
retriever:
run_test: false
service_name: "chatqna-retriever-usvc" # Replace with your service name
parameters:
search_type: "similarity"
k: 4
fetch_k: 20
lambda_mult: 0.5
score_threshold: 0.2
reranking:
run_test: false
service_name: "chatqna-reranking-usvc" # Replace with your service name
parameters:
top_n: 1
rerankserve:
run_test: false
service_name: "chatqna-teirerank" # Replace with your service name
llm:
run_test: false
service_name: "chatqna-llm-uservice" # Replace with your service name
parameters:
max_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95
repetition_penalty: 1.03
streaming: true
llmserve:
run_test: false
service_name: "chatqna-tgi" # Replace with your service name
e2e:
run_test: true
service_name: "chatqna" # Replace with your service name

View File

@@ -0,0 +1,279 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import argparse
import glob
import json
import os
import shutil
import subprocess
import sys
import yaml
from generate_helm_values import generate_helm_values
def run_kubectl_command(command):
"""Run a kubectl command and return the output."""
try:
result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
return result.stdout
except subprocess.CalledProcessError as e:
print(f"Error running command: {command}\n{e.stderr}")
exit(1)
def get_all_nodes():
"""Get the list of all nodes in the Kubernetes cluster."""
command = ["kubectl", "get", "nodes", "-o", "json"]
output = run_kubectl_command(command)
nodes = json.loads(output)
return [node["metadata"]["name"] for node in nodes["items"]]
def add_label_to_node(node_name, label):
"""Add a label to the specified node."""
command = ["kubectl", "label", "node", node_name, label, "--overwrite"]
print(f"Labeling node {node_name} with {label}...")
run_kubectl_command(command)
print(f"Label {label} added to node {node_name} successfully.")
def add_labels_to_nodes(node_count=None, label=None, node_names=None):
"""Add a label to the specified number of nodes or to specified nodes."""
if node_names:
# Add label to the specified nodes
for node_name in node_names:
add_label_to_node(node_name, label)
else:
# Fetch the node list and label the specified number of nodes
all_nodes = get_all_nodes()
if node_count is None or node_count > len(all_nodes):
print(f"Error: Node count exceeds the number of available nodes ({len(all_nodes)} available).")
sys.exit(1)
selected_nodes = all_nodes[:node_count]
for node_name in selected_nodes:
add_label_to_node(node_name, label)
def clear_labels_from_nodes(label, node_names=None):
"""Clear the specified label from specific nodes if provided, otherwise from all nodes."""
label_key = label.split("=")[0] # Extract key from 'key=value' format
# If specific nodes are provided, use them; otherwise, get all nodes
nodes_to_clear = node_names if node_names else get_all_nodes()
for node_name in nodes_to_clear:
# Check if the node has the label by inspecting its metadata
command = ["kubectl", "get", "node", node_name, "-o", "json"]
node_info = run_kubectl_command(command)
node_metadata = json.loads(node_info)
# Check if the label exists on this node
labels = node_metadata["metadata"].get("labels", {})
if label_key in labels:
# Remove the label from the node
command = ["kubectl", "label", "node", node_name, f"{label_key}-"]
print(f"Removing label {label_key} from node {node_name}...")
run_kubectl_command(command)
print(f"Label {label_key} removed from node {node_name} successfully.")
else:
print(f"Label {label_key} not found on node {node_name}, skipping.")
def install_helm_release(release_name, chart_name, namespace, values_file, device_type):
"""Deploy a Helm release with a specified name and chart.
Parameters:
- release_name: The name of the Helm release.
- chart_name: The Helm chart name or path, e.g., "opea/chatqna".
- namespace: The Kubernetes namespace for deployment.
- values_file: The user values file for deployment.
- device_type: The device type (e.g., "gaudi") for specific configurations (optional).
"""
# Check if the namespace exists; if not, create it
try:
# Check if the namespace exists
command = ["kubectl", "get", "namespace", namespace]
subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
except subprocess.CalledProcessError:
# Namespace does not exist, create it
print(f"Namespace '{namespace}' does not exist. Creating it...")
command = ["kubectl", "create", "namespace", namespace]
subprocess.run(command, check=True)
print(f"Namespace '{namespace}' created successfully.")
# Handle gaudi-specific values file if device_type is "gaudi"
hw_values_file = None
untar_dir = None
if device_type == "gaudi":
print("Device type is gaudi. Pulling Helm chart to get gaudi-values.yaml...")
# Combine chart_name with fixed prefix
chart_pull_url = f"oci://ghcr.io/opea-project/charts/{chart_name}"
# Pull and untar the chart
subprocess.run(["helm", "pull", chart_pull_url, "--untar"], check=True)
# Find the untarred directory
untar_dirs = glob.glob(f"{chart_name}*")
if untar_dirs:
untar_dir = untar_dirs[0]
hw_values_file = os.path.join(untar_dir, "gaudi-values.yaml")
print("gaudi-values.yaml pulled and ready for use.")
else:
print(f"Error: Could not find untarred directory for {chart_name}")
return
# Prepare the Helm install command
command = ["helm", "install", release_name, chart_name, "--namespace", namespace]
# Append additional values file for gaudi if it exists
if hw_values_file:
command.extend(["-f", hw_values_file])
# Append the main values file
command.extend(["-f", values_file])
# Execute the Helm install command
try:
print(f"Running command: {' '.join(command)}") # Print full command for debugging
subprocess.run(command, check=True)
print("Deployment initiated successfully.")
except subprocess.CalledProcessError as e:
print(f"Error occurred while deploying Helm release: {e}")
# Cleanup: Remove the untarred directory
if untar_dir and os.path.isdir(untar_dir):
print(f"Removing temporary directory: {untar_dir}")
shutil.rmtree(untar_dir)
print("Temporary directory removed successfully.")
def uninstall_helm_release(release_name, namespace=None):
"""Uninstall a Helm release and clean up resources, optionally delete the namespace if not 'default'."""
# Default to 'default' namespace if none is specified
if not namespace:
namespace = "default"
try:
# Uninstall the Helm release
command = ["helm", "uninstall", release_name, "--namespace", namespace]
print(f"Uninstalling Helm release {release_name} in namespace {namespace}...")
run_kubectl_command(command)
print(f"Helm release {release_name} uninstalled successfully.")
# If the namespace is specified and not 'default', delete it
if namespace != "default":
print(f"Deleting namespace {namespace}...")
delete_namespace_command = ["kubectl", "delete", "namespace", namespace]
run_kubectl_command(delete_namespace_command)
print(f"Namespace {namespace} deleted successfully.")
else:
print("Namespace is 'default', skipping deletion.")
except subprocess.CalledProcessError as e:
print(f"Error occurred while uninstalling Helm release or deleting namespace: {e}")
def main():
parser = argparse.ArgumentParser(description="Manage Helm Deployment.")
parser.add_argument(
"--release-name",
type=str,
default="chatqna",
help="The Helm release name created during deployment (default: chatqna).",
)
parser.add_argument(
"--chart-name",
type=str,
default="chatqna",
help="The chart name to deploy, composed of repo name and chart name (default: chatqna).",
)
parser.add_argument("--namespace", default="default", help="Kubernetes namespace (default: default).")
parser.add_argument("--hf-token", help="Hugging Face API token.")
parser.add_argument(
"--model-dir", help="Model directory, mounted as volumes for service access to pre-downloaded models"
)
parser.add_argument("--user-values", help="Path to a user-specified values.yaml file.")
parser.add_argument(
"--create-values-only", action="store_true", help="Only create the values.yaml file without deploying."
)
parser.add_argument("--uninstall", action="store_true", help="Uninstall the Helm release.")
parser.add_argument("--num-nodes", type=int, default=1, help="Number of nodes to use (default: 1).")
parser.add_argument("--node-names", nargs="*", help="Optional specific node names to label.")
parser.add_argument("--add-label", action="store_true", help="Add label to specified nodes if this flag is set.")
parser.add_argument(
"--delete-label", action="store_true", help="Delete label from specified nodes if this flag is set."
)
parser.add_argument(
"--label", default="node-type=opea-benchmark", help="Label to add/delete (default: node-type=opea-benchmark)."
)
parser.add_argument("--with-rerank", action="store_true", help="Include rerank service in the deployment.")
parser.add_argument(
"--tuned",
action="store_true",
help="Modify resources for services and change extraCmdArgs when creating values.yaml.",
)
parser.add_argument(
"--device-type",
type=str,
choices=["cpu", "gaudi"],
default="gaudi",
help="Specify the device type for deployment (choices: 'cpu', 'gaudi'; default: gaudi).",
)
args = parser.parse_args()
# Adjust num-nodes based on node-names if specified
if args.node_names:
num_node_names = len(args.node_names)
if args.num_nodes != 1 and args.num_nodes != num_node_names:
parser.error("--num-nodes must match the number of --node-names if both are specified.")
else:
args.num_nodes = num_node_names
# Node labeling management
if args.add_label:
add_labels_to_nodes(args.num_nodes, args.label, args.node_names)
return
elif args.delete_label:
clear_labels_from_nodes(args.label, args.node_names)
return
# Uninstall Helm release if specified
if args.uninstall:
uninstall_helm_release(args.release_name, args.namespace)
return
# Prepare values.yaml if not uninstalling
if args.user_values:
values_file_path = args.user_values
else:
if not args.hf_token:
parser.error("--hf-token are required")
node_selector = {args.label.split("=")[0]: args.label.split("=")[1]}
values_file_path = generate_helm_values(
with_rerank=args.with_rerank,
num_nodes=args.num_nodes,
hf_token=args.hf_token,
model_dir=args.model_dir,
node_selector=node_selector,
tune=args.tuned,
)
# Read back the generated YAML file for verification
with open(values_file_path, "r") as file:
print("Generated YAML contents:")
print(file.read())
# Deploy unless --create-values-only is specified
if not args.create_values_only:
install_helm_release(args.release_name, args.chart_name, args.namespace, values_file_path, args.device_type)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,163 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import os
import yaml
def generate_helm_values(with_rerank, num_nodes, hf_token, model_dir, node_selector=None, tune=False):
"""Create a values.yaml file based on the provided configuration."""
# Log the received parameters
print("Received parameters:")
print(f"with_rerank: {with_rerank}")
print(f"num_nodes: {num_nodes}")
print(f"node_selector: {node_selector}") # Log the node_selector
print(f"tune: {tune}")
if node_selector is None:
node_selector = {}
# Construct the base values dictionary
values = {
"tei": {"nodeSelector": {key: value for key, value in node_selector.items()}},
"tgi": {"nodeSelector": {key: value for key, value in node_selector.items()}},
"data-prep": {"nodeSelector": {key: value for key, value in node_selector.items()}},
"redis-vector-db": {"nodeSelector": {key: value for key, value in node_selector.items()}},
"retriever-usvc": {"nodeSelector": {key: value for key, value in node_selector.items()}},
"chatqna-ui": {"nodeSelector": {key: value for key, value in node_selector.items()}},
"global": {
"HUGGINGFACEHUB_API_TOKEN": hf_token, # Use passed token
"modelUseHostPath": model_dir, # Use passed model directory
},
"nodeSelector": {key: value for key, value in node_selector.items()},
}
if with_rerank:
values["teirerank"] = {"nodeSelector": {key: value for key, value in node_selector.items()}}
else:
values["image"] = {"repository": "opea/chatqna-without-rerank"}
default_replicas = [
{"name": "chatqna", "replicaCount": 2},
{"name": "tei", "replicaCount": 1},
{"name": "teirerank", "replicaCount": 1} if with_rerank else None,
{"name": "tgi", "replicaCount": 7 if with_rerank else 8},
{"name": "data-prep", "replicaCount": 1},
{"name": "redis-vector-db", "replicaCount": 1},
{"name": "retriever-usvc", "replicaCount": 2},
]
if num_nodes > 1:
# Scale replicas based on number of nodes
replicas = [
{"name": "chatqna", "replicaCount": 1 * num_nodes},
{"name": "tei", "replicaCount": 1 * num_nodes},
{"name": "teirerank", "replicaCount": 1} if with_rerank else None,
{"name": "tgi", "replicaCount": (8 * num_nodes - 1) if with_rerank else 8 * num_nodes},
{"name": "data-prep", "replicaCount": 1},
{"name": "redis-vector-db", "replicaCount": 1},
{"name": "retriever-usvc", "replicaCount": 1 * num_nodes},
]
else:
replicas = default_replicas
# Remove None values for rerank disabled
replicas = [r for r in replicas if r]
# Update values.yaml with replicas
for replica in replicas:
service_name = replica["name"]
if service_name == "chatqna":
values["replicaCount"] = replica["replicaCount"]
print(replica["replicaCount"])
elif service_name in values:
values[service_name]["replicaCount"] = replica["replicaCount"]
# Prepare resource configurations based on tuning
resources = []
if tune:
resources = [
{
"name": "chatqna",
"resources": {
"limits": {"cpu": "16", "memory": "8000Mi"},
"requests": {"cpu": "16", "memory": "8000Mi"},
},
},
{
"name": "tei",
"resources": {
"limits": {"cpu": "80", "memory": "20000Mi"},
"requests": {"cpu": "80", "memory": "20000Mi"},
},
},
{"name": "teirerank", "resources": {"limits": {"habana.ai/gaudi": 1}}} if with_rerank else None,
{"name": "tgi", "resources": {"limits": {"habana.ai/gaudi": 1}}},
{"name": "retriever-usvc", "resources": {"requests": {"cpu": "8", "memory": "8000Mi"}}},
]
# Filter out any None values directly as part of initialization
resources = [r for r in resources if r is not None]
# Add resources for each service if tuning
for resource in resources:
service_name = resource["name"]
if service_name == "chatqna":
values["resources"] = resource["resources"]
elif service_name in values:
values[service_name]["resources"] = resource["resources"]
# Add extraCmdArgs for tgi service with default values
if "tgi" in values:
values["tgi"]["extraCmdArgs"] = [
"--max-input-length",
"1280",
"--max-total-tokens",
"2048",
"--max-batch-total-tokens",
"65536",
"--max-batch-prefill-tokens",
"4096",
]
yaml_string = yaml.dump(values, default_flow_style=False)
# Determine the mode based on the 'tune' parameter
mode = "tuned" if tune else "oob"
# Determine the filename based on 'with_rerank' and 'num_nodes'
if with_rerank:
filename = f"{mode}-{num_nodes}-gaudi-with-rerank-values.yaml"
else:
filename = f"{mode}-{num_nodes}-gaudi-without-rerank-values.yaml"
# Write the YAML data to the file
with open(filename, "w") as file:
file.write(yaml_string)
# Get the current working directory and construct the file path
current_dir = os.getcwd()
filepath = os.path.join(current_dir, filename)
print(f"YAML file {filepath} has been generated.")
return filepath # Optionally return the file path
# Main execution for standalone use of create_values_yaml
if __name__ == "__main__":
# Example values for standalone execution
with_rerank = True
num_nodes = 2
hftoken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
modeldir = "/mnt/model"
node_selector = {"node-type": "opea-benchmark"}
tune = True
filename = generate_helm_values(with_rerank, num_nodes, hftoken, modeldir, node_selector, tune)
# Read back the generated YAML file for verification
with open(filename, "r") as file:
print("Generated YAML contents:")
print(file.read())

View File

@@ -148,6 +148,8 @@ def align_outputs(self, data, cur_node, inputs, runtime_graph, llm_parameters_di
next_data["inputs"] = prompt
elif self.services[cur_node].service_type == ServiceType.LLM and not llm_parameters_dict["streaming"]:
next_data["text"] = data["choices"][0]["message"]["content"]
else:
next_data = data

View File

@@ -19,7 +19,7 @@ opea_micro_services:
tei-embedding-service:
host: ${TEI_EMBEDDING_SERVICE_IP}
ports: ${TEI_EMBEDDING_SERVICE_PORT}
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
volumes:
- "./data:/data"
runtime: habana
@@ -38,7 +38,7 @@ opea_micro_services:
tgi-service:
host: ${TGI_SERVICE_IP}
ports: ${TGI_SERVICE_PORT}
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
volumes:
- "./data:/data"
runtime: habana

View File

@@ -0,0 +1,68 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import os
from comps import ChatQnAGateway, MicroService, ServiceOrchestrator, ServiceType
MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
EMBEDDING_SERVICE_HOST_IP = os.getenv("EMBEDDING_SERVICE_HOST_IP", "0.0.0.0")
EMBEDDING_SERVICE_PORT = int(os.getenv("EMBEDDING_SERVICE_PORT", 6000))
RETRIEVER_SERVICE_HOST_IP = os.getenv("RETRIEVER_SERVICE_HOST_IP", "0.0.0.0")
RETRIEVER_SERVICE_PORT = int(os.getenv("RETRIEVER_SERVICE_PORT", 7000))
RERANK_SERVICE_HOST_IP = os.getenv("RERANK_SERVICE_HOST_IP", "0.0.0.0")
RERANK_SERVICE_PORT = int(os.getenv("RERANK_SERVICE_PORT", 8000))
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
class ChatQnAService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
self.megaservice = ServiceOrchestrator()
def add_remote_service(self):
embedding = MicroService(
name="embedding",
host=EMBEDDING_SERVICE_HOST_IP,
port=EMBEDDING_SERVICE_PORT,
endpoint="/v1/embeddings",
use_remote_service=True,
service_type=ServiceType.EMBEDDING,
)
retriever = MicroService(
name="retriever",
host=RETRIEVER_SERVICE_HOST_IP,
port=RETRIEVER_SERVICE_PORT,
endpoint="/v1/retrieval",
use_remote_service=True,
service_type=ServiceType.RETRIEVER,
)
rerank = MicroService(
name="rerank",
host=RERANK_SERVICE_HOST_IP,
port=RERANK_SERVICE_PORT,
endpoint="/v1/reranking",
use_remote_service=True,
service_type=ServiceType.RERANK,
)
llm = MicroService(
name="llm",
host=LLM_SERVICE_HOST_IP,
port=LLM_SERVICE_PORT,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
)
self.megaservice.add(embedding).add(retriever).add(rerank).add(llm)
self.megaservice.flow_to(embedding, retriever)
self.megaservice.flow_to(retriever, rerank)
self.megaservice.flow_to(rerank, llm)
self.gateway = ChatQnAGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
if __name__ == "__main__":
chatqna = ChatQnAService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
chatqna.add_remote_service()

View File

@@ -0,0 +1,432 @@
# Build and deploy ChatQnA Application on AMD GPU (ROCm)
## Build MegaService of ChatQnA on AMD ROCm GPU
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on the AMD ROCm GPU platform. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, which will simplify the deployment process for this service.
Quick Start Deployment Steps:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
## Quick Start: 1. Setup Environment Variables
To set up environment variables for deploying ChatQnA services, follow these steps:
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export HOST_IP=${host_ip}
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export CHATQNA_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
## Quick Start: 2. Run Docker Compose
```bash
docker compose up -d
```
It will automatically pull the following Docker images from Docker Hub:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
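To confirm the images are available locally before moving on, a quick check (sketch):
```bash
# List the ChatQnA images that Docker Compose pulled or that you built
docker images | grep -E "opea/chatqna"
```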
In the following cases, you could build the Docker images from source yourself:
- The Docker image download failed.
- You want to use a specific version of the Docker image.
Please refer to the 'Build Docker Images' section below.
## Quick Start: 3. Consume the ChatQnA Service
Prepare and upload a test document:
```
# download pdf file
wget https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf
# upload pdf file with dataprep
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
Get the MegaService (backend) response:
```bash
curl http://${host_ip}:8888/v1/chatqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
## 🚀 Build Docker Images
First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
### 1. Clone GenAIComps Source Code
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
### 2. Build Retriever Image
```bash
docker build --no-cache -t opea/retriever-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile .
```
### 3. Build Dataprep Image
```bash
docker build --no-cache -t opea/dataprep-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile .
```
### 4. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/docker
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd ../../..
```
### 5. Build UI Docker Image
Construct the frontend Docker image using the command below:
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
cd ../../../..
```
### 6. Build React UI Docker Image (Optional)
Construct the frontend Docker image using the command below:
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
cd ../../../..
```
### 7. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile .
```
Then run the command `docker images`; you should see the following 5 Docker images:
1. `opea/retriever-redis:latest`
2. `opea/dataprep-redis:latest`
3. `opea/chatqna:latest`
4. `opea/chatqna-ui:latest` or `opea/chatqna-react-ui:latest`
5. `opea/nginx:latest`
## 🚀 Start MicroServices and MegaService
### Required Models
By default, the embedding, reranking, and LLM models are set to the default values listed below:
| Service | Model |
| --------- | ------------------------- |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| LLM | Intel/neural-chat-7b-v3-3 |
Change the `xxx_MODEL_ID` values below as needed.
### Setup Environment Variables
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export CHATQNA_HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export HOST_IP=${host_ip}
export NGINX_PORT=${your_nginx_port}
export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base"
export CHATQNA_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export CHATQNA_TGI_SERVICE_PORT=8008
export CHATQNA_TEI_EMBEDDING_PORT=8090
export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}"
export CHATQNA_TEI_RERANKING_PORT=8808
export CHATQNA_REDIS_VECTOR_PORT=16379
export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001
export CHATQNA_REDIS_DATAPREP_PORT=6007
export CHATQNA_REDIS_RETRIEVER_PORT=7000
export CHATQNA_INDEX_NAME="rag-redis"
export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna"
export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep"
export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get_file"
export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete_file"
export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP}
export CHATQNA_FRONTEND_SERVICE_PORT=5173
export CHATQNA_BACKEND_SERVICE_NAME=chatqna
export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP}
export CHATQNA_BACKEND_SERVICE_PORT=8888
export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}"
export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_NGINX_PORT=5176
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Note: To limit access to a subset of GPUs, pass each device individually into the `tgi-service` in the `compose.yaml` file, using one or more `/dev/dri/renderD<node>` entries, where `<node>` is the card index, starting from 128. See the [ROCm documentation on restricting GPU access](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus) for details.
Example of restricting access to 1 GPU:
```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
```
Example of restricting access to 2 GPUs:
```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD129:/dev/dri/renderD129
```
More information about accessing and restricting AMD GPUs can be found in the [ROCm documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).
4. Set up other environment variables:
```bash
source ./set_env.sh
```
### Start all the services Docker Containers
```bash
cd GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm
docker compose up -d
```
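Once Compose finishes, you can confirm that all containers are up (a sketch, run from the same compose directory):
```bash
# Show the status of every ChatQnA container started by Compose
docker compose ps
# Or tail the logs of a specific service, e.g. the TGI server
docker logs -f chatqna-tgi-server
```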
### Validate MicroServices and MegaService
1. TEI Embedding Service
```bash
curl ${host_ip}:8090/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. Retriever Microservice
To consume the retriever microservice, you need to generate a mock embedding vector using a Python script. The length of the embedding vector
is determined by the embedding model.
Here we use the model `EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"`, whose vector size is 768.
Check the vector dimension of your embedding model and set the `your_embedding` dimension to match it.
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
-X POST \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
3. TEI Reranking Service
```bash
curl http://${host_ip}:8808/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
4. TGI Service
On the first startup, this service takes more time to download the model files. After the download is finished, the service will be ready.
Try the command below to check whether the TGI service is ready.
```bash
docker logs ${CONTAINER_ID} | grep Connected
```
If the service is ready, you will see a response like the one below.
```
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
Then try the `cURL` command below to validate TGI.
```bash
curl http://${host_ip}:8008/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
5. MegaService
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
6. Nginx Service
```bash
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
-H "Content-Type: application/json" \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
7. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File Upload:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
Add Knowledge Base via HTTP Links:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
This command updates a knowledge base by submitting a list of HTTP links for processing.
You can also get the list of files that you have uploaded:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
To delete the file/link you uploaded:
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
## 🚀 Launch the UI
### Launch with origin port
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chatqna-ui-server:
image: opea/chatqna-ui:latest
...
ports:
- "80:5173"
```
### Launch with Nginx
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
## 🚀 Launch the Conversational UI (Optional)
To access the Conversational UI (React-based) frontend, modify the UI service in the `compose.yaml` file. Replace the `chatqna-ui-server` service with the `chatqna-react-ui-server` service as per the config below:
```yaml
chatqna-react-ui-server:
image: opea/chatqna-react-ui:latest
container_name: chatqna-react-ui-server
environment:
- APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT}
ports:
- "5174:80"
depends_on:
- chatqna-backend-server
ipc: host
restart: always
```
Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chatqna-react-ui-server:
image: opea/chatqna-react-ui:latest
...
ports:
- "80:80"
```
![project-screenshot](../../../../assets/img/chat_ui_init.png)
Here is an example of running ChatQnA:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
Here is an example of running ChatQnA with Conversational UI (React):
![project-screenshot](../../../../assets/img/conversation_ui_response.png)

View File

@@ -0,0 +1,183 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
services:
chatqna-redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "${CHATQNA_REDIS_VECTOR_PORT}:6379"
- "${CHATQNA_REDIS_VECTOR_INSIGHT_PORT}:8001"
chatqna-dataprep-redis-service:
image: ${REGISTRY:-opea}/dataprep-redis:${TAG:-latest}
container_name: dataprep-redis-server
depends_on:
- chatqna-redis-vector-db
- chatqna-tei-embedding-service
ports:
- "${CHATQNA_REDIS_DATAPREP_PORT}:6007"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${CHATQNA_REDIS_URL}
INDEX_NAME: ${CHATQNA_INDEX_NAME}
TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
chatqna-tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: chatqna-tei-embedding-server
ports:
- "${CHATQNA_TEI_EMBEDDING_PORT}:80"
volumes:
- "/var/opea/chatqna-service/data:/data"
shm_size: 1g
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
command: --model-id ${CHATQNA_EMBEDDING_MODEL_ID} --auto-truncate
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD136:/dev/dri/renderD136
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
chatqna-retriever:
image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
container_name: chatqna-retriever-redis-server
depends_on:
- chatqna-redis-vector-db
ports:
- "${CHATQNA_REDIS_RETRIEVER_PORT}:7000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${CHATQNA_REDIS_URL}
INDEX_NAME: ${CHATQNA_INDEX_NAME}
TEI_EMBEDDING_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT}
restart: unless-stopped
chatqna-tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: chatqna-tei-reranking-server
ports:
- "${CHATQNA_TEI_RERANKING_PORT}:80"
volumes:
- "/var/opea/chatqna-service/data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
command: --model-id ${CHATQNA_RERANK_MODEL_ID} --auto-truncate
chatqna-tgi-service:
image: ${CHATQNA_TGI_SERVICE_IMAGE}
container_name: chatqna-tgi-server
ports:
- "${CHATQNA_TGI_SERVICE_PORT}:80"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
volumes:
- "/var/opea/chatqna-service/data:/data"
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
command: --model-id ${CHATQNA_LLM_MODEL_ID}
ipc: host
chatqna-backend-server:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-backend-server
depends_on:
- chatqna-redis-vector-db
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-tgi-service
ports:
- "${CHATQNA_BACKEND_SERVICE_PORT}:8888"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${CHATQNA_MEGA_SERVICE_HOST_IP}
- EMBEDDING_SERVER_HOST_IP=${HOST_IP}
- EMBEDDING_SERVER_PORT=${CHATQNA_TEI_EMBEDDING_PORT:-80}
- RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
- RERANK_SERVER_HOST_IP=${HOST_IP}
- RERANK_SERVER_PORT=${CHATQNA_TEI_RERANKING_PORT:-80}
- LLM_SERVER_HOST_IP=${HOST_IP}
- LLM_SERVER_PORT=${CHATQNA_TGI_SERVICE_PORT:-80}
- LLM_MODEL=${CHATQNA_LLM_MODEL_ID}
ipc: host
restart: always
chatqna-ui-server:
image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest}
container_name: chatqna-ui-server
depends_on:
- chatqna-backend-server
ports:
- "${CHATQNA_FRONTEND_SERVICE_PORT}:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- CHAT_BASE_URL=${CHATQNA_BACKEND_SERVICE_ENDPOINT}
- UPLOAD_FILE_BASE_URL=${CHATQNA_DATAPREP_SERVICE_ENDPOINT}
- GET_FILE=${CHATQNA_DATAPREP_GET_FILE_ENDPOINT}
- DELETE_FILE=${CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT}
ipc: host
restart: always
chatqna-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-nginx-server
depends_on:
- chatqna-backend-server
- chatqna-ui-server
ports:
- "${CHATQNA_NGINX_PORT}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=${CHATQNA_FRONTEND_SERVICE_IP}
- FRONTEND_SERVICE_PORT=${CHATQNA_FRONTEND_SERVICE_PORT}
- BACKEND_SERVICE_NAME=${CHATQNA_BACKEND_SERVICE_NAME}
- BACKEND_SERVICE_IP=${CHATQNA_BACKEND_SERVICE_IP}
- BACKEND_SERVICE_PORT=${CHATQNA_BACKEND_SERVICE_PORT}
ipc: host
restart: always
networks:
default:
driver: bridge

View File

@@ -0,0 +1,34 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base"
export CHATQNA_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export CHATQNA_TGI_SERVICE_PORT=18008
export CHATQNA_TEI_EMBEDDING_PORT=18090
export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}"
export CHATQNA_TEI_RERANKING_PORT=18808
export CHATQNA_REDIS_VECTOR_PORT=16379
export CHATQNA_REDIS_VECTOR_INSIGHT_PORT=8001
export CHATQNA_REDIS_DATAPREP_PORT=6007
export CHATQNA_REDIS_RETRIEVER_PORT=7000
export CHATQNA_INDEX_NAME="rag-redis"
export CHATQNA_MEGA_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_BACKEND_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna"
export CHATQNA_DATAPREP_SERVICE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep"
export CHATQNA_DATAPREP_GET_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/get_file"
export CHATQNA_DATAPREP_DELETE_FILE_ENDPOINT="http://127.0.0.1:${CHATQNA_REDIS_DATAPREP_PORT}/v1/dataprep/delete_file"
export CHATQNA_FRONTEND_SERVICE_IP=${HOST_IP}
export CHATQNA_FRONTEND_SERVICE_PORT=15173
export CHATQNA_BACKEND_SERVICE_NAME=chatqna
export CHATQNA_BACKEND_SERVICE_IP=${HOST_IP}
export CHATQNA_BACKEND_SERVICE_PORT=18888
export CHATQNA_REDIS_URL="redis://${HOST_IP}:${CHATQNA_REDIS_VECTOR_PORT}"
export CHATQNA_EMBEDDING_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_RERANK_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_LLM_SERVICE_HOST_IP=${HOST_IP}
export CHATQNA_NGINX_PORT=15176

View File

@@ -3,6 +3,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
if [ -z "${your_hf_api_token}" ]; then
echo "Error: HUGGINGFACEHUB_API_TOKEN is not set. Please set your_hf_api_token."

View File

@@ -26,7 +26,6 @@ To set up environment variables for deploying ChatQnA services, follow these ste
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm_service
```
@@ -324,17 +323,17 @@ For details on how to verify the correctness of the response, refer to [how-to-v
```bash
# TGI service
curl http://${host_ip}:9009/generate \
curl http://${host_ip}:9009/v1/chat/completions \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
```bash
# vLLM Service
curl http://${host_ip}:9009/v1/completions \
curl http://${host_ip}:9009/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
-d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```
5. MegaService
@@ -433,6 +432,66 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-H "Content-Type: application/json"
```
### Profile Microservices
To further analyze microservice performance, users can follow the instructions below to profile the microservices.
#### 1. vLLM backend Service
Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.
##### Start vLLM profiling
```bash
curl http://${host_ip}:9009/start_profile \
-H "Content-Type: application/json" \
-d '{"model": "Intel/neural-chat-7b-v3-3"}'
```
Users should see the Docker logs below from vllm-service if profiling is started correctly.
```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```
After vLLM profiling is started, users can start asking questions and get responses from the vLLM microservice
or the ChatQnA MegaService.
##### Stop vLLM profiling
With the following command, users can stop vLLM profiling and generate a \*.pt.trace.json.gz file as the profiling result
under the /mnt folder in the vllm-service Docker instance.
```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
-H "Content-Type: application/json" \
-d '{"model": "Intel/neural-chat-7b-v3-3"}'
```
Users should see the Docker logs below from vllm-service if profiling is stopped correctly.
```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```
After vLLM profiling is stopped, users can use the command below to copy the \*.pt.trace.json.gz file from the /mnt folder.
```bash
docker cp vllm-service:/mnt/ .
```
##### Check profiling result
Open a web browser and go to `chrome://tracing` or `ui.perfetto.dev`, then load the json.gz file. You should be able
to see the vLLM profiling result as in the diagram below.
![image](https://github.com/user-attachments/assets/55c7097e-5574-41dc-97a7-5e87c31bc286)
## 🚀 Launch the UI
### Launch with origin port

View File

@@ -0,0 +1,382 @@
# Build Mega Service of ChatQnA (with Pinecone) on Xeon
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.
## 🚀 Apply Xeon Server on AWS
To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage the power of 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.
For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
**Certain ports in the EC2 instance need to be opened up in the security group for the microservices to work with the `curl` commands**
> See one example below. Please open up these ports in the EC2 instance based on the IP addresses you want to allow.
```
data_prep_service
=====================
Port 6007 - Open to 0.0.0.0/0
Port 6008 - Open to 0.0.0.0/0
tei_embedding_service
=====================
Port 6006 - Open to 0.0.0.0/0
embedding
=========
Port 6000 - Open to 0.0.0.0/0
retriever
=========
Port 7000 - Open to 0.0.0.0/0
tei_xeon_service
================
Port 8808 - Open to 0.0.0.0/0
reranking
=========
Port 8000 - Open to 0.0.0.0/0
tgi-service
===========
Port 9009 - Open to 0.0.0.0/0
llm
===
Port 9000 - Open to 0.0.0.0/0
chaqna-xeon-backend-server
==========================
Port 8888 - Open to 0.0.0.0/0
chaqna-xeon-ui-server
=====================
Port 5173 - Open to 0.0.0.0/0
```
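If you prefer the AWS CLI over the console, ports can be opened with `aws ec2 authorize-security-group-ingress`. A hedged sketch for one port is shown below; the security-group ID and CIDR are placeholders, and you should tighten the CIDR to your own IP range rather than 0.0.0.0/0 where possible.
```bash
# Example: open the MegaService port 8888 to a trusted CIDR (placeholder values)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8888 \
  --cidr 203.0.113.0/24
```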
## 🚀 Build Docker Images
First of all, you need to build the Docker images locally from the GenAIComps source code.
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
### 1. Build Embedding Image
```bash
docker build --no-cache -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile .
```
### 2. Build Retriever Image
```bash
docker build --no-cache -t opea/retriever-pinecone:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/pinecone/langchain/Dockerfile .
```
### 3. Build Rerank Image
```bash
docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile .
```
### 4. Build LLM Image
```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```
### 5. Build Dataprep Image
```bash
docker build --no-cache -t opea/dataprep-pinecone:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/pinecone/langchain/Dockerfile .
cd ..
```
### 6. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image with the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/docker
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd ../../..
```
### 7. Build UI Docker Image
Build the frontend Docker image with the command below:
```bash
cd GenAIExamples/ChatQnA/docker/ui/
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
cd ../../../..
```
### 8. Build Conversational React UI Docker Image (Optional)
Build the frontend Docker image that enables a conversational experience with the ChatQnA MegaService using the command below:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/ChatQnA/docker/ui/
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get_file"
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg DATAPREP_SERVICE_ENDPOINT=$DATAPREP_SERVICE_ENDPOINT --build-arg DATAPREP_GET_FILE_ENDPOINT=$DATAPREP_GET_FILE_ENDPOINT -f ./docker/Dockerfile.react .
cd ../../../..
```
Then run the command `docker images`; you should see the following 7 Docker images (8 if you also built the optional Conversational UI image `opea/chatqna-conversation-ui:latest`):
1. `opea/dataprep-pinecone:latest`
2. `opea/embedding-tei:latest`
3. `opea/retriever-pinecone:latest`
4. `opea/reranking-tei:latest`
5. `opea/llm-tgi:latest`
6. `opea/chatqna:latest`
7. `opea/chatqna-ui:latest`
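To quickly confirm that all of them were built, you can filter the local image list; this is an optional convenience check rather than part of the build steps:
```bash
# Show only the locally built OPEA images in repository:tag form.
docker images --format '{{.Repository}}:{{.Tag}}' | grep '^opea/'
```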
## 🚀 Start Microservices
### Setup Environment Variables
Since `compose_pinecone.yaml` consumes several environment variables, you need to set them up in advance as shown below.
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
> Replace `External_Public_IP` below with the actual IPv4 value
```bash
export host_ip="External_Public_IP"
```
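If you are unsure of the address, one common way to look it up on Linux is shown below; note that `hostname -I` may return a private address, so substitute the externally reachable IP if your server sits behind NAT (this helper is a convenience sketch, not part of the original steps):
```bash
# Use the host's first reported IP address as host_ip (verify it is the
# address you will actually use to reach the services).
export host_ip=$(hostname -I | awk '{print $1}')
echo "host_ip=${host_ip}"
```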
**Export the value of your Huggingface API token to the `your_hf_api_token` environment variable**
> Replace `Your_Huggingface_API_Token` below with your actual Huggingface API token value
```bash
export your_hf_api_token="Your_Huggingface_API_Token"
```
**Append the value of the public IP address to the no_proxy list**
```bash
export your_no_proxy=${your_no_proxy},"External_Public_IP"
```
**Export the `PINECONE_API_KEY` and the `INDEX_NAME` for your Pinecone index**
```bash
export pinecone_api_key=${api_key}
export pinecone_index_name=${pinecone_index}
```
```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:9009"
export PINECONE_API_KEY=${pinecone_api_key}
export PINECONE_INDEX_NAME=${pinecone_index_name}
export INDEX_NAME=${pinecone_index_name}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6008/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6009/v1/dataprep/delete_file"
```
Note: Please replace `host_ip` with your external IP address; do not use localhost.
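As an optional sanity check before starting the containers (not required by the original steps), confirm that the key variables are set in the current shell:
```bash
# Print the Pinecone- and endpoint-related variables exported above.
env | grep -E 'PINECONE|ENDPOINT|MODEL_ID|host_ip' | sort
```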
### Start all the services Docker Containers
> Before running the docker compose command, make sure you are in the folder that contains the docker compose YAML file.
```bash
cd GenAIExamples/ChatQnA/docker/xeon/
docker compose -f compose_pinecone.yaml up -d
```
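Once the stack is up, it is worth confirming that none of the containers exited immediately (optional check):
```bash
# List the containers started from compose_pinecone.yaml and their status.
docker compose -f compose_pinecone.yaml ps
# Inspect the logs of a specific container if its status looks wrong, e.g.:
# docker logs retriever-pinecone-server --tail 50
```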
### Validate Microservices
1. TEI Embedding Service
```bash
curl ${host_ip}:6006/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. Embedding Microservice
```bash
curl http://${host_ip}:6000/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
```
3. Retriever Microservice
To validate the retriever microservice, you need to generate a mock embedding vector of length 768 with a Python script:
```Python
import random
embedding = [random.uniform(-1, 1) for _ in range(768)]
print(embedding)
```
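Alternatively, the mock vector can be generated and exported in one step from the shell; this is a convenience sketch that assumes `python3` is available on the PATH:
```bash
# Generate a 768-dimensional mock embedding and store it in the variable
# used by the cURL command below.
export your_embedding=$(python3 -c "import random; print([round(random.uniform(-1, 1), 6) for _ in range(768)])")
```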
Then substitute your mock embedding vector for `${your_embedding}` in the following cURL command:
```bash
curl http://${host_ip}:7000/v1/retrieval \
-X POST \
-d '{"text":"What is the revenue of Nike in 2023?","embedding":"'"${your_embedding}"'"}' \
-H 'Content-Type: application/json'
```
4. TEI Reranking Service
```bash
curl http://${host_ip}:8808/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
5. Reranking Microservice
```bash
curl http://${host_ip}:8000/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
```
6. TGI Service
```bash
curl http://${host_ip}:9009/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
7. LLM Microservice
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
8. MegaService
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
9. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File Upload:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
Add Knowledge Base via HTTP Links:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
This command updates a knowledge base by submitting a list of HTTP links for processing.
You can also retrieve the list of files you have uploaded:
```bash
curl -X POST "http://${host_ip}:6008/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
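The compose file also publishes a delete endpoint on port 6009 (see `DATAPREP_DELETE_FILE_ENDPOINT` above). Assuming it follows the same pattern as the other dataprep routes, an uploaded file can be removed as sketched below; verify the exact payload against your dataprep service version:
```bash
# Delete a previously uploaded file by name; some releases also accept
# "all" to clear the entire knowledge base.
curl -X POST "http://${host_ip}:6009/v1/dataprep/delete_file" \
     -H "Content-Type: application/json" \
     -d '{"file_path": "nke-10k-2023.pdf"}'
```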
## Enable LangSmith for Monitoring the Application (Optional)
LangSmith offers tools to debug, evaluate, and monitor language models and intelligent agents. It can be used to assess benchmark data for each microservice. Before launching your services with `docker compose -f compose_pinecone.yaml up -d`, you need to enable LangSmith tracing by setting the `LANGCHAIN_TRACING_V2` environment variable to true and configuring your LangChain API key.
Here's how you can do it:
1. Install the latest version of LangSmith:
```bash
pip install -U langsmith
```
2. Set the necessary environment variables:
```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=ls_...
```
## 🚀 Launch the UI
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chatqna-xeon-ui-server:
image: opea/chatqna-ui:latest
...
ports:
- "80:5173"
```
## 🚀 Launch the Conversational UI (React)
To access the Conversational UI frontend, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chatqna-xeon-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
...
ports:
- "80:80"
```
![project-screenshot](../../../../assets/img/chat_ui_init.png)
Here is an example of running ChatQnA:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
Here is an example of running ChatQnA with Conversational UI (React):
![project-screenshot](../../../../assets/img/conversation_ui_response.png)
View File
@@ -252,9 +252,9 @@ For details on how to verify the correctness of the response, refer to [how-to-v
Then try the `cURL` command below to validate TGI.
```bash
curl http://${host_ip}:6042/generate \
curl http://${host_ip}:6042/v1/chat/completions \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
View File
@@ -0,0 +1,151 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
version: "3.8"
services:
dataprep-pinecone-service:
image: ${REGISTRY:-opea}/dataprep-pinecone:${TAG:-latest}
container_name: dataprep-pinecone-server
depends_on:
- tei-embedding-service
ports:
- "6007:6007"
- "6008:6008"
- "6009:6009"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
PINECONE_API_KEY: ${PINECONE_API_KEY}
PINECONE_INDEX_NAME: ${PINECONE_INDEX_NAME}
TEI_EMBEDDING_ENDPOINT: http://tei-embedding-service:80
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-server
ports:
- "6006:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
retriever:
image: ${REGISTRY:-opea}/retriever-pinecone:${TAG:-latest}
container_name: retriever-pinecone-server
ports:
- "7000:7000"
ipc: host
environment:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
PINECONE_API_KEY: ${PINECONE_API_KEY}
INDEX_NAME: ${PINECONE_INDEX_NAME}
PINECONE_INDEX_NAME: ${PINECONE_INDEX_NAME}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
TEI_EMBEDDING_ENDPOINT: http://tei-embedding-service:80
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "8808:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "9009:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
chatqna-xeon-backend-server:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-xeon-backend-server
depends_on:
- tei-embedding-service
- dataprep-pinecone-service
- retriever
- tei-reranking-service
- tgi-service
ports:
- "8888:8888"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=chatqna-xeon-backend-server
- EMBEDDING_SERVER_HOST_IP=tei-embedding-service
- EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-80}
- RETRIEVER_SERVICE_HOST_IP=retriever
- RERANK_SERVER_HOST_IP=tei-reranking-service
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-80}
- LLM_SERVER_HOST_IP=tgi-service
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-80}
- LOGFLAG=${LOGFLAG}
- LLM_MODEL=${LLM_MODEL_ID}
ipc: host
restart: always
chatqna-xeon-ui-server:
image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest}
container_name: chatqna-xeon-ui-server
depends_on:
- chatqna-xeon-backend-server
ports:
- "5173:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
ipc: host
restart: always
chatqna-xeon-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chatqna-xeon-nginx-server
depends_on:
- chatqna-xeon-backend-server
- chatqna-xeon-ui-server
ports:
- "${NGINX_PORT:-80}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=chatqna-xeon-ui-server
- FRONTEND_SERVICE_PORT=5173
- BACKEND_SERVICE_NAME=chatqna
- BACKEND_SERVICE_IP=chatqna-xeon-backend-server
- BACKEND_SERVICE_PORT=8888
- DATAPREP_SERVICE_IP=dataprep-pinecone-service
- DATAPREP_SERVICE_PORT=6007
ipc: host
restart: always
networks:
default:
driver: bridge
View File
@@ -86,6 +86,7 @@ services:
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LLM_MODEL_ID: ${LLM_MODEL_ID}
VLLM_TORCH_PROFILER_DIR: "/mnt"
command: --model $LLM_MODEL_ID --host 0.0.0.0 --port 80
chatqna-xeon-backend-server:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
View File
@@ -3,6 +3,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
View File
@@ -192,7 +192,7 @@ For users in China who are unable to download models directly from Huggingface,
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="Intel/neural-chat-7b-v3-3"
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=true -e LIMIT_HPU_GRAPH=true -e USE_FLASH_ATTENTION=true -e FLASH_ATTENTION_RECOMPUTE=true --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id $model_name --max-input-tokens 1024 --max-total-tokens 2048
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=true -e LIMIT_HPU_GRAPH=true -e USE_FLASH_ATTENTION=true -e FLASH_ATTENTION_RECOMPUTE=true --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id $model_name --max-input-tokens 1024 --max-total-tokens 2048
```
2. Offline
@@ -206,7 +206,7 @@ For users in China who are unable to download models directly from Huggingface,
```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
docker run -p 8008:80 -v $model_path:/data --name tgi_service --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=true -e LIMIT_HPU_GRAPH=true -e USE_FLASH_ATTENTION=true -e FLASH_ATTENTION_RECOMPUTE=true --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id /data --max-input-tokens 1024 --max-total-tokens 2048
docker run -p 8008:80 -v $model_path:/data --name tgi_service --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=true -e LIMIT_HPU_GRAPH=true -e USE_FLASH_ATTENTION=true -e FLASH_ATTENTION_RECOMPUTE=true --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id /data --max-input-tokens 1024 --max-total-tokens 2048
```
### Setup Environment Variables
@@ -326,23 +326,18 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
Then try the `cURL` command below to validate services.
```bash
#TGI Service
curl http://${host_ip}:8005/generate \
# TGI service
curl http://${host_ip}:9009/v1/chat/completions \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
-d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
```bash
#vLLM Service
curl http://${host_ip}:8007/v1/completions \
# vLLM Service
curl http://${host_ip}:9009/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "${LLM_MODEL_ID}",
"prompt": "What is Deep Learning?",
"max_tokens": 32,
"temperature": 0
}'
-d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```
5. MegaService
@@ -439,6 +434,68 @@ curl http://${host_ip}:9090/v1/guardrails\
-H 'Content-Type: application/json'
```
### Profile Microservices
To further analyze microservice performance, users can follow the instructions below to profile the microservices.
#### 1. vLLM backend Service
Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.
##### Start vLLM profiling
```bash
curl http://${host_ip}:9009/start_profile \
-H "Content-Type: application/json" \
-d '{"model": ${LLM_MODEL_ID}}'
```
Users should see the following docker logs from vllm-service if profiling started correctly.
```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```
After vLLM profiling is started, users could start asking questions and get responses from vLLM MicroService
or ChatQnA MicroService.
##### Stop vLLM profiling
With the following command, users can stop vLLM profiling and generate a \*.pt.trace.json.gz file as the profiling result
under the /mnt folder in the vllm-service docker instance.
```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
-H "Content-Type: application/json" \
-d '{"model": ${LLM_MODEL_ID}}'
```
Users should see the following docker logs from vllm-service if profiling stopped correctly.
```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```
After vLLM profiling is stopped, users can use the command below to retrieve the \*.pt.trace.json.gz file from the /mnt folder.
```bash
docker cp vllm-service:/mnt/ .
```
##### Check profiling result
Open a web browser and go to "chrome://tracing" or "ui.perfetto.dev", then load the json.gz file; you should be able
to see the vLLM profiling result as in the diagrams below.
![image](https://github.com/user-attachments/assets/487c52c8-d187-46dc-ab3a-43f21d657d41)
![image](https://github.com/user-attachments/assets/e3c51ce5-d704-4eb7-805e-0d88b0c158e3)
## 🚀 Launch the UI
### Launch with origin port
View File
@@ -57,7 +57,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
container_name: tei-reranking-gaudi-server
ports:
- "8808:80"
@@ -78,7 +78,7 @@ services:
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "8005:80"
View File
@@ -26,7 +26,7 @@ services:
TEI_ENDPOINT: http://tei-embedding-service:80
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tgi-guardrails-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-guardrails-server
ports:
- "8088:80"
@@ -96,7 +96,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
container_name: tei-reranking-gaudi-server
ports:
- "8808:80"
@@ -117,7 +117,7 @@ services:
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "8008:80"
View File
@@ -57,7 +57,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:1.5.0
container_name: tei-reranking-gaudi-server
ports:
- "8808:80"
@@ -78,7 +78,7 @@ services:
MAX_WARMUP_SEQUENCE_LENGTH: 512
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
vllm-service:
image: ${REGISTRY:-opea}/vllm-hpu:${TAG:-latest}
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
container_name: vllm-gaudi-server
ports:
- "8007:80"
@@ -92,6 +92,7 @@ services:
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
LLM_MODEL_ID: ${LLM_MODEL_ID}
VLLM_TORCH_PROFILER_DIR: "/mnt"
runtime: habana
cap_add:
- SYS_NICE
View File
@@ -57,7 +57,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "8005:80"
View File
@@ -48,16 +48,16 @@ f810f3b4d329 opea/embedding-tei:latest "python e
2fa17d84605f opea/dataprep-redis:latest "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
69e1fb59e92c opea/retriever-redis:latest "/home/user/comps/re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
313b9d14928a opea/reranking-tei:latest "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.0.5 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:latest "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.0.6 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
74084469aa33 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
88399dbc9e43 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server
```
In this case, `ghcr.io/huggingface/tgi-gaudi:2.0.5` Existed.
In this case, `ghcr.io/huggingface/tgi-gaudi:2.0.6` exited.
```
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.0.5 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.0.6 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
```
Next we can check the container logs to get to know what happened during the docker start.
@@ -68,7 +68,7 @@ Check the log of container by:
`docker logs <CONTAINER ID> -t`
View the logs of `ghcr.io/huggingface/tgi-gaudi:2.0.5`
View the logs of `ghcr.io/huggingface/tgi-gaudi:2.0.6`
`docker logs 05c40b636239 -t`
@@ -97,7 +97,7 @@ So just make sure the devices are available.
Here is another failure example:
```
f7a08f9867f9 ghcr.io/huggingface/tgi-gaudi:2.0.5 "text-generation-lau…" 16 seconds ago Exited (2) 14 seconds ago tgi-gaudi-server
f7a08f9867f9 ghcr.io/huggingface/tgi-gaudi:2.0.6 "text-generation-lau…" 16 seconds ago Exited (2) 14 seconds ago tgi-gaudi-server
```
Check the log by `docker logs f7a08f9867f9 -t`.
@@ -114,7 +114,7 @@ View the docker input parameters in `./ChatQnA/docker_compose/intel/hpu/gaudi/co
```
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "8008:80"
View File
@@ -2,6 +2,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
View File
@@ -5,8 +5,9 @@ This document outlines the deployment process for a ChatQnA application utilizin
Quick Start Deployment Steps:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
2. Modify the TEI Docker Image for Reranking
3. Run Docker Compose.
4. Consume the ChatQnA Service.
## Quick Start: 1.Setup Environment Variable
@@ -35,7 +36,30 @@ To set up environment variables for deploying ChatQnA services, follow these ste
source ./set_env.sh
```
## Quick Start: 2.Run Docker Compose
## Quick Start: 2.Modify the TEI Docker Image for Reranking
> **Note:**
> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backend with compute capacity 8.0. If you are using an A100/A30, skip this step. For other GPU architectures, please modify the `image` with the specific tag for `tei-reranking-service` based on the following table, matching the target CUDA compute capacity.
| GPU Arch | GPU | Compute Capacity | Image |
| ------------ | ------------------------------------------ | ---------------- | -------------------------------------------------------- |
| Volta | V100 | 7.0 | NOT SUPPORTED |
| Turing | T4, GeForce RTX 2000 Series | 7.5 | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
| Ampere 80 | A100, A30 | 8.0 | ghcr.io/huggingface/text-embeddings-inference:1.5 |
| Ampere 86 | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6 | ghcr.io/huggingface/text-embeddings-inference:86-1.5 |
| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series | 8.9 | ghcr.io/huggingface/text-embeddings-inference:89-1.5 |
| Hopper | H100 | 9.0 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
For instance, if Hopper arch GPU (such as H100/H100 NVL) is the target backend:
```
# vim compose.yaml
tei-reranking-service:
#image: ghcr.io/huggingface/text-embeddings-inference:1.5
image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```
## Quick Start: 3.Run Docker Compose
```bash
docker compose up -d
@@ -56,7 +80,7 @@ In following cases, you could build docker image from source by yourself.
Please refer to 'Build Docker Images' in below.
## QuickStart: 3.Consume the ChatQnA Service
## QuickStart: 4.Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
@@ -176,6 +200,29 @@ Change the `xxx_MODEL_ID` below for your needs.
source ./set_env.sh
```
### Modify the TEI Docker Image for Reranking
> **Note:**
> The default Docker image for the `tei-reranking-service` in `compose.yaml` is built for the A100 and A30 backend with compute capacity 8.0. If you are using an A100/A30, skip this step. For other GPU architectures, please modify the `image` with the specific tag for `tei-reranking-service` based on the following table, matching the target CUDA compute capacity.
| GPU Arch | GPU | Compute Capacity | Image |
| ------------ | ------------------------------------------ | ---------------- | -------------------------------------------------------- |
| Volta | V100 | 7.0 | NOT SUPPORTED |
| Turing | T4, GeForce RTX 2000 Series | 7.5 | ghcr.io/huggingface/text-embeddings-inference:turing-1.5 |
| Ampere 80 | A100, A30 | 8.0 | ghcr.io/huggingface/text-embeddings-inference:1.5 |
| Ampere 86 | A40, A10, A16, A2, GeForce RTX 3000 Series | 8.6 | ghcr.io/huggingface/text-embeddings-inference:86-1.5 |
| Ada Lovelace | L40S, L40, L4, GeForce RTX 4000 Series | 8.9 | ghcr.io/huggingface/text-embeddings-inference:89-1.5 |
| Hopper | H100 | 9.0 | ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 |
For instance, if Hopper arch GPU (such as H100/H100 NVL) is the target backend:
```
# vim compose.yaml
tei-reranking-service:
#image: ghcr.io/huggingface/text-embeddings-inference:1.5
image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```
### Start all the services Docker Containers
```bash
@@ -238,9 +285,9 @@ docker compose up -d
Then try the `cURL` command below to validate TGI.
```bash
curl http://${host_ip}:8008/generate \
curl http://${host_ip}:9009/v1/chat/completions \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
-d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
View File
@@ -11,6 +11,12 @@ services:
context: ../
dockerfile: ./Dockerfile
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
chatqna-wrapper:
build:
context: ../
dockerfile: ./Dockerfile.wrapper
extends: chatqna
image: ${REGISTRY:-opea}/chatqna-wrapper:${TAG:-latest}
chatqna-guardrails:
build:
context: ../
@@ -53,6 +59,12 @@ services:
dockerfile: comps/retrievers/qdrant/haystack/Dockerfile
extends: chatqna
image: ${REGISTRY:-opea}/retriever-qdrant:${TAG:-latest}
retriever-pinecone:
build:
context: GenAIComps
dockerfile: comps/retrievers/pinecone/langchain/Dockerfile
extends: chatqna
image: ${REGISTRY:-opea}/retriever-pinecone:${TAG:-latest}
reranking-tei:
build:
context: GenAIComps
@@ -89,6 +101,12 @@ services:
dockerfile: comps/dataprep/qdrant/langchain/Dockerfile
extends: chatqna
image: ${REGISTRY:-opea}/dataprep-qdrant:${TAG:-latest}
dataprep-pinecone:
build:
context: GenAIComps
dockerfile: comps/dataprep/pinecone/langchain/Dockerfile
extends: chatqna
image: ${REGISTRY:-opea}/dataprep-pinecone:${TAG:-latest}
guardrails-tgi:
build:
context: GenAIComps
@@ -101,12 +119,12 @@ services:
dockerfile: Dockerfile.cpu
extends: chatqna
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
vllm-hpu:
vllm-gaudi:
build:
context: vllm-fork
dockerfile: Dockerfile.hpu
extends: chatqna
image: ${REGISTRY:-opea}/vllm-hpu:${TAG:-latest}
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
nginx:
build:
context: GenAIComps
View File
@@ -15,7 +15,7 @@
```
cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
kubectl apply -f chatqna.yaml
```
@@ -35,10 +35,55 @@ kubectl apply -f chatqna_bf16.yaml
```
cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
kubectl apply -f chatqna.yaml
```
## Deploy on Xeon with Remote LLM Model
```
cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="Your Remote Inference Endpoint"
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-remote-inference.yaml
sed -i "s|insert-your-remote-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-remote-inference.yaml
```
### Additional Steps for Remote Endpoints with Authentication (If No Authentication Skip This Step)
If your remote inference endpoint is protected with OAuth Client Credentials authentication, update CLIENTID, CLIENT_SECRET and TOKEN_URL with the correct values in "chatqna-llm-uservice-config" ConfigMap
### Deploy
```
kubectl apply -f chatqna-remote-inference.yaml
```
## Deploy on Gaudi with TEI, Rerank, and vLLM Models Running Remotely
```
cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="Your Remote Inference Endpoint"
export TEI_EMBEDDING_ENDPOINT="Your Remote TEI Embedding Endpoint"
export TEI_RERANKING_ENDPOINT="Your Remote Reranking Endpoint"
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-vllm-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-embedding-endpoint|${TEI_EMBEDDING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-reranking-endpoint|${TEI_RERANKING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
```
### Additional Steps for Remote Endpoints with Authentication (If No Authentication Skip This Step)
If your remote inference endpoint is protected with OAuth Client Credentials authentication, update CLIENTID, CLIENT_SECRET and TOKEN_URL with the correct values in "chatqna-llm-uservice-config", "chatqna-data-prep-config", "chatqna-embedding-usvc-config", "chatqna-reranking-usvc-config", "chatqna-retriever-usvc-config" ConfigMaps
### Deploy
```
kubectl apply -f chatqna-vllm-remote-inference.yaml
```
## Verify Services
To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
View File
@@ -24,8 +24,9 @@ The ChatQnA uses the below prebuilt images if you choose a Xeon deployment
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tei-embedding-service: ghcr.io/huggingface/tei-gaudi:latest
- tgi-service: gghcr.io/huggingface/tgi-gaudi:2.0.5
tei-embedding-service: ghcr.io/huggingface/tei-gaudi:1.5.0
tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.6
> [NOTE]
> Please refer to [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/hpu/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.
View File
@@ -31,7 +31,7 @@ docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# de088666cef2 gcr.io/k8s-minikube/kicbase:v0.0.45 "/usr/local/bin/entr…" 2 days ago Up 2 days 127.0.0.1:49157->22/tcp... minikube
```
6. Deploy the ChatQnA application with `minikube apply -f chatqna.yaml`, check that the opea pods are in a running state with `kubectl get pods`
6. Deploy the ChatQnA application with `kubectl apply -f chatqna.yaml`, check that the opea pods are in a running state with `kubectl get pods`
```bash
kubectl get pods
# NAME READY STATUS RESTARTS AGE
View File
@@ -554,7 +554,7 @@ spec:
securityContext:
{}
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: ui
containerPort: 5173
@@ -612,7 +612,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: data-prep
containerPort: 6007
@@ -687,7 +687,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "redis/redis-stack:7.2.0-v9"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
@@ -762,7 +762,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/guardrails-tgi:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: guardrails-usvc
containerPort: 9090
@@ -840,7 +840,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: retriever-usvc
containerPort: 7000
@@ -919,7 +919,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -1010,7 +1010,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -1101,7 +1101,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1181,7 +1181,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1273,7 +1273,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna-guardrails:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /tmp
name: tmp
@@ -1314,7 +1314,7 @@ spec:
spec:
containers:
- image: nginx:1.27.1
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: nginx
volumeMounts:
- mountPath: /etc/nginx/conf.d
File diff suppressed because it is too large
View File
@@ -454,7 +454,7 @@ spec:
securityContext:
{}
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: ui
containerPort: 5173
@@ -512,7 +512,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: data-prep
containerPort: 6007
@@ -587,7 +587,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "redis/redis-stack:7.2.0-v9"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
@@ -662,7 +662,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: retriever-usvc
containerPort: 7000
@@ -741,7 +741,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -832,7 +832,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -923,7 +923,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1011,7 +1011,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /tmp
name: tmp
@@ -1052,7 +1052,7 @@ spec:
spec:
containers:
- image: nginx:1.27.1
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: nginx
volumeMounts:
- mountPath: /etc/nginx/conf.d
View File
@@ -455,7 +455,7 @@ spec:
securityContext:
{}
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: ui
containerPort: 5173
@@ -513,7 +513,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: data-prep
containerPort: 6007
@@ -588,7 +588,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "redis/redis-stack:7.2.0-v9"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
@@ -663,7 +663,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: retriever-usvc
containerPort: 7000
@@ -742,7 +742,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -833,7 +833,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -926,7 +926,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1014,7 +1014,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /tmp
name: tmp
@@ -1055,7 +1055,7 @@ spec:
spec:
containers:
- image: nginx:1.27.1
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: nginx
volumeMounts:
- mountPath: /etc/nginx/conf.d
View File
@@ -556,7 +556,7 @@ spec:
securityContext:
{}
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: ui
containerPort: 5173
@@ -614,7 +614,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: data-prep
containerPort: 6007
@@ -692,7 +692,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/guardrails-tgi:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: guardrails-usvc
containerPort: 9090
@@ -767,7 +767,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "redis/redis-stack:7.2.0-v9"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
@@ -842,7 +842,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: retriever-usvc
containerPort: 7000
@@ -920,7 +920,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tei-gaudi:latest"
image: "ghcr.io/huggingface/tei-gaudi:1.5.0"
imagePullPolicy: IfNotPresent
args:
- "--auto-truncate"
@@ -1013,7 +1013,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -1103,8 +1103,8 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
image: "ghcr.io/huggingface/tgi-gaudi:2.0.6"
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1184,8 +1184,8 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
image: "ghcr.io/huggingface/tgi-gaudi:2.0.6"
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1278,7 +1278,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna-guardrails:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /tmp
name: tmp
@@ -1319,7 +1319,7 @@ spec:
spec:
containers:
- image: nginx:1.27.1
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: nginx
volumeMounts:
- mountPath: /etc/nginx/conf.d
File diff suppressed because it is too large
View File
@@ -43,6 +43,7 @@ metadata:
app.kubernetes.io/managed-by: Helm
data:
TEI_EMBEDDING_ENDPOINT: "http://chatqna-tei"
HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
http_proxy: ""
https_proxy: ""
no_proxy: ""
@@ -70,9 +71,8 @@ data:
no_proxy: ""
LOGFLAG: ""
vLLM_ENDPOINT: "http://chatqna-vllm"
HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
LLM_MODEL: "meta-llama/Llama-3.1-70B-Instruct"
MODEL_ID: "meta-llama/Llama-3.1-70B-Instruct"
LLM_MODEL: "meta-llama/Meta-Llama-3-8B-Instruct"
MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
---
# Source: chatqna/charts/reranking-usvc/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
@@ -145,7 +145,6 @@ data:
NUMBA_CACHE_DIR: "/tmp"
TRANSFORMERS_CACHE: "/tmp/transformers_cache"
HF_HOME: "/tmp/.cache/huggingface"
MAX_WARMUP_SEQUENCE_LENGTH: "512"
---
# Source: chatqna/charts/teirerank/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
@@ -170,6 +169,7 @@ data:
NUMBA_CACHE_DIR: "/tmp"
TRANSFORMERS_CACHE: "/tmp/transformers_cache"
HF_HOME: "/tmp/.cache/huggingface"
MAX_WARMUP_SEQUENCE_LENGTH: "512"
---
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -183,7 +183,7 @@ metadata:
app.kubernetes.io/instance: chatqna
app.kubernetes.io/version: "2.1.0"
data:
MODEL_ID: "meta-llama/Llama-3.1-70B-Instruct"
MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
PORT: "2080"
HF_TOKEN: "insert-your-huggingface-token-here"
http_proxy: ""
@@ -194,6 +194,12 @@ data:
PT_HPU_ENABLE_LAZY_COLLECTIVES: "true"
OMPI_MCA_btl_vader_single_copy_mechanism: "none"
HF_HOME: "/tmp/.cache/huggingface"
GPU_MEMORY_UTILIZATION: "0.5"
DTYPE: "auto"
TENSOR_PARALLEL_SIZE: "1"
BLOCK_SIZE: "128"
MAX_NUM_SEQS: "256"
MAX_SEQ_LEN_TO_CAPTURE: "2048"
---
# Source: chatqna/templates/nginx-deployment.yaml
apiVersion: v1
@@ -592,7 +598,7 @@ spec:
securityContext:
{}
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: ui
containerPort: 5173
@@ -649,8 +655,8 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:v0.9"
imagePullPolicy: IfNotPresent
image: "opea/dataprep-redis:latest"
imagePullPolicy: Always
ports:
- name: data-prep
containerPort: 6007
@@ -728,7 +734,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/embedding-tei:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: embedding-usvc
containerPort: 6000
@@ -806,7 +812,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/llm-vllm:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: llm-uservice
containerPort: 9000
@@ -881,7 +887,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "redis/redis-stack:7.2.0-v9"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
@@ -956,7 +962,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/reranking-tei:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: reranking-usvc
containerPort: 8000
@@ -1034,7 +1040,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: retriever-usvc
containerPort: 7000
@@ -1103,10 +1109,8 @@ spec:
- configMapRef:
name: chatqna-tei-config
securityContext:
privileged: true
capabilities:
add: ["SYS_NICE"]
image: "ghcr.io/huggingface/tei-gaudi:latest"
{}
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
args:
- "--auto-truncate"
@@ -1140,16 +1144,8 @@ spec:
initialDelaySeconds: 5
periodSeconds: 5
resources:
limits:
habana.ai/gaudi: 1
cpu: 10
memory: 100Gi
hugepages-2Mi: 9800Mi
requests:
habana.ai/gaudi: 1
cpu: 10
memory: 100Gi
hugepages-2Mi: 9800Mi
{}
volumes:
- name: model-volume # Replace with Persistent volume claim/ host directory
emptyDir: {}
@@ -1191,11 +1187,17 @@ spec:
- configMapRef:
name: chatqna-teirerank-config
securityContext:
{}
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: false
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tei-gaudi:1.5.0"
imagePullPolicy: IfNotPresent
args:
- "--auto-truncate"
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1228,7 +1230,8 @@ spec:
initialDelaySeconds: 5
periodSeconds: 5
resources:
{}
limits:
habana.ai/gaudi: 1
volumes:
- name: model-volume # Replace with Persistent volume claim/ host directory
emptyDir: {}
@@ -1242,6 +1245,7 @@ spec:
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -1271,17 +1275,37 @@ spec:
- configMapRef:
name: chatqna-vllm-config
securityContext:
privileged: true
allowPrivilegeEscalation: false
capabilities:
add: ["SYS_NICE"]
image: "opea/llm-vllm-hpu:latest"
command:
- /bin/bash
- -c
- |
export VLLM_CPU_KVCACHE_SPACE=40 && \
python3 -m vllm.entrypoints.openai.api_server --enforce-eager --gpu-memory-utilization 0.5 --dtype auto --model $MODEL_ID --port 2080 --tensor-parallel-size 8 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
imagePullPolicy: IfNotPresent
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/vllm-gaudi:latest"
args:
- "--enforce-eager"
- "--model"
- "$(MODEL_ID)"
- "--tensor-parallel-size"
- "1"
- "--gpu-memory-utilization"
- "$(GPU_MEMORY_UTILIZATION)"
- "--dtype"
- "$(DTYPE)"
- "--max-num-seqs"
- "$(MAX_NUM_SEQS)"
- "--block-size"
- "$(BLOCK_SIZE)"
- "--max-seq-len-to-capture"
- "$(MAX_SEQ_LEN_TO_CAPTURE)"
- "--host"
- "0.0.0.0"
- "--port"
- "$(PORT)"
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1293,20 +1317,13 @@ spec:
protocol: TCP
resources:
limits:
habana.ai/gaudi: 8
cpu: 40
memory: 400Gi
hugepages-2Mi: 9800Mi
requests:
habana.ai/gaudi: 8
cpu: 40
memory: 400Gi
hugepages-2Mi: 9800Mi
habana.ai/gaudi: 1
volumes:
- name: model-volume # Replace with Persistent volume claim/ host directory
emptyDir: {}
- name: tmp
emptyDir: {}
---
# Source: chatqna/templates/deployment.yaml
# Copyright (C) 2024 Intel Corporation
@@ -1350,8 +1367,8 @@ spec:
value: chatqna-retriever-usvc
- name: EMBEDDING_SERVICE_HOST_IP
value: chatqna-embedding-usvc
- name: GUARDRAIL_SERVICE_HOST_IP
value: chatqna-guardrails-usvc
- name: MODEL_ID
value: "meta-llama/Meta-Llama-3-8B-Instruct"
securityContext:
allowPrivilegeEscalation: false
capabilities:
@@ -1362,7 +1379,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
image: "opea/chatqna-wrapper:latest"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp
@@ -1404,7 +1421,7 @@ spec:
spec:
containers:
- image: nginx:1.27.1
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: nginx
volumeMounts:
- mountPath: /etc/nginx/conf.d
View File
@@ -455,7 +455,7 @@ spec:
securityContext:
{}
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: ui
containerPort: 5173
@@ -513,7 +513,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/dataprep-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: data-prep
containerPort: 6007
@@ -588,7 +588,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "redis/redis-stack:7.2.0-v9"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
@@ -663,7 +663,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/retriever-redis:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
ports:
- name: retriever-usvc
containerPort: 7000
@@ -741,7 +741,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tei-gaudi:latest"
image: "ghcr.io/huggingface/tei-gaudi:1.5.0"
imagePullPolicy: IfNotPresent
args:
- "--auto-truncate"
@@ -834,7 +834,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
args:
- "--auto-truncate"
volumeMounts:
@@ -924,8 +924,8 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
image: "ghcr.io/huggingface/tgi-gaudi:2.0.6"
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: model-volume
@@ -1014,7 +1014,7 @@ spec:
seccompProfile:
type: RuntimeDefault
image: "opea/chatqna:latest"
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
volumeMounts:
- mountPath: /tmp
name: tmp
@@ -1055,7 +1055,7 @@ spec:
spec:
containers:
- image: nginx:1.27.1
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: nginx
volumeMounts:
- mountPath: /etc/nginx/conf.d
Some files were not shown because too many files have changed in this diff