Compare commits

..

27 Commits

Author SHA1 Message Date
WenjiaoYue
e6fde1456d Added the function of detecting whether the uploaded content is safe and providing prompts (#867)
Signed-off-by: Yue, Wenjiao <wenjiao.yue@intel.com>
2024-09-24 16:29:10 +08:00
Steve Zhang
954a22051b Make all xeon tgi image version consistent (#851)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-24 11:19:37 +08:00
Hoong Tee, Yeoh
6f4b00f829 Documentation README update for ProductivitySuite example (#863)
Signed-off-by: Yeoh, Hoong Tee <hoong.tee.yeoh@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-23 22:39:14 +08:00
lvliang-intel
3fb60608b3 Use official tei gaudi image and update tgi gaudi version (#810)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-23 17:52:56 +08:00
Letong Han
c35fe0b429 [Doc] Update ChatQnA README for Nginx Docker Image (#862)
Signed-off-by: letonghan <letong.han@intel.com>
2024-09-23 12:25:30 +09:00
lvliang-intel
28f5e4a268 Add docker based benchmark instructions for ChatQnA (#859)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-23 10:14:44 +08:00
Letong Han
d55a33dda1 [ProductivitySuite] Fix CD Issue (#858)
Signed-off-by: letonghan <letong.han@intel.com>
2024-09-20 16:20:01 +08:00
XinyaoWa
daf2a4fad7 Fix SearchQnA tests bug (#857)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-09-20 16:16:46 +08:00
chen, suyue
3ce395582b print image build test commit (#856)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-20 15:34:04 +08:00
Letong Han
7eaab93d0b [Doc] Refine ChatQnA README (#855)
Signed-off-by: letonghan <letong.han@intel.com>
2024-09-20 11:20:20 +08:00
Neo Zhang Jianyu
bc817700b9 refactor the network port setting for AWS (#849)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
2024-09-19 21:58:56 +08:00
lvliang-intel
bd811bd622 Add validate microservice details link (#852)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2024-09-19 21:54:32 +08:00
WenjiaoYue
05f9828e77 Add nginx and UI to the ChatQnA manifest (#848)
Signed-off-by: Yue, Wenjiao <wenjiao.yue@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-19 21:04:12 +08:00
Letong Han
6c364487d3 [ChatQnA] Add Nginx in Docker Compose and README (#850)
Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-19 20:39:58 +08:00
ZePan110
21e215c5d5 Refine code scan output and remove opea_release_data.md. (#844)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-09-19 17:34:55 +08:00
Letong Han
a09395e4a4 [Doc] Update CodeGen and Translation READMEs (#847)
Signed-off-by: letonghan <letong.han@intel.com>
2024-09-19 16:01:35 +08:00
lkk
f04f061f8c move evaluation scripts (#842)
Co-authored-by: root <root@idc708073.jf.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-19 15:59:13 +08:00
Tiep Le
872e93e4bd Handle uncontrolled data path for MultimodalQnA v1.0 release (#845)
Signed-off-by: Tiep Le <tiep.le@intel.com>
2024-09-19 15:45:49 +08:00
XinyaoWa
2f03a3a894 Align parameters for "max_token, repetition_penalty,presence_penalty,frequency_penalty" (#726)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-19 14:15:25 +08:00
Letong Han
372d78c2ac [Doc] Refine READMEs (#841)
Signed-off-by: letonghan <letong.han@intel.com>
2024-09-19 13:25:40 +08:00
Zhenzhong1
933c3d3445 [ChatQnA] Update OOB with wrapper manifests. (#823) 2024-09-19 11:03:10 +08:00
ZePan110
88829c9381 Remove useless folder. (#840)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-09-19 10:34:33 +08:00
Malini Bhandaru
d85ec0947c Remove marketing materials (#837)
Signed-off-by: Malini Bhandaru <malini.bhandaru@intel.com>
2024-09-19 09:27:01 +08:00
rbrugaro
dc94026d98 doc PR to main instead of of v1.0r (#838)
Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-19 09:20:55 +08:00
Letong Han
1e130314d9 [Translation] Support manifests and nginx (#812)
Signed-off-by: letonghan <letong.han@intel.com>
Signed-off-by: root <root@a4bf019305c5.jf.intel.com>
Co-authored-by: root <root@a4bf019305c5.jf.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-19 07:08:13 +08:00
Ying Hu
b205dc7571 Update README.md for Multiplatforms (#834)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-18 23:25:05 +08:00
kevinintel
3b70fb0d42 Refine the quick start of ChatQnA (#828)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-09-18 22:23:22 +08:00
181 changed files with 3560 additions and 1147 deletions

.github/CODEOWNERS vendored Normal file → Executable file
View File

@@ -3,10 +3,10 @@
/ChatQnA/ liang1.lv@intel.com
/CodeGen/ liang1.lv@intel.com
/CodeTrans/ sihan.chen@intel.com
/DocSum/ sihan.chen@intel.com
/DocSum/ letong.han@intel.com
/DocIndexRetriever/ xuhui.ren@intel.com chendi.xue@intel.com
/FaqGen/ xinyao.wang@intel.com
/SearchQnA/ letong.han@intel.com
/SearchQnA/ sihan.chen@intel.com
/Translation/ liang1.lv@intel.com
/VisualQnA/ liang1.lv@intel.com
/ProductivitySuite/ hoong.tee.yeoh@intel.com

View File

@@ -46,33 +46,30 @@ jobs:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*
- name: Get checkout ref
- name: Get Checkout Ref
run: |
if [ "${{ github.event_name }}" == "pull_request" ] || [ "${{ github.event_name }}" == "pull_request_target" ]; then
echo "CHECKOUT_REF=refs/pull/${{ github.event.number }}/merge" >> $GITHUB_ENV
else
echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
fi
echo "checkout ref ${{ env.CHECKOUT_REF }}"
- name: Checkout out Repo
- name: Checkout out GenAIExamples
uses: actions/checkout@v4
with:
ref: ${{ env.CHECKOUT_REF }}
fetch-depth: 0
- name: Clone required Repo
- name: Clone Required Repo
run: |
cd ${{ github.workspace }}/${{ inputs.example }}/docker_image_build
docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml
if [[ $(grep -c "tei-gaudi:" ${docker_compose_path}) != 0 ]]; then
git clone https://github.com/huggingface/tei-gaudi.git
fi
if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then
git clone https://github.com/vllm-project/vllm.git
cd vllm && git rev-parse HEAD && cd ../
fi
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps && git checkout ${{ inputs.opea_branch }} && cd ../
cd GenAIComps && git checkout ${{ inputs.opea_branch }} && git rev-parse HEAD && cd ../
- name: Build Image
if: ${{ fromJSON(inputs.build) }}

View File

@@ -136,7 +136,7 @@ jobs:
if [ "$response_retry" -eq 200 ]; then
echo "*****Retry successfully*****"
else
echo "Invalid link from $real_path: $url_dev"
echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
fail="TRUE"
fi
else

View File

@@ -3,7 +3,7 @@
services:
tgi-server:
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-server
ports:
- "8085:80"
@@ -13,12 +13,16 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
PT_HPU_ENABLE_LAZY_COLLECTIVES: true
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE

View File

@@ -0,0 +1,51 @@
# AudioQnA accuracy Evaluation
AudioQnA is an example that demonstrates the integration of Generative AI (GenAI) models for performing question answering (QnA) on audio scenes, which involves Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The following is the pipeline for evaluating ASR accuracy.
## Dataset
We evaluate ASR accuracy on the test set of the LibriSpeech [dataset](https://huggingface.co/datasets/andreagasparini/librispeech_test_only), which contains 2620 audio recordings with reference transcripts.
## Metrics
We evaluate the WER (Word Error Rate) metric of the ASR microservice.
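WER is the number of word-level substitutions, deletions, and insertions needed to turn the predicted transcript into the reference, divided by the number of reference words. As a minimal illustration with made-up strings (using the `evaluate` and `jiwer` packages from `requirements.txt`, the same API the evaluation scripts below rely on):
```python
from evaluate import load  # requires the evaluate and jiwer packages

wer = load("wer")

# Toy corpus: one substitution ("yearning" for "learning") out of 6 reference words.
references = ["what is deep learning", "hello world"]
predictions = ["what is deep yearning", "hello world"]

print(wer.compute(references=references, predictions=predictions))  # -> ~0.167, i.e. 16.7% WER
```
The evaluation scripts below report `100 * wer.compute(...)`, i.e. WER as a percentage, which is how the results table at the end of this document should be read.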
## Evaluation
### Launch ASR microservice
Launch the ASR microservice with the following commands. For more details, please refer to the [ASR component documentation](https://github.com/opea-project/GenAIComps/tree/main/comps/asr).
```bash
git clone https://github.com/opea-project/GenAIComps
cd GenAIComps
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
# change the model to evaluate by editing the --model_name_or_path argument below
docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper:latest --model_name_or_path "openai/whisper-tiny"
```
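Before running the full evaluation, you can sanity-check the running container with a single request. This is a minimal sketch that assumes the microservice listens on `localhost:7066`, exposes `/v1/asr`, and returns an `asr_result` field, exactly as used by `online_evaluate.py` below; the tiny base64-encoded WAV payload is borrowed from the GMC end-to-end tests elsewhere in this changeset.
```python
import json

import requests

# Tiny base64-encoded WAV clip (same payload as the GMC e2e tests).
audio_b64 = "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"

endpoint = "http://localhost:7066/v1/asr"
response = requests.post(url=endpoint, data=json.dumps({"audio": audio_b64}), proxies={"http": None})

# A ready service answers with JSON containing an "asr_result" transcription string.
print(response.json()["asr_result"])
```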
### Evaluate
Install dependencies:
```bash
pip install -r requirements.txt
```
Evaluate the ASR accuracy of the microservice:
```bash
# validate the offline model
# python offline_evaluate.py
# validate the online asr microservice accuracy
python online_evaluate.py
```
### Performance Result
Here are the tested results for your reference:
| Model | WER (%) |
| --- | --- |
| whisper-large-v2 | 2.87 |
| whisper-large | 2.7 |
| whisper-medium | 3.45 |

View File

@@ -0,0 +1,35 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import torch
from datasets import load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor
device = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_NAME = "openai/whisper-large-v2"
librispeech_test_clean = load_dataset(
"andreagasparini/librispeech_test_only", "clean", split="test", trust_remote_code=True
)
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to(device)
def map_to_pred(batch):
audio = batch["audio"]
input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
batch["reference"] = processor.tokenizer._normalize(batch["text"])
with torch.no_grad():
predicted_ids = model.generate(input_features.to(device))[0]
transcription = processor.decode(predicted_ids)
batch["prediction"] = processor.tokenizer._normalize(transcription)
return batch
result = librispeech_test_clean.map(map_to_pred)
wer = load("wer")
print(100 * wer.compute(references=result["reference"], predictions=result["prediction"]))

View File

@@ -0,0 +1,56 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import base64
import json
import requests
import torch
from datasets import load_dataset
from evaluate import load
from pydub import AudioSegment
from transformers import WhisperForConditionalGeneration, WhisperProcessor
MODEL_NAME = "openai/whisper-large-v2"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
librispeech_test_clean = load_dataset(
"andreagasparini/librispeech_test_only", "clean", split="test", trust_remote_code=True
)
def map_to_pred(batch):
batch["reference"] = processor.tokenizer._normalize(batch["text"])
file_path = batch["file"]
# process the file_path
pidx = file_path.rfind("/")
sidx = file_path.rfind(".")
file_path_prefix = file_path[: pidx + 1]
file_path_suffix = file_path[sidx:]
file_path_mid = file_path[pidx + 1 : sidx]
splits = file_path_mid.split("-")
file_path_mid = f"LibriSpeech/test-clean/{splits[0]}/{splits[1]}/{file_path_mid}"
file_path = file_path_prefix + file_path_mid + file_path_suffix
audio = AudioSegment.from_file(file_path)
audio.export("tmp.wav")
with open("tmp.wav", "rb") as f:
test_audio_base64_str = base64.b64encode(f.read()).decode("utf-8")
inputs = {"audio": test_audio_base64_str}
endpoint = "http://localhost:7066/v1/asr"
response = requests.post(url=endpoint, data=json.dumps(inputs), proxies={"http": None})
result_str = response.json()["asr_result"]
batch["prediction"] = processor.tokenizer._normalize(result_str)
return batch
result = librispeech_test_clean.map(map_to_pred)
wer = load("wer")
print(100 * wer.compute(references=result["reference"], predictions=result["prediction"]))

View File

@@ -0,0 +1,8 @@
datasets
evaluate
jiwer
librosa
pydub
soundfile
torch
transformers

View File

@@ -108,7 +108,7 @@ curl http://${host_ip}:3006/generate \
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-H 'Content-Type: application/json'
# speecht5 service

View File

@@ -108,7 +108,7 @@ curl http://${host_ip}:3006/generate \
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-H 'Content-Type: application/json'
# speecht5 service

View File

@@ -51,7 +51,7 @@ services:
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "3006:80"
@@ -61,11 +61,15 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE

View File

@@ -25,7 +25,7 @@ The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
- whisper-gaudi: opea/whisper-gaudi:latest
- speecht5-gaudi: opea/speecht5-gaudi:latest

View File

@@ -247,7 +247,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: ghcr.io/huggingface/text-generation-inference:2.2.0
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
name: llm-dependency-deploy-demo
securityContext:
capabilities:

View File

@@ -271,7 +271,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -303,6 +303,14 @@ spec:
value: none
- name: PT_HPU_ENABLE_LAZY_COLLECTIVES
value: 'true'
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
- name: runtime
value: habana
- name: HABANA_VISIBLE_DEVICES
@@ -315,7 +323,7 @@ spec:
volumes:
- name: model-volume
hostPath:
path: /home/sdp/cesg
path: /mnt/models
type: Directory
- name: shm
emptyDir:

View File

@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="audioqna whisper-gaudi asr llm-tgi speecht5-gaudi tts"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker images && sleep 1s
}

View File

@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="audioqna whisper asr llm-tgi speecht5 tts"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker images && sleep 1s
}

View File

@@ -34,7 +34,7 @@ function validate_audioqa() {
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
echo "$CLIENT_POD"
accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
echo "$byte_str" > $LOG_PATH/curl_audioqa.log
if [ -z "$byte_str" ]; then
echo "audioqa failed, please check the logs in ${LOG_PATH}!"

View File

@@ -34,7 +34,7 @@ function validate_audioqa() {
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
echo "$CLIENT_POD"
accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
echo "$byte_str" > $LOG_PATH/curl_audioqa.log
if [ -z "$byte_str" ]; then
echo "audioqa failed, please check the logs in ${LOG_PATH}!"

View File

@@ -22,7 +22,6 @@ RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./chatqna.py /home/user/chatqna.py
COPY ./gateway.py /home/user/gateway.py
ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps

View File

@@ -53,6 +53,7 @@ To set up environment variables for deploying ChatQnA services, follow these ste
### Quick Start: 2.Run Docker Compose
Select the compose.yaml file that matches your hardware.
CPU example:
```bash
@@ -69,9 +70,13 @@ docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
If you want to build docker by yourself, please refer to `built from source`: [Guide](docker_compose/intel/cpu/xeon/README.md).
In the following cases, you could build the Docker image from source yourself:
> Note: The optional Docker image **opea/chatqna-without-rerank:latest** has not been published yet, so users need to build it from source.
- The Docker image failed to download.
- You want to use a specific version of the Docker image.
Please refer to 'Build Docker Images' in the [Guide](docker_compose/intel/cpu/xeon/README.md).
### QuickStart: 3.Consume the ChatQnA Service
@@ -245,7 +250,9 @@ Refer to the [AI PC Guide](./docker_compose/intel/cpu/aipc/README.md) for instru
Refer to the [Intel Technology enabling for Openshift readme](https://github.com/intel/intel-technology-enabling-for-openshift/blob/main/workloads/opea/chatqna/README.md) for instructions to deploy ChatQnA prototype on RHOCP with [Red Hat OpenShift AI (RHOAI)](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai).
## Consume ChatQnA Service
## Consume ChatQnA Service with RAG
### Check Service Status
Before consuming ChatQnA Service, make sure the TGI/vLLM service is ready (which takes up to 2 minutes to start).
@@ -260,6 +267,23 @@ Consume ChatQnA service until you get the TGI response like below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
### Upload RAG Files (Optional)
To chat with retrieved information, you need to upload a file using the `Dataprep` service.
Here is an example using the `Nike 2023` PDF.
```bash
# download pdf file
wget https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf
# upload pdf file with dataprep
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
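The same upload can be done from Python instead of cURL; a minimal sketch assuming the `requests` package is installed and `host_ip` is exported as above:
```python
import os

import requests

host_ip = os.environ["host_ip"]  # exported earlier in the environment setup

# Upload the downloaded PDF to the Dataprep service as multipart/form-data.
with open("./nke-10k-2023.pdf", "rb") as f:
    resp = requests.post(
        f"http://{host_ip}:6007/v1/dataprep",
        files={"files": ("nke-10k-2023.pdf", f, "application/pdf")},
    )
print(resp.status_code, resp.text)
```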
### Consume Chat Service
Two ways of consuming ChatQnA Service:
1. Use cURL command on terminal

View File

@@ -288,12 +288,20 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -288,12 +288,20 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -288,12 +288,20 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -323,7 +323,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -359,8 +359,16 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
serviceAccountName: default
volumes:
- name: model-volume
@@ -483,7 +491,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: opea/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:latest
name: reranking-dependency-deploy
args:
- --model-id

View File

@@ -323,7 +323,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -359,8 +359,16 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
serviceAccountName: default
volumes:
- name: model-volume

View File

@@ -323,7 +323,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -359,8 +359,16 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
serviceAccountName: default
volumes:
- name: model-volume

View File

@@ -29,6 +29,8 @@ Results will be displayed in the terminal and saved as CSV file named `1_stats.c
## Getting Started
We recommend using Kubernetes to deploy the ChatQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs. Below is a description of Kubernetes deployment and benchmarking. For instructions on deploying and benchmarking with Docker, please refer to [this section](#benchmark-with-docker).
### Prerequisites
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
@@ -67,7 +69,7 @@ We have created the [BKC manifest](https://github.com/opea-project/GenAIExamples
```bash
# on k8s-master node
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/benchmark
cd GenAIExamples/ChatQnA/benchmark/performance
# replace the image tag from latest to v0.9 since we want to test with v0.9 release
IMAGE_TAG=v0.9
@@ -144,11 +146,11 @@ kubectl label nodes k8s-worker1 node-type=chatqna-opea
##### 2. Install ChatQnA
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/single_gaudi) and apply to K8s.
Go to [BKC manifest](./tuned/with_rerank/single_gaudi) and apply to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
kubectl apply -f .
```
@@ -187,10 +189,13 @@ curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
###### 3.2 Run Benchmark Test
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.deployment_type`, `test_suite_config.service_ip`, `test_suite_config.service_port`, `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export USER_QUERIES="[4, 8, 16, 640]"
export DEPLOYMENT_TYPE="k8s"
export SERVICE_IP=None
export SERVICE_PORT=None
export USER_QUERIES="[640, 640, 640, 640]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
@@ -210,7 +215,7 @@ All the test results will come to this folder `/home/sdp/benchmark_output/node_1
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-worker1 node-type-
```
@@ -227,30 +232,32 @@ kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
##### 2. Install ChatQnA
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/two_gaudi) and apply to K8s.
Go to [BKC manifest](./tuned/with_rerank/two_gaudi) and apply to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/two_gaudi
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/two_gaudi
kubectl apply -f .
```
##### 3. Run tests
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.deployment_type`, `test_suite_config.service_ip`, `test_suite_config.service_port`, `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export USER_QUERIES="[4, 8, 16, 1280]"
````bash
export DEPLOYMENT_TYPE="k8s"
export SERVICE_IP=None
export SERVICE_PORT=None
export USER_QUERIES="[1280, 1280, 1280, 1280]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
And then run the benchmark tool by:
```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```
````
##### 4. Data collection
@@ -276,20 +283,23 @@ kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=cha
##### 2. Install ChatQnA
Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/four_gaudi) and apply to K8s.
Go to [BKC manifest](./tuned/with_rerank/four_gaudi) and apply to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/four_gaudi
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/four_gaudi
kubectl apply -f .
```
##### 3. Run tests
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.deployment_type`, `test_suite_config.service_ip`, `test_suite_config.service_port`, `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export USER_QUERIES="[4, 8, 16, 2560]"
export DEPLOYMENT_TYPE="k8s"
export SERVICE_IP=None
export SERVICE_PORT=None
export USER_QUERIES="[2560, 2560, 2560, 2560]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
@@ -309,11 +319,84 @@ All the test results will come to this folder `/home/sdp/benchmark_output/node_4
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
```
#### 6. Results
## Benchmark with Docker
Check OOB performance data [here](/opea_release_data.md#chatqna); tuned performance data will be released soon.
### Deploy ChatQnA service with Docker
In order to set up the environment correctly, you'll need to configure essential environment variables and, if applicable, proxy-related variables.
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
#### Deploy ChatQnA on Gaudi
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
```
Refer to the [Gaudi Guide](../../docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
#### Deploy ChatQnA on Xeon
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
docker compose up -d
```
Refer to the [Xeon Guide](../../docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
#### Deploy ChatQnA on NVIDIA GPU
```bash
cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/
docker compose up -d
```
Refer to the [NVIDIA GPU Guide](../../docker_compose/nvidia/gpu/README.md) for more instructions on building docker images from source.
### Run tests
We copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and config `test_suite_config.deployment_type`, `test_suite_config.service_ip`, `test_suite_config.service_port`, `test_suite_config.user_queries` and `test_suite_config.test_output_dir`.
```bash
export DEPLOYMENT_TYPE="docker"
export SERVICE_IP="ChatQnA Service IP"
export SERVICE_PORT="ChatQnA Service Port"
export USER_QUERIES="[640, 640, 640, 640]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/docker"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
And then run the benchmark tool by:
```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```
### Data collection
All the test results will be written to the folder `/home/sdp/benchmark_output/docker`, configured by the environment variable `TEST_OUTPUT_DIR` in the previous steps.
### Clean up
Taking Gaudi as an example, use the commands below to clean up the system.
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi
docker compose stop && docker compose rm -f
echo y | docker system prune
```

View File

@@ -3,6 +3,9 @@
test_suite_config: # Overall configuration settings for the test suite
examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
deployment_type: ${DEPLOYMENT_TYPE} # Default is "k8s", can also be "docker"
service_ip: ${SERVICE_IP} # Leave as None for k8s, specify for Docker
service_port: ${SERVICE_PORT} # Leave as None for k8s, specify for Docker
concurrent_level: 5 # The concurrency level, adjustable based on requirements
user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
random_prompt: false # Use random prompts if true, fixed prompts if false
@@ -41,7 +44,7 @@ test_cases:
run_test: false
service_name: "llm-svc" # Replace with your service name
parameters:
max_new_tokens: 128
max_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95

View File

@@ -306,12 +306,20 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -306,12 +306,20 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -306,12 +306,20 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
imagePullPolicy: IfNotPresent
name: llm-dependency-deploy
ports:

View File

@@ -342,7 +342,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -378,8 +378,16 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
serviceAccountName: default
volumes:
- name: model-volume

View File

@@ -342,7 +342,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -378,8 +378,16 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
serviceAccountName: default
volumes:
- name: model-volume

View File

@@ -342,7 +342,7 @@ spec:
- envFrom:
- configMapRef:
name: qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.4
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -378,8 +378,16 @@ spec:
value: habana
- name: HABANA_VISIBLE_DEVICES
value: all
- name: HF_TOKEN
- name: HUGGING_FACE_HUB_TOKEN
value: ${HF_TOKEN}
- name: ENABLE_HPU_GRAPH
value: 'true'
- name: LIMIT_HPU_GRAPH
value: 'true'
- name: USE_FLASH_ATTENTION
value: 'true'
- name: FLASH_ATTENTION_RECOMPUTE
value: 'true'
serviceAccountName: default
volumes:
- name: model-volume

View File

@@ -3,8 +3,7 @@
import os
from comps import MicroService, ServiceOrchestrator, ServiceType
from gateway import ChatQnAGateway
from comps import ChatQnAGateway, MicroService, ServiceOrchestrator, ServiceType
MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))

View File

@@ -19,7 +19,7 @@ opea_micro_services:
tei-embedding-service:
host: ${TEI_EMBEDDING_SERVICE_IP}
ports: ${TEI_EMBEDDING_SERVICE_PORT}
image: opea/tei-gaudi:latest
image: ghcr.io/huggingface/tei-gaudi:latest
volumes:
- "./data:/data"
runtime: habana
@@ -48,7 +48,7 @@ opea_micro_services:
tgi-service:
host: ${TGI_SERVICE_IP}
ports: ${TGI_SERVICE_PORT}
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
volumes:
- "./data:/data"
runtime: habana
@@ -56,10 +56,13 @@ opea_micro_services:
- SYS_NICE
ipc: host
environment:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
model-id: ${LLM_MODEL_ID}
llm:
host: ${LLM_SERVICE_HOST_IP}

View File

@@ -69,10 +69,12 @@ def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **k
next_inputs = {}
next_inputs["model"] = "tgi" # specifically clarify the fake model to make the format unified
next_inputs["messages"] = [{"role": "user", "content": inputs["inputs"]}]
next_inputs["max_tokens"] = llm_parameters_dict["max_new_tokens"]
next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
next_inputs["top_p"] = llm_parameters_dict["top_p"]
next_inputs["stream"] = inputs["streaming"]
next_inputs["frequency_penalty"] = inputs["repetition_penalty"]
next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
next_inputs["presence_penalty"] = inputs["presence_penalty"]
next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
next_inputs["temperature"] = inputs["temperature"]
inputs = next_inputs
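To make the alignment above concrete, here is a small, purely illustrative example (hypothetical values, not taken from the repository) of the OpenAI-style payload that `align_inputs` now produces:
```python
# Hypothetical gateway inputs and LLM parameters, for illustration only.
inputs = {
    "inputs": "What is Deep Learning?",
    "streaming": False,
    "repetition_penalty": 1.03,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "temperature": 0.01,
}
llm_parameters_dict = {"max_tokens": 17, "top_p": 0.95}

# Field-by-field mapping, mirroring the updated align_inputs above.
next_inputs = {
    "model": "tgi",  # fake model name to keep the request format unified
    "messages": [{"role": "user", "content": inputs["inputs"]}],
    "max_tokens": llm_parameters_dict["max_tokens"],
    "top_p": llm_parameters_dict["top_p"],
    "stream": inputs["streaming"],
    "frequency_penalty": inputs["frequency_penalty"],
    "presence_penalty": inputs["presence_penalty"],
    "repetition_penalty": inputs["repetition_penalty"],
    "temperature": inputs["temperature"],
}
print(next_inputs)
```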

View File

@@ -171,6 +171,9 @@ OLLAMA_HOST=${host_ip}:11434 ollama run $OLLAMA_MODEL
### Validate Microservices
Follow the instructions to validate MicroServices.
For details on how to verify the correctness of the response, refer to [how-to-validate_service](../../hpu/gaudi/how_to_validate_service.md).
1. TEI Embedding Service
```bash
@@ -229,7 +232,7 @@ OLLAMA_HOST=${host_ip}:11434 ollama run $OLLAMA_MODEL
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```

View File

@@ -2,6 +2,69 @@
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon; this will simplify the deployment process for this service.
Quick Start:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
## Quick Start: 1.Setup Environment Variable
To set up environment variables for deploying ChatQnA services, follow these steps:
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
## Quick Start: 2.Run Docker Compose
```bash
docker compose up -d
```
It will automatically download the Docker images from Docker Hub:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
In the following cases, you could build the Docker image from source yourself:
- The Docker image failed to download.
- You want to use a specific version of the Docker image.
Please refer to 'Build Docker Images' below.
## QuickStart: 3.Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
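The same request can also be sent from Python; a minimal sketch assuming the `requests` package is installed and `host_ip` is set as in step 1:
```python
import os

import requests

host_ip = os.environ["host_ip"]  # set in Quick Start step 1

resp = requests.post(
    f"http://{host_ip}:8888/v1/chatqna",
    headers={"Content-Type": "application/json"},
    json={"messages": "What is the revenue of Nike in 2023?"},
)
print(resp.text)  # the answer may arrive as a streamed/chunked body
```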
## 🚀 Apply Xeon Server on AWS
To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors that are optimized for demanding workloads.
@@ -10,52 +73,25 @@ For detailed information about these instance types, you can refer to this [link
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
**Certain ports in the EC2 instance need to be opened up in the security group for the microservices to work with the curl commands.**
### Network Port & Security
> See one example below. Please open up these ports in the EC2 instance based on the IP addresses you want to allow
- Access the ChatQnA UI by web browser
```
redis-vector-db
===============
Port 6379 - Open to 0.0.0.0/0
Port 8001 - Open to 0.0.0.0/0
Access is supported on port `80`. Please confirm that port `80` is open in the firewall of the EC2 instance.
tei_embedding_service
=====================
Port 6006 - Open to 0.0.0.0/0
- Access the microservice by tool or API
embedding
=========
Port 6000 - Open to 0.0.0.0/0
1. Log in to the EC2 instance and access the service by **local IP address** and port.
retriever
=========
Port 7000 - Open to 0.0.0.0/0
This is the recommended approach; no changes to the network port settings are needed.
tei_xeon_service
================
Port 8808 - Open to 0.0.0.0/0
2. Log in to a remote client and access the service by **public IP address** and port.
reranking
=========
Port 8000 - Open to 0.0.0.0/0
You need to open the microservice's port in the security group settings of the EC2 instance's firewall.
tgi-service or vLLM_service
===========
Port 9009 - Open to 0.0.0.0/0
For a detailed guide, please refer to [Validate Microservices](#validate-microservices).
llm
===
Port 9000 - Open to 0.0.0.0/0
chaqna-xeon-backend-server
==========================
Port 8888 - Open to 0.0.0.0/0
chaqna-xeon-ui-server
=====================
Port 5173 - Open to 0.0.0.0/0
```
Note: this increases the security risk, so please confirm before doing it.
## 🚀 Build Docker Images
@@ -157,7 +193,14 @@ cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, you will have the following 7 Docker Images:
### 9. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile .
```
Then run the command `docker images`; you will have the following 8 Docker images:
1. `opea/dataprep-redis:latest`
2. `opea/embedding-tei:latest`
@@ -166,6 +209,7 @@ Then run the command `docker images`, you will have the following 7 Docker Image
5. `opea/llm-tgi:latest` or `opea/llm-vllm:latest`
6. `opea/chatqna:latest` or `opea/chatqna-without-rerank:latest`
7. `opea/chatqna-ui:latest`
8. `opea/nginx:latest`
## 🚀 Start Microservices
@@ -189,7 +233,7 @@ For users in China who are unable to download models directly from Huggingface,
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="Intel/neural-chat-7b-v3-3"
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.2.0 --model-id $model_name
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu --model-id $model_name
```
2. Offline
@@ -203,62 +247,35 @@ For users in China who are unable to download models directly from Huggingface,
```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
docker run -p 8008:80 -v $model_path:/data --name tgi_service --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.2.0 --model-id /data
docker run -p 8008:80 -v $model_path:/data --name tgi_service --shm-size 1g ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu --model-id /data
```
### Setup Environment Variables
Since the `compose.yaml` will consume some environment variables, you need to set them up in advance as below.
1. Set the required environment variables:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
```
> Change the External_Public_IP below with the actual IPV4 value
2. If you are in a proxy environment, also set the proxy-related environment variables:
```
export host_ip="External_Public_IP"
```
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
**Export the value of your Huggingface API token to the `your_hf_api_token` environment variable**
3. Set up other environment variables:
> Change the Your_Huggingface_API_Token below with your actual Huggingface API Token value
```
export your_hf_api_token="Your_Huggingface_API_Token"
```
**Append the value of the public IP address to the no_proxy list**
```bash
export your_no_proxy=${your_no_proxy},"External_Public_IP"
```
```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:9009"
export vLLM_LLM_ENDPOINT="http://${host_ip}:9009"
export LLM_SERVICE_PORT=9000
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
```
Note: Please replace `host_ip` with your external IP address; do not use localhost.
```bash
source ./set_env.sh
```
### Start all the services Docker Containers
@@ -285,6 +302,10 @@ docker compose -f compose_vllm.yaml up -d
### Validate Microservices
Note: when verifying the microservices by curl or API from a remote client, please make sure the **ports** of the microservices are open in the firewall of the cloud node.
Follow the instructions to validate MicroServices.
For details on how to verify the correctness of the response, refer to [how-to-validate_service](../../hpu/gaudi/how_to_validate_service.md).
1. TEI Embedding Service
```bash
@@ -379,102 +400,125 @@ docker compose -f compose_vllm.yaml up -d
This service depends on the LLM backend service above having started. It can take a long time to become ready on the first startup, so wait for it to be ready.
```bash
# TGI service
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that we rename "max_new_tokens" to "max_tokens").
```bash
# vLLM Service
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
-H 'Content-Type: application/json'
```
For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
8. MegaService
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"messages": "What is the revenue of Nike in 2023?"
}'
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
9. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File [nke-10k-2023.pdf](https://github.com/opea-project/GenAIComps/blob/main/comps/retrievers/redis/data/nke-10k-2023.pdf). Or
click [here](https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf) to download the file via any web browser.
Or run this command to get the file on a terminal.
9. Nginx Service
```bash
wget https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
-H "Content-Type: application/json" \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
Upload:
10. Dataprep Microservice (Optional)
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
If you want to update the default knowledge base, you can use the following commands:
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
Update Knowledge Base via Local File [nke-10k-2023.pdf](https://github.com/opea-project/GenAIComps/blob/main/comps/retrievers/redis/data/nke-10k-2023.pdf). Or
click [here](https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf) to download the file via any web browser.
Or run this command to get the file on a terminal.
Add Knowledge Base via HTTP Links:
```bash
wget https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
```
This command updates a knowledge base by submitting a list of HTTP links for processing.
Upload:
Also, you are able to get the file list that you uploaded:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
Then you will get the response JSON like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
Add Knowledge Base via HTTP Links:
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
To delete the file/link you uploaded:
This command updates a knowledge base by submitting a list of HTTP links for processing.
The `file_path` here should be the `id` obtained from the `/v1/dataprep/get_file` API.
Also, you are able to get the file list that you uploaded:
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
Then you will get the response JSON like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
To delete the file/link you uploaded:
The `file_path` here should be the `id` obtained from the `/v1/dataprep/get_file` API.
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
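The list and delete calls above can also be issued from Python; a minimal sketch assuming the `requests` package is installed and using the same endpoints documented above:
```python
import os

import requests

host_ip = os.environ["host_ip"]
base = f"http://{host_ip}:6007/v1/dataprep"

# List the files and links that have been uploaded.
uploaded = requests.post(f"{base}/get_file", headers={"Content-Type": "application/json"}).json()
print(uploaded)

# Delete a single item by the id returned above, then delete everything.
requests.post(f"{base}/delete_file", json={"file_path": "nke-10k-2023.pdf"})
requests.post(f"{base}/delete_file", json={"file_path": "all"})
```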
## 🚀 Launch the UI
### Launch with origin port
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
@@ -485,6 +529,10 @@ To access the frontend, open the following URL in your browser: http://{host_ip}
- "80:5173"
```
### Launch with Nginx
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
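Before opening the browser, a quick hedged check that Nginx is proxying both the UI root and the ChatQnA API (assumes `host_ip` and `NGINX_PORT` are exported as above):
```bash
# Expect an HTTP 200 from the UI behind Nginx, then a generated answer from the backend route
curl -s -o /dev/null -w "UI via Nginx -> HTTP %{http_code}\n" "http://${host_ip}:${NGINX_PORT}/"
curl "http://${host_ip}:${NGINX_PORT}/v1/chatqna" \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is the revenue of Nike in 2023?"}'
```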
## 🚀 Launch the Conversational UI (Optional)
To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below:

View File

@@ -222,6 +222,9 @@ docker compose -f compose_qdrant.yaml up -d
### Validate Microservices
Follow the instructions to validate MicroServices.
For details on how to verify the correctness of the response, refer to [how-to-validate_service](../../hpu/gaudi/how_to_validate_service.md).
1. TEI Embedding Service
```bash
@@ -304,7 +307,7 @@ docker compose -f compose_qdrant.yaml up -d
```bash
curl http://${host_ip}:6047/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```

View File

@@ -178,6 +178,25 @@ services:
- DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
ipc: host
restart: always
chaqna-xeon-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-xeon-nginx-server
depends_on:
- chaqna-xeon-backend-server
- chaqna-xeon-ui-server
ports:
- "${NGINX_PORT:-80}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
ipc: host
restart: always
networks:
default:

View File

@@ -69,7 +69,7 @@ services:
INDEX_NAME: ${INDEX_NAME}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "6041:80"

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-server
ports:
- "6006:80"
@@ -75,7 +75,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "8808:80"

View File

@@ -22,3 +22,8 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
export FRONTEND_SERVICE_IP=${host_ip}
export FRONTEND_SERVICE_PORT=5173
export BACKEND_SERVICE_NAME=chatqna
export BACKEND_SERVICE_IP=${host_ip}
export BACKEND_SERVICE_PORT=8888

View File

@@ -2,6 +2,70 @@
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, which will simplify the deployment process for this service.
Quick Start:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
## Quick Start: 1. Setup Environment Variables
To set up environment variables for deploying ChatQnA services, follow these steps:
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
## Quick Start: 2. Run Docker Compose
```bash
docker compose up -d
```
It will automatically download the Docker images from Docker Hub:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
In the following cases, you can build the Docker images from source yourself:
- The image download failed.
- You want to use a specific version of a Docker image.
Please refer to the 'Build Docker Images' section below.
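If you need a specific version rather than `latest`, note that the Compose files in this repository typically reference OPEA images as `${REGISTRY:-opea}/<image>:${TAG:-latest}`; a hedged sketch of pinning those two variables is shown below (the tag value is only an illustration):
```bash
# Sketch: pin a specific image source and tag instead of the default opea/<image>:latest
export REGISTRY=opea   # or your own registry, e.g. a private mirror
export TAG=v1.0        # hypothetical tag; replace with the version you built or pulled
docker compose up -d
```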
## Quick Start: 3. Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
## 🚀 Build Docker Images
First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
@@ -132,7 +196,14 @@ cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, and you will have the following 7 Docker Images:
### 10. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile .
```
Then run the command `docker images`, and you will have the following 8 Docker Images:
- `opea/embedding-tei:latest`
- `opea/retriever-redis:latest`
@@ -141,6 +212,7 @@ Then run the command `docker images`, you will have the following 7 Docker Image
- `opea/dataprep-redis:latest`
- `opea/chatqna:latest` or `opea/chatqna-guardrails:latest` or `opea/chatqna-without-rerank:latest`
- `opea/chatqna-ui:latest`
- `opea/nginx:latest`
If Conversation React UI is built, you will find one more image:
@@ -191,51 +263,30 @@ For users in China who are unable to download models directly from Huggingface,
### Setup Environment Variables
Since the `compose.yaml` consumes some environment variables, you need to set them up in advance as below.
1. Set the required environment variables:
```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export LLM_MODEL_ID_NAME="neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:8005"
export vLLM_LLM_ENDPOINT="http://${host_ip}:8007"
export vLLM_RAY_LLM_ENDPOINT="http://${host_ip}:8006"
export LLM_SERVICE_PORT=9000
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
```
export llm_service_devices=all
export tei_embedding_devices=all
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
To specify particular device IDs, `llm_service_devices` and `tei_embedding_devices` can be set to a comma-separated list such as `"0,1,2,3"` (e.g. `export llm_service_devices="0,1,2,3"`). More info in the [gaudi docs](https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html).
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
If the guardrails microservice is enabled in the pipeline, the following environment variables must also be set.
3. Set up other environment variables:
```bash
export GURADRAILS_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
export SAFETY_GUARD_ENDPOINT="http://${host_ip}:8088"
export GUARDRAIL_SERVICE_HOST_IP=${host_ip}
```
Note: Please replace `host_ip` with your external IP address, do **NOT** use localhost.
```bash
source ./set_env.sh
```
### Start all the services Docker Containers
@@ -382,88 +433,119 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
7. LLM Microservice
```bash
curl http://${host_ip}:9000/v1/chat/completions \
# TGI service
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
For parameters in TGI mode, please refer to [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except we rename "max_new_tokens" to "max_tokens".)
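For comparison, here is a hedged sketch of the same request sent directly to the underlying TGI container (mapped to port 8005 in this compose file); TGI's native API still uses `max_new_tokens`:
```bash
# Direct call to the TGI /generate endpoint, bypassing the OPEA LLM microservice wrapper
curl http://${host_ip}:8005/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}' \
    -H 'Content-Type: application/json'
```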
```bash
# vLLM Service
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
-H 'Content-Type: application/json'
```
For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
```bash
# vLLM-on-Ray Service
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03","streaming":false}' \
-H 'Content-Type: application/json'
```
For parameters in vLLM-on-Ray mode, refer to the [LangChain ChatOpenAI API](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).
8. MegaService
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"messages": "What is the revenue of Nike in 2023?"
}'
"messages": "What is the revenue of Nike in 2023?"
}'
```
9. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File Upload:
9. Nginx Service
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
-H "Content-Type: application/json" \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
10. Dataprep Microservice (Optional)
Add Knowledge Base via HTTP Links:
If you want to update the default knowledge base, you can use the following commands:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
Update Knowledge Base via Local File Upload:
This command updates a knowledge base by submitting a list of HTTP links for processing.
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
Also, you are able to get the file/link list that you uploaded:
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
Add Knowledge Base via HTTP Links:
Then you will get the response JSON like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
This command updates a knowledge base by submitting a list of HTTP links for processing.
To delete the file/link you uploaded:
Also, you are able to get the file/link list that you uploaded:
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
Then you will get the response JSON like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
To delete the file/link you uploaded:
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
10. Guardrails (Optional)
@@ -476,6 +558,8 @@ curl http://${host_ip}:9090/v1/guardrails\
## 🚀 Launch the UI
### Launch with origin port
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
@@ -486,11 +570,9 @@ To access the frontend, open the following URL in your browser: http://{host_ip}
- "80:5173"
```
![project-screenshot](../../../../assets/img/chat_ui_init.png)
### Launch with Nginx
Here is an example of running ChatQnA:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
## 🚀 Launch the Conversational UI (Optional)
@@ -521,6 +603,12 @@ Once the services are up, open the following URL in your browser: http://{host_i
- "80:80"
```
![project-screenshot](../../../../assets/img/chat_ui_init.png)
Here is an example of running ChatQnA:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
Here is an example of running ChatQnA with Conversational UI (React):
![project-screenshot](../../../../assets/img/conversation_ui_response.png)

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
@@ -108,7 +108,7 @@ services:
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8005:80"
@@ -118,11 +118,15 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: ${llm_service_devices}
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE
@@ -187,6 +191,25 @@ services:
- DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
ipc: host
restart: always
chaqna-gaudi-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-gaudi-nginx-server
depends_on:
- chaqna-gaudi-backend-server
- chaqna-gaudi-ui-server
ports:
- "${NGINX_PORT:-80}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
ipc: host
restart: always
networks:
default:

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tgi-guardrails-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-guardrails-server
ports:
- "8088:80"
@@ -35,11 +35,15 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE
@@ -60,7 +64,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tei-embedding-service:
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
@@ -141,7 +145,7 @@ services:
HF_HUB_ENABLE_HF_TRANSFER: 0
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8008:80"
@@ -151,11 +155,15 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
@@ -108,7 +108,7 @@ services:
# HF_HUB_ENABLE_HF_TRANSFER: 0
# restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8005:80"
@@ -118,11 +118,15 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
@@ -73,7 +73,7 @@ services:
INDEX_NAME: ${INDEX_NAME}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-gaudi-server
ports:
- "8808:80"

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
@@ -73,7 +73,7 @@ services:
INDEX_NAME: ${INDEX_NAME}
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-gaudi-server
ports:
- "8808:80"

View File

@@ -25,7 +25,7 @@ services:
TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
tei-embedding-service:
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
image: ghcr.io/huggingface/tei-gaudi:latest
container_name: tei-embedding-gaudi-server
ports:
- "8090:80"
@@ -75,7 +75,7 @@ services:
INDEX_NAME: ${INDEX_NAME}
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8005:80"
@@ -85,11 +85,15 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE

View File

@@ -56,16 +56,16 @@ f810f3b4d329 opea/embedding-tei:latest "python e
2fa17d84605f opea/dataprep-redis:latest "python prepare_doc_…" 2 minutes ago Up 2 minutes 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
69e1fb59e92c opea/retriever-redis:latest "/home/user/comps/re…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
313b9d14928a opea/reranking-tei:latest "python reranking_te…" 2 minutes ago Up 2 minutes 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-gaudi-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:1.2.1 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
174bd43fa6b5 opea/tei-gaudi:latest "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.0.5 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
174bd43fa6b5 ghcr.io/huggingface/tei-gaudi:latest "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, :::8090->80/tcp tei-embedding-gaudi-server
74084469aa33 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
88399dbc9e43 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, :::8808->80/tcp tei-reranking-gaudi-server
```
In this case, `ghcr.io/huggingface/tgi-gaudi:1.2.1` exited.
In this case, `ghcr.io/huggingface/tgi-gaudi:2.0.5` exited.
```
05c40b636239 ghcr.io/huggingface/tgi-gaudi:1.2.1 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
05c40b636239 ghcr.io/huggingface/tgi-gaudi:2.0.5 "text-generation-lau…" 2 minutes ago Exited (1) About a minute ago tgi-gaudi-server
```
Next, we can check the container logs to find out what happened during the Docker startup.
@@ -76,7 +76,7 @@ Check the log of container by:
`docker logs <CONTAINER ID> -t`
View the logs of `ghcr.io/huggingface/tgi-gaudi:1.2.1`
View the logs of `ghcr.io/huggingface/tgi-gaudi:2.0.5`
`docker logs 05c40b636239 -t`
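The two steps above can be combined into a small troubleshooting sketch that looks the container up by name (taken from the compose file) instead of by ID and scans its recent log lines for obvious errors:
```bash
# Locate the tgi-gaudi-server container, whether running or exited, and inspect its logs
docker ps -a --filter "name=tgi-gaudi-server" --format "{{.ID}}  {{.Image}}  {{.Status}}"
docker logs tgi-gaudi-server -t --tail 100 2>&1 | grep -iE "error|cannot|not found" || true
```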
@@ -105,7 +105,7 @@ So just make sure the devices are available.
Here is another failure example:
```
f7a08f9867f9 ghcr.io/huggingface/tgi-gaudi:1.2.1 "text-generation-lau…" 16 seconds ago Exited (2) 14 seconds ago tgi-gaudi-server
f7a08f9867f9 ghcr.io/huggingface/tgi-gaudi:2.0.5 "text-generation-lau…" 16 seconds ago Exited (2) 14 seconds ago tgi-gaudi-server
```
Check the log by `docker logs f7a08f9867f9 -t`.
@@ -122,7 +122,7 @@ View the docker input parameters in `./ChatQnA/docker_compose/intel/hpu/gaudi/co
```
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:1.2.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8008:80"
@@ -131,9 +131,13 @@ View the docker input parameters in `./ChatQnA/docker_compose/intel/hpu/gaudi/co
environment:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE
@@ -278,7 +282,7 @@ and the log shows model warm up, please wait for a while and try it later.
```
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```

View File

@@ -21,3 +21,8 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
export FRONTEND_SERVICE_IP=${host_ip}
export FRONTEND_SERVICE_PORT=5173
export BACKEND_SERVICE_NAME=chatqna
export BACKEND_SERVICE_IP=${host_ip}
export BACKEND_SERVICE_PORT=8888

View File

@@ -2,6 +2,70 @@
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an NVIDIA GPU platform. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, which will simplify the deployment process for this service.
Quick Start Deployment Steps:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
## Quick Start: 1. Setup Environment Variables
To set up environment variables for deploying ChatQnA services, follow these steps:
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
## Quick Start: 2. Run Docker Compose
```bash
docker compose up -d
```
It will automatically download the Docker images from Docker Hub:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
In the following cases, you can build the Docker images from source yourself:
- The image download failed.
- You want to use a specific version of a Docker image.
Please refer to the 'Build Docker Images' section below.
## Quick Start: 3. Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
## 🚀 Build Docker Images
First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
@@ -74,7 +138,14 @@ docker build --no-cache -t opea/chatqna-react-ui:latest --build-arg https_proxy=
cd ../../../..
```
Then run the command `docker images`, and you will have the following 7 Docker Images:
### 10. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile .
```
Then run the command `docker images`, and you will have the following 8 Docker Images:
1. `opea/embedding-tei:latest`
2. `opea/retriever-redis:latest`
@@ -82,8 +153,8 @@ Then run the command `docker images`, you will have the following 7 Docker Image
4. `opea/llm-tgi:latest`
5. `opea/dataprep-redis:latest`
6. `opea/chatqna:latest`
7. `opea/chatqna-ui:latest`
8. `opea/chatqna-react-ui:latest`
7. `opea/chatqna-ui:latest` or `opea/chatqna-react-ui:latest`
8. `opea/nginx:latest`
## 🚀 Start MicroServices and MegaService
@@ -101,33 +172,30 @@ Change the `xxx_MODEL_ID` below for your needs.
### Setup Environment Variables
Since the `compose.yaml` consumes some environment variables, you need to set them up in advance as below.
1. Set the required environment variables:
```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
```
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
```
Note: Please replace `host_ip` with your external IP address, do **NOT** use localhost.
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
### Start all the services Docker Containers
@@ -220,7 +288,7 @@ docker compose up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```
@@ -232,58 +300,68 @@ docker compose up -d
}'
```
9. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File Upload:
9. Nginx Service
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
-H "Content-Type: application/json" \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
10. Dataprep Microservice (Optional)
Add Knowledge Base via HTTP Links:
If you want to update the default knowledge base, you can use the following commands:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
Update Knowledge Base via Local File Upload:
This command updates a knowledge base by submitting a list of HTTP links for processing.
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
Also, you are able to get the file list that you uploaded:
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
Add Knowledge Base via HTTP Links:
To delete the file/link you uploaded:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev"}' \
-H "Content-Type: application/json"
This command updates a knowledge base by submitting a list of HTTP links for processing.
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
Also, you are able to get the file list that you uploaded:
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
-H "Content-Type: application/json"
```
To delete the file/link you uploaded:
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "https://opea.dev"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
## 🚀 Launch the UI
### Launch with origin port
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
@@ -294,6 +372,10 @@ To access the frontend, open the following URL in your browser: http://{host_ip}
- "80:5173"
```
### Launch with Nginx
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
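A minimal sketch to confirm the Nginx entry point responds before opening it in a browser (assumes `NGINX_PORT` was exported via `set_env.sh`):
```bash
# Expect HTTP 200 from the frontend served behind Nginx
curl -s -o /dev/null -w "%{http_code}\n" "http://${host_ip}:${NGINX_PORT}/"
```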
## 🚀 Launch the Conversational UI (Optional)
To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-ui-server` service with the `chatqna-react-ui-server` service as per the config below:
@@ -324,3 +406,11 @@ Once the services are up, open the following URL in your browser: http://{host_i
```
![project-screenshot](../../../assets/img/chat_ui_init.png)
Here is an example of running ChatQnA:
![project-screenshot](../../../assets/img/chat_ui_response.png)
Here is an example of running ChatQnA with Conversational UI (React):
![project-screenshot](../../../assets/img/conversation_ui_response.png)

View File

@@ -197,6 +197,25 @@ services:
- DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
ipc: host
restart: always
chaqna-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-nginx-server
depends_on:
- chaqna-backend-server
- chaqna-ui-server
ports:
- "${NGINX_PORT:-80}:80"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
- FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
- BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
- BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
- BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
ipc: host
restart: always
networks:
default:

View File

@@ -21,3 +21,8 @@ export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
export FRONTEND_SERVICE_IP=${host_ip}
export FRONTEND_SERVICE_PORT=5173
export BACKEND_SERVICE_NAME=chatqna
export BACKEND_SERVICE_IP=${host_ip}
export BACKEND_SERVICE_PORT=8888

View File

@@ -125,15 +125,15 @@ services:
dockerfile: comps/guardrails/llama_guard/langchain/Dockerfile
extends: chatqna
image: ${REGISTRY:-opea}/guardrails-tgi:${TAG:-latest}
tei-gaudi:
build:
context: tei-gaudi
dockerfile: Dockerfile-hpu
extends: chatqna
image: ${REGISTRY:-opea}/tei-gaudi:${TAG:-latest}
vllm:
build:
context: vllm
dockerfile: Dockerfile.cpu
extends: chatqna
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
nginx:
build:
context: GenAIComps
dockerfile: comps/nginx/Dockerfile
extends: chatqna
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}

View File

@@ -1,69 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from comps.cores.mega.constants import MegaServiceEndpoint
from comps.cores.mega.gateway import Gateway
from comps.cores.proto.api_protocol import (
ChatCompletionRequest,
ChatCompletionResponse,
ChatCompletionResponseChoice,
ChatMessage,
UsageInfo,
)
from comps.cores.proto.docarray import LLMParams, RerankerParms, RetrieverParms
from fastapi import Request
from fastapi.responses import StreamingResponse
class ChatQnAGateway(Gateway):
    def __init__(self, megaservice, host="0.0.0.0", port=8888):
        super().__init__(
            megaservice, host, port, str(MegaServiceEndpoint.CHAT_QNA), ChatCompletionRequest, ChatCompletionResponse
        )

    async def handle_request(self, request: Request):
        data = await request.json()
        stream_opt = data.get("stream", True)
        chat_request = ChatCompletionRequest.parse_obj(data)
        prompt = self._handle_message(chat_request.messages)
        parameters = LLMParams(
            max_new_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
            top_k=chat_request.top_k if chat_request.top_k else 10,
            top_p=chat_request.top_p if chat_request.top_p else 0.95,
            temperature=chat_request.temperature if chat_request.temperature else 0.01,
            repetition_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 1.03,
            streaming=stream_opt,
            chat_template=chat_request.chat_template if chat_request.chat_template else None,
        )
        retriever_parameters = RetrieverParms(
            search_type=chat_request.search_type if chat_request.search_type else "similarity",
            k=chat_request.k if chat_request.k else 4,
            distance_threshold=chat_request.distance_threshold if chat_request.distance_threshold else None,
            fetch_k=chat_request.fetch_k if chat_request.fetch_k else 20,
            lambda_mult=chat_request.lambda_mult if chat_request.lambda_mult else 0.5,
            score_threshold=chat_request.score_threshold if chat_request.score_threshold else 0.2,
        )
        reranker_parameters = RerankerParms(
            top_n=chat_request.top_n if chat_request.top_n else 1,
        )
        result_dict, runtime_graph = await self.megaservice.schedule(
            initial_inputs={"text": prompt},
            llm_parameters=parameters,
            retriever_parameters=retriever_parameters,
            reranker_parameters=reranker_parameters,
        )
        for node, response in result_dict.items():
            if isinstance(response, StreamingResponse):
                return response
        last_node = runtime_graph.all_leaves()[-1]
        response = result_dict[last_node]["text"]
        choices = []
        usage = UsageInfo()
        choices.append(
            ChatCompletionResponseChoice(
                index=0,
                message=ChatMessage(role="assistant", content=response),
                finish_reason="stop",
            )
        )
        return ChatCompletionResponse(model="chatqna", choices=choices, usage=usage)

View File

@@ -7,6 +7,8 @@
> You can also customize the "MODEL_ID" if needed.
>
> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the ChatQnA workload is running. Otherwise, you need to modify the `chatqna.yaml` file to change the `model-volume` to a directory that exists on the node.
>
> File upload size limit: The maximum size for uploaded files is 10GB.
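A minimal sketch of preparing that model cache directory on the node running the ChatQnA workload (the permissive mode is only an example; tighten it to match your cluster's policy):
```bash
# Create the model cache directory referenced by chatqna.yaml
sudo mkdir -p /mnt/opea-models
sudo chmod 777 /mnt/opea-models   # example only; use a mode consistent with the pod securityContext
```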
## Deploy On Xeon

View File

@@ -27,8 +27,8 @@ The ChatQnA uses the below prebuilt images if you choose a Xeon deployment
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tei-embedding-service: opea/tei-gaudi:latest
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
- tei-embedding-service: ghcr.io/huggingface/tei-gaudi:latest
- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
> [NOTE]
> Please refer to [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/hpu/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.

View File

@@ -1,31 +1,4 @@
---
# Source: chatqna/charts/chatqna-ui/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: chatqna-chatqna-ui-config
labels:
helm.sh/chart: chatqna-ui-1.0.0
app.kubernetes.io/name: chatqna-ui
app.kubernetes.io/instance: chatqna
app.kubernetes.io/version: "v1.0"
app.kubernetes.io/managed-by: Helm
data:
APP_BACKEND_SERVICE_ENDPOINT: "/v1/chatqna"
APP_DATA_PREP_SERVICE_URL: "/v1/dataprep"
CHAT_BASE_URL: "/v1/chatqna"
UPLOAD_FILE_BASE_URL: "/v1/dataprep"
GET_FILE: "/v1/dataprep/get_file"
DELETE_FILE: "/v1/dataprep/delete_file"
BASE_URL: "/v1/chatqna"
DOC_BASE_URL: "/v1/chatqna"
BASIC_URL: "/v1/chatqna"
VITE_CODE_GEN_URL: "/v1/chatqna"
VITE_DOC_SUM_URL: "/v1/chatqna"
---
# Source: chatqna/charts/data-prep/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -283,12 +256,19 @@ data:
listen 80;
listen [::]:80;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_max_body_size 10G;
location /home {
alias /usr/share/nginx/html/index.html;
}
location / {
proxy_pass http://chatqna-chatqna-ui:5174;
proxy_pass http://chatqna-chatqna-ui:5173;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
@@ -349,7 +329,7 @@ metadata:
spec:
type: ClusterIP
ports:
- port: 5174
- port: 5173
targetPort: ui
protocol: TCP
name: ui
@@ -711,12 +691,9 @@ spec:
{}
containers:
- name: chatqna-ui
envFrom:
- configMapRef:
name: chatqna-chatqna-ui-config
securityContext:
{}
image: "opea/chatqna-conversation-ui:latest"
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
ports:
- name: ui
@@ -1497,7 +1474,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:2.2.0"
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
@@ -1577,7 +1554,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:2.2.0"
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data

View File

@@ -1,31 +1,4 @@
---
# Source: chatqna/charts/chatqna-ui/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: chatqna-chatqna-ui-config
labels:
helm.sh/chart: chatqna-ui-1.0.0
app.kubernetes.io/name: chatqna-ui
app.kubernetes.io/instance: chatqna
app.kubernetes.io/version: "v1.0"
app.kubernetes.io/managed-by: Helm
data:
APP_BACKEND_SERVICE_ENDPOINT: "/v1/chatqna"
APP_DATA_PREP_SERVICE_URL: "/v1/dataprep"
CHAT_BASE_URL: "/v1/chatqna"
UPLOAD_FILE_BASE_URL: "/v1/dataprep"
GET_FILE: "/v1/dataprep/get_file"
DELETE_FILE: "/v1/dataprep/delete_file"
BASE_URL: "/v1/chatqna"
DOC_BASE_URL: "/v1/chatqna"
BASIC_URL: "/v1/chatqna"
VITE_CODE_GEN_URL: "/v1/chatqna"
VITE_DOC_SUM_URL: "/v1/chatqna"
---
# Source: chatqna/charts/data-prep/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -233,12 +206,19 @@ data:
listen 80;
listen [::]:80;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_max_body_size 10G;
location /home {
alias /usr/share/nginx/html/index.html;
}
location / {
proxy_pass http://chatqna-chatqna-ui:5174;
proxy_pass http://chatqna-chatqna-ui:5173;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
@@ -299,7 +279,7 @@ metadata:
spec:
type: ClusterIP
ports:
- port: 5174
- port: 5173
targetPort: ui
protocol: TCP
name: ui
@@ -611,12 +591,9 @@ spec:
{}
containers:
- name: chatqna-ui
envFrom:
- configMapRef:
name: chatqna-chatqna-ui-config
securityContext:
{}
image: "opea/chatqna-conversation-ui:latest"
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
ports:
- name: ui

View File

@@ -1,31 +1,4 @@
---
# Source: chatqna/charts/chatqna-ui/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: chatqna-chatqna-ui-config
labels:
helm.sh/chart: chatqna-ui-1.0.0
app.kubernetes.io/name: chatqna-ui
app.kubernetes.io/instance: chatqna
app.kubernetes.io/version: "v1.0"
app.kubernetes.io/managed-by: Helm
data:
APP_BACKEND_SERVICE_ENDPOINT: "/v1/chatqna"
APP_DATA_PREP_SERVICE_URL: "/v1/dataprep"
CHAT_BASE_URL: "/v1/chatqna"
UPLOAD_FILE_BASE_URL: "/v1/dataprep"
GET_FILE: "/v1/dataprep/get_file"
DELETE_FILE: "/v1/dataprep/delete_file"
BASE_URL: "/v1/chatqna"
DOC_BASE_URL: "/v1/chatqna"
BASIC_URL: "/v1/chatqna"
VITE_CODE_GEN_URL: "/v1/chatqna"
VITE_DOC_SUM_URL: "/v1/chatqna"
---
# Source: chatqna/charts/data-prep/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -234,12 +207,19 @@ data:
listen 80;
listen [::]:80;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_max_body_size 10G;
location /home {
alias /usr/share/nginx/html/index.html;
}
location / {
proxy_pass http://chatqna-chatqna-ui:5174;
proxy_pass http://chatqna-chatqna-ui:5173;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
@@ -300,7 +280,7 @@ metadata:
spec:
type: ClusterIP
ports:
- port: 5174
- port: 5173
targetPort: ui
protocol: TCP
name: ui
@@ -612,12 +592,9 @@ spec:
{}
containers:
- name: chatqna-ui
envFrom:
- configMapRef:
name: chatqna-chatqna-ui-config
securityContext:
{}
image: "opea/chatqna-conversation-ui:latest"
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
ports:
- name: ui

View File

@@ -1,31 +1,4 @@
---
# Source: chatqna/charts/chatqna-ui/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: chatqna-chatqna-ui-config
labels:
helm.sh/chart: chatqna-ui-1.0.0
app.kubernetes.io/name: chatqna-ui
app.kubernetes.io/instance: chatqna
app.kubernetes.io/version: "v1.0"
app.kubernetes.io/managed-by: Helm
data:
APP_BACKEND_SERVICE_ENDPOINT: "/v1/chatqna"
APP_DATA_PREP_SERVICE_URL: "/v1/dataprep"
CHAT_BASE_URL: "/v1/chatqna"
UPLOAD_FILE_BASE_URL: "/v1/dataprep"
GET_FILE: "/v1/dataprep/get_file"
DELETE_FILE: "/v1/dataprep/delete_file"
BASE_URL: "/v1/chatqna"
DOC_BASE_URL: "/v1/chatqna"
BASIC_URL: "/v1/chatqna"
VITE_CODE_GEN_URL: "/v1/chatqna"
VITE_DOC_SUM_URL: "/v1/chatqna"
---
# Source: chatqna/charts/data-prep/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -285,12 +258,19 @@ data:
listen 80;
listen [::]:80;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_max_body_size 10G;
location /home {
alias /usr/share/nginx/html/index.html;
}
location / {
proxy_pass http://chatqna-chatqna-ui:5174;
proxy_pass http://chatqna-chatqna-ui:5173;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
@@ -351,7 +331,7 @@ metadata:
spec:
type: ClusterIP
ports:
- port: 5174
- port: 5173
targetPort: ui
protocol: TCP
name: ui
@@ -713,12 +693,9 @@ spec:
{}
containers:
- name: chatqna-ui
envFrom:
- configMapRef:
name: chatqna-chatqna-ui-config
securityContext:
{}
image: "opea/chatqna-conversation-ui:latest"
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
ports:
- name: ui
@@ -1500,7 +1477,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.1"
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
@@ -1581,7 +1558,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.1"
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data

View File

@@ -1,31 +1,4 @@
---
# Source: chatqna/charts/chatqna-ui/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: chatqna-chatqna-ui-config
labels:
helm.sh/chart: chatqna-ui-1.0.0
app.kubernetes.io/name: chatqna-ui
app.kubernetes.io/instance: chatqna
app.kubernetes.io/version: "v1.0"
app.kubernetes.io/managed-by: Helm
data:
APP_BACKEND_SERVICE_ENDPOINT: "/v1/chatqna"
APP_DATA_PREP_SERVICE_URL: "/v1/dataprep"
CHAT_BASE_URL: "/v1/chatqna"
UPLOAD_FILE_BASE_URL: "/v1/dataprep"
GET_FILE: "/v1/dataprep/get_file"
DELETE_FILE: "/v1/dataprep/delete_file"
BASE_URL: "/v1/chatqna"
DOC_BASE_URL: "/v1/chatqna"
BASIC_URL: "/v1/chatqna"
VITE_CODE_GEN_URL: "/v1/chatqna"
VITE_DOC_SUM_URL: "/v1/chatqna"
---
# Source: chatqna/charts/data-prep/templates/configmap.yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
@@ -234,12 +207,19 @@ data:
listen 80;
listen [::]:80;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_max_body_size 10G;
location /home {
alias /usr/share/nginx/html/index.html;
}
location / {
proxy_pass http://chatqna-chatqna-ui:5174;
proxy_pass http://chatqna-chatqna-ui:5173;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
@@ -300,7 +280,7 @@ metadata:
spec:
type: ClusterIP
ports:
- port: 5174
- port: 5173
targetPort: ui
protocol: TCP
name: ui
@@ -612,12 +592,9 @@ spec:
{}
containers:
- name: chatqna-ui
envFrom:
- configMapRef:
name: chatqna-chatqna-ui-config
securityContext:
{}
image: "opea/chatqna-conversation-ui:latest"
image: "opea/chatqna-ui:latest"
imagePullPolicy: IfNotPresent
ports:
- name: ui
@@ -1321,7 +1298,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.1"
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data

View File

@@ -17,14 +17,14 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
git clone https://github.com/huggingface/tei-gaudi
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna-guardrails chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-tgi tei-gaudi guardrails-tgi"
service_list="chatqna-guardrails chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-tgi guardrails-tgi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/tei-gaudi:latest
docker images && sleep 1s
}

View File

@@ -17,14 +17,14 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
git clone https://github.com/huggingface/tei-gaudi
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna-no-wrapper chatqna-ui dataprep-redis retriever-redis tei-gaudi"
service_list="chatqna-no-wrapper chatqna-ui dataprep-redis retriever-redis"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/tei-gaudi:latest
docker images && sleep 1s
}

View File

@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="chatqna-no-wrapper chatqna-ui chatqna-conversation-ui dataprep-redis retriever-redis"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker images && sleep 1s

View File

@@ -17,14 +17,14 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
git clone https://github.com/huggingface/tei-gaudi
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-tgi tei-gaudi"
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-tgi nginx"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/tei-gaudi:latest
docker images && sleep 1s
}
@@ -52,6 +52,12 @@ function start_services() {
export DATAPREP_DELETE_FILE_ENDPOINT="http://${ip_address}:6009/v1/dataprep/delete_file"
export llm_service_devices=all
export tei_embedding_devices=all
export FRONTEND_SERVICE_IP=${host_ip}
export FRONTEND_SERVICE_PORT=5173
export BACKEND_SERVICE_NAME=chatqna
export BACKEND_SERVICE_IP=${host_ip}
export BACKEND_SERVICE_PORT=8888
export NGINX_PORT=80
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env

View File

@@ -19,10 +19,10 @@ function build_docker_images() {
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna chatqna-ui chatqna-conversation-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-tgi"
service_list="chatqna chatqna-ui chatqna-conversation-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-tgi nginx"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker images && sleep 1s
@@ -50,6 +50,12 @@ function start_services() {
export DATAPREP_SERVICE_ENDPOINT="http://${ip_address}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${ip_address}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${ip_address}:6007/v1/dataprep/delete_file"
export FRONTEND_SERVICE_IP=${host_ip}
export FRONTEND_SERVICE_PORT=5173
export BACKEND_SERVICE_NAME=chatqna
export BACKEND_SERVICE_IP=${host_ip}
export BACKEND_SERVICE_PORT=8888
export NGINX_PORT=80
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env


@@ -17,13 +17,13 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
git clone https://github.com/huggingface/tei-gaudi
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei tei-gaudi llm-vllm-hpu llm-vllm"
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-vllm-hpu llm-vllm"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/tei-gaudi:latest
docker images && sleep 1s
}


@@ -23,7 +23,7 @@ function build_docker_images() {
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-vllm vllm"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker images && sleep 1s


@@ -17,13 +17,13 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
git clone https://github.com/huggingface/tei-gaudi
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei tei-gaudi llm-vllm-ray-hpu llm-vllm-ray"
service_list="chatqna chatqna-ui dataprep-redis embedding-tei retriever-redis reranking-tei llm-vllm-ray-hpu llm-vllm-ray"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/tei-gaudi:latest
docker images && sleep 1s
}


@@ -17,14 +17,14 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
git clone https://github.com/huggingface/tei-gaudi
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="chatqna-without-rerank chatqna-ui dataprep-redis embedding-tei retriever-redis llm-tgi tei-gaudi"
service_list="chatqna-without-rerank chatqna-ui dataprep-redis embedding-tei retriever-redis llm-tgi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/tei-gaudi:latest
docker images && sleep 1s
}


@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="chatqna-without-rerank chatqna-ui chatqna-conversation-ui dataprep-redis embedding-tei retriever-redis llm-tgi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker images && sleep 1s


@@ -43,6 +43,8 @@ By default, the LLM model is set to a default value as listed below:
[meta-llama/CodeLlama-7b-hf](https://huggingface.co/meta-llama/CodeLlama-7b-hf) is a gated model that requires submitting an access request through Hugging Face. You can replace it with another model.
Change the `LLM_MODEL_ID` below to suit your needs, for example: [Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) or [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct).
If you choose to use `meta-llama/CodeLlama-7b-hf` as the LLM model, you will need to visit [its model page](https://huggingface.co/meta-llama/CodeLlama-7b-hf) and click the `Expand to review and access` button to request model access.
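For reference, a minimal sketch of pointing the deployment at an ungated model instead (the variable names follow the compose and manifest files elsewhere in this change; the token is only required for gated models):

```bash
# Swap in an ungated code model so no Hugging Face access request is needed.
export LLM_MODEL_ID="Qwen/CodeQwen1.5-7B-Chat"

# Only required for gated models such as meta-llama/CodeLlama-7b-hf, once access is granted.
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
```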
### Setup Environment Variable
To set up environment variables for deploying the CodeGen services, follow these steps:
@@ -132,10 +134,13 @@ Two ways of consuming CodeGen Service:
http_proxy=""
curl http://${host_ip}:8028/generate \
-X POST \
-d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \
-d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_tokens":256, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
2. (Docker only) If all microservices work well, check the port ${host_ip}:7778, the port may be allocated by other users, you can modify the `compose.yaml`.
2. If you get errors like "aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host xx.xx.xx.xx:8028", check the `tgi service` first. If its log shows a "Cannot access gated repo for url https://huggingface.co/meta-llama/CodeLlama-7b-hf/resolve/main/config.json." error, you need to request model access first; follow the instructions in the [Required Models](#required-models) section for more information. A quick way to inspect the `tgi service` is sketched after this list.
3. (Docker only) If you get errors like "The container name is in use", change container name in `compose.yaml`.
3. (Docker only) If all microservices work well, check port ${host_ip}:7778; the port may already be allocated by another user, in which case you can change it in `compose.yaml`.
4. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.
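As referenced in item 2, here is a minimal sketch for inspecting the `tgi service`. The container name `tgi-gaudi-server` is taken from the Gaudi compose file shown later in this diff and is an assumption for other deployments; port 8028 and the request payload mirror the curl example above.

```bash
# Container name is an assumption (tgi-gaudi-server in the Gaudi compose file);
# adjust it to the tgi container name in your compose.yaml.
docker logs tgi-gaudi-server 2>&1 | grep -i "cannot access gated repo"

# Confirm the service answers on its published port (8028 in this example).
curl http://${host_ip}:8028/generate \
  -X POST \
  -d '{"inputs":"def print_hello():","parameters":{"max_tokens":16, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```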


@@ -0,0 +1,100 @@
# CodeGen Accuracy Evaluation
## Evaluation Framework
We evaluate accuracy with [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness), a framework for the evaluation of code generation models.
## Evaluation FAQs
### Launch CodeGen microservice
Please refer to the [CodeGen Examples](https://github.com/opea-project/GenAIExamples/tree/main/CodeGen) and follow the guide to deploy the CodeGen megaservice.
Use the `curl` command to test the CodeGen service and ensure that it has started properly:
```bash
export CODEGEN_ENDPOINT="http://${your_ip}:7778/v1/codegen"
curl $CODEGEN_ENDPOINT \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
### Generation and Evaluation
For evaluating models on coding tasks, or coding LLMs specifically, we follow the [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness) and provide both command line and function call usage. The following benchmarks are available in both completion (left-to-right) and insertion (FIM) mode: [HumanEval](https://huggingface.co/datasets/openai_humaneval), [HumanEval+](https://huggingface.co/datasets/evalplus/humanevalplus), [InstructHumanEval](https://huggingface.co/datasets/codeparrot/instructhumaneval), [APPS](https://huggingface.co/datasets/codeparrot/apps), [MBPP](https://huggingface.co/datasets/mbpp), [MBPP+](https://huggingface.co/datasets/evalplus/mbppplus), and [DS-1000](https://github.com/HKUNLP/DS-1000/).
#### Command line usage
```shell
git clone https://github.com/opea-project/GenAIEval
cd GenAIEval
pip install -r requirements.txt
pip install -e .
cd evals/evaluation/bigcode_evaluation_harness/examples
python main.py --model Qwen/CodeQwen1.5-7B-Chat \
--tasks humaneval \
--codegen_url $CODEGEN_ENDPOINT \
--max_length_generation 2048 \
--batch_size 1 \
--save_generations \
--save_references \
--allow_code_execution
```
**_Note:_** Currently, our framework is designed to execute tasks in full. To ensure the accuracy of results, we advise against using the `limit` or `limit_start` parameters to restrict the number of test samples.
### Accuracy Result
Here is the tested result for your reference:
```json
{
  "humaneval": {
    "pass@1": 0.7195121951219512
  },
  "config": {
    "prefix": "",
    "do_sample": true,
    "temperature": 0.2,
    "top_k": 0,
    "top_p": 0.95,
    "n_samples": 1,
    "eos": "<|endoftext|>",
    "seed": 0,
    "model": "Qwen/CodeQwen1.5-7B-Chat",
    "modeltype": "causal",
    "peft_model": null,
    "revision": null,
    "use_auth_token": false,
    "trust_remote_code": false,
    "tasks": "humaneval",
    "instruction_tokens": null,
    "batch_size": 1,
    "max_length_generation": 2048,
    "precision": "fp32",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "left_padding": false,
    "limit": null,
    "limit_start": 0,
    "save_every_k_tasks": -1,
    "postprocess": true,
    "allow_code_execution": true,
    "generation_only": false,
    "load_generations_path": null,
    "load_data_path": null,
    "metric_output_path": "evaluation_results.json",
    "save_generations": true,
    "load_generations_intermediate_paths": null,
    "save_generations_path": "generations.json",
    "save_references": true,
    "save_references_path": "references.json",
    "prompt": "prompt",
    "max_memory_per_gpu": null,
    "check_references": false,
    "codegen_url": "http://192.168.123.104:31234/v1/codegen"
  }
}
```


@@ -6,7 +6,7 @@ opea_micro_services:
tgi-service:
host: ${TGI_SERVICE_IP}
ports: ${TGI_SERVICE_PORT}
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
volumes:
- "./data:/data"
runtime: habana
@@ -17,7 +17,11 @@ opea_micro_services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
model-id: ${LLM_MODEL_ID}
llm:
host: ${LLM_SERVICE_HOST_IP}


@@ -138,7 +138,7 @@ docker compose up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_new_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```


@@ -119,7 +119,7 @@ docker compose up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_new_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```


@@ -3,7 +3,7 @@
services:
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.1
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
container_name: tgi-gaudi-server
ports:
- "8028:80"
@@ -15,7 +15,11 @@ services:
https_proxy: ${https_proxy}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
runtime: habana
cap_add:
- SYS_NICE
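Once this compose file is up, a quick sketch for confirming that the renamed token variable and the new HPU/flash-attention flags actually reached the container (`tgi-gaudi-server` and port 8028 are taken from this same file):

```bash
# Inspect the environment inside the running TGI Gaudi container
# (note: this also prints the token value).
docker exec tgi-gaudi-server env | grep -E 'HPU_GRAPH|FLASH_ATTENTION|HUGGING_FACE_HUB_TOKEN'

# TGI exposes a /health endpoint; an HTTP 200 means the model server is ready.
curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:8028/health
```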


@@ -405,7 +405,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/tgi-gaudi:2.0.1"
image: "ghcr.io/huggingface/tgi-gaudi:2.0.5"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data


@@ -22,7 +22,7 @@ function build_docker_images() {
service_list="codegen codegen-ui llm-tgi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker images && sleep 1s
}


@@ -34,7 +34,7 @@ function validate_codegen() {
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
echo "$CLIENT_POD"
accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='codegen')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_codegen.log
kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_tokens":256, "do_sample": true}}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_codegen.log
exit_code=$?
if [ $exit_code -ne 0 ]; then
echo "chatqna failed, please check the logs in ${LOG_PATH}!"

Some files were not shown because too many files have changed in this diff.