Add final README.md and set_env.sh script for quickstart review. Previous pull request was 1595. (#1662)

Signed-off-by: Edwards, James A <jaedwards@habana.ai>
Co-authored-by: Edwards, James A <jaedwards@habana.ai>
This commit is contained in:
James Edwards
2025-03-14 18:05:01 -05:00
committed by GitHub
parent 7159ce3731
commit 527b146a80
2 changed files with 293 additions and 542 deletions

View File

@@ -1,94 +1,106 @@
# Build MegaService of ChatQnA on Gaudi
# Example ChatQnA deployments on an Intel® Gaudi® Platform
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
This example covers the single-node on-premises deployment of the ChatQnA example using OPEA components. There are various ways to enable ChatQnA, but this example will focus on four options available for deploying the ChatQnA pipeline to Intel® Gaudi® AI Accelerators. This example begins with a Quick Start section and then documents how to modify deployments, leverage new models and configure the number of allocated devices.
The default pipeline deploys with vLLM as the LLM serving component and leverages the rerank component. It also provides options for not using rerank in the pipeline, leveraging guardrails, or using a TGI backend for the LLM microservice; refer to the [start-all-the-services-docker-containers](#start-all-the-services-docker-containers) section on this page.
This example includes the following sections:
Quick Start:
- [ChatQnA Quick Start Deployment](#chatqna-quick-start-deployment): Demonstrates how to quickly deploy a ChatQnA application/pipeline on an Intel® Gaudi® platform.
- [ChatQnA Docker Compose Files](#chatqna-docker-compose-files): Describes some example deployments and their docker compose files.
- [ChatQnA Service Configuration](#chatqna-service-configuration): Describes the services and possible configuration changes.
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
**Note** This example requires access to a properly installed Intel® Gaudi® platform with a functional Docker service configured to use the habanalabs-container-runtime. Please consult the [Intel® Gaudi® software Installation Guide](https://docs.habana.ai/en/v1.20.0/Installation_Guide/Driver_Installation.html) for more information.
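Before deploying, it can be helpful to confirm the platform is ready. A minimal sanity check, assuming the Gaudi driver and habanalabs-container-runtime have been installed per the guide above:
```bash
# Confirm the Gaudi devices and driver are visible on the host
hl-smi
# Confirm Docker is configured with the habana container runtime
docker info 2>/dev/null | grep -i habana
```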
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, make sure you have either requested and been granted access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). We now support running the latest DeepSeek models, including [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) and [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), on Gaudi accelerators. To run `deepseek-ai/DeepSeek-R1-Distill-Llama-70B`, update the `LLM_MODEL_ID` and set `NUM_CARDS` to 8 in the [set_env.sh](./set_env.sh) script. To run `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`, update the `LLM_MODEL_ID` and set `NUM_CARDS` to 4 in the [set_env.sh](./set_env.sh) script.
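For example, to run the larger DeepSeek model, export the two values described in the note above before sourcing [set_env.sh](./set_env.sh) (or enter them at the script's prompts):
```bash
# DeepSeek-R1-Distill-Llama-70B requires NUM_CARDS=8 (see the note above)
export LLM_MODEL_ID="deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
export NUM_CARDS=8
```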
## ChatQnA Quick Start Deployment
## Quick Start: 1. Set Up Environment Variables
This section describes how to quickly deploy and test the ChatQnA service manually on an Intel® Gaudi® platform. The basic steps are:
To set up environment variables for deploying ChatQnA services, follow these steps:
1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
5. [Check the Deployment Status](#check-the-deployment-status)
6. [Test the Pipeline](#test-the-pipeline)
7. [Cleanup the Deployment](#cleanup-the-deployment)
1. Set the required environment variables:
### Access the Code
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
Clone the GenAIExamples repository and access the ChatQnA Intel® Gaudi® platform Docker Compose files and supporting scripts:
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
```
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy,chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails"
```
Check out a released version, such as v1.2:
3. Set up other environment variables:
```bash
git checkout v1.2
```
```bash
source ./set_env.sh
```
### Generate a HuggingFace Access Token
4. Change Model for LLM serving
Some HuggingFace resources, such as certain models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
By default, Meta-Llama-3-8B-Instruct is used for LLM serving; the default model can be changed to any other validated LLM model.
Please pick a model from the [validated LLM models](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/src/text-generation#validated-llm-models) table.
To change the default model defined in set_env.sh, either export LLM_MODEL_ID with the new model name or modify set_env.sh, and then repeat step 3.
For example, change to DeepSeek-R1-Distill-Qwen-32B using the following command.
### Configure the Deployment Environment
```bash
export LLM_MODEL_ID="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
```
To set up environment variables for deploying ChatQnA services, source the _set_env.sh_ script in this directory:
Please also check [required gaudi cards for different models](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/src/text-generation#system-requirements-for-llm-models) for new models.
It might be necessary to increase the number of Gaudi cards for the model by exporting NUM_CARDS with the required value or by modifying set_env.sh, and then repeating step 3. For example, increase the number of Gaudi cards for DeepSeek-R1-Distill-Qwen-32B using the following command:
```bash
source ./set_env.sh
```
```bash
export NUM_CARDS=4
```
The _set_env.sh_ script will prompt for required and optional environment variables used to configure the ChatQnA services. If a value is not entered, the script will use a default. It will also generate a _.env_ file capturing the chosen configuration. Consult the [ChatQnA Service Configuration](#chatqna-service-configuration) section for information on how service-specific configuration parameters affect deployments.
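Since the generated _.env_ file consists of `export` statements, it can also be sourced in a later shell session to restore the captured configuration without answering the prompts again (a convenience sketch, assuming _set_env.sh_ has already been run once in this directory):
```bash
# Reuse the configuration captured by set_env.sh
source ./.env
echo "LLM: ${LLM_MODEL_ID}, Gaudi cards: ${NUM_CARDS}"
```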
## Quick Start: 2. Run Docker Compose
### Deploy the Services Using Docker Compose
To deploy the ChatQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
```bash
docker compose up -d
```
To enable Open Telemetry Tracing, the compose.telemetry.yaml file needs to be merged with the default compose.yaml file.
The ChatQnA docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Gaudi® Platform:
> NOTE: To get the supported Grafana Dashboard, run download_opea_dashboard.sh as shown in the commands below.
```bash
./grafana/dashboards/download_opea_dashboard.sh
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```
```
[+] Running 10/10
✔ Network gaudi_default Created 0.1s
✔ Container tei-reranking-gaudi-server Started 0.7s
✔ Container vllm-gaudi-server Started 0.7s
✔ Container tei-embedding-gaudi-server Started 0.3s
✔ Container redis-vector-db Started 0.6s
✔ Container retriever-redis-server Started 1.1s
✔ Container dataprep-redis-server Started 1.1s
✔ Container chatqna-gaudi-backend-server Started 1.3s
✔ Container chatqna-gaudi-ui-server Started 1.7s
✔ Container chatqna-gaudi-nginx-server Started 1.9s
```
It will automatically download the docker images from Docker Hub:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```bash
docker ps -a
```
In the following cases, you can build the docker images from source yourself.
For the default deployment, the following 10 containers should have started:
- The docker image failed to download.
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8365b0a6024d opea/nginx:latest "/docker-entrypoint.…" 2 minutes ago Up 2 minutes 0.0.0.0:80->80/tcp, :::80->80/tcp chatqna-gaudi-nginx-server
f090fe262c74 opea/chatqna-ui:latest "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-gaudi-ui-server
ec97d7651c96 opea/chatqna:latest "python chatqna.py" 2 minutes ago Up 2 minutes 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-gaudi-backend-server
a61fb7dc4fae opea/dataprep:latest "sh -c 'python $( [ …" 2 minutes ago Up 2 minutes 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server
d560c232b120 opea/retriever:latest "python opea_retriev…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
a1d7ca2d3787 ghcr.io/huggingface/tei-gaudi:1.5.0 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-gaudi-server
9a9f3fd4fd4c opea/vllm-gaudi:latest "python3 -m vllm.ent…" 2 minutes ago Exited (1) 2 minutes ago vllm-gaudi-server
1ab9bbdf5182 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
9ee0789d819e ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 2 minutes ago Up 2 minutes 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-gaudi-server
```
- You want to use a specific version of the docker image.
### Test the Pipeline
Please refer to the 'Build Docker Images' section below.
## Quick Start: 3. Consume the ChatQnA Service
Once the ChatQnA services are running, test the pipeline using the following command:
```bash
curl http://${host_ip}:8888/v1/chatqna \
@@ -98,504 +110,171 @@ curl http://${host_ip}:8888/v1/chatqna \
}'
```
## 🚀 Build Docker Images
**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file.
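For reference, a complete request mirroring the MegaService validation example later in this document looks like:
```bash
curl http://${host_ip}:8888/v1/chatqna \
    -H "Content-Type: application/json" \
    -d '{"messages": "What is the revenue of Nike in 2023?"}'
```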
First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
### Cleanup the Deployment
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
To stop the containers associated with the deployment, execute the following command:
```bash
docker compose -f compose.yaml down
```
### 1. Build Retriever Image
```bash
docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile .
```
```
[+] Running 10/10
✔ Container chatqna-gaudi-nginx-server Removed 10.5s
✔ Container dataprep-redis-server Removed 10.5s
✔ Container chatqna-gaudi-ui-server Removed 10.3s
✔ Container chatqna-gaudi-backend-server Removed 10.3s
✔ Container vllm-gaudi-server Removed 0.0s
✔ Container retriever-redis-server Removed 10.4s
✔ Container tei-reranking-gaudi-server Removed 2.0s
✔ Container tei-embedding-gaudi-server Removed 1.2s
✔ Container redis-vector-db Removed 0.4s
✔ Network gaudi_default Removed 0.4s
```
### 2. Build Dataprep Image
All the ChatQnA containers will be stopped and then removed on completion of the "down" command.
## ChatQnA Docker Compose Files
In the context of deploying a ChatQnA pipeline on an Intel® Gaudi® platform, the allocation and utilization of Gaudi devices across different services are important considerations for optimizing performance and resource efficiency. Each of the four example deployments, defined by the example Docker compose yaml files, demonstrates a unique approach to leveraging Gaudi hardware, reflecting different priorities and operational strategies.
### compose.yaml - Default Deployment
The default deployment utilizes Gaudi devices primarily for the `vllm-service`, which handles large language model (LLM) tasks. This service is configured to maximize the use of Gaudi's capabilities, potentially allocating multiple devices to enhance parallel processing and throughput. The `tei-reranking-service` also uses Gaudi hardware (1 card), indicating a balanced approach where both LLM processing and reranking tasks benefit from Gaudi's performance enhancements.
| Service Name | Image Name | Gaudi Use |
| ---------------------------- | ----------------------------------------------------- | ------------ |
| redis-vector-db | redis/redis-stack:7.2.0-v9 | No |
| dataprep-redis-service | opea/dataprep:latest | No |
| tei-embedding-service | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | No |
| retriever | opea/retriever:latest | No |
| tei-reranking-service | ghcr.io/huggingface/tei-gaudi:1.5.0 | 1 card |
| vllm-service | opea/vllm-gaudi:latest | Configurable |
| chatqna-gaudi-backend-server | opea/chatqna:latest | No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No |
| chatqna-gaudi-nginx-server | opea/nginx:latest | No |
### compose_tgi.yaml - TGI Deployment
The TGI (Text Generation Inference) deployment and the default deployment differ primarily in their service configurations and specific focus on handling large language models (LLMs). The TGI deployment includes a unique `tgi-service`, which utilizes the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is specifically configured to run on Gaudi hardware. This service is designed to handle LLM tasks with optimizations such as `ENABLE_HPU_GRAPH` and `USE_FLASH_ATTENTION`. The `chatqna-gaudi-backend-server` in the TGI deployment depends on the `tgi-service`, whereas in the default deployment, it relies on the `vllm-service`.
| Service Name | Image Name | Gaudi Specific |
| ---------------------------- | ----------------------------------------------------- | -------------- |
| redis-vector-db | redis/redis-stack:7.2.0-v9 | No |
| dataprep-redis-service | opea/dataprep:latest | No |
| tei-embedding-service | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | No |
| retriever | opea/retriever:latest | No |
| tei-reranking-service | ghcr.io/huggingface/tei-gaudi:1.5.0 | 1 card |
| **tgi-service** | ghcr.io/huggingface/tgi-gaudi:2.0.6 | Configurable |
| chatqna-gaudi-backend-server | opea/chatqna:latest | No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No |
| chatqna-gaudi-nginx-server | opea/nginx:latest | No |
This deployment may allocate more Gaudi resources to the tgi-service to optimize LLM tasks depending on the specific configuration and workload requirements.
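As an illustration of how a service acquires Gaudi devices, the Gaudi-enabled services pair the `habana` container runtime with the `HABANA_VISIBLE_DEVICES` environment variable, mirroring the `docker run` flags shown later in this README. The fragment below is a hypothetical sketch, not a copy of compose_tgi.yaml; consult the compose file itself for the authoritative definition:
```yaml
# Hypothetical fragment showing the Gaudi-specific settings of an LLM service
tgi-service:
  image: ghcr.io/huggingface/tgi-gaudi:2.0.6
  runtime: habana
  environment:
    - HABANA_VISIBLE_DEVICES=all # or a comma-separated list of card indices
    - OMPI_MCA_btl_vader_single_copy_mechanism=none
  cap_add:
    - sys_nice
  ipc: host
```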
### compose_without_rerank.yaml - No ReRank Deployment
The _compose_without_rerank.yaml_ Docker Compose file is distinct from the default deployment primarily due to the exclusion of the reranking service. In this version, the `tei-reranking-service`, which is typically responsible for providing reranking capabilities for text embeddings and is configured to run on Gaudi hardware, is absent. This omission simplifies the service architecture by removing a layer of processing that would otherwise enhance the ranking of text embeddings. Consequently, the `chatqna-gaudi-backend-server` in this deployment uses a specialized image, `opea/chatqna-without-rerank:latest`, tailored to function without the reranking feature, and its dependencies are adjusted accordingly. This streamlined setup focuses on core operations without the additional processing layer provided by reranking, potentially making it more efficient for scenarios where reranking is not essential and freeing Intel® Gaudi® accelerators for other tasks.
| Service Name | Image Name | Gaudi Specific |
| ---------------------------- | ----------------------------------------------------- | -------------- |
| redis-vector-db | redis/redis-stack:7.2.0-v9 | No |
| dataprep-redis-service | opea/dataprep:latest | No |
| tei-embedding-service | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | No |
| retriever | opea/retriever:latest | No |
| vllm-service | opea/vllm-gaudi:latest | Configurable |
| chatqna-gaudi-backend-server | **opea/chatqna-without-rerank:latest** | No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No |
| chatqna-gaudi-nginx-server | opea/nginx:latest | No |
This setup might allow for more Gaudi devices to be dedicated to the `vllm-service`, enhancing LLM processing capabilities and accommodating larger models. However, it also means that the benefits of reranking are sacrificed, which could impact the overall quality of the pipeline's output.
### compose_guardrails.yaml - Guardrails Deployment
The _compose_guardrails.yaml_ Docker Compose file introduces enhancements over the default deployment by incorporating additional services focused on safety and ChatQnA response control. Notably, it includes the `tgi-guardrails-service` and `guardrails` services. The `tgi-guardrails-service` uses the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is configured to run on Gaudi hardware, providing functionality to manage input constraints and ensure safe operations within defined limits. The guardrails service, using the `opea/guardrails:latest` image, acts as a safety layer that interfaces with the `tgi-guardrails-service` to enforce safety protocols and manage interactions with the large language model (LLM). Additionally, the `chatqna-gaudi-backend-server` is updated to use the `opea/chatqna-guardrails:latest` image, indicating its design to integrate with these new guardrail services. This backend server now depends on the `tgi-guardrails-service` and `guardrails`, alongside existing dependencies like `redis-vector-db`, `tei-embedding-service`, `retriever`, `tei-reranking-service`, and `vllm-service`. The environment configurations for the backend are also updated to include settings for the guardrail services.
| Service Name | Image Name | Gaudi Specific | Uses LLM |
| ---------------------------- | ----------------------------------------------------- | -------------- | -------- |
| redis-vector-db | redis/redis-stack:7.2.0-v9 | No | No |
| dataprep-redis-service | opea/dataprep:latest | No | No |
| _tgi-guardrails-service_ | ghcr.io/huggingface/tgi-gaudi:2.0.6 | 1 card | Yes |
| _guardrails_ | opea/guardrails:latest | No | No |
| tei-embedding-service | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | No | No |
| retriever | opea/retriever:latest | No | No |
| tei-reranking-service | ghcr.io/huggingface/tei-gaudi:1.5.0 | 1 card | No |
| vllm-service | opea/vllm-gaudi:latest | Configurable | Yes |
| chatqna-gaudi-backend-server | opea/chatqna-guardrails:latest | No | No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No | No |
| chatqna-gaudi-nginx-server | opea/nginx:latest | No | No |
The deployment with guardrails introduces additional Gaudi-specific services, such as the `tgi-guardrails-service`, which necessitates careful consideration of Gaudi allocation. This deployment aims to balance safety and performance, potentially requiring a strategic distribution of Gaudi devices between the guardrail services and the LLM tasks to maintain both operational safety and efficiency.
### Telemetry Enablement - compose.telemetry.yaml and compose_tgi.telemetry.yaml
The telemetry Docker Compose files are incremental configurations designed to enhance existing deployments by integrating telemetry metrics, thereby providing valuable insights into the performance and behavior of certain services. This setup modifies specific services, such as the `tgi-service`, `tei-embedding-service` and `tei-reranking-service`, by adding a command-line argument that specifies an OpenTelemetry Protocol (OTLP) endpoint. This enables these services to export telemetry data to a designated endpoint, facilitating detailed monitoring and analysis. The `chatqna-gaudi-backend-server` is configured with environment variables that enable telemetry and specify the telemetry endpoint, ensuring that the backend server's operations are also monitored.
Additionally, the telemetry files introduce a new service, `jaeger`, which uses the `jaegertracing/all-in-one:latest` image. Jaeger is a powerful open-source tool for tracing and monitoring distributed systems, offering a user-friendly interface for visualizing traces and understanding the flow of requests through the system.
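The OTLP endpoints used by these services are derived in _set_env.sh_ from the host's outbound IP; the relevant excerpt (matching the script at the end of this change) is:
```bash
# Resolve the address Jaeger listens on, then derive the OTLP endpoints
export JAEGER_IP=$(ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+')
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export TELEMETRY_ENDPOINT=http://$JAEGER_IP:4318/v1/traces
```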
To enable Open Telemetry Tracing, the compose.telemetry.yaml file needs to be merged with the default compose.yaml file at deployment time:
```bash
docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
```
### 3. Build Guardrails Docker Image (Optional)
To fortify AI initiatives in production, the Guardrails microservice can secure model inputs and outputs, helping build trustworthy, safe, and secure LLM-based applications.
```bash
docker build -t opea/guardrails:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/src/guardrails/Dockerfile .
```
### 4. Build MegaService Docker Image
1. MegaService with Rerank
To construct the MegaService with Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
2. MegaService with Guardrails
If you want to enable the guardrails microservice in the pipeline, use the command below instead:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/
docker build --no-cache -t opea/chatqna-guardrails:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.guardrails .
```
3. MegaService without Rerank
To construct the MegaService without Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna_without_rerank.py` Python script. Build the MegaService Docker image with the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA
docker build --no-cache -t opea/chatqna-without-rerank:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.without_rerank .
```
### 5. Build UI Docker Image
Construct the frontend Docker image using the command below:
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 6. Build Conversational React UI Docker Image (Optional)
Build the frontend Docker image that enables a conversational experience with the ChatQnA MegaService using the command below:
**Export the value of the public IP address of your Gaudi node to the `host_ip` environment variable**
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
### 7. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile .
```
Then run the command `docker images`; you should see the following 5 Docker images:
- `opea/retriever:latest`
- `opea/dataprep:latest`
- `opea/chatqna:latest`
- `opea/chatqna-ui:latest`
- `opea/nginx:latest`
If Conversation React UI is built, you will find one more image:
- `opea/chatqna-conversation-ui:latest`
If Guardrails docker image is built, you will find one more image:
- `opea/guardrails:latest`
## 🚀 Start MicroServices and MegaService
### Required Models
By default, the embedding, reranking and LLM models are set to a default value as listed below:
| Service | Model |
| --------- | ----------------------------------- |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| LLM | meta-llama/Meta-Llama-3-8B-Instruct |
Change the `xxx_MODEL_ID` values below to suit your needs.
Users in China who are unable to download models directly from Huggingface can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. vLLM/TGI can load the models either online or offline, as described below:
1. Online
```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="meta-llama/Meta-Llama-3-8B-Instruct"
# Start vLLM LLM Service
docker run -p 8007:80 -v ./data:/data --name vllm-gaudi-server -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e VLLM_TORCH_PROFILER_DIR="/mnt" --cap-add=sys_nice --ipc=host opea/vllm-gaudi:latest --model $model_name --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
# Start TGI LLM Service
docker run -p 8005:80 -v ./data:/data --name tgi-gaudi-server -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=true -e LIMIT_HPU_GRAPH=true -e USE_FLASH_ATTENTION=true -e FLASH_ATTENTION_RECOMPUTE=true --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id $model_name --max-input-tokens 1024 --max-total-tokens 2048
```
2. Offline
- Search your model name in ModelScope. For example, check [this page](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/files) for model `Meta-Llama-3-8B-Instruct`.
- Click on `Download this model` button, and choose one way to download the model to your local path `/path/to/model`.
- Run the following command to start the LLM service.
```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
# Start vLLM LLM Service
docker run -p 8007:80 -v $model_path:/data --name vllm-gaudi-server --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e VLLM_TORCH_PROFILER_DIR="/mnt" --cap-add=sys_nice --ipc=host opea/vllm-gaudi:latest --model /data --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
# Start TGI LLM Service
docker run -p 8005:80 -v $model_path:/data --name tgi-gaudi-server --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e ENABLE_HPU_GRAPH=true -e LIMIT_HPU_GRAPH=true -e USE_FLASH_ATTENTION=true -e FLASH_ATTENTION_RECOMPUTE=true --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id /data --max-input-tokens 1024 --max-total-tokens 2048
```
### Setup Environment Variables
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy,chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails"
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
### Start all the services Docker Containers
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
```
If using vLLM as the LLM serving backend:
```bash
# Start ChatQnA with Rerank Pipeline
docker compose -f compose.yaml up -d
# Start ChatQnA without Rerank Pipeline
docker compose -f compose_without_rerank.yaml up -d
# Start ChatQnA with Rerank Pipeline and Open Telemetry Tracing
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```
If using TGI as the LLM serving backend:
For a TGI Deployment, this would become:
```bash
docker compose -f compose_tgi.yaml up -d
# Start ChatQnA with Open Telemetry Tracing
docker compose -f compose_tgi.yaml -f compose_tgi.telemetry.yaml up -d
```
If you want to enable the guardrails microservice in the pipeline, use the commands below instead:
## ChatQnA Service Configuration
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose_guardrails.yaml up -d
```
The table provides a comprehensive overview of the ChatQnA services utilized across the various deployments illustrated in the example Docker Compose files. Each row in the table represents a distinct service, detailing the images that can be used to enable it and a concise description of its function within the deployment architecture. These services collectively enable functionalities such as data storage and management, text embedding, retrieval, reranking, and large language model processing. Additionally, specialized services like `tgi-service` and `guardrails` are included to enhance text generation inference and ensure operational safety, respectively. The table also highlights the integration of telemetry through the `jaeger` service, which provides tracing and monitoring capabilities.
> **_NOTE:_** Users need at least two Gaudi cards to run the ChatQnA successfully.
| Service Name | Possible Image Names | Optional | Description |
| ---------------------------- | ----------------------------------------------------- | -------- | -------------------------------------------------------------------------------------------------- |
| redis-vector-db | redis/redis-stack:7.2.0-v9 | No | Acts as a Redis database for storing and managing data. |
| dataprep-redis-service | opea/dataprep:latest | No | Prepares data and interacts with the Redis database. |
| tei-embedding-service | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | No | Provides text embedding services, often using Hugging Face models. |
| retriever | opea/retriever:latest | No | Retrieves data from the Redis database and interacts with embedding services. |
| tei-reranking-service | ghcr.io/huggingface/tei-gaudi:1.5.0 | Yes | Reranks text embeddings, typically using Gaudi hardware for enhanced performance. |
| vllm-service | opea/vllm-gaudi:latest | No | Handles large language model (LLM) tasks, utilizing Gaudi hardware. |
| tgi-service | ghcr.io/huggingface/tgi-gaudi:2.0.6 | Yes | Specific to the TGI deployment, focuses on text generation inference using Gaudi hardware. |
| tgi-guardrails-service | ghcr.io/huggingface/tgi-gaudi:2.0.6 | Yes | Provides guardrails functionality, ensuring safe operations within defined limits. |
| guardrails | opea/guardrails:latest | Yes | Acts as a safety layer, interfacing with the `tgi-guardrails-service` to enforce safety protocols. |
| chatqna-gaudi-backend-server | opea/chatqna:latest | No | Serves as the backend for the ChatQnA application, with variations depending on the deployment. |
| | opea/chatqna-without-rerank:latest | | |
| | opea/chatqna-guardrails:latest | | |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No | Provides the user interface for the ChatQnA application. |
| chatqna-gaudi-nginx-server | opea/nginx:latest | No | Acts as a reverse proxy, managing traffic between the UI and backend services. |
| jaeger | jaegertracing/all-in-one:latest | Yes | Provides tracing and monitoring capabilities for distributed systems. |
### Validate MicroServices and MegaService
Many of these services provide the pipeline support required for all ChatQnA deployments and are not specific to the Intel® Gaudi® platform. Therefore, while the `redis-vector-db`, `dataprep-redis-service`, `retriever`, `chatqna-gaudi-backend-server`, `chatqna-gaudi-ui-server`, `chatqna-gaudi-nginx-server`, and `jaeger` services are configurable, they will not be covered by this example, which focuses on the configuration specifics of the services modified to support the Intel® Gaudi® platform.
Follow the instructions to validate MicroServices.
For validation details, please refer to [how-to-validate_service](./how_to_validate_service.md).
### vllm-service & tgi-service
1. TEI Embedding Service
In the configuration of the `vllm-service` and the `tgi-service`, two variables play a primary role in determining the service's performance and functionality: `LLM_MODEL_ID` and `NUM_CARDS`. Both can be set using the appropriate environment variables. The `LLM_MODEL_ID` parameter specifies the particular large language model (LLM) that the service will utilize, effectively determining the capabilities and characteristics of the language processing tasks it can perform. This model identifier ensures that the service is aligned with the specific requirements of the application, whether it involves text generation, comprehension, or other language-related tasks. The `NUM_CARDS` parameter dictates the number of Gaudi devices allocated to the service. A higher number of Gaudi devices can enhance parallel processing capabilities, reduce latency, and improve throughput.
```bash
curl ${host_ip}:8090/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
However, developers need to be aware of the models that have been tested with the respective service image supporting the `vllm-service` and `tgi-service`. For example, the documentation for the OPEA GenAIComps v1.0 release specifies the list of [validated LLM models](https://github.com/opea-project/GenAIComps/blob/v1.0/comps/llms/text-generation/README.md#validated-llm-models) for each Gaudi-enabled service image. Specific models may have stringent requirements on the number of Intel® Gaudi® devices required to support them.
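Putting `LLM_MODEL_ID` and `NUM_CARDS` together, switching the deployment to another validated model is a matter of re-exporting both values and re-running Docker Compose (a sketch using values documented earlier in this README):
```bash
# Redeploy with DeepSeek-R1-Distill-Qwen-32B on 4 Gaudi cards
export LLM_MODEL_ID="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
export NUM_CARDS=4
docker compose -f compose.yaml up -d # compose reads the new values from the environment
```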
2. Retriever Microservice
#### Deepseek Model Support for Intel® Gaudi® Platform ChatQnA pipeline
To consume the retriever microservice, you need to generate a mock embedding vector with a Python script. The length of the embedding vector
is determined by the embedding model.
Here we use the model `EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"`, whose vector size is 768.
ChatQnA now supports running the latest DeepSeek models, including [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) and [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) on Gaudi accelerators. To run `deepseek-ai/DeepSeek-R1-Distill-Llama-70B`, set the `LLM_MODEL_ID` appropriately and the `NUM_CARDS` to 8. To run `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`, update the `LLM_MODEL_ID` appropriately and set the `NUM_CARDS` to 4.
Check the vector dimension of your embedding model and set the dimension of `your_embedding` equal to it (see the sketch after the retrieval example below).
### tei-embedding-service & tei-reranking-service
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
-X POST \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
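If the dimension is unknown, one way to check it empirically is to embed a test string with the running `tei-embedding-service` (exposed on port 8090 in the validation step above) and count the returned values; a sketch:
```bash
# The /embed endpoint returns a list of embedding vectors;
# the length of the first vector is the model's embedding dimension.
curl -s ${host_ip}:8090/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json' |
    python3 -c "import sys, json; print(len(json.load(sys.stdin)[0]))"
```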
The `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5` image supporting the `tei-embedding-service` and `tei-reranking-service` depends on the `EMBEDDING_MODEL_ID` and `RERANK_MODEL_ID` environment variables, respectively, to specify the embedding model and reranking model used for converting text into vector representations and rankings. This choice impacts the quality and relevance of the embeddings and rerankings for various applications. Unlike the `vllm-service`, the `tei-embedding-service` and `tei-reranking-service` each typically acquire only one Gaudi device and do not use the `NUM_CARDS` parameter; embedding and reranking tasks generally do not require extensive parallel processing, and one Gaudi device per service is appropriate. The list of [supported embedding and reranking models](https://github.com/huggingface/tei-gaudi?tab=readme-ov-file#supported-models) can be found on the [huggingface/tei-gaudi](https://github.com/huggingface/tei-gaudi?tab=readme-ov-file#supported-models) website.
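For example, both models can be overridden before deployment; the values below are the documented defaults, and any supported model from the list above follows the same pattern:
```bash
# Defaults from set_env.sh; substitute any supported TEI model
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
```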
3. TEI Reranking Service
### tgi-guardrails-service
> Skip for ChatQnA without Rerank pipeline
The `tgi-guardrails-service` uses the `GUARDRAILS_MODEL_ID` parameter to select a [supported model](https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#tested-models-and-configurations) for the associated `ghcr.io/huggingface/tgi-gaudi:2.0.6` image. Like the `tei-embedding-service` and `tei-reranking-service` services, it doesn't use the `NUM_CARDS` parameter.
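A hedged example of selecting a guardrails model and starting the guardrails pipeline; the model id below is a placeholder, so substitute one from the supported-models list linked above:
```bash
# Placeholder id -- pick a model from the tgi-gaudi supported-models list
export GUARDRAILS_MODEL_ID="your_guardrails_model_id"
docker compose -f compose_guardrails.yaml up -d
```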
```bash
curl http://${host_ip}:8808/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
## Conclusion
4. LLM backend Service
In examining the various services and configurations across different deployments, developers should gain a comprehensive understanding of how each component contributes to the overall functionality and performance of a ChatQnA pipeline on an Intel® Gaudi® platform. Key services such as the `vllm-service`, `tei-embedding-service`, `tei-reranking-service`, and `tgi-guardrails-service` each consume Gaudi accelerators, leveraging specific models and hardware resources to optimize their respective tasks. The `LLM_MODEL_ID`, `EMBEDDING_MODEL_ID`, `RERANK_MODEL_ID`, and `GUARDRAILS_MODEL_ID` parameters specify the models used, directly impacting the quality and effectiveness of language processing, embedding, reranking, and safety operations.
On its first startup, this service will take extra time to download, load, and warm up the model. Once that finishes, the service will be ready.
The allocation of Gaudi devices, set by the Gaudi-dependent services and the `NUM_CARDS` parameter supporting the `vllm-service` or `tgi-service`, determines where computational power is utilized to enhance performance.
Try the command below to check whether the LLM service is ready.
```bash
# vLLM service
docker logs vllm-gaudi-server 2>&1 | grep complete
# If the service is ready, you will get the response like below.
INFO: Application startup complete.
```
```bash
# TGI service
docker logs tgi-gaudi-server | grep Connected
# If the service is ready, you will get a response like the one below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
Then try the `cURL` command below to validate services.
```bash
# vLLM Service
curl http://${host_ip}:8007/v1/chat/completions \
-X POST \
-d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
```bash
# TGI service
curl http://${host_ip}:8005/v1/chat/completions \
-X POST \
-d '{"model": ${LLM_MODEL_ID}, "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
5. MegaService
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
6. Nginx Service
```bash
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
-H "Content-Type: application/json" \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
7. Dataprep Microservice (Optional)
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File Upload:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
Add Knowledge Base via HTTP Links:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
This command updates a knowledge base by submitting a list of HTTP links for processing.
You can also retrieve the list of files/links you uploaded:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get" \
-H "Content-Type: application/json"
```
Then you will get a JSON response like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
To delete the file/link you uploaded:
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
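After deleting, the `get` endpoint shown above can be called again to confirm that the file/link list is empty:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get" \
    -H "Content-Type: application/json"
```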
8. Guardrails (Optional)
```bash
curl http://${host_ip}:9090/v1/guardrails \
-X POST \
-d '{"text":"How do you buy a tiger in the US?","parameters":{"max_new_tokens":32}}' \
-H 'Content-Type: application/json'
```
### Profile Microservices
To further analyze microservice performance, users can follow the instructions below to profile microservices.
#### 1. vLLM backend Service
Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.
##### Start vLLM profiling
```bash
curl http://${host_ip}:9009/start_profile \
-H "Content-Type: application/json" \
-d '{"model": ${LLM_MODEL_ID}}'
```
Users should see docker logs like those below from the vllm-service if profiling started correctly.
```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```
After vLLM profiling is started, users can ask questions and get responses from the vLLM microservice or the ChatQnA MegaService.
##### Stop vLLM profiling
With the following command, users can stop vLLM profiling and generate a \*.pt.trace.json.gz file as the profiling result under the /mnt folder in the vllm-service docker instance.
```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
-H "Content-Type: application/json" \
-d '{"model": ${LLM_MODEL_ID}}'
```
Users should see docker logs like those below from the vllm-service if profiling stopped correctly.
```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```
After vLLM profiling is stopped, users can use the command below to retrieve the \*.pt.trace.json.gz file from the /mnt folder.
```bash
docker cp vllm-service:/mnt/ .
```
##### Check profiling result
Open a web browser and go to "chrome://tracing" or "ui.perfetto.dev", then load the json.gz file. You should see the vLLM profiling result as in the diagrams below.
![image](https://github.com/user-attachments/assets/487c52c8-d187-46dc-ab3a-43f21d657d41)
![image](https://github.com/user-attachments/assets/e3c51ce5-d704-4eb7-805e-0d88b0c158e3)
## 🚀 Launch the UI
### Launch with the default port
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chatqna-gaudi-ui-server:
image: opea/chatqna-ui:latest
...
ports:
- "80:5173"
```
### Launch with Nginx
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
## 🚀 Launch the Conversational UI (Optional)
To access the Conversational UI (React-based) frontend, modify the UI service in the `compose.yaml` file. Replace the `chatqna-gaudi-ui-server` service with the `chatqna-gaudi-conversation-ui-server` service as per the config below:
```yaml
chatqna-gaudi-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
container_name: chatqna-gaudi-conversation-ui-server
environment:
- APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT}
ports:
- "5174:80"
depends_on:
- chatqna-gaudi-backend-server
ipc: host
restart: always
```
Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chatqna-gaudi-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
...
ports:
- "80:80"
```
![project-screenshot](../../../../assets/img/chat_ui_init.png)
Here is an example of running ChatQnA:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
Here is an example of running ChatQnA with Conversational UI (React):
![project-screenshot](../../../../assets/img/conversation_ui_response.png)
Overall, the strategic configuration of these services, through careful selection of models and resource allocation, enables a balanced and efficient deployment. This approach ensures that the ChatQnA pipeline can meet diverse operational needs, from high-performance language model processing to robust safety protocols, all while optimizing the use of available hardware resources.

ChatQnA/docker_compose/intel/hpu/gaudi/set_env.sh Normal file → Executable file
View File

@@ -1,22 +1,94 @@
#!/usr/bin/env bash
#/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Function to prompt for input and set environment variables
prompt_for_env_var() {
  local var_name="$1"
  local prompt_message="$2"
  local default_value="$3"
  local mandatory="$4"
  local value="" # reset on every call so a previous answer is never reused
  if [[ "$mandatory" == "true" ]]; then
    while [[ -z "$value" ]]; do
      read -p "$prompt_message [default: \"${default_value}\"]: " value
      if [[ -z "$value" ]]; then
        echo "Input cannot be empty. Please try again."
      fi
    done
  else
    read -p "$prompt_message [default: \"${default_value}\"]: " value
  fi
  if [[ "$value" == "" ]]; then
    export "$var_name"="$default_value"
  else
    export "$var_name"="$value"
  fi
}
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
# Prompt the user for each required environment variable
prompt_for_env_var "EMBEDDING_MODEL_ID" "Enter the EMBEDDING_MODEL_ID" "BAAI/bge-base-en-v1.5" false
prompt_for_env_var "HUGGINGFACEHUB_API_TOKEN" "Enter the HUGGINGFACEHUB_API_TOKEN" "" true
prompt_for_env_var "RERANK_MODEL_ID" "Enter the RERANK_MODEL_ID" "BAAI/bge-reranker-base" false
prompt_for_env_var "LLM_MODEL_ID" "Enter the LLM_MODEL_ID" "meta-llama/Meta-Llama-3-8B-Instruct" false
prompt_for_env_var "INDEX_NAME" "Enter the INDEX_NAME" "rag-redis" false
prompt_for_env_var "NUM_CARDS" "Enter the number of Gaudi devices" "1" false
prompt_for_env_var "host_ip" "Enter the host_ip" "$(curl ifconfig.me)" false
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export INDEX_NAME="rag-redis"
export NUM_CARDS=1
# Set it as a non-null string, such as true, if you want to enable logging facility,
# otherwise, keep it as "" to disable it.
export LOGFLAG=""
# Set OpenTelemetry Tracing Endpoint
export JAEGER_IP=$(ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+')
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export TELEMETRY_ENDPOINT=http://$JAEGER_IP:4318/v1/traces
export no_proxy="$no_proxy,chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-gaudi-server,vllm-gaudi-server,guardrails,jaeger,prometheus,grafana,node-exporter,gaudi-exporter,$JAEGER_IP"
# Query for enabling http_proxy
prompt_for_env_var "http_proxy" "Enter the http_proxy." "" false
# Query for enabling https_proxy
prompt_for_env_var "https_proxy" "Enter the https_proxy." "" false
# Query for enabling no_proxy
prompt_for_env_var "no_proxy" "Enter the no_proxy." "" false
# Query for enabling logging
read -p "Enable logging? (yes/no): " logging && logging=$(echo "$logging" | tr '[:upper:]' '[:lower:]')
if [[ "$logging" == "yes" || "$logging" == "y" ]]; then
export LOGFLAG=true
else
export LOGFLAG=false
fi
# Query for enabling OpenTelemetry Tracing Endpoint
read -p "Enable OpenTelemetry Tracing Endpoint? (yes/no): " telemetry && telemetry=$(echo "$telemetry" | tr '[:upper:]' '[:lower:]')
if [[ "$telemetry" == "yes" || "$telemetry" == "y" ]]; then
export JAEGER_IP=$(ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+')
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export TELEMETRY_ENDPOINT=http://$JAEGER_IP:4318/v1/traces
telemetry_flag=true
else
telemetry_flag=false
fi
# Generate the .env file
cat <<EOF > .env
#!/bin/bash
# Set all required ENV values
export TAG=${TAG}
export EMBEDDING_MODEL_ID=${EMBEDDING_MODEL_ID}
export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN
export RERANK_MODEL_ID=${RERANK_MODEL_ID}
export LLM_MODEL_ID=${LLM_MODEL_ID}
export INDEX_NAME=${INDEX_NAME}
export NUM_CARDS=${NUM_CARDS}
export host_ip=${host_ip}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy}
export LOGFLAG=${LOGFLAG}
export JAEGER_IP=${JAEGER_IP}
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=${OTEL_EXPORTER_OTLP_TRACES_ENDPOINT}
export TELEMETRY_ENDPOINT=${TELEMETRY_ENDPOINT}
EOF
echo ".env file has been created with the following content:"
cat .env