[ChatQnA] Update the default LLM to llama3-8B on cpu/gpu/hpu (#1430)
Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi for docker-compose deployment to avoid the potential model-serving issue or the missing chat-template issue with neural-chat-7b. Slow serving issue of neural-chat-7b on ICX: #1420

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
This commit is contained in: commit 3d3ac59bfb (parent f11ab458d8), committed by GitHub.
@@ -10,6 +10,8 @@ Quick Start Deployment Steps:
2. Run Docker Compose.
3. Consume the ChatQnA Service.
+ Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
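Since the new default model is gated on Hugging Face, the serving containers need a token that has been granted access before `docker compose up` is run. A minimal sketch, reusing the `HF_TOKEN` variable that already appears in these compose setups (`${your_hf_token}` is a placeholder):

```bash
# Token of an account that has been granted access to meta-llama/Meta-Llama-3-8B-Instruct;
# the compose files pass it on to the serving containers (e.g. as HUGGING_FACE_HUB_TOKEN).
export HF_TOKEN=${your_hf_token}
```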
## Quick Start: 1.Setup Environment Variable

To set up environment variables for deploying ChatQnA services, follow these steps:
@@ -155,11 +157,11 @@ Then run the command `docker images`, you will have the following 5 Docker Image

By default, the embedding, reranking and LLM models are set to a default value as listed below:

- | Service | Model |
- | --------- | ------------------------- |
- | Embedding | BAAI/bge-base-en-v1.5 |
- | Reranking | BAAI/bge-reranker-base |
- | LLM | Intel/neural-chat-7b-v3-3 |
+ | Service | Model |
+ | --------- | ----------------------------------- |
+ | Embedding | BAAI/bge-base-en-v1.5 |
+ | Reranking | BAAI/bge-reranker-base |
+ | LLM | meta-llama/Meta-Llama-3-8B-Instruct |

Change the `xxx_MODEL_ID` below for your needs.
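If a different LLM is preferred over the new default, the ID can simply be overridden before running Docker Compose. A minimal sketch for this ROCm setup (the alternative model name is illustrative only):

```bash
# Illustrative override; any chat model that the TGI/vLLM image can serve
# and that your HF token can access may be used instead of the default.
export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-8B-Instruct"
```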
@@ -179,7 +181,7 @@ Change the `xxx_MODEL_ID` below for your needs.

export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base"
- export CHATQNA_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+ export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export CHATQNA_TGI_SERVICE_PORT=8008
export CHATQNA_TEI_EMBEDDING_PORT=8090
export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}"
@@ -6,7 +6,7 @@

export CHATQNA_TGI_SERVICE_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export CHATQNA_EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export CHATQNA_RERANK_MODEL_ID="BAAI/bge-reranker-base"
- export CHATQNA_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+ export CHATQNA_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export CHATQNA_TGI_SERVICE_PORT=18008
export CHATQNA_TEI_EMBEDDING_PORT=18090
export CHATQNA_TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}"
@@ -10,6 +10,8 @@ Quick Start:
2. Run Docker Compose.
3. Consume the ChatQnA Service.
+ Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).

## Quick Start: 1.Setup Environment Variable

To set up environment variables for deploying ChatQnA services, follow these steps:

@@ -180,11 +182,11 @@ Then run the command `docker images`, you will have the following 5 Docker Image

By default, the embedding, reranking and LLM models are set to a default value as listed below:

- | Service | Model |
- | --------- | ------------------------- |
- | Embedding | BAAI/bge-base-en-v1.5 |
- | Reranking | BAAI/bge-reranker-base |
- | LLM | Intel/neural-chat-7b-v3-3 |
+ | Service | Model |
+ | --------- | ----------------------------------- |
+ | Embedding | BAAI/bge-base-en-v1.5 |
+ | Reranking | BAAI/bge-reranker-base |
+ | LLM | meta-llama/Meta-Llama-3-8B-Instruct |

Change the `xxx_MODEL_ID` below for your needs.

@@ -195,7 +197,7 @@ For users in China who are unable to download models directly from Huggingface,

```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
- model_name="Intel/neural-chat-7b-v3-3"
+ model_name="meta-llama/Meta-Llama-3-8B-Instruct"
# Start vLLM LLM Service
docker run -p 8008:80 -v ./data:/data --name vllm-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 128g opea/vllm:latest --model $model_name --host 0.0.0.0 --port 80
# Start TGI LLM Service

@@ -204,7 +206,7 @@ For users in China who are unable to download models directly from Huggingface,

2. Offline

- - Search your model name in ModelScope. For example, check [this page](https://www.modelscope.cn/models/ai-modelscope/neural-chat-7b-v3-1/files) for model `neural-chat-7b-v3-1`.
+ - Search your model name in ModelScope. For example, check [this page](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/files) for model `Meta-Llama-3-8B-Instruct`.

- Click on `Download this model` button, and choose one way to download the model to your local path `/path/to/model`.
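Once the model has been fetched from ModelScope into `/path/to/model`, one possible way to serve the local copy is to mount it into the container and pass the in-container path as the model argument; a sketch only, assuming vLLM accepts a local directory for `--model` (the mount path is illustrative, and proxy flags are omitted):

```bash
# Serve a locally downloaded copy of Meta-Llama-3-8B-Instruct instead of
# pulling it from Hugging Face: mount the host path and point --model at it.
docker run -p 8008:80 -v /path/to/model:/data/Meta-Llama-3-8B-Instruct --name vllm-service \
  --shm-size 128g opea/vllm:latest \
  --model /data/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 80
```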
@@ -337,7 +339,7 @@ For details on how to verify the correctness of the response, refer to [how-to-v

# either vLLM or TGI service
curl http://${host_ip}:9009/v1/chat/completions \
-X POST \
- -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
+ -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```

@@ -450,7 +452,7 @@ Users could follow previous section to testing vLLM microservice or ChatQnA Mega

```bash
curl http://${host_ip}:9009/start_profile \
-H "Content-Type: application/json" \
- -d '{"model": "Intel/neural-chat-7b-v3-3"}'
+ -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct"}'
```

Users would see below docker logs from vllm-service if profiling is started correctly.
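One way to check for those messages from the host, assuming the container keeps the `vllm-service` name used above:

```bash
# Follow the serving container's logs and filter for profiler-related lines.
docker logs -f vllm-service 2>&1 | grep -i profil
```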
@@ -473,7 +475,7 @@ By following command, users could stop vLLM profliing and generate a \*.pt.trace

# vLLM Service
curl http://${host_ip}:9009/stop_profile \
-H "Content-Type: application/json" \
- -d '{"model": "Intel/neural-chat-7b-v3-3"}'
+ -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct"}'
```

Users would see below docker logs from vllm-service if profiling is stopped correctly.
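After profiling is stopped, the `*.pt.trace` file is written inside the container; a sketch of copying it to the host for inspection, assuming the profiler output directory was set to `/mnt` via `VLLM_TORCH_PROFILER_DIR` (as in the Gaudi run command later in this diff):

```bash
# Copy the generated trace file(s) out of the serving container.
docker cp vllm-service:/mnt ./vllm_profile_traces
```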
@@ -10,6 +10,8 @@ Quick Start:
2. Run Docker Compose.
3. Consume the ChatQnA Service.
+ Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).

## Quick Start: 1.Setup Environment Variable

To set up environment variables for deploying ChatQnA services, follow these steps:

@@ -183,11 +185,11 @@ Then run the command `docker images`, you will have the following 5 Docker Image

By default, the embedding, reranking and LLM models are set to a default value as listed below:

- | Service | Model |
- | --------- | ------------------------- |
- | Embedding | BAAI/bge-base-en-v1.5 |
- | Reranking | BAAI/bge-reranker-base |
- | LLM | Intel/neural-chat-7b-v3-3 |
+ | Service | Model |
+ | --------- | ----------------------------------- |
+ | Embedding | BAAI/bge-base-en-v1.5 |
+ | Reranking | BAAI/bge-reranker-base |
+ | LLM | meta-llama/Meta-Llama-3-8B-Instruct |

Change the `xxx_MODEL_ID` below for your needs.

@@ -198,13 +200,13 @@ For users in China who are unable to download models directly from Huggingface,

```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
- model_name="Intel/neural-chat-7b-v3-3"
+ model_name="meta-llama/Meta-Llama-3-8B-Instruct"
docker run -p 8008:80 -v ./data:/data --name vllm-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 128g opea/vllm:latest --model $model_name --host 0.0.0.0 --port 80
```

2. Offline

- - Search your model name in ModelScope. For example, check [this page](https://www.modelscope.cn/models/ai-modelscope/neural-chat-7b-v3-1/files) for model `neural-chat-7b-v3-1`.
+ - Search your model name in ModelScope. For example, check [this page](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/files) for model `Meta-Llama-3-8B-Instruct`.

- Click on `Download this model` button, and choose one way to download the model to your local path `/path/to/model`.

@@ -324,7 +326,7 @@ For details on how to verify the correctness of the response, refer to [how-to-v

```bash
curl http://${host_ip}:9009/v1/chat/completions \
-X POST \
- -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
+ -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
@@ -4,6 +4,8 @@ This document outlines the deployment process for a ChatQnA application utilizin

The default pipeline deploys with vLLM as the LLM serving component and leverages rerank component.
+ Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).

## 🚀 Apply Xeon Server on AWS

To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage the power of 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.

@@ -141,11 +143,11 @@ Then run the command `docker images`, you will have the following 5 Docker Image

By default, the embedding, reranking and LLM models are set to a default value as listed below:

- | Service | Model |
- | --------- | ------------------------- |
- | Embedding | BAAI/bge-base-en-v1.5 |
- | Reranking | BAAI/bge-reranker-base |
- | LLM | Intel/neural-chat-7b-v3-3 |
+ | Service | Model |
+ | --------- | ----------------------------------- |
+ | Embedding | BAAI/bge-base-en-v1.5 |
+ | Reranking | BAAI/bge-reranker-base |
+ | LLM | meta-llama/Meta-Llama-3-8B-Instruct |

Change the `xxx_MODEL_ID` below for your needs.

@@ -181,7 +183,7 @@ export http_proxy=${your_http_proxy}

export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
- export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+ export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export INDEX_NAME="rag-qdrant"
```

@@ -256,7 +258,7 @@ For details on how to verify the correctness of the response, refer to [how-to-v

```bash
curl http://${host_ip}:6042/v1/chat/completions \
-X POST \
- -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
+ -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
@@ -9,7 +9,7 @@ popd > /dev/null

export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
- export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+ export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export INDEX_NAME="rag-redis"
# Set it as a non-null string, such as true, if you want to enable logging facility,
# otherwise, keep it as "" to disable it.
@@ -10,6 +10,8 @@ Quick Start:
2. Run Docker Compose.
3. Consume the ChatQnA Service.
+ Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).

## Quick Start: 1.Setup Environment Variable

To set up environment variables for deploying ChatQnA services, follow these steps:

@@ -178,11 +180,11 @@ If Guardrails docker image is built, you will find one more image:

By default, the embedding, reranking and LLM models are set to a default value as listed below:

- | Service | Model |
- | --------- | ------------------------- |
- | Embedding | BAAI/bge-base-en-v1.5 |
- | Reranking | BAAI/bge-reranker-base |
- | LLM | Intel/neural-chat-7b-v3-3 |
+ | Service | Model |
+ | --------- | ----------------------------------- |
+ | Embedding | BAAI/bge-base-en-v1.5 |
+ | Reranking | BAAI/bge-reranker-base |
+ | LLM | meta-llama/Meta-Llama-3-8B-Instruct |

Change the `xxx_MODEL_ID` below for your needs.

@@ -193,7 +195,7 @@ For users in China who are unable to download models directly from Huggingface,

```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
- model_name="Intel/neural-chat-7b-v3-3"
+ model_name="meta-llama/Meta-Llama-3-8B-Instruct"
# Start vLLM LLM Service
docker run -p 8007:80 -v ./data:/data --name vllm-gaudi-server -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e VLLM_TORCH_PROFILER_DIR="/mnt" --cap-add=sys_nice --ipc=host opea/vllm-gaudi:latest --model $model_name --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
# Start TGI LLM Service

@@ -202,7 +204,7 @@ For users in China who are unable to download models directly from Huggingface,

2. Offline

- - Search your model name in ModelScope. For example, check [this page](https://www.modelscope.cn/models/ai-modelscope/neural-chat-7b-v3-1/files) for model `neural-chat-7b-v3-1`.
+ - Search your model name in ModelScope. For example, check [this page](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/files) for model `Meta-Llama-3-8B-Instruct`.

- Click on `Download this model` button, and choose one way to download the model to your local path `/path/to/model`.
@@ -231,7 +231,7 @@ and the log shows model warm up, please wait for a while and try it later.

```
2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set
2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit.
- 2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model Intel/neural-chat-7b-v3-3
+ 2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model meta-llama/Meta-Llama-3-8B-Instruct
2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model
```
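To watch this warm-up from the host, one can simply follow the TGI container's logs until the `Warming up model` phase completes (the container name below is a placeholder; use the name reported by `docker ps` for the TGI service):

```bash
# Placeholder container name; substitute the actual TGI service container.
docker logs -f <tgi-service-container>
```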
@@ -239,7 +239,7 @@ and the log shows model warm up, please wait for a while and try it later.

```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
- "model": "Intel/neural-chat-7b-v3-3",
+ "model": "meta-llama/Meta-Llama-3-8B-Instruct",
"messages": "What is the revenue of Nike in 2023?"
}'
```
@@ -9,7 +9,7 @@ popd > /dev/null

export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
- export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+ export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export INDEX_NAME="rag-redis"
# Set it as a non-null string, such as true, if you want to enable logging facility,
# otherwise, keep it as "" to disable it.
@@ -9,6 +9,8 @@ Quick Start Deployment Steps:
3. Run Docker Compose.
4. Consume the ChatQnA Service.
+ Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).

## Quick Start: 1.Setup Environment Variable

To set up environment variables for deploying ChatQnA services, follow these steps:

@@ -165,11 +167,11 @@ Then run the command `docker images`, you will have the following 5 Docker Image

By default, the embedding, reranking and LLM models are set to a default value as listed below:

- | Service | Model |
- | --------- | ------------------------- |
- | Embedding | BAAI/bge-base-en-v1.5 |
- | Reranking | BAAI/bge-reranker-base |
- | LLM | Intel/neural-chat-7b-v3-3 |
+ | Service | Model |
+ | --------- | ----------------------------------- |
+ | Embedding | BAAI/bge-base-en-v1.5 |
+ | Reranking | BAAI/bge-reranker-base |
+ | LLM | meta-llama/Meta-Llama-3-8B-Instruct |

Change the `xxx_MODEL_ID` below for your needs.

@@ -287,7 +289,7 @@ docker compose up -d

```bash
curl http://${host_ip}:8008/v1/chat/completions \
-X POST \
- -d '{"model": "Intel/neural-chat-7b-v3-3", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
+ -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'
```
@@ -6,7 +6,7 @@

export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
- export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+ export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:8090"
export INDEX_NAME="rag-redis"
export MEGA_SERVICE_HOST_IP=${host_ip}