Set no wrapper ChatQnA as default (#891)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@@ -70,73 +70,19 @@ curl http://${host_ip}:8888/v1/chatqna \

First of all, you need to build the Docker images locally. This step can be skipped once the Docker images have been published to Docker Hub.

### 1. Build Embedding Image

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile .
```

### 2. Build Retriever Image
### 1. Build Retriever Image

```bash
docker build --no-cache -t opea/retriever-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/redis/langchain/Dockerfile .
```

### 3. Build Rerank Image

> Skip for ChatQnA without Rerank pipeline

```bash
docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile .
```

### 4. Build LLM Image

You can use different LLM serving solutions; choose one of the following options.

#### 4.1 Use TGI

```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

#### 4.2 Use VLLM

Build the vLLM docker image.

```bash
docker build --no-cache -t opea/llm-vllm-hpu:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/langchain/dependency/Dockerfile.intel_hpu .
```

Build the microservice docker image.

```bash
docker build --no-cache -t opea/llm-vllm:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/langchain/Dockerfile .
```

#### 4.3 Use VLLM-on-Ray

Build the vLLM-on-Ray docker image.

```bash
docker build --no-cache -t opea/llm-vllm-ray-hpu:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/ray/dependency/Dockerfile .
```

Build the microservice docker image.

```bash
docker build --no-cache -t opea/llm-vllm-ray:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/ray/Dockerfile .
```

### 5. Build Dataprep Image
### 2. Build Dataprep Image

```bash
docker build --no-cache -t opea/dataprep-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/redis/langchain/Dockerfile .
```

### 6. Build Guardrails Docker Image (Optional)
### 3. Build Guardrails Docker Image (Optional)

To fortify AI initiatives in production, the Guardrails microservice can secure model inputs and outputs, helping build trustworthy, safe, and secure LLM-based applications.

@@ -144,7 +90,7 @@ To fortify AI initiatives in production, Guardrails microservice can secure mode
docker build -t opea/guardrails-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/llama_guard/langchain/Dockerfile .
```

### 7. Build MegaService Docker Image
### 4. Build MegaService Docker Image

1. MegaService with Rerank

@@ -176,7 +122,7 @@ docker build -t opea/guardrails-tgi:latest --build-arg https_proxy=$https_proxy
docker build --no-cache -t opea/chatqna-without-rerank:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.without_rerank .
```

### 8. Build UI Docker Image
### 5. Build UI Docker Image

Construct the frontend Docker image using the command below:

@@ -185,7 +131,7 @@ cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

### 9. Build Conversational React UI Docker Image (Optional)
### 6. Build Conversational React UI Docker Image (Optional)

Build the frontend Docker image that enables a conversational experience with the ChatQnA megaservice via the command below:

@@ -196,21 +142,18 @@ cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```

### 10. Build Nginx Docker Image
### 7. Build Nginx Docker Image

```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile .
```

Then run the command `docker images`; you will have the following 8 Docker images:
Then run the command `docker images`; you will have the following 5 Docker images:

- `opea/embedding-tei:latest`
- `opea/retriever-redis:latest`
- `opea/reranking-tei:latest`
- `opea/llm-tgi:latest` or `opea/llm-vllm:latest` or `opea/llm-vllm-ray:latest`
- `opea/dataprep-redis:latest`
- `opea/chatqna:latest` or `opea/chatqna-guardrails:latest` or `opea/chatqna-without-rerank:latest`
- `opea/chatqna:latest`
- `opea/chatqna-ui:latest`
- `opea/nginx:latest`

@@ -338,16 +281,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
  -H 'Content-Type: application/json'
```

2. Embedding Microservice

```bash
curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'
```

3. Retriever Microservice
2. Retriever Microservice

To consume the retriever microservice, you need to generate a mock embedding vector with a Python script; the length of the embedding vector is determined by the embedding model.
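For example, a minimal sketch that generates such a mock vector and queries the retriever, assuming a 768-dimensional embedding model (e.g., BAAI/bge-base-en-v1.5) and the retriever listening on port 7000 as in the compose files below:

```bash
# Generate a mock embedding whose dimension matches the embedding model (768 here)
your_embedding=$(python3 -c "import random; print([random.uniform(-1, 1) for _ in range(768)])")

# Query the retriever with the mock embedding
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'
```
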
@@ -363,7 +297,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
  -H 'Content-Type: application/json'
```

4. TEI Reranking Service
3. TEI Reranking Service

> Skip for ChatQnA without Rerank pipeline

@@ -374,18 +308,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
  -H 'Content-Type: application/json'
```

5. Reranking Microservice

> Skip for ChatQnA without Rerank pipeline

```bash
curl http://${host_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'
```

6. LLM backend Service
4. LLM backend Service

On first startup, this service takes extra time to download the model files; once the download finishes, the service will be ready.
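One way to watch for readiness is to follow the backend's logs; this is a sketch assuming the TGI backend and the container name `tgi-gaudi-server` from the compose files below (the exact log line may differ for other backends):

```bash
# Block until the TGI router logs that it is connected and serving
docker logs -f tgi-gaudi-server 2>&1 | grep -m 1 "Connected"
```
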
@@ -430,39 +353,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
  -d '{"model": "${LLM_MODEL_ID}", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```

7. LLM Microservice

```bash
# TGI service
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'
```

For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that we rename `max_new_tokens` to `max_tokens`).

```bash
# vLLM Service
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
  -H 'Content-Type: application/json'
```

For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).

```bash
# vLLM-on-Ray Service
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03,"streaming":false}' \
  -H 'Content-Type: application/json'
```

For parameters in vLLM-on-Ray mode, refer to the [LangChain ChatOpenAI API](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).

8. MegaService
5. MegaService

```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
@@ -470,7 +361,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
}'
```

9. Nginx Service
6. Nginx Service

```bash
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
@@ -478,7 +369,7 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```

10. Dataprep Microservice (Optional)
7. Dataprep Microservice (Optional)

If you want to update the default knowledge base, you can use the following commands:
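For instance, ingesting a local document into the vector store looks roughly like this (a sketch assuming the dataprep service on port 6007, matching `DATAPREP_SERVICE_ENDPOINT` below; `nke-10k-2023.pdf` is a placeholder file name):

```bash
# Upload a file to the dataprep microservice for chunking, embedding, and indexing
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./nke-10k-2023.pdf"
```
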
@@ -547,7 +438,7 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
  -H "Content-Type: application/json"
```

10. Guardrails (Optional)
8. Guardrails (Optional)

```bash
curl http://${host_ip}:9090/v1/guardrails \

@@ -39,26 +39,12 @@ services:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HABANA_VISIBLE_DEVICES: ${tei_embedding_devices}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
      INIT_HCCL_ON_ACQUIRE: 0
      ENABLE_EXPERIMENTAL_FLAGS: true
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
  embedding:
    image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  retriever:
    image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
    container_name: retriever-redis-server
@@ -90,23 +76,6 @@ services:
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    command: --model-id ${RERANK_MODEL_ID} --auto-truncate
  reranking:
    image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
    container_name: reranking-tei-gaudi-server
    depends_on:
      - tei-reranking-service
    ports:
      - "8000:8000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
    container_name: tgi-gaudi-server
@@ -121,7 +90,7 @@ services:
      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      HABANA_VISIBLE_DEVICES: ${llm_service_devices}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      ENABLE_HPU_GRAPH: true
      LIMIT_HPU_GRAPH: true
@@ -131,36 +100,16 @@ services:
    cap_add:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
  llm:
    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
    container_name: llm-tgi-gaudi-server
    depends_on:
      - tgi-service
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
    command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096
  chaqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tei-reranking-service
      - reranking
      - tgi-service
      - llm
    ports:
      - "8888:8888"
    environment:
@@ -168,10 +117,14 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
      - EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
      - EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
      - RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
      - LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
      - LOGFLAG=${LOGFLAG}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:
@@ -191,25 +144,6 @@ services:
      - DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
    ipc: host
    restart: always
  chaqna-gaudi-nginx-server:
    image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
    container_name: chaqna-gaudi-nginx-server
    depends_on:
      - chaqna-gaudi-backend-server
      - chaqna-gaudi-ui-server
    ports:
      - "${NGINX_PORT:-80}:80"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP}
      - FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT}
      - BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME}
      - BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP}
      - BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT}
    ipc: host
    restart: always

networks:
  default:
@@ -82,20 +82,6 @@ services:
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
  embedding:
    image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  retriever:
    image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
    container_name: retriever-redis-server
@@ -127,23 +113,6 @@ services:
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    command: --model-id ${RERANK_MODEL_ID} --auto-truncate
  reranking:
    image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
    container_name: reranking-tei-gaudi-server
    depends_on:
      - tei-reranking-service
    ports:
      - "8000:8000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
    container_name: tgi-gaudi-server
@@ -169,23 +138,6 @@ services:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
  llm:
    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
    container_name: llm-tgi-gaudi-server
    depends_on:
      - tgi-service
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
  chaqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/chatqna-guardrails:${TAG:-latest}
    container_name: chatqna-gaudi-guardrails-server
@@ -194,12 +146,9 @@ services:
      - tgi-guardrails-service
      - guardrails
      - tei-embedding-service
      - embedding
      - retriever
      - tei-reranking-service
      - reranking
      - tgi-service
      - llm
    ports:
      - "8888:8888"
    environment:
@@ -208,10 +157,15 @@ services:
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - GUARDRAIL_SERVICE_HOST_IP=${GUARDRAIL_SERVICE_HOST_IP}
      - EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
      - GUARDRAIL_SERVICE_PORT=${GUARDRAIL_SERVICE_PORT:-9090}
      - EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
      - EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
      - RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
      - LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
      - LOGFLAG=${LOGFLAG}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:

@@ -1,201 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
  redis-vector-db:
    image: redis/redis-stack:7.2.0-v9
    container_name: redis-vector-db
    ports:
      - "6379:6379"
      - "8001:8001"
  dataprep-redis-service:
    image: ${REGISTRY:-opea}/dataprep-redis:${TAG:-latest}
    container_name: dataprep-redis-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
    ports:
      - "6007:6007"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
      TEI_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
  tei-embedding-service:
    image: ghcr.io/huggingface/tei-gaudi:latest
    container_name: tei-embedding-gaudi-server
    ports:
      - "8090:80"
    volumes:
      - "./data:/data"
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
      INIT_HCCL_ON_ACQUIRE: 0
      ENABLE_EXPERIMENTAL_FLAGS: true
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
  # embedding:
  #   image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
  #   container_name: embedding-tei-server
  #   depends_on:
  #     - tei-embedding-service
  #   ports:
  #     - "6000:6000"
  #   ipc: host
  #   environment:
  #     no_proxy: ${no_proxy}
  #     http_proxy: ${http_proxy}
  #     https_proxy: ${https_proxy}
  #     TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
  #   restart: unless-stopped
  retriever:
    image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
    container_name: retriever-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "7000:7000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
    restart: unless-stopped
  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
    container_name: tei-reranking-gaudi-server
    ports:
      - "8808:80"
    volumes:
      - "./data:/data"
    shm_size: 1g
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    command: --model-id ${RERANK_MODEL_ID} --auto-truncate
  # reranking:
  #   image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
  #   container_name: reranking-tei-gaudi-server
  #   depends_on:
  #     - tei-reranking-service
  #   ports:
  #     - "8000:8000"
  #   ipc: host
  #   environment:
  #     no_proxy: ${no_proxy}
  #     http_proxy: ${http_proxy}
  #     https_proxy: ${https_proxy}
  #     TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
  #     HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
  #     HF_HUB_DISABLE_PROGRESS_BARS: 1
  #     HF_HUB_ENABLE_HF_TRANSFER: 0
  #   restart: unless-stopped
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
    container_name: tgi-gaudi-server
    ports:
      - "8005:80"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      ENABLE_HPU_GRAPH: true
      LIMIT_HPU_GRAPH: true
      USE_FLASH_ATTENTION: true
      FLASH_ATTENTION_RECOMPUTE: true
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096
  # llm:
  #   image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
  #   container_name: llm-tgi-gaudi-server
  #   depends_on:
  #     - tgi-service
  #   ports:
  #     - "9000:9000"
  #   ipc: host
  #   environment:
  #     no_proxy: ${no_proxy}
  #     http_proxy: ${http_proxy}
  #     https_proxy: ${https_proxy}
  #     TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
  #     HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
  #     HF_HUB_DISABLE_PROGRESS_BARS: 1
  #     HF_HUB_ENABLE_HF_TRANSFER: 0
  #   restart: unless-stopped
  chaqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/chatqna-no-wrapper:${TAG:-latest}
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      # - embedding
      - retriever
      - tei-reranking-service
      # - reranking
      - tgi-service
      # - llm
    ports:
      - "8888:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
      - EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
      - RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
      - LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
      - LOGFLAG=${LOGFLAG}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:
    image: ${REGISTRY:-opea}/chatqna-ui:${TAG:-latest}
    container_name: chatqna-gaudi-ui-server
    depends_on:
      - chaqna-gaudi-backend-server
    ports:
      - "5173:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - CHAT_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
      - UPLOAD_FILE_BASE_URL=${DATAPREP_SERVICE_ENDPOINT}
      - GET_FILE=${DATAPREP_GET_FILE_ENDPOINT}
      - DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
    ipc: host
    restart: always

networks:
  default:
    driver: bridge
@@ -43,20 +43,6 @@ services:
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
    command: --model-id ${EMBEDDING_MODEL_ID}
  embedding:
    image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  retriever:
    image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
    container_name: retriever-redis-server
@@ -88,23 +74,6 @@ services:
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    command: --model-id ${RERANK_MODEL_ID} --auto-truncate
  reranking:
    image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
    container_name: reranking-tei-gaudi-server
    depends_on:
      - tei-reranking-service
    ports:
      - "8000:8000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
  vllm-service:
    image: ${REGISTRY:-opea}/llm-vllm-hpu:${TAG:-latest}
    container_name: vllm-gaudi-server
@@ -125,34 +94,15 @@ services:
      - SYS_NICE
    ipc: host
    command: /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model $LLM_MODEL_ID --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048"
  llm:
    image: ${REGISTRY:-opea}/llm-vllm:${TAG:-latest}
    container_name: llm-vllm-gaudi-server
    depends_on:
      - vllm-service
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      vLLM_ENDPOINT: ${vLLM_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      LLM_MODEL: ${LLM_MODEL_ID}
    restart: unless-stopped
  chaqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tei-reranking-service
      - reranking
      - vllm-service
      - llm
    ports:
      - "8888:8888"
    environment:
@@ -160,11 +110,14 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
      - EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
      - EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
      - RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
      - RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
      - LLM_SERVER_PORT=${LLM_SERVER_PORT:-8007}
      - LOGFLAG=${LOGFLAG}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:

@@ -43,20 +43,6 @@ services:
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
    command: --model-id ${EMBEDDING_MODEL_ID}
  embedding:
    image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  retriever:
    image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
    container_name: retriever-redis-server
@@ -88,23 +74,6 @@ services:
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    command: --model-id ${RERANK_MODEL_ID} --auto-truncate
  reranking:
    image: ${REGISTRY:-opea}/reranking-tei:${TAG:-latest}
    container_name: reranking-tei-gaudi-server
    depends_on:
      - tei-reranking-service
    ports:
      - "8000:8000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
  vllm-ray-service:
    image: ${REGISTRY:-opea}/llm-vllm-ray-hpu:${TAG:-latest}
    container_name: vllm-ray-gaudi-server
@@ -125,34 +94,15 @@ services:
      - SYS_NICE
    ipc: host
    command: /bin/bash -c "ray start --head && python vllm_ray_openai.py --port_number 8000 --model_id_or_path $LLM_MODEL_ID --tensor_parallel_size 2 --enforce_eager True"
  llm:
    image: ${REGISTRY:-opea}/llm-vllm-ray:${TAG:-latest}
    container_name: llm-vllm-ray-gaudi-server
    depends_on:
      - vllm-ray-service
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      vLLM_RAY_ENDPOINT: ${vLLM_RAY_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      LLM_MODEL: ${LLM_MODEL_ID}
    restart: unless-stopped
  chaqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tei-reranking-service
      - reranking
      - vllm-ray-service
      - llm
    ports:
      - "8888:8888"
    environment:
@@ -160,11 +110,14 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
      - EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
      - EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
      - RERANK_SERVER_HOST_IP=${RERANK_SERVER_HOST_IP}
      - RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-8808}
      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
      - LLM_SERVER_PORT=${LLM_SERVER_PORT:-8006}
      - LOGFLAG=${LOGFLAG}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:

@@ -45,20 +45,6 @@ services:
      INIT_HCCL_ON_ACQUIRE: 0
      ENABLE_EXPERIMENTAL_FLAGS: true
    command: --model-id ${EMBEDDING_MODEL_ID} --auto-truncate
  embedding:
    image: ${REGISTRY:-opea}/embedding-tei:${TAG:-latest}
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
    restart: unless-stopped
  retriever:
    image: ${REGISTRY:-opea}/retriever-redis:${TAG:-latest}
    container_name: retriever-redis-server
@@ -99,33 +85,14 @@ services:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
  llm:
    image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
    container_name: llm-tgi-gaudi-server
    depends_on:
      - tgi-service
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    restart: unless-stopped
  chaqna-gaudi-backend-server:
    image: ${REGISTRY:-opea}/chatqna-without-rerank:${TAG:-latest}
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tgi-service
      - llm
    ports:
      - "8888:8888"
    environment:
@@ -133,9 +100,12 @@ services:
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
      - EMBEDDING_SERVER_HOST_IP=${EMBEDDING_SERVER_HOST_IP}
      - EMBEDDING_SERVER_PORT=${EMBEDDING_SERVER_PORT:-8090}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
      - LLM_SERVER_PORT=${LLM_SERVER_PORT:-8005}
      - LOGFLAG=${LOGFLAG}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:

@@ -26,14 +26,6 @@ The warning messages point out that the variables are **NOT** set.

```
ubuntu@gaudi-vm:~/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_API_KEY" variable is not set. Defaulting to a blank string.
WARN[0000] The "LANGCHAIN_TRACING_V2" variable is not set. Defaulting to a blank string.
WARN[0000] /home/ubuntu/GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml: `version` is obsolete
```
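To silence the LangChain warnings, you can export the two variables before starting the stack (empty values are fine if you do not use LangChain tracing); this is an optional sketch, not required for the services to run:

```bash
# Optional: define the LangChain tracing variables before `docker compose up`
export LANGCHAIN_TRACING_V2=false
export LANGCHAIN_API_KEY=""
```
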
@@ -172,24 +164,7 @@ This test the embedding service. It sends "What is Deep Learning?" to the embedd

**Note**: The vector dimension is decided by the embedding model, and the output values depend on the model and the input data.

### 2 Embedding Microservice

```
curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'
```

This tests the embedding microservice. It sends `What is Deep Learning?` to the embedding microservice, which takes the input and calls the embedding service to embed it. The embedding server is stateless, while the microservice keeps state; note the `id` field in the output of the embedding microservice.

```
{"id":"e8c85e588a235a4bc4747a23b3a71d8f","text":"What is Deep Learning?","embedding":[0.00030903306,-0.06356524,0.0025720573,-0.012404448,0.050649878, ..., 0.02776986,-0.0246678,0.03999176,0.037477136,-0.006806653,0.02261455,-0.04570737,-0.033122733,0.022785513,0.0160026,-0.021343587,-0.029969815,-0.0049176104]}
```

### 3 Retriever Microservice
### 2 Retriever Microservice

To consume the retriever microservice, you need to generate a mock embedding vector with a Python script. The length of the embedding vector is determined by the embedding model.
@@ -212,7 +187,7 @@ The output is retrieved text that is relevant to the input data:

```

### 4 TEI Reranking Service
### 3 TEI Reranking Service

Reranking service
@@ -228,24 +203,7 @@ Output is:

It scores the input.

### 5 Reranking Microservice

```
curl http://${host_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'
```

Here is the output:

```
{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}
```

You may notice that the reranking microservice is stateful (it returns an `id` and other metadata), while the reranking service is not.

### 6 TGI Service
### 4 TGI Service

```
curl http://${host_ip}:8008/generate \
@@ -277,56 +235,7 @@ and the log shows model warm up, please wait for a while and try it later.
2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model
```

### 7 LLM Microservice

```
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'
```

You will get the generated text from the LLM:

```
data: b'\n'

data: b'\n'

data: b'Deep'

data: b' learning'

data: b' is'

data: b' a'

data: b' subset'

data: b' of'

data: b' machine'

data: b' learning'

data: b' that'

data: b' uses'

data: b' algorithms'

data: b' to'

data: b' learn'

data: b' from'

data: b' data'

data: [DONE]
```

### 8 MegaService
### 5 MegaService

```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{

@@ -8,15 +8,13 @@ export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:8005"
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVER_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
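Assuming these exports live in the example's set_env.sh (an assumption; it is the script this hunk appears to modify), a typical flow is to source it before bringing up the stack, as in the log excerpt above:

```bash
# Load the environment variables, then start the Gaudi compose stack
source ./set_env.sh
docker compose -f ./compose.yaml up -d
```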