# Build Mega Service of SearchQnA on Gaudi

This document outlines the deployment process for a SearchQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Gaudi server.

## 🚀 Build Docker Images

First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.

### 1. Install GenAIComps from Source Code

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```

### 2. Build Embedding Image

```bash
docker build --no-cache -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/langchain/docker/Dockerfile .
```

### 3. Build Retriever Image

```bash
docker build --no-cache -t opea/web-retriever-chroma:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/web_retrievers/langchain/chroma/docker/Dockerfile .
```

### 4. Build Rerank Image

```bash
docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/docker/Dockerfile .
```

### 5. Build LLM Image

```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

### 6. Build TEI Gaudi Image

Since a TEI Gaudi Docker image hasn't been published, we need to build it from the [tei-gaudi](https://github.com/huggingface/tei-gaudi) repository.

```bash
git clone https://github.com/huggingface/tei-gaudi
cd tei-gaudi/
docker build --no-cache -f Dockerfile-hpu -t opea/tei-gaudi:latest .
cd ../..
```

### 7. Build MegaService Docker Image

To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `searchqna.py` Python script. Build the MegaService Docker image using the command below:

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/SearchQnA/docker
docker build --no-cache -t opea/searchqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd ../../..
```

Then run the command `docker images`; you should see the following images:

1. `opea/tei-gaudi:latest`
2. `opea/embedding-tei:latest`
3. `opea/web-retriever-chroma:latest`
4. `opea/reranking-tei:latest`
5. `opea/llm-tgi:latest`
6. `opea/searchqna:latest`
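
To quickly confirm that all six images were built, you can filter the local image list; the names below simply mirror the list above:

```bash
# Expect the six opea/* images listed above to appear in the output
docker images --format '{{.Repository}}:{{.Tag}}' | grep '^opea/'
```
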
## 🚀 Set the environment variables

Before starting the services with `docker compose`, make sure the following environment variables are set correctly.

```bash
export host_ip=<your External Public IP>
export GOOGLE_CSE_ID=<your cse id>
export GOOGLE_API_KEY=<your google api key>
export HUGGINGFACEHUB_API_TOKEN=<your HF token>

export EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
export TEI_EMBEDDING_ENDPOINT=http://$host_ip:3001
export RERANK_MODEL_ID=BAAI/bge-reranker-base
export TEI_RERANKING_ENDPOINT=http://$host_ip:3004

export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export WEB_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}

export EMBEDDING_SERVICE_PORT=3002
export WEB_RETRIEVER_SERVICE_PORT=3003
export RERANK_SERVICE_PORT=3005
export LLM_SERVICE_PORT=3007
```
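
Before moving on, it can help to verify that none of the required variables are empty. This loop is just a convenience sketch; the variable names are taken from the list above:

```bash
# Warn about any required variable that is unset or empty
for var in host_ip GOOGLE_CSE_ID GOOGLE_API_KEY HUGGINGFACEHUB_API_TOKEN \
           EMBEDDING_MODEL_ID TEI_EMBEDDING_ENDPOINT RERANK_MODEL_ID \
           TEI_RERANKING_ENDPOINT TGI_LLM_ENDPOINT LLM_MODEL_ID; do
  if [ -z "${!var}" ]; then
    echo "WARNING: $var is not set"
  fi
done
```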

## 🚀 Start the MegaService

```bash
cd GenAIExamples/SearchQnA/docker/gaudi/
docker compose up -d
```
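
After the services come up, a quick status check helps catch containers that failed to start; `docker compose ps` lists them, and `docker compose logs` shows why one is unhealthy (replace `<service-name>` with a name from the `ps` output):

```bash
# List the containers started by this compose file and their status
docker compose ps

# Follow the logs of a single service if it is not healthy
docker compose logs -f <service-name>
```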

## 🚀 Test MicroServices

```bash
# tei
curl http://${host_ip}:3001/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'

# embedding microservice
curl http://${host_ip}:3002/v1/embeddings \
    -X POST \
    -d '{"text":"hello"}' \
    -H 'Content-Type: application/json'

# web retriever microservice
your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:3003/v1/web_retrieval \
    -X POST \
    -d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
    -H 'Content-Type: application/json'

# tei reranking service
curl http://${host_ip}:3004/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'

# reranking microservice
curl http://${host_ip}:3005/v1/reranking \
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'

# tgi service
curl http://${host_ip}:3006/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
    -H 'Content-Type: application/json'

# llm microservice
curl http://${host_ip}:3007/v1/chat/completions \
    -X POST \
    -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
    -H 'Content-Type: application/json'
```
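
If you prefer to script these checks, a small loop can confirm that each microservice port is at least reachable before debugging individual payloads. This is a convenience sketch, not part of the official pipeline; an HTTP error code such as 405 on a bare GET still means the service is listening:

```bash
# Probe each microservice port and print the HTTP status code
for port in 3001 3002 3003 3004 3005 3006 3007; do
  code=$(curl -s -o /dev/null -w '%{http_code}' http://${host_ip}:${port} || true)
  echo "port ${port}: HTTP ${code}"
done
```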

## 🚀 Test MegaService

```bash
curl http://${host_ip}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
    "messages": "What is the latest news? Give me also the source link.",
    "stream": "True"
}'
```
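
Because `stream` is enabled in the request, the answer is returned incrementally. Adding curl's `-N` (`--no-buffer`) flag makes the tokens print as they arrive instead of in one buffered chunk:

```bash
curl -N http://${host_ip}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
    "messages": "What is the latest news? Give me also the source link.",
    "stream": "True"
}'
```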