[ SearchQnA ] Refine documents (#1803)

Signed-off-by: WenjiaoYue <wenjiao.yue@intel.com>
This commit is contained in:
WenjiaoYue
2025-04-21 17:16:41 +08:00
committed by GitHub
parent 697f78ea71
commit 52c4db2fc6
5 changed files with 519 additions and 843 deletions

View File

@@ -16,7 +16,14 @@ Operating within the LangChain framework, the Google Search QnA chatbot mimics h
By integrating search capabilities with LLMs within the LangChain framework, this Google Search QnA chatbot delivers comprehensive and precise answers, akin to human search behavior.
The workflow falls into the following architecture:
## Table of contents
1. [Architecture](#architecture)
2. [Deployment Options](#deployment-options)
## Architecture
The architecture of the SearchQnA Application is illustrated below:
![architecture](./assets/img/searchqna.png)
@@ -85,104 +92,14 @@ flowchart LR
```
## Deploy SearchQnA Service
This SearchQnA use case performs Search-augmented Question Answering across multiple platforms. Currently, we provide the example for Intel® Gaudi® 2 and Intel® Xeon® Scalable Processors, and we invite contributions from other hardware vendors to expand the OPEA ecosystem.
The SearchQnA service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors.
## Deployment Options
Currently we support two ways of deploying SearchQnA services with docker compose:
The table below lists the available deployment options and their implementation details for different hardware platforms.
1. Start services using the docker image on `docker hub`:
```bash
docker pull opea/searchqna:latest
```
2. Start services using the docker images `built from source`: [Guide](https://github.com/opea-project/GenAIExamples/tree/main/SearchQnA/docker_compose/)
### Setup Environment Variable
To set up environment variables for deploying SearchQnA services, follow these steps:
1. Set the required environment variables (an optional check of the Google credentials is shown after these steps):
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export GOOGLE_CSE_ID="Your_CSE_ID"
export GOOGLE_API_KEY="Your_Google_API_Key"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
```bash
source ./docker_compose/set_env.sh
```
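Optionally, the Google credentials set in step 1 can be sanity-checked with a direct call to the Google Custom Search JSON API (a public endpoint, shown here purely as an illustration; adjust the query as needed):
```bash
# Should return a JSON result set rather than an authentication error
curl "https://www.googleapis.com/customsearch/v1?key=${GOOGLE_API_KEY}&cx=${GOOGLE_CSE_ID}&q=OPEA"
```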
### Deploy SearchQnA on Gaudi
If your `Habana Driver` version is < 1.16.0 (check with `hl-smi`), run the following command directly to start the SearchQnA services. Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).
```bash
cd GenAIExamples/SearchQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
```
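To check the installed Habana driver version mentioned above, `hl-smi` can be queried, for example (the output format may vary between driver releases):
```bash
# Print the line containing the driver version from the hl-smi header
hl-smi | grep -i "driver version"
```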
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
### Deploy SearchQnA on Xeon
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).
```bash
cd GenAIExamples/SearchQnA/docker_compose/intel/cpu/xeon/
docker compose up -d
```
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
## Consume SearchQnA Service
There are two ways of consuming the SearchQnA service:
1. Use cURL command on terminal
```bash
curl http://${host_ip}:3008/v1/searchqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "True"
}'
```
2. Access via frontend
To access the frontend, open the following URL in your browser: http://${host_ip}:5173.
By default, the UI runs on port 5173 internally.
## Troubleshooting
1. If you get errors like "Access Denied", [validate the microservices](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
```bash
http_proxy=""
curl http://${host_ip}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. (Docker only) If all microservices work well, check port ${host_ip}:3008; it may already be in use by another service (see the example after this list), in which case you can modify the port mapping in `compose.yaml`.
3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.
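A quick way to check whether port 3008 (or any other published port) is already taken on the host, assuming `ss` is available:
```bash
# Lists any process currently listening on port 3008
sudo ss -tlnp | grep 3008
```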
| Category               | Deployment Option      | Description                                                        |
| ---------------------- | ---------------------- | ------------------------------------------------------------------ |
| On-premise Deployments | Docker Compose (Xeon)  | [SearchQnA deployment on Xeon](./docker_compose/intel/cpu/xeon)    |
|                        | Docker Compose (Gaudi) | [SearchQnA deployment on Gaudi](./docker_compose/intel/hpu/gaudi)  |
|                        | Docker Compose (ROCm)  | [SearchQnA deployment on AMD ROCm](./docker_compose/amd/gpu/rocm)  |

View File

@@ -0,0 +1,48 @@
# SearchQnA Docker Image Build
## Table of Contents
1. [Build MegaService Docker Image](#build-megaservice-docker-image)
2. [Build UI Docker Image](#build-ui-docker-image)
3. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
4. [Troubleshooting](#troubleshooting)
## Build MegaService Docker Image
To build the SearchQnA Megaservice, use the [GenAIExamples](https://github.com/opea-project/GenAIExamples.git) repository.
Use the following command to build the Megaservice Docker image:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/SearchQnA
docker build --no-cache -t opea/searchqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
## Build UI Docker Image
Build the frontend Docker image using the command below:
```bash
cd GenAIExamples/SearchQnA/ui
docker build -t opea/searchqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
## Generate a HuggingFace Access Token
Some HuggingFace resources require an access token. Developers can create one by first signing up on [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
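Once generated, the token is typically exported so that the build and deployment steps can pick it up (the value below is a placeholder):
```bash
# Placeholder value; replace with your own token
export HUGGINGFACEHUB_API_TOKEN="hf_xxxxxxxxxxxxxxxxx"
```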
## Troubleshooting
1. If errors such as "Access Denied" occur, validate the [microservice](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) that is querying the embed API. A simple example:
```bash
http_proxy=""
curl http://${host_ip}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. (Docker only) If all microservices work well, check port ${host_ip}:3008; it might already be in use by another service, in which case you can modify the port mapping in `compose.yaml`.
3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.

View File

@@ -1,532 +1,116 @@
# Build and deploy SearchQnA Application on AMD GPU (ROCm)
# Example SearchQnA deployments on AMD GPU (ROCm)
## Build Docker Images
This document outlines the deployment process for a SearchQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD GPU (ROCm).
### 1. Build Docker Image
This example includes the following sections:
- #### Create the application installation directory and change into it:
- [SearchQnA Quick Start Deployment](#searchqna-quick-start-deployment): Demonstrates how to quickly deploy a SearchQnA application/pipeline on AMD GPU platform.
- [SearchQnA Docker Compose Files](#searchqna-docker-compose-files): Describes some example deployments and their docker compose files.
- [Launch the UI](#launch-the-ui): Guideline for UI usage
```bash
mkdir ~/searchqna-install && cd ~/searchqna-install
```
## SearchQnA Quick Start Deployment
- #### Clone the GenAIExamples repository (the default branch "main" is used here):
This section describes how to quickly deploy and test the SearchQnA service manually on AMD GPU (ROCm). The basic steps are:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
```
1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
5. [Check the Deployment Status](#check-the-deployment-status)
6. [Test the Pipeline](#test-the-pipeline)
7. [Cleanup the Deployment](#cleanup-the-deployment)
If you need a specific branch/tag of the GenAIExamples repository, use the following (replace v1.3 with the desired value):
### Access the Code
```bash
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```
Keep in mind that when using a specific version of the code, you should follow the README from that version:
- #### Go to the build directory:
```bash
cd ~/searchqna-install/GenAIExamples/SearchQnA/docker_image_build
```
- Clean up the GenAIComps repository if it was previously cloned into this directory.
This is necessary if a build was performed earlier and the GenAIComps folder exists and is not empty:
```bash
echo Y | rm -R GenAIComps
```
- #### Clone the GenAIComps repository (the default branch "main" is used here):
```bash
git clone https://github.com/opea-project/GenAIComps.git
```
If you use a specific tag of the GenAIExamples repository,
then you should also use the corresponding tag for GenAIComps (replace v1.3 with the desired value):
```bash
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3
```
Keep in mind that when using a specific version of the code, you should follow the README from that version.
- #### Set the list of images for the build (from the build.yaml file)
Depending on whether you want to deploy a vLLM-based or a TGI-based application, set the service list as follows:
#### vLLM-based application
```bash
service_list="vllm-rocm llm-textgen reranking web-retriever embedding searchqna-ui searchqna"
```
#### TGI-based application
```bash
service_list="llm-textgen reranking web-retriever embedding searchqna-ui searchqna"
```
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
```bash
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
```
- #### Build Docker Images
```bash
docker compose -f build.yaml build ${service_list} --no-cache
```
After the build, check the list of images with the following command:
```bash
docker image ls
```
The list of images should include:
##### vLLM-based application:
- opea/vllm-rocm:latest
- opea/llm-textgen:latest
- opea/reranking:latest
- opea/searchqna:latest
- opea/searchqna-ui:latest
- opea/web-retriever:latest
##### TGI-based application:
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- opea/llm-textgen:latest
- opea/reranking:latest
- opea/searchqna:latest
- opea/searchqna-ui:latest
- opea/web-retriever:latest
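To narrow the output of `docker image ls` to the images listed above, a simple filter such as the following can be used (assuming the default tags shown):
```bash
# Show only the OPEA images and the TGI ROCm image
docker image ls | grep -E "opea/|text-generation-inference"
```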
---
## Deploy the SearchQnA Application
### Docker Compose Configuration for AMD GPUs
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:
- compose_vllm.yaml - for the vLLM-based application
- compose.yaml - for the TGI-based application
```yaml
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri:/dev/dri
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```
This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example:
```yaml
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```
**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.
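For example, the device nodes can be listed directly, and `rocm-smi` (part of the ROCm tools) can help map them to physical GPUs:
```bash
# Lists card0, card1, renderD128, renderD129, ...
ls -l /dev/dri/
# Shows the detected AMD GPUs and their IDs
rocm-smi
```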
### Set deploy environment variables
#### Setting variables in the operating system environment:
##### Set variable HUGGINGFACEHUB_API_TOKEN:
Clone the GenAIExamples repository and access the SearchQnA AMD GPU (ROCm) Docker Compose files and supporting scripts:
```bash
### Replace the string 'your_huggingfacehub_token' with your Hugging Face Hub access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
### Replace the string 'your_google_api_token' with your Google API access token
export GOOGLE_API_KEY='your_google_api_token'
### Replace the string 'your_google_cse_id' with your Google CSE ID
export GOOGLE_CSE_ID='your_google_cse_id'
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/SearchQnA/docker_compose/amd/gpu/rocm
```
#### Set variable values in the `set_env*.sh` file:
Go to the Docker Compose directory:
Check out a released version, such as v1.2:
```bash
cd ~/searchqna-install/GenAIExamples/SearchQnA/docker_compose/amd/gpu/rocm
git checkout v1.2
```
The example uses the Nano text editor. You can use any convenient text editor:
### Generate a HuggingFace Access Token
Some HuggingFace resources require an access token. Developers can create one by first signing up on [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
### Configure the Deployment Environment
To set up environment variables for deploying SearchQnA services, edit the appropriate `set_env*.sh` script in this directory and then source it:
#### If you use vLLM
```bash
nano set_env_vllm.sh
source ./set_env_vllm.sh
```
#### If you use TGI
```bash
nano set_env.sh
source ./set_env.sh
```
If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
Set the values of the variables:
- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world.
If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address.
If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address.
If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located.
Set these values in the `set_env*.sh` file.
- **Variables with names ending in `_PORT`** - These variables set the IP port numbers used for network connections to the application services.
  The values shown in `set_env.sh` or `set_env_vllm.sh` are the values used for development and testing of the application and are configured for the environment in which that development was performed. Configure these values in accordance with the network access rules of your environment's server, and make sure they do not overlap with IP ports already used by other applications. An illustrative example follows below.
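For illustration only, a server reachable solely on an internal network might use values like the following (addresses and port are placeholders):
```bash
export HOST_IP="192.168.1.10"            # internal name/address of the server
export HOST_IP_EXTERNAL="192.168.1.10"   # same value when no proxy/load balancer sits in front
export SEARCH_BACKEND_SERVICE_PORT=3008  # example port; must not clash with other applications
```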
#### Set the variables with the `set_env*.sh` script
#### If you use vLLM
```bash
. set_env_vllm.sh
```
#### If you use TGI
```bash
. set_env.sh
```
### Start the services:
#### If you use vLLM
```bash
docker compose -f compose_vllm.yaml up -d
```
#### If you use TGI
The _set_env.sh_ script will prompt for required and optional environment variables used to configure the SearchQnA services based on TGI. The _set_env_vllm.sh_ script will prompt for the same variables for the vLLM-based configuration. If a value is not entered, the script will use a default value. It will also generate a _.env_ file defining the desired configuration. Consult the section on [SearchQnA Service configuration](#SearchQnA-service-configuration) for information on how service-specific configuration parameters affect deployments.
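After sourcing the script, the resulting configuration can be reviewed before deployment (assuming the `.env` file was written to the current directory, as described above):
```bash
# Print the non-comment lines of the generated configuration
grep -v "^#" .env
```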
### Deploy the Services Using Docker Compose
To deploy the SearchQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
```bash
# with TGI:
docker compose -f compose.yaml up -d
```
All containers should be running and should not restart:
```bash
# with vLLM:
docker compose -f compose_vllm.yaml up -d
```
##### If you use vLLM:
**Note**: developers should build Docker images from source when:
- search-vllm-service
- search-llm-server
- search-web-retriever-server
- search-tei-embedding-server
- search-tei-reranking-server
- search-reranking-server
- search-embedding-server
- search-backend-server
- search-ui-server
- Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
- Unable to download the docker image.
- Using a specific version of a Docker image.
##### If you use TGI:
Please refer to the table below to build different microservices from source:
- search-tgi-service
- search-llm-server
- search-web-retriever-server
- search-tei-embedding-server
- search-tei-reranking-server
- search-reranking-server
- search-embedding-server
- search-backend-server
- search-ui-server
| Microservice | Deployment Guide |
| ------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| Reranking     | [Reranking build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/rerankings/src)                                   |
| vLLM | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker) |
| LLM-TextGen | [LLM-TextGen build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/src/text-generation#1-build-docker-image) |
| Web-Retriever | [Web-Retriever build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/web_retrievers/src) |
| Embedding | [Embedding build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings/src) |
| MegaService | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
---
### Check the Deployment Status
## Validate the Services
After running Docker Compose, the running containers can be checked using the following command:
### 1. Validate the vLLM/TGI Service
```bash
docker ps -a
```
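To focus on just the SearchQnA containers listed earlier (their names start with `search-`), a filtered view can be used:
```bash
# Show name, status, and ports for the SearchQnA containers only
docker ps --filter "name=search-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```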
#### If you use vLLM:
For the default deployment, the following containers should have started
### Test the Pipeline
Once the SearchQnA services are running, test the pipeline using the following command:
```bash
DATA='{"model": "Intel/neural-chat-7b-v3-3", '\
'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 32}'
DATA='{"messages": "What is the latest news from the AI world? '\
'Give me a summary.","stream": "True"}'
curl http://${HOST_IP}:${SEARCH_VLLM_SERVICE_PORT}/v1/chat/completions \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
curl http://${host_ip}:3008/v1/searchqna \
-H "Content-Type: application/json" \
-d "$DATA"
```
Check the response from the service; it should be similar to the following JSON:
```json
{
"id": "chatcmpl-a3761920c4034131b3cab073b8e8b841",
"object": "chat.completion",
"created": 1742959065,
"model": "Intel/neural-chat-7b-v3-3",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " Deep Learning refers to a modern approach of Artificial Intelligence that aims to replicate the way human brains process information by teaching computers to learn from data without extensive programming",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "length",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 15, "total_tokens": 47, "completion_tokens": 32, "prompt_tokens_details": null },
"prompt_logprobs": null
}
```
If the service response contains a meaningful answer in the value of the "choices.message.content" key,
then the vLLM service is considered to be successfully launched.
#### If you use TGI:
```bash
DATA='{"inputs":"What is Deep Learning?",'\
'"parameters":{"max_new_tokens":256,"do_sample": true}}'
curl http://${HOST_IP}:${SEARCH_TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following JSON:
```json
{
"generated_text": "\n\nDeep Learning is a subset of machine learning, which focuses on developing methods inspired by the functioning of the human brain; more specifically, the way it processes and acquires various types of knowledge and information. To enable deep learning, the networks are composed of multiple processing layers that form a hierarchy, with each layer learning more complex and abstraction levels of data representation.\n\nThe principle of Deep Learning is to emulate the structure of neurons in the human brain to construct artificial neural networks capable to accomplish complicated pattern recognition tasks more effectively and accurately. Therefore, these neural networks contain a series of hierarchical components, where units in earlier layers receive simple inputs and are activated by these inputs. The activation of the units in later layers are the results of multiple nonlinear transformations generated from reconstructing and integrating the information in previous layers. In other words, by combining various pieces of information at each layer, a Deep Learning network can extract the input features that best represent the structure of data, providing their outputs at the last layer or final level of abstraction.\n\nThe main idea of using these 'deep' networks in contrast to regular algorithms is that they are capable of representing hierarchical relationships that exist within the data and learn these representations by"
}
```
If the service response contains a meaningful answer in the value of the "generated_text" key,
then the TGI service is considered to be successfully launched.
### 2. Validate the LLM Service
```bash
DATA='{"query":"What is Deep Learning?",'\
'"max_tokens":32,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,'\
'"repetition_penalty":1.03,"stream":false}'
curl http://${HOST_IP}:${SEARCH_LLM_SERVICE_PORT}/v1/chat/completions \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following JSON:
```json
{
"id": "cmpl-0b974d00a7604c2ab8b721ebf6b88ae3",
"choices": [
{
"finish_reason": "length",
"index": 0,
"logprobs": null,
"text": "\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial",
"stop_reason": null,
"prompt_logprobs": null
}
],
"created": 1742959134,
"model": "Intel/neural-chat-7b-v3-3",
"object": "text_completion",
"system_fingerprint": null,
"usage": {
"completion_tokens": 32,
"prompt_tokens": 6,
"total_tokens": 38,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
```
### 3. Validate TEI Embedding service
```bash
curl http://${HOST_IP}:${SEARCH_TEI_EMBEDDING_PORT}/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following text:
```text
[[0.00037115702,-0.06356819,..................,-0.02125421,-0.02984927,-0.0049473033]]
```
If the response text is similar to the one above, then we consider the service verification successful.
### 4. Validate Embedding service
```bash
curl http://${HOST_IP}:${SEARCH_EMBEDDING_SERVICE_PORT}/v1/embeddings \
-X POST \
-d '{"input":"Hello!"}' \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following JSON:
```json
{
"object": "list",
"model": "BAAI/bge-base-en-v1.5",
"data": [
{ "index": 0, "object": "embedding", "embedding": [0.010614655, 0.019818036, "******", 0.06571652, -0.019738553] }
],
"usage": { "prompt_tokens": 4, "total_tokens": 4, "completion_tokens": 0 }
}
```
If the response JSON is similar to the one above, then we consider the service verification successful.
### 5. Validate Web Retriever service
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${HOST_IP}:${SEARCH_WEB_RETRIEVER_SERVICE_PORT}/v1/web_retrieval \
-X POST \
-d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following JSON:
```json
{
"id": "ec32c767e0ae107c4943b634648c9752",
"retrieved_docs": [
{
"downstream_black_list": [],
"id": "ab002cd89cd20d9229adae1e091c7e2d",
"text": "2025\n\n * ### New Years Day 2024/2025 \n\nWednesday, January 1, 2025 Early Close (2:00 p.m. Eastern Time): Tuesday,\nDecember 31, 2024\n\n * ### Martin Luther King Day \n\nMonday, January 20, 2025\n\n * ### Presidents Day \n\nMonday, February 17, 2025\n\n * ### Good Friday \n\nFriday, April 18, 2025 Early Close (2:00 p.m. Eastern Time): Thursday, April\n17, 2025\n\n * ### Memorial Day \n\nMonday, May 26, 2025 Early Close (2:00 p.m. Eastern Time): Friday, May 23,\n2025\n\n * ### Juneteenth \n\nThursday, June 19, 2025\n\n * ### U.S. Independence Day \n\nFriday, July 4, 2025 Early Close (2:00 p.m. Eastern Time): Thursday, July 3,\n2025\n\n * ### Labor Day \n\nMonday, September 1, 2025\n\n * ### Columbus Day \n\nMonday, October 13, 2025\n\n * ### Veterans Day \n\nTuesday, November 11, 2025\n\n * ### Thanksgiving Day \n\nThursday, November 27, 2025 Early Close (2:00 p.m. Eastern Time): Friday,\nNovember 28, 2025\n\n * ### Christmas Day \n\nThursday, December 25, 2025 Early Close (2:00 p.m. Eastern Time): Wednesday,\nDecember 24, 2025\n\n * ### New Years Day 2025/2026 \n\nThursday, January 1, 2026 Early Close (2:00 p.m. Eastern Time): Wednesday,\nDecember 31, 2025\n\n2026\n\n * ### New Years Day 2025/2026 \n\nThursday, January 1, 2026 Early Close (2:00 p.m. Eastern Time): Wednesday,\nDecember 31, 2025\n\n * ### Martin Luther King Day \n\nMonday, January 19, 2026\n\n * ### Presidents Day \n\nMonday, February 16, 2026\n\n * ### Good Friday \n description: \n \n title: \n Holiday Schedule - SIFMA - Holiday Schedule - SIFMA\n \n \n source: https://www.sifma.org/resources/general/holiday-schedule/ \n"
},
{
"downstream_black_list": [],
"id": "f498f4a1357bfbc631a5d67663c64680",
"text": "Monday, May 26, 2025\n\n * ### Juneteenth \n\nThursday, June 19, 2025\n\n * ### U.S. Independence Day \n\nFriday, July 4, 2025\n\n * ### Summer Bank Holiday \n\nMonday, August 25, 2025\n\n * ### Labor Day \n\nMonday, September 1, 2025\n\n * ### Columbus Day \n\nMonday, October 13, 2025\n\n * ### Veterans Day \n\nTuesday, November 11, 2025\n\n * ### Thanksgiving Day \n\nThursday, November 27, 2025\n\n * ### Christmas Day \n\nThursday, December 25, 2025\n\n * ### Boxing Day \n\nFriday, December 26, 2025\n\n * ### New Years Day 2025/2026 \n\nThursday, January 1, 2026\n\n2026\n\n * ### New Years Day 2025/2026 \n\nThursday, January 1, 2026\n\n * ### Martin Luther King Day \n\nMonday, January 19, 2026\n\n * ### Presidents Day \n\nMonday, February 16, 2026\n\n * ### Good Friday \n\nFriday, April 3, 2026\n\n * ### Easter Monday \n\nMonday, April 6, 2026\n\n * ### May Day \n\nMonday, May 4, 2026\n\n * ### Memorial Day \n\nMonday, May 25, 2026\n\n * ### Spring Bank Holiday \n\nMonday, May 25, 2026\n\n * ### Juneteenth \n\nFriday, June 19, 2026\n\n * ### U.S. Independence Day \n\nFriday, July 3, 2026\n\n * ### Summer Bank Holiday \n\nMonday, August 31, 2026\n\n * ### Labor Day \n\nMonday, September 7, 2026\n\n * ### Columbus Day \n\nMonday, October 12, 2026\n\n * ### Veterans Day \n\nWednesday, November 11, 2026\n\n * ### Thanksgiving Day \n\nThursday, November 26, 2026\n\n * ### Christmas Day \n\nFriday, December 25, 2026\n\n * ### Boxing Day (Substitute) \n description: \n \n title: \n Holiday Schedule - SIFMA - Holiday Schedule - SIFMA\n \n \n source: https://www.sifma.org/resources/general/holiday-schedule/ \n"
},
{
"downstream_black_list": [],
"id": "3a845fba37a225ee3a67601cfa51f6d6",
"text": "**Holiday** | **2024** | **Non-Management, Supervisory Units** | **Department of Corrections Employees** | **State Police Unit** | **Exempt, Managerial, and Confidential** \n---|---|---|---|---|--- \n**New Years Day** | **Monday, January 1, 2024** | Observed | Observed | Observed | Observed \n**Martin Luther King Jr. Day** | **Monday, January 15, 2024** | Observed | Observed | Observed | Observed \n**Presidents' Day** | **Monday, February 19, 2024** | Observed | Observed | Observed | Observed \n**Town Meeting Day** | **Tuesday, \nMarch 5, 2024** | Observed | Observed | Observed | Observed \n**Memorial Day** | **Monday, \nMay 27, 2024** | Observed | Observed | Observed | Observed \n**Independence Day** | **Thursday, \nJuly 4, 2024** | Observed | Observed | Observed | Observed \n**Bennington Battle Day** | **Friday, \nAugust 16, 2024** | Observed | **Not Observed** | **Not Observed** | Observed \n**Labor Day** | **Monday, September 2, 2024** | Observed | Observed | Observed | Observed \n**Indigenous Peoples' Day** | **Monday, October 14, 2024** | **Not Observed** | Observed | Observed | **Not Observed** \n**Veterans' Day** | **Monday, November 11, 2024** | Observed | Observed | Observed | Observed \n**Thanksgiving Day** | **Thursday, November 28, 2024** | Observed | Observed | Observed | Observed \n**Christmas Day** | **Wednesday, December 25, 2024** | Observed | Observed | Observed | Observed \n title: State Holiday Schedule | Department of Human Resources \n \n source: https://humanresources.vermont.gov/benefits-wellness/holiday-schedule \n"
},
{
"downstream_black_list": [],
"id": "34926c9655c38d2af761833d57c8ab8a",
"text": "* ### Good Friday \n\nNone Early Close (12:00 p.m. Eastern Time): Friday, April 3, 2026 - Tentative\n- pending confirmation of scheduled release of BLS employment report\n\n * ### Memorial Day \n\nMonday, May 25, 2026 Early Close (2:00 p.m. Eastern Time): Friday, May 22,\n2026\n\n * ### Juneteenth \n\nFriday, June 19, 2026\n\n * ### U.S. Independence Day (observed) \n\nFriday, July 3, 2026 Early Close (2:00 p.m. Eastern Time): Thursday, July 2,\n2026\n\n * ### Labor Day \n\nMonday, September 7, 2026\n\n * ### Columbus Day \n\nMonday, October 12, 2026\n\n * ### Veterans Day \n\nWednesday, November 11, 2026\n\n * ### Thanksgiving Day \n\nThursday, November 26, 2026 Early Close (2:00 p.m. Eastern Time): Friday,\nNovember 27, 2026\n\n * ### Christmas Day \n\nFriday, December 25, 2026 Early Close (2:00 p.m. Eastern Time): Thursday,\nDecember 24, 2026\n\n * ### New Years Day 2026/2027 \n\nFriday, January 1, 2027 Early Close (2:00 p.m. Eastern Time): Thursday,\nDecember 31, 2026\n\nArchive\n\n### U.K. Holiday Recommendations\n\n2025\n\n * ### New Years Day 2024/2025 \n\nWednesday, January 1, 2025\n\n * ### Martin Luther King Day \n\nMonday, January 20, 2025\n\n * ### Presidents Day \n\nMonday, February 17, 2025\n\n * ### Good Friday \n\nFriday, April 18, 2025\n\n * ### Easter Monday \n\nMonday, April 21, 2025\n\n * ### May Day \n\nMonday, May 5, 2025\n\n * ### Memorial Day \n\nMonday, May 26, 2025\n\n * ### Spring Bank Holiday \n\nMonday, May 26, 2025\n\n * ### Juneteenth \n description: \n \n title: \n Holiday Schedule - SIFMA - Holiday Schedule - SIFMA\n \n \n source: https://www.sifma.org/resources/general/holiday-schedule/ \n"
}
],
"initial_query": "What is the 2024 holiday schedule?",
"top_n": 1
}
```
If the response JSON is similar to the one above, then we consider the service verification successful.
### 6. Validate the TEI Reranking Service
```bash
DATA='{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}'
curl http://${HOST_IP}:${SEARCH_TEI_RERANKING_PORT}/rerank \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following JSON:
```json
[
{ "index": 1, "score": 0.94238955 },
{ "index": 0, "score": 0.120219156 }
]
```
If the response JSON is similar to the one above, then we consider the service verification successful.
### 7. Validate Reranking service
```bash
DATA='{"initial_query":"What is Deep Learning?", "retrieved_docs": '\
'[{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}'
curl http://${HOST_IP}:${SEARCH_RERANK_SERVICE_PORT}/v1/reranking \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service; it should be similar to the following JSON:
```json
{
"id": "d44b5be4002e8e2cc3b6a4861e396093",
"model": null,
"query": "What is Deep Learning?",
"max_tokens": 1024,
"max_new_tokens": 1024,
"top_k": 10,
"top_p": 0.95,
"typical_p": 0.95,
"temperature": 0.01,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"repetition_penalty": 1.03,
"stream": true,
"language": "auto",
"chat_template": null,
"documents": ["Deep learning is..."]
}
```
If the response JSON is similar to the one above, then we consider the service verification successful.
### 8. Validate MegaService
```bash
DATA='{"messages": "What is the latest news from the AI world? '\
'Give me a summary.","stream": "True"}'
curl http://${HOST_IP}:${SEARCH_BACKEND_SERVICE_PORT}/v1/searchqna \
-H "Content-Type: application/json" \
-d "$DATA"
```
**Note**: The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file.
Check the response from the service; it should be similar to the following JSON:
@@ -545,32 +129,41 @@ data: {"id":"cmpl-f095893d094a4e9989423c2364f00bc1","choices":[{"finish_reason":
data: [DONE]
```
If the response text is similar to the one above, then we consider the service verification successful.
A response text similar to the one above indicates that the service verification was successful.
### 9. Validate Frontend
### Cleanup the Deployment
To access the UI, use the URL http://${EXTERNAL_HOST_IP}:${SEARCH_FRONTEND_SERVICE_PORT}. A page should open when you navigate to this address:
To stop the containers associated with the deployment, execute the following command:
```bash
# with TGI:
docker compose -f compose.yaml down
```
```bash
# with vLLM:
docker compose -f compose_vllm.yaml down
```
All the SearchQnA containers will be stopped and then removed on completion of the "down" command.
## SearchQnA Docker Compose Files
When deploying the SearchQnA pipeline on AMD GPUs (ROCm), different large language model serving frameworks can be selected. The table below outlines the available configurations included in the application.
| File | Description |
| ---------------------------------------- | ------------------------------------------------------------------------------------------ |
| [compose.yaml](./compose.yaml) | Default compose file using tgi as serving framework |
| [compose_vllm.yaml](./compose_vllm.yaml) | The LLM serving framework is vLLM. All other configurations remain the same as the default |
## Launch the UI
Access the UI at http://${EXTERNAL_HOST_IP}:${SEARCH_FRONTEND_SERVICE_PORT}. A page should open when navigating to this address.
![UI start page](../../../../assets/img/searchqna-ui-starting-page.png)
If a page of this type has opened, then we believe that the service is running and responding, and we can proceed to functional UI testing.
The appearance of such a page indicates that the service is operational and responsive, allowing functional UI testing to proceed.
Enter a task for the service in the "Enter prompt here" field, for example "What is Deep Learning?", and press Enter. A page with the result of the task should then open:
![UI response example](../../../../assets/img/searchqna-ui-response-example.png)
If the result shown on the page is correct, then we consider the verification of the UI service to be successful.
### 10. Stop application
#### If you use vLLM
```bash
cd ~/searchqna-install/GenAIExamples/SearchQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use TGI
```bash
cd ~/searchqna-install/GenAIExamples/SearchQnA/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```
A correct result displayed on the page indicates that the UI service has been successfully verified.

View File

@@ -1,155 +1,213 @@
# Build Mega Service of SearchQnA on Xeon
# Deploying SearchQnA on Intel® Xeon® Processors
This document outlines the deployment process for a SearchQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server.
This document outlines the single node deployment process for a SearchQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on Intel Xeon server.
## 🚀 Build Docker images
## Table of Contents
### 1. Build Embedding Image
1. [SearchQnA Quick Start Deployment](#searchqna-quick-start-deployment)
2. [SearchQnA Docker Compose Files](#searchqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/embedding:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/src/Dockerfile .
```
## SearchQnA Quick Start Deployment
### 2. Build Retriever Image
This section describes how to quickly deploy and test the SearchQnA service manually on an Intel® Xeon® processor. The basic steps are:
```bash
docker build --no-cache -t opea/web-retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/web_retrievers/src/Dockerfile .
```
1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)
### 3. Build Rerank Image
### Access the Code
```bash
docker build --no-cache -t opea/reranking:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/rerankings/src/Dockerfile .
```
### 4. Build LLM Image
```bash
docker build --no-cache -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `searchqna.py` Python script. Build the MegaService Docker image using the command below:
Clone the GenAIExamples repository and access the SearchQnA Intel® Xeon® platform Docker Compose files and supporting scripts:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/SearchQnA
docker build --no-cache -t opea/searchqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 6. Build UI Docker Image
Build frontend Docker image via below command:
Then check out a released version, such as v1.2:
```bash
cd GenAIExamples/SearchQnA/ui
docker build --no-cache -t opea/searchqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
git checkout v1.2
```
Then run the command `docker images`; you should have the following images ready:
### Configure the Deployment Environment
1. `opea/embedding:latest`
2. `opea/web-retriever:latest`
3. `opea/reranking:latest`
4. `opea/llm-textgen:latest`
5. `opea/searchqna:latest`
6. `opea/searchqna-ui:latest`
## 🚀 Set the environment variables
Before starting the services with `docker compose`, you have to recheck the following environment variables.
To set up environment variables for deploying SearchQnA services, set up some parameters specific to the deployment environment and source the `set_env.sh` script in this directory:
```bash
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export GOOGLE_CSE_ID=<your cse id>
export GOOGLE_API_KEY=<your google api key>
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
export TEI_EMBEDDING_ENDPOINT=http://${host_ip}:3001
export RERANK_MODEL_ID=BAAI/bge-reranker-base
export TEI_RERANKING_ENDPOINT=http://${host_ip}:3004
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/searchqna
export TGI_LLM_ENDPOINT=http://${host_ip}:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export WEB_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_PORT=3002
export WEB_RETRIEVER_SERVICE_PORT=3003
export RERANK_SERVICE_PORT=3005
export LLM_SERVICE_PORT=3007
export host_ip="External_Public_IP" # ip address of the node
export GOOGLE_CSE_ID="your cse id"
export GOOGLE_API_KEY="your google api key"
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
source ./set_env.sh
```
## 🚀 Start the MegaService
Consult the section on [SearchQnA Service configuration](#SearchQnA-configuration) for information on how service specific configuration parameters affect deployments.
### Deploy the Services Using Docker Compose
To deploy the SearchQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file.
```bash
cd GenAIExamples/SearchQnA/docker_compose/intel/cpu/xeon
docker compose up -d
cd docker_compose/intel/cpu/xeon
docker compose -f compose.yaml up -d
```
## 🚀 Test MicroServices
> **Note**: developers should build Docker images from source when:
>
> - Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
> - Unable to download the docker image.
> - Using a specific version of a Docker image.
Please refer to the table below to build different microservices from source:
| Microservice | Deployment Guide |
| ------------ | -------------------------------------------------------------------------------------------------- |
| Embedding | [Embedding build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings/src) |
| Retriever | [Retriever build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers/src) |
| Reranking | [Reranking build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/rerankings/src) |
| LLM | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms) |
| MegaService | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```bash
# tei
curl http://${host_ip}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
# embedding microservice
curl http://${host_ip}:3002/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
# web retriever microservice
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:3003/v1/web_retrieval \
-X POST \
-d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
# tei reranking service
curl http://${host_ip}:3004/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
# reranking microservice
curl http://${host_ip}:3005/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
docker ps -a
```
## 🚀 Test MegaService
For the default deployment, the following containers should have started
If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
### Validate the Pipeline
Once the SearchQnA services are running, test the pipeline using the following command:
```bash
curl http://${host_ip}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "True"
"stream": "true"
}'
```
**Note**: Access the SearchQnA UI in a web browser through this URL: `http://${host_ip}:80`. Please confirm that port `80` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate Microservices](#validate-microservices) section.
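As an illustration, on hosts using `firewalld` or `ufw`, port 80 can be opened with commands such as the following (adapt to the firewall actually in use):
```bash
# firewalld
sudo firewall-cmd --permanent --add-port=80/tcp && sudo firewall-cmd --reload
# ufw
sudo ufw allow 80/tcp
```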
### Cleanup the Deployment
To stop the containers associated with the deployment, execute the following command:
```bash
docker compose -f compose.yaml down
```
## SearchQnA Docker Compose Files
When deploying a SearchQnA pipeline on an Intel® Xeon® platform, different large language model serving frameworks can be selected. The table below outlines the available configurations included in the application. These configurations can serve as templates and be extended to other components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| ------------------------------ | --------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database |
## Validate Microservices
1. Embedding backend Service
```bash
curl http://${host_ip}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. Embedding Microservice
```bash
curl http://${host_ip}:3002/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
```
3. Web Retriever Microservice
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:3003/v1/web_retrieval \
-X POST \
-d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
4. Reranking backend Service
```bash
# TEI Reranking service
curl http://${host_ip}:3004/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
5. Reranking Microservice
```bash
curl http://${host_ip}:3005/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
```
6. LLM backend Service
```bash
# TGI service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
7. LLM Microservice
```bash
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
```
8. MegaService
```bash
curl http://${host_ip}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "true"
}'
```
9. Nginx Service
```bash
curl http://${host_ip}:${NGINX_PORT}/v1/searchqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "true"
}'
```
## Conclusion
This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.

View File

@@ -1,153 +1,213 @@
# Build Mega Service of SearchQnA on Gaudi
# Deploying SearchQnA on Intel® Gaudi® Processors
This document outlines the deployment process for a SearchQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server.
This document outlines the single node deployment process for a SearchQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on Intel Gaudi server.
## 🚀 Build Docker Images
## Table of Contents
First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
1. [SearchQnA Quick Start Deployment](#searchqna-quick-start-deployment)
2. [SearchQnA Docker Compose Files](#searchqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)
### 1. Build Embedding Image
## SearchQnA Quick Start Deployment
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/embedding:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/src/Dockerfile .
```
This section describes how to quickly deploy and test the SearchQnA service manually on an Intel® Gaudi® processor. The basic steps are:
### 2. Build Retriever Image
1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)
```bash
docker build --no-cache -t opea/web-retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/web_retrievers/src/Dockerfile .
```
### Access the Code
### 3. Build Rerank Image
```bash
docker build --no-cache -t opea/reranking:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/rerankings/src/Dockerfile .
```
### 4. Build LLM Image
```bash
docker build --no-cache -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `searchqna.py` Python script. Build the MegaService Docker image using the command below:
Clone the GenAIExamples repository and access the SearchQnA Intel® Gaudi® platform Docker Compose files and supporting scripts:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/SearchQnA
docker build --no-cache -t opea/searchqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
Then you need to build the last Docker image, `opea/searchqna:latest`, which represents the Mega service, using the following commands:
Then check out a released version, such as v1.2:
```bash
cd GenAIExamples/SearchQnA
docker build --no-cache -t opea/searchqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
git checkout v1.2
```
Then run the command `docker images`; you should have the following images:
### Configure the Deployment Environment
1. `opea/embedding:latest`
2. `opea/web-retriever:latest`
3. `opea/reranking:latest`
4. `opea/llm-textgen:latest`
5. `opea/searchqna:latest`
## 🚀 Set the environment variables
Before starting the services with `docker compose`, you have to recheck the following environment variables.
To set up environment variables for deploying SearchQnA services, set up some parameters specific to the deployment environment and source the `set_env.sh` script in this directory:
```bash
export host_ip=<your External Public IP>
export GOOGLE_CSE_ID=<your cse id>
export GOOGLE_API_KEY=<your google api key>
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
export TEI_EMBEDDING_ENDPOINT=http://$host_ip:3001
export RERANK_MODEL_ID=BAAI/bge-reranker-base
export TEI_RERANKING_ENDPOINT=http://$host_ip:3004
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export WEB_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_PORT=3002
export WEB_RETRIEVER_SERVICE_PORT=3003
export RERANK_SERVICE_PORT=3005
export LLM_SERVICE_PORT=3007
export host_ip="External_Public_IP" # ip address of the node
export GOOGLE_CSE_ID="your cse id"
export GOOGLE_API_KEY="your google api key"
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
source ./set_env.sh
```
## 🚀 Start the MegaService
Consult the section on [SearchQnA Service configuration](#SearchQnA-configuration) for information on how service specific configuration parameters affect deployments.
### Deploy the Services Using Docker Compose
To deploy the SearchQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file.
```bash
cd GenAIExamples/SearchQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
cd docker_compose/intel/hpu/gaudi
docker compose -f compose.yaml up -d
```
## 🚀 Test MicroServices
> **Note**: developers should build Docker images from source when:
>
> - Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
> - Unable to download the docker image.
> - Using a specific version of a Docker image.
Please refer to the table below to build different microservices from source:
| Microservice | Deployment Guide |
| ------------ | -------------------------------------------------------------------------------------------------- |
| Embedding | [Embedding build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings/src) |
| Retriever | [Retriever build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers/src) |
| Reranking | [Reranking build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/rerankings/src) |
| LLM | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms) |
| MegaService | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```bash
# tei
curl http://${host_ip}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
# embedding microservice
curl http://${host_ip}:3002/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
# web retriever microservice
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:3003/v1/web_retrieval \
-X POST \
-d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
# tei reranking service
curl http://${host_ip}:3004/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
# reranking microservice
curl http://${host_ip}:3005/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
docker ps -a
```
## 🚀 Test MegaService
For the default deployment, the following containers should have started
If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
### Validate the Pipeline
Once the SearchQnA services are running, test the pipeline using the following command:
```bash
curl http://${host_ip}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "True"
"stream": "true"
}'
```
**Note**: Access the SearchQnA UI in a web browser through this URL: `http://${host_ip}:80`. Please confirm that port `80` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate Microservices](#validate-microservices) section.
### Cleanup the Deployment
To stop the containers associated with the deployment, execute the following command:
```bash
docker compose -f compose.yaml down
```
## SearchQnA Docker Compose Files
When deploying a SearchQnA pipeline on an Intel® Gaudi® platform, different large language model serving frameworks can be selected. The table below outlines the available configurations included in the application. These configurations can serve as templates and be extended to other components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| ------------------------------ | --------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database |
## Validate Microservices
1. Embedding backend Service
```bash
curl http://${host_ip}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. Embedding Microservice
```bash
curl http://${host_ip}:3002/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
```
3. Web Retriever Microservice
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:3003/v1/web_retrieval \
-X POST \
-d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
4. Reranking backend Service
```bash
# TEI Reranking service
curl http://${host_ip}:3004/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
5. Reranking Microservice
```bash
curl http://${host_ip}:3005/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
```
6. LLM backend Service
```bash
# TGI service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
7. LLM Microservice
```bash
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
```
8. MegaService
```bash
curl http://${host_ip}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "true"
}'
```
9. Nginx Service
```bash
curl http://${host_ip}:${NGINX_PORT}/v1/searchqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "true"
}'
```
## Conclusion
This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.