Enable vllm for DocSum (#1716)

Set vllm as default llm serving, and add related docker compose files, readmes, and test scripts.

Fix issue #1436

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This commit is contained in:
Letong Han
2025-03-28 17:15:01 +08:00
committed by GitHub
parent 87baeb833d
commit d4dcbd18ef
12 changed files with 1403 additions and 317 deletions

View File

@@ -2,6 +2,8 @@
This document outlines the deployment process for a Document Summarization application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `llm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.
The default pipeline deploys with vLLM as the LLM serving component. It also provides options of using TGI backend for LLM microservice, please refer to [start-microservice-docker-containers](#start-microservice-docker-containers) section in this page.
## 🚀 Apply Intel Xeon Server on AWS
To apply a Intel Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.
@@ -116,9 +118,20 @@ To set up environment variables for deploying Document Summarization services, f
```bash
cd GenAIExamples/DocSum/docker_compose/intel/cpu/xeon
```
If use vLLM as the LLM serving backend.
```bash
docker compose -f compose.yaml up -d
```
If use TGI as the LLM serving backend.
```bash
docker compose -f compose_tgi.yaml up -d
```
You will have the following Docker Images:
1. `opea/docsum-ui:latest`
@@ -128,10 +141,30 @@ You will have the following Docker Images:
### Validate Microservices
1. TGI Service
1. LLM backend Service
In the first startup, this service will take more time to download, load and warm up the model. After it's finished, the service will be ready.
Try the command below to check whether the LLM serving is ready.
```bash
curl http://${host_ip}:8008/generate \
# vLLM service
docker logs docsum-xeon-vllm-service 2>&1 | grep complete
# If the service is ready, you will get the response like below.
INFO: Application startup complete.
```
```bash
# TGI service
docker logs docsum-xeon-tgi-service | grep Connected
# If the service is ready, you will get the response like below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
Then try the `cURL` command below to validate services.
```bash
# either vLLM or TGI service
curl http://${host_ip}:8008/v1/chat/completions \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'