From afc3341156aed7f6aeab3a61e2ba889bb2fbda76 Mon Sep 17 00:00:00 2001
From: Letong Han <106566639+letonghan@users.noreply.github.com>
Date: Tue, 3 Sep 2024 15:06:50 +0800
Subject: [PATCH] Refine ChatQnA README for TGI (#715)

* update chatqna readme for tgi

Signed-off-by: letonghan

* update log block

Signed-off-by: letonghan

---------

Signed-off-by: letonghan
---
 ChatQnA/README.md                    | 13 +++++++++++++
 ChatQnA/docker/gaudi/README.md       | 16 ++++++++++++++++
 ChatQnA/docker/gpu/README.md         | 16 ++++++++++++++++
 ChatQnA/docker/xeon/README.md        | 16 ++++++++++++++--
 ChatQnA/docker/xeon/README_qdrant.md | 16 ++++++++++++++++
 5 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/ChatQnA/README.md b/ChatQnA/README.md
index 0d4798810..5d3f93e8f 100644
--- a/ChatQnA/README.md
+++ b/ChatQnA/README.md
@@ -224,6 +224,19 @@ Refer to the [Intel Technology enabling for Openshift readme](https://github.com
 
 ## Consume ChatQnA Service
 
+Before consuming ChatQnA Service, make sure the TGI/vLLM service is ready (which takes up to 2 minutes to start).
+
+```bash
+# TGI example
+docker logs tgi-service | grep Connected
+```
+
+Consume ChatQnA service until you get the TGI response like below.
+
+```log
+2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
+```
+
 Two ways of consuming ChatQnA Service:
 
 1. Use cURL command on terminal
diff --git a/ChatQnA/docker/gaudi/README.md b/ChatQnA/docker/gaudi/README.md
index 053484f77..717988c6b 100644
--- a/ChatQnA/docker/gaudi/README.md
+++ b/ChatQnA/docker/gaudi/README.md
@@ -306,6 +306,22 @@ curl http://${host_ip}:8000/v1/reranking \
 
 6. LLM backend Service
 
+In first startup, this service will take more time to download the model files. After it's finished, the service will be ready.
+
+Try the command below to check whether the LLM serving is ready.
+
+```bash
+docker logs ${CONTAINER_ID} | grep Connected
+```
+
+If the service is ready, you will get the response like below.
+
+```log
+2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
+```
+
+Then try the `cURL` command below to validate services.
+
 ```bash
 #TGI Service
 curl http://${host_ip}:8005/generate \
diff --git a/ChatQnA/docker/gpu/README.md b/ChatQnA/docker/gpu/README.md
index 48c287fb5..f559230b6 100644
--- a/ChatQnA/docker/gpu/README.md
+++ b/ChatQnA/docker/gpu/README.md
@@ -192,6 +192,22 @@ curl http://${host_ip}:8000/v1/reranking \
 
 6. TGI Service
 
+In first startup, this service will take more time to download the model files. After it's finished, the service will be ready.
+
+Try the command below to check whether the TGI service is ready.
+
+```bash
+docker logs ${CONTAINER_ID} | grep Connected
+```
+
+If the service is ready, you will get the response like below.
+
+```log
+2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
+```
+
+Then try the `cURL` command below to validate TGI.
+
 ```bash
 curl http://${host_ip}:8008/generate \
   -X POST \
diff --git a/ChatQnA/docker/xeon/README.md b/ChatQnA/docker/xeon/README.md
index dc8735928..675e74cea 100644
--- a/ChatQnA/docker/xeon/README.md
+++ b/ChatQnA/docker/xeon/README.md
@@ -303,9 +303,21 @@ curl http://${host_ip}:8000/v1/reranking\
 
 6. LLM backend Service
 
-In first startup, this service will take more time to download the LLM file. After it's finished, the service will be ready.
+In first startup, this service will take more time to download the model files. After it's finished, the service will be ready.
 
-Use `docker logs CONTAINER_ID` to check if the download is finished.
+Try the command below to check whether the LLM serving is ready.
+
+```bash
+docker logs ${CONTAINER_ID} | grep Connected
+```
+
+If the service is ready, you will get the response like below.
+
+```log
+2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
+```
+
+Then try the `cURL` command below to validate services.
 
 ```bash
 # TGI service
diff --git a/ChatQnA/docker/xeon/README_qdrant.md b/ChatQnA/docker/xeon/README_qdrant.md
index f103d5a73..a03b563b2 100644
--- a/ChatQnA/docker/xeon/README_qdrant.md
+++ b/ChatQnA/docker/xeon/README_qdrant.md
@@ -276,6 +276,22 @@ curl http://${host_ip}:6046/v1/reranking\
 
 6. TGI Service
 
+In first startup, this service will take more time to download the model files. After it's finished, the service will be ready.
+
+Try the command below to check whether the TGI service is ready.
+
+```bash
+docker logs ${CONTAINER_ID} | grep Connected
+```
+
+If the service is ready, you will get the response like below.
+
+```log
+2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
+```
+
+Then try the `cURL` command below to validate TGI.
+
 ```bash
 curl http://${host_ip}:6042/generate \
   -X POST \