# ChatQnA with Remote Inference Endpoints (Kubernetes) (#1149)

Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
@@ -15,7 +15,7 @@

```
cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
-sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
+sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
kubectl apply -f chatqna.yaml
```

The change swaps the `sed` delimiter from `/` to `|` so that a token value containing a `/` cannot break the substitution.

@@ -35,10 +35,55 @@ kubectl apply -f chatqna_bf16.yaml

```
cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
-sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
+sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
kubectl apply -f chatqna.yaml
```

## Deploy on Xeon with Remote LLM Model

```
cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="Your Remote Inference Endpoint"
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-remote-inference.yaml
sed -i "s|insert-your-remote-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-remote-inference.yaml
```
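
Before moving on, a quick sanity check (an optional sketch, not part of the upstream steps): after the substitutions above, no placeholder should remain in the manifest, so the `grep` below should print nothing.

```
# List any placeholders the sed commands above failed to replace (no output = good).
grep -n "insert-your-" chatqna-remote-inference.yaml
```
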
### Additional Steps for Remote Endpoints with Authentication (Skip if Your Endpoint Has No Authentication)

If your remote inference endpoint is protected with OAuth Client Credentials authentication, update `CLIENTID`, `CLIENT_SECRET`, and `TOKEN_URL` with the correct values in the `chatqna-llm-uservice-config` ConfigMap.
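
One way to set these values is a `kubectl patch` once the manifest has been applied (a sketch only; it assumes the three keys live under the ConfigMap's `data` section, and the credentials and token URL below are placeholders):

```
# Placeholder OAuth values; substitute those issued by your identity provider.
kubectl patch configmap chatqna-llm-uservice-config --type merge \
  -p '{"data":{"CLIENTID":"my-client-id","CLIENT_SECRET":"my-client-secret","TOKEN_URL":"https://idp.example.com/oauth2/token"}}'
```

Alternatively, edit the ConfigMap section of `chatqna-remote-inference.yaml` before applying it. Pods read these values at startup, so if you patch after the pods are already running, restart the affected deployment.
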
### Deploy

```
kubectl apply -f chatqna-remote-inference.yaml
```

## Deploy on Gaudi with TEI, Rerank, and vLLM Models Running Remotely

```
cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="Your Remote Inference Endpoint"
export TEI_EMBEDDING_ENDPOINT="Your Remote TEI Embedding Endpoint"
export TEI_RERANKING_ENDPOINT="Your Remote Reranking Endpoint"

sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-vllm-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-embedding-endpoint|${TEI_EMBEDDING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-reranking-endpoint|${TEI_RERANKING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
```

### Additional Steps for Remote Endpoints with Authentication (Skip if Your Endpoints Have No Authentication)

If your remote inference endpoints are protected with OAuth Client Credentials authentication, update `CLIENTID`, `CLIENT_SECRET`, and `TOKEN_URL` with the correct values in the `chatqna-llm-uservice-config`, `chatqna-data-prep-config`, `chatqna-embedding-usvc-config`, `chatqna-reranking-usvc-config`, and `chatqna-retriever-usvc-config` ConfigMaps.
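
Because the same three keys must change in all five ConfigMaps, a small loop keeps the edits consistent (again a sketch with placeholder values, assuming each ConfigMap stores the keys under `data`):

```
# Apply identical placeholder OAuth settings to every affected ConfigMap.
for cm in chatqna-llm-uservice-config chatqna-data-prep-config \
          chatqna-embedding-usvc-config chatqna-reranking-usvc-config \
          chatqna-retriever-usvc-config; do
  kubectl patch configmap "$cm" --type merge \
    -p '{"data":{"CLIENTID":"my-client-id","CLIENT_SECRET":"my-client-secret","TOKEN_URL":"https://idp.example.com/oauth2/token"}}'
done
```
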
### Deploy

```
kubectl apply -f chatqna-vllm-remote-inference.yaml
```

## Verify Services

To verify the installation, run `kubectl get pod` and make sure all pods are in the `Running` state.
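
To block until everything is Ready instead of polling by hand, `kubectl wait` can be used (the timeout is arbitrary; first startup may be slow while models download). The smoke test that follows is an assumption-laden sketch: it presumes the gateway Service is named `chatqna` and listens on port 8888, as in other ChatQnA deployments.

```
# Wait for all pods in the current namespace to become Ready.
kubectl wait --for=condition=Ready pod --all --timeout=600s

# Smoke test against the (assumed) chatqna gateway: forward its port and ask one question.
kubectl port-forward svc/chatqna 8888:8888 &
curl http://localhost:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```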