
Deploy DocSum on Kubernetes cluster

Deploy on Xeon

export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f cpu-values.yaml

Deploy on Gaudi

export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
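After either install, a quick way to confirm the release came up is to list its pods. This is a sketch: it requires kubectl configured for the cluster, assumes the release name "docsum" used above, and uses the common Helm instance label, which may differ for this chart.

```shell
# Post-install check (sketch): list pods belonging to the "docsum" release.
# Falls back to a message when kubectl is not on PATH.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -l app.kubernetes.io/instance=docsum
  STATUS="pods-listed"
else
  STATUS="kubectl-not-found"
fi
echo "$STATUS"
```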

Deploy on AMD ROCm using Helm charts from the binary Helm repository

Creating working dirs

mkdir ~/docsum-k8s-install && cd ~/docsum-k8s-install

Cloning repos

git clone https://github.com/opea-project/GenAIExamples.git

Go to the installation directory

cd GenAIExamples/DocSum/kubernetes/helm

Setting system variables

export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
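Before moving on, it can help to sanity-check the exported variables (a minimal sketch; the values are the examples above):

```shell
# Sanity-check the exported settings before installing the chart.
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"

[ -n "$HFTOKEN" ] || echo "WARNING: HFTOKEN is empty"
echo "model: $MODELNAME (cache dir: $MODELDIR)"
```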

Setting variables in the values files

If using ROCm vLLM

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • TENSOR_PARALLEL_SIZE - must match the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used
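The three values above have to agree with each other. One way to keep them consistent is to derive the GPU count from HIP_VISIBLE_DEVICES before editing the file (a sketch, not part of the chart):

```shell
# Derive the GPU count from the device list so that TENSOR_PARALLEL_SIZE
# and the amd.com/gpu resource limit can be set to the same number.
HIP_VISIBLE_DEVICES="0,1,2,3"   # example: four GPUs
NUM_GPUS=$(echo "$HIP_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "TENSOR_PARALLEL_SIZE=$NUM_GPUS, amd.com/gpu limit: $NUM_GPUS"
```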

If using ROCm TGI

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used
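Put together, a hypothetical two-GPU fragment of rocm-tgi-values.yaml might look like the following. The key names follow the bullets above, but the exact nesting depends on the chart, so treat this as a sketch and verify the paths against the file itself:

```yaml
# Sketch of a two-GPU TGI configuration (verify key paths against the chart)
HIP_VISIBLE_DEVICES: "0,1"
extraCmdArgs: ["--num-shard", "2"]   # shard count must match the GPU count
resources:
  limits:
    amd.com/gpu: "2"                 # must also match
```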

Installing the Helm Chart

If using ROCm vLLM

helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values rocm-values.yaml

If using ROCm TGI

helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values rocm-tgi-values.yaml
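Either variant can then be checked with helm itself (a sketch; requires helm with access to the cluster, and assumes the release name "docsum"):

```shell
# Confirm the release exists and record the outcome.
if command -v helm >/dev/null 2>&1; then
  helm status docsum && RESULT="release-found" || RESULT="release-missing"
else
  RESULT="helm-not-found"
fi
echo "$RESULT"
```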

Deploy on AMD ROCm using Helm charts from Git repositories

Creating working dirs

mkdir ~/docsum-k8s-install && cd ~/docsum-k8s-install

Cloning repos

git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git

Go to the installation directory

cd GenAIExamples/DocSum/kubernetes/helm

Setting system variables

export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"

Setting variables in the values files

If using ROCm vLLM

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • TENSOR_PARALLEL_SIZE - must match the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used

If using ROCm TGI

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used

Installing the Helm Chart

If using ROCm vLLM

cd ~/docsum-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml

If using ROCm TGI

cd ~/docsum-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
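Once the pods are running, a hypothetical end-to-end smoke test could port-forward the gateway and send one request. The service name, port 8888, and the /v1/docsum path are assumptions here; check `kubectl get svc` and the chart's service definition for the real values.

```shell
# Smoke test (sketch): forward the assumed gateway port and send one request.
# Service name, port, and endpoint path are assumptions, not chart facts.
if command -v kubectl >/dev/null 2>&1; then
  kubectl port-forward svc/docsum 8888:8888 &
  PF_PID=$!
  sleep 2
  curl -s http://localhost:8888/v1/docsum \
    -H 'Content-Type: application/json' \
    -d '{"type": "text", "messages": "Text to summarize."}'
  kill "$PF_PID"
  RESULT="request-sent"
else
  RESULT="kubectl-not-found"
fi
echo "$RESULT"
```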