
Deploy DocSum on Kubernetes cluster

Deploy on Xeon

export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f cpu-values.yaml

Deploy on Gaudi

export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
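After either install, a quick way to confirm the release came up is to list its pods. This is a sketch: it requires kubectl configured for the cluster, assumes the release name "docsum" used above, and uses the common Helm instance label, which may differ for this chart.

```shell
# Post-install check (sketch): list pods belonging to the "docsum" release.
# Falls back to a message when kubectl is not on PATH.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -l app.kubernetes.io/instance=docsum
  STATUS="pods-listed"
else
  STATUS="kubectl-not-found"
fi
echo "$STATUS"
```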

Deploy on AMD ROCm using Helm charts from the binary Helm repository

Creating working dirs

mkdir ~/docsum-k8s-install && cd ~/docsum-k8s-install

Cloning repos

git clone https://github.com/opea-project/GenAIExamples.git

Go to the installation directory

cd GenAIExamples/DocSum/kubernetes/helm

Setting system variables

export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
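Before moving on, it can help to sanity-check the exported variables (a minimal sketch; the values are the examples above):

```shell
# Sanity-check the exported settings before installing the chart.
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"

[ -n "$HFTOKEN" ] || echo "WARNING: HFTOKEN is empty"
echo "model: $MODELNAME (cache dir: $MODELDIR)"
```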

Setting variables in the values files

If using ROCm vLLM

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • TENSOR_PARALLEL_SIZE - must match the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used
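The three values above have to agree with each other. One way to keep them consistent is to derive the GPU count from HIP_VISIBLE_DEVICES before editing the file (a sketch, not part of the chart):

```shell
# Derive the GPU count from the device list so that TENSOR_PARALLEL_SIZE
# and the amd.com/gpu resource limit can be set to the same number.
HIP_VISIBLE_DEVICES="0,1,2,3"   # example: four GPUs
NUM_GPUS=$(echo "$HIP_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "TENSOR_PARALLEL_SIZE=$NUM_GPUS, amd.com/gpu limit: $NUM_GPUS"
```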

If using ROCm TGI

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used
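Put together, a hypothetical two-GPU fragment of rocm-tgi-values.yaml might look like the following. The key names follow the bullets above, but the exact nesting depends on the chart, so treat this as a sketch and verify the paths against the file itself:

```yaml
# Sketch of a two-GPU TGI configuration (verify key paths against the chart)
HIP_VISIBLE_DEVICES: "0,1"
extraCmdArgs: ["--num-shard", "2"]   # shard count must match the GPU count
resources:
  limits:
    amd.com/gpu: "2"                 # must also match
```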

Installing the Helm Chart

If using ROCm vLLM

helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values rocm-values.yaml

If using ROCm TGI

helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values rocm-tgi-values.yaml
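Either variant can then be checked with helm itself (a sketch; requires helm with access to the cluster, and assumes the release name "docsum"):

```shell
# Confirm the release exists and record the outcome.
if command -v helm >/dev/null 2>&1; then
  helm status docsum && RESULT="release-found" || RESULT="release-missing"
else
  RESULT="helm-not-found"
fi
echo "$RESULT"
```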

Deploy on AMD ROCm using Helm charts from Git repositories

Creating working dirs

mkdir ~/docsum-k8s-install && cd ~/docsum-k8s-install

Cloning repos

git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git

Go to the installation directory

cd GenAIExamples/DocSum/kubernetes/helm

Setting system variables

export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"

Setting variables in the values files

If using ROCm vLLM

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • TENSOR_PARALLEL_SIZE - must match the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used

If using ROCm TGI

nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
  • HIP_VISIBLE_DEVICES - the ID(s) of the GPU(s) to use; either a single ID or several comma-separated IDs, for example "0" or "0,1,2,3"
  • extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used
  • resources: limits: amd.com/gpu: "1" - replace "1" with the number of GPUs used

Installing the Helm Chart

If using ROCm vLLM

cd ~/docsum-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml

If using ROCm TGI

cd ~/docsum-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
    --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
    --values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
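Once the pods are running, a hypothetical end-to-end smoke test could port-forward the gateway and send one request. The service name, port 8888, and the /v1/docsum path are assumptions here; check `kubectl get svc` and the chart's service definition for the real values.

```shell
# Smoke test (sketch): forward the assumed gateway port and send one request.
# Service name, port, and endpoint path are assumptions, not chart facts.
if command -v kubectl >/dev/null 2>&1; then
  kubectl port-forward svc/docsum 8888:8888 &
  PF_PID=$!
  sleep 2
  curl -s http://localhost:8888/v1/docsum \
    -H 'Content-Type: application/json' \
    -d '{"type": "text", "messages": "Text to summarize."}'
  kill "$PF_PID"
  RESULT="request-sent"
else
  RESULT="kubectl-not-found"
fi
echo "$RESULT"
```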