# Deploy DocSum on a Kubernetes cluster

- You should have Helm (version >= 3.15) installed. Refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) for more information.
- For more deployment options, refer to the [helm charts README](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts#readme).

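
The Helm version requirement above can be checked with a small shell function. This is a minimal sketch, not part of the charts: `helm_version_ok` is a hypothetical helper name, and it assumes the version string has the `vMAJOR.MINOR.PATCH` form that `helm version --short` prints.

```bash
# Return 0 if the given Helm version string is at least 3.15, non-zero otherwise.
# Accepts strings such as "v3.15.2+g1a500d5".
helm_version_ok() {
  local version="${1#v}"        # drop the leading "v"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot
  if [ "$major" -gt 3 ]; then
    return 0
  fi
  [ "$major" -eq 3 ] && [ "$minor" -ge 15 ]
}

# Usage (requires Helm on PATH):
#   helm_version_ok "$(helm version --short)" || echo "Helm >= 3.15 is required" >&2
```
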
## Deploy on Xeon

```bash
export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f cpu-values.yaml
```

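
Because the install commands in this guide show a placeholder token, a guard like the following may help catch a forgotten substitution before running `helm install`. This is only a sketch: `check_hftoken` is a hypothetical helper, and it does nothing more than compare `HFTOKEN` against the placeholder strings used in this guide.

```bash
# Fail early if HFTOKEN is unset, empty, or still one of the placeholder
# strings used in this guide. It does not validate the token itself.
check_hftoken() {
  case "${HFTOKEN:-}" in
    ""|"insert-your-huggingface-token-here"|"your_huggingface_token")
      echo "HFTOKEN is not set to a real Hugging Face token" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage: run the check first and install only if it passes, for example
#   check_hftoken || exit 1
```
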
## Deploy on Gaudi

```bash
export HFTOKEN="insert-your-huggingface-token-here"
helm install docsum oci://ghcr.io/opea-project/charts/docsum --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} -f gaudi-values.yaml
```

## Deploy on AMD ROCm using Helm charts from the binary Helm repository

```bash
mkdir -p ~/docsum-k8s-install && cd ~/docsum-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

### Go to the installation directory

```bash
cd GenAIExamples/DocSum/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
```

### Setting variables in Values files

#### If ROCm vLLM is used

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
```

Adjust the following settings in the file:

- `HIP_VISIBLE_DEVICES`: the ID of the GPU that you want to use. Specify a single ID or several comma-separated IDs, for example `"0"` or `"0,1,2,3"`.
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `amd.com/gpu: "1"` (under `resources`, then `limits`): replace `"1"` with the number of GPUs used.

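
Since `TENSOR_PARALLEL_SIZE` and the `amd.com/gpu` limit both have to equal the number of IDs in `HIP_VISIBLE_DEVICES`, it can help to derive that number rather than count by hand. The following is a minimal sketch; `count_gpus` and the `gpu_ids` variable are hypothetical names used only for illustration, and the result still has to be entered into the values file manually.

```bash
# Count the IDs in a comma-separated GPU list such as "0,1,2,3".
count_gpus() {
  local ids="$1"
  local IFS=','
  # The unquoted expansion splits on commas because IFS is set locally.
  set -- $ids
  echo "$#"
}

gpu_ids="0,1,2,3"                      # example value from the list above
gpu_count="$(count_gpus "$gpu_ids")"
echo "gpu_count=${gpu_count}"          # prints "gpu_count=4"
```

With this example value, both `TENSOR_PARALLEL_SIZE` and the `amd.com/gpu` limit would be set to `4`.
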
#### If ROCm TGI is used

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
```

Adjust the following settings in the file:

- `HIP_VISIBLE_DEVICES`: the ID of the GPU that you want to use. Specify a single ID or several comma-separated IDs, for example `"0"` or `"0,1,2,3"`.
- `extraCmdArgs: [ "--num-shard","1" ]`: replace `"1"` with the number of GPUs used.
- `amd.com/gpu: "1"` (under `resources`, then `limits`): replace `"1"` with the number of GPUs used.

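
The three TGI settings above must agree with each other, and a mismatch is easy to introduce when editing by hand. The sketch below checks values that you copy from the values file yourself; `check_tgi_gpu_settings` is a hypothetical helper and does not read or parse the YAML file.

```bash
# Return 0 when the number of IDs in HIP_VISIBLE_DEVICES, the --num-shard
# value, and the amd.com/gpu limit all agree. Otherwise print a message to
# stderr and return 1. All three arguments are supplied by the caller.
check_tgi_gpu_settings() {
  local devices="$1" num_shard="$2" gpu_limit="$3"
  local IFS=','
  set -- $devices                 # split the ID list on commas
  local device_count="$#"
  if [ "$device_count" -ne "$num_shard" ] || [ "$device_count" -ne "$gpu_limit" ]; then
    echo "mismatch: ${device_count} device IDs, --num-shard ${num_shard}, amd.com/gpu ${gpu_limit}" >&2
    return 1
  fi
  echo "ok: ${device_count} GPU(s)"
}

# Usage: check_tgi_gpu_settings "0,1" 2 2
```
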
### Installing the Helm Chart

#### If ROCm vLLM is used

```bash
helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-values.yaml
```

#### If ROCm TGI is used

```bash
helm upgrade --install docsum oci://ghcr.io/opea-project/charts/docsum \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-tgi-values.yaml
```

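
After any of the install commands in this guide, the release can be inspected with standard Helm and kubectl commands. The wrapper below is a sketch under two assumptions: the release is named `docsum`, as in the commands above, and it was installed into the current namespace. `show_release_status` is a hypothetical helper name.

```bash
# Show the Helm release status, then list the pods in the current namespace.
# The release name defaults to "docsum".
show_release_status() {
  local release="${1:-docsum}"
  helm status "$release" || return 1
  kubectl get pods
}

# Usage (requires helm and kubectl configured for the cluster):
#   show_release_status docsum
```
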
## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir -p ~/docsum-k8s-install && cd ~/docsum-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```

### Go to the installation directory

```bash
cd GenAIExamples/DocSum/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
```

### Setting variables in Values files

#### If ROCm vLLM is used

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
```

Adjust the following settings in the file:

- `HIP_VISIBLE_DEVICES`: the ID of the GPU that you want to use. Specify a single ID or several comma-separated IDs, for example `"0"` or `"0,1,2,3"`.
- `TENSOR_PARALLEL_SIZE`: must match the number of GPUs used.
- `amd.com/gpu: "1"` (under `resources`, then `limits`): replace `"1"` with the number of GPUs used.

#### If ROCm TGI is used

```bash
nano ~/docsum-k8s-install/GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
```

Adjust the following settings in the file:

- `HIP_VISIBLE_DEVICES`: the ID of the GPU that you want to use. Specify a single ID or several comma-separated IDs, for example `"0"` or `"0,1,2,3"`.
- `extraCmdArgs: [ "--num-shard","1" ]`: replace `"1"` with the number of GPUs used.
- `amd.com/gpu: "1"` (under `resources`, then `limits`): replace `"1"` with the number of GPUs used.

### Installing the Helm Chart

#### If ROCm vLLM is used

```bash
cd ~/docsum-k8s-install/GenAIInfra/helm-charts
scripts/update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-values.yaml
```

#### If ROCm TGI is used

```bash
cd ~/docsum-k8s-install/GenAIInfra/helm-charts
scripts/update_dependency.sh
helm dependency update docsum
helm upgrade --install docsum docsum \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/DocSum/kubernetes/helm/rocm-tgi-values.yaml
```