Refactor folder to support different vendors (#743)

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
XinyaoWa
2024-09-10 23:27:19 +08:00
committed by GitHub
parent ba94e0130d
commit d73129cbf0
878 changed files with 915 additions and 1184 deletions


@@ -0,0 +1,32 @@
# Deploy AudioQnA in a Kubernetes Cluster
> [!NOTE]
> The following value must be set before you can deploy: `HUGGINGFACEHUB_API_TOKEN`.
> You can also customize `LLM_MODEL_ID` and the `model-volume` host path.
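`LLM_MODEL_ID` is set in the `audio-qna-config` ConfigMap, and `model-volume` is a `hostPath` volume pointing at `/home/sdp/cesg` with `type: Directory`, so that directory must already exist on the node that runs TGI. A minimal sketch for changing both without hand-editing, assuming the shipped `audioqna.yaml` layout; `customize_manifest` is a hypothetical helper, and the model ID and path in the usage comment are placeholders:

```bash
# Hypothetical helper: print a copy of a manifest with a different model ID and
# model-cache directory. It only rewrites the LLM_MODEL_ID line of the ConfigMap
# and the hostPath line that points at /home/sdp/cesg; the input is not modified.
customize_manifest() {
  local manifest="$1" model_id="$2" model_dir="$3"
  sed -e "s|^\([[:space:]]*LLM_MODEL_ID:\).*|\1 ${model_id}|" \
      -e "s|^\([[:space:]]*path:\) /home/sdp/cesg$|\1 ${model_dir}|" \
      "$manifest"
}

# Usage (placeholder values):
#   customize_manifest audioqna.yaml my-org/my-model /mnt/models > audioqna-custom.yaml
```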
## Deploy On Xeon
```bash
cd GenAIExamples/AudioQnA/kubernetes/manifests/xeon
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" audioqna.yaml
kubectl apply -f audioqna.yaml
```
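The `sed -i` call writes your token into `audioqna.yaml` itself, which makes it easy to commit the secret by accident. A minimal alternative sketch, assuming the placeholder string shown above and a token without `|` or `&` characters (typical `hf_...` tokens qualify), renders the manifest to stdout and pipes it to `kubectl apply -f -`, leaving the file untouched. `render_manifest` is a hypothetical helper name:

```bash
# Hypothetical helper: print a copy of a manifest with the Hugging Face
# placeholder replaced by the given token; the input file is not modified.
# Assumes the token contains no '|' or '&' characters.
render_manifest() {
  local manifest="$1" token="$2"
  sed "s|insert-your-huggingface-token-here|${token}|g" "$manifest"
}

# Usage (from the xeon or gaudi manifest directory):
#   render_manifest audioqna.yaml "$HUGGINGFACEHUB_API_TOKEN" | kubectl apply -f -
```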
## Deploy On Gaudi
```bash
cd GenAIExamples/AudioQnA/kubernetes/manifests/gaudi
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" audioqna.yaml
kubectl apply -f audioqna.yaml
```
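The Gaudi `audioqna.yaml` requests `habana.ai/gaudi: 1` in three containers (Whisper, SpeechT5 and TGI), so the cluster needs at least three allocatable Gaudi devices before every pod can be scheduled. A rough pre-flight sketch, assuming the Habana device plugin advertises the `habana.ai/gaudi` resource on your nodes; `gaudi_allocatable` and `check_gaudi_capacity` are hypothetical names, and the check compares against allocatable capacity, not devices that are currently free:

```bash
# Sum the allocatable habana.ai/gaudi devices over all nodes.
# Nodes without the resource print an empty line, which awk counts as 0.
gaudi_allocatable() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.allocatable.habana\.ai/gaudi}{"\n"}{end}' \
    | awk '{ total += $1 } END { print total + 0 }'
}

# The Gaudi manifest requests one device each for whisper, speecht5 and TGI.
check_gaudi_capacity() {
  local needed=3 have
  have=$(gaudi_allocatable)
  if [ "$have" -lt "$needed" ]; then
    echo "only $have Gaudi device(s) allocatable, manifest requests $needed" >&2
    return 1
  fi
  echo "$have Gaudi device(s) allocatable"
}
```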
## Verify Services
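Make sure all the pods are running, and restart the `audioqna-backend-server-deploy-xxxx` pod if necessary. In the request below, `${host_ip}` must be an address where `audioqna-backend-server-svc` is reachable on port 3008 (for example its cluster IP, from a cluster node); from outside the cluster, use a node IP with NodePort `30666` instead.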
```bash
kubectl get pods
curl http://${host_ip}:3008/v1/audioqna -X POST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json'
```
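The `audio` field carries a base64-encoded WAV file (the sample string decodes to a very short `RIFF`/`WAVE` file). To send your own recording, a sketch like the following builds the request body; `make_audioqna_request` is a hypothetical helper, and the NodePort URL in the usage comment assumes the `audioqna-backend-server-svc` settings from the manifest (`nodePort: 30666`) with a node IP in `host_ip`:

```bash
# Hypothetical helper: print the JSON body for /v1/audioqna built from a local
# WAV file. Newlines are stripped from the base64 output so the JSON stays on
# one line (GNU base64 wraps its output at 76 columns by default).
make_audioqna_request() {
  local wav="$1" max_tokens="${2:-64}" audio
  audio=$(base64 < "$wav" | tr -d '\n')
  printf '{"audio": "%s", "max_tokens": %d}\n' "$audio" "$max_tokens"
}

# Usage from outside the cluster, through the NodePort:
#   make_audioqna_request question.wav 64 |
#     curl "http://${host_ip}:30666/v1/audioqna" -X POST \
#       -H 'Content-Type: application/json' -d @-
```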


@@ -0,0 +1,74 @@
# Deploy AudioQnA in Kubernetes Cluster on Xeon and Gaudi
This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on Intel Xeon servers and Gaudi machines.
The AudioQnA service leverages a Kubernetes operator called genai-microservices-connector (GMC). GMC connects microservices into pipelines based on the specification in a pipeline YAML file, and also lets the user dynamically control which model a service such as an LLM or embedder uses. The underlying pipeline language also supports external services that may be running elsewhere, in a public or private cloud.
If you have not already done so, install GMC in your Kubernetes cluster by following the steps in the "Getting Started" section of [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector). Once the images are published to Docker Hub, no builds will be required, which will simplify installation.
The AudioQnA application is defined as a Custom Resource (CR) file that the GMC operator acts upon. GMC first checks whether the microservices listed in the CR YAML file are running; if not, it starts them, and then it connects them. When the AudioQnA pipeline is ready, the service endpoint details are returned, letting you use the application. If you run `kubectl get pods -n audioqa`, you will see all the component microservices, in particular `asr`, `tts`, and `llm`.
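To see how GMC will chain the components, you can read the step list straight from the custom resource once it is applied. A sketch, assuming the resource name `audioqa` used in both CR files and the `gmc` short name used later in this guide; `show_pipeline_steps` is a hypothetical helper:

```bash
# Hypothetical helper: print each router step of the audioqa GMConnector,
# in order, with the Kubernetes Service it targets.
show_pipeline_steps() {
  kubectl get gmc audioqa -n "${1:-audioqa}" \
    -o jsonpath='{range .spec.nodes.root.steps[*]}{.name}{" -> "}{.internalService.serviceName}{"\n"}{end}'
}
```

For the Xeon GMConnector (`gmc/platform: xeon`) in this commit, the steps should come out in the order of its `steps` list: Asr, Whisper, Llm, Tgi, Tts and SpeechT5.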
## Using prebuilt images
AudioQnA uses the following prebuilt images for a Xeon deployment:
- tgi-service: ghcr.io/huggingface/text-generation-inference:1.4
- llm: opea/llm-tgi:latest
- asr: opea/asr:latest
- whisper: opea/whisper:latest
- tts: opea/tts:latest
- speecht5: opea/speecht5:latest
If you use the Gaudi accelerator, three alternate images replace the TGI, Whisper (ASR) and SpeechT5 (TTS) images:
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1
- whisper-gaudi: opea/whisper-gaudi:latest
- speecht5-gaudi: opea/speecht5-gaudi:latest
> [!NOTE]
> Please refer to [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker_compose/intel/cpu/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/AudioQnA/docker_compose/intel/hpu/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.
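Because these are self-built `:latest` tags, a quick check for missing images can save a failed rollout. A sketch, assuming the images were built with Docker on the machine that will run the pods (clusters whose nodes use containerd keep a separate image store, so check there with your runtime's own tools instead); `missing_images` is a hypothetical helper:

```bash
# Hypothetical helper: print every image from the argument list that is not
# present in the local Docker image store.
missing_images() {
  local img
  for img in "$@"; do
    docker image inspect "$img" >/dev/null 2>&1 || echo "$img"
  done
}

# Usage for a Xeon deployment:
#   missing_images opea/llm-tgi:latest opea/asr:latest opea/whisper:latest \
#     opea/tts:latest opea/speecht5:latest
```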
## Deploy AudioQnA pipeline
This involves deploying the AudioQnA custom resource. Use `audioQnA_xeon.yaml` on Xeon, or `audioQnA_gaudi.yaml` if you have a Gaudi cluster.
1. Create namespace and deploy application
```sh
kubectl create ns audioqa
kubectl apply -f $(pwd)/audioQnA_xeon.yaml
```
2. GMC will reconcile the AudioQnA custom resource and get all related components/services ready. Check that the services are up.
```sh
kubectl get service -n audioqa
```
3. Retrieve the application access URL
```sh
kubectl get gmconnectors.gmc.opea.io -n audioqa
NAME      URL                                                    READY   AGE
audioqa   http://router-service.audioqa.svc.cluster.local:8080   6/0/6   5m
```
4. Deploy a client pod to test the application
```sh
kubectl create deployment client-test -n audioqa --image=python:3.8.13 -- sleep infinity
```
5. Access the application using the above URL from the client pod
```sh
export CLIENT_POD=$(kubectl get pod -n audioqa -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n audioqa -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n audioqa -- curl -s --no-buffer $accessUrl -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json'
```
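Steps 2 and 5 assume the pipeline is already up, and step 5 also assumes the client pod is running. A sketch of explicit waits, assuming the `audioqa` namespace and the `client-test` Deployment from step 4; the function names and timeouts are hypothetical and may need tuning, since the TGI pod downloads the model on first start. Note that `kubectl wait --all` only covers pods that exist when it starts, so rerun it if GMC is still creating components.

```bash
# Hypothetical helpers for the waits the steps above leave implicit.

# Step 2: GMC creates the pipeline pods asynchronously after the CR is applied,
# so first wait (up to about 5 minutes) for at least one pod to exist,
# then wait for every existing pod to become Ready.
wait_for_pipeline() {
  local ns="${1:-audioqa}" tries=0
  until [ -n "$(kubectl get pods -n "$ns" -o name 2>/dev/null)" ]; do
    tries=$((tries + 1))
    if [ "$tries" -ge 60 ]; then
      echo "no pods in $ns yet" >&2
      return 1
    fi
    sleep 5
  done
  kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout=900s
}

# Step 4: exec into the client only once its Deployment is Available.
wait_for_client() {
  kubectl wait --for=condition=Available deployment/client-test \
    -n "${1:-audioqa}" --timeout=300s
}
```

If you would rather not run a client pod, `kubectl port-forward -n audioqa svc/router-service 8080:8080` should expose the router on `localhost:8080`, assuming `router-service` listens on port 8080 as the access URL in step 3 suggests.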
> [!NOTE]
> You can remove your AudioQnA pipeline with the standard kubectl command for deleting a custom resource, for example `kubectl delete gmc audioqa -n audioqa`. Verify that it was removed by running `kubectl get pods -n audioqa`.
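For a full cleanup, the namespace can go too. A minimal sketch, assuming the `audioqa` namespace holds nothing else you want to keep; deleting it also removes the `client-test` Deployment from step 4. `teardown_audioqna` is a hypothetical helper:

```bash
# Hypothetical helper: delete the AudioQnA custom resource, list what is left
# in the namespace (pods may show Terminating for a short while), then delete
# the namespace itself.
teardown_audioqna() {
  local ns="${1:-audioqa}"
  kubectl delete gmc audioqa -n "$ns" || return 1
  kubectl get pods -n "$ns"
  kubectl delete ns "$ns"
}
```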


@@ -0,0 +1,58 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: audioqa
  namespace: audioqa
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Asr
        internalService:
          serviceName: asr-svc
          config:
            endpoint: /v1/audio/transcriptions
            ASR_ENDPOINT: whisper-svc
      - name: Whisper
        internalService:
          serviceName: whisper-svc
          config:
            endpoint: /v1/asr
          isDownstreamService: true
      - name: Llm
        data: $response
        internalService:
          serviceName: llm-svc
          config:
            endpoint: /v1/chat/completions
            TGI_LLM_ENDPOINT: tgi-svc
      - name: Tgi
        internalService:
          serviceName: tgi-svc
          config:
            endpoint: /generate
          isDownstreamService: true
      - name: Tts
        data: $response
        internalService:
          serviceName: tts-svc
          config:
            endpoint: /v1/audio/speech
            TTS_ENDPOINT: speecht5-svc
      - name: SpeechT5
        internalService:
          serviceName: speecht5-svc
          config:
            endpoint: /v1/tts
          isDownstreamService: true


@@ -0,0 +1,395 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
  name: audio-qna-config
  namespace: default
data:
  ASR_ENDPOINT: http://whisper-svc.default.svc.cluster.local:7066
  TTS_ENDPOINT: http://speecht5-svc.default.svc.cluster.local:7055
  LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
  TGI_LLM_ENDPOINT: http://llm-dependency-svc.default.svc.cluster.local:3006
  MEGA_SERVICE_HOST_IP: audioqna-backend-server-svc
  ASR_SERVICE_HOST_IP: asr-svc
  ASR_SERVICE_PORT: "3001"
  LLM_SERVICE_HOST_IP: llm-svc
  LLM_SERVICE_PORT: "3007"
  TTS_SERVICE_HOST_IP: tts-svc
  TTS_SERVICE_PORT: "3002"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: asr-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: asr-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: asr-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: asr-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/asr:latest
        imagePullPolicy: IfNotPresent
        name: asr-deploy
        args: null
        ports:
        - containerPort: 9099
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: asr-svc
spec:
  type: ClusterIP
  selector:
    app: asr-deploy
  ports:
  - name: service
    port: 3001
    targetPort: 9099
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whisper-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whisper-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: whisper-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: whisper-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/whisper:latest
        imagePullPolicy: IfNotPresent
        name: whisper-deploy
        args: null
        ports:
        - containerPort: 7066
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: whisper-svc
spec:
  type: ClusterIP
  selector:
    app: whisper-deploy
  ports:
  - name: service
    port: 7066
    targetPort: 7066
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tts-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tts-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: tts-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: tts-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/tts:latest
        imagePullPolicy: IfNotPresent
        name: tts-deploy
        args: null
        ports:
        - containerPort: 9088
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: tts-svc
spec:
  type: ClusterIP
  selector:
    app: tts-deploy
  ports:
  - name: service
    port: 3002
    targetPort: 9088
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: speecht5-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: speecht5-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: speecht5-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: speecht5-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/speecht5:latest
        imagePullPolicy: IfNotPresent
        name: speecht5-deploy
        args: null
        ports:
        - containerPort: 7055
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: speecht5-svc
spec:
  type: ClusterIP
  selector:
    app: speecht5-deploy
  ports:
  - name: service
    port: 7055
    targetPort: 7055
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-dependency-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-dependency-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: llm-dependency-deploy
    spec:
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: ghcr.io/huggingface/text-generation-inference:2.2.0
        name: llm-dependency-deploy-demo
        securityContext:
          capabilities:
            add:
            - SYS_NICE
        args:
        - --model-id
        - $(LLM_MODEL_ID)
        - --max-input-length
        - '2048'
        - --max-total-tokens
        - '4096'
        volumeMounts:
        - mountPath: /data
          name: model-volume
        - mountPath: /dev/shm
          name: shm
        ports:
        - containerPort: 80
      serviceAccountName: default
      volumes:
      - name: model-volume
        hostPath:
          path: /home/sdp/cesg
          type: Directory
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
---
kind: Service
apiVersion: v1
metadata:
  name: llm-dependency-svc
spec:
  type: ClusterIP
  selector:
    app: llm-dependency-deploy
  ports:
  - name: service
    port: 3006
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: llm-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: llm-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/llm-tgi:latest
        imagePullPolicy: IfNotPresent
        name: llm-deploy
        args: null
        ports:
        - containerPort: 9000
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: llm-svc
spec:
  type: ClusterIP
  selector:
    app: llm-deploy
  ports:
  - name: service
    port: 3007
    targetPort: 9000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: audioqna-backend-server-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: audioqna-backend-server-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: audioqna-backend-server-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: audioqna-backend-server-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/audioqna:latest
        imagePullPolicy: IfNotPresent
        name: audioqna-backend-server-deploy
        args: null
        ports:
        - containerPort: 8888
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: audioqna-backend-server-svc
spec:
  type: NodePort
  selector:
    app: audioqna-backend-server-deploy
  ports:
  - name: service
    port: 3008
    targetPort: 8888
    nodePort: 30666


@@ -0,0 +1,58 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: gaudi
  name: audioqa
  namespace: audioqa
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Asr
        internalService:
          serviceName: asr-svc
          config:
            endpoint: /v1/audio/transcriptions
            ASR_ENDPOINT: whisper-gaudi-svc
      - name: WhisperGaudi
        internalService:
          serviceName: whisper-gaudi-svc
          config:
            endpoint: /v1/asr
          isDownstreamService: true
      - name: Llm
        data: $response
        internalService:
          serviceName: llm-svc
          config:
            endpoint: /v1/chat/completions
            TGI_LLM_ENDPOINT: tgi-gaudi-svc
      - name: TgiGaudi
        internalService:
          serviceName: tgi-gaudi-svc
          config:
            endpoint: /generate
          isDownstreamService: true
      - name: Tts
        data: $response
        internalService:
          serviceName: tts-svc
          config:
            endpoint: /v1/audio/speech
            TTS_ENDPOINT: speecht5-gaudi-svc
      - name: SpeechT5Gaudi
        internalService:
          serviceName: speecht5-gaudi-svc
          config:
            endpoint: /v1/tts
          isDownstreamService: true


@@ -0,0 +1,439 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
  name: audio-qna-config
  namespace: default
data:
  ASR_ENDPOINT: http://whisper-svc.default.svc.cluster.local:7066
  TTS_ENDPOINT: http://speecht5-svc.default.svc.cluster.local:7055
  LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
  TGI_LLM_ENDPOINT: http://llm-dependency-svc.default.svc.cluster.local:3006
  MEGA_SERVICE_HOST_IP: audioqna-backend-server-svc
  ASR_SERVICE_HOST_IP: asr-svc
  ASR_SERVICE_PORT: "3001"
  LLM_SERVICE_HOST_IP: llm-svc
  LLM_SERVICE_PORT: "3007"
  TTS_SERVICE_HOST_IP: tts-svc
  TTS_SERVICE_PORT: "3002"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: asr-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: asr-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: asr-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: asr-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/asr:latest
        imagePullPolicy: IfNotPresent
        name: asr-deploy
        args: null
        ports:
        - containerPort: 9099
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: asr-svc
spec:
  type: ClusterIP
  selector:
    app: asr-deploy
  ports:
  - name: service
    port: 3001
    targetPort: 9099
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whisper-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whisper-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: whisper-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: whisper-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/whisper-gaudi:latest
        imagePullPolicy: IfNotPresent
        name: whisper-deploy
        args: null
        ports:
        - containerPort: 7066
        resources:
          limits:
            habana.ai/gaudi: 1
        env:
        - name: OMPI_MCA_btl_vader_single_copy_mechanism
          value: none
        - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
          value: 'true'
        - name: runtime
          value: habana
        - name: HABANA_VISIBLE_DEVICES
          value: all
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: whisper-svc
spec:
  type: ClusterIP
  selector:
    app: whisper-deploy
  ports:
  - name: service
    port: 7066
    targetPort: 7066
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tts-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tts-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: tts-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: tts-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/tts:latest
        imagePullPolicy: IfNotPresent
        name: tts-deploy
        args: null
        ports:
        - containerPort: 9088
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: tts-svc
spec:
  type: ClusterIP
  selector:
    app: tts-deploy
  ports:
  - name: service
    port: 3002
    targetPort: 9088
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: speecht5-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: speecht5-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: speecht5-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: speecht5-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/speecht5-gaudi:latest
        imagePullPolicy: IfNotPresent
        name: speecht5-deploy
        args: null
        ports:
        - containerPort: 7055
        resources:
          limits:
            habana.ai/gaudi: 1
        env:
        - name: OMPI_MCA_btl_vader_single_copy_mechanism
          value: none
        - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
          value: 'true'
        - name: runtime
          value: habana
        - name: HABANA_VISIBLE_DEVICES
          value: all
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: speecht5-svc
spec:
  type: ClusterIP
  selector:
    app: speecht5-deploy
  ports:
  - name: service
    port: 7055
    targetPort: 7055
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-dependency-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-dependency-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: llm-dependency-deploy
    spec:
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: ghcr.io/huggingface/tgi-gaudi:2.0.1
        name: llm-dependency-deploy-demo
        securityContext:
          capabilities:
            add:
            - SYS_NICE
        args:
        - --model-id
        - $(LLM_MODEL_ID)
        - --max-input-length
        - '2048'
        - --max-total-tokens
        - '4096'
        - --max-batch-total-tokens
        - '65536'
        - --max-batch-prefill-tokens
        - '4096'
        volumeMounts:
        - mountPath: /data
          name: model-volume
        - mountPath: /dev/shm
          name: shm
        ports:
        - containerPort: 80
        resources:
          limits:
            habana.ai/gaudi: 1
        env:
        - name: OMPI_MCA_btl_vader_single_copy_mechanism
          value: none
        - name: PT_HPU_ENABLE_LAZY_COLLECTIVES
          value: 'true'
        - name: runtime
          value: habana
        - name: HABANA_VISIBLE_DEVICES
          value: all
        - name: PREFILL_BATCH_BUCKET_SIZE
          value: "1"
        - name: BATCH_BUCKET_SIZE
          value: "8"
      serviceAccountName: default
      volumes:
      - name: model-volume
        hostPath:
          path: /home/sdp/cesg
          type: Directory
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
---
kind: Service
apiVersion: v1
metadata:
  name: llm-dependency-svc
spec:
  type: ClusterIP
  selector:
    app: llm-dependency-deploy
  ports:
  - name: service
    port: 3006
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: llm-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: llm-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/llm-tgi:latest
        imagePullPolicy: IfNotPresent
        name: llm-deploy
        args: null
        ports:
        - containerPort: 9000
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: llm-svc
spec:
  type: ClusterIP
  selector:
    app: llm-deploy
  ports:
  - name: service
    port: 3007
    targetPort: 9000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: audioqna-backend-server-deploy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: audioqna-backend-server-deploy
  template:
    metadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: 'true'
      labels:
        app: audioqna-backend-server-deploy
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: audioqna-backend-server-deploy
      hostIPC: true
      containers:
      - envFrom:
        - configMapRef:
            name: audio-qna-config
        image: opea/audioqna:latest
        imagePullPolicy: IfNotPresent
        name: audioqna-backend-server-deploy
        args: null
        ports:
        - containerPort: 8888
      serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
  name: audioqna-backend-server-svc
spec:
  type: NodePort
  selector:
    app: audioqna-backend-server-deploy
  ports:
  - name: service
    port: 3008
    targetPort: 8888
    nodePort: 30666