Add benchmark README for ChatQnA (#662)
* Add benchmark README for ChatQnA
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* add benchmark.yaml
* update yaml path
* fix preci issue
* update title

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ChatQnA/benchmark/README.md (new file, 546 lines)
@@ -0,0 +1,546 @@

# ChatQnA Benchmarking

This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.

By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.

# Purpose

We aim to run these benchmarks and share them with the OPEA community for three primary reasons:

- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.

# Metrics

The benchmark reports the following metrics:

- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second

Results are displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.

# Getting Started

## Prerequisites

- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
- Ensure every node has direct internet access.
- Set up `kubectl` on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running the stress tool.
- Ensure all nodes have a local `/mnt/models` folder, which will be mounted by the pods (a minimal setup snippet follows this list).

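If `/mnt/models` does not exist yet, a minimal way to prepare it (assuming `sudo` access; run on every node):

```bash
# the benchmark pods mount this local folder for model files
sudo mkdir -p /mnt/models
```
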
## Kubernetes Cluster Example

```bash
$ kubectl get nodes
NAME          STATUS   ROLES           AGE   VERSION
k8s-master    Ready    control-plane   35d   v1.29.6
k8s-worker1   Ready    <none>          35d   v1.29.5
k8s-worker2   Ready    <none>          35d   v1.29.6
k8s-worker3   Ready    <none>          35d   v1.29.6
```

## Manifest preparation

We have created [BKC manifests](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single-node, two-node, and four-node Kubernetes clusters. To apply them, check out the repository and configure a few values:

```bash
# on k8s-master node
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/benchmark

# replace the image tag from latest to v0.9 since we want to test with the v0.9 release
IMAGE_TAG=v0.9
find . -name '*.yaml' -type f -exec sed -i "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g" {} \;

# set the Hugging Face token
HUGGINGFACE_TOKEN=<your token>
find . -name '*.yaml' -type f -exec sed -i "s#\${HF_TOKEN}#${HUGGINGFACE_TOKEN}#g" {} \;

# set the models
LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
RERANK_MODEL_ID=BAAI/bge-reranker-base
find . -name '*.yaml' -type f -exec sed -i "s#\$(LLM_MODEL_ID)#${LLM_MODEL_ID}#g" {} \;
find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;
find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
```
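
To spot-check that the `sed` edits above took effect, a simple pass with GNU `grep` is enough (no assumptions beyond the substitutions shown above):

```bash
# every image line should now carry the v0.9 tag
grep -rh "image: opea/" --include='*.yaml' . | sort -u

# no unexpanded Hugging Face token placeholders should remain
grep -rF '${HF_TOKEN}' --include='*.yaml' . || echo "HF token placeholder fully substituted"
```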

## Benchmark tool preparation

The test uses the [benchmark tool](https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark) to run the performance test. Set up the benchmark tool on the Kubernetes master node, `k8s-master`:

```bash
# on k8s-master node
git clone https://github.com/opea-project/GenAIEval.git
cd GenAIEval
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
```
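
As a quick sanity check that the environment is usable, confirm the load-test driver resolved (the tool is Locust-based, as the example result below shows):

```bash
# still inside stress_venv; prints the installed Locust version
python -c "import locust; print(locust.__version__)"
```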
## Test Configurations

Workload configuration:

| Key      | Value   |
| -------- | ------- |
| Workload | ChatQnA |
| Tag      | v0.9    |

Model configuration:

| Key       | Value                     |
| --------- | ------------------------- |
| Embedding | BAAI/bge-base-en-v1.5     |
| Reranking | BAAI/bge-reranker-base    |
| Inference | Intel/neural-chat-7b-v3-3 |

Benchmark parameters:

| Key               | Value |
| ----------------- | ----- |
| LLM input tokens  | 1024  |
| LLM output tokens | 128   |

Number of test requests for each scheduled node count:

| Node count | Concurrency | Query number |
| ---------- | ----------- | ------------ |
| 1          | 128         | 640          |
| 2          | 256         | 1280         |
| 4          | 512         | 2560         |

More detailed configuration can be found in the configuration file [benchmark.yaml](./benchmark.yaml).

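Note that each query number is the concurrency multiplied by the `concurrent_level: 5` set in [benchmark.yaml](./benchmark.yaml) (for example, 128 × 5 = 640), which appears to be how the tool spreads the configured `user_queries` across concurrent users.
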
## Test Steps

### Single node test

#### 1. Preparation

Label one Kubernetes node so that all pods are scheduled to it:

```bash
kubectl label nodes k8s-worker1 node-type=chatqna-opea
```
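
Before deploying, it is worth confirming the label landed (plain `kubectl`; no assumptions beyond the label key used above):

```bash
# should list exactly the node(s) you just labeled
kubectl get nodes -l node-type=chatqna-opea
```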

#### 2. Install ChatQnA

Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/single_gaudi) and apply it to Kubernetes:

```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
kubectl apply -f .
```
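
Wait until every pod reports Ready before starting the test; a minimal check with standard `kubectl` (the `default` namespace is an assumption; adjust `-n` if you deploy elsewhere):

```bash
# inspect pod placement and status
kubectl get pods -o wide

# or block until all pods are Ready (model downloads take time; the 10-minute timeout is a guess)
kubectl wait --for=condition=Ready pod --all --timeout=600s
```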

#### 3. Run tests

Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`:

```bash
export USER_QUERIES="[4, 8, 16, 640]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```
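
A quick way to confirm the substitution took effect (the paths match the commands above):

```bash
# both values should now be literal, with no ${...} placeholders left
grep -E "user_queries|test_output_dir" GenAIEval/evals/benchmark/benchmark.yaml
```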

Then run the benchmark tool:

```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```

#### 4. Data collection

All test results are written to `/home/sdp/benchmark_output/node_1`, the folder set via the `TEST_OUTPUT_DIR` environment variable in the previous steps.

#### 5. Clean up

```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-worker1 node-type-
```

### Two node test

#### 1. Preparation

Label two Kubernetes nodes so that all pods are scheduled to them:

```bash
kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
```

#### 2. Install ChatQnA

Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/two_gaudi) and apply it to Kubernetes:

```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/two_gaudi
kubectl apply -f .
```

#### 3. Run tests

Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`:

```bash
export USER_QUERIES="[4, 8, 16, 1280]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```

Then run the benchmark tool:

```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```

#### 4. Data collection

All test results are written to `/home/sdp/benchmark_output/node_2`, the folder set via the `TEST_OUTPUT_DIR` environment variable in the previous steps.

#### 5. Clean up

```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/two_gaudi
kubectl delete -f .
kubectl label nodes k8s-worker1 k8s-worker2 node-type-
```

### Four node test

#### 1. Preparation

Label four Kubernetes nodes (including the master) so that all pods are scheduled to them:

```bash
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
```

#### 2. Install ChatQnA

Go to the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/four_gaudi) and apply it to Kubernetes:

```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
kubectl apply -f .
```

#### 3. Run tests

Copy the configuration file [benchmark.yaml](./benchmark.yaml) to `GenAIEval/evals/benchmark/benchmark.yaml` and configure `test_suite_config.user_queries` and `test_suite_config.test_output_dir`:

```bash
export USER_QUERIES="[4, 8, 16, 2560]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
envsubst < ./benchmark.yaml > GenAIEval/evals/benchmark/benchmark.yaml
```

Then run the benchmark tool:

```bash
cd GenAIEval/evals/benchmark
python benchmark.py
```

#### 4. Data collection

All test results are written to `/home/sdp/benchmark_output/node_4`, the folder set via the `TEST_OUTPUT_DIR` environment variable in the previous steps.

#### 5. Clean up

```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/four_gaudi
kubectl delete -f .
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
```

### Example Result

The following is a summary of the test result, with the full output files saved under `TEST_OUTPUT_DIR`.

```text
Concurrency : 512
Max request count : 2560
Http timeout : 60000

Benchmark target : chatqnafixed

=================Total statistics=====================
Succeed Response: 2560 (Total 2560, 100.0% Success), Duration: 26.44s, Input Tokens: 61440, Output Tokens: 255985, RPS: 96.82, Input Tokens per Second: 2323.71, Output Tokens per Second: 9681.57
End to End latency(ms), P50: 3576.34, P90: 4242.19, P99: 5252.23, Avg: 3581.55
First token latency(ms), P50: 726.64, P90: 1128.27, P99: 1796.09, Avg: 769.58
Average Next token latency(ms): 28.41
Average token latency(ms) : 35.85
======================================================
```
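
As a sanity check, the throughput figures above are internally consistent: RPS = 2560 requests / 26.44 s ≈ 96.8, Output Tokens per Second = 255985 / 26.44 s ≈ 9681.6, and Input Tokens per Second = 61440 / 26.44 s ≈ 2323.7.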

The benchmark tool also saves a test specification capturing the benchmark result, the test settings, and the hardware and workload placement of the cluster:

```yaml
benchmarkresult:
  Average_Next_token_latency: '28.41'
  Average_token_latency: '35.85'
  Duration: '26.44'
  End_to_End_latency_Avg: '3581.55'
  End_to_End_latency_P50: '3576.34'
  End_to_End_latency_P90: '4242.19'
  End_to_End_latency_P99: '5252.23'
  First_token_latency_Avg: '769.58'
  First_token_latency_P50: '726.64'
  First_token_latency_P90: '1128.27'
  First_token_latency_P99: '1796.09'
  Input_Tokens: '61440'
  Input_Tokens_per_Second: '2323.71'
  Onput_Tokens: '255985'
  Output_Tokens_per_Second: '9681.57'
  RPS: '96.82'
  Succeed_Response: '2560'
  locust_P50: '160'
  locust_P99: '810'
  locust_num_failures: '0'
  locust_num_requests: '2560'
benchmarkspec:
  bench-target: chatqnafixed
  endtest_time: '2024-08-25T14:19:25.955973'
  host: http://10.110.105.197:8888
  llm-model: Intel/neural-chat-7b-v3-3
  locustfile: /home/sdp/lvl/GenAIEval/evals/benchmark/stresscli/locust/aistress.py
  max_requests: 2560
  namespace: default
  processes: 2
  run_name: benchmark
  runtime: 60m
  starttest_time: '2024-08-25T14:18:50.366514'
  stop_timeout: 120
  tool: locust
  users: 512
hardwarespec:
  aise-gaudi-00:
    architecture: amd64
    containerRuntimeVersion: containerd://1.7.18
    cpu: '160'
    habana.ai/gaudi: '8'
    kernelVersion: 5.15.0-92-generic
    kubeProxyVersion: v1.29.7
    kubeletVersion: v1.29.7
    memory: 1056375272Ki
    operatingSystem: linux
    osImage: Ubuntu 22.04.3 LTS
  aise-gaudi-01:
    architecture: amd64
    containerRuntimeVersion: containerd://1.7.18
    cpu: '160'
    habana.ai/gaudi: '8'
    kernelVersion: 5.15.0-92-generic
    kubeProxyVersion: v1.29.7
    kubeletVersion: v1.29.7
    memory: 1056375256Ki
    operatingSystem: linux
    osImage: Ubuntu 22.04.3 LTS
  aise-gaudi-02:
    architecture: amd64
    containerRuntimeVersion: containerd://1.7.18
    cpu: '160'
    habana.ai/gaudi: '8'
    kernelVersion: 5.15.0-92-generic
    kubeProxyVersion: v1.29.7
    kubeletVersion: v1.29.7
    memory: 1056375260Ki
    operatingSystem: linux
    osImage: Ubuntu 22.04.3 LTS
  aise-gaudi-03:
    architecture: amd64
    containerRuntimeVersion: containerd://1.6.8
    cpu: '160'
    habana.ai/gaudi: '8'
    kernelVersion: 5.15.0-112-generic
    kubeProxyVersion: v1.29.7
    kubeletVersion: v1.29.7
    memory: 1056374404Ki
    operatingSystem: linux
    osImage: Ubuntu 22.04.4 LTS
workloadspec:
  aise-gaudi-00:
    chatqna-backend-server-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 4000Mi
        requests:
          cpu: '8'
          memory: 4000Mi
    embedding-dependency-deploy:
      replica: 1
      resources:
        limits:
          cpu: '80'
          memory: 20000Mi
        requests:
          cpu: '80'
          memory: 20000Mi
    embedding-deploy:
      replica: 1
    llm-dependency-deploy:
      replica: 8
      resources:
        limits:
          habana.ai/gaudi: '1'
        requests:
          habana.ai/gaudi: '1'
    llm-deploy:
      replica: 1
    retriever-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 2500Mi
        requests:
          cpu: '8'
          memory: 2500Mi
  aise-gaudi-01:
    chatqna-backend-server-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 4000Mi
        requests:
          cpu: '8'
          memory: 4000Mi
    embedding-dependency-deploy:
      replica: 1
      resources:
        limits:
          cpu: '80'
          memory: 20000Mi
        requests:
          cpu: '80'
          memory: 20000Mi
    embedding-deploy:
      replica: 1
    llm-dependency-deploy:
      replica: 8
      resources:
        limits:
          habana.ai/gaudi: '1'
        requests:
          habana.ai/gaudi: '1'
    llm-deploy:
      replica: 1
    prometheus-operator:
      replica: 1
      resources:
        limits:
          cpu: 200m
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
    retriever-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 2500Mi
        requests:
          cpu: '8'
          memory: 2500Mi
  aise-gaudi-02:
    chatqna-backend-server-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 4000Mi
        requests:
          cpu: '8'
          memory: 4000Mi
    embedding-dependency-deploy:
      replica: 1
      resources:
        limits:
          cpu: '80'
          memory: 20000Mi
        requests:
          cpu: '80'
          memory: 20000Mi
    embedding-deploy:
      replica: 1
    llm-dependency-deploy:
      replica: 8
      resources:
        limits:
          habana.ai/gaudi: '1'
        requests:
          habana.ai/gaudi: '1'
    llm-deploy:
      replica: 1
    retriever-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 2500Mi
        requests:
          cpu: '8'
          memory: 2500Mi
  aise-gaudi-03:
    chatqna-backend-server-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 4000Mi
        requests:
          cpu: '8'
          memory: 4000Mi
    dataprep-deploy:
      replica: 1
    embedding-dependency-deploy:
      replica: 1
      resources:
        limits:
          cpu: '80'
          memory: 20000Mi
        requests:
          cpu: '80'
          memory: 20000Mi
    embedding-deploy:
      replica: 1
    llm-dependency-deploy:
      replica: 8
      resources:
        limits:
          habana.ai/gaudi: '1'
        requests:
          habana.ai/gaudi: '1'
    llm-deploy:
      replica: 1
    retriever-deploy:
      replica: 1
      resources:
        limits:
          cpu: '8'
          memory: 2500Mi
        requests:
          cpu: '8'
          memory: 2500Mi
    vector-db:
      replica: 1
```

ChatQnA/benchmark/benchmark.yaml (new file, 55 lines)
@@ -0,0 +1,55 @@

```yaml
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

test_suite_config: # Overall configuration settings for the test suite
  examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
  concurrent_level: 5 # The concurrency level, adjustable based on requirements
  user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
  random_prompt: false # Use random prompts if true, fixed prompts if false
  run_time: 60m # The max total run time for the test suite
  collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
  data_visualization: false # Generate data visualization if true, do not generate data visualization if false
  llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
  test_output_dir: "${TEST_OUTPUT_DIR}" # The directory to store the test output

test_cases:
  chatqna:
    embedding:
      run_test: false
      service_name: "embedding-svc" # Replace with your service name
    embedserve:
      run_test: false
      service_name: "embedding-dependency-svc" # Replace with your service name
    retriever:
      run_test: false
      service_name: "retriever-svc" # Replace with your service name
      parameters:
        search_type: "similarity"
        k: 4
        fetch_k: 20
        lambda_mult: 0.5
        score_threshold: 0.2
    reranking:
      run_test: false
      service_name: "reranking-svc" # Replace with your service name
      parameters:
        top_n: 1
    rerankserve:
      run_test: false
      service_name: "reranking-dependency-svc" # Replace with your service name
    llm:
      run_test: false
      service_name: "llm-svc" # Replace with your service name
      parameters:
        max_new_tokens: 128
        temperature: 0.01
        top_k: 10
        top_p: 0.95
        repetition_penalty: 1.03
        streaming: true
    llmserve:
      run_test: false
      service_name: "llm-dependency-svc" # Replace with your service name
    e2e:
      run_test: true
      service_name: "chatqna-backend-server-svc" # Replace with your service name
```