Compare commits


11 Commits

Author SHA1 Message Date
pre-commit-ci[bot]
97d277cd1d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-01-24 02:30:47 +00:00
letonghan
3f918422c9 refine script for hardcodes variables and test codes
Signed-off-by: letonghan <letong.han@intel.com>
2025-01-24 10:30:14 +08:00
letonghan
53e15bfb79 fix merge conflict
Signed-off-by: letonghan <letong.han@intel.com>
2025-01-23 15:13:19 +08:00
letonghan
bbe649c44c fix preci issues of variable names conflicts
Signed-off-by: letonghan <letong.han@intel.com>
2025-01-23 15:12:08 +08:00
pre-commit-ci[bot]
6e26d4615a [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-01-23 06:44:39 +00:00
letonghan
500fcdb975 fix merge conflicts
Signed-off-by: letonghan <letong.han@intel.com>
2025-01-23 14:44:09 +08:00
letonghan
4825420f04 Merge branch 'main' of https://github.com/opea-project/GenAIExamples into refactor_benchmark 2025-01-23 14:42:10 +08:00
letonghan
78a1efd7f0 refactor python script into deploy_and_benchmark.py
Signed-off-by: letonghan <letong.han@intel.com>
2025-01-23 14:41:11 +08:00
Letong Han
9b9314b062 Merge branch 'main' into refactor_benchmark 2025-01-21 15:06:19 +08:00
pre-commit-ci[bot]
8b85e8c793 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-01-21 07:05:57 +00:00
letonghan
eba1c300b3 Support ChatQnA benchmark pipeline on pubmed dataset.
Add file benchmark.py, benchmark.yaml, and benchmark_requirements.txt.
Related PR in GenAIEval: https://github.com/opea-project/GenAIEval/pull/228

Signed-off-by: letonghan <letong.han@intel.com>
2025-01-21 15:02:30 +08:00
3 changed files with 1233 additions and 0 deletions

90
ChatQnA/chatqna.yaml Normal file

@@ -0,0 +1,90 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
deploy:
  device: gaudi
  version: 1.1.0
  modelUseHostPath: /mnt/models
  HUGGINGFACEHUB_API_TOKEN: ""
  node: [1, 2, 4]
  namespace: "default"
  cards_per_node: 8
  services:
    backend:
      instance_num: [2, 2, 4]
      cores_per_instance: ""
      memory_capacity: ""
    teirerank:
      enabled: True
      model_id: ""
      instance_num: [1, 1, 1]
      cards_per_instance: 1
    tei:
      model_id: ""
      instance_num: [1, 2, 4]
      cores_per_instance: ""
      memory_capacity: ""
    llm:
      engine: tgi
      model_id: ""
      instance_num: [7, 15, 31]
      max_batch_size: [1, 2, 4, 8]
      max_input_length: ""
      max_total_tokens: ""
      max_batch_total_tokens: ""
      max_batch_prefill_tokens: ""
      cards_per_instance: 1
    data-prep:
      instance_num: [1, 1, 1]
      cores_per_instance: ""
      memory_capacity: ""
    retriever-usvc:
      instance_num: [2, 2, 4]
      cores_per_instance: ""
      memory_capacity: ""
    redis-vector-db:
      instance_num: [1, 1, 1]
      cores_per_instance: ""
      memory_capacity: ""
    chatqna-ui:
      instance_num: [1, 1, 1]
    nginx:
      instance_num: [1, 1, 1]
benchmark:
  # HTTP request behavior related fields
  concurrency: [1, 2, 4]
  total_query_num: [2048, 4096]
  duration: [5, 10]  # unit: minutes
  query_num_per_concurrency: [4, 8, 16]
  poisson: True
  poisson_arrival_rate: 1.0
  warmup_iterations: 10
  seed: 1024
  # dataset-related fields
  dataset: pub_med10  # predefined keywords for supported datasets: dummy_english, dummy_chinese, pub_med100
  user_queries: [1, 2, 4]
  query_token_size: 128  # if specified, a fixed query token size is sent out
  # advanced per-component settings that impact performance
  dataprep:  # not targeted this time
    chunk_size: [1024]
    chunk_overlap: [1000]
  retriever:  # not targeted this time
    algo: IVF
    fetch_k: 2
    k: 1
  rerank:
    top_n: 2
  llm:
    max_token_size: 128  # specifies the output token size
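The Poisson arrival-rate settings in the benchmark section above suggest that requests are issued as a Poisson process, i.e. with exponentially distributed gaps between consecutive requests. A minimal sketch of that scheduling idea, using only the Python standard library (the function name and exact behavior are assumptions for illustration, not the benchmark's real implementation, which lives in GenAIEval):

```python
import random


def poisson_arrival_offsets(num_requests: int, rate: float, seed: int = 1024):
    """Return cumulative send times (seconds) for a Poisson arrival process.

    Inter-arrival gaps are drawn from an exponential distribution with the
    given rate, so on average `rate` requests are issued per second.
    """
    rng = random.Random(seed)  # seeded for reproducible load patterns
    t = 0.0
    offsets = []
    for _ in range(num_requests):
        t += rng.expovariate(rate)  # exponential gap to the next request
        offsets.append(t)
    return offsets


offsets = poisson_arrival_offsets(num_requests=8, rate=1.0)
# Send times are strictly increasing; the mean gap approaches 1/rate
# as the number of samples grows.
```

A load generator would then sleep until each offset before firing the corresponding request, instead of sending at fixed intervals.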

1134
deploy_and_benchmark.py Normal file

File diff suppressed because it is too large
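Since the 1134-line driver diff is suppressed, here is a rough illustration only of how a script consuming chatqna.yaml might expand the sweep lists: each entry of `node` is paired with the index-aligned entry of a service's `instance_num` list, then crossed with every `max_batch_size` value. The function and key names below are assumptions, not the script's real API:

```python
def expand_llm_runs(cfg: dict) -> list[dict]:
    """Expand the sweep lists into one flat run description per
    (node count, llm max_batch_size) combination. `instance_num`
    is index-aligned with `node` in the config."""
    deploy = cfg["deploy"]
    llm = deploy["services"]["llm"]
    runs = []
    for i, nodes in enumerate(deploy["node"]):
        for batch in llm["max_batch_size"]:
            runs.append({
                "nodes": nodes,
                "llm_instances": llm["instance_num"][i],
                "max_batch_size": batch,
            })
    return runs


# Values taken from the chatqna.yaml diff above.
config = {
    "deploy": {
        "node": [1, 2, 4],
        "services": {
            "llm": {"instance_num": [7, 15, 31],
                    "max_batch_size": [1, 2, 4, 8]},
        },
    },
}
runs = expand_llm_runs(config)
# 3 node counts x 4 batch sizes -> 12 runs; runs[0] uses 1 node and 7 llm instances.
```

Note that `node` and `instance_num` are zipped by index rather than crossed, which keeps the per-node instance counts consistent with the hardware available at each cluster size.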

9
requirements.txt Normal file

@@ -0,0 +1,9 @@
kubernetes
locust
numpy
opea-eval
pytest
pyyaml
requests
sseclient-py
transformers