Compare commits

...

134 Commits

Author SHA1 Message Date
Xinyao Wang
e4d0d23810 audioqna trigger ut
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2025-01-02 08:53:10 +08:00
Liang Lv
51f6a12ea8 Update CodeTrans for GenAIComps Refactor (#1309)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: WenjiaoYue <wenjiao.yue@intel.com>
2025-01-02 09:33:26 +08:00
Liang Lv
1b061f510e Update DocIndexRetriever for GenAIComps refactor (#1317)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-01-02 09:32:12 +08:00
chen, suyue
f57ba212a1 Dockerfile path fix error and keep unify (#1327)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-01 19:05:52 +08:00
pre-commit-ci[bot]
d694b0d19f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-01-01 03:17:33 +00:00
chen, suyue
849c11d94e Merge branch 'main' into genaicomps_refactor 2025-01-01 11:17:05 +08:00
ZePan110
0962d3a8f8 Fix Dockerfile path errors
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-12-31 23:58:00 +08:00
ZePan110
ae444b1ad4 Fix AvatarChatbot
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-12-31 23:55:37 +08:00
ZePan110
a1f026ff3e Fix MultimodalQnA
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-12-31 23:54:17 +08:00
ZePan110
56a7227ff1 Fix CodeTrans.
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-12-31 23:52:05 +08:00
Sihan Chen
cc1d97f816 Refactor AudioQnA/MultiModalQnA/AvatarChatbot (#1310)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>
2024-12-31 12:47:30 +08:00
chensuyue
b813ea489d Revert "for test only, need to revert before merge"
This reverts commit e046807a50.
2024-12-31 11:17:42 +08:00
xiguiw
250ffb8b66 [DOC] Fix docker build command in document (#1287)
Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
2024-12-31 00:02:22 +08:00
ZePan110
1e9d111982 Block the manifest test first and restore it after the Refactor work is completed. (#1321)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-12-30 16:24:19 +08:00
Letong Han
3411a0935b Fix SearchQnA and ProductivitySuite CI Issues (#1295)
Signed-off-by: letonghan <letong.han@intel.com>
2024-12-30 16:21:55 +08:00
Ying Hu
597f17b979 Update set_env.sh to fix LOGFLAG warning (#1319) 2024-12-30 10:54:26 +08:00
chensuyue
e046807a50 for test only, need to revert before merge
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-29 23:01:27 +08:00
Liang Lv
73b3f50737 Update CodeGen for GenAIComps Refactor (#1308)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: WenjiaoYue <wenjiao.yue@intel.com>
2024-12-29 10:11:35 +08:00
Liang Lv
9efbc91774 Update dockerfile for GraphRAG (#1318)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2024-12-29 10:11:15 +08:00
Yao Qing
b9790d809b Refactoring animation. (#1301)
Signed-off-by: Yao, Qing <qing.yao@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-27 17:21:47 +08:00
Daniel De León
b27b48c488 Add microservice resources to no_proxy in the main ChatQnA README (#1269)
Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-12-27 16:14:28 +08:00
Dina Suehiro Jones
0bf1d0be65 Bug fix to add missing BRIDGE_TOWER_EMBEDDING env var for MultimodalQnA (#1280)
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
2024-12-26 23:30:57 -08:00
chensuyue
885173a455 for test only
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-27 09:40:49 +08:00
XinyaoWa
575c974df3 chatqna rocm ut trigger (#1302)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-12-26 15:30:06 +08:00
XinyaoWa
9ab66319ef AvatarChatbot gaudi refactor ut (#1300)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-12-26 15:06:38 +08:00
Sihan Chen
a01729a5c2 Refactor DocSum example (#1286)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-26 14:45:17 +08:00
XinyaoWa
6c053654d7 AvatarChatbot - refactor UT (#1294)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-26 14:21:41 +08:00
XinyaoWa
40be38f68b AudioQnA GPU (#1297)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-12-26 14:07:00 +08:00
XinyaoWa
fb51d9f2ed AudioQnA fix (#1290)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-12-26 13:49:02 +08:00
Letong Han
be4e9ad000 Fix VisualQnA CI issues caused by Comps refactor (#1292)
Signed-off-by: letonghan <letong.han@intel.com>
2024-12-26 13:37:40 +08:00
chen, suyue
fe90ca172f Merge branch 'main' into genaicomps_refactor 2024-12-26 13:27:18 +08:00
chen, suyue
6b6a08df78 Add minimal containers and ports clean up before test (#1291)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-26 10:59:26 +08:00
Letong Han
3146d5d69d Fix Translation and VideoQnA CI issues (#1293)
Signed-off-by: letonghan <letong.han@intel.com>
2024-12-26 09:45:50 +08:00
chen, suyue
0b23cba505 add manually clean up container action (#1296)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-26 09:26:53 +08:00
Xinyao Wang
744f7c9519 fix ChatQnA
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-12-25 15:45:17 +08:00
chensuyue
bac73f4e1a for test only
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-25 10:38:07 +08:00
chensuyue
1a80dcf4d1 for test only based on refactor_comps branch
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-24 11:42:33 +08:00
Liang Lv
5d302d7501 Update code and README for GenAIComps refactor (#1281)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2024-12-24 10:43:35 +08:00
XinyaoWa
50dd959d60 Support Long context for DocSum (#1255)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
2024-12-20 19:17:10 +08:00
XinyaoWa
05365b6140 FaqGen param fix (#1277)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-12-20 11:30:36 +08:00
Sihan Chen
fd706d1a70 Align DocIndexRetriever Xeon tests with Gaudi (#1272) 2024-12-20 10:30:51 +08:00
Sihan Chen
3b9e55cb8e Minor fix DocIndexRetriever test (#1266) 2024-12-19 12:12:33 +08:00
bjzhjing
7d9b34cf5e Chatqna/benchmark: Remove the deprecated directory (#1261)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-12-19 10:51:01 +08:00
Mustafa
84a6a6e9bc Adding URL summary option to DocSum Gradio-UI (#1248)
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Co-authored-by: okhleif-IL <omar.khleif@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: WenjiaoYue <wenjiao.yue@intel.com>
2024-12-19 10:49:03 +08:00
chen, suyue
89a7f9e001 Update CODEOWNERS list for PR review (#1262)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: xiguiw <111278656+xiguiw@users.noreply.github.com>
2024-12-19 10:01:52 +08:00
Artem Astafev
236ea6bcce Added compose example for MultimodalQnA deployment on AMD ROCm systems (#1233)
Signed-off-by: artem-astafev <a.astafev@datamonsters.com>
2024-12-18 17:43:32 +08:00
chyundunovDatamonsters
67634dfd22 DocSum - Solving the problem of running DocSum on ROCm (#1268)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
2024-12-18 17:38:38 +08:00
Artem Astafev
df7c192835 Added docker compose example for AgentQnA deployment on AMD ROCm (#1166)
Signed-off-by: artem-astafev <a.astafev@datamonsters.com>
2024-12-18 10:21:00 +08:00
Letong Han
f930638844 Update Multimodal Docker File Path (#1252)
Signed-off-by: letonghan <letong.han@intel.com>
2024-12-17 17:30:29 +08:00
Sun, Xuehao
5613add4dd Change to pull_request_target for dependency review workflow (#1256)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
2024-12-17 12:05:02 +08:00
lkk
e18369ba0d remove examples gateway. (#1250)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-14 13:19:51 +08:00
lkk
2af1ea0f8e remove examples gateway. (#1243)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-13 15:16:11 +08:00
Melanie Hart Buehler
c760cac2f4 Adds audio querying to MultimodalQ&A Example (#1225)
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Co-authored-by: Omar Khleif <omar.khleif@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-12-12 16:05:14 +08:00
Li Gang
a50e4e6f9f [DocIndexRetriever] enable the without-rerank flavor (#1223)
Signed-off-by: Li Gang <gang.g.li@intel.com>
Co-authored-by: ligang <ligang@ligang-nuc9v.bj.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-12 09:34:21 +08:00
Omar Khleif
00b526c8e5 Changed Default UI to Gradio (#1246)
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
2024-12-11 11:04:10 -08:00
Wang, Kai Lawrence
4c01e14642 [ChatQnA] Remove enforce-eager to enable HPU graphs for better vLLM perf (#1210)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
2024-12-10 13:19:15 +08:00
Lianhao Lu
6f9f6f0bad Remove deprecated docker compose files (#1238)
Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
2024-12-10 09:43:19 +08:00
Pranav Singh
893f324d07 [ChatQNA] Fixes Embedding Endpoint (#1230)
Signed-off-by: Pranav Singh <pranav.singh@intel.com>
2024-12-09 10:12:16 +08:00
Artem Astafev
77e640e2f3 Added compose example for VisualQnA deployment on AMD ROCm systems (#1201)
Signed-off-by: artem-astafev <a.astafev@datamonsters.com>
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-07 18:58:40 +08:00
Mustafa
07e47a1f38 Update tests for issue 1229 (#1231)
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-07 09:07:52 +08:00
lkk
bde285dfce move examples gateway (#992)
Co-authored-by: root <root@idc708073.jf.intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sihan Chen <39623753+Spycsh@users.noreply.github.com>
2024-12-06 14:40:25 +08:00
WenjiaoYue
f5c08d4fbb Update audioQnA compose (#1227)
Signed-off-by: Yue, Wenjiao <wenjiao.yue@intel.com>
2024-12-05 16:23:47 +08:00
pallavijaini0525
3a371ac102 Updated the Pinecone readme to reflect the new structure (#1222)
Signed-off-by: Pallavi Jaini <pallavi.jaini@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-05 10:04:09 +08:00
sgurunat
031cf6e1ff ChatQnA: Update kubernetes xeon chatqna remote inference and svelte UI (#1215)
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-04 22:40:03 +08:00
sgurunat
3299e5c9f5 ChatQnA: Update chatqna-vllm-remote-inference (#1224)
Signed-off-by: sgurunat <gurunath.s@intel.com>
2024-12-04 22:33:27 +08:00
ZePan110
340796bbae Split ChatQnA manifest test (#1190)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-04 15:17:46 +08:00
Lianhao Lu
8182a83382 CI: Add check for conflict image build definition (#1184)
Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
2024-12-03 10:46:16 +08:00
WenjiaoYue
8192c3166f Update OPEA example package.json version (#1211)
Signed-off-by: Yue, Wenjiao <wenjiao.yue@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-02 21:33:30 +08:00
chen, suyue
240054ac52 CD workflow update (#1221)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-12-02 17:42:02 +08:00
Neo Zhang Jianyu
c9caf1c083 fix file name (#1219)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
2024-12-02 14:29:20 +08:00
Neo Zhang Jianyu
a426a9a51d add label automaticly when create issue (#1217)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
2024-12-02 13:41:22 +08:00
Zhu Yongbo
bb466b3791 EdgeCraft RAG UI bug fix (#1189)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-02 11:47:04 +08:00
chen, suyue
0f8344e4f5 Update test params (#1182)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-29 15:47:15 +08:00
ZePan110
ed8dbaac47 Revert "WA for the issue of vllm Dockerfile.cpu build failure (#1195)" (#1206) 2024-11-28 13:36:14 +08:00
ZePan110
e8cffc6146 Check image and service names and Dockerfile in build.yaml (#1209)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-28 13:14:11 +08:00
Sihan Chen
907b30b7fe Refactor service names (#1199) 2024-11-28 10:01:31 +08:00
Letong Han
545aa571bf [ChatQnA] Update Benchmark E2E Parameters (#1200)
Signed-off-by: letonghan <letong.han@intel.com>
2024-11-27 17:11:11 +08:00
ZePan110
5422bcb970 WA for the issue of vllm Dockerfile.cpu build failure (#1195)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-11-27 14:51:19 +08:00
VincyZhang
736155ca95 Detect dangerous command (#1179)
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
2024-11-27 11:43:56 +08:00
ZePan110
39fa25e03a Limit the version of vllm to avoid dockers build failures. (#1183)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-11-25 10:33:33 +08:00
Wang, Kai Lawrence
ac470421d0 Update the llm backend ports (#1172)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
2024-11-22 09:20:09 +08:00
Mingyuan Qi
edcd7c9d6a Fix code scanning alert no. 21: Uncontrolled data used in path expression (#1171)
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2024-11-21 20:36:28 +08:00
bjzhjing
ef2047b070 Adjustments for helm release change (#1173)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-21 14:14:27 +08:00
Letong Han
94231584aa Fix Translation Manifest CI with MODEL_ID (#1169)
Signed-off-by: letonghan <letong.han@intel.com>
2024-11-21 10:48:52 +08:00
minmin-intel
c5177c5e2f Fix DocIndexRetriever CI error on Xeon (#1167)
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-21 09:30:11 +08:00
Artem Astafev
006c61bcbb Add example for AudioQnA deploy in AMD ROCm (#1147)
Signed-off-by: artem-astafev <a.astafev@datamonsters.com>
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
2024-11-20 20:46:27 +08:00
chen, suyue
cc108b5a18 Fix DBQnA image build (#1165)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-20 10:56:49 +08:00
chen, suyue
f70d9c3853 chatqna benchmark for v1.1 release (#1120)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-19 22:57:25 +08:00
ZePan110
8808b51e42 Rename image name XXX-hpu to XXX-gaudi (#1154)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2024-11-19 22:18:41 +08:00
chen, suyue
17d4b0c97f freeze nodejs version in CI test (#1162)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-19 13:22:56 +08:00
Sun, Xuehao
3a03d31f8f Update manual-freeze-tag workflow (#1161)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
2024-11-19 11:00:36 +08:00
dependabot[bot]
179fd84362 Bump gradio from 4.44.0 to 5.5.0 in /DocSum/ui/gradio (#1157)
Signed-off-by: dependabot[bot] <support@github.com>
2024-11-18 23:50:56 +08:00
chen, suyue
9ba034b22d fix the docker image name for release image build (#1152)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-18 23:48:01 +08:00
jotpalch
c3e6f43ece Fix command in README for deploying ChatQnA application (#1156) 2024-11-18 22:59:22 +08:00
Theresa
1ac756a1c7 Rename the GraphRAG UI image (#1155)
Signed-off-by: ichbinblau <theresa.shan@intel.com>
2024-11-18 20:07:22 +08:00
sgurunat
56f770cb28 ChatQnA with Remote Inference Endpoints (Kubernetes) (#1149)
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 20:06:17 +08:00
XinyaoWa
0cdeb946e4 DocSum Manifest support multimedia (#1158)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 18:46:01 +08:00
Artem Astafev
5648839411 Add compose example for FaqGen AMD ROCm (#1126)
Signed-off-by: artem-astafev <a.astafev@datamonsters.com>
2024-11-18 17:38:21 +08:00
Mustafa
eb91d1f054 Docsum (#1095)
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: XinyaoWa <xinyao.wang@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 17:15:42 +08:00
Wang, Kai Lawrence
2587179224 Add instructions of modifying reranking docker image for NVGPU (#1133)
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 15:37:32 +08:00
chyundunovDatamonsters
7e62175c2e Adding files to deploy CodeTrans application on AMD GPU (#1138)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
2024-11-18 14:58:38 +08:00
Louie Tsai
152adf8012 maintain a version info for docker_compose yaml files among release (#1141)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2024-11-17 22:39:41 -08:00
chyundunovDatamonsters
83172e9a99 Adding files to deploy CodeGen application on AMD GPU (#1130)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 14:36:23 +08:00
Liang Lv
fb514bb8ba Add chatqna wrapper for multiple model selection (#1144)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Ying Hu <ying.hu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-18 10:48:09 +08:00
Artem Astafev
b1bb6db52d Add compose example for DocSum amd rocm deployment (#1125)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-18 09:09:12 +08:00
rui2zhang
7949045176 EdgeCraftRAG: Add E2E test cases for EdgeCraftRAG - local LLM and vllm (#1137)
Signed-off-by: Zhang, Rui <rui2.zhang@intel.com>
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mingyuan Qi <mingyuan.qi@intel.com>
2024-11-17 18:22:32 +08:00
Lianhao Lu
cbe952ec5e Fail CI manifest test if response content is not expected (#1145)
Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-11-17 12:46:31 +08:00
chen, suyue
3b1a9fe9e1 optimize hardware list for test (#1151)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-15 22:46:02 +08:00
chen, suyue
e66d7fe381 fix typo involved in ci workflow (#1150)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-15 21:19:29 +08:00
Artem Astafev
6d3a017609 Add compose example for ChatQnA AMD ROCm deployment (#1122)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-15 17:24:06 +08:00
Ying Hu
dbf4ba03fa Update AgentQnA README.md for refactor doc structure (#1146)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-15 16:30:13 +08:00
XinyaoWa
4f96d9e605 vllm hpu fix version for bug fix (#1142)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2024-11-15 15:12:53 +08:00
Ying Hu
a8f4245384 Update README.md for usage experience (#1135)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-11-15 14:23:12 +08:00
Mingyuan Qi
096a37aacc EdgeCraftRAG: Fix multiple issues (#1143)
Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-15 14:01:27 +08:00
rbrugaro
6f8fa6a689 Grag ex1.1 (#1123)
Signed-off-by: Rita Brugarolas <rita.brugarolas.brufau@intel.com>
Signed-off-by: theresa <theresa.shan@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: theresa <theresa.shan@intel.com>
2024-11-15 13:17:06 +08:00
Letong Han
39f68d5d6b Fix SearchQnA CI Issue (#1134)
Signed-off-by: letonghan <letong.han@intel.com>
2024-11-15 10:01:27 +08:00
Louie Tsai
00d9bb6128 Enable vLLM Profiling for ChatQnA on Gaudi (#1128)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2024-11-14 15:46:33 -08:00
Abolfazl Shahbazi
59b624c677 Fix minor documentation build issue (#1139)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-11-14 15:29:50 -08:00
chen, suyue
2b2c7ee2f5 upgrade setuptools version to fix CVE-2024-6345 (#999)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2024-11-14 14:57:16 +08:00
Hoong Tee, Yeoh
6b9a27dd83 DBQnA: Include workflow in README (#956)
Signed-off-by: Yeoh, Hoong Tee <hoong.tee.yeoh@intel.com>
2024-11-14 14:05:28 +08:00
Yi Yao
5720cd45c0 Add benchmark launcher for AudioQnA (#981)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-14 13:58:51 +08:00
XinyaoWa
73879d3cec fix faq ui bug (#1118)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-14 10:00:30 +08:00
Lucas Melo
7c9ed04132 ChatQnA - Add Terraform and Ansible Modules information (#970)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: lucasmelogithub <lucas.melo@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
2024-11-13 11:42:12 -08:00
lvliang-intel
9ff7df9202 Use fixed version of TEI Gaudi for stability (#1101)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
2024-11-13 10:45:50 -08:00
Abolfazl Shahbazi
b5f95f735e Fix missing end of file chars (#1106)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-13 09:40:53 -08:00
chen, suyue
393367e9f1 Fix left issue of tgi version update (#1121)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-13 15:42:42 +08:00
Louie Tsai
7adbba6add Enable vLLM Profiling for ChatQnA (#1124) 2024-11-13 11:26:31 +08:00
pallavijaini0525
0d52c2f003 Pinecone update to Readme and docker compose for ChatQnA (#540)
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
Signed-off-by: AI Workloads <aigoldrush1@g2-r3-2.iind.intel.com>
Signed-off-by: Pallavi Jaini <pallavi,jaini@intel.com>
Signed-off-by: Pallavi Jaini <pallavi.jaini@intel.com>
Signed-off-by: root <root@test-pjaini.535545281608.us-region-2.idcservice.net>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: AI Workloads <aigoldrush1@g2-r3-2.iind.intel.com>
Co-authored-by: Pallavi Jaini <pallavi,jaini@intel.com>
Co-authored-by: root <root@test-pjaini.535545281608.us-region-2.idcservice.net>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-13 09:32:37 +08:00
lvliang-intel
1ff85f6a85 Upgrade TGI Gaudi version to v2.0.6 (#1088)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2024-11-12 14:38:22 +08:00
bjzhjing
f7a7f8aa3f Fix typo (#1117)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-12 09:54:05 +08:00
lvliang-intel
e3187be819 Update ChatQnA manifests using always pull image policy (#1100)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2024-11-11 14:37:14 +08:00
Sihan Chen
abd9d12937 Fix non stream case (#1115)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-11-11 14:18:42 +08:00
bjzhjing
a7353bbaa4 Refine performance directory (#1017)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-11-11 13:58:46 +08:00
Letong Han
aa314f6757 [Readme] Update ChatQnA Readme for LLM Endpoint (#1086)
Signed-off-by: letonghan <letong.han@intel.com>
2024-11-11 13:53:06 +08:00
488 changed files with 17160 additions and 11401 deletions

38
.github/CODEOWNERS vendored
View File

@@ -1,17 +1,23 @@
/AgentQnA/ kaokao.lv@intel.com
/AudioQnA/ sihan.chen@intel.com
/ChatQnA/ liang1.lv@intel.com
/CodeGen/ liang1.lv@intel.com
/CodeTrans/ sihan.chen@intel.com
/DocSum/ letong.han@intel.com
/.github/ suyue.chen@intel.com ze.pan@intel.com
/AgentQnA/ kaokao.lv@intel.com minmin.hou@intel.com
/AudioQnA/ sihan.chen@intel.com wenjiao.yue@intel.com
/AvatarChatbot/ chun.tao@intel.com kaokao.lv@intel.com
/ChatQnA/ liang1.lv@intel.com letong.han@intel.com
/CodeGen/ liang1.lv@intel.com xinyao.wang@intel.com
/CodeTrans/ sihan.chen@intel.com xinyao.wang@intel.com
/DBQnA/ supriya.krishnamurthi@intel.com liang1.lv@intel.com
/DocIndexRetriever/ kaokao.lv@intel.com chendi.xue@intel.com
/InstructionTuning xinyu.ye@intel.com
/RerankFinetuning xinyu.ye@intel.com
/MultimodalQnA tiep.le@intel.com
/FaqGen/ xinyao.wang@intel.com
/SearchQnA/ sihan.chen@intel.com
/Translation/ liang1.lv@intel.com
/VisualQnA/ liang1.lv@intel.com
/ProductivitySuite/ hoong.tee.yeoh@intel.com
/VideoQnA huiling.bao@intel.com
/*/ liang1.lv@intel.com
/DocSum/ letong.han@intel.com xinyao.wang@intel.com
/EdgeCraftRAG/ yongbo.zhu@intel.com mingyuan.qi@intel.com
/FaqGen/ yogesh.pandey@intel.com xinyao.wang@intel.com
/GraphRAG/ rita.brugarolas.brufau@intel.com abolfazl.shahbazi@intel.com
/InstructionTuning/ xinyu.ye@intel.com kaokao.lv@intel.com
/MultimodalQnA/ melanie.h.buehler@intel.com tiep.le@intel.com
/ProductivitySuite/ jaswanth.karani@intel.com hoong.tee.yeoh@intel.com
/RerankFinetuning/ xinyu.ye@intel.com kaokao.lv@intel.com
/SearchQnA/ sihan.chen@intel.com letong.han@intel.com
/Text2Image/ wenjiao.yue@intel.com xinyu.ye@intel.com
/Translation/ liang1.lv@intel.com sihan.chen@intel.com
/VideoQnA/ huiling.bao@intel.com xinyao.wang@intel.com
/VisualQnA/ liang1.lv@intel.com sihan.chen@intel.com
/*/ liang1.lv@intel.com feng.tian@intel.com suyue.chen@intel.com

View File

@@ -4,6 +4,7 @@
name: Report Bug
description: Used to report bug
title: "[Bug]"
labels: ["bug"]
body:
- type: dropdown
id: priority

View File

@@ -4,6 +4,7 @@
name: Report Feature
description: Used to report feature
title: "[Feature]"
labels: ["feature"]
body:
- type: dropdown
id: priority

View File

@@ -1,2 +1,2 @@
ModelIn
modelin
modelin

View File

@@ -1,2 +1,2 @@
Copyright (C) 2024 Intel Corporation
SPDX-License-Identifier: Apache-2.0
SPDX-License-Identifier: Apache-2.0

View File

@@ -77,9 +77,9 @@ jobs:
git clone https://github.com/vllm-project/vllm.git
cd vllm && git rev-parse HEAD && cd ../
fi
if [[ $(grep -c "vllm-hpu:" ${docker_compose_path}) != 0 ]]; then
if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork && git rev-parse HEAD && cd ../
cd vllm-fork && git checkout 3c39626 && cd ../
fi
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps && git checkout ${{ inputs.opea_branch }} && git rev-parse HEAD && cd ../

View File

@@ -14,7 +14,7 @@ on:
test_mode:
required: false
type: string
default: 'docker_compose'
default: 'compose'
outputs:
run_matrix:
description: "The matrix string"
@@ -42,6 +42,12 @@ jobs:
ref: ${{ env.CHECKOUT_REF }}
fetch-depth: 0
- name: Check Dangerous Command Injection
if: github.event_name == 'pull_request' || github.event_name == 'pull_request_target'
uses: opea-project/validation/actions/check-cmd@main
with:
work_dir: ${{ github.workspace }}
- name: Get test matrix
id: get-test-matrix
run: |

View File

@@ -67,36 +67,6 @@ jobs:
make docker.build
make docker.push
- name: Scan gmcmanager
if: ${{ inputs.node == 'gaudi' }}
uses: opea-project/validation/actions/trivy-scan@main
with:
image-ref: ${{ env.DOCKER_REGISTRY }}/gmcmanager:${{ env.VERSION }}
output: gmcmanager-scan.txt
- name: Upload gmcmanager scan result
if: ${{ inputs.node == 'gaudi' }}
uses: actions/upload-artifact@v4.3.4
with:
name: gmcmanager-scan
path: gmcmanager-scan.txt
overwrite: true
- name: Scan gmcrouter
if: ${{ inputs.node == 'gaudi' }}
uses: opea-project/validation/actions/trivy-scan@main
with:
image-ref: ${{ env.DOCKER_REGISTRY }}/gmcrouter:${{ env.VERSION }}
output: gmcrouter-scan.txt
- name: Upload gmcrouter scan result
if: ${{ inputs.node == 'gaudi' }}
uses: actions/upload-artifact@v4.3.4
with:
name: gmcrouter-scan
path: gmcrouter-scan.txt
overwrite: true
- name: Clean up images
if: always()
run: |

View File

@@ -22,7 +22,72 @@ on:
type: string
jobs:
get-test-case:
runs-on: ubuntu-latest
outputs:
test_cases: ${{ steps.test-case-matrix.outputs.test_cases }}
CHECKOUT_REF: ${{ steps.get-checkout-ref.outputs.CHECKOUT_REF }}
steps:
- name: Get checkout ref
id: get-checkout-ref
run: |
if [ "${{ github.event_name }}" == "pull_request" ] || [ "${{ github.event_name }}" == "pull_request_target" ]; then
CHECKOUT_REF=refs/pull/${{ github.event.number }}/merge
else
CHECKOUT_REF=${{ github.ref }}
fi
echo "CHECKOUT_REF=${CHECKOUT_REF}" >> $GITHUB_OUTPUT
echo "checkout ref ${CHECKOUT_REF}"
- name: Checkout out Repo
uses: actions/checkout@v4
with:
ref: ${{ steps.get-checkout-ref.outputs.CHECKOUT_REF }}
fetch-depth: 0
- name: Get test matrix
shell: bash
id: test-case-matrix
run: |
example_l=$(echo ${{ inputs.example }} | tr '[:upper:]' '[:lower:]')
cd ${{ github.workspace }}/${{ inputs.example }}/tests
run_test_cases=""
default_test_case=$(find . -type f -name "test_manifest_on_${{ inputs.hardware }}.sh" | cut -d/ -f2)
if [ "$default_test_case" ]; then run_test_cases="$default_test_case"; fi
other_test_cases=$(find . -type f -name "test_manifest_*_on_${{ inputs.hardware }}.sh" | cut -d/ -f2)
echo "default_test_case=$default_test_case"
echo "other_test_cases=$other_test_cases"
if [ "${{ inputs.tag }}" == "ci" ]; then
base_commit=$(curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
"https://api.github.com/repos/opea-project/GenAIExamples/commits?sha=${{ github.event.pull_request.base.ref }}" | jq -r '.[0].sha')
merged_commit=$(git log -1 --format='%H')
changed_files="$(git diff --name-only ${base_commit} ${merged_commit} | grep -vE '${{ inputs.diff_excluded_files }}')" || true
fi
for test_case in $other_test_cases; do
if [ "${{ inputs.tag }}" == "ci" ]; then
flag=${test_case%_on_*}
flag=${flag#test_compose_}
if [[ $(printf '%s\n' "${changed_files[@]}" | grep ${{ inputs.example }} | grep ${flag}) ]]; then
run_test_cases="$run_test_cases $test_case"
fi
else
run_test_cases="$run_test_cases $test_case"
fi
done
test_cases=$(echo $run_test_cases | tr ' ' '\n' | sort -u | jq -R '.' | jq -sc '.')
echo "test_cases=$test_cases"
echo "test_cases=$test_cases" >> $GITHUB_OUTPUT
manifest-test:
needs: [get-test-case]
strategy:
matrix:
test_case: ${{ fromJSON(needs.get-test-case.outputs.test_cases) }}
fail-fast: false
runs-on: "k8s-${{ inputs.hardware }}"
continue-on-error: true
steps:
@@ -45,11 +110,14 @@ jobs:
fetch-depth: 0
- name: Set variables
env:
test_case: ${{ matrix.test_case }}
run: |
echo "IMAGE_REPO=${OPEA_IMAGE_REPO}opea" >> $GITHUB_ENV
echo "IMAGE_TAG=${{ inputs.tag }}" >> $GITHUB_ENV
lower_example=$(echo "${{ inputs.example }}" | tr '[:upper:]' '[:lower:]')
echo "NAMESPACE=$lower_example-$(tr -dc a-z0-9 </dev/urandom | head -c 16)" >> $GITHUB_ENV
name=$(echo "$test_case" | cut -d/ -f2 | cut -d'_' -f3- |cut -d'_' -f1 | grep -v 'on' | sed 's/^/-/')
echo "NAMESPACE=$lower_example$name-$(tr -dc a-z0-9 </dev/urandom | head -c 16)" >> $GITHUB_ENV
echo "ROLLOUT_TIMEOUT_SECONDS=1800s" >> $GITHUB_ENV
echo "KUBECTL_TIMEOUT_SECONDS=60s" >> $GITHUB_ENV
echo "continue_test=true" >> $GITHUB_ENV
@@ -59,15 +127,19 @@ jobs:
- name: Kubectl install
id: install
env:
test_case: ${{ matrix.test_case }}
run: |
if [[ ! -f ${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh ]]; then
set -x
echo "test_case=$test_case"
if [[ ! -f ${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} ]]; then
echo "No test script found, exist test!"
exit 0
else
${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh init_${{ inputs.example }}
${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} init_${{ inputs.example }}
echo "should_cleanup=true" >> $GITHUB_ENV
kubectl create ns $NAMESPACE
${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh install_${{ inputs.example }} $NAMESPACE
${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} install_${{ inputs.example }} $NAMESPACE
echo "Testing ${{ inputs.example }}, waiting for pod ready..."
if kubectl rollout status deployment --namespace "$NAMESPACE" --timeout "$ROLLOUT_TIMEOUT_SECONDS"; then
echo "Testing manifests ${{ inputs.example }}, waiting for pod ready done!"
@@ -82,14 +154,16 @@ jobs:
- name: Validate e2e test
if: always()
env:
test_case: ${{ matrix.test_case }}
run: |
if $skip_validate; then
echo "Skip validate"
else
if ${{ github.workspace }}/${{ inputs.example }}/tests/test_manifest_on_${{ inputs.hardware }}.sh validate_${{ inputs.example }} $NAMESPACE ; then
echo "Validate ${{ inputs.example }} successful!"
if ${{ github.workspace }}/${{ inputs.example }}/tests/${test_case} validate_${{ inputs.example }} $NAMESPACE ; then
echo "Validate ${test_case} successful!"
else
echo "Validate ${{ inputs.example }} failure!!!"
echo "Validate ${test_case} failure!!!"
echo "Check the logs in 'Dump logs when e2e test failed' step!!!"
exit 1
fi

View File

@@ -111,6 +111,17 @@ jobs:
ref: ${{ needs.get-test-case.outputs.CHECKOUT_REF }}
fetch-depth: 0
- name: Clean up container before test
shell: bash
run: |
docker ps
cd ${{ github.workspace }}/${{ inputs.example }}
export test_case=${{ matrix.test_case }}
export hardware=${{ inputs.hardware }}
bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "containers"
bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "ports"
docker ps
- name: Run test
shell: bash
env:
@@ -123,6 +134,7 @@ jobs:
SERVING_TOKEN: ${{ secrets.SERVING_TOKEN }}
IMAGE_REPO: ${{ inputs.registry }}
IMAGE_TAG: ${{ inputs.tag }}
opea_branch: "refactor_comps"
example: ${{ inputs.example }}
hardware: ${{ inputs.hardware }}
test_case: ${{ matrix.test_case }}
@@ -131,21 +143,14 @@ jobs:
if [[ "$IMAGE_REPO" == "" ]]; then export IMAGE_REPO="${OPEA_IMAGE_REPO}opea"; fi
if [ -f ${test_case} ]; then timeout 30m bash ${test_case}; else echo "Test script {${test_case}} not found, skip test!"; fi
- name: Clean up container
- name: Clean up container after test
shell: bash
if: cancelled() || failure()
run: |
cd ${{ github.workspace }}/${{ inputs.example }}/docker_compose
test_case=${{ matrix.test_case }}
flag=${test_case%_on_*}
flag=${flag#test_}
yaml_file=$(find . -type f -wholename "*${{ inputs.hardware }}/${flag}.yaml")
echo $yaml_file
container_list=$(cat $yaml_file | grep container_name | cut -d':' -f2)
for container_name in $container_list; do
cid=$(docker ps -aq --filter "name=$container_name")
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
done
cd ${{ github.workspace }}/${{ inputs.example }}
export test_case=${{ matrix.test_case }}
export hardware=${{ inputs.hardware }}
bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "containers"
docker system prune -f
docker rmi $(docker images --filter reference="*:5000/*/*" -q) || true

View File

@@ -0,0 +1,31 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Clean up container on manual event
on:
workflow_dispatch:
inputs:
node:
default: "rocm"
description: "Hardware to clean"
required: true
type: string
clean_list:
default: ""
description: "docker command to clean"
required: false
type: string
jobs:
clean:
runs-on: "${{ inputs.node }}"
steps:
- name: Clean up container
run: |
docker ps
if [ "${{ inputs.clean_list }}" ]; then
echo "----------stop and remove containers----------"
docker stop ${{ inputs.clean_list }} && docker rm ${{ inputs.clean_list }}
echo "----------container removed----------"
docker ps
fi

View File

@@ -12,7 +12,7 @@ on:
type: string
examples:
default: "ChatQnA"
description: 'List of examples to test [AudioQnA,ChatQnA,CodeGen,CodeTrans,DocSum,FaqGen,SearchQnA,Translation]'
description: 'List of examples to test [AgentQnA,AudioQnA,ChatQnA,CodeGen,CodeTrans,DocIndexRetriever,DocSum,FaqGen,InstructionTuning,MultimodalQnA,ProductivitySuite,RerankFinetuning,SearchQnA,Translation,VideoQnA,VisualQnA,AvatarChatbot,Text2Image,WorkflowExecAgent,DBQnA,EdgeCraftRAG,GraphRAG]'
required: true
type: string
tag:
@@ -51,7 +51,7 @@ on:
required: false
type: string
inject_commit:
default: true
default: false
description: "inject commit to docker images true or false"
required: false
type: string

View File

@@ -1,13 +1,13 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Freeze OPEA images release tag in readme on manual event
name: Freeze OPEA images release tag
on:
workflow_dispatch:
inputs:
tag:
default: "latest"
default: "1.1.0"
description: "Tag to apply to images"
required: true
type: string
@@ -23,10 +23,6 @@ jobs:
fetch-depth: 0
ref: ${{ github.ref }}
- uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Set up Git
run: |
git config --global user.name "NeuralChatBot"
@@ -35,9 +31,10 @@ jobs:
- name: Run script
run: |
find . -name "*.md" | xargs sed -i "s|^docker\ compose|TAG=${{ github.event.inputs.tag }}\ docker\ compose|g"
find . -type f -name "*.yaml" \( -path "*/benchmark/*" -o -path "*/kubernetes/*" \) | xargs sed -i -E 's/(opea\/[A-Za-z0-9\-]*:)latest/\1${{ github.event.inputs.tag }}/g'
find . -type f -name "*.md" \( -path "*/benchmark/*" -o -path "*/kubernetes/*" \) | xargs sed -i -E 's/(opea\/[A-Za-z0-9\-]*:)latest/\1${{ github.event.inputs.tag }}/g'
IFS='.' read -r major minor patch <<< "${{ github.event.inputs.tag }}"
echo "VERSION_MAJOR ${major}" > version.txt
echo "VERSION_MINOR ${minor}" >> version.txt
echo "VERSION_PATCH ${patch}" >> version.txt
- name: Commit changes
run: |

View File

@@ -12,7 +12,7 @@ on:
type: string
example:
default: "ChatQnA"
description: 'Build images belong to which example?'
description: 'Build images belong to which example? [AgentQnA,AudioQnA,ChatQnA,CodeGen,CodeTrans,DocIndexRetriever,DocSum,FaqGen,InstructionTuning,MultimodalQnA,ProductivitySuite,RerankFinetuning,SearchQnA,Translation,VideoQnA,VisualQnA,AvatarChatbot,Text2Image,WorkflowExecAgent,DBQnA,EdgeCraftRAG,GraphRAG]'
required: true
type: string
services:
@@ -31,7 +31,7 @@ on:
required: false
type: string
inject_commit:
default: true
default: false
description: "inject commit to docker images true or false"
required: false
type: string

View File

@@ -0,0 +1,59 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Clean up Local Registry on manual event
on:
workflow_dispatch:
inputs:
nodes:
default: "gaudi,xeon"
description: "Hardware to clean up"
required: true
type: string
env:
EXAMPLES: ${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
jobs:
get-build-matrix:
runs-on: ubuntu-latest
outputs:
examples: ${{ steps.get-matrix.outputs.examples }}
nodes: ${{ steps.get-matrix.outputs.nodes }}
steps:
- name: Create Matrix
id: get-matrix
run: |
examples=($(echo ${EXAMPLES} | tr ',' ' '))
examples_json=$(printf '%s\n' "${examples[@]}" | sort -u | jq -R '.' | jq -sc '.')
echo "examples=$examples_json" >> $GITHUB_OUTPUT
nodes=($(echo ${{ inputs.nodes }} | tr ',' ' '))
nodes_json=$(printf '%s\n' "${nodes[@]}" | sort -u | jq -R '.' | jq -sc '.')
echo "nodes=$nodes_json" >> $GITHUB_OUTPUT
clean-up:
needs: get-build-matrix
strategy:
matrix:
node: ${{ fromJson(needs.get-build-matrix.outputs.nodes) }}
fail-fast: false
runs-on: "docker-build-${{ matrix.node }}"
steps:
- name: Clean Up Local Registry
run: |
echo "Cleaning up local registry on ${{ matrix.node }}"
bash /home/sdp/workspace/fully_registry_cleanup.sh
docker ps | grep registry
build:
needs: [get-build-matrix, clean-up]
strategy:
matrix:
example: ${{ fromJson(needs.get-build-matrix.outputs.examples) }}
node: ${{ fromJson(needs.get-build-matrix.outputs.nodes) }}
fail-fast: false
uses: ./.github/workflows/_example-workflow.yml
with:
node: ${{ matrix.node }}
example: ${{ matrix.example }}
secrets: inherit

View File

@@ -5,11 +5,11 @@ name: Nightly build/publish latest docker images
on:
schedule:
- cron: "30 13 * * *" # UTC time
- cron: "30 14 * * *" # UTC time
workflow_dispatch:
env:
EXAMPLES: "AgentQnA,AudioQnA,ChatQnA,CodeGen,CodeTrans,DocIndexRetriever,DocSum,FaqGen,InstructionTuning,MultimodalQnA,ProductivitySuite,RerankFinetuning,SearchQnA,Translation,VideoQnA,VisualQnA"
EXAMPLES: ${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
TAG: "latest"
PUBLISH_TAGS: "latest"
@@ -32,7 +32,7 @@ jobs:
echo "TAG=$TAG" >> $GITHUB_OUTPUT
echo "PUBLISH_TAGS=$PUBLISH_TAGS" >> $GITHUB_OUTPUT
build:
build-and-test:
needs: get-build-matrix
strategy:
matrix:
@@ -42,6 +42,7 @@ jobs:
with:
node: gaudi
example: ${{ matrix.example }}
test_compose: true
secrets: inherit
get-image-list:
@@ -51,7 +52,7 @@ jobs:
examples: ${{ needs.get-build-matrix.outputs.EXAMPLES }}
publish:
needs: [get-build-matrix, get-image-list, build]
needs: [get-build-matrix, get-image-list, build-and-test]
strategy:
matrix:
image: ${{ fromJSON(needs.get-image-list.outputs.matrix) }}

View File

@@ -0,0 +1,40 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Check Duplicated Images
on:
pull_request:
branches: [main, genaicomps_refactor]
types: [opened, reopened, ready_for_review, synchronize]
paths:
- "**/docker_image_build/*.yaml"
- ".github/workflows/pr-check-duplicated-image.yml"
- ".github/workflows/scripts/check_duplicated_image.py"
workflow_dispatch:
# If there is a new commit, the previous jobs will be canceled
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check-duplicated-image:
runs-on: ubuntu-latest
steps:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*
- name: Checkout Repo
uses: actions/checkout@v4
- name: Check all the docker image build files
run: |
pip install PyYAML
cd ${{github.workspace}}
build_files=""
for f in `find . -path "*/docker_image_build/build.yaml"`; do
build_files="$build_files $f"
done
python3 .github/workflows/scripts/check_duplicated_image.py $build_files
shell: bash

View File

@@ -34,6 +34,11 @@ jobs:
- name: Checkout out Repo
uses: actions/checkout@v4
- name: Check Dangerous Command Injection
uses: opea-project/validation/actions/check-cmd@main
with:
work_dir: ${{ github.workspace }}
- name: Docker Build
run: |
docker build -f ${{ github.workspace }}/.github/workflows/docker/${{ env.DOCKER_FILE_NAME }}.dockerfile -t ${{ env.REPO_NAME }}:${{ env.REPO_TAG }} .

View File

@@ -2,7 +2,7 @@
# SPDX-License-Identifier: Apache-2.0
name: "Dependency Review"
on: [pull_request]
on: [pull_request_target]
permissions:
contents: read

View File

@@ -4,8 +4,8 @@
name: E2E test with docker compose
on:
pull_request_target:
branches: ["main", "*rc"]
pull_request:
branches: ["main", "*rc", "genaicomps_refactor"]
types: [opened, reopened, ready_for_review, synchronize] # added `ready_for_review` since draft is skipped
paths:
- "**/Dockerfile**"

View File

@@ -0,0 +1,110 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Compose file and dockerfile path checking
on:
pull_request:
branches: [main, genaicomps_refactor]
types: [opened, reopened, ready_for_review, synchronize]
jobs:
check-dockerfile-paths-in-README:
runs-on: ubuntu-latest
steps:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*
- name: Checkout Repo GenAIExamples
uses: actions/checkout@v4
- name: Clone Repo GenAIComps
run: |
cd ..
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps && git checkout refactor_comps
- name: Check for Missing Dockerfile Paths in GenAIComps
run: |
cd ${{github.workspace}}
miss="FALSE"
while IFS=: read -r file line content; do
dockerfile_path=$(echo "$content" | awk -F '-f ' '{print $2}' | awk '{print $1}')
if [[ ! -f "../GenAIComps/${dockerfile_path}" ]]; then
miss="TRUE"
echo "Missing Dockerfile: GenAIComps/${dockerfile_path} (Referenced in GenAIExamples/${file}:${line})"
fi
done < <(grep -Ern 'docker build .* -f comps/.+/Dockerfile' --include='*.md' .)
if [[ "$miss" == "TRUE" ]]; then
exit 1
fi
shell: bash
check-Dockerfile-in-build-yamls:
runs-on: ubuntu-latest
steps:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*
- name: Checkout Repo GenAIExamples
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check Dockerfile path included in image build yaml
if: always()
run: |
set -e
shopt -s globstar
no_add="FALSE"
cd ${{github.workspace}}
Dockerfiles=$(realpath $(find ./ -name '*Dockerfile*'))
if [ -n "$Dockerfiles" ]; then
for dockerfile in $Dockerfiles; do
service=$(echo "$dockerfile" | awk -F '/GenAIExamples/' '{print $2}' | awk -F '/' '{print $2}')
cd ${{github.workspace}}/$service/docker_image_build
all_paths=$(realpath $(awk ' /context:/ { context = $2 } /dockerfile:/ { dockerfile = $2; combined = context "/" dockerfile; gsub(/\/+/, "/", combined); if (index(context, ".") > 0) {print combined}}' build.yaml) 2> /dev/null || true )
if ! echo "$all_paths" | grep -q "$dockerfile"; then
echo "AR: Update $dockerfile to GenAIExamples/$service/docker_image_build/build.yaml. The yaml is used for release images build."
no_add="TRUE"
fi
done
fi
if [[ "$no_add" == "TRUE" ]]; then
exit 1
fi
check-image-and-service-names-in-build-yaml:
runs-on: ubuntu-latest
steps:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*
- name: Checkout Repo GenAIExamples
uses: actions/checkout@v4
- name: Check name agreement in build.yaml
run: |
pip install ruamel.yaml
cd ${{github.workspace}}
consistency="TRUE"
build_yamls=$(find . -name 'build.yaml')
for build_yaml in $build_yamls; do
message=$(python3 .github/workflows/scripts/check-name-agreement.py "$build_yaml")
if [[ "$message" != *"consistent"* ]]; then
consistency="FALSE"
echo "Inconsistent service name and image name found in file $build_yaml."
echo "$message"
fi
done
if [[ "$consistency" == "FALSE" ]]; then
echo "Please ensure that the service and image names are consistent in build.yaml, otherwise we cannot guarantee that your image will be published correctly."
exit 1
fi
shell: bash

View File

@@ -1,47 +1,14 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Check Paths and Hyperlinks
name: Check hyperlinks and relative path validity
on:
pull_request:
branches: [main]
branches: [main, genaicomps_refactor]
types: [opened, reopened, ready_for_review, synchronize]
jobs:
check-dockerfile-paths:
runs-on: ubuntu-latest
steps:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*
- name: Checkout Repo GenAIExamples
uses: actions/checkout@v4
- name: Clone Repo GenAIComps
run: |
cd ..
git clone https://github.com/opea-project/GenAIComps.git
- name: Check for Missing Dockerfile Paths in GenAIComps
run: |
cd ${{github.workspace}}
miss="FALSE"
while IFS=: read -r file line content; do
dockerfile_path=$(echo "$content" | awk -F '-f ' '{print $2}' | awk '{print $1}')
if [[ ! -f "../GenAIComps/${dockerfile_path}" ]]; then
miss="TRUE"
echo "Missing Dockerfile: GenAIComps/${dockerfile_path} (Referenced in GenAIExamples/${file}:${line})"
fi
done < <(grep -Ern 'docker build .* -f comps/.+/Dockerfile' --include='*.md' .)
if [[ "$miss" == "TRUE" ]]; then
exit 1
fi
shell: bash
check-the-validity-of-hyperlinks-in-README:
runs-on: ubuntu-latest
steps:

View File

@@ -8,7 +8,9 @@ on:
branches: [ 'main' ]
paths:
- "**.py"
- "**Dockerfile"
- "**Dockerfile*"
- "**docker_image_build/build.yaml"
- "**/ui/**"
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-on-push
@@ -18,7 +20,7 @@ jobs:
job1:
uses: ./.github/workflows/_get-test-matrix.yml
with:
test_mode: "docker_image_build/build.yaml"
test_mode: "docker_image_build"
image-build:
needs: job1

View File

@@ -0,0 +1,46 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import argparse
from ruamel.yaml import YAML
def parse_yaml_file(file_path):
yaml = YAML()
with open(file_path, "r") as file:
data = yaml.load(file)
return data
def check_service_image_consistency(data):
inconsistencies = []
for service_name, service_details in data.get("services", {}).items():
image_name = service_details.get("image", "")
# Extract the image name part after the last '/'
image_name_part = image_name.split("/")[-1].split(":")[0]
# Check if the service name is a substring of the image name part
if service_name not in image_name_part:
# Get the line number of the service name
line_number = service_details.lc.line + 1
inconsistencies.append((service_name, image_name, line_number))
return inconsistencies
def main():
parser = argparse.ArgumentParser(description="Check service name and image name consistency in a YAML file.")
parser.add_argument("file_path", type=str, help="The path to the YAML file.")
args = parser.parse_args()
data = parse_yaml_file(args.file_path)
inconsistencies = check_service_image_consistency(data)
if inconsistencies:
for service_name, image_name, line_number in inconsistencies:
print(f"Service name: {service_name}, Image name: {image_name}, Line number: {line_number}")
else:
print("All consistent")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,63 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import argparse
import os.path
import subprocess
import sys
import yaml
images = {}
def check_docker_compose_build_definition(file_path):
with open(file_path, "r") as f:
data = yaml.load(f, Loader=yaml.FullLoader)
for service in data["services"]:
if "build" in data["services"][service] and "image" in data["services"][service]:
bash_command = "echo " + data["services"][service]["image"]
image = (
subprocess.run(["bash", "-c", bash_command], check=True, capture_output=True)
.stdout.decode("utf-8")
.strip()
)
build = data["services"][service]["build"]
context = build.get("context", "")
dockerfile = os.path.normpath(
os.path.join(os.path.dirname(file_path), context, build.get("dockerfile", ""))
)
if not os.path.isfile(dockerfile):
# dockerfile not exists in the current repo context, assume it's in 3rd party context
dockerfile = os.path.normpath(os.path.join(context, build.get("dockerfile", "")))
item = {"file_path": file_path, "service": service, "dockerfile": dockerfile}
if image in images and dockerfile != images[image]["dockerfile"]:
print("ERROR: !!! Found Conflicts !!!")
print(f"Image: {image}, Dockerfile: {dockerfile}, defined in Service: {service}, File: {file_path}")
print(
f"Image: {image}, Dockerfile: {images[image]['dockerfile']}, defined in Service: {images[image]['service']}, File: {images[image]['file_path']}"
)
sys.exit(1)
else:
# print(f"Add Image: {image} Dockerfile: {dockerfile}")
images[image] = item
def parse_arg():
parser = argparse.ArgumentParser(
description="Check for conflicts in image build definition in docker-compose.yml files"
)
parser.add_argument("files", nargs="+", help="list of files to be checked")
return parser.parse_args()
def main():
args = parse_arg()
for file_path in args.files:
check_docker_compose_build_definition(file_path)
print("SUCCESS: No Conlicts Found.")
return 0
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,42 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# The test machine used by several opea projects, so the test scripts can't use `docker compose down` to clean up
# the all the containers, ports and networks directly.
# So we need to use the following script to minimize the impact of the clean up.
test_case=${test_case:-"test_compose_on_gaudi.sh"}
hardware=${hardware:-"gaudi"}
flag=${test_case%_on_*}
flag=${flag#test_}
yaml_file=$(find . -type f -wholename "*${hardware}/${flag}.yaml")
echo $yaml_file
case "$1" in
containers)
echo "Stop and remove all containers used by the services in $yaml_file ..."
containers=$(cat $yaml_file | grep container_name | cut -d':' -f2)
for container_name in $containers; do
cid=$(docker ps -aq --filter "name=$container_name")
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
done
;;
ports)
echo "Release all ports used by the services in $yaml_file ..."
pip install jq yq
ports=$(yq '.services[].ports[] | split(":")[0]' $yaml_file | grep -o '[0-9a-zA-Z_-]\+')
echo "$ports"
for port in $ports; do
if [[ $port =~ [a-zA-Z_-] ]]; then
port=$(grep -E "export $port=" tests/$test_case | cut -d'=' -f2)
fi
echo $port
cid=$(docker ps --filter "publish=${port}" --format "{{.ID}}")
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
done
;;
*)
echo "Unknown function: $1"
;;
esac

View File

@@ -16,8 +16,13 @@ for example in ${examples}; do
if [[ ! $(find . -type f | grep ${test_mode}) ]]; then continue; fi
cd tests
ls -l
hardware_list=$(find . -type f -name "test_compose*_on_*.sh" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}'| sort -u)
echo "Test supported hardware list = ${hardware_list}"
if [[ "$test_mode" == "docker_image_build" ]]; then
find_name="test_manifest_on_*.sh"
else
find_name="test_${test_mode}*_on_*.sh"
fi
hardware_list=$(find . -type f -name "${find_name}" | cut -d/ -f2 | cut -d. -f1 | awk -F'_on_' '{print $2}'| sort -u)
echo -e "Test supported hardware list: \n${hardware_list}"
run_hardware=""
if [[ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | cut -d'/' -f2 | grep -E '*.py|Dockerfile*|ui|docker_image_build' ) ]]; then

2
.gitignore vendored
View File

@@ -5,4 +5,4 @@
**/playwright/.cache/
**/test-results/
__pycache__/
__pycache__/

View File

@@ -1 +1 @@
**/kubernetes/
**/kubernetes/

16
.set_env.sh Normal file
View File

@@ -0,0 +1,16 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
#To anounce the version of the codes, please create a version.txt and have following format.
#VERSION_MAJOR 1
#VERSION_MINOR 0
#VERSION_PATCH 0
VERSION_FILE="version.txt"
if [ -f $VERSION_FILE ]; then
VER_OPEA_MAJOR=$(grep "VERSION_MAJOR" $VERSION_FILE | cut -d " " -f 2)
VER_OPEA_MINOR=$(grep "VERSION_MINOR" $VERSION_FILE | cut -d " " -f 2)
VER_OPEA_PATCH=$(grep "VERSION_PATCH" $VERSION_FILE | cut -d " " -f 2)
export TAG=$VER_OPEA_MAJOR.$VER_OPEA_MINOR
echo OPEA Version:$TAG
fi

View File

@@ -83,29 +83,32 @@ flowchart LR
## Deployment with docker
1. Build agent docker image
1. Build agent docker image [Optional]
Note: this is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling images failed.
> [!NOTE]
> the step is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling images failed.
First, clone the opea GenAIComps repo.
First, clone the opea GenAIComps repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIComps.git
```
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIComps.git
```
Then build the agent docker image. Both the supervisor agent and the worker agent will use the same docker image, but when we launch the two agents we will specify different strategies and register different tools.
Then build the agent docker image. Both the supervisor agent and the worker agent will use the same docker image, but when we launch the two agents we will specify different strategies and register different tools.
```
cd GenAIComps
docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
```
```
cd GenAIComps
docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
```
2. Set up environment for this example </br>
First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
@@ -113,6 +116,14 @@ flowchart LR
Second, set up env vars.
```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# for using open-source llms
export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
@@ -147,6 +158,12 @@ flowchart LR
5. Launch agent services</br>
We provide two options for `llm_engine` of the agents: 1. open-source LLMs, 2. OpenAI models via API calls.
Deploy it on Gaudi or Xeon respectively
::::{tab-set}
:::{tab-item} Gaudi
:sync: Gaudi
To use open-source LLMs on Gaudi2, run commands below.
```
@@ -155,6 +172,10 @@ flowchart LR
bash launch_agent_service_tgi_gaudi.sh
```
:::
:::{tab-item} Xeon
:sync: Xeon
To use OpenAI models, run commands below.
```
@@ -162,6 +183,9 @@ flowchart LR
bash launch_agent_service_openai.sh
```
:::
::::
## Validate services
First look at logs of the agent docker containers:
@@ -181,7 +205,7 @@ You should see something like "HTTP server setup successful" if the docker conta
Second, validate worker agent:
```
curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
@@ -189,7 +213,7 @@ curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: app
Third, validate supervisor agent:
```
curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```

View File

@@ -0,0 +1,97 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
agent-tgi-server:
image: ${AGENTQNA_TGI_IMAGE}
container_name: agent-tgi-server
ports:
- "${AGENTQNA_TGI_SERVICE_PORT-8085}:80"
volumes:
- /var/opea/agent-service/:/data
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: "http://${HOST_IP}:${AGENTQNA_TGI_SERVICE_PORT}"
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/${AGENTQNA_CARD_ID}:/dev/dri/${AGENTQNA_CARD_ID}
- /dev/dri/${AGENTQNA_RENDER_ID}:/dev/dri/${AGENTQNA_RENDER_ID}
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192
worker-rag-agent:
image: opea/agent-langchain:latest
container_name: rag-agent-endpoint
volumes:
# - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
- ${TOOLSET_PATH}:/home/user/tools/
ports:
- "9095:9095"
ipc: host
environment:
ip_address: ${ip_address}
strategy: rag_agent_llama
recursion_limit: ${recursion_limit_worker}
llm_engine: tgi
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
streaming: false
tools: /home/user/tools/worker_agent_tools.yaml
require_human_feedback: false
RETRIEVAL_TOOL_URL: ${RETRIEVAL_TOOL_URL}
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-worker-agent-service"
port: 9095
supervisor-react-agent:
image: opea/agent-langchain:latest
container_name: react-agent-endpoint
depends_on:
- agent-tgi-server
- worker-rag-agent
volumes:
# - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
- ${TOOLSET_PATH}:/home/user/tools/
ports:
- "${AGENTQNA_FRONTEND_PORT}:9090"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_langgraph
recursion_limit: ${recursion_limit_supervisor}
llm_engine: tgi
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
streaming: false
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
LANGCHAIN_PROJECT: "opea-supervisor-agent-service"
CRAG_SERVER: $CRAG_SERVER
WORKER_AGENT_URL: $WORKER_AGENT_URL
port: 9090

View File

@@ -0,0 +1,47 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
WORKPATH=$(dirname "$PWD")/..
export ip_address=${host_ip}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
export AGENTQNA_TGI_SERVICE_PORT="8085"
# LLM related environment variables
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
#export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
export temperature=0.01
export max_new_tokens=512
# agent related environment variables
export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
export TOOLSET_PATH=/home/huggingface/datamonsters/amd-opea/GenAIExamples/AgentQnA/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:18881
export AGENTQNA_FRONTEND_PORT="9090"
#retrieval_tool
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:26379"
export INDEX_NAME="rag-redis"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
docker compose -f compose.yaml up -d

View File

@@ -0,0 +1,46 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
WORKPATH=$(dirname "$PWD")/..
export ip_address=${host_ip}
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export AGENTQNA_TGI_IMAGE=ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
export AGENTQNA_TGI_SERVICE_PORT="19001"
# LLM related environment variables
export AGENTQNA_CARD_ID="card1"
export AGENTQNA_RENDER_ID="renderD136"
export HF_CACHE_DIR=${HF_CACHE_DIR}
ls $HF_CACHE_DIR
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export NUM_SHARDS=4
export LLM_ENDPOINT_URL="http://${ip_address}:${AGENTQNA_TGI_SERVICE_PORT}"
export temperature=0.01
export max_new_tokens=512
# agent related environment variables
export AGENTQNA_WORKER_AGENT_SERVICE_PORT="9095"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export recursion_limit_worker=12
export recursion_limit_supervisor=10
export WORKER_AGENT_URL="http://${ip_address}:${AGENTQNA_WORKER_AGENT_SERVICE_PORT}/v1/chat/completions"
export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
export CRAG_SERVER=http://${ip_address}:18881
export AGENTQNA_FRONTEND_PORT="15557"
#retrieval_tool
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export REDIS_URL="redis://${host_ip}:26379"
export INDEX_NAME="rag-redis"
export MEGA_SERVICE_HOST_IP=${host_ip}
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export RERANK_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8889/v1/retrievaltool"
export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"

View File

@@ -1,3 +1,100 @@
# Deployment on Xeon
# Single node on-prem deployment with Docker Compose on Xeon Scalable processors
We deploy the retrieval tool on Xeon. For LLMs, we support OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Xeon. For LLMs, we use OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).
## Deployment with docker
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up environment for this example </br>
```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
#OPANAI_API_KEY if you want to use OpenAI models
export OPENAI_API_KEY=<your-openai-key>
```
3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
First, launch the mega-service.
```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```
Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.
```
bash run_ingest_data.sh
```
4. Launch Tool service
In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```
5. Launch `Agent` service
The configurations of the supervisor agent and the worker agent are defined in the docker-compose yaml file. We currently use openAI GPT-4o-mini as LLM, and llama3.1-70B-instruct (served by TGI-Gaudi) in Gaudi example. To use openai llm, run command below.
```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
bash launch_agent_service_openai.sh
```
6. [Optional] Build `Agent` docker image if pulling images failed.
```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent-langchain:latest -f comps/agent/langchain/Dockerfile .
```
## Validate services
First look at logs of the agent docker containers:
```
# worker agent
docker logs rag-agent-endpoint
```
```
# supervisor agent
docker logs react-agent-endpoint
```
You should see something like "HTTP server setup successful" if the docker containers are started successfully.</p>
Second, validate worker agent:
```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
Third, validate supervisor agent:
```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
## How to register your own tools with agent
You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).

View File

@@ -1,6 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export ip_address=$(hostname -I | awk '{print $1}')
export recursion_limit_worker=12

View File

@@ -0,0 +1,105 @@
# Single node on-prem deployment AgentQnA on Gaudi
This example showcases a hierarchical multi-agent system for question-answering applications. We deploy the example on Gaudi using open-source LLMs,
For more details, please refer to the deployment guide [here](../../../../README.md).
## Deployment with docker
1. First, clone this repo.
```
export WORKDIR=<your-work-directory>
cd $WORKDIR
git clone https://github.com/opea-project/GenAIExamples.git
```
2. Set up environment for this example </br>
```
# Example: host_ip="192.168.1.1" or export host_ip="External_Public_IP"
export host_ip=$(hostname -I | awk '{print $1}')
# if you are in a proxy environment, also set the proxy-related environment variables
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
# for using open-source llms
export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
# Example export HF_CACHE_DIR=$WORKDIR so that no need to redownload every time
export HF_CACHE_DIR=<directory-where-llms-are-downloaded>
```
3. Deploy the retrieval tool (i.e., DocIndexRetriever mega-service)
First, launch the mega-service.
```
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
bash launch_retrieval_tool.sh
```
Then, ingest data into the vector database. Here we provide an example. You can ingest your own data.
```
bash run_ingest_data.sh
```
4. Launch Tool service
In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
```
docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
```
5. Launch `Agent` service
To use open-source LLMs on Gaudi2, run commands below.
```
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi
bash launch_tgi_gaudi.sh
bash launch_agent_service_tgi_gaudi.sh
```
6. [Optional] Build `Agent` docker image if pulling images failed.
```
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/agent-langchain:latest -f comps/agent/langchain/Dockerfile .
```
## Validate services
First look at logs of the agent docker containers:
```
# worker agent
docker logs rag-agent-endpoint
```
```
# supervisor agent
docker logs react-agent-endpoint
```
You should see something like "HTTP server setup successful" if the docker containers are started successfully.</p>
Second, validate worker agent:
```
curl http://${host_ip}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
Third, validate supervisor agent:
```
curl http://${host_ip}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Most recent album by Taylor Swift"
}'
```
## How to register your own tools with agent
You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).

View File

@@ -1,6 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null
WORKPATH=$(dirname "$PWD")/..
# export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"

View File

@@ -3,7 +3,7 @@
services:
tgi-server:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-server
ports:
- "8085:80"

View File

@@ -0,0 +1,76 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
set -ex
WORKPATH=$(dirname "$PWD")
export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export HF_CACHE_DIR=$WORKDIR/hf_cache
if [ ! -d "$HF_CACHE_DIR" ]; then
mkdir -p "$HF_CACHE_DIR"
fi
ls $HF_CACHE_DIR
function start_agent_and_api_server() {
echo "Starting CRAG server"
docker run -d --runtime=runc --name=kdd-cup-24-crag-service -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
echo "Starting Agent services"
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/amd/gpu/rocm
bash launch_agent_service_tgi_rocm.sh
}
function validate() {
local CONTENT="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
echo "[ $SERVICE_NAME ] Content is as expected: $CONTENT"
echo 0
else
echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
echo 1
fi
}
function validate_agent_service() {
echo "----------------Test agent ----------------"
local CONTENT=$(http_proxy="" curl http://${ip_address}:9095/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Tell me about Michael Jackson song thriller"
}')
local EXIT_CODE=$(validate "$CONTENT" "Thriller" "react-agent-endpoint")
docker logs rag-agent-endpoint
if [ "$EXIT_CODE" == "1" ]; then
exit 1
fi
local CONTENT=$(http_proxy="" curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
"query": "Tell me about Michael Jackson song thriller"
}')
local EXIT_CODE=$(validate "$CONTENT" "Thriller" "react-agent-endpoint")
docker logs react-agent-endpoint
if [ "$EXIT_CODE" == "1" ]; then
exit 1
fi
}
function main() {
echo "==================== Start agent ===================="
start_agent_and_api_server
echo "==================== Agent started ===================="
echo "==================== Validate agent service ===================="
validate_agent_service
echo "==================== Agent service validated ===================="
}
main

View File

@@ -0,0 +1,75 @@
#!/bin/bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
set -e
WORKPATH=$(dirname "$PWD")
export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
function stop_crag() {
cid=$(docker ps -aq --filter "name=kdd-cup-24-crag-service")
echo "Stopping container kdd-cup-24-crag-service with cid $cid"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
}
function stop_agent_docker() {
cd $WORKPATH/docker_compose/amd/gpu/rocm
# docker compose -f compose.yaml down
container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
for container_name in $container_list; do
cid=$(docker ps -aq --filter "name=$container_name")
echo "Stopping container $container_name"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
done
}
function stop_retrieval_tool() {
echo "Stopping Retrieval tool"
local RETRIEVAL_TOOL_PATH=$WORKPATH/../DocIndexRetriever
cd $RETRIEVAL_TOOL_PATH/docker_compose/intel/cpu/xeon/
# docker compose -f compose.yaml down
container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
for container_name in $container_list; do
cid=$(docker ps -aq --filter "name=$container_name")
echo "Stopping container $container_name"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
done
}
echo "workpath: $WORKPATH"
echo "=================== Stop containers ===================="
stop_crag
stop_agent_docker
stop_retrieval_tool
cd $WORKPATH/tests
echo "=================== #1 Building docker images===================="
bash step1_build_images.sh
echo "=================== #1 Building docker images completed===================="
echo "=================== #2 Start retrieval tool===================="
bash step2_start_retrieval_tool.sh
echo "=================== #2 Retrieval tool started===================="
echo "=================== #3 Ingest data and validate retrieval===================="
bash step3_ingest_data_and_validate_retrieval.sh
echo "=================== #3 Data ingestion and validation completed===================="
echo "=================== #4 Start agent and API server===================="
bash step4a_launch_and_validate_agent_tgi_on_rocm.sh
echo "=================== #4 Agent test passed ===================="
echo "=================== #5 Stop agent and API server===================="
stop_crag
stop_agent_docker
stop_retrieval_tool
echo "=================== #5 Agent and API server stopped===================="
echo y | docker system prune
echo "ALL DONE!"

View File

@@ -16,9 +16,8 @@ RUN useradd -m -s /bin/bash user && \
WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./audioqna.py /home/user/audioqna.py

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./audioqna_multilang.py /home/user/audioqna_multilang.py

View File

@@ -1,58 +1,133 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import asyncio
import os
from comps import AudioQnAGateway, MicroService, ServiceOrchestrator, ServiceType
from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps.cores.proto.api_protocol import AudioChatCompletionRequest, ChatCompletionResponse
from comps.cores.proto.docarray import LLMParams
from fastapi import Request
MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
ASR_SERVICE_HOST_IP = os.getenv("ASR_SERVICE_HOST_IP", "0.0.0.0")
ASR_SERVICE_PORT = int(os.getenv("ASR_SERVICE_PORT", 9099))
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
TTS_SERVICE_HOST_IP = os.getenv("TTS_SERVICE_HOST_IP", "0.0.0.0")
TTS_SERVICE_PORT = int(os.getenv("TTS_SERVICE_PORT", 9088))
WHISPER_SERVER_HOST_IP = os.getenv("WHISPER_SERVER_HOST_IP", "0.0.0.0")
WHISPER_SERVER_PORT = int(os.getenv("WHISPER_SERVER_PORT", 7066))
SPEECHT5_SERVER_HOST_IP = os.getenv("SPEECHT5_SERVER_HOST_IP", "0.0.0.0")
SPEECHT5_SERVER_PORT = int(os.getenv("SPEECHT5_SERVER_PORT", 7055))
LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 3006))
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
if self.services[cur_node].service_type == ServiceType.LLM:
# convert TGI/vLLM to unified OpenAI /v1/chat/completions format
next_inputs = {}
next_inputs["model"] = "tgi" # specifically clarify the fake model to make the format unified
next_inputs["messages"] = [{"role": "user", "content": inputs["asr_result"]}]
next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
next_inputs["top_p"] = llm_parameters_dict["top_p"]
next_inputs["stream"] = inputs["streaming"] # False as default
next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
# next_inputs["presence_penalty"] = inputs["presence_penalty"]
# next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
next_inputs["temperature"] = inputs["temperature"]
inputs = next_inputs
elif self.services[cur_node].service_type == ServiceType.TTS:
next_inputs = {}
next_inputs["text"] = inputs["choices"][0]["message"]["content"]
next_inputs["voice"] = kwargs["voice"]
inputs = next_inputs
return inputs
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
if self.services[cur_node].service_type == ServiceType.TTS:
new_inputs = {}
new_inputs["text"] = inputs["choices"][0]["text"]
return new_inputs
else:
return inputs
class AudioQnAService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
ServiceOrchestrator.align_inputs = align_inputs
self.megaservice = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.AUDIO_QNA)
def add_remote_service(self):
asr = MicroService(
name="asr",
host=ASR_SERVICE_HOST_IP,
port=ASR_SERVICE_PORT,
endpoint="/v1/audio/transcriptions",
host=WHISPER_SERVER_HOST_IP,
port=WHISPER_SERVER_PORT,
endpoint="/v1/asr",
use_remote_service=True,
service_type=ServiceType.ASR,
)
llm = MicroService(
name="llm",
host=LLM_SERVICE_HOST_IP,
port=LLM_SERVICE_PORT,
host=LLM_SERVER_HOST_IP,
port=LLM_SERVER_PORT,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
)
tts = MicroService(
name="tts",
host=TTS_SERVICE_HOST_IP,
port=TTS_SERVICE_PORT,
endpoint="/v1/audio/speech",
host=SPEECHT5_SERVER_HOST_IP,
port=SPEECHT5_SERVER_PORT,
endpoint="/v1/tts",
use_remote_service=True,
service_type=ServiceType.TTS,
)
self.megaservice.add(asr).add(llm).add(tts)
self.megaservice.flow_to(asr, llm)
self.megaservice.flow_to(llm, tts)
self.gateway = AudioQnAGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
async def handle_request(self, request: Request):
data = await request.json()
chat_request = AudioChatCompletionRequest.parse_obj(data)
parameters = LLMParams(
# relatively lower max_tokens for audio conversation
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128,
top_k=chat_request.top_k if chat_request.top_k else 10,
top_p=chat_request.top_p if chat_request.top_p else 0.95,
temperature=chat_request.temperature if chat_request.temperature else 0.01,
frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0,
presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
streaming=False, # TODO add streaming LLM output as input to TTS
)
result_dict, runtime_graph = await self.megaservice.schedule(
initial_inputs={"audio": chat_request.audio},
llm_parameters=parameters,
voice=chat_request.voice if hasattr(chat_request, "voice") else "default",
)
last_node = runtime_graph.all_leaves()[-1]
response = result_dict[last_node]["tts_result"]
return response
def start(self):
self.service = MicroService(
self.__class__.__name__,
service_role=ServiceRoleType.MEGASERVICE,
host=self.host,
port=self.port,
endpoint=self.endpoint,
input_datatype=AudioChatCompletionRequest,
output_datatype=ChatCompletionResponse,
)
self.service.add_route(self.endpoint, self.handle_request, methods=["POST"])
self.service.start()
if __name__ == "__main__":
audioqna = AudioQnAService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
audioqna = AudioQnAService(port=MEGA_SERVICE_PORT)
audioqna.add_remote_service()
audioqna.start()

View File

@@ -1,13 +1,14 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import asyncio
import base64
import os
from comps import AudioQnAGateway, MicroService, ServiceOrchestrator, ServiceType
from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps.cores.proto.api_protocol import AudioChatCompletionRequest, ChatCompletionResponse
from comps.cores.proto.docarray import LLMParams
from fastapi import Request
MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
WHISPER_SERVER_HOST_IP = os.getenv("WHISPER_SERVER_HOST_IP", "0.0.0.0")
@@ -19,12 +20,8 @@ LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 8888))
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
print(inputs)
if self.services[cur_node].service_type == ServiceType.ASR:
# {'byte_str': 'UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA'}
inputs["audio"] = inputs["byte_str"]
del inputs["byte_str"]
elif self.services[cur_node].service_type == ServiceType.LLM:
if self.services[cur_node].service_type == ServiceType.LLM:
# convert TGI/vLLM to unified OpenAI /v1/chat/completions format
next_inputs = {}
next_inputs["model"] = "tgi" # specifically clarify the fake model to make the format unified
@@ -60,6 +57,8 @@ class AudioQnAService:
ServiceOrchestrator.align_outputs = align_outputs
self.megaservice = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.AUDIO_QNA)
def add_remote_service(self):
asr = MicroService(
name="asr",
@@ -90,9 +89,46 @@ class AudioQnAService:
self.megaservice.add(asr).add(llm).add(tts)
self.megaservice.flow_to(asr, llm)
self.megaservice.flow_to(llm, tts)
self.gateway = AudioQnAGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
async def handle_request(self, request: Request):
data = await request.json()
chat_request = AudioChatCompletionRequest.parse_obj(data)
parameters = LLMParams(
# relatively lower max_tokens for audio conversation
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128,
top_k=chat_request.top_k if chat_request.top_k else 10,
top_p=chat_request.top_p if chat_request.top_p else 0.95,
temperature=chat_request.temperature if chat_request.temperature else 0.01,
frequency_penalty=chat_request.frequency_penalty if chat_request.frequency_penalty else 0.0,
presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
streaming=False, # TODO add streaming LLM output as input to TTS
)
result_dict, runtime_graph = await self.megaservice.schedule(
initial_inputs={"audio": chat_request.audio}, llm_parameters=parameters
)
last_node = runtime_graph.all_leaves()[-1]
response = result_dict[last_node]["byte_str"]
return response
def start(self):
self.service = MicroService(
self.__class__.__name__,
service_role=ServiceRoleType.MEGASERVICE,
host=self.host,
port=self.port,
endpoint=self.endpoint,
input_datatype=AudioChatCompletionRequest,
output_datatype=ChatCompletionResponse,
)
self.service.add_route(self.endpoint, self.handle_request, methods=["POST"])
self.service.start()
if __name__ == "__main__":
audioqna = AudioQnAService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
audioqna = AudioQnAService(port=MEGA_SERVICE_PORT)
audioqna.add_remote_service()
audioqna.start()

View File

@@ -14,12 +14,12 @@ We evaluate the WER (Word Error Rate) metric of the ASR microservice.
### Launch ASR microservice
Launch the ASR microserice with the following commands. For more details please refer to [doc](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/whisper/README.md).
Launch the ASR microserice with the following commands. For more details please refer to [doc](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/src/README.md).
```bash
git clone https://github.com/opea-project/GenAIComps
cd GenAIComps
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/Dockerfile .
# change the name of model by editing model_name_or_path you want to evaluate
docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper:latest --model_name_or_path "openai/whisper-tiny"
```

View File

@@ -0,0 +1,77 @@
# AudioQnA Benchmarking
This folder contains a collection of scripts to enable inference benchmarking by leveraging a comprehensive benchmarking tool, [GenAIEval](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md), that enables throughput analysis to assess inference performance.
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
## Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading llms, serving frameworks etc.
## Metrics
The benchmark will report the below metrics, including:
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
Results will be displayed in the terminal and saved as CSV file named `1_stats.csv` for easy export to spreadsheets.
## Getting Started
We recommend using Kubernetes to deploy the AudioQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs.
### Prerequisites
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
- Every node has direct internet access
- Set up kubectl on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running GenAIEval.
- Ensure all nodes have a local /mnt/models folder, which will be mounted by the pods.
- Ensure that the container's ulimit can meet the the number of requests.
```bash
# The way to modify the containered ulimit:
sudo systemctl edit containerd
# Add two lines:
[Service]
LimitNOFILE=65536:1048576
sudo systemctl daemon-reload; sudo systemctl restart containerd
```
## Test Steps
Please deploy AudioQnA service before benchmarking.
### Run Benchmark Test
Before the benchmark, we can configure the number of test queries and test output directory by:
```bash
export USER_QUERIES="[128, 128, 128, 128]"
export TEST_OUTPUT_DIR="/tmp/benchmark_output"
```
And then run the benchmark by:
```bash
bash benchmark.sh -n <node_count>
```
The argument `-n` refers to the number of test nodes.
### Data collection
All the test results will come to this folder `/tmp/benchmark_output` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps.

View File

@@ -0,0 +1,99 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
deployment_type="k8s"
node_number=1
service_port=8888
query_per_node=128
benchmark_tool_path="$(pwd)/GenAIEval"
usage() {
echo "Usage: $0 [-d deployment_type] [-n node_number] [-i service_ip] [-p service_port]"
echo " -d deployment_type AudioQnA deployment type, select between k8s and docker (default: k8s)"
echo " -n node_number Test node number, required only for k8s deployment_type, (default: 1)"
echo " -i service_ip AudioQnA service ip, required only for docker deployment_type"
echo " -p service_port AudioQnA service port, required only for docker deployment_type, (default: 8888)"
exit 1
}
while getopts ":d:n:i:p:" opt; do
case ${opt} in
d )
deployment_type=$OPTARG
;;
n )
node_number=$OPTARG
;;
i )
service_ip=$OPTARG
;;
p )
service_port=$OPTARG
;;
\? )
echo "Invalid option: -$OPTARG" 1>&2
usage
;;
: )
echo "Invalid option: -$OPTARG requires an argument" 1>&2
usage
;;
esac
done
if [[ "$deployment_type" == "docker" && -z "$service_ip" ]]; then
echo "Error: service_ip is required for docker deployment_type" 1>&2
usage
fi
if [[ "$deployment_type" == "k8s" && ( -n "$service_ip" || -n "$service_port" ) ]]; then
echo "Warning: service_ip and service_port are ignored for k8s deployment_type" 1>&2
fi
function main() {
if [[ ! -d ${benchmark_tool_path} ]]; then
echo "Benchmark tool not found, setting up..."
setup_env
fi
run_benchmark
}
function setup_env() {
git clone https://github.com/opea-project/GenAIEval.git
pushd ${benchmark_tool_path}
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
popd
}
function run_benchmark() {
source ${benchmark_tool_path}/stress_venv/bin/activate
export DEPLOYMENT_TYPE=${deployment_type}
export SERVICE_IP=${service_ip:-"None"}
export SERVICE_PORT=${service_port:-"None"}
if [[ -z $USER_QUERIES ]]; then
user_query=$((query_per_node*node_number))
export USER_QUERIES="[${user_query}, ${user_query}, ${user_query}, ${user_query}]"
echo "USER_QUERIES not configured, setting to: ${USER_QUERIES}."
fi
export WARMUP=$(echo $USER_QUERIES | sed -e 's/[][]//g' -e 's/,.*//')
if [[ -z $WARMUP ]]; then export WARMUP=0; fi
if [[ -z $TEST_OUTPUT_DIR ]]; then
if [[ $DEPLOYMENT_TYPE == "k8s" ]]; then
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/node_${node_number}"
else
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/docker"
fi
echo "TEST_OUTPUT_DIR not configured, setting to: ${TEST_OUTPUT_DIR}."
fi
envsubst < ./benchmark.yaml > ${benchmark_tool_path}/evals/benchmark/benchmark.yaml
cd ${benchmark_tool_path}/evals/benchmark
python benchmark.py
}
main

View File

@@ -0,0 +1,52 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
test_suite_config: # Overall configuration settings for the test suite
examples: ["audioqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
deployment_type: "k8s" # Default is "k8s", can also be "docker"
service_ip: None # Leave as None for k8s, specify for Docker
service_port: None # Leave as None for k8s, specify for Docker
warm_ups: 0 # Number of test requests for warm-up
run_time: 60m # The max total run time for the test suite
seed: # The seed for all RNGs
user_queries: [1, 2, 4, 8, 16, 32, 64, 128] # Number of test requests at each concurrency level
query_timeout: 120 # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by defeult.
random_prompt: false # Use random prompts if true, fixed prompts if false
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
test_output_dir: "/tmp/benchmark_output" # The directory to store the test output
load_shape: # Tenant concurrency pattern
name: constant # poisson or constant(locust default load shape)
params: # Loadshape-specific parameters
constant: # Poisson load shape specific parameters, activate only if load_shape is poisson
concurrent_level: 4 # If user_queries is specified, concurrent_level is target number of requests per user. If not, it is the number of simulated users
poisson: # Poisson load shape specific parameters, activate only if load_shape is poisson
arrival-rate: 1.0 # Request arrival rate
namespace: "" # Fill the user-defined namespace. Otherwise, it will be default.
test_cases:
audioqna:
asr:
run_test: true
service_name: "asr-svc" # Replace with your service name
llm:
run_test: true
service_name: "llm-svc" # Replace with your service name
parameters:
model_name: "Intel/neural-chat-7b-v3-3"
max_new_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95
repetition_penalty: 1.03
streaming: true
llmserve:
run_test: true
service_name: "llm-svc" # Replace with your service name
tts:
run_test: true
service_name: "tts-svc" # Replace with your service name
e2e:
run_test: true
service_name: "audioqna-backend-server-svc" # Replace with your service name

View File

@@ -0,0 +1,137 @@
# Build Mega Service of AudioQnA on AMD ROCm GPU
This document outlines the deployment process for a AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice
pipeline on server on AMD ROCm GPU platform.
## 🚀 Build Docker images
### 1. Source Code install GenAIComps
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
### 2. Build ASR Image
```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
```
### 3. Build LLM Image
For compose for ROCm example AMD optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm (https://github.com/huggingface/text-generation-inference)
### 4. Build TTS Image
```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
```
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA/
docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
Then run the command `docker images`, you will have following images ready:
1. `opea/whisper:latest`
2. `opea/speecht5:latest`
3. `opea/audioqna:latest`
## 🚀 Set the environment variables
Before starting the services with `docker compose`, you have to recheck the following environment variables.
```bash
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
```
or use set_env.sh file to setup environment variables.
Note: Please replace with host_ip with your external IP address, do not use localhost.
Note: In order to limit access to a subset of GPUs, please pass each device individually using one or more -device /dev/dri/rendered, where is the card index, starting from 128. (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus)
Example for set isolation for 1 GPU
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
Example for set isolation for 2 GPUs
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD129:/dev/dri/renderD129
Please find more information about accessing and restricting AMD GPUs in the link (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus)
## 🚀 Start the MegaService
```bash
cd GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm/
docker compose up -d
```
In following cases, you could build docker image from source by yourself.
- Failed to download the docker image.
- If you want to use a specific version of Docker image.
Please refer to 'Build Docker Images' in below.
## 🚀 Consume the AudioQnA Service
Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
to the response, decode the base64 string and save it as a .wav file.
```bash
# voice can be "default" or "male"
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
-H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```
## 🚀 Test MicroServices
```bash
# whisper service
curl http://${host_ip}:7066/v1/asr \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# speecht5 service
curl http://${host_ip}:7055/v1/tts \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
```

View File

@@ -0,0 +1,85 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
services:
whisper-service:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
ports:
- "7066:7066"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
speecht5-service:
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
container_name: speecht5-service
ports:
- "7055:7055"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
container_name: tgi-service
ports:
- "3006:80"
volumes:
- "./data:/data"
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD136:/dev/dri/renderD136
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
host_ip: ${host_ip}
healthcheck:
test: ["CMD-SHELL", "curl -f http://$host_ip:3006/health || exit 1"]
interval: 10s
timeout: 10s
retries: 100
command: --model-id ${LLM_MODEL_ID}
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
ipc: host
audioqna-backend-server:
image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
container_name: audioqna-xeon-backend-server
depends_on:
- whisper-service
- tgi-service
- speecht5-service
ports:
- "3008:8888"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
- WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT}
- SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
- SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
ipc: host
restart: always
networks:
default:
driver: bridge

View File

@@ -0,0 +1,24 @@
#!/usr/bin/env bash set_env.sh
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
# export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export host_ip="192.165.1.21"
export HUGGINGFACEHUB_API_TOKEN=${YOUR_HUGGINGFACEHUB_API_TOKEN}
# <token>
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna

View File

@@ -14,27 +14,20 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
```
### 3. Build LLM Image
```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```
Intel Xeon optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu (https://github.com/huggingface/text-generation-inference)
### 4. Build TTS Image
```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile .
docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
```
### 6. Build MegaService Docker Image
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
@@ -47,11 +40,8 @@ docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_p
Then run the command `docker images`, you will have following images ready:
1. `opea/whisper:latest`
2. `opea/asr:latest`
3. `opea/llm-tgi:latest`
4. `opea/speecht5:latest`
5. `opea/tts:latest`
6. `opea/audioqna:latest`
2. `opea/speecht5:latest`
3. `opea/audioqna:latest`
## 🚀 Set the environment variables
@@ -61,22 +51,24 @@ Before starting the services with `docker compose`, you have to recheck the foll
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055
export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
```
or use set_env.sh file to setup environment variables.
Note: Please replace with host_ip with your external IP address, do not use localhost.
## 🚀 Start the MegaService
```bash
@@ -93,36 +85,18 @@ curl http://${host_ip}:7066/v1/asr \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# asr microservice
curl http://${host_ip}:3001/v1/audio/transcriptions \
-X POST \
-d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-H 'Content-Type: application/json'
# speecht5 service
curl http://${host_ip}:7055/v1/tts \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
# tts microservice
curl http://${host_ip}:3002/v1/audio/speech \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
```
## 🚀 Test MegaService
@@ -132,8 +106,9 @@ base64 string to the megaservice endpoint. The megaservice will return a spoken
to the response, decode the base64 string and save it as a .wav file.
```bash
# voice can be "default" or "male"
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
-H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```

View File

@@ -13,14 +13,6 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
asr:
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
container_name: asr-service
ports:
- "3001:9099"
ipc: host
environment:
ASR_ENDPOINT: ${ASR_ENDPOINT}
speecht5-service:
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
container_name: speecht5-service
@@ -32,14 +24,6 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
tts:
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
container_name: tts-service
ports:
- "3002:9088"
ipc: host
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
@@ -53,29 +37,20 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
host_ip: ${host_ip}
healthcheck:
test: ["CMD-SHELL", "curl -f http://$host_ip:3006/health || exit 1"]
interval: 10s
timeout: 10s
retries: 100
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-server
depends_on:
- tgi-service
ports:
- "3007:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
audioqna-xeon-backend-server:
image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
container_name: audioqna-xeon-backend-server
depends_on:
- asr
- llm
- tts
- whisper-service
- tgi-service
- speecht5-service
ports:
- "3008:8888"
environment:
@@ -83,12 +58,26 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
- ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
- TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
- TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
- WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
- WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT}
- SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
- SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
ipc: host
restart: always
audioqna-xeon-ui-server:
image: ${REGISTRY:-opea}/audioqna-ui:${TAG:-latest}
container_name: audioqna-xeon-ui-server
depends_on:
- audioqna-xeon-backend-server
ports:
- "5173:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- CHAT_URL=${BACKEND_SERVICE_ENDPOINT}
ipc: host
restart: always

View File

@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# export host_ip=<your External Public IP>
export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
# <token>
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna

View File

@@ -14,27 +14,20 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile.intel_hpu .
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu .
```
### 3. Build LLM Image
```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```
Intel Xeon optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/tgi-gaudi:2.0.6 (https://github.com/huggingface/tgi-gaudi)
### 4. Build TTS Image
```bash
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile.intel_hpu .
docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu .
```
### 6. Build MegaService Docker Image
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
@@ -47,11 +40,8 @@ docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_p
Then run the command `docker images`, you will have following images ready:
1. `opea/whisper-gaudi:latest`
2. `opea/asr:latest`
3. `opea/llm-tgi:latest`
4. `opea/speecht5-gaudi:latest`
5. `opea/tts:latest`
6. `opea/audioqna:latest`
2. `opea/speecht5-gaudi:latest`
3. `opea/audioqna:latest`
## 🚀 Set the environment variables
@@ -61,20 +51,18 @@ Before starting the services with `docker compose`, you have to recheck the foll
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055
export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
```
## 🚀 Start the MegaService
@@ -95,36 +83,18 @@ curl http://${host_ip}:7066/v1/asr \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# asr microservice
curl http://${host_ip}:3001/v1/audio/transcriptions \
-X POST \
-d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-H 'Content-Type: application/json'
# speecht5 service
curl http://${host_ip}:7055/v1/tts \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
# tts microservice
curl http://${host_ip}:3002/v1/audio/speech \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
```
## 🚀 Test MegaService
@@ -134,8 +104,9 @@ base64 string to the megaservice endpoint. The megaservice will return a spoken
to the response, decode the base64 string and save it as a .wav file.
```bash
# voice can be "default" or "male"
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
-H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```

View File

@@ -18,14 +18,6 @@ services:
cap_add:
- SYS_NICE
restart: unless-stopped
asr:
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
container_name: asr-service
ports:
- "3001:9099"
ipc: host
environment:
ASR_ENDPOINT: ${ASR_ENDPOINT}
speecht5-service:
image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
container_name: speecht5-service
@@ -42,16 +34,8 @@ services:
cap_add:
- SYS_NICE
restart: unless-stopped
tts:
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
container_name: tts-service
ports:
- "3002:9088"
ipc: host
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "3006:80"
@@ -74,29 +58,19 @@ services:
cap_add:
- SYS_NICE
ipc: host
healthcheck:
test: ["CMD-SHELL", "sleep 500 && exit 0"]
interval: 1s
timeout: 505s
retries: 1
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-gaudi-server
depends_on:
- tgi-service
ports:
- "3007:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
audioqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
container_name: audioqna-gaudi-backend-server
depends_on:
- asr
- llm
- tts
- whisper-service
- tgi-service
- speecht5-service
ports:
- "3008:8888"
environment:
@@ -104,12 +78,26 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
- ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
- TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
- TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
- WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
- WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT}
- SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
- SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
ipc: host
restart: always
audioqna-gaudi-ui-server:
image: ${REGISTRY:-opea}/audioqna-ui:${TAG:-latest}
container_name: audioqna-gaudi-ui-server
depends_on:
- audioqna-gaudi-backend-server
ports:
- "5173:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- CHAT_URL=${BACKEND_SERVICE_ENDPOINT}
ipc: host
restart: always

View File

@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# export host_ip=<your External Public IP>
export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
# <token>
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna

View File

@@ -11,51 +11,63 @@ services:
context: ../
dockerfile: ./Dockerfile
image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
audioqna-ui:
build:
context: ../ui
dockerfile: ./docker/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/audioqna-ui:${TAG:-latest}
audioqna-multilang:
build:
context: ../
dockerfile: ./Dockerfile.multilang
extends: audioqna
image: ${REGISTRY:-opea}/audioqna-multilang:${TAG:-latest}
whisper-gaudi:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/dependency/Dockerfile.intel_hpu
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu
extends: audioqna
image: ${REGISTRY:-opea}/whisper-gaudi:${TAG:-latest}
whisper:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/dependency/Dockerfile
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
asr:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/Dockerfile
dockerfile: comps/asr/src/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
llm-tgi:
build:
context: GenAIComps
dockerfile: comps/llms/text-generation/tgi/Dockerfile
dockerfile: comps/llms/src/text-generation/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
speecht5-gaudi:
build:
context: GenAIComps
dockerfile: comps/tts/speecht5/dependency/Dockerfile.intel_hpu
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu
extends: audioqna
image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
speecht5:
build:
context: GenAIComps
dockerfile: comps/tts/speecht5/dependency/Dockerfile
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
tts:
build:
context: GenAIComps
dockerfile: comps/tts/speecht5/Dockerfile
dockerfile: comps/tts/src/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
gpt-sovits:
build:
context: GenAIComps
dockerfile: comps/tts/gpt-sovits/Dockerfile
dockerfile: comps/tts/src/integrations/dependency/gpt-sovits/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/gpt-sovits:${TAG:-latest}

View File

@@ -25,7 +25,7 @@ The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
For Gaudi:
- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.6
- whisper-gaudi: opea/whisper-gaudi:latest
- speecht5-gaudi: opea/speecht5-gaudi:latest

View File

@@ -7,69 +7,17 @@ metadata:
name: audio-qna-config
namespace: default
data:
ASR_ENDPOINT: http://whisper-svc.default.svc.cluster.local:7066
TTS_ENDPOINT: http://speecht5-svc.default.svc.cluster.local:7055
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
TGI_LLM_ENDPOINT: http://llm-dependency-svc.default.svc.cluster.local:3006
MEGA_SERVICE_HOST_IP: audioqna-backend-server-svc
ASR_SERVICE_HOST_IP: asr-svc
ASR_SERVICE_PORT: "3001"
LLM_SERVICE_HOST_IP: llm-svc
LLM_SERVICE_PORT: "3007"
TTS_SERVICE_HOST_IP: tts-svc
TTS_SERVICE_PORT: "3002"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: asr-deploy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: asr-deploy
template:
metadata:
annotations:
sidecar.istio.io/rewriteAppHTTPProbers: 'true'
labels:
app: asr-deploy
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: asr-deploy
hostIPC: true
containers:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/asr:latest
imagePullPolicy: IfNotPresent
name: asr-deploy
args: null
ports:
- containerPort: 9099
serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
name: asr-svc
spec:
type: ClusterIP
selector:
app: asr-deploy
ports:
- name: service
port: 3001
targetPort: 9099
WHISPER_SERVER_HOST_IP: whisper-svc
WHISPER_SERVER_PORT: 7066
SPEECHT5_SERVER_HOST_IP: speecht5-svc
SPEECHT5_SERVER_PORT: 7055
LLM_SERVER_HOST_IP: llm-svc
LLM_SERVER_PORT: 3006
---
apiVersion: apps/v1
@@ -122,57 +70,6 @@ spec:
port: 7066
targetPort: 7066
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: tts-deploy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: tts-deploy
template:
metadata:
annotations:
sidecar.istio.io/rewriteAppHTTPProbers: 'true'
labels:
app: tts-deploy
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: tts-deploy
hostIPC: true
containers:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/tts:latest
imagePullPolicy: IfNotPresent
name: tts-deploy
args: null
ports:
- containerPort: 9088
serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
name: tts-svc
spec:
type: ClusterIP
selector:
app: tts-deploy
ports:
- name: service
port: 3002
targetPort: 9088
---
apiVersion: apps/v1
kind: Deployment
@@ -291,57 +188,6 @@ spec:
port: 3006
targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: llm-deploy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: llm-deploy
template:
metadata:
annotations:
sidecar.istio.io/rewriteAppHTTPProbers: 'true'
labels:
app: llm-deploy
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: llm-deploy
hostIPC: true
containers:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/llm-tgi:latest
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null
ports:
- containerPort: 9000
serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
name: llm-svc
spec:
type: ClusterIP
selector:
app: llm-deploy
ports:
- name: service
port: 3007
targetPort: 9000
---
apiVersion: apps/v1
kind: Deployment

View File

@@ -7,69 +7,17 @@ metadata:
name: audio-qna-config
namespace: default
data:
ASR_ENDPOINT: http://whisper-svc.default.svc.cluster.local:7066
TTS_ENDPOINT: http://speecht5-svc.default.svc.cluster.local:7055
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
TGI_LLM_ENDPOINT: http://llm-dependency-svc.default.svc.cluster.local:3006
MEGA_SERVICE_HOST_IP: audioqna-backend-server-svc
ASR_SERVICE_HOST_IP: asr-svc
ASR_SERVICE_PORT: "3001"
LLM_SERVICE_HOST_IP: llm-svc
LLM_SERVICE_PORT: "3007"
TTS_SERVICE_HOST_IP: tts-svc
TTS_SERVICE_PORT: "3002"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: asr-deploy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: asr-deploy
template:
metadata:
annotations:
sidecar.istio.io/rewriteAppHTTPProbers: 'true'
labels:
app: asr-deploy
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: asr-deploy
hostIPC: true
containers:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/asr:latest
imagePullPolicy: IfNotPresent
name: asr-deploy
args: null
ports:
- containerPort: 9099
serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
name: asr-svc
spec:
type: ClusterIP
selector:
app: asr-deploy
ports:
- name: service
port: 3001
targetPort: 9099
WHISPER_SERVER_HOST_IP: whisper-svc
WHISPER_SERVER_PORT: 7066
SPEECHT5_SERVER_HOST_IP: speecht5-svc
SPEECHT5_SERVER_PORT: 7055
LLM_SERVER_HOST_IP: llm-svc
LLM_SERVER_PORT: 3006
---
apiVersion: apps/v1
@@ -134,57 +82,6 @@ spec:
port: 7066
targetPort: 7066
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: tts-deploy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: tts-deploy
template:
metadata:
annotations:
sidecar.istio.io/rewriteAppHTTPProbers: 'true'
labels:
app: tts-deploy
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: tts-deploy
hostIPC: true
containers:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/tts:latest
imagePullPolicy: IfNotPresent
name: tts-deploy
args: null
ports:
- containerPort: 9088
serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
name: tts-svc
spec:
type: ClusterIP
selector:
app: tts-deploy
ports:
- name: service
port: 3002
targetPort: 9088
---
apiVersion: apps/v1
kind: Deployment
@@ -271,7 +168,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
name: llm-dependency-deploy-demo
securityContext:
capabilities:
@@ -343,57 +240,6 @@ spec:
port: 3006
targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: llm-deploy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: llm-deploy
template:
metadata:
annotations:
sidecar.istio.io/rewriteAppHTTPProbers: 'true'
labels:
app: llm-deploy
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: llm-deploy
hostIPC: true
containers:
- envFrom:
- configMapRef:
name: audio-qna-config
image: opea/llm-tgi:latest
imagePullPolicy: IfNotPresent
name: llm-deploy
args: null
ports:
- containerPort: 9000
serviceAccountName: default
---
kind: Service
apiVersion: v1
metadata:
name: llm-svc
spec:
type: ClusterIP
selector:
app: llm-deploy
ports:
- name: service
port: 3007
targetPort: 9000
---
apiVersion: apps/v1
kind: Deployment

View File

@@ -2,7 +2,7 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
set -e
set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
@@ -19,71 +19,48 @@ function build_docker_images() {
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna whisper-gaudi asr llm-tgi speecht5-gaudi tts"
service_list="audioqna audioqna-ui whisper-gaudi speecht5-gaudi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
function start_services() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export TGI_LLM_ENDPOINT=http://$ip_address:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$ip_address:7066
export TTS_ENDPOINT=http://$ip_address:7055
export MEGA_SERVICE_HOST_IP=${ip_address}
export ASR_SERVICE_HOST_IP=${ip_address}
export TTS_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export WHISPER_SERVER_HOST_IP=${ip_address}
export SPEECHT5_SERVER_HOST_IP=${ip_address}
export LLM_SERVER_HOST_IP=${ip_address}
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
# sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
# Start Docker Containers
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-gaudi-server > $LOG_PATH/tgi_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log; then
break
fi
sleep 5s
n=$((n+1))
done
n=0
until [[ "$n" -ge 100 ]]; do
docker logs whisper-service > $LOG_PATH/whisper_service_start.log
if grep -q "Uvicorn server setup on port" $LOG_PATH/whisper_service_start.log; then
break
fi
sleep 5s
n=$((n+1))
done
sleep 20s
}
function validate_megaservice() {
result=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
echo "result is === $result"
if [[ $result == *"AAA"* ]]; then
response=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
# always print the log
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs speecht5-service > $LOG_PATH/tts-service.log
docker logs tgi-gaudi-server > $LOG_PATH/tgi-gaudi-server.log
docker logs audioqna-gaudi-backend-server > $LOG_PATH/audioqna-gaudi-backend-server.log
echo "$response" | sed 's/^"//;s/"$//' | base64 -d > speech.mp3
if [[ $(file speech.mp3) == *"RIFF"* ]]; then
echo "Result correct."
else
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs asr-service > $LOG_PATH/asr-service.log
docker logs speecht5-service > $LOG_PATH/tts-service.log
docker logs tts-service > $LOG_PATH/tts-service.log
docker logs tgi-gaudi-server > $LOG_PATH/tgi-gaudi-server.log
docker logs llm-tgi-gaudi-server > $LOG_PATH/llm-tgi-gaudi-server.log
echo "Result wrong."
exit 1
fi
@@ -100,7 +77,7 @@ function validate_megaservice() {
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
@@ -126,7 +103,6 @@ function main() {
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
start_services
# validate_microservices
validate_megaservice
# validate_frontend

View File

@@ -0,0 +1,116 @@
#!/bin/bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export PATH="~/miniconda3/bin:$PATH"
function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna audioqna-ui whisper speecht5"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
echo "docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
docker images && sleep 1s
}
function start_services() {
cd $WORKPATH/docker_compose/amd/gpu/rocm/
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export MEGA_SERVICE_HOST_IP=${ip_address}
export WHISPER_SERVER_HOST_IP=${ip_address}
export SPEECHT5_SERVER_HOST_IP=${ip_address}
export LLM_SERVER_HOST_IP=${ip_address}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
# sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
# Start Docker Containers
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
sleep 24s
}
function validate_megaservice() {
response=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
# always print the log
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs speecht5-service > $LOG_PATH/tts-service.log
docker logs tgi-service > $LOG_PATH/tgi-service.log
docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log
echo "$response" | sed 's/^"//;s/"$//' | base64 -d > speech.mp3
if [[ $(file speech.mp3) == *"RIFF"* ]]; then
echo "Result correct."
else
echo "Result wrong."
exit 1
fi
}
#function validate_frontend() {
# Frontend tests are currently disabled
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/amd/gpu/rocm/
docker compose stop && docker compose rm -f
}
function main() {
stop_docker
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
start_services
validate_megaservice
# Frontend tests are currently disabled
# validate_frontend
stop_docker
echo y | docker system prune
}
main

View File

@@ -2,7 +2,7 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
set -e
set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
@@ -19,61 +19,49 @@ function build_docker_images() {
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna whisper asr llm-tgi speecht5 tts"
service_list="audioqna audioqna-ui whisper speecht5"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
function start_services() {
cd $WORKPATH/docker_compose/intel/cpu/xeon/
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export TGI_LLM_ENDPOINT=http://$ip_address:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$ip_address:7066
export TTS_ENDPOINT=http://$ip_address:7055
export MEGA_SERVICE_HOST_IP=${ip_address}
export ASR_SERVICE_HOST_IP=${ip_address}
export TTS_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export WHISPER_SERVER_HOST_IP=${ip_address}
export SPEECHT5_SERVER_HOST_IP=${ip_address}
export LLM_SERVER_HOST_IP=${ip_address}
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
# sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
# Start Docker Containers
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-service > $LOG_PATH/tgi_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log; then
break
fi
sleep 5s
n=$((n+1))
done
sleep 20s
}
function validate_megaservice() {
result=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
echo $result
if [[ $result == *"AAA"* ]]; then
response=$(http_proxy="" curl http://${ip_address}:3008/v1/audioqna -XPOST -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' -H 'Content-Type: application/json')
# always print the log
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs speecht5-service > $LOG_PATH/tts-service.log
docker logs tgi-service > $LOG_PATH/tgi-service.log
docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log
echo "$response" | sed 's/^"//;s/"$//' | base64 -d > speech.mp3
if [[ $(file speech.mp3) == *"RIFF"* ]]; then
echo "Result correct."
else
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs asr-service > $LOG_PATH/asr-service.log
docker logs speecht5-service > $LOG_PATH/tts-service.log
docker logs tts-service > $LOG_PATH/tts-service.log
docker logs tgi-service > $LOG_PATH/tgi-service.log
docker logs llm-tgi-server > $LOG_PATH/llm-tgi-server.log
docker logs audioqna-xeon-backend-server > $LOG_PATH/audioqna-xeon-backend-server.log
echo "Result wrong."
exit 1
fi
@@ -90,7 +78,7 @@ function validate_megaservice() {
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#

View File

@@ -23,4 +23,4 @@ RUN npm run build
EXPOSE 5173
# Run the front-end application in preview mode
CMD ["npm", "run", "preview", "--", "--port", "5173", "--host", "0.0.0.0"]
CMD ["npm", "run", "preview", "--", "--port", "5173", "--host", "0.0.0.0"]

View File

@@ -1,5 +1,5 @@
{
"name": "sveltekit-auth-example",
"name": "audio-qna",
"version": "0.0.1",
"private": true,
"scripts": {
@@ -11,38 +11,38 @@
"lint": "prettier --check . && eslint .",
"format": "prettier --write ."
},
"peerDependencies": {
"svelte": "^4.0.0"
},
"devDependencies": {
"@fortawesome/free-solid-svg-icons": "6.2.0",
"@sveltejs/adapter-auto": "1.0.0-next.75",
"@sveltejs/kit": "^1.30.4",
"@playwright/test": "^1.45.2",
"@sveltejs/adapter-auto": "^3.0.0",
"@sveltejs/kit": "^2.0.0",
"@sveltejs/vite-plugin-svelte": "^3.0.0",
"@tailwindcss/typography": "0.5.7",
"@types/debug": "4.1.7",
"@typescript-eslint/eslint-plugin": "^5.27.0",
"@typescript-eslint/parser": "^5.27.0",
"autoprefixer": "^10.4.7",
"autoprefixer": "^10.4.16",
"daisyui": "^3.5.0",
"debug": "4.3.4",
"eslint": "^8.16.0",
"eslint-config-prettier": "^8.3.0",
"eslint-plugin-neverthrow": "1.1.4",
"eslint-plugin-svelte3": "^4.0.0",
"neverthrow": "5.0.0",
"pocketbase": "0.7.0",
"postcss": "^8.4.23",
"postcss": "^8.4.31",
"postcss-load-config": "^4.0.1",
"postcss-preset-env": "^8.3.2",
"prettier": "^2.8.8",
"prettier-plugin-svelte": "^2.7.0",
"prettier-plugin-tailwindcss": "^0.3.0",
"svelte": "^3.59.1",
"svelte-check": "^2.7.1",
"svelte": "^4.2.7",
"svelte-check": "^3.6.0",
"svelte-fa": "3.0.3",
"svelte-preprocess": "^4.10.7",
"tailwindcss": "^3.1.5",
"tailwindcss": "^3.3.6",
"ts-pattern": "4.0.5",
"tslib": "^2.3.1",
"typescript": "^4.7.4",
"vite": "^4.3.9"
"tslib": "^2.4.1",
"typescript": "^5.0.0",
"vite": "^5.0.11"
},
"type": "module",
"dependencies": {

View File

@@ -79,4 +79,4 @@ a.btn {
.w-12\/12 {
width: 100%
}
}

View File

@@ -89,4 +89,4 @@
<stop offset="1" stop-color="#3300FF" stop-opacity="0.2" />
</linearGradient>
</defs>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 5.8 KiB

After

Width:  |  Height:  |  Size: 5.8 KiB

View File

@@ -89,4 +89,4 @@
<stop offset="1" stop-color="#f3f4f6" stop-opacity="0" />
</linearGradient>
</defs>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 5.8 KiB

After

Width:  |  Height:  |  Size: 5.8 KiB

View File

@@ -76,4 +76,4 @@
<stop offset="1" stop-color="#9CFFED" stop-opacity="0" />
</linearGradient>
</defs>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 4.8 KiB

After

Width:  |  Height:  |  Size: 4.8 KiB

View File

@@ -76,4 +76,4 @@
<stop offset="1" stop-color="#6141E1" stop-opacity="0" />
</linearGradient>
</defs>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 4.8 KiB

After

Width:  |  Height:  |  Size: 4.8 KiB

View File

@@ -89,4 +89,4 @@
<stop offset="1" stop-color="#3300FF" stop-opacity="0" />
</linearGradient>
</defs>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 5.8 KiB

After

Width:  |  Height:  |  Size: 5.8 KiB

View File

@@ -3,4 +3,4 @@
<path
d="M512 1024a512 512 0 1 1 512-512 512 512 0 0 1-512 512z m0-896a384 384 0 1 0 384 384A384 384 0 0 0 512 128z m128 576h-256a64 64 0 0 1-64-64v-256a64 64 0 0 1 64-64h256a64 64 0 0 1 64 64v256a64 64 0 0 1-64 64z"
fill="#d81e06" p-id="3104"></path>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 429 B

After

Width:  |  Height:  |  Size: 430 B

View File

@@ -1 +1 @@
<svg t="1713431562066" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="6399" width="32" height="32"><path d="M592 768h-160c-26.6 0-48-21.4-48-48V384h-175.4c-35.6 0-53.4-43-28.2-68.2L484.6 11.4c15-15 39.6-15 54.6 0l304.4 304.4c25.2 25.2 7.4 68.2-28.2 68.2H640v336c0 26.6-21.4 48-48 48z m432-16v224c0 26.6-21.4 48-48 48H48c-26.6 0-48-21.4-48-48V752c0-26.6 21.4-48 48-48h272v16c0 61.8 50.2 112 112 112h160c61.8 0 112-50.2 112-112v-16h272c26.6 0 48 21.4 48 48z m-248 176c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z m128 0c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z" p-id="6400" fill="#ffffff"></path></svg>
<svg t="1713431562066" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="6399" width="32" height="32"><path d="M592 768h-160c-26.6 0-48-21.4-48-48V384h-175.4c-35.6 0-53.4-43-28.2-68.2L484.6 11.4c15-15 39.6-15 54.6 0l304.4 304.4c25.2 25.2 7.4 68.2-28.2 68.2H640v336c0 26.6-21.4 48-48 48z m432-16v224c0 26.6-21.4 48-48 48H48c-26.6 0-48-21.4-48-48V752c0-26.6 21.4-48 48-48h272v16c0 61.8 50.2 112 112 112h160c61.8 0 112-50.2 112-112v-16h272c26.6 0 48 21.4 48 48z m-248 176c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z m128 0c0-22-18-40-40-40s-40 18-40 40 18 40 40 40 40-18 40-40z" p-id="6400" fill="#ffffff"></path></svg>

Before

Width:  |  Height:  |  Size: 669 B

After

Width:  |  Height:  |  Size: 670 B

View File

@@ -6,4 +6,4 @@
<path
d="M864 479.776 864 352c0-17.664-14.304-32-32-32s-32 14.336-32 32l0 127.776c0 160.16-129.184 290.464-288 290.464-158.784 0-288-130.304-288-290.464L224 352c0-17.664-14.336-32-32-32s-32 14.336-32 32l0 127.776c0 184.608 140.864 336.48 320 352.832L480 896 288 896c-17.664 0-32 14.304-32 32s14.336 32 32 32l448 0c17.696 0 32-14.304 32-32s-14.304-32-32-32l-192 0 0-63.36C723.136 816.256 864 664.384 864 479.776z"
fill="#707070" p-id="2962"></path>
</svg>
</svg>

Before

Width:  |  Height:  |  Size: 844 B

After

Width:  |  Height:  |  Size: 845 B

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 16 KiB

After

Width:  |  Height:  |  Size: 16 KiB

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 15 KiB

After

Width:  |  Height:  |  Size: 15 KiB

View File

@@ -5,4 +5,4 @@
docker_compose/intel/cpu/xeon/data
docker_compose/intel/hpu/gaudi/data
inputs/
outputs/
outputs/

View File

@@ -5,20 +5,48 @@ import asyncio
import os
import sys
from comps import AvatarChatbotGateway, MicroService, ServiceOrchestrator, ServiceType
from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps.cores.proto.api_protocol import AudioChatCompletionRequest, ChatCompletionResponse
from comps.cores.proto.docarray import LLMParams
from fastapi import Request
MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
ASR_SERVICE_HOST_IP = os.getenv("ASR_SERVICE_HOST_IP", "0.0.0.0")
ASR_SERVICE_PORT = int(os.getenv("ASR_SERVICE_PORT", 9099))
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
TTS_SERVICE_HOST_IP = os.getenv("TTS_SERVICE_HOST_IP", "0.0.0.0")
TTS_SERVICE_PORT = int(os.getenv("TTS_SERVICE_PORT", 9088))
WHISPER_SERVER_HOST_IP = os.getenv("WHISPER_SERVER_HOST_IP", "0.0.0.0")
WHISPER_SERVER_PORT = int(os.getenv("WHISPER_SERVER_PORT", 7066))
LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 3006))
SPEECHT5_SERVER_HOST_IP = os.getenv("SPEECHT5_SERVER_HOST_IP", "0.0.0.0")
SPEECHT5_SERVER_PORT = int(os.getenv("SPEECHT5_SERVER_PORT", 7055))
ANIMATION_SERVICE_HOST_IP = os.getenv("ANIMATION_SERVICE_HOST_IP", "0.0.0.0")
ANIMATION_SERVICE_PORT = int(os.getenv("ANIMATION_SERVICE_PORT", 9066))
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
if self.services[cur_node].service_type == ServiceType.LLM:
# convert TGI/vLLM to unified OpenAI /v1/chat/completions format
next_inputs = {}
next_inputs["model"] = "tgi" # specifically clarify the fake model to make the format unified
next_inputs["messages"] = [{"role": "user", "content": inputs["asr_result"]}]
next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
next_inputs["top_p"] = llm_parameters_dict["top_p"]
next_inputs["stream"] = inputs["streaming"] # False as default
next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
# next_inputs["presence_penalty"] = inputs["presence_penalty"]
# next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
next_inputs["temperature"] = inputs["temperature"]
inputs = next_inputs
elif self.services[cur_node].service_type == ServiceType.TTS:
next_inputs = {}
next_inputs["text"] = inputs["choices"][0]["message"]["content"]
next_inputs["voice"] = kwargs["voice"]
inputs = next_inputs
elif self.services[cur_node].service_type == ServiceType.ANIMATION:
next_inputs = {}
next_inputs["byte_str"] = inputs["tts_result"]
inputs = next_inputs
return inputs
def check_env_vars(env_var_list):
for var in env_var_list:
if os.getenv(var) is None:
@@ -31,30 +59,32 @@ class AvatarChatbotService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
ServiceOrchestrator.align_inputs = align_inputs
self.megaservice = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.AVATAR_CHATBOT)
def add_remote_service(self):
asr = MicroService(
name="asr",
host=ASR_SERVICE_HOST_IP,
port=ASR_SERVICE_PORT,
endpoint="/v1/audio/transcriptions",
host=WHISPER_SERVER_HOST_IP,
port=WHISPER_SERVER_PORT,
endpoint="/v1/asr",
use_remote_service=True,
service_type=ServiceType.ASR,
)
llm = MicroService(
name="llm",
host=LLM_SERVICE_HOST_IP,
port=LLM_SERVICE_PORT,
host=LLM_SERVER_HOST_IP,
port=LLM_SERVER_PORT,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
)
tts = MicroService(
name="tts",
host=TTS_SERVICE_HOST_IP,
port=TTS_SERVICE_PORT,
endpoint="/v1/audio/speech",
host=SPEECHT5_SERVER_HOST_IP,
port=SPEECHT5_SERVER_PORT,
endpoint="/v1/tts",
use_remote_service=True,
service_type=ServiceType.TTS,
)
@@ -70,7 +100,44 @@ class AvatarChatbotService:
self.megaservice.flow_to(asr, llm)
self.megaservice.flow_to(llm, tts)
self.megaservice.flow_to(tts, animation)
self.gateway = AvatarChatbotGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
async def handle_request(self, request: Request):
data = await request.json()
chat_request = AudioChatCompletionRequest.model_validate(data)
parameters = LLMParams(
# relatively lower max_tokens for audio conversation
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 128,
top_k=chat_request.top_k if chat_request.top_k else 10,
top_p=chat_request.top_p if chat_request.top_p else 0.95,
temperature=chat_request.temperature if chat_request.temperature else 0.01,
repetition_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 1.03,
streaming=False, # TODO add streaming LLM output as input to TTS
)
# print(parameters)
result_dict, runtime_graph = await self.megaservice.schedule(
initial_inputs={"audio": chat_request.audio},
llm_parameters=parameters,
voice=chat_request.voice if hasattr(chat_request, "voice") else "default",
)
last_node = runtime_graph.all_leaves()[-1]
response = result_dict[last_node]["video_path"]
return response
def start(self):
self.service = MicroService(
self.__class__.__name__,
service_role=ServiceRoleType.MEGASERVICE,
host=self.host,
port=self.port,
endpoint=self.endpoint,
input_datatype=AudioChatCompletionRequest,
output_datatype=ChatCompletionResponse,
)
self.service.add_route(self.endpoint, self.handle_request, methods=["POST"])
self.service.start()
if __name__ == "__main__":
@@ -78,16 +145,17 @@ if __name__ == "__main__":
[
"MEGA_SERVICE_HOST_IP",
"MEGA_SERVICE_PORT",
"ASR_SERVICE_HOST_IP",
"ASR_SERVICE_PORT",
"LLM_SERVICE_HOST_IP",
"LLM_SERVICE_PORT",
"TTS_SERVICE_HOST_IP",
"TTS_SERVICE_PORT",
"WHISPER_SERVER_HOST_IP",
"WHISPER_SERVER_PORT",
"LLM_SERVER_HOST_IP",
"LLM_SERVER_PORT",
"SPEECHT5_SERVER_HOST_IP",
"SPEECHT5_SERVER_PORT",
"ANIMATION_SERVICE_HOST_IP",
"ANIMATION_SERVICE_PORT",
]
)
avatarchatbot = AvatarChatbotService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
avatarchatbot = AvatarChatbotService(port=MEGA_SERVICE_PORT)
avatarchatbot.add_remote_service()
avatarchatbot.start()

View File

@@ -14,32 +14,25 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
```
### 3. Build LLM Image
```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```
Intel Xeon optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu (https://github.com/huggingface/text-generation-inference)
### 4. Build TTS Image
```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile .
docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
```
### 5. Build Animation Image
```bash
docker build -t opea/wav2lip:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/dependency/Dockerfile .
docker build -t opea/wav2lip:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/integration/dependency/Dockerfile .
docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/Dockerfile .
docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/Dockerfile .
```
### 6. Build MegaService Docker Image
@@ -55,13 +48,10 @@ docker build --no-cache -t opea/avatarchatbot:latest --build-arg https_proxy=$ht
Then run the command `docker images`, you will have following images ready:
1. `opea/whisper:latest`
2. `opea/asr:latest`
3. `opea/llm-tgi:latest`
4. `opea/speecht5:latest`
5. `opea/tts:latest`
6. `opea/wav2lip:latest`
7. `opea/animation:latest`
8. `opea/avatarchatbot:latest`
2. `opea/speecht5:latest`
3. `opea/wav2lip:latest`
4. `opea/animation:latest`
5. `opea/avatarchatbot:latest`
## 🚀 Set the environment variables
@@ -71,24 +61,21 @@ Before starting the services with `docker compose`, you have to recheck the foll
export HUGGINGFACEHUB_API_TOKEN=<your_hf_token>
export host_ip=$(hostname -I | awk '{print $1}')
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055
export WAV2LIP_ENDPOINT=http://$host_ip:7860
export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_PORT=3006
export ANIMATION_SERVICE_HOST_IP=${host_ip}
export ANIMATION_SERVICE_PORT=3008
export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export ANIMATION_SERVICE_PORT=3008
```
- Xeon CPU
@@ -124,36 +111,18 @@ curl http://${host_ip}:7066/v1/asr \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# asr microservice
curl http://${host_ip}:3001/v1/audio/transcriptions \
-X POST \
-d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-H 'Content-Type: application/json'
# speecht5 service
curl http://${host_ip}:7055/v1/tts \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
# tts microservice
curl http://${host_ip}:3002/v1/audio/speech \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
# wav2lip service
cd ../../../..
curl http://${host_ip}:7860/v1/wav2lip \

View File

@@ -14,14 +14,6 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
asr:
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
container_name: asr-service
ports:
- "3001:9099"
ipc: host
environment:
ASR_ENDPOINT: ${ASR_ENDPOINT}
speecht5-service:
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
container_name: speecht5-service
@@ -33,14 +25,6 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped
tts:
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
container_name: tts-service
ports:
- "3002:9088"
ipc: host
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
@@ -54,22 +38,13 @@ services:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
host_ip: ${host_ip}
healthcheck:
test: ["CMD-SHELL", "curl -f http://$host_ip:3006/health || exit 1"]
interval: 10s
timeout: 10s
retries: 100
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-server
depends_on:
- tgi-service
ports:
- "3007:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
wav2lip-service:
image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
container_name: wav2lip-service
@@ -110,9 +85,6 @@ services:
image: ${REGISTRY:-opea}/avatarchatbot:${TAG:-latest}
container_name: avatarchatbot-xeon-backend-server
depends_on:
- asr
- llm
- tts
- animation
ports:
- "3009:8888"
@@ -122,12 +94,12 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
- ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
- ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
- TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
- TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
- WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
- WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT}
- SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
- SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
- ANIMATION_SERVICE_HOST_IP=${ANIMATION_SERVICE_HOST_IP}
- ANIMATION_SERVICE_PORT=${ANIMATION_SERVICE_PORT}
ipc: host

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

View File

@@ -14,32 +14,29 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile.intel_hpu .
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu .
```
### 3. Build LLM Image
```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
Intel Xeon optimized image hosted in huggingface repo will be used for TGI service: ghcr.io/huggingface/tgi-gaudi:2.0.6 (https://github.com/huggingface/tgi-gaudi)
### 4. Build TTS Image
```bash
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile.intel_hpu .
docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu .
```
### 5. Build Animation Image
```bash
docker build -t opea/wav2lip-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/dependency/Dockerfile.intel_hpu .
docker build -t opea/wav2lip-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/integration/dependency/Dockerfile.intel_hpu .
docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/wav2lip/Dockerfile .
docker build -t opea/animation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/animation/src/Dockerfile .
```
### 6. Build MegaService Docker Image
@@ -55,13 +52,10 @@ docker build --no-cache -t opea/avatarchatbot:latest --build-arg https_proxy=$ht
Then run the command `docker images`, you will have following images ready:
1. `opea/whisper-gaudi:latest`
2. `opea/asr:latest`
3. `opea/llm-tgi:latest`
4. `opea/speecht5-gaudi:latest`
5. `opea/tts:latest`
6. `opea/wav2lip-gaudi:latest`
7. `opea/animation:latest`
8. `opea/avatarchatbot:latest`
2. `opea/speecht5-gaudi:latest`
3. `opea/wav2lip-gaudi:latest`
4. `opea/animation:latest`
5. `opea/avatarchatbot:latest`
## 🚀 Set the environment variables
@@ -71,24 +65,21 @@ Before starting the services with `docker compose`, you have to recheck the foll
export HUGGINGFACEHUB_API_TOKEN=<your_hf_token>
export host_ip=$(hostname -I | awk '{print $1}')
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055
export WAV2LIP_ENDPOINT=http://$host_ip:7860
export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_PORT=3006
export ANIMATION_SERVICE_HOST_IP=${host_ip}
export ANIMATION_SERVICE_PORT=3008
export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export ANIMATION_SERVICE_PORT=3008
```
- Gaudi2 HPU
@@ -124,36 +115,18 @@ curl http://${host_ip}:7066/v1/asr \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# asr microservice
curl http://${host_ip}:3001/v1/audio/transcriptions \
-X POST \
-d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
# tgi service
curl http://${host_ip}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
# llm microservice
curl http://${host_ip}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
-H 'Content-Type: application/json'
# speecht5 service
curl http://${host_ip}:7055/v1/tts \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
# tts microservice
curl http://${host_ip}:3002/v1/audio/speech \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
# wav2lip service
cd ../../../..
curl http://${host_ip}:7860/v1/wav2lip \

View File

@@ -15,20 +15,12 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
runtime: habana
cap_add:
- SYS_NICE
restart: unless-stopped
asr:
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
container_name: asr-service
ports:
- "3001:9099"
ipc: host
environment:
ASR_ENDPOINT: ${ASR_ENDPOINT}
speecht5-service:
image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
container_name: speecht5-service
@@ -39,22 +31,14 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
runtime: habana
cap_add:
- SYS_NICE
restart: unless-stopped
tts:
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
container_name: tts-service
ports:
- "3002:9088"
ipc: host
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/tgi-gaudi:2.0.5
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
- "3006:80"
@@ -67,7 +51,7 @@ services:
HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
@@ -77,22 +61,12 @@ services:
cap_add:
- SYS_NICE
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 128 --max-total-tokens 256
llm:
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
container_name: llm-tgi-gaudi-server
depends_on:
- tgi-service
ports:
- "3007:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "sleep 500 && exit 0"]
interval: 1s
timeout: 505s
retries: 1
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
wav2lip-service:
image: ${REGISTRY:-opea}/wav2lip-gaudi:${TAG:-latest}
container_name: wav2lip-service
@@ -105,7 +79,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
DEVICE: ${DEVICE}
INFERENCE_MODE: ${INFERENCE_MODE}
@@ -132,7 +106,7 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HABANA_VISIBLE_MODULES: all
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
WAV2LIP_ENDPOINT: ${WAV2LIP_ENDPOINT}
runtime: habana
@@ -143,9 +117,6 @@ services:
image: ${REGISTRY:-opea}/avatarchatbot:${TAG:-latest}
container_name: avatarchatbot-gaudi-backend-server
depends_on:
- asr
- llm
- tts
- animation
ports:
- "3009:8888"
@@ -155,12 +126,12 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
- ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
- ASR_SERVICE_PORT=${ASR_SERVICE_PORT}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
- TTS_SERVICE_HOST_IP=${TTS_SERVICE_HOST_IP}
- TTS_SERVICE_PORT=${TTS_SERVICE_PORT}
- WHISPER_SERVER_HOST_IP=${WHISPER_SERVER_HOST_IP}
- WHISPER_SERVER_PORT=${WHISPER_SERVER_PORT}
- LLM_SERVER_HOST_IP=${LLM_SERVER_HOST_IP}
- LLM_SERVER_PORT=${LLM_SERVER_PORT}
- SPEECHT5_SERVER_HOST_IP=${SPEECHT5_SERVER_HOST_IP}
- SPEECHT5_SERVER_PORT=${SPEECHT5_SERVER_PORT}
- ANIMATION_SERVICE_HOST_IP=${ANIMATION_SERVICE_HOST_IP}
- ANIMATION_SERVICE_PORT=${ANIMATION_SERVICE_PORT}
ipc: host

View File

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
pushd "../../../../../" > /dev/null
source .set_env.sh
popd > /dev/null

View File

@@ -14,60 +14,60 @@ services:
whisper-gaudi:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/dependency/Dockerfile.intel_hpu
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu
extends: avatarchatbot
image: ${REGISTRY:-opea}/whisper-gaudi:${TAG:-latest}
whisper:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/dependency/Dockerfile
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
asr:
build:
context: GenAIComps
dockerfile: comps/asr/whisper/Dockerfile
dockerfile: comps/asr/src/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/asr:${TAG:-latest}
llm-tgi:
build:
context: GenAIComps
dockerfile: comps/llms/text-generation/tgi/Dockerfile
dockerfile: comps/llms/src/text-generation/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest}
speecht5-gaudi:
build:
context: GenAIComps
dockerfile: comps/tts/speecht5/dependency/Dockerfile.intel_hpu
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu
extends: avatarchatbot
image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
speecht5:
build:
context: GenAIComps
dockerfile: comps/tts/speecht5/dependency/Dockerfile
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
tts:
build:
context: GenAIComps
dockerfile: comps/tts/speecht5/Dockerfile
dockerfile: comps/tts/src/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/tts:${TAG:-latest}
wav2lip-gaudi:
build:
context: GenAIComps
dockerfile: comps/animation/wav2lip/dependency/Dockerfile.intel_hpu
dockerfile: comps/animation/src/integration/dependency/Dockerfile.intel_hpu
extends: avatarchatbot
image: ${REGISTRY:-opea}/wav2lip-gaudi:${TAG:-latest}
wav2lip:
build:
context: GenAIComps
dockerfile: comps/animation/wav2lip/dependency/Dockerfile
dockerfile: comps/animation/src/integration/dependency/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
animation:
build:
context: GenAIComps
dockerfile: comps/animation/wav2lip/Dockerfile
dockerfile: comps/animation/src/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/animation:${TAG:-latest}

View File

@@ -26,10 +26,10 @@ function build_docker_images() {
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="avatarchatbot whisper-gaudi asr llm-tgi speecht5-gaudi tts wav2lip-gaudi animation"
service_list="avatarchatbot whisper-gaudi speecht5-gaudi wav2lip-gaudi animation"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
@@ -41,30 +41,27 @@ function start_services() {
export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN
export host_ip=$(hostname -I | awk '{print $1}')
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055
export WAV2LIP_ENDPOINT=http://$host_ip:7860
export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_PORT=3006
export ANIMATION_SERVICE_HOST_IP=${host_ip}
export ANIMATION_SERVICE_PORT=3008
export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export ANIMATION_SERVICE_PORT=3008
export DEVICE="hpu"
export WAV2LIP_PORT=7860
export INFERENCE_MODE='wav2lip+gfpgan'
export CHECKPOINT_PATH='/usr/local/lib/python3.10/dist-packages/Wav2Lip/checkpoints/wav2lip_gan.pth'
export FACE="assets/img/avatar1.jpg"
export FACE="/home/user/comps/animation/src/assets/img/avatar1.jpg"
# export AUDIO='assets/audio/eg3_ref.wav' # audio file path is optional, will use base64str in the post request as input if is 'None'
export AUDIO='None'
export FACESIZE=96
@@ -74,21 +71,10 @@ function start_services() {
export FPS=10
# Start Docker Containers
docker compose up -d
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-gaudi-server > $LOG_PATH/tgi_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log; then
break
fi
sleep 5s
n=$((n+1))
done
# sleep 5m
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
sleep 60s
echo "All services are up and running"
sleep 5s
}
@@ -99,12 +85,10 @@ function validate_megaservice() {
if [[ $result == *"mp4"* ]]; then
echo "Result correct."
else
echo "Result wrong, print docker logs."
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs asr-service > $LOG_PATH/asr-service.log
docker logs speecht5-service > $LOG_PATH/speecht5-service.log
docker logs tts-service > $LOG_PATH/tts-service.log
docker logs tgi-gaudi-server > $LOG_PATH/tgi-gaudi-server.log
docker logs llm-tgi-gaudi-server > $LOG_PATH/llm-tgi-gaudi-server.log
docker logs wav2lip-service > $LOG_PATH/wav2lip-service.log
docker logs animation-gaudi-server > $LOG_PATH/animation-gaudi-server.log
@@ -115,11 +99,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi
docker compose down

View File

@@ -26,10 +26,10 @@ function build_docker_images() {
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="avatarchatbot whisper asr llm-tgi speecht5 tts wav2lip animation"
service_list="avatarchatbot whisper speecht5 wav2lip animation"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}
@@ -41,30 +41,27 @@ function start_services() {
export HUGGINGFACEHUB_API_TOKEN=$HUGGINGFACEHUB_API_TOKEN
export host_ip=$(hostname -I | awk '{print $1}')
export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055
export WAV2LIP_ENDPOINT=http://$host_ip:7860
export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_PORT=3006
export ANIMATION_SERVICE_HOST_IP=${host_ip}
export ANIMATION_SERVICE_PORT=3008
export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export ANIMATION_SERVICE_PORT=3008
export DEVICE="cpu"
export WAV2LIP_PORT=7860
export INFERENCE_MODE='wav2lip+gfpgan'
export CHECKPOINT_PATH='/usr/local/lib/python3.11/site-packages/Wav2Lip/checkpoints/wav2lip_gan.pth'
export FACE="assets/img/avatar5.png"
export FACE="/home/user/comps/animation/src/assets/img/avatar5.png"
# export AUDIO='assets/audio/eg3_ref.wav' # audio file path is optional, will use base64str in the post request as input if is 'None'
export AUDIO='None'
export FACESIZE=96
@@ -75,17 +72,8 @@ function start_services() {
# Start Docker Containers
docker compose up -d
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-service > $LOG_PATH/tgi_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log; then
break
fi
sleep 5s
n=$((n+1))
done
sleep 20s
echo "All services are up and running"
sleep 5s
}
@@ -97,11 +85,8 @@ function validate_megaservice() {
echo "Result correct."
else
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs asr-service > $LOG_PATH/asr-service.log
docker logs speecht5-service > $LOG_PATH/speecht5-service.log
docker logs tts-service > $LOG_PATH/tts-service.log
docker logs tgi-service > $LOG_PATH/tgi-service.log
docker logs llm-tgi-server > $LOG_PATH/llm-tgi-server.log
docker logs wav2lip-service > $LOG_PATH/wav2lip-service.log
docker logs animation-server > $LOG_PATH/animation-server.log

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

View File

@@ -18,7 +18,7 @@ WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt && \
pip install --no-cache-dir langchain_core

View File

@@ -0,0 +1,32 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
FROM python:3.11-slim
RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
libgl1-mesa-glx \
libjemalloc-dev \
git
RUN useradd -m -s /bin/bash user && \
mkdir -p /home/user && \
chown -R user /home/user/
WORKDIR /home/user/
RUN git clone https://github.com/opea-project/GenAIComps.git
WORKDIR /home/user/GenAIComps
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
COPY ./chatqna_wrapper.py /home/user/chatqna.py
ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps
USER user
WORKDIR /home/user
RUN echo 'ulimit -S -n 999999' >> ~/.bashrc
ENTRYPOINT ["python", "chatqna.py"]

View File

@@ -4,7 +4,26 @@ Chatbots are the most widely adopted use case for leveraging the powerful chat a
RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that responses generated remain factual and current. The core of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
## Deploy ChatQnA Service
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
| Cloud Provider | Intel Architecture | Intel Optimized Cloud Module for Terraform | Comments |
| -------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| AWS | 4th Gen Intel Xeon with Intel AMX | [AWS Module](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Uses Intel/neural-chat-7b-v3-3 by default |
| AWS Falcon2-11B | 4th Gen Intel Xeon with Intel AMX | [AWS Module with Falcon11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) | Uses TII Falcon2-11B LLM Model |
| GCP | 5th Gen Intel Xeon with Intel AMX | [GCP Module](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Also supports Confidential AI by using Intel® TDX with 4th Gen Xeon |
| Azure | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress |
| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress |
## Automated Deployment to Ubuntu based system(if not using Terraform) using Intel® Optimized Cloud Modules for **Ansible**
To deploy to existing Xeon Ubuntu based system, use our Intel Optimized Cloud Modules for Ansible. This is the same Ansible playbook used by Terraform.
Use this if you are not using Terraform and have provisioned your system with another tool or manually including bare metal.
| Operating System | Intel Optimized Cloud Module for Ansible |
|------------------|------------------------------------------|
| Ubuntu 20.04 | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) |
| Ubuntu 22.04 | Work-in-progress |
## Manually Deploy ChatQnA Service
The ChatQnA service can be effortlessly deployed on Intel Gaudi2, Intel Xeon Scalable Processors and Nvidia GPU.
@@ -41,15 +60,18 @@ To set up environment variables for deploying ChatQnA services, follow these ste
3. Set up other environment variables:
> Notice that you can only choose **one** command below to set up envs according to your hardware. Other that the port numbers may be set incorrectly.
> Notice that you can only choose **one** hardware option below to set up envs according to your hardware. Make sure port numbers are set correctly as well.
```bash
# on Gaudi
source ./docker_compose/intel/hpu/gaudi/set_env.sh
export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails
# on Xeon
source ./docker_compose/intel/cpu/xeon/set_env.sh
export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service
# on Nvidia GPU
source ./docker_compose/nvidia/gpu/set_env.sh
export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service
```
### Quick Start: 2.Run Docker Compose
@@ -177,7 +199,7 @@ In the below, we provide a table that describes for each microservice component
Gaudi default compose.yaml
| MicroService | Open Source Project | HW | Port | Endpoint |
| ------------ | ------------------- | ----- | ---- | -------------------- |
| Embedding | Langchain | Xeon | 6000 | /v1/embaddings |
| Embedding | Langchain | Xeon | 6000 | /v1/embeddings |
| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval |
| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking |
| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions |

View File

@@ -48,7 +48,7 @@ To setup a LLM model, we can use [tgi-gaudi](https://github.com/huggingface/tgi-
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2
# for better performance, set `PREFILL_BATCH_BUCKET_SIZE`, `BATCH_BUCKET_SIZE`, `max-batch-total-tokens`, `max-batch-prefill-tokens`
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
```
### Prepare Dataset

View File

@@ -1,378 +0,0 @@
# ChatQnA Benchmarking
This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
## Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading llms, serving frameworks etc.
## Metrics
The benchmark will report the below metrics, including:
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
Results will be displayed in the terminal and saved as CSV file named `1_stats.csv` for easy export to spreadsheets.
## Getting Started
We recommend using Kubernetes to deploy the ChatQnA service, as it offers benefits such as load balancing and improved scalability. However, you can also deploy the service using Docker if that better suits your needs. Below is a description of Kubernetes deployment and benchmarking. For instructions on deploying and benchmarking with Docker, please refer to [this section](#benchmark-with-docker).
### Prerequisites
- Install Kubernetes by following [this guide](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md).
- Every node has direct internet access
- Set up kubectl on the master node with access to the Kubernetes cluster.
- Install Python 3.8+ on the master node for running the stress tool.
- Ensure all nodes have a local /mnt/models folder, which will be mounted by the pods.
- Ensure that the container's ulimit can meet the the number of requests.
```bash
# The way to modify the containered ulimit:
sudo systemctl edit containerd
# Add two lines:
[Service]
LimitNOFILE=65536:1048576
sudo systemctl daemon-reload; sudo systemctl restart containerd
```
### Kubernetes Cluster Example
```bash
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 35d v1.29.6
k8s-work1 Ready <none> 35d v1.29.5
k8s-work2 Ready <none> 35d v1.29.6
k8s-work3 Ready <none> 35d v1.29.6
```
### Manifest preparation
We have created the [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single node, two nodes and four nodes K8s cluster. In order to apply, we need to check out and configure some values.
```bash
# on k8s-master node
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/benchmark/performance
# replace the image tag from latest to v0.9 since we want to test with v0.9 release
IMAGE_TAG=v0.9
find . -name '*.yaml' -type f -exec sed -i "s#image: opea/\(.*\):latest#image: opea/\1:${IMAGE_TAG}#g" {} \;
# set the huggingface token
HUGGINGFACE_TOKEN=<your token>
find . -name '*.yaml' -type f -exec sed -i "s#\${HF_TOKEN}#${HUGGINGFACE_TOKEN}#g" {} \;
# set models
LLM_MODEL_ID=Intel/neural-chat-7b-v3-3
EMBEDDING_MODEL_ID=BAAI/bge-base-en-v1.5
RERANK_MODEL_ID=BAAI/bge-reranker-base
find . -name '*.yaml' -type f -exec sed -i "s#\$(LLM_MODEL_ID)#${LLM_MODEL_ID}#g" {} \;
find . -name '*.yaml' -type f -exec sed -i "s#\$(EMBEDDING_MODEL_ID)#${EMBEDDING_MODEL_ID}#g" {} \;
find . -name '*.yaml' -type f -exec sed -i "s#\$(RERANK_MODEL_ID)#${RERANK_MODEL_ID}#g" {} \;
```
### Test Configurations
By default, the workload and benchmark configuration is as below:
| Key | Value |
| -------- | ------- |
| Workload | ChatQnA |
| Tag | V0.9 |
Models configuration
| Key | Value |
| ---------- | ------------------ |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| Inference | Intel/neural-chat-7b-v3-3 |
Benchmark parameters
| Key | Value |
| ---------- | ------------------ |
| LLM input tokens | 1024 |
| LLM output tokens | 128 |
Number of test requests for different scheduled node number:
| Node count | Concurrency | Query number |
| ----- | -------- | -------- |
| 1 | 128 | 640 |
| 2 | 256 | 1280 |
| 4 | 512 | 2560 |
More detailed configuration can be found in configuration file [benchmark.yaml](./benchmark.yaml).
### Test Steps
#### Single node test
##### 1. Preparation
We add label to 1 Kubernetes node to make sure all pods are scheduled to this node:
```bash
kubectl label nodes k8s-worker1 node-type=chatqna-opea
```
##### 2. Install ChatQnA
Go to [BKC manifest](./tuned/with_rerank/single_gaudi) and apply to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
kubectl apply -f .
```
##### 3. Run tests
###### 3.1 Upload Retrieval File
Before running tests, upload a specified file to make sure the llm input have the token length of 1k.
Run the following command to check the cluster ip of dataprep.
```bash
kubectl get svc
```
Substitute the `${cluster_ip}` into the real cluster ip of dataprep microservice as below.
```log
dataprep-svc ClusterIP xx.xx.xx.xx <none> 6007/TCP 5m app=dataprep-deploy
```
Run the cURL command to upload file:
```bash
cd GenAIEval/evals/benchmark/data
# RAG with Rerank
curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./upload_file.txt" \
-F "chunk_size=3800"
# RAG without Rerank
curl -X POST "http://${cluster_ip}:6007/v1/dataprep" \
-H "Content-Type: multipart/form-data" \
-F "files=@./upload_file_no_rerank.txt"
```
###### 3.2 Run Benchmark Test
Before the benchmark, we can configure the number of test queries and test output directory by:
```bash
export USER_QUERIES="[640, 640, 640, 640]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_1"
```
And then run the benchmark by:
```bash
bash benchmark.sh -n 1
```
The argument `-n` refers to the number of test nodes. Note that necessary dependencies will be automatically installed when running benchmark for the first time.
##### 4. Data collection
All the test results will come to this folder `/home/sdp/benchmark_output/node_1` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps.
##### 5. Clean up
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-worker1 node-type-
```
#### Two node test
##### 1. Preparation
We add label to 2 Kubernetes node to make sure all pods are scheduled to this node:
```bash
kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea
```
##### 2. Install ChatQnA
Go to [BKC manifest](./tuned/with_rerank/two_gaudi) and apply to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/two_gaudi
kubectl apply -f .
```
##### 3. Run tests
Before the benchmark, we can configure the number of test queries and test output directory by:
```bash
export USER_QUERIES="[1280, 1280, 1280, 1280]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_2"
```
And then run the benchmark by:
```bash
bash benchmark.sh -n 2
```
The argument `-n` refers to the number of test nodes. Note that necessary dependencies will be automatically installed when running benchmark for the first time.
##### 4. Data collection
All the test results will come to this folder `/home/sdp/benchmark_output/node_2` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps.
##### 5. Clean up
```bash
# on k8s-master node
kubectl delete -f .
kubectl label nodes k8s-worker1 k8s-worker2 node-type-
```
#### Four node test
##### 1. Preparation
We add label to 4 Kubernetes node to make sure all pods are scheduled to this node:
```bash
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=chatqna-opea
```
##### 2. Install ChatQnA
Go to [BKC manifest](./tuned/with_rerank/four_gaudi) and apply to K8s.
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/four_gaudi
kubectl apply -f .
```
##### 3. Run tests
Before the benchmark, we can configure the number of test queries and test output directory by:
```bash
export USER_QUERIES="[2560, 2560, 2560, 2560]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/node_4"
```
And then run the benchmark by:
```bash
bash benchmark.sh -n 4
```
The argument `-n` refers to the number of test nodes. Note that necessary dependencies will be automatically installed when running benchmark for the first time.
##### 4. Data collection
All the test results will come to this folder `/home/sdp/benchmark_output/node_4` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps.
##### 5. Clean up
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/performance/tuned/with_rerank/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
```
## Benchmark with Docker
### Deploy ChatQnA service with Docker
In order to set up the environment correctly, you'll need to configure essential environment variables and, if applicable, proxy-related variables.
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
#### Deploy ChatQnA on Gaudi
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
```
Refer to the [Gaudi Guide](../../docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
#### Deploy ChatQnA on Xeon
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
docker compose up -d
```
Refer to the [Xeon Guide](../../docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
#### Deploy ChatQnA on NVIDIA GPU
```bash
cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/
docker compose up -d
```
Refer to the [NVIDIA GPU Guide](../../docker_compose/nvidia/gpu/README.md) for more instructions on building docker images from source.
### Run tests
Before the benchmark, we can configure the number of test queries and test output directory by:
```bash
export USER_QUERIES="[640, 640, 640, 640]"
export TEST_OUTPUT_DIR="/home/sdp/benchmark_output/docker"
```
And then run the benchmark by:
```bash
bash benchmark.sh -d docker -i <service-ip> -p <service-port>
```
The argument `-i` and `-p` refer to the deployed ChatQnA service IP and port, respectively. Note that necessary dependencies will be automatically installed when running benchmark for the first time.
### Data collection
All the test results will come to this folder `/home/sdp/benchmark_output/docker` configured by the environment variable `TEST_OUTPUT_DIR` in previous steps.
### Clean up
Take gaudi as example, use the below command to clean up system.
```bash
cd GenAIExamples/docker_compose/intel/hpu/gaudi
docker compose stop && docker compose rm -f
echo y | docker system prune
```

View File

@@ -1,23 +0,0 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/

View File

@@ -1,27 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
apiVersion: v2
name: chatqna-charts
description: A Helm chart for Kubernetes
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 1.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"

View File

@@ -1,36 +0,0 @@
# ChatQnA Deployment
This document guides you through deploying ChatQnA pipelines using Helm charts. Helm charts simplify managing Kubernetes applications by packaging configuration and resources.
## Getting Started
### Preparation
```bash
# on k8s-master node
cd GenAIExamples/ChatQnA/benchmark/performance/helm_charts
# Replace the key of HUGGINGFACEHUB_API_TOKEN with your actual Hugging Face token:
# vim customize.yaml
HUGGINGFACEHUB_API_TOKEN: hf_xxxxx
```
### Deploy your ChatQnA
```bash
# Deploy a ChatQnA pipeline using the specified YAML configuration.
# To deploy with different configurations, simply provide a different YAML file.
helm install chatqna helm_charts/ -f customize.yaml
```
Notes: The provided [BKC manifests](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark) for single, two, and four node Kubernetes clusters are generated using this tool.
## Customize your own ChatQnA pipelines. (Optional)
There are two yaml configs you can specify.
- customize.yaml
This file can specify image names, the number of replicas and CPU cores to manage your pods.
- values.yaml
This file contains the default microservice configurations for ChatQnA. Please review and understand each parameter before making any changes.

Some files were not shown because too many files have changed in this diff Show More