Compare commits

100 Commits

Author SHA1 Message Date
ZePan110
ec9db1a3e1 Merge branch 'main' into nightly-cancel 2025-05-06 16:35:38 +08:00
lkk
ff66600ab4 Fix ui dockerfile. (#1909)
Signed-off-by: lkk <33276950+lkk12014402@users.noreply.github.com>
2025-05-06 16:34:16 +08:00
ZePan110
faf6250590 Fix 1.
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-05-06 16:17:10 +08:00
ZePan110
5375332fb3 Fix security issues for helm test workflow (#1908)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-05-06 15:54:43 +08:00
Omar Khleif
df33800945 CodeGen Gradio UI Enhancements (#1904)
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
2025-05-06 13:41:21 +08:00
Ying Hu
40e44dfcd6 Update README.md of ChatQnA for broken URL (#1907)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2025-05-06 13:21:31 +08:00
ZePan110
9259ba41a5 Remove invalid codeowner. (#1896)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-30 13:24:42 +08:00
ZePan110
5c7f5718ed Restore context in EdgeCraftRAG build.yaml. (#1895)
Restore context in EdgeCraftRAG build.yaml to avoid the issue of can't find Dockerfiles.

Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-30 11:09:21 +08:00
lkk
d334f5c8fd build cpu agent ui docker image. (#1894) 2025-04-29 23:58:52 +08:00
ZePan110
670d9f3d18 Fix security issue. (#1892)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-29 19:44:48 +08:00
Zhu Yongbo
555c4100b3 Install cpu version for components (#1888)
Signed-off-by: Yongbozzz <yongbo.zhu@intel.com>
2025-04-29 10:08:23 +08:00
ZePan110
04d527d3b0 Integrate set_env to ut scripts for CodeTrans. (#1868)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-28 13:53:50 +08:00
ZePan110
13c4749ca3 Fix security issue (#1884)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-28 13:52:50 +08:00
ZePan110
99b62ae49e Integrate DocSum set_env to ut scripts. (#1860)
Integrate DocSum set_env to ut scripts.
Add README.md for DocSum and InstructionTuning UT scripts.

Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-28 13:35:05 +08:00
chen, suyue
c546d96e98 downgrade tei version from 1.6 to 1.5, fix the chatqna perf regression (#1886)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-25 23:00:36 +08:00
chen, suyue
be5933ad85 Update benchmark scripts (#1883)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-25 17:05:48 +08:00
rbrugaro
18b4f39f27 README fixes Finance Example (#1882)
Signed-off-by: Rita Brugarolas <rita.brugarolas.brufau@intel.com>
Co-authored-by: Ying Hu <ying.hu@intel.com>
2025-04-24 23:58:08 -07:00
chyundunovDatamonsters
ef9290f245 DocSum - refactoring README.md for deploy application on ROCm (#1881)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
2025-04-25 13:36:40 +08:00
chyundunovDatamonsters
3b0bcb80a8 DocSum - Adding files to deploy an application in the K8S environment using Helm (#1758)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>
Co-authored-by: Chingis Yundunov <YundunovCN@sibedge.com>
Co-authored-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2025-04-25 13:33:08 +08:00
Artem Astafev
ccc145ea1a Refine README.MD for SearchQnA on AMD ROCm platform (#1876)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
2025-04-25 10:16:03 +08:00
chyundunovDatamonsters
bb7a675665 ChatQnA - refactoring README.md for deploy application on ROCm (#1857)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
Signed-off-by: Chingis Yundunov <c.yundunov@datamonsters.com>
Co-authored-by: Chingis Yundunov <YundunovCN@sibedge.com>
Co-authored-by: Artem Astafev <a.astafev@datamonsters.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-25 08:52:24 +08:00
chen, suyue
f90a6d2a8e [CICD enhance] EdgeCraftRAG run CI with latest base image, group logs in GHA outputs. (#1877)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-24 16:18:44 +08:00
chyundunovDatamonsters
1fdab591d9 CodeTrans - refactoring README.md for deploy application on ROCm with Docker Compose (#1875)
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
2025-04-24 15:28:57 +08:00
chen, suyue
13ea13862a Remove proxy in CodeTrans test (#1874)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-24 13:47:56 +08:00
ZePan110
1787d1ee98 Update image links. (#1866)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-24 13:34:41 +08:00
Artem Astafev
db4bf1a4c3 Refine README.MD for AMD ROCm docker compose deployment (#1856)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
2025-04-24 11:00:51 +08:00
chen, suyue
f7002fcb70 Set opea_branch for CD test (#1870)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-24 09:49:20 +08:00
Artem Astafev
c39c875211 Fix compose file and functional tests for Avatarchatbot on AMD ROCm platform (#1872)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
2025-04-23 22:58:25 +08:00
Artem Astafev
c2e9a259fe Refine AuidoQnA README.MD for AMD ROCm docker compose deployment (#1862)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
2025-04-23 13:55:01 +08:00
Omar Khleif
48eaf9c1c9 Added CodeGen Gradio README link to Docker Images List (#1864)
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-04-22 15:28:49 -07:00
Ervin Castelino
a39824f142 Update README.md of DBQnA (#1855)
Co-authored-by: Ying Hu <ying.hu@intel.com>
2025-04-22 15:56:37 -04:00
Dina Suehiro Jones
e10e6dd002 Fixes for MultimodalQnA with the Milvus vector db (#1859)
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
2025-04-21 16:05:11 -07:00
chen, suyue
ea17b38ac5 [CICD enhance] AudioQnA run CI with latest base image, group logs in GHA outputs. (#1854)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-21 21:58:02 +08:00
sgurunat
ea9d444bbf New Productivity Suite react UI and Bug Fixes (#1834)
Signed-off-by: Gurunath S <gurunath.s@intel.com>
2025-04-21 18:33:25 +08:00
Yao Qing
262ad7d6ec Refine readme of CodeGen (#1797)
Signed-off-by: Yao, Qing <qing.yao@intel.com>
2025-04-21 17:49:15 +08:00
Spycsh
608dc963c9 Refine readme of AudioQnA (#1804) 2025-04-21 17:30:14 +08:00
chen, suyue
ef2156fbf4 Enable more flexible support for test HWs (#1816)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-21 17:25:01 +08:00
WenjiaoYue
52c4db2fc6 [ SearchQnA ] Refine documents (#1803)
Signed-off-by: WenjiaoYue <wenjiao.yue@intel.com>
2025-04-21 17:16:41 +08:00
Letong Han
697f78ea71 Refine the READMEs of CodeTrans (#1796)
Signed-off-by: letonghan <letong.han@intel.com>
2025-04-21 17:14:46 +08:00
chen, suyue
e96f5a1ac5 AgentQnA group log lines in test outputs for better readable logs. (#1817)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-21 15:27:28 +08:00
Zhu Yongbo
82ef639ee3 hot fix for permission issue (#1849)
Signed-off-by: Yongbozzz <yongbo.zhu@intel.com>
2025-04-21 10:32:46 +08:00
vrantala
29d449b3ca Added Initial version of DocSum support for benchmarking scripts for OPEA (#1840)
Signed-off-by: Valtteri Rantala <valtteri.rantala@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
2025-04-21 10:32:28 +08:00
Shifani Rajabose
338f81430d [Bug: 900] Create a version of MultimodalQnA example with Zilliz/Milvus as Vector DB (#1639)
Signed-off-by: Shifani Rajabose <srajabose@habana.ai>
Signed-off-by: Pallavi Jaini <pallavi.jaini@intel.com>
2025-04-21 10:11:39 +08:00
dolpher
87e3c0f59f Update chatqna values file changes (#1844)
Signed-off-by: Dolpher Du <dolpher.du@intel.com>
2025-04-21 09:38:07 +08:00
Spycsh
27813b3bf9 add AudioQnA key parameters to comply with the image size reduction (#1833) 2025-04-20 16:34:19 +08:00
XinyaoWa
c7f06d5e54 Refine documents for DocSum (#1802)
Signed-off-by: Xinyao <xinyao.wang@intel.com>
2025-04-20 16:20:20 +08:00
ZePan110
0967fcac86 [ Translation ] Refine documents (#1795)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-20 16:18:38 +08:00
Omar Khleif
a3eba01879 CodeGen Gradio UI Updates for new delete endpoint features (#1851)
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
2025-04-20 16:17:32 +08:00
XinyuYe-Intel
bc168f1732 Refine readme of InstructionTuning (#1794)
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
2025-04-20 16:17:13 +08:00
Liang Lv
1eb2e36a18 Refine ChatQnA READMEs (#1850)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-04-20 10:34:24 +08:00
Louie Tsai
1a9a2dd53c Redirect users to new github.io sections for AgentQnA opentelemetry materials (#1846)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-04-17 23:40:15 -07:00
sri-intel
c63e2cd067 Remote inference support for examples in Productivity suite (#1818)
Signed-off-by: Srinarayan Srikanthan <srinarayan.srikanthan@intel.com>
2025-04-18 14:36:57 +08:00
Louie Tsai
c793dd0b51 Redirect Users to github.io for ChatQnA telemetry materials (#1845)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-04-17 23:35:30 -07:00
Ying Hu
1b3f1f632a Update README.md of ChatQnA for layout (#1842)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-18 11:41:35 +08:00
Zhu Yongbo
4c05e7fd1c fix missing package (#1841)
Signed-off-by: Yongbozzz <yongbo.zhu@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-17 22:57:53 +08:00
sri-intel
90cfe89e21 new chatqna readme template (#1755)
Signed-off-by: Srinarayan Srikanthan <srinarayan.srikanthan@intel.com>
2025-04-17 16:38:40 +08:00
ZePan110
62f7f5bd34 Update docker images list. (#1835)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-17 16:26:50 +08:00
Letong Han
7c6189cf43 Enable dataprep health check for examples (#1800)
Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-17 15:52:06 +08:00
Letong Han
ae31e4fb75 Enable health check for dataprep in ChatQnA (#1799)
Signed-off-by: letonghan <letong.han@intel.com>
2025-04-17 15:01:57 +08:00
xiguiw
4fc19c7d73 Update TEI docker images to CPU-1.6 (#1791)
Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
2025-04-17 15:00:06 +08:00
Letong Han
b80449b8ab Fix Multimodal & ProductivitySuite Issue (#1820)
1. add data-prep health check
2. add create conda env

Signed-off-by: letonghan <letong.han@intel.com>
2025-04-17 09:30:15 +08:00
minmin-intel
8aa96c6278 Update FinanceAgent v1.3 (#1819)
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-16 15:44:46 -07:00
Abolfazl Shahbazi
a7ef8333ee Adding the two missing packages for ingest script (#1822)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-04-16 09:46:45 -07:00
Liang Lv
71fe886ce9 Replaced TGI with vLLM for guardrail serving (#1815)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-04-16 17:06:11 +08:00
XinyuYe-Intel
1a6f821c95 Remove template_llava.jinja in command (#1831)
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
2025-04-16 17:05:55 +08:00
chen, suyue
1095d88c5f Group log lines in GHA outputs for better readable logs. (#1821)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-16 13:17:53 +08:00
mahathis
c73b09a758 Update AgentQnA and DocSum for Gaudi Compatibility (#1777)
Signed-off-by: Mahathi Vatsal <mahathi.vatsal.salopanthula@intel.com>
2025-04-15 22:01:27 -07:00
Liang Lv
13dd27e6d5 Update vLLM parameter max-seq-len-to-capture (#1809)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-04-15 14:27:12 +08:00
chen, suyue
a222d1cfbb Optimize the nightly/weekly example test (#1806)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-14 17:46:49 +08:00
minmin-intel
1852e6bcc3 Add Finance Agent Example (#1752)
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Signed-off-by: Rita Brugarolas <rita.brugarolas.brufau@intel.com>
Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: lkk12014402 <kaokao.lv@intel.com>
2025-04-14 14:27:07 +08:00
Neo Zhang Jianyu
72ce335663 add 'N/A' to option (#1801)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
2025-04-14 11:05:56 +08:00
chen, suyue
15d76c0889 support rocm helm charts test (#1787)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-13 22:36:16 +08:00
Chaunte W. Lacewell
c4763434b8 Fix VideoQnA (#1696)
This PR fixes the VideoQnA example.

Fixes Issues #1476 #1478 #1477

Signed-off-by: zhanmyz <yazhan.ma@intel.com>
Signed-off-by: Lacewell, Chaunte W <chaunte.w.lacewell@intel.com>
2025-04-12 18:15:02 +08:00
minmin-intel
58b47c15c6 update AgentQnA (#1790)
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-11 13:33:19 -07:00
ZePan110
8d421b7912 [Translation] Integrate set_env.sh into test scripts. (#1785)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2025-04-11 09:31:40 +08:00
ZePan110
e9cafb3343 Redefine docker images list. (#1743)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-10 16:38:46 +08:00
ZePan110
1737d4b2b4 Update model cache for MultimodalQnA (#1618)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-10 16:25:41 +08:00
chen, suyue
177da5e6fc Add new secrets for docker compose test (#1786)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-10 16:12:51 +08:00
XinyaoWa
063547fb66 Align DocSum env to vllm (#1784)
Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>
Co-authored-by: sys-lpot-val <sys_lpot_val@intel.com>
2025-04-10 11:38:24 +08:00
ZePan110
c3bb59a354 Unified build.yaml file writing style (#1781)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-04-10 10:58:45 +08:00
ZePan110
8c763cbe11 Enable AvatarChatbot model cache for docker compose test. (#1604)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2025-04-10 09:54:30 +08:00
minmin-intel
411bb28f41 fix bugs in DocIndexRetriever (#1770)
Signed-off-by: minmin-intel <minmin.hou@intel.com>
2025-04-10 09:45:46 +08:00
ZePan110
00d7a65dd8 Enable model cache for Rocm docker compose test. (#1614)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-10 09:40:37 +08:00
Artem Astafev
795c29fe87 Adding files to deploy MultimodalQnA application on ROCm vLLM (#1737)
Signed-off-by: Artem Astafev <a.astafev@datamonsters.com>
2025-04-10 09:34:58 +08:00
pre-commit-ci[bot]
094ca7aefe [pre-commit.ci] pre-commit autoupdate (#1771)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-04-09 11:51:57 -07:00
Liang Lv
398441a10c Fix typo in CodeGen README (#1783)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-04-09 16:43:53 +08:00
Mustafa
892624f539 CodGen Examples using-RAG-and-Agents (#1757)
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
2025-04-09 16:12:20 +08:00
Eero Tamminen
8b7cb3539e Use GenAIComp base image to simplify Dockerfiles & reduce image sizes (#1369)
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2025-04-09 14:51:10 +08:00
ZePan110
5f4b3a6d12 Adaptation to vllm v0.8.3 build paths (#1761)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-09 13:20:02 +08:00
Yazhan Ma
0392610776 Iteratively add image docker hub description (#1768)
Signed-off-by: zhanmyz <yazhan.ma@intel.com>
2025-04-09 12:00:45 +08:00
Lucas Melo
2d8a7e25f6 Update ChatQna & CodeGen README.md with new Automated Terraform Deployment Options (#1731)
Signed-off-by: lucasmelogithub <lucas.melo@intel.com>
2025-04-09 10:54:01 +08:00
Chun Tao
4d652719c2 Fix GenAIExamples #1607 (#1776)
Fix issue #1607

Signed-off-by: Chun Tao <chun.tao@intel.com>
2025-04-09 10:10:07 +08:00
Liang Lv
7b7728c6c3 Fix vLLM CPU initialize engine issue for DeepSeek models (#1762)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-04-09 09:47:08 +08:00
XinyaoWa
6917d5bdb1 Fix ChatQnA port to internal vllm port (#1763)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2025-04-09 09:37:11 +08:00
dolpher
46ebb78aa3 Sync values yaml file for 1.3 release (#1748)
Signed-off-by: Dolpher Du <dolpher.du@intel.com>
2025-04-08 22:39:40 +08:00
chen, suyue
b14db6dbd3 fix docker image clean up issue (#1773)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-08 22:26:37 +08:00
lkk
ff8008b6d0 compatible open-webui for opea agent. (#1765) 2025-04-08 21:54:01 +08:00
Spycsh
d4952d1e7c Refine third parties links (#1764)
Signed-off-by: Spycsh <sihan.chen@intel.com>
2025-04-08 18:39:13 +08:00
chen, suyue
12932477ee Add dockerhub login step to avoid 429 Too Many Requests (#1772)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-04-08 14:29:36 +08:00
ZePan110
42735d0d7d Fix vllm and vllm-fork tags (#1766)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-04-07 22:58:50 +08:00
496 changed files with 24188 additions and 12637 deletions

.github/CODEOWNERS (10 lines changed)
View File

@@ -4,13 +4,13 @@
/AudioQnA/ sihan.chen@intel.com wenjiao.yue@intel.com
/AvatarChatbot/ chun.tao@intel.com kaokao.lv@intel.com
/ChatQnA/ liang1.lv@intel.com letong.han@intel.com
/CodeGen/ liang1.lv@intel.com xinyao.wang@intel.com
/CodeTrans/ sihan.chen@intel.com xinyao.wang@intel.com
/CodeGen/ liang1.lv@intel.com
/CodeTrans/ sihan.chen@intel.com
/DBQnA/ supriya.krishnamurthi@intel.com liang1.lv@intel.com
/DocIndexRetriever/ kaokao.lv@intel.com chendi.xue@intel.com
/DocSum/ letong.han@intel.com xinyao.wang@intel.com
/DocSum/ letong.han@intel.com
/EdgeCraftRAG/ yongbo.zhu@intel.com mingyuan.qi@intel.com
/FaqGen/ yogesh.pandey@intel.com xinyao.wang@intel.com
/FaqGen/ yogesh.pandey@intel.com
/GraphRAG/ rita.brugarolas.brufau@intel.com abolfazl.shahbazi@intel.com
/InstructionTuning/ xinyu.ye@intel.com kaokao.lv@intel.com
/MultimodalQnA/ melanie.h.buehler@intel.com tiep.le@intel.com
@@ -19,5 +19,5 @@
/SearchQnA/ sihan.chen@intel.com letong.han@intel.com
/Text2Image/ wenjiao.yue@intel.com xinyu.ye@intel.com
/Translation/ liang1.lv@intel.com sihan.chen@intel.com
/VideoQnA/ huiling.bao@intel.com xinyao.wang@intel.com
/VideoQnA/ huiling.bao@intel.com
/VisualQnA/ liang1.lv@intel.com sihan.chen@intel.com

View File

@@ -32,6 +32,7 @@ body:
- Mac
- BSD
- Other (Please let us know in description)
- N/A
validations:
required: true
@@ -56,6 +57,7 @@ body:
- GPU-Nvidia
- GPU-AMD
- GPU-other (Please let us know in description)
- N/A
validations:
required: true
@@ -67,6 +69,7 @@ body:
- label: Pull docker images from hub.docker.com
- label: Build docker images from source
- label: Other
- label: N/A
validations:
required: true
@@ -80,6 +83,7 @@ body:
- label: Kubernetes Helm Charts
- label: Kubernetes GMC
- label: Other
- label: N/A
validations:
required: true
@@ -91,6 +95,7 @@ body:
- Single Node
- Multiple Nodes
- Other
- N/A
default: 0
validations:
required: true

View File

@@ -32,6 +32,7 @@ body:
- Mac
- BSD
- Other (Please let us know in description)
- N/A
validations:
required: true
@@ -56,6 +57,7 @@ body:
- GPU-Nvidia
- GPU-AMD
- GPU-other (Please let us know in description)
- N/A
validations:
required: true
@@ -67,6 +69,7 @@ body:
- Single Node
- Multiple Nodes
- Other
- N/A
default: 0
validations:
required: true

View File

@@ -35,9 +35,9 @@ jobs:
- name: Check if job should be skipped
id: check-skip
run: |
should_skip=false
if [[ "${{ inputs.node }}" == "gaudi3" || "${{ inputs.node }}" == "rocm" || "${{ inputs.node }}" == "arc" ]]; then
should_skip=true
should_skip=true
if [[ "${{ inputs.node }}" == "gaudi" || "${{ inputs.node }}" == "xeon" ]]; then
should_skip=false
fi
echo "should_skip=$should_skip"
echo "should_skip=$should_skip" >> $GITHUB_OUTPUT
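
The hunk above inverts the skip logic: rather than skipping a fixed list of nodes (gaudi3, rocm, arc), the step now skips by default and runs only on gaudi and xeon. A minimal standalone sketch of that allow-list pattern follows. The function name `should_skip_node` is hypothetical and not part of the workflow, and a POSIX `case` stands in for the workflow's `[[ ... ]]` test so the sketch runs under plain `sh` as well as bash.

```shell
# Hypothetical helper illustrating the allow-list check; not part of the workflow.
should_skip_node() {
  should_skip=true                 # skip unless the node is explicitly allowed
  case "$1" in
    gaudi | xeon) should_skip=false ;;
  esac
  echo "$should_skip"
}

should_skip_node "xeon"      # prints: false
should_skip_node "xeon-gnr"  # prints: true (exact match only)
```

With an allow-list, a newly added node name such as xeon-gnr is skipped until it is listed explicitly, which is the safer default for a job that only supports certain hardware.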

View File

@@ -42,9 +42,9 @@ jobs:
- name: Check if job should be skipped
id: check-skip
run: |
should_skip=false
if [[ "${{ inputs.node }}" == "gaudi3" || "${{ inputs.node }}" == "rocm" || "${{ inputs.node }}" == "arc" ]]; then
should_skip=true
should_skip=true
if [[ "${{ inputs.node }}" == "gaudi" || "${{ inputs.node }}" == "xeon" ]]; then
should_skip=false
fi
echo "should_skip=$should_skip"
echo "should_skip=$should_skip" >> $GITHUB_OUTPUT
@@ -77,15 +77,13 @@ jobs:
docker_compose_path=${{ github.workspace }}/${{ inputs.example }}/docker_image_build/build.yaml
if [[ $(grep -c "vllm:" ${docker_compose_path}) != 0 ]]; then
git clone https://github.com/vllm-project/vllm.git && cd vllm
# Get the latest tag
VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
VLLM_VER=v0.8.3
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
fi
if [[ $(grep -c "vllm-gaudi:" ${docker_compose_path}) != 0 ]]; then
git clone https://github.com/HabanaAI/vllm-fork.git && cd vllm-fork
# Get the latest tag
VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
fi

View File

@@ -76,6 +76,7 @@ jobs:
example: ${{ inputs.example }}
hardware: ${{ inputs.node }}
use_model_cache: ${{ inputs.use_model_cache }}
opea_branch: ${{ inputs.opea_branch }}
secrets: inherit

View File

@@ -2,7 +2,9 @@
# SPDX-License-Identifier: Apache-2.0
name: Helm Chart E2e Test For Call
permissions: read-all
permissions:
contents: read
on:
workflow_call:
inputs:
@@ -81,6 +83,10 @@ jobs:
if [[ "${{ inputs.hardware }}" == "gaudi" ]]; then
value_files="${value_files}\"${filename}\","
fi
elif [[ "$filename" == *"rocm"* ]]; then
if [[ "${{ inputs.hardware }}" == "rocm" ]]; then
value_files="${value_files}\"${filename}\","
fi
elif [[ "$filename" == *"nv"* ]]; then
continue
else
@@ -131,16 +137,28 @@ jobs:
env:
example: ${{ inputs.example }}
run: |
CHART_NAME="${example,,}" # CodeGen
echo "CHART_NAME=$CHART_NAME" >> $GITHUB_ENV
echo "RELEASE_NAME=${CHART_NAME}$(date +%Y%m%d%H%M%S)" >> $GITHUB_ENV
echo "NAMESPACE=${CHART_NAME}-$(head -c 4 /dev/urandom | xxd -p)" >> $GITHUB_ENV
echo "ROLLOUT_TIMEOUT_SECONDS=600s" >> $GITHUB_ENV
echo "TEST_TIMEOUT_SECONDS=600s" >> $GITHUB_ENV
echo "KUBECTL_TIMEOUT_SECONDS=60s" >> $GITHUB_ENV
echo "should_cleanup=false" >> $GITHUB_ENV
echo "skip_validate=false" >> $GITHUB_ENV
echo "CHART_FOLDER=${example}/kubernetes/helm" >> $GITHUB_ENV
if [[ ! "$example" =~ ^[a-zA-Z]{1,20}$ ]] || [[ "$example" =~ \.\. ]] || [[ "$example" == -* || "$example" == *- ]]; then
echo "Error: Invalid input - only lowercase alphanumeric and internal hyphens allowed"
exit 1
fi
# SAFE_PREFIX="kb-"
CHART_NAME="${SAFE_PREFIX}$(echo "$example" | tr '[:upper:]' '[:lower:]')"
RAND_SUFFIX=$(openssl rand -hex 2 | tr -dc 'a-f0-9')
cat <<EOF >> $GITHUB_ENV
CHART_NAME=${CHART_NAME}
RELEASE_NAME=${CHART_NAME}-$(date +%s)
NAMESPACE=ns-${CHART_NAME}-${RAND_SUFFIX}
ROLLOUT_TIMEOUT_SECONDS=600s
TEST_TIMEOUT_SECONDS=600s
KUBECTL_TIMEOUT_SECONDS=60s
should_cleanup=false
skip_validate=false
CHART_FOLDER=${example}/kubernetes/helm
EOF
echo "Generated safe variables:" >> $GITHUB_STEP_SUMMARY
echo "- CHART_NAME: ${CHART_NAME}" >> $GITHUB_STEP_SUMMARY
- name: Helm install
id: install
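
The rewritten step above validates `example` before deriving names from it. As written, the pattern `^[a-zA-Z]{1,20}$` accepts only 1 to 20 letters, so the follow-up checks for `..` and for leading or trailing hyphens appear redundant, and the error text describes a broader character set than the pattern allows. Because the `SAFE_PREFIX` assignment is commented out, `${SAFE_PREFIX}` presumably expands to an empty string unless the runner environment sets it. A minimal sketch of the validate-then-lowercase idea, with hypothetical function names and written for POSIX `sh`:

```shell
# Hypothetical helpers sketching the validation idea; names are not from the workflow.

# Succeeds only for 1 to 20 letters, mirroring ^[a-zA-Z]{1,20}$
# (the bracket range assumes a C-like locale).
validate_example() {
  case "$1" in
    "" | *[!a-zA-Z]*) return 1 ;;   # empty, or contains a non-letter
  esac
  [ "${#1}" -le 20 ]
}

# Prints the lowercased chart name, or fails with a message on stderr.
chart_name_for() {
  if ! validate_example "$1"; then
    echo "Error: invalid example name" >&2
    return 1
  fi
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

chart_name_for "CodeGen"   # prints: codegen
```

Rejecting anything outside a small allowed alphabet before the value reaches `$GITHUB_ENV` or a path is what keeps inputs such as `../x` from influencing later steps.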

View File

@@ -32,6 +32,10 @@ on:
required: false
type: boolean
default: false
opea_branch:
default: "main"
required: false
type: string
jobs:
get-test-case:
runs-on: ubuntu-latest
@@ -64,8 +68,10 @@ jobs:
cd ${{ github.workspace }}/${{ inputs.example }}/tests
run_test_cases=""
if [ "${{ inputs.hardware }}" == "gaudi2" ] || [ "${{ inputs.hardware }}" == "gaudi3" ]; then
if [[ "${{ inputs.hardware }}" == "gaudi"* ]]; then
hardware="gaudi"
elif [[ "${{ inputs.hardware }}" == "xeon"* ]]; then
hardware="xeon"
else
hardware="${{ inputs.hardware }}"
fi
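
The hunk above replaces the explicit gaudi2/gaudi3 comparison with prefix matches, so any node label starting with `gaudi` or `xeon` maps to its base hardware family. A minimal sketch of that normalization, assuming a hypothetical function name `normalize_hardware` and using POSIX `case` globbing in place of the workflow's `[[ ... == "gaudi"* ]]`:

```shell
# Hypothetical helper; maps a node label to the hardware family used for test lookup.
normalize_hardware() {
  case "$1" in
    gaudi*) echo "gaudi" ;;   # gaudi, gaudi2, gaudi3, ...
    xeon*)  echo "xeon" ;;    # xeon, xeon-gnr, ...
    *)      echo "$1" ;;      # anything else passes through unchanged
  esac
}

normalize_hardware "gaudi3"    # prints: gaudi
normalize_hardware "xeon-gnr"  # prints: xeon
normalize_hardware "rocm"      # prints: rocm
```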
@@ -116,13 +122,17 @@ jobs:
run: |
sudo rm -rf ${{github.workspace}}/* || true
# clean up containers use ports
cid=$(docker ps --format '{{.Names}} : {{.Ports}}' | grep -v ' : $' | grep -v 5000 | awk -F' : ' '{print $1}')
echo "Cleaning up containers using ports..."
cid=$(docker ps --format '{{.Names}} : {{.Ports}}' | grep -v ' : $' | grep -v 0.0.0.0:5000 | awk -F' : ' '{print $1}')
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
docker system prune -f
docker rmi $(docker images --filter reference="*/*/*:latest" -q) || true
docker rmi $(docker images --filter reference="*/*:ci" -q) || true
echo "Cleaning up images ..."
docker images --filter reference="*/*/*:latest" -q | xargs -r docker rmi && sleep 1s
docker images --filter reference="*/*:ci" -q | xargs -r docker rmi && sleep 1s
docker images --filter reference="*:5000/*/*" -q | xargs -r docker rmi && sleep 1s
docker images --filter reference="opea/comps-base" -q | xargs -r docker rmi && sleep 1s
docker images
- name: Checkout out Repo
uses: actions/checkout@v4
@@ -141,6 +151,12 @@ jobs:
bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "ports"
docker ps
- name: Log in DockerHub
uses: docker/login-action@v3.2.0
with:
username: ${{ secrets.DOCKERHUB_USER }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Run test
shell: bash
env:
@@ -153,8 +169,11 @@ jobs:
SDK_BASE_URL: ${{ secrets.SDK_BASE_URL }}
SERVING_TOKEN: ${{ secrets.SERVING_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
FINNHUB_API_KEY: ${{ secrets.FINNHUB_API_KEY }}
FINANCIAL_DATASETS_API_KEY: ${{ secrets.FINANCIAL_DATASETS_API_KEY }}
IMAGE_REPO: ${{ inputs.registry }}
IMAGE_TAG: ${{ inputs.tag }}
opea_branch: ${{ inputs.opea_branch }}
example: ${{ inputs.example }}
hardware: ${{ inputs.hardware }}
test_case: ${{ matrix.test_case }}
@@ -167,30 +186,38 @@ jobs:
export model_cache="/data2/hf_model"
else
echo "Model cache directory /data2/hf_model does not exist"
export model_cache="~/.cache/huggingface/hub"
export model_cache="$HOME/.cache/huggingface/hub"
fi
if [[ "$test_case" == *"rocm"* ]]; then
export model_cache="/var/lib/GenAI/data"
fi
fi
if [ -f "${test_case}" ]; then timeout 60m bash "${test_case}"; else echo "Test script {${test_case}} not found, skip test!"; fi
- name: Clean up container after test
shell: bash
if: cancelled() || failure()
if: always()
run: |
cd ${{ github.workspace }}/${{ inputs.example }}
export test_case=${{ matrix.test_case }}
export hardware=${{ inputs.hardware }}
bash ${{ github.workspace }}/.github/workflows/scripts/docker_compose_clean_up.sh "containers"
set -x
# clean up containers use ports
cid=$(docker ps --format '{{.Names}} : {{.Ports}}' | grep -v ' : $' | grep -v 5000 | awk -F' : ' '{print $1}')
echo "Cleaning up containers using ports..."
cid=$(docker ps --format '{{.Names}} : {{.Ports}}' | grep -v ' : $' | grep -v 0.0.0.0:5000 | awk -F' : ' '{print $1}')
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
docker system prune -f
docker rmi $(docker images --filter reference="*:5000/*/*" -q) || true
echo "Cleaning up images ..."
if [[ "${{ inputs.hardware }}" == "xeon"* ]]; then
docker system prune -a -f
else
docker images --filter reference="*/*/*:latest" -q | xargs -r docker rmi && sleep 1s
docker images --filter reference="*/*:ci" -q | xargs -r docker rmi && sleep 1s
docker images --filter reference="*:5000/*/*" -q | xargs -r docker rmi && sleep 1s
docker images --filter reference="opea/comps-base" -q | xargs -r docker rmi && sleep 1s
docker system prune -f
fi
docker images
- name: Publish pipeline artifact
if: ${{ !cancelled() }}
uses: actions/upload-artifact@v4
with:
name: ${{ inputs.example }}_${{ matrix.test_case }}
name: ${{ inputs.hardware }}_${{ inputs.example }}_${{ matrix.test_case }}
path: ${{ github.workspace }}/${{ inputs.example }}/tests/*.log
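
One of the changes in this file moves the fallback model cache from `"~/.cache/huggingface/hub"` to `"$HOME/.cache/huggingface/hub"`. The likely motivation is that the shell does not perform tilde expansion inside double quotes, while `$HOME` is still expanded there. A minimal sketch of the difference, with variable names chosen for illustration only:

```shell
# Tilde inside double quotes is kept literally; $HOME is expanded.
quoted="~/.cache/huggingface/hub"
expanded="$HOME/.cache/huggingface/hub"

printf '%s\n' "$quoted"     # prints the literal string ~/.cache/huggingface/hub
printf '%s\n' "$expanded"   # prints an absolute path under the home directory
```

A literal `~` passed on to a tool such as a container volume mount would generally be treated as a directory name rather than the home directory, which is why the `$HOME` form is the more robust choice.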

File diff suppressed because it is too large

View File

@@ -7,7 +7,7 @@ on:
inputs:
nodes:
default: "gaudi,xeon"
description: "Hardware to run test gaudi,gaudi3,xeon,rocm,arc"
description: "Hardware to run test gaudi,xeon,rocm,arc,gaudi3,xeon-gnr"
required: true
type: string
examples:

View File

@@ -5,11 +5,11 @@ name: Nightly build/publish latest docker images
on:
schedule:
- cron: "30 14 * * *" # UTC time
- cron: "30 14 * * 1-5" # UTC time
workflow_dispatch:
env:
EXAMPLES: ${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
EXAMPLES: CodeGen,CodeTrans #${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
TAG: "latest"
PUBLISH_TAGS: "latest"
@@ -38,8 +38,21 @@ jobs:
with:
node: gaudi
build-and-test:
needs: get-build-matrix
build-images:
needs: [get-build-matrix, build-comps-base]
strategy:
matrix:
example: ${{ fromJSON(needs.get-build-matrix.outputs.examples_json) }}
fail-fast: false
uses: ./.github/workflows/_build_image.yml
with:
node: gaudi
example: ${{ matrix.example }}
inject_commit: true
secrets: inherit
test-example:
needs: [get-build-matrix]
if: ${{ needs.get-build-matrix.outputs.examples_json != '' }}
strategy:
matrix:
@@ -47,21 +60,22 @@ jobs:
fail-fast: false
uses: ./.github/workflows/_example-workflow.yml
with:
node: gaudi
node: xeon
build: false
example: ${{ matrix.example }}
test_compose: true
inject_commit: true
secrets: inherit
get-image-list:
needs: get-build-matrix
needs: [get-build-matrix]
uses: ./.github/workflows/_get-image-list.yml
with:
examples: ${{ needs.get-build-matrix.outputs.EXAMPLES }}
publish:
needs: [get-build-matrix, get-image-list, build-and-test]
if: always() && ${{ needs.get-image-list.outputs.matrix != '' }}
needs: [get-build-matrix, get-image-list, build-images]
if: ${{ success() }}
strategy:
matrix:
image: ${{ fromJSON(needs.get-image-list.outputs.matrix) }}

View File

@@ -19,6 +19,9 @@ concurrency:
jobs:
job1:
name: Get-Test-Matrix
permissions:
contents: read
pull-requests: read
runs-on: ubuntu-latest
outputs:
run_matrix: ${{ steps.get-test-matrix.outputs.run_matrix }}
@@ -46,6 +49,8 @@ jobs:
example=$(echo "$values_file" | cut -d'/' -f1) # CodeGen
if [[ "$valuefile" == *"gaudi"* ]]; then
hardware="gaudi"
elif [[ "$valuefile" == *"rocm"* ]]; then
hardware="rocm"
elif [[ "$valuefile" == *"nv"* ]]; then
continue
else

View File

@@ -7,7 +7,7 @@ source /GenAIExamples/.github/workflows/scripts/change_color
log_dir=/GenAIExamples/.github/workflows/scripts/codeScan
ERROR_WARN=false
find . -type f \( -name "Dockerfile*" \) -print -exec hadolint --ignore DL3006 --ignore DL3007 --ignore DL3008 --ignore DL3013 {} \; > ${log_dir}/hadolint.log
find . -type f \( -name "Dockerfile*" \) -print -exec hadolint --ignore DL3006 --ignore DL3007 --ignore DL3008 --ignore DL3013 --ignore DL3018 --ignore DL3016 {} \; > ${log_dir}/hadolint.log
if [[ $(grep -c "error" ${log_dir}/hadolint.log) != 0 ]]; then
$BOLD_RED && echo "Error!! Please Click on the artifact button to download and check error details." && $RESET

View File

@@ -0,0 +1,55 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
name: Weekly test all examples on multiple HWs
on:
schedule:
- cron: "30 2 * * 6" # UTC time
workflow_dispatch:
env:
EXAMPLES: ${{ vars.NIGHTLY_RELEASE_EXAMPLES }}
NODES: "gaudi,xeon,rocm,arc"
jobs:
get-test-matrix:
runs-on: ubuntu-latest
outputs:
examples: ${{ steps.get-matrix.outputs.examples }}
nodes: ${{ steps.get-matrix.outputs.nodes }}
steps:
- name: Create Matrix
id: get-matrix
run: |
examples=($(echo ${EXAMPLES} | tr ',' ' '))
examples_json=$(printf '%s\n' "${examples[@]}" | sort -u | jq -R '.' | jq -sc '.')
echo "examples=$examples_json" >> $GITHUB_OUTPUT
nodes=($(echo ${NODES} | tr ',' ' '))
nodes_json=$(printf '%s\n' "${nodes[@]}" | sort -u | jq -R '.' | jq -sc '.')
echo "nodes=$nodes_json" >> $GITHUB_OUTPUT
build-comps-base:
needs: [get-test-matrix]
strategy:
matrix:
node: ${{ fromJson(needs.get-test-matrix.outputs.nodes) }}
uses: ./.github/workflows/_build_comps_base_image.yml
with:
node: ${{ matrix.node }}
run-examples:
needs: [get-test-matrix, build-comps-base]
strategy:
matrix:
example: ${{ fromJson(needs.get-test-matrix.outputs.examples) }}
node: ${{ fromJson(needs.get-test-matrix.outputs.nodes) }}
fail-fast: false
uses: ./.github/workflows/_example-workflow.yml
with:
node: ${{ matrix.node }}
example: ${{ matrix.example }}
build: true
test_compose: true
test_helmchart: true
secrets: inherit
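The `Create Matrix` step turns a comma-separated variable into a sorted, de-duplicated JSON array that `fromJson` can consume. A rough sketch of that transformation without `jq` is shown here. The function name `to_json_array` is hypothetical, and the sketch assumes items contain no spaces, quotes, or glob characters, so it does no JSON escaping.

```bash
# Hypothetical helper: "b,a,b" -> ["a","b"]
# Assumes simple tokens (letters, digits, dashes); no JSON escaping is done.
to_json_array() {
  csv=$1
  out=""
  # Split on commas, sort, drop duplicates, then join as quoted strings.
  for item in $(printf '%s\n' "$csv" | tr ',' '\n' | LC_ALL=C sort -u); do
    out="${out:+$out,}\"$item\""
  done
  printf '[%s]\n' "$out"
}

to_json_array "gaudi,xeon,rocm,arc"   # prints: ["arc","gaudi","rocm","xeon"]
```

The workflow itself uses `jq -R` and `jq -sc`, which also handle escaping; the sketch only illustrates the shape of the value written to `$GITHUB_OUTPUT`.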


@@ -74,7 +74,7 @@ repos:
name: Unused noqa
- repo: https://github.com/pycqa/isort
rev: 5.13.2
rev: 6.0.1
hooks:
- id: isort
@@ -100,7 +100,7 @@ repos:
- prettier@3.2.5
- repo: https://github.com/psf/black.git
rev: 24.10.0
rev: 25.1.0
hooks:
- id: black
files: (.*\.py)$
@@ -114,7 +114,7 @@ repos:
- black==24.10.0
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
rev: v2.4.1
hooks:
- id: codespell
args: [-w]
@@ -122,7 +122,7 @@ repos:
- tomli
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.8.6
rev: v0.11.4
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix, --no-cache]


@@ -4,9 +4,10 @@
1. [Overview](#overview)
2. [Deploy with Docker](#deploy-with-docker)
3. [Launch the UI](#launch-the-ui)
3. [How to interact with the agent system with UI](#how-to-interact-with-the-agent-system-with-ui)
4. [Validate Services](#validate-services)
5. [Register Tools](#how-to-register-other-tools-with-the-ai-agent)
6. [Monitoring and Tracing](#monitor-and-tracing)
## Overview
@@ -144,21 +145,19 @@ source $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon/set_env.sh
### 2. Launch the multi-agent system. </br>
Two options are provided for the `llm_engine` of the agents: 1. open-source LLMs on Gaudi, 2. OpenAI models via API calls.
We make it convenient to launch the whole system with docker compose, which includes microservices for LLM, agents, UI, retrieval tool, vector database, dataprep, and telemetry. There are 3 docker compose files, which make it easy for users to pick and choose. Users can choose a different retrieval tool other than the `DocIndexRetriever` example provided in our GenAIExamples repo. Users can choose not to launch the telemetry containers.
#### Gaudi
#### Launch on Gaudi
On Gaudi, `meta-llama/Meta-Llama-3.1-70B-Instruct` will be served using vllm.
By default, both the RAG agent and SQL agent will be launched to support the React Agent.
The React Agent requires the DocIndexRetriever's [`compose.yaml`](../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml) file, so two `compose.yaml` files need to be run with docker compose to start the multi-agent system.
> **Note**: To enable the web search tool, skip this step and proceed to the "[Optional] Web Search Tool Support" section.
On Gaudi, `meta-llama/Meta-Llama-3.3-70B-Instruct` will be served using vllm. The command below will launch the multi-agent system with the `DocIndexRetriever` as the retrieval tool for the Worker RAG agent.
```bash
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi/
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml up -d
```
> **Note**: To enable the web search tool, skip this step and proceed to the "[Optional] Web Search Tool Support" section.
To enable OpenTelemetry tracing, the `compose.telemetry.yaml` file needs to be merged with the default `compose.yaml` file.
Gaudi example with the OpenTelemetry feature:
@@ -183,11 +182,9 @@ docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/
</details>
#### Xeon
#### Launch on Xeon
On Xeon, only OpenAI models are supported.
By default, both the RAG Agent and SQL Agent will be launched to support the React Agent.
The React Agent requires the DocIndexRetriever's [`compose.yaml`](../DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml) file, so two `compose yaml` files need to be run with docker compose to start the multi-agent system.
On Xeon, only OpenAI models are supported. The command below will launch the multi-agent system with the `DocIndexRetriever` as the retrieval tool for the Worker RAG agent.
```bash
export OPENAI_API_KEY=<your-openai-key>
@@ -206,11 +203,19 @@ bash run_ingest_data.sh
> **Note**: This is a one-time operation.
## Launch the UI
## How to interact with the agent system with UI
Open a web browser to http://localhost:5173 to access the UI. Ensure the environment variable `AGENT_URL` is set to http://$ip_address:9090/v1/chat/completions in [ui/svelte/.env](./ui/svelte/.env) or else the UI may not work properly.
The UI microservice is launched in the previous step with the other microservices.
To access the UI, open `http://${ip_address}:5173` in a web browser. Note that `ip_address` here is the host IP of the UI microservice.
The AgentQnA UI can be deployed locally or using Docker. To customize deployment, refer to the [AgentQnA UI Guide](./ui/svelte/README.md).
1. Select `create Admin Account` and enter any value.
2. Add the OPEA agent endpoint `http://$ip_address:9090/v1`, which is an OpenAI-compatible API.
![opea-agent-setting](assets/img/opea-agent-setting.png)
3. Test the OPEA agent with the UI.
![opea-agent-test](assets/img/opea-agent-test.png)
## [Optional] Deploy using Helm Charts
@@ -249,3 +254,8 @@ python $WORKDIR/GenAIExamples/AgentQnA/tests/test.py --agent_role "supervisor" -
## How to register other tools with the AI agent
The [tools](./tools) folder contains YAML and Python files for additional tools for the supervisor and worker agents. Refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/src/README.md) to add tools and customize the AI agents.
## Monitor and Tracing
Follow the [OpenTelemetry OPEA Guide](https://opea-project.github.io/latest/tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.html) to understand how to use OpenTelemetry tracing and metrics in OPEA.
For AgentQnA-specific tracing and metrics monitoring, see the [OpenTelemetry on AgentQnA](https://opea-project.github.io/latest/tutorial/OpenTelemetry/deploy/AgentQnA.html) section.

Binary file not shown (image added, 71 KiB).

Binary file not shown (image added, 99 KiB).

@@ -29,7 +29,7 @@ services:
command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192
worker-rag-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: rag-agent-endpoint
volumes:
- "${TOOLSET_PATH}:/home/user/tools/"
@@ -60,7 +60,7 @@ services:
port: 9095
worker-sql-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: sql-agent-endpoint
volumes:
- "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
@@ -89,7 +89,7 @@ services:
port: 9096
supervisor-react-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: react-agent-endpoint
depends_on:
- worker-rag-agent
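The image references in these compose files switch from a fixed `opea/agent:latest` to `${REGISTRY:-opea}/agent:${TAG:-latest}`. Compose interpolation follows the shell's `${VAR:-default}` form, so the effect can be sketched in plain shell. The function name `resolve_image` and the `localhost:5000` registry are made up for illustration.

```bash
# Hypothetical helper: show what the compose image string resolves to.
resolve_image() {
  # ${VAR:-default} falls back to the default when VAR is unset or empty.
  echo "${REGISTRY:-opea}/$1:${TAG:-latest}"
}

unset REGISTRY TAG
resolve_image agent                                      # prints: opea/agent:latest
( REGISTRY=localhost:5000; TAG=ci; resolve_image agent )  # prints: localhost:5000/agent:ci
```

With nothing exported, the compose files behave as before; exporting `REGISTRY` and `TAG` lets a CI run point every service at its own freshly built images.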


@@ -33,7 +33,7 @@ services:
ipc: host
worker-rag-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: rag-agent-endpoint
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
@@ -64,7 +64,7 @@ services:
port: 9095
worker-sql-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: sql-agent-endpoint
volumes:
- "${WORKDIR}/tests/Chinook_Sqlite.sqlite:/home/user/chinook-db/Chinook_Sqlite.sqlite:rw"
@@ -93,7 +93,7 @@ services:
port: 9096
supervisor-react-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: react-agent-endpoint
depends_on:
- worker-rag-agent


@@ -103,10 +103,8 @@ services:
agent-ui:
image: opea/agent-ui
container_name: agent-ui
volumes:
- ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env:/home/user/svelte/.env # test db
ports:
- "5173:5173"
- "5173:8080"
ipc: host
networks:


@@ -3,7 +3,7 @@
services:
worker-rag-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: rag-agent-endpoint
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
@@ -34,7 +34,7 @@ services:
port: 9095
worker-sql-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: sql-agent-endpoint
volumes:
- ${WORKDIR}/GenAIExamples/AgentQnA/tests:/home/user/chinook-db # test db
@@ -63,7 +63,7 @@ services:
port: 9096
supervisor-react-agent:
image: opea/agent:latest
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
container_name: react-agent-endpoint
depends_on:
- worker-rag-agent
@@ -104,14 +104,12 @@ services:
- "8080:8000"
ipc: host
agent-ui:
image: opea/agent-ui
image: ${REGISTRY:-opea}/agent-ui:${TAG:-latest}
container_name: agent-ui
volumes:
- ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env:/home/user/svelte/.env
environment:
host_ip: ${host_ip}
ports:
- "5173:5173"
- "5173:8080"
ipc: host
vllm-service:
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
@@ -119,7 +117,7 @@ services:
ports:
- "8086:8000"
volumes:
- "./data:/data"
- "${MODEL_CACHE:-./data}:/data"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
@@ -140,4 +138,4 @@ services:
cap_add:
- SYS_NICE
ipc: host
command: --model $LLM_MODEL_ID --tensor-parallel-size 4 --host 0.0.0.0 --port 8000 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 16384
command: --model $LLM_MODEL_ID --tensor-parallel-size 4 --host 0.0.0.0 --port 8000 --block-size 128 --max-num-seqs 256 --max-seq-len-to-capture 16384


@@ -42,7 +42,7 @@ if [ ! -f $WORKDIR/GenAIExamples/AgentQnA/tests/Chinook_Sqlite.sqlite ]; then
fi
# configure agent ui
echo "AGENT_URL = 'http://$ip_address:9090/v1/chat/completions'" | tee ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env
# echo "AGENT_URL = 'http://$ip_address:9090/v1/chat/completions'" | tee ${WORKDIR}/GenAIExamples/AgentQnA/ui/svelte/.env
# retriever
export host_ip=$(hostname -I | awk '{print $1}')


@@ -17,12 +17,15 @@ services:
dockerfile: ./docker/Dockerfile
extends: agent
image: ${REGISTRY:-opea}/agent-ui:${TAG:-latest}
vllm-gaudi:
build:
context: vllm-fork
dockerfile: Dockerfile.hpu
extends: agent
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
vllm-rocm:
build:
args:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ${no_proxy}
context: GenAIComps
dockerfile: comps/third_parties/vllm/src/Dockerfile.amd_gpu
extends: agent
image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}


@@ -0,0 +1,22 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
tgi:
enabled: false
vllm:
enabled: true
LLM_MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
extraCmdArgs: ["--max-seq-len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
supervisor:
llm_endpoint_url: http://{{ .Release.Name }}-vllm
llm_engine: vllm
model: "meta-llama/Meta-Llama-3-8B-Instruct"
ragagent:
llm_endpoint_url: http://{{ .Release.Name }}-vllm
llm_engine: vllm
model: "meta-llama/Meta-Llama-3-8B-Instruct"
sqlagent:
llm_endpoint_url: http://{{ .Release.Name }}-vllm
llm_engine: vllm
model: "meta-llama/Meta-Llama-3-8B-Instruct"


@@ -4,13 +4,32 @@
# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values
tgi:
enabled: false
vllm:
enabled: true
accelDevice: "gaudi"
image:
repository: opea/vllm-gaudi
resources:
limits:
habana.ai/gaudi: 4
LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
OMPI_MCA_btl_vader_single_copy_mechanism: none
PT_HPU_ENABLE_LAZY_COLLECTIVES: true
VLLM_SKIP_WARMUP: true
shmSize: 16Gi
extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq-len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
supervisor:
llm_endpoint_url: http://{{ .Release.Name }}-vllm
llm_engine: vllm
model: "meta-llama/Llama-3.3-70B-Instruct"
ragagent:
llm_endpoint_url: http://{{ .Release.Name }}-vllm
llm_engine: vllm
model: "meta-llama/Llama-3.3-70B-Instruct"
sqlagent:
llm_endpoint_url: http://{{ .Release.Name }}-vllm
llm_engine: vllm
model: "meta-llama/Llama-3.3-70B-Instruct"


@@ -1,7 +1,22 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
host_ip=$(hostname -I | awk '{print $1}')
port=6007
FILEDIR=${WORKDIR}/GenAIExamples/AgentQnA/example_data/
FILENAME=test_docs_music.jsonl
python3 index_data.py --filedir ${FILEDIR} --filename ${FILENAME} --host_ip $host_ip
# The AgentQnA ingestion script requires the following packages
packages=("requests" "tqdm")
# Check if packages are installed
for package in "${packages[@]}"; do
if pip freeze | grep -q "$package="; then
echo "$package is installed"
else
echo "$package is not installed"
pip install --no-cache-dir "$package"
fi
done
python3 index_data.py --filedir ${FILEDIR} --filename ${FILENAME} --host_ip $host_ip --port $port
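The package check above greps `pip freeze` output for `"$package="`, which is a substring match. A hedged sketch of a stricter variant follows, assuming a hypothetical helper `is_pinned` that reads freeze-style lines on standard input. The version numbers in the example are sample data, not values from this repository.

```bash
# Hypothetical helper: succeed if stdin contains a line "<name>==<version>".
# Anchoring at line start avoids matching e.g. "types-requests==..." for "requests".
is_pinned() {
  grep -qi "^$1=="
}

# Sample input standing in for `pip freeze` output.
if printf 'requests==2.0.0\ntqdm==4.0.0\n' | is_pinned tqdm; then
  echo "tqdm is installed"
fi
```

Neither pattern matches packages that `pip freeze` lists in the `name @ location` form, so this remains a convenience check rather than a guarantee.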


@@ -31,7 +31,7 @@ function stop_retrieval_tool() {
}
echo "=================== #1 Building docker images===================="
bash step1_build_images.sh
bash step1_build_images.sh xeon
echo "=================== #1 Building docker images completed===================="
echo "=================== #2 Start retrieval tool===================="


@@ -15,42 +15,52 @@ function get_genai_comps() {
fi
}
function build_docker_images_for_retrieval_tool(){
cd $WORKDIR/GenAIExamples/DocIndexRetriever/docker_image_build/
get_genai_comps
echo "Build all the images with --no-cache..."
service_list="doc-index-retriever dataprep embedding retriever reranking"
docker compose -f build.yaml build ${service_list} --no-cache
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
docker compose -f build.yaml build --no-cache
docker images && sleep 1s
}
function build_agent_docker_image() {
function build_agent_docker_image_xeon() {
cd $WORKDIR/GenAIExamples/AgentQnA/docker_image_build/
get_genai_comps
echo "Build agent image with --no-cache..."
docker compose -f build.yaml build --no-cache
service_list="agent agent-ui"
docker compose -f build.yaml build ${service_list} --no-cache
}
function build_vllm_docker_image() {
echo "Building the vllm docker image"
cd $WORKPATH
echo $WORKPATH
if [ ! -d "./vllm-fork" ]; then
git clone https://github.com/HabanaAI/vllm-fork.git
fi
cd ./vllm-fork
VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
git checkout ${VLLM_VER} &> /dev/null
docker build --no-cache -f Dockerfile.hpu -t opea/vllm-gaudi:ci --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
if [ $? -ne 0 ]; then
echo "opea/vllm-gaudi:ci failed"
exit 1
else
echo "opea/vllm-gaudi:ci successful"
fi
function build_agent_docker_image_gaudi_vllm() {
cd $WORKDIR/GenAIExamples/AgentQnA/docker_image_build/
get_genai_comps
git clone https://github.com/HabanaAI/vllm-fork.git && cd vllm-fork
VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
git checkout ${VLLM_VER} &> /dev/null && cd ../
echo "Build agent image with --no-cache..."
service_list="agent agent-ui vllm-gaudi"
docker compose -f build.yaml build ${service_list} --no-cache
}
function build_agent_docker_image_rocm() {
cd $WORKDIR/GenAIExamples/AgentQnA/docker_image_build/
get_genai_comps
echo "Build agent image with --no-cache..."
service_list="agent agent-ui"
docker compose -f build.yaml build ${service_list} --no-cache
}
function build_agent_docker_image_rocm_vllm() {
cd $WORKDIR/GenAIExamples/AgentQnA/docker_image_build/
get_genai_comps
echo "Build agent image with --no-cache..."
service_list="agent agent-ui vllm-rocm"
docker compose -f build.yaml build ${service_list} --no-cache
}
@@ -59,15 +69,32 @@ function main() {
build_docker_images_for_retrieval_tool
echo "==================== Build docker images for retrieval tool completed ===================="
echo "==================== Build agent docker image ===================="
build_agent_docker_image
echo "==================== Build agent docker image completed ===================="
sleep 3s
echo "==================== Build vllm docker image ===================="
build_vllm_docker_image
echo "==================== Build vllm docker image completed ===================="
case $1 in
"rocm")
echo "==================== Build agent docker image for ROCm ===================="
build_agent_docker_image_rocm
;;
"rocm_vllm")
echo "==================== Build agent docker image for ROCm VLLM ===================="
build_agent_docker_image_rocm_vllm
;;
"gaudi_vllm")
echo "==================== Build agent docker image for Gaudi ===================="
build_agent_docker_image_gaudi_vllm
;;
"xeon")
echo "==================== Build agent docker image for Xeon ===================="
build_agent_docker_image_xeon
;;
*)
echo "Invalid argument"
exit 1
;;
esac
docker image ls | grep vllm
}
main
main $1
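After this change, `step1_build_images.sh` takes a target argument and builds a different service list for each one. The mapping can be summarized as a small lookup, sketched here with a hypothetical function name `services_for_target`; the service lists themselves are copied from the functions in this diff.

```bash
# Hypothetical helper: print the build.yaml services for a given target.
services_for_target() {
  case "$1" in
    xeon|rocm)  echo "agent agent-ui" ;;
    rocm_vllm)  echo "agent agent-ui vllm-rocm" ;;
    gaudi_vllm) echo "agent agent-ui vllm-gaudi" ;;
    *)          echo "Invalid argument: $1" >&2; return 1 ;;
  esac
}

services_for_target gaudi_vllm   # prints: agent agent-ui vllm-gaudi

# Possible use inside the build script (not executed here):
#   service_list=$(services_for_target "$1") || exit 1
#   docker compose -f build.yaml build ${service_list} --no-cache
```

Note that the `gaudi_vllm` path in the script also clones and checks out `vllm-fork` before building, which a pure lookup like this does not capture.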


@@ -1,64 +0,0 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
set -e
export WORKPATH=$(dirname "$PWD")
export WORKDIR=${WORKPATH}/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
function get_genai_comps() {
if [ ! -d "GenAIComps" ] ; then
git clone --depth 1 --branch ${opea_branch:-"main"} https://github.com/opea-project/GenAIComps.git
fi
}
function build_docker_images_for_retrieval_tool(){
cd $WORKPATH/../DocIndexRetriever/docker_image_build/
get_genai_comps
echo "Build all the images with --no-cache..."
service_list="doc-index-retriever dataprep embedding retriever reranking"
docker compose -f build.yaml build ${service_list} --no-cache
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker images && sleep 3s
}
function build_agent_docker_image() {
cd $WORKPATH/docker_image_build/
get_genai_comps
echo "Build agent image with --no-cache..."
docker compose -f build.yaml build --no-cache
docker images && sleep 3s
}
#function build_vllm_docker_image() {
# echo "Building the vllm docker image"
# cd $WORKPATH/
# docker build --no-cache -t opea/llm-vllm-rocm:ci -f Dockerfile-vllm-rocm .
#
# docker images && sleep 3s
#}
function main() {
echo "==================== Build docker images for retrieval tool ===================="
build_docker_images_for_retrieval_tool
echo "==================== Build docker images for retrieval tool completed ===================="
echo "==================== Build agent docker image ===================="
build_agent_docker_image
echo "==================== Build agent docker image completed ===================="
# echo "==================== Build vllm docker image ===================="
# build_vllm_docker_image
# echo "==================== Build vllm docker image completed ===================="
docker image ls | grep vllm
}
main


@@ -8,6 +8,8 @@ WORKPATH=$(dirname "$PWD")
export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
export host_ip=$ip_address
echo "ip_address=${ip_address}"
export TOOLSET_PATH=$WORKPATH/tools/
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
@@ -24,12 +26,12 @@ ls $HF_CACHE_DIR
vllm_port=8086
vllm_volume=${HF_CACHE_DIR}
function start_tgi(){
echo "Starting tgi-gaudi server"
function start_agent_service() {
echo "Starting agent service"
cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi
source set_env.sh
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml tgi_gaudi.yaml -f compose.telemetry.yaml up -d
docker compose -f compose.yaml up -d
}
function start_all_services() {
@@ -69,7 +71,6 @@ function download_chinook_data(){
cp chinook-database/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite $WORKDIR/GenAIExamples/AgentQnA/tests/
}
function validate() {
local CONTENT="$1"
local EXPECTED_RESULT="$2"
@@ -138,24 +139,6 @@ function remove_chinook_data(){
echo "Chinook data removed!"
}
export host_ip=$ip_address
echo "ip_address=${ip_address}"
function validate() {
local CONTENT="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
echo "[ $SERVICE_NAME ] Content is as expected: $CONTENT"
echo 0
else
echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
echo 1
fi
}
function ingest_data_and_validate() {
echo "Ingesting data"
cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool/


@@ -10,31 +10,41 @@ export ip_address=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
export no_proxy="$no_proxy,rag-agent-endpoint,sql-agent-endpoint,react-agent-endpoint,agent-ui,vllm-gaudi-server,jaeger,grafana,prometheus,127.0.0.1,localhost,0.0.0.0,$ip_address"
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
function get_genai_comps() {
if [ ! -d "GenAIComps" ] ; then
git clone --depth 1 --branch ${opea_branch:-"main"} https://github.com/opea-project/GenAIComps.git
fi
}
function build_agent_docker_image() {
cd $WORKDIR/GenAIExamples/AgentQnA/docker_image_build/
get_genai_comps
echo "Build agent image with --no-cache..."
docker compose -f build.yaml build --no-cache
}
function stop_crag() {
cid=$(docker ps -aq --filter "name=kdd-cup-24-crag-service")
echo "Stopping container kdd-cup-24-crag-service with cid $cid"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
}
function stop_agent_docker() {
function stop_agent_containers() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi/
docker compose -f $WORKDIR/GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon/compose.yaml -f compose.yaml down
container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
for container_name in $container_list; do
cid=$(docker ps -aq --filter "name=$container_name")
echo "Stopping container $container_name"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
done
}
function stop_telemetry_containers(){
cd $WORKPATH/docker_compose/intel/hpu/gaudi/
container_list=$(cat compose.telemetry.yaml | grep container_name | cut -d':' -f2)
for container_name in $container_list; do
cid=$(docker ps -aq --filter "name=$container_name")
echo "Stopping container $container_name"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
done
container_list=$(cat compose.telemetry.yaml | grep container_name | cut -d':' -f2)
}
function stop_llm(){
@@ -68,25 +78,31 @@ function stop_retrieval_tool() {
done
}
echo "workpath: $WORKPATH"
echo "=================== Stop containers ===================="
echo "::group::=================== Stop containers ===================="
stop_llm
stop_crag
stop_agent_docker
stop_agent_containers
stop_retrieval_tool
stop_telemetry_containers
echo "::endgroup::"
cd $WORKPATH/tests
echo "=================== #1 Building docker images===================="
build_agent_docker_image
echo "=================== #1 Building docker images completed===================="
echo "::group::=================== Building docker images===================="
bash step1_build_images.sh gaudi_vllm > docker_image_build.log
echo "::endgroup::"
echo "=================== #4 Start agent, API server, retrieval, and ingest data===================="
bash $WORKPATH/tests/step4_launch_and_validate_agent_gaudi.sh
echo "=================== #4 Agent, retrieval test passed ===================="
echo "::group::=================== Start agent, API server, retrieval, and ingest data===================="
bash step4_launch_and_validate_agent_gaudi.sh
echo "::endgroup::"
echo "=================== #5 Stop agent and API server===================="
echo "::group::=================== Stop agent and API server===================="
stop_llm
stop_crag
stop_agent_docker
echo "=================== #5 Agent and API server stopped===================="
stop_agent_containers
stop_retrieval_tool
stop_telemetry_containers
echo y | docker system prune
echo "::endgroup::"
echo "ALL DONE!!"
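The stop functions in this script collect container names by grepping `container_name` lines out of a compose file and cutting at the colon. A minimal sketch of that extraction as a reusable function is given below. The name `list_container_names` is hypothetical, and the sketch additionally strips spaces and quotes, which the original leaves to shell word splitting.

```bash
# Hypothetical helper: print one container name per line from a compose file.
list_container_names() {
  # Keep container_name lines, take the value after the first colon,
  # then drop spaces and quote characters.
  grep 'container_name' "$1" | cut -d':' -f2 | tr -d " \"'"
}

# Possible use in a cleanup loop (docker calls not executed here):
#   for name in $(list_container_names compose.yaml); do
#     cid=$(docker ps -aq --filter "name=$name")
#     if [ -n "$cid" ]; then docker rm "$cid" -f; fi
#   done
```

Like the original pipeline, this is a text match rather than a YAML parse, so a commented-out `container_name` line would still be picked up.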


@@ -11,7 +11,13 @@ echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export TOOLSET_PATH=$WORKPATH/tools/
export MODEL_CACHE="./data"
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
function stop_crag() {
cid=$(docker ps -aq --filter "name=kdd-cup-24-crag-service")
@@ -37,34 +43,35 @@ function stop_retrieval_tool() {
done
}
echo "workpath: $WORKPATH"
echo "=================== Stop containers ===================="
echo "::group::=================== Stop containers ===================="
stop_crag
stop_agent_docker
stop_retrieval_tool
echo "::endgroup::=================== Stop containers completed ===================="
cd $WORKPATH/tests
echo "=================== #1 Building docker images===================="
bash step1_build_images.sh
echo "=================== #1 Building docker images completed===================="
echo "::group::=================== #1 Building docker images===================="
bash step1_build_images.sh rocm > docker_image_build.log
echo "::endgroup::=================== #1 Building docker images completed===================="
echo "=================== #2 Start retrieval tool===================="
echo "::group::=================== #2 Start retrieval tool===================="
bash step2_start_retrieval_tool.sh
echo "=================== #2 Retrieval tool started===================="
echo "::endgroup::=================== #2 Retrieval tool started===================="
echo "=================== #3 Ingest data and validate retrieval===================="
echo "::group::=================== #3 Ingest data and validate retrieval===================="
bash step3_ingest_data_and_validate_retrieval.sh
echo "=================== #3 Data ingestion and validation completed===================="
echo "::endgroup::=================== #3 Data ingestion and validation completed===================="
echo "=================== #4 Start agent and API server===================="
echo "::group::=================== #4 Start agent and API server===================="
bash step4a_launch_and_validate_agent_tgi_on_rocm.sh
echo "=================== #4 Agent test passed ===================="
echo "::endgroup::=================== #4 Agent test passed ===================="
echo "=================== #5 Stop agent and API server===================="
echo "::group::=================== #5 Stop agent and API server===================="
stop_crag
stop_agent_docker
stop_retrieval_tool
echo "=================== #5 Agent and API server stopped===================="
echo "::endgroup::=================== #5 Agent and API server stopped===================="
echo y | docker system prune
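The test scripts now wrap each phase in `::group::` and `::endgroup::`, the GitHub Actions workflow commands that fold log output into collapsible sections. One way to keep the markers paired is a small wrapper, sketched below under the assumption of a hypothetical function name `run_group`. It captures the exit status so the closing marker is printed even when the wrapped command fails.

```bash
# Hypothetical helper: run a command inside a collapsible log group.
run_group() {
  title=$1
  shift
  echo "::group::${title}"
  status=0
  "$@" || status=$?        # keep going so the group is always closed
  echo "::endgroup::"
  return "$status"
}

run_group "Say hello" echo "hello"
# prints:
# ::group::Say hello
# hello
# ::endgroup::
```

A call such as `run_group "Building docker images" bash step1_build_images.sh rocm` would then replace the three separate lines used per phase in the scripts.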


@@ -5,13 +5,18 @@
set -e
WORKPATH=$(dirname "$PWD")
export LOG_PATH=${WORKPATH}
export WORKDIR=${WORKPATH}/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export TOOLSET_PATH=$WORKPATH/tools/
export MODEL_CACHE="./data"
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
function stop_crag() {
cid=$(docker ps -aq --filter "name=kdd-cup-24-crag-service")
@@ -32,34 +37,35 @@ function stop_retrieval_tool() {
}
echo "workpath: $WORKPATH"
echo "=================== Stop containers ===================="
echo "::group::=================== Stop containers ===================="
stop_crag
stop_agent_docker
stop_retrieval_tool
echo "::endgroup::"
cd $WORKPATH/tests
echo "=================== #1 Building docker images===================="
bash step1_build_images_rocm_vllm.sh
echo "=================== #1 Building docker images completed===================="
echo "::group::=================== #1 Building docker images===================="
bash step1_build_images.sh rocm_vllm > docker_image_build.log
echo "::endgroup::=================== #1 Building docker images completed===================="
echo "=================== #2 Start retrieval tool===================="
echo "::group::=================== #2 Start retrieval tool===================="
bash step2_start_retrieval_tool_rocm_vllm.sh
echo "=================== #2 Retrieval tool started===================="
echo "::endgroup::=================== #2 Retrieval tool started===================="
echo "=================== #3 Ingest data and validate retrieval===================="
echo "::group::=================== #3 Ingest data and validate retrieval===================="
bash step3_ingest_data_and_validate_retrieval_rocm_vllm.sh
echo "=================== #3 Data ingestion and validation completed===================="
echo "::endgroup::=================== #3 Data ingestion and validation completed===================="
echo "=================== #4 Start agent and API server===================="
echo "::group::=================== #4 Start agent and API server===================="
bash step4_launch_and_validate_agent_rocm_vllm.sh
echo "=================== #4 Agent test passed ===================="
echo "::endgroup::=================== #4 Agent test passed ===================="
echo "=================== #5 Stop agent and API server===================="
echo "::group::=================== #5 Stop agent and API server===================="
stop_crag
stop_agent_docker
stop_retrieval_tool
echo "=================== #5 Agent and API server stopped===================="
echo "::endgroup::=================== #5 Agent and API server stopped===================="
echo y | docker system prune


@@ -12,7 +12,7 @@ def search_knowledge_base(query: str) -> str:
print(url)
proxies = {"http": ""}
payload = {
"text": query,
"messages": query,
}
response = requests.post(url, json=payload, proxies=proxies)
print(response)


@@ -1,26 +1,203 @@
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Use node 20.11.1 as the base image
FROM node:20.11.1
# syntax=docker/dockerfile:1
# Initialize device type args
# use build args in the docker build command with --build-arg="BUILDARG=true"
ARG USE_CUDA=false
ARG USE_OLLAMA=false
# Tested with cu117 for CUDA 11 and cu121 for CUDA 12 (default)
ARG USE_CUDA_VER=cu121
# any sentence transformer model; models to use can be found at https://huggingface.co/models?library=sentence-transformers
# Leaderboard: https://huggingface.co/spaces/mteb/leaderboard
# for better performance and multilanguage support use "intfloat/multilingual-e5-large" (~2.5GB) or "intfloat/multilingual-e5-base" (~1.5GB)
# IMPORTANT: If you change the embedding model (sentence-transformers/all-MiniLM-L6-v2) and vice versa, you aren't able to use RAG Chat with your previous documents loaded in the WebUI! You need to re-embed them.
ARG USE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
ARG USE_RERANKING_MODEL=""
# Update package manager and install Git
RUN apt-get update -y && apt-get install -y git
# Tiktoken encoding name; models to use can be found at https://huggingface.co/models?library=tiktoken
ARG USE_TIKTOKEN_ENCODING_NAME="cl100k_base"
# Copy the front-end code repository
COPY svelte /home/user/svelte
ARG BUILD_HASH=dev-build
# Override at your own risk - non-root configurations are untested
ARG UID=0
ARG GID=0
# Set the working directory
WORKDIR /home/user/svelte
######## WebUI frontend ########
FROM --platform=$BUILDPLATFORM node:22-alpine3.20 AS build
ARG BUILD_HASH
# Install front-end dependencies
RUN npm install
WORKDIR /app
# Build the front-end application
COPY open_webui_patches /app/patches
ARG WEBUI_VERSION=v0.5.20
RUN apk add --no-cache git
# Clone code and use patch
RUN git config --global user.name "opea" && \
git config --global user.email "" && \
git clone https://github.com/open-webui/open-webui.git
WORKDIR /app/open-webui
RUN git checkout ${WEBUI_VERSION} && git am /app/patches/*.patch
WORKDIR /app
RUN mv open-webui/* . && rm -fr open-webui && ls -lrth /app/backend/
RUN npm install onnxruntime-node --onnxruntime-node-install-cuda=skip
RUN apk update && \
apk add --no-cache wget && \
wget https://github.com/microsoft/onnxruntime/releases/download/v1.20.1/onnxruntime-linux-x64-gpu-1.20.1.tgz
ENV APP_BUILD_HASH=${BUILD_HASH}
RUN npm run build
# Expose the port of the front-end application
EXPOSE 5173
######## WebUI backend ########
FROM python:3.11-slim-bookworm AS base
# Run the front-end application in preview mode
CMD ["npm", "run", "preview", "--", "--port", "5173", "--host", "0.0.0.0"]
# Use args
ARG USE_CUDA
ARG USE_OLLAMA
ARG USE_CUDA_VER
ARG USE_EMBEDDING_MODEL
ARG USE_RERANKING_MODEL
ARG UID
ARG GID
## Basis ##
ENV ENV=prod \
PORT=8080 \
# pass build args to the build
USE_OLLAMA_DOCKER=${USE_OLLAMA} \
USE_CUDA_DOCKER=${USE_CUDA} \
USE_CUDA_DOCKER_VER=${USE_CUDA_VER} \
USE_EMBEDDING_MODEL_DOCKER=${USE_EMBEDDING_MODEL} \
USE_RERANKING_MODEL_DOCKER=${USE_RERANKING_MODEL}
## Basis URL Config ##
ENV OLLAMA_BASE_URL="/ollama" \
OPENAI_API_BASE_URL=""
## API Key and Security Config ##
ENV OPENAI_API_KEY="" \
WEBUI_SECRET_KEY="" \
SCARF_NO_ANALYTICS=true \
DO_NOT_TRACK=true \
ANONYMIZED_TELEMETRY=false
#### Other models #########################################################
## whisper STT model settings ##
ENV WHISPER_MODEL="base" \
WHISPER_MODEL_DIR="/app/backend/data/cache/whisper/models"
## RAG Embedding model settings ##
ENV RAG_EMBEDDING_MODEL="$USE_EMBEDDING_MODEL_DOCKER" \
RAG_RERANKING_MODEL="$USE_RERANKING_MODEL_DOCKER" \
SENTENCE_TRANSFORMERS_HOME="/app/backend/data/cache/embedding/models"
## Tiktoken model settings ##
ENV TIKTOKEN_ENCODING_NAME="cl100k_base" \
TIKTOKEN_CACHE_DIR="/app/backend/data/cache/tiktoken"
## Hugging Face download cache ##
ENV HF_HOME="/app/backend/data/cache/embedding/models"
## Torch Extensions ##
# ENV TORCH_EXTENSIONS_DIR="/.cache/torch_extensions"
#### Other models ##########################################################
COPY --from=build /app/backend /app/backend
WORKDIR /app/backend
ENV HOME=/root
# Create user and group if not root
RUN if [ $UID -ne 0 ]; then \
if [ $GID -ne 0 ]; then \
addgroup --gid $GID app; \
fi; \
adduser --uid $UID --gid $GID --home $HOME --disabled-password --no-create-home app; \
fi
RUN mkdir -p $HOME/.cache/chroma
RUN printf 00000000-0000-0000-0000-000000000000 > $HOME/.cache/chroma/telemetry_user_id
# Make sure the user has access to the app and root directory
RUN chown -R $UID:$GID /app $HOME
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN if [ "$USE_OLLAMA" = "true" ]; then \
apt-get update && \
# Install pandoc and netcat
apt-get install -y --no-install-recommends git build-essential pandoc netcat-openbsd curl && \
apt-get install -y --no-install-recommends gcc python3-dev && \
# for RAG OCR
apt-get install -y --no-install-recommends ffmpeg libsm6 libxext6 && \
# install helper tools
apt-get install -y --no-install-recommends curl jq && \
# install ollama
curl -fsSL https://ollama.com/install.sh | sh && \
# cleanup
rm -rf /var/lib/apt/lists/*; \
else \
apt-get update && \
# Install pandoc, netcat and gcc
apt-get install -y --no-install-recommends git build-essential pandoc gcc netcat-openbsd curl jq && \
apt-get install -y --no-install-recommends gcc python3-dev && \
# for RAG OCR
apt-get install -y --no-install-recommends ffmpeg libsm6 libxext6 && \
# cleanup
rm -rf /var/lib/apt/lists/*; \
fi
# install python dependencies
# COPY --chown=$UID:$GID ./backend/requirements.txt ./requirements.txt
# RUN cp /app/backend/requirements.txt ./requirements.txt
RUN pip3 install --no-cache-dir uv && \
if [ "$USE_CUDA" = "true" ]; then \
# If you use CUDA the whisper and embedding model will be downloaded on first use
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/$USE_CUDA_DOCKER_VER --no-cache-dir && \
uv pip install --system -r requirements.txt --no-cache-dir && \
python -c "import os; from sentence_transformers import SentenceTransformer; SentenceTransformer(os.environ['RAG_EMBEDDING_MODEL'], device='cpu')" && \
python -c "import os; from faster_whisper import WhisperModel; WhisperModel(os.environ['WHISPER_MODEL'], device='cpu', compute_type='int8', download_root=os.environ['WHISPER_MODEL_DIR'])"; \
python -c "import os; import tiktoken; tiktoken.get_encoding(os.environ['TIKTOKEN_ENCODING_NAME'])"; \
else \
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu --no-cache-dir && \
uv pip install --system -r requirements.txt --no-cache-dir && \
python -c "import os; from sentence_transformers import SentenceTransformer; SentenceTransformer(os.environ['RAG_EMBEDDING_MODEL'], device='cpu')" && \
python -c "import os; from faster_whisper import WhisperModel; WhisperModel(os.environ['WHISPER_MODEL'], device='cpu', compute_type='int8', download_root=os.environ['WHISPER_MODEL_DIR'])"; \
python -c "import os; import tiktoken; tiktoken.get_encoding(os.environ['TIKTOKEN_ENCODING_NAME'])"; \
fi; \
chown -R $UID:$GID /app/backend/data/
# copy embedding weight from build
# RUN mkdir -p /root/.cache/chroma/onnx_models/all-MiniLM-L6-v2
# COPY --from=build /app/onnx /root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx
# copy built frontend files
COPY --chown=$UID:$GID --from=build /app/build /app/build
COPY --chown=$UID:$GID --from=build /app/CHANGELOG.md /app/CHANGELOG.md
COPY --chown=$UID:$GID --from=build /app/package.json /app/package.json
# copy backend files
# COPY --chown=$UID:$GID ./backend .
EXPOSE 8080
HEALTHCHECK CMD curl --silent --fail http://localhost:${PORT:-8080}/health | jq -ne 'input.status == true' || exit 1
USER $UID:$GID
ARG BUILD_HASH
ENV WEBUI_BUILD_VERSION=${BUILD_HASH}
ENV DOCKER=true
CMD [ "bash", "start.sh"]

View File

@@ -1,17 +1,26 @@
From 799dcc304b3aecf2e2969df47c8dcac16d2267b0 Mon Sep 17 00:00:00 2001
From d90ba418f866bc11848d7d6507aabc6b5e8cc3e2 Mon Sep 17 00:00:00 2001
From: lkk12014402 <kaokao.lv@intel.com>
Date: Fri, 4 Apr 2025 07:40:30 +0000
Subject: [PATCH] deal opea agent tool content.
Date: Mon, 7 Apr 2025 07:22:53 +0000
Subject: [PATCH] compatible opea agent tool content
---
backend/open_webui/utils/middleware.py | 54 ++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
backend/open_webui/utils/middleware.py | 56 ++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/backend/open_webui/utils/middleware.py b/backend/open_webui/utils/middleware.py
index 289d887df..afa0edf1e 100644
index 289d887df..fddbe8ee1 100644
--- a/backend/open_webui/utils/middleware.py
+++ b/backend/open_webui/utils/middleware.py
@@ -1486,6 +1486,60 @@ async def process_chat_response(
@@ -1465,6 +1465,8 @@ async def process_chat_response(
async def stream_body_handler(response):
nonlocal content
nonlocal content_blocks
+ nonlocal events
+ sources = []
response_tool_calls = []
@@ -1486,6 +1488,60 @@ async def process_chat_response(
try:
data = json.loads(data)

View File

@@ -0,0 +1,531 @@
From 8ad31e50644eab3c9e698d7828b1857919887841 Mon Sep 17 00:00:00 2001
From: lkk12014402 <kaokao.lv@intel.com>
Date: Tue, 8 Apr 2025 03:38:09 +0000
Subject: [PATCH 2/2] update agent icloud upload feature
---
src/lib/apis/knowledge/index.ts | 60 +++++++
.../admin/Settings/Connections.svelte | 50 +++++-
.../components/icons/UploadCloudIcon.svelte | 18 ++
src/lib/components/workspace/Knowledge.svelte | 57 +++++-
.../KnowledgeBase/AddIcloudContentMenu.svelte | 164 ++++++++++++++++++
.../KnowledgeBase/IcloudFiles.svelte | 37 ++++
src/lib/i18n/locales/zh-CN/translation.json | 15 +-
7 files changed, 396 insertions(+), 5 deletions(-)
create mode 100644 src/lib/components/icons/UploadCloudIcon.svelte
create mode 100644 src/lib/components/workspace/Knowledge/KnowledgeBase/AddIcloudContentMenu.svelte
create mode 100644 src/lib/components/workspace/Knowledge/KnowledgeBase/IcloudFiles.svelte
diff --git a/src/lib/apis/knowledge/index.ts b/src/lib/apis/knowledge/index.ts
index c5fad1323..32be528a7 100644
--- a/src/lib/apis/knowledge/index.ts
+++ b/src/lib/apis/knowledge/index.ts
@@ -345,3 +345,63 @@ export const deleteKnowledgeById = async (token: string, id: string) => {
return res;
};
+
+export const getIcloudFiles = async (ICLOUD_BASE_URLS: string) => {
+ let error = null;
+
+ const res = await fetch(`${ICLOUD_BASE_URLS}/dataprep/get`, {
+ method: 'POST',
+ headers: {
+ Accept: 'application/json',
+ 'Content-Type': 'application/json',
+ }
+ })
+ .then(async (res) => {
+ if (!res.ok) throw await res.json();
+ return res.json();
+ })
+ .then((json) => {
+ return json;
+ })
+ .catch((err) => {
+ error = err.detail;
+
+ console.log(err);
+ return null;
+ });
+
+ if (error) {
+ throw error;
+ }
+
+ return res;
+};
+
+export const updateIcloudFiles = async (ICLOUD_BASE_URLS: string, formData: any) => {
+ let error = null;
+
+ const res = await fetch(`${ICLOUD_BASE_URLS}/dataprep/ingest`, {
+ method: 'POST',
+ body: formData
+ })
+ .then(async (res) => {
+ if (!res.ok) throw await res.json();
+ return res.json();
+ })
+ .then((json) => {
+ return json;
+ })
+ .catch((err) => {
+ error = err.detail;
+
+ console.log(err);
+ return null;
+ });
+
+ if (error) {
+ throw error;
+ }
+
+ return res;
+};
+
diff --git a/src/lib/components/admin/Settings/Connections.svelte b/src/lib/components/admin/Settings/Connections.svelte
index 2fcfadaec..3237744d5 100644
--- a/src/lib/components/admin/Settings/Connections.svelte
+++ b/src/lib/components/admin/Settings/Connections.svelte
@@ -47,6 +47,9 @@
let showAddOpenAIConnectionModal = false;
let showAddOllamaConnectionModal = false;
+ let ENABLE_ICLOUD_API: null | boolean = (localStorage.getItem('ENABLE_ICLOUD_API') === "enable");
+ let ICLOUD_BASE_URL = localStorage.getItem('ICLOUD_BASE_URL') || '';
+
const updateOpenAIHandler = async () => {
if (ENABLE_OPENAI_API !== null) {
// Remove trailing slashes
@@ -193,10 +196,22 @@
}
});
+ const updateIcloudHandler = async () => {
+ if (ENABLE_ICLOUD_API) {
+ localStorage.setItem('ICLOUD_BASE_URL', ICLOUD_BASE_URL);
+ localStorage.setItem('ENABLE_ICLOUD_API', "enable");
+ } else {
+ localStorage.setItem('ICLOUD_BASE_URL', '');
+ localStorage.setItem('ENABLE_ICLOUD_API', "");
+ }
+ toast.success($i18n.t('Icloud API settings updated'));
+ };
+
const submitHandler = async () => {
updateOpenAIHandler();
updateOllamaHandler();
updateDirectConnectionsHandler();
+ updateIcloudHandler();
dispatch('save');
};
@@ -301,7 +316,7 @@
</div>
{#if ENABLE_OLLAMA_API}
- <hr class=" border-gray-100 dark:border-gray-850 my-2" />
+ <hr class=" border-gray-100 dark:border-gray-850" />
<div class="">
<div class="flex justify-between items-center">
@@ -358,6 +373,39 @@
{/if}
</div>
+ <hr class=" border-gray-50 dark:border-gray-850" />
+
+ <div class="pr-1.5 my-2">
+ <div class="flex justify-between items-center text-sm">
+ <div class="font-medium">{$i18n.t('Icloud File API')}</div>
+
+ <div class="mt-1">
+ <Switch
+ bind:state={ENABLE_ICLOUD_API}
+ on:change={async () => {
+ updateIcloudHandler();
+ }}
+ />
+ </div>
+ </div>
+
+ {#if ENABLE_ICLOUD_API}
+ <hr class=" border-gray-50 dark:border-gray-850 my-2" />
+
+ <div class="">
+ <div class="flex w-full gap-1.5">
+ <div class="flex-1 flex flex-col gap-1.5">
+ <input
+ class="w-full text-sm bg-transparent outline-none"
+ placeholder={$i18n.t('Enter Icloud URL(e.g.') + 'http://localhost:6007/v1)'}
+ bind:value={ICLOUD_BASE_URL}
+ />
+ </div>
+ </div>
+ </div>
+ {/if}
+ </div>
+
<hr class=" border-gray-100 dark:border-gray-850" />
<div class="pr-1.5 my-2">
diff --git a/src/lib/components/icons/UploadCloudIcon.svelte b/src/lib/components/icons/UploadCloudIcon.svelte
new file mode 100644
index 000000000..eed3bd582
--- /dev/null
+++ b/src/lib/components/icons/UploadCloudIcon.svelte
@@ -0,0 +1,18 @@
+<script lang="ts">
+ export let className = 'w-4 h-4';
+</script>
+
+<svg
+ t="1744007283647"
+ viewBox="0 0 1491 1024"
+ version="1.1"
+ xmlns="http://www.w3.org/2000/svg"
+ p-id="1630"
+ class = {className}
+ ><path
+ d="M546.047379 263.651842s-90.221363-91.423424-212.63125-16.762074c-109.521121 71.300031-90.154581 201.768179-90.154582 201.76818S0 498.498962 0 759.902727c5.431535 261.003078 264.186314 263.674325 264.186314 263.674326l388.443814 0.422947V744.565318H466.355181l279.434681-279.412421 279.390161 279.412421h-186.297208V1024l377.157796-0.422947s240.812904 0.222604 274.648698-248.092052c16.094262-271.576764-232.754643-325.113003-232.754643-325.113003S1286.205362 48.327085 936.761752 2.470681C637.181417-29.740104 546.047379 263.651842 546.047379 263.651842z"
+ fill="#507BFC"
+ p-id="1631"
+ ></path></svg
+>
+
diff --git a/src/lib/components/workspace/Knowledge.svelte b/src/lib/components/workspace/Knowledge.svelte
index 57d45312d..43a1f305e 100644
--- a/src/lib/components/workspace/Knowledge.svelte
+++ b/src/lib/components/workspace/Knowledge.svelte
@@ -13,7 +13,8 @@
import {
getKnowledgeBases,
deleteKnowledgeById,
- getKnowledgeBaseList
+ getKnowledgeBaseList,
+ getIcloudFiles
} from '$lib/apis/knowledge';
import { goto } from '$app/navigation';
@@ -26,6 +27,11 @@
import Spinner from '../common/Spinner.svelte';
import { capitalizeFirstLetter } from '$lib/utils';
import Tooltip from '../common/Tooltip.svelte';
+ import AddIcloudConnectionModal from '$lib/components/workspace/Knowledge/KnowledgeBase/AddIcloudContentMenu.svelte';
+ import IcloudFiles from '$lib/components/workspace/Knowledge/KnowledgeBase/IcloudFiles.svelte';
+
+ let showAddTextContentModal = false;
+ let IcloudFile = [];
let loaded = false;
@@ -65,9 +71,26 @@
};
onMount(async () => {
+ await updateIcloudFiles();
+
knowledgeBases = await getKnowledgeBaseList(localStorage.token);
loaded = true;
});
+
+ async function updateIcloudFiles() {
+ let ICLOUD_BASE_URL = localStorage.getItem('ICLOUD_BASE_URL') || '';
+ console.log('ICLOUD_BASE_URL', ICLOUD_BASE_URL);
+
+ if (ICLOUD_BASE_URL !== '') {
+ const res = await getIcloudFiles(ICLOUD_BASE_URL).catch((e) => {
+ toast.error(`${e}`);
+ });
+
+ if (res) {
+ IcloudFile = res;
+ }
+ }
+ }
</script>
<svelte:head>
@@ -187,11 +210,39 @@
{/each}
</div>
- <div class=" text-gray-500 text-xs mt-1 mb-2">
- ⓘ {$i18n.t("Use '#' in the prompt input to load and include your knowledge.")}
+ <div class="flex justify-between items-center">
+ <div class="flex md:self-center text-xl font-medium px-0.5 items-center">
+ {$i18n.t('Icloud Knowledge')}
+ <div class="flex self-center w-[1px] h-6 mx-2.5 bg-gray-50 dark:bg-gray-850" />
+ <span class="text-lg font-medium text-gray-500 dark:text-gray-300">{IcloudFile.length}</span>
+ </div>
+ <div>
+ <button
+ class=" px-2 py-2 rounded-xl hover:bg-gray-700/10 dark:hover:bg-gray-100/10 dark:text-gray-300 dark:hover:text-white transition font-medium text-sm flex items-center space-x-1"
+ aria-label={$i18n.t('Upload to Icloud')}
+ on:click={() => {
+ showAddTextContentModal = !showAddTextContentModal;
+ }}
+ >
+ <Plus className="size-3.5" />
+ </button>
+ </div>
+ </div>
+ <hr class="border-gray-100 dark:border-gray-850 my-2" />
+ <div class=" flex overflow-y-auto w-full h-[15rem] scrollbar-hidden text-xs">
+ <IcloudFiles files={IcloudFile} />
</div>
{:else}
<div class="w-full h-full flex justify-center items-center">
<Spinner />
</div>
{/if}
+
+<AddIcloudConnectionModal
+ bind:show={showAddTextContentModal}
+ on:updateIcloudFile={async (e) => {
+ if (e.detail.status) {
+ await updateIcloudFiles();
+ }
+ }}
+/>
diff --git a/src/lib/components/workspace/Knowledge/KnowledgeBase/AddIcloudContentMenu.svelte b/src/lib/components/workspace/Knowledge/KnowledgeBase/AddIcloudContentMenu.svelte
new file mode 100644
index 000000000..fb906a0d3
--- /dev/null
+++ b/src/lib/components/workspace/Knowledge/KnowledgeBase/AddIcloudContentMenu.svelte
@@ -0,0 +1,164 @@
+<script lang="ts">
+ import { toast } from 'svelte-sonner';
+ import { getContext, onMount, createEventDispatcher } from 'svelte';
+ import Modal from '$lib/components/common/Modal.svelte';
+ import UploadCloudIcon from '$lib/components/icons/UploadCloudIcon.svelte';
+ import Spinner from '$lib/components/common/Spinner.svelte';
+ import { updateIcloudFiles } from '$lib/apis/knowledge';
+
+ const i18n = getContext('i18n');
+ const dispatch = createEventDispatcher();
+
+ export let show = false;
+
+ let url = '';
+
+ let loading = false;
+
+ let selectedFile = null;
+
+ function handleFileSelect(event) {
+ selectedFile = event.target.files[0];
+ }
+
+ function parseAndValidateUrls(normalizedInput: string): string[] {
+ return normalizedInput
+ .split(',')
+ .map((candidate) => {
+ const processed = candidate.replace(/^["']+|["']+$/g, '').trim();
+
+ try {
+ new URL(processed);
+ return processed;
+ } catch {
+ return null;
+ }
+ })
+ .filter((url): url is string => url !== null);
+ }
+
+ async function submitHandler() {
+ loading = true;
+
+ if (!url && !selectedFile) {
+ loading = false;
+ show = false;
+
+ toast.error($i18n.t('URL or File are required'));
+ return;
+ }
+ if (url && selectedFile) {
+ loading = false;
+ show = false;
+
+ toast.error($i18n.t('Upload file or enter URL'));
+ url = '';
+ selectedFile = null;
+ return;
+ }
+
+ const formData = new FormData();
+ if (url) {
+ formData.append('link_list', JSON.stringify(parseAndValidateUrls(url)));
+ }
+ if (selectedFile) {
+ formData.append('files', selectedFile, selectedFile.name);
+ }
+ let ICLOUD_BASE_URL = localStorage.getItem('ICLOUD_BASE_URL') || '';
+ console.log('ICLOUD_BASE_URL', ICLOUD_BASE_URL);
+
+ if (ICLOUD_BASE_URL !== '') {
+ const res = await updateIcloudFiles(ICLOUD_BASE_URL, formData).catch((e) => {
+ toast.error(`${e}`);
+
+ return;
+ });
+
+ if (res) {
+ toast.success($i18n.t('Upload Succeed'));
+ dispatch('updateIcloudFile', { status: true });
+ }
+
+ url = '';
+ selectedFile = null;
+ loading = false;
+ show = false;
+ }
+ }
+</script>
+
+<Modal size="sm" bind:show>
+ <div class="flex flex-col justify-end">
+ <div class=" flex justify-between dark:text-gray-100 px-5 pt-4 pb-2">
+ <div class="flex-col text-lg font-medium self-center font-primary">
+ {$i18n.t('Upload Icloud file')}
+ <span class="text-sm text-gray-500">- {$i18n.t('choose URL or local file')}</span>
+ </div>
+
+ <button
+ class="self-center"
+ on:click={() => {
+ show = false;
+ }}
+ >
+ <svg
+ xmlns="http://www.w3.org/2000/svg"
+ viewBox="0 0 20 20"
+ fill="currentColor"
+ class="w-5 h-5"
+ >
+ <path
+ d="M6.28 5.22a.75.75 0 00-1.06 1.06L8.94 10l-3.72 3.72a.75.75 0 101.06 1.06L10 11.06l3.72 3.72a.75.75 0 101.06-1.06L11.06 10l3.72-3.72a.75.75 0 00-1.06-1.06L10 8.94 6.28 5.22z"
+ />
+ </svg>
+ </button>
+ </div>
+
+ <div class="flex flex-col md:flex-row w-full px-4 pb-4 md:space-x-4 dark:text-gray-200">
+ <div class=" flex flex-col w-full sm:flex-row sm:justify-center sm:space-x-6">
+ <div class="flex items-center w-full">
+ <div class="flex-1 min-w-0 mr-2">
+ <div class="flex flex-col w-full my-8 mx-2">
+ <input
+ class="w-full text-sm bg-transparent placeholder:text-gray-300 outline-none border-b-solid border-b-2 border-blue-500 rounded p-2"
+ type="text"
+ bind:value={url}
+ placeholder={$i18n.t('Upload from URL')}
+ />
+ </div>
+ </div>
+
+ <div class="flex-none w-[1px] h-[60%] mx-2.5 bg-gray-300"></div>
+
+ <div class="flex-1 min-w-0">
+ <input type="file" id="fileInput" hidden on:change={handleFileSelect} />
+
+ <label
+ for="fileInput"
+ class="cursor-pointer flex flex-col items-center hover:bg-gray-100 rounded-lg p-2 transition-colors"
+ >
+ <UploadCloudIcon className="w-12 h-12 text-gray-500" />
+ <div class="text-xs text-gray-500 pt-2">
+ {selectedFile ? selectedFile.name : '点击上传文件'}
+ </div>
+ </label>
+ </div>
+ </div>
+ </div>
+ </div>
+ {#if loading}
+ <Spinner className="my-4 size-4" />
+ {:else}
+ <button
+ class="bg-blue-500 hover:bg-blue-700 text-white font-bold py-3 px-4 rounded text-sm"
+ on:click={(e) => {
+ e.preventDefault();
+ submitHandler();
+ }}
+ >
+ {$i18n.t('Upload Confirm')}
+ </button>
+ {/if}
+ </div>
+</Modal>
+
diff --git a/src/lib/components/workspace/Knowledge/KnowledgeBase/IcloudFiles.svelte b/src/lib/components/workspace/Knowledge/KnowledgeBase/IcloudFiles.svelte
new file mode 100644
index 000000000..d6490dce2
--- /dev/null
+++ b/src/lib/components/workspace/Knowledge/KnowledgeBase/IcloudFiles.svelte
@@ -0,0 +1,37 @@
+<script lang="ts">
+ export let selectedFileId = null;
+ export let files = [];
+
+ export let small = false;
+</script>
+
+<div class="max-h-full flex flex-col w-full">
+ {#each files as file}
+ <div class="mt-1 px-2 flex hover:bg-gray-50 transition">
+ <div class="p-3 bg-black/20 dark:bg-white/10 text-white rounded-xl my-2">
+ <svg
+ xmlns="http://www.w3.org/2000/svg"
+ viewBox="0 0 24 24"
+ fill="currentColor"
+ class=" size-3"
+ >
+ <path
+ fill-rule="evenodd"
+ d="M5.625 1.5c-1.036 0-1.875.84-1.875 1.875v17.25c0 1.035.84 1.875 1.875 1.875h12.75c1.035 0 1.875-.84 1.875-1.875V12.75A3.75 3.75 0 0 0 16.5 9h-1.875a1.875 1.875 0 0 1-1.875-1.875V5.25A3.75 3.75 0 0 0 9 1.5H5.625ZM7.5 15a.75.75 0 0 1 .75-.75h7.5a.75.75 0 0 1 0 1.5h-7.5A.75.75 0 0 1 7.5 15Zm.75 2.25a.75.75 0 0 0 0 1.5H12a.75.75 0 0 0 0-1.5H8.25Z"
+ clip-rule="evenodd"
+ />
+ <path
+ d="M12.971 1.816A5.23 5.23 0 0 1 14.25 5.25v1.875c0 .207.168.375.375.375H16.5a5.23 5.23 0 0 1 3.434 1.279 9.768 9.768 0 0 0-6.963-6.963Z"
+ />
+ </svg>
+ </div>
+
+ <div class="flex flex-col justify-center -space-y-0.5 px-2.5 w-full">
+ <div class=" dark:text-gray-100 text-sm font-medium line-clamp-1 mb-1">
+ {file.name}
+ </div>
+ </div>
+ </div>
+ {/each}
+</div>
+
diff --git a/src/lib/i18n/locales/zh-CN/translation.json b/src/lib/i18n/locales/zh-CN/translation.json
index ebb53a1b5..d6b72e04d 100644
--- a/src/lib/i18n/locales/zh-CN/translation.json
+++ b/src/lib/i18n/locales/zh-CN/translation.json
@@ -1174,5 +1174,18 @@
"Your entire contribution will go directly to the plugin developer; Open WebUI does not take any percentage. However, the chosen funding platform might have its own fees.": "您的全部捐款将直接给到插件开发者Open WebUI 不会收取任何比例。但众筹平台可能会有服务费、抽成。",
"Youtube": "YouTube",
"Youtube Language": "Youtube 语言",
- "Youtube Proxy URL": "Youtube 代理 URL"
+ "Youtube Proxy URL": "Youtube 代理 URL",
+ "Upload Icloud file": "上传到云端",
+ "choose URL or local file": "选择URL或本地文件",
+ "Upload from URL": "从URL上传",
+ "Upload Confirm": "确认上传",
+ "URL or File are required": "未上传文件",
+ "Upload file or enter URL": "文件与URL不能同时提交",
+ "Icloud File": "云端文件",
+ "Icloud File API": "云端存储API",
+ "Enter Icloud URL(e.g.": "输入云端存储URL例如.",
+ "Upload to Icloud": "上传到云端",
+ "Icloud Knowledge": "云端数据库",
+ "Upload Succeed": "上传文件成功",
+ "Icloud API settings updated": "云端存储API设置已更新"
}
--
2.34.1

View File

@@ -0,0 +1,56 @@
From ebf3218eef81897b536521e2140bdd9176f3ace3 Mon Sep 17 00:00:00 2001
From: lkk12014402 <kaokao.lv@intel.com>
Date: Tue, 8 Apr 2025 07:13:20 +0000
Subject: [PATCH 3/3] update build script
---
hatch_build.py | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/hatch_build.py b/hatch_build.py
index 8ddaf0749..e15d6e99d 100644
--- a/hatch_build.py
+++ b/hatch_build.py
@@ -3,21 +3,34 @@ import os
import shutil
import subprocess
from sys import stderr
-
+
from hatchling.builders.hooks.plugin.interface import BuildHookInterface
-
-
+
+
class CustomBuildHook(BuildHookInterface):
def initialize(self, version, build_data):
super().initialize(version, build_data)
- stderr.write(">>> Building Open Webui frontend\n")
+ stderr.write(">>> Building DCAI小智 frontend\n")
npm = shutil.which("npm")
if npm is None:
raise RuntimeError(
- "NodeJS `npm` is required for building Open Webui but it was not found"
+ "NodeJS `npm` is required for building DCAI小智 but it was not found"
)
+ stderr.write("### Installing onnxruntime-node\n")
+ subprocess.run([npm, "install", "onnxruntime-node", "--onnxruntime-node-install-cuda=skip"], check=True) # noqa: S603
+
+ stderr.write("### Installing huggingface/transformers.js\n")
+ subprocess.run([npm, "i", "@huggingface/transformers"], check=True) # noqa: S603
+
+ ort_version = "1.20.1"
+ ort_url = f"https://github.com/microsoft/onnxruntime/releases/download/v{ort_version}/onnxruntime-linux-x64-gpu-{ort_version}.tgz"
+
+ stderr.write(f"### Downloading onnxruntime binaries from {ort_url}\n")
+ subprocess.run(["curl", "-L", ort_url, "-o", f"onnxruntime-linux-x64-gpu-{ort_version}.tgz"], check=True) # noqa: S603
+
stderr.write("### npm install\n")
subprocess.run([npm, "install"], check=True) # noqa: S603
+
stderr.write("\n### npm run build\n")
os.environ["APP_BUILD_HASH"] = version
subprocess.run([npm, "run", "build"], check=True) # noqa: S603
--
2.34.1

View File

@@ -0,0 +1,31 @@
From 36d61dab9306cb8f12c4497a32781d84f8cfb2e7 Mon Sep 17 00:00:00 2001
From: lkk12014402 <kaokao.lv@intel.com>
Date: Tue, 8 Apr 2025 07:22:36 +0000
Subject: [PATCH 4/4] enhance tool formatting
---
backend/open_webui/utils/middleware.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/backend/open_webui/utils/middleware.py b/backend/open_webui/utils/middleware.py
index fddbe8ee1..9e44ed91a 100644
--- a/backend/open_webui/utils/middleware.py
+++ b/backend/open_webui/utils/middleware.py
@@ -1142,12 +1142,12 @@ async def process_chat_response(
result_display_content = f"{result_display_content}\n> {tool_name}: {result.get('content', '')}"
if not raw:
- content = f'{content}\n<details type="tool_calls" done="true" content="{html.escape(json.dumps(block_content))}" results="{html.escape(json.dumps(results))}">\n<summary>Tool Executed</summary>\n{result_display_content}\n</details>\n'
+ content = f'{content}\n<details type="tool_calls" done="true" content="{html.escape(json.dumps(block_content))}" results="{html.escape(json.dumps(results))}">\n<summary> Tool: {tool_call.get('function', {}).get('name', '')} Executed</summary>\n{result_display_content}\n</details>\n'
else:
tool_calls_display_content = ""
for tool_call in block_content:
- tool_calls_display_content = f"{tool_calls_display_content}\n> Executing {tool_call.get('function', {}).get('name', '')}"
+ tool_calls_display_content = f"{tool_calls_display_content}\n> Executing Tool: {tool_call.get('function', {}).get('name', '')}"
if not raw:
content = f'{content}\n<details type="tool_calls" done="false" content="{html.escape(json.dumps(block_content))}">\n<summary>Tool Executing...</summary>\n{tool_calls_display_content}\n</details>\n'
--
2.34.1

View File

@@ -0,0 +1,25 @@
From 4723fb2df86df3e1c300f12fc0649823ea1a753b Mon Sep 17 00:00:00 2001
From: lkk12014402 <kaokao.lv@intel.com>
Date: Tue, 8 Apr 2025 08:09:36 +0000
Subject: [PATCH 5/5] fix tool call typo.
---
backend/open_webui/utils/middleware.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/backend/open_webui/utils/middleware.py b/backend/open_webui/utils/middleware.py
index 9e44ed91a..82aed5346 100644
--- a/backend/open_webui/utils/middleware.py
+++ b/backend/open_webui/utils/middleware.py
@@ -1142,7 +1142,7 @@ async def process_chat_response(
result_display_content = f"{result_display_content}\n> {tool_name}: {result.get('content', '')}"
if not raw:
- content = f'{content}\n<details type="tool_calls" done="true" content="{html.escape(json.dumps(block_content))}" results="{html.escape(json.dumps(results))}">\n<summary> Tool: {tool_call.get('function', {}).get('name', '')} Executed</summary>\n{result_display_content}\n</details>\n'
+ content = f'{content}\n<details type="tool_calls" done="true" content="{html.escape(json.dumps(block_content))}" results="{html.escape(json.dumps(results))}">\n<summary> Tool: {tool_call.get("function", {}).get("name", "")} Executed</summary>\n{result_display_content}\n</details>\n'
else:
tool_calls_display_content = ""
--
2.34.1
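
The fix in this patch swaps the single quotes inside the f-string's replacement field for double quotes. Before Python 3.12, an f-string delimited by single quotes cannot contain single quotes inside its `{...}` expressions, so the previous version fails to parse on those interpreters. A minimal sketch of the same pattern follows; `tool_summary` and its parameters are hypothetical names used for illustration and are not part of `middleware.py`.

```python
import html
import json


def tool_summary(tool_call: dict, results: list, body: str) -> str:
    """Render a <details> block similar to the one built in the patch."""
    # Resolve the tool name outside the f-string so that no quote
    # characters are needed inside the replacement fields at all.
    name = tool_call.get("function", {}).get("name", "")
    # html.escape keeps the JSON safe inside a double-quoted attribute.
    escaped_results = html.escape(json.dumps(results))
    return (
        f'<details type="tool_calls" done="true" results="{escaped_results}">\n'
        f"<summary> Tool: {name} Executed</summary>\n{body}\n</details>\n"
    )


print(tool_summary({"function": {"name": "search"}}, [{"content": "a"}], "> ok"))
```

Resolving the name into a local variable first is one way to sidestep the quoting rule entirely; the patch itself keeps the expression inline and only changes the quote characters.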

View File

@@ -1,8 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
ARG IMAGE_REPO=opea
ARG BASE_TAG=latest
FROM opea/comps-base:$BASE_TAG
FROM $IMAGE_REPO/comps-base:$BASE_TAG
COPY ./audioqna.py $HOME/audioqna.py

View File

@@ -1,8 +1,9 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
ARG IMAGE_REPO=opea
ARG BASE_TAG=latest
FROM opea/comps-base:$BASE_TAG
FROM $IMAGE_REPO/comps-base:$BASE_TAG
COPY ./audioqna_multilang.py $HOME/audioqna_multilang.py

View File

@@ -2,6 +2,13 @@
AudioQnA is an example that demonstrates the integration of Generative AI (GenAI) models for performing question-answering (QnA) on audio files, with the added functionality of Text-to-Speech (TTS) for generating spoken responses. The example showcases how to convert audio input to text using Automatic Speech Recognition (ASR), generate answers to user queries using a language model, and then convert those answers back to speech using Text-to-Speech (TTS).
## Table of Contents
1. [Architecture](#architecture)
2. [Deployment Options](#deployment-options)
## Architecture
The AudioQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
```mermaid
@@ -59,37 +66,13 @@ flowchart LR
```
## Deploy AudioQnA Service
## Deployment Options
The AudioQnA service can be deployed on either Intel Gaudi2 or Intel Xeon Scalable Processor.
The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware.
### Deploy AudioQnA on Gaudi
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for instructions on deploying AudioQnA on Gaudi.
### Deploy AudioQnA on Xeon
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for instructions on deploying AudioQnA on Xeon.
## Deploy using Helm Chart
Refer to the [AudioQnA helm chart](./kubernetes/helm/README.md) for instructions on deploying AudioQnA on Kubernetes.
## Supported Models
### ASR
The default model is [openai/whisper-small](https://huggingface.co/openai/whisper-small). It also supports all models in the Whisper family, such as `openai/whisper-large-v3`, `openai/whisper-medium`, `openai/whisper-base`, `openai/whisper-tiny`, etc.
To replace the model, please edit the `compose.yaml` and add the `command` line to pass the name of the model you want to use:
```yaml
services:
whisper-service:
...
command: --model_name_or_path openai/whisper-tiny
```
### TTS
The default model is [microsoft/SpeechT5](https://huggingface.co/microsoft/speecht5_tts). We currently do not support replacing the model. More models under the commercial license will be added in the future.
| Category | Deployment Option | Description |
| ---------------------- | ----------------- | ---------------------------------------------------------------- |
| On-premise Deployments | Docker compose | [AudioQnA deployment on Xeon](./docker_compose/intel/cpu/xeon) |
| | | [AudioQnA deployment on Gaudi](./docker_compose/intel/hpu/gaudi) |
| | | [AudioQnA deployment on AMD ROCm](./docker_compose/amd/gpu/rocm) |
| | Kubernetes | [Helm Charts](./kubernetes/helm) |

View File

@@ -0,0 +1,42 @@
# AudioQnA Docker Image Build
## Table of Contents
1. [Build MegaService Docker Image](#build-megaservice-docker-image)
2. [Build UI Docker Image](#build-ui-docker-image)
3. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
4. [Troubleshooting](#troubleshooting)
## Build MegaService Docker Image
The AudioQnA Megaservice is built from the [GenAIExamples](https://github.com/opea-project/GenAIExamples.git) repository. Build the Megaservice Docker image with the commands below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA
docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
## Build UI Docker Image
Build the frontend Docker image using the commands below:
```bash
cd GenAIExamples/AudioQnA/ui
docker build -t opea/audioqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
## Generate a HuggingFace Access Token
Some HuggingFace resources, such as certain models, are only accessible with an access token. If you do not have one, create an account at [HuggingFace](https://huggingface.co/) and then generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
## Troubleshooting
1. If you get errors like "Access Denied", [validate the microservices](https://github.com/opea-project/GenAIExamples/tree/main/AudioQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
```bash
curl http://${host_ip}:7055/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
```
2. (Docker only) If all microservices work well, check port `${host_ip}:7777`. The port may already be allocated by another user; if so, change it in `compose.yaml`.
3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.


@@ -1,120 +1,59 @@
# Build Mega Service of AudioQnA on AMD ROCm GPU
# Deploying AudioQnA on AMD ROCm GPU
This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice
pipeline on a server with the AMD ROCm GPU platform.
This document outlines the single-node deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on a server with AMD ROCm processing accelerators. The steps include pulling Docker images, deploying containers via Docker Compose, and running the service using the `llm` microservices.
## Build Docker Images
Note: The default LLM is `Intel/neural-chat-7b-v3-3`. Before deploying the application, make sure that you have either requested and been granted access to it on [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
### 1. Build Docker Image
## Table of Contents
- #### Create application install directory and go to it:
1. [AudioQnA Quick Start Deployment](#audioqna-quick-start-deployment)
2. [AudioQnA Docker Compose Files](#audioqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)
```bash
# Use the same absolute path for both steps so this works from any directory.
mkdir -p ~/audioqna-install && cd ~/audioqna-install
```
## AudioQnA Quick Start Deployment
- #### Clone the repository GenAIExamples (the default repository branch "main" is used here):
This section describes how to quickly deploy and test the AudioQnA service manually on an AMD ROCm platform. The basic steps are:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
```
1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)
To use a specific branch or tag of the GenAIExamples repository, run the following command (replace v1.3 with the value you need):
### Access the Code
```bash
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```
Note that when using a specific version of the code, you need to use the README from that version.
- #### Go to build directory:
```bash
cd ~/audioqna-install/GenAIExamples/AudioQnA/docker_image_build
```
- Cleaning up the GenAIComps repository if it was previously cloned in this directory.
This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty:
```bash
echo Y | rm -R GenAIComps
```
- #### Clone the repository GenAIComps (the default repository branch "main" is used here):
Clone the GenAIExamples repository and access the AudioQnA AMD ROCm platform Docker Compose files and supporting scripts:
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA
```
We remind you that when using a specific version of the code, you need to use the README from this version.
Then check out a released version, such as v1.3:
- #### Setting the list of images for the build (from the build.yaml file)
```bash
git checkout v1.3
```
If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows:
### Configure the Deployment Environment
#### vLLM-based application
#### Docker Compose GPU Configuration
```bash
service_list="vllm-rocm whisper speecht5 audioqna audioqna-ui"
```
Consult the section on [AudioQnA Service configuration](#audioqna-configuration) for information on how service specific configuration parameters affect deployments.
#### TGI-based application
```bash
service_list="whisper speecht5 audioqna audioqna-ui"
```
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
```bash
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
```
- #### Build Docker Images
```bash
docker compose -f build.yaml build ${service_list} --no-cache
```
After the build, we check the list of images with the command:
```bash
docker image ls
```
The list of images should include:
##### vLLM-based application:
- opea/vllm-rocm:latest
- opea/whisper:latest
- opea/speecht5:latest
- opea/audioqna:latest
##### TGI-based application:
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- opea/whisper:latest
- opea/speecht5:latest
- opea/audioqna:latest
---
## Deploy the AudioQnA Application
### Docker Compose Configuration for AMD GPUs
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:
- compose_vllm.yaml - for vLLM-based application
- compose.yaml - for TGI-based
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose files (`compose.yaml`, `compose_vllm.yaml`) for the LLM serving container:
```yaml
# Example for vLLM service in compose_vllm.yaml
# Note: Modern docker compose might use deploy.resources syntax instead.
# Check your docker version and compose file.
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/:/dev/dri/
  # - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
  - SYS_PTRACE
group_add:
@@ -123,131 +62,161 @@ security_opt:
  - seccomp:unconfined
```
This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderDN` device nodes. For example:
#### Environment Variables (`set_env*.sh`)
```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```
These scripts (`set_env_vllm.sh` for vLLM, `set_env.sh` for TGI) configure crucial parameters passed to the containers.
**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderDN` device nodes for your GPU.
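As a rough starting point, the device nodes present on a host can be listed directly from the DRI directory. The sketch below is an illustration only: `list_dri_nodes` is a hypothetical helper name, not part of the repository, and it assumes the usual Linux layout in which render nodes appear as `renderD<N>` under `/dev/dri`. It does not tell you which node belongs to which physical GPU; for that mapping the AMD driver utilities are still needed.

```bash
# Hypothetical helper (not part of GenAIExamples): list the GPU device nodes
# found under a DRI directory. Defaults to /dev/dri; the optional argument
# only exists so the function can be tried against a scratch directory.
list_dri_nodes() {
  local dri_dir="${1:-/dev/dri}"
  local node
  for node in "$dri_dir"/card* "$dri_dir"/renderD*; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    [ -e "$node" ] || continue
    basename "$node"
  done
}
```

Running `list_dri_nodes` with no arguments prints one node name per line, which can then be copied into the `devices:` entries of the compose file.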
To set up environment variables for deploying AudioQnA services, set up some parameters specific to the deployment environment and source the `set_env.sh` script in this directory:
### Set deploy environment variables
#### Setting variables in the operating system environment:
##### Set variable HUGGINGFACEHUB_API_TOKEN:
For TGI inference usage:
```bash
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
export host_ip="External_Public_IP" # ip address of the node
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip,whisper-service,speecht5-service,vllm-service,tgi-service,audioqna-xeon-backend-server,audioqna-xeon-ui-server # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
source ./set_env.sh
```
#### Set variables value in set_env\*\*\*\*.sh file:
Go to Docker Compose directory:
For vLLM inference usage
```bash
cd ~/audioqna-install/GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm
export host_ip="External_Public_IP" # ip address of the node
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip,whisper-service,speecht5-service,vllm-service,tgi-service,audioqna-xeon-backend-server,audioqna-xeon-ui-server # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
source ./set_env_vllm.sh
```
The example uses the Nano text editor. You can use any convenient text editor:
### Deploy the Services Using Docker Compose
#### If you use vLLM
```bash
nano set_env_vllm.sh
```
#### If you use TGI
```bash
nano set_env.sh
```
If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
Set the values of the variables:
- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world.
If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address.
If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address.
If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located.
We set these values in the file set_env\*\*\*\*.sh
- **Variables with names like "\*\*\*\*\*\*\_PORT"** - These variables set the IP port numbers for establishing network connections to the application services.
The values shown in `set_env.sh` or `set_env_vllm.sh` are the ones used during development and testing of the application, and they are configured for the environment in which development was performed. Configure these values according to the network access rules of your environment's server, and make sure they do not overlap with IP ports already used by other applications.
#### Set variables with script set_env\*\*\*\*.sh
#### If you use vLLM
```bash
. set_env_vllm.sh
```
#### If you use TGI
```bash
. set_env.sh
```
### Start the services:
#### If you use vLLM
```bash
docker compose -f compose_vllm.yaml up -d
```
#### If you use TGI
To deploy the AudioQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file.
For TGI inference deployment:
```bash
cd docker_compose/amd/gpu/rocm
docker compose -f compose.yaml up -d
```
All containers should be running and should not restart:
For vLLM inference deployment:
##### If you use vLLM:
```bash
cd docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml up -d
```
- audioqna-vllm-service
- whisper-service
- speecht5-service
- audioqna-backend-server
- audioqna-ui-server
> **Note**: Developers should build the Docker image from source when:
>
> - Developing off the git main branch (as the container's ports in the repo may be different from the published Docker image).
> - Unable to download the Docker image.
> - Using a specific version of the Docker image.
##### If you use TGI:
Please refer to the table below to build different microservices from source:
- audioqna-tgi-service
- whisper-service
- speecht5-service
- audioqna-backend-server
- audioqna-ui-server
| Microservice | Deployment Guide |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------- |
| vLLM | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker) |
| LLM | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms) |
| WHISPER | [Whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/src#211-whisper-server-image) |
| SPEECHT5 | [SpeechT5 build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/tts/src#211-speecht5-server-image) |
| GPT-SOVITS | [GPT-SOVITS build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/gpt-sovits/src#build-the-image) |
| MegaService | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
---
### Check the Deployment Status
## Validate the Services
After running docker compose, check if all the containers launched via docker compose have started:
### 1. Validate the vLLM/TGI Service
#### For TGI inference deployment
```bash
docker ps -a
```
For the default deployment, the following 5 containers should have started:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d8007690868d opea/audioqna:latest "python audioqna.py" 21 seconds ago Up 19 seconds 0.0.0.0:3008->8888/tcp, [::]:3008->8888/tcp audioqna-rocm-backend-server
87ba9a1d56ae ghcr.io/huggingface/text-generation-inference:2.4.1-rocm "/tgi-entrypoint.sh …" 21 seconds ago Up 20 seconds 0.0.0.0:3006->80/tcp, [::]:3006->80/tcp tgi-service
59e869acd742 opea/speecht5:latest "python speecht5_ser…" 21 seconds ago Up 20 seconds 0.0.0.0:7055->7055/tcp, :::7055->7055/tcp speecht5-service
0143267a4327 opea/whisper:latest "python whisper_serv…" 21 seconds ago Up 20 seconds 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-service
```
#### For vLLM inference deployment
```bash
docker ps -a
```
For the default deployment, the following 5 containers should have started:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f3e6893a69fa opea/audioqna-ui:latest "docker-entrypoint.s…" 37 seconds ago Up 35 seconds 0.0.0.0:18039->5173/tcp, [::]:18039->5173/tcp audioqna-ui-server
f943e5cd21e9 opea/audioqna:latest "python audioqna.py" 37 seconds ago Up 35 seconds 0.0.0.0:18038->8888/tcp, [::]:18038->8888/tcp audioqna-backend-server
074e8c418f52 opea/speecht5:latest "python speecht5_ser…" 37 seconds ago Up 36 seconds 0.0.0.0:7055->7055/tcp, :::7055->7055/tcp speecht5-service
77abe498e427 opea/vllm-rocm:latest "python3 /workspace/…" 37 seconds ago Up 36 seconds 0.0.0.0:8081->8011/tcp, [::]:8081->8011/tcp audioqna-vllm-service
9074a95bb7a6 opea/whisper:latest "python whisper_serv…" 37 seconds ago Up 36 seconds 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-service
```
If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
### Validate the Pipeline
Once the AudioQnA services are running, test the pipeline using the following command:
```bash
# Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the base64 string to the megaservice endpoint.
# The megaservice will return a spoken response as a base64 string. To listen to the response, decode the base64 string and save it as a .wav file.
wget https://github.com/intel/intel-extension-for-transformers/raw/refs/heads/main/intel_extension_for_transformers/neural_chat/assets/audio/sample_2.wav
base64_audio=$(base64 -w 0 sample_2.wav)
# if you are using speecht5 as the tts service, voice can be "default" or "male"
# if you are using gpt-sovits for the tts service, you can set the reference audio following https://github.com/opea-project/GenAIComps/blob/main/comps/third_parties/gpt-sovits/src/README.md
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-H "Content-Type: application/json" \
-d "{\"audio\": \"${base64_audio}\", \"max_tokens\": 64, \"voice\": \"default\"}" \
| sed 's/^"//;s/"$//' | base64 -d > output.wav
```
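The last stage of the command above (strip the surrounding JSON quotes, then base64-decode) can be wrapped in a small function so that an empty or failed decode is reported instead of silently leaving an unplayable `output.wav`. This is a minimal sketch: `decode_audio_response` is a hypothetical name, and it assumes the megaservice returns a single JSON-quoted base64 string, as described above.

```bash
# Hypothetical helper (not part of GenAIExamples): read the quoted base64
# reply on stdin and write the decoded bytes to the file named by $1.
decode_audio_response() {
  local out_file="${1:-output.wav}"
  # Same transformation as in the curl pipeline: drop the leading and
  # trailing double quote, then decode.
  sed 's/^"//;s/"$//' | base64 -d > "$out_file" || return 1
  # Succeed only if something was actually decoded.
  [ -s "$out_file" ]
}
```

It would be used by replacing the final `| sed ... | base64 -d > output.wav` stage of the `curl` command with `| decode_audio_response output.wav` and checking the exit status.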
**Note**: Access the AudioQnA UI in a web browser at `http://${host_ip}:5173`. Confirm that port `5173` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate Microservices](#validate-microservices) section.
### Cleanup the Deployment
To stop the containers associated with the deployment, execute the following command:
#### If you use vLLM
```bash
cd ~/audioqna-install/GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use TGI
```bash
cd ~/audioqna-install/GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```
## AudioQnA Docker Compose Files
When deploying an AudioQnA pipeline on an AMD ROCm platform, we can pick and choose different large language model serving frameworks, or single English TTS/multi-language TTS component. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| ---------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose_vllm.yaml](./compose_vllm.yaml) | Default compose file using vllm as serving framework and redis as vector database |
| [compose.yaml](./compose.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default |
### Validate the vLLM/TGI Service
#### If you use vLLM:
@@ -313,7 +282,7 @@ Checking the response from the service. The response should be similar to JSON:
If the value of the "generated_text" key in the service response is meaningful,
then we consider the TGI service to be successfully launched.
### 2. Validate MegaServices
### Validate MegaServices
Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
@@ -327,7 +296,7 @@ curl http://${host_ip}:3008/v1/audioqna \
-H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```
### 3. Validate MicroServices
### Validate MicroServices
```bash
# whisper service
@@ -343,18 +312,6 @@ curl http://${host_ip}:7055/v1/tts \
-H 'Content-Type: application/json'
```
### 4. Stop application
## Conclusion
#### If you use vLLM
```bash
cd ~/audioqna-install/GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use TGI
```bash
cd ~/audioqna-install/GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```
This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.


@@ -30,7 +30,7 @@ services:
ports:
- "3006:80"
volumes:
- "./data:/data"
- "${MODEL_CACHE:-./data}:/data"
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd


@@ -1,123 +1,146 @@
# Build Mega Service of AudioQnA on Xeon
# Deploying AudioQnA on Intel® Xeon® Processors
This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server.
The default pipeline deploys with vLLM as the LLM serving component. A TGI backend is also available for the LLM microservice; refer to the [Start the MegaService](#-start-the-megaservice) section on this page.
This document outlines the single-node deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on an Intel Xeon server. The steps include pulling Docker images, deploying containers via Docker Compose, and running the service using the `llm` microservices.
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, make sure that you have either requested and been granted access to it on [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
## 🚀 Build Docker images
## Table of Contents
### 1. Source Code install GenAIComps
1. [AudioQnA Quick Start Deployment](#audioqna-quick-start-deployment)
2. [AudioQnA Docker Compose Files](#audioqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
## AudioQnA Quick Start Deployment
### 2. Build ASR Image
This section describes how to quickly deploy and test the AudioQnA service manually on an Intel® Xeon® processor. The basic steps are:
```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
```
1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)
### 3. Build vLLM Image
### Access the Code
```bash
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
VLLM_VER="$(git describe --tags "$(git rev-list --tags --max-count=1)" )"
git checkout ${VLLM_VER}
docker build --no-cache --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.cpu -t opea/vllm:latest --shm-size=128g .
```
### 4. Build TTS Image
```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
# multilang tts (optional)
docker build -t opea/gpt-sovits:latest --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -f comps/tts/src/integrations/dependency/gpt-sovits/Dockerfile .
```
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
Clone the GenAIExamples repository and access the AudioQnA Intel® Xeon® platform Docker Compose files and supporting scripts:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA/
docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd GenAIExamples/AudioQnA
```
Then run the command `docker images`. The following images should be ready:
1. `opea/whisper:latest`
2. `opea/vllm:latest`
3. `opea/speecht5:latest`
4. `opea/audioqna:latest`
5. `opea/gpt-sovits:latest` (optional)
## 🚀 Set the environment variables
Before starting the services with `docker compose`, you have to recheck the following environment variables.
Then check out a released version, such as v1.2:
```bash
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export MEGA_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export GPT_SOVITS_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export GPT_SOVITS_SERVER_PORT=9880
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
git checkout v1.2
```
Alternatively, use the `set_env.sh` file to set up the environment variables.
### Configure the Deployment Environment
Note:
- Replace `host_ip` with your external IP address. Do not use localhost.
- If you are in a proxy environment, also set the proxy-related environment variables:
```
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy",${host_ip},whisper-service,speecht5-service,gpt-sovits-service,tgi-service,vllm-service,audioqna-xeon-backend-server,audioqna-xeon-ui-server
```
## 🚀 Start the MegaService
To set up environment variables for deploying AudioQnA services, set up some parameters specific to the deployment environment and source the `set_env.sh` script in this directory:
```bash
cd GenAIExamples/AudioQnA/docker_compose/intel/cpu/xeon/
export host_ip="External_Public_IP" # ip address of the node
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip,whisper-service,speecht5-service,vllm-service,tgi-service,audioqna-xeon-backend-server,audioqna-xeon-ui-server # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
source ./set_env.sh
```
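Because `set_env.sh` derives other settings from values such as `host_ip` and `HUGGINGFACEHUB_API_TOKEN`, it can help to confirm they are non-empty before sourcing it. The following is a small sketch of such a pre-flight check; `require_env` is a hypothetical helper name and is not provided by the repository.

```bash
# Hypothetical pre-flight check (not part of GenAIExamples): report every
# listed variable that is unset or empty, and return non-zero if any is.
require_env() {
  local name
  local value
  local missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `require_env host_ip HUGGINGFACEHUB_API_TOKEN && source ./set_env.sh`, which skips sourcing the script when either value is missing.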
If you use vLLM as the LLM serving backend:
Consult the section on [AudioQnA Service configuration](#audioqna-configuration) for information on how service specific configuration parameters affect deployments.
```
docker compose up -d
### Deploy the Services Using Docker Compose
# multilang tts (optional)
docker compose -f compose_multilang.yaml up -d
To deploy the AudioQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file.
```bash
cd docker_compose/intel/cpu/xeon
docker compose -f compose.yaml up -d
```
If you use TGI as the LLM serving backend:
> **Note**: Developers should build the Docker image from source when:
>
> - Developing off the git main branch (as the container's ports in the repo may be different from the published Docker image).
> - Unable to download the Docker image.
> - Using a specific version of the Docker image.
```
docker compose -f compose_tgi.yaml up -d
Please refer to the table below to build different microservices from source:
| Microservice | Deployment Guide |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------- |
| vLLM | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker) |
| LLM | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms) |
| WHISPER | [Whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/src#211-whisper-server-image) |
| SPEECHT5 | [SpeechT5 build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/tts/src#211-speecht5-server-image) |
| GPT-SOVITS | [GPT-SOVITS build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/gpt-sovits/src#build-the-image) |
| MegaService | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```bash
docker ps -a
```
## 🚀 Test MicroServices
For the default deployment, the following 5 containers should have started:
```
1c67e44c39d2 opea/audioqna-ui:latest "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp audioqna-xeon-ui-server
833a42677247 opea/audioqna:latest "python audioqna.py" About a minute ago Up About a minute 0.0.0.0:3008->8888/tcp, :::3008->8888/tcp audioqna-xeon-backend-server
5dc4eb9bf499 opea/speecht5:latest "python speecht5_ser…" About a minute ago Up About a minute 0.0.0.0:7055->7055/tcp, :::7055->7055/tcp speecht5-service
814e6efb1166 opea/vllm:latest "python3 -m vllm.ent…" About a minute ago Up About a minute (healthy) 0.0.0.0:3006->80/tcp, :::3006->80/tcp vllm-service
46f7a00f4612 opea/whisper:latest "python whisper_serv…" About a minute ago Up About a minute 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-service
```
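Comparing the listing by eye gets tedious on repeated deployments, so the check can be scripted. The sketch below assumes the container names shown in the listing above; `missing_containers` is a hypothetical helper name that reads running container names on stdin and prints any expected name that is absent.

```bash
# Hypothetical helper (not part of GenAIExamples): print each expected
# container name (given as arguments) that does not appear in the list of
# running names read from stdin, one name per line.
missing_containers() {
  local running
  local name
  running="$(cat)"
  for name in "$@"; do
    # -x matches the whole line, -F treats the name as a fixed string.
    printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}
```

For the default deployment it could be fed from `docker ps --format '{{.Names}}'`, for example `docker ps --format '{{.Names}}' | missing_containers whisper-service speecht5-service vllm-service audioqna-xeon-backend-server audioqna-xeon-ui-server`. Empty output means every expected container is running.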
If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
### Validate the Pipeline
Once the AudioQnA services are running, test the pipeline using the following command:
```bash
# Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the base64 string to the megaservice endpoint.
# The megaservice will return a spoken response as a base64 string. To listen to the response, decode the base64 string and save it as a .wav file.
wget https://github.com/intel/intel-extension-for-transformers/raw/refs/heads/main/intel_extension_for_transformers/neural_chat/assets/audio/sample_2.wav
base64_audio=$(base64 -w 0 sample_2.wav)
# if you are using speecht5 as the tts service, voice can be "default" or "male"
# if you are using gpt-sovits for the tts service, you can set the reference audio following https://github.com/opea-project/GenAIComps/blob/main/comps/third_parties/gpt-sovits/src/README.md
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-H "Content-Type: application/json" \
-d "{\"audio\": \"${base64_audio}\", \"max_tokens\": 64, \"voice\": \"default\"}" \
| sed 's/^"//;s/"$//' | base64 -d > output.wav
```
**Note**: Access the AudioQnA UI in a web browser at `http://${host_ip}:5173`. Confirm that port `5173` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate Microservices](#validate-microservices) section.
### Cleanup the Deployment
To stop the containers associated with the deployment, execute the following command:
```bash
docker compose -f compose.yaml down
```
## AudioQnA Docker Compose Files
In the context of deploying an AudioQnA pipeline on an Intel® Xeon® platform, we can pick and choose different large language model serving frameworks, or single English TTS/multi-language TTS component. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| -------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default |
| [compose_multilang.yaml](./compose_multilang.yaml) | The TTS component is GPT-SoVITS. All other configurations remain the same as the default |
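
The table above maps each configuration to one compose file, so selecting a variant comes down to passing the right file to `docker compose -f`. A minimal sketch of that mapping follows; the function name `compose_file_for` and the short variant labels (`vllm`, `tgi`, `multilang`) are assumptions made for illustration and do not exist in the repository.

```bash
# Hypothetical helper: maps an illustrative variant label to the compose file
# listed in the table above.
compose_file_for() {
  case "$1" in
    vllm) echo "compose.yaml" ;;
    tgi) echo "compose_tgi.yaml" ;;
    multilang) echo "compose_multilang.yaml" ;;
    *)
      echo "unknown variant: $1" >&2
      return 1
      ;;
  esac
}
```

With this in place, a TGI deployment could be started with `docker compose -f "$(compose_file_for tgi)" up -d`.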
## Validate MicroServices
1. Whisper Service
@@ -161,7 +184,7 @@ docker compose -f compose_tgi.yaml up -d
3. TTS Service
```
```bash
# speecht5 service
curl http://${host_ip}:${SPEECHT5_SERVER_PORT}/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
@@ -169,17 +192,6 @@ docker compose -f compose_tgi.yaml up -d
curl http://${host_ip}:${GPT_SOVITS_SERVER_PORT}/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
```
## 🚀 Test MegaService
## Conclusion
Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
to the response, decode the base64 string and save it as a .wav file.
```bash
# if you are using speecht5 as the tts service, voice can be "default" or "male"
# if you are using gpt-sovits for the tts service, you can set the reference audio following https://github.com/opea-project/GenAIComps/blob/main/comps/tts/src/integrations/dependency/gpt-sovits/README.md
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
-H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```
This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.

@@ -24,6 +24,9 @@ services:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
llm_download: ${llm_download:-True}
# volumes:
# - ./pretrained_models/:/home/user/GPT-SoVITS/GPT_SoVITS/pretrained_models/
restart: unless-stopped
vllm-service:
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}

@@ -1,145 +1,170 @@
# Build Mega Service of AudioQnA on Gaudi
# Deploying AudioQnA on Intel® Gaudi® Processors
This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Gaudi server.
The default pipeline deploys with vLLM as the LLM serving component. A TGI backend is also available for the LLM microservice; refer to the [Start the MegaService](#-start-the-megaservice) section on this page.
This document outlines the single-node deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on an Intel Gaudi server. The steps include pulling Docker images, deploying containers via Docker Compose, and running the services using the `llm` microservice.
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, make sure that you have either requested and been granted access to it on [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
## 🚀 Build Docker images
## Table of Contents
### 1. Source Code install GenAIComps
1. [AudioQnA Quick Start Deployment](#audioqna-quick-start-deployment)
2. [AudioQnA Docker Compose Files](#audioqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
## AudioQnA Quick Start Deployment
### 2. Build ASR Image
This section describes how to quickly deploy and test the AudioQnA service manually on an Intel® Gaudi® processor. The basic steps are:
```bash
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu .
```
1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)
### 3. Build vLLM Image
### Access the Code
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork/
VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
git checkout ${VLLM_VER}
docker build --no-cache --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.hpu -t opea/vllm-gaudi:latest --shm-size=128g .
### 4. Build TTS Image
```bash
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu .
```
### 5. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:
Clone the GenAIExamples repository and access the AudioQnA Intel® Gaudi® platform Docker Compose files and supporting scripts:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA/
docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd GenAIExamples/AudioQnA
```
Then run `docker images`; the following images should be ready:
1. `opea/whisper-gaudi:latest`
2. `opea/vllm-gaudi:latest`
3. `opea/speecht5-gaudi:latest`
4. `opea/audioqna:latest`
## 🚀 Set the environment variables
Before starting the services with `docker compose`, you have to recheck the following environment variables.
Then check out a released version, such as v1.2:
```bash
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=<your HF token>
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
# set vLLM parameters
export NUM_CARDS=1
export BLOCK_SIZE=128
export MAX_NUM_SEQS=256
export MAX_SEQ_LEN_TO_CAPTURE=2048
export MEGA_SERVICE_HOST_IP=${host_ip}
export WHISPER_SERVER_HOST_IP=${host_ip}
export SPEECHT5_SERVER_HOST_IP=${host_ip}
export LLM_SERVER_HOST_IP=${host_ip}
export WHISPER_SERVER_PORT=7066
export SPEECHT5_SERVER_PORT=7055
export LLM_SERVER_PORT=3006
export BACKEND_SERVICE_ENDPOINT=http://${host_ip}:3008/v1/audioqna
git checkout v1.2
```
Alternatively, use the `set_env.sh` file to set up the environment variables.
### Configure the Deployment Environment
Note:
- Replace `host_ip` with your external IP address; do not use localhost.
- If you are in a proxy environment, also set the proxy-related environment variables:
```
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy",${host_ip},whisper-service,speecht5-service,tgi-service,vllm-service,audioqna-gaudi-backend-server,audioqna-gaudi-ui-server
```
## 🚀 Start the MegaService
> **_NOTE:_** Users will need at least three Gaudi cards for AudioQnA.
To set up environment variables for deploying AudioQnA services, set up some parameters specific to the deployment environment and source the `set_env.sh` script in this directory:
```bash
cd GenAIExamples/AudioQnA/docker_compose/intel/hpu/gaudi/
export host_ip="External_Public_IP" # ip address of the node
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
export http_proxy="Your_HTTP_Proxy" # http proxy if any
export https_proxy="Your_HTTPs_Proxy" # https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip,whisper-service,speecht5-service,vllm-service,tgi-service,audioqna-gaudi-backend-server,audioqna-gaudi-ui-server # additional no proxies if needed
export NGINX_PORT=${your_nginx_port} # your usable port for nginx, 80 for example
source ./set_env.sh
```
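Because `docker compose` reads these values from the exported environment, a quick check before deployment can catch a missing variable early. The sketch below is one possible approach, not something shipped with the repository: `check_required_env` is a hypothetical function name, and it relies on `printenv`, so it only sees variables that have actually been exported.

```bash
# Hypothetical helper: reports every named variable that is unset or empty in
# the exported environment and returns non-zero if any is missing.
check_required_env() {
  local name missing=0
  for name in "$@"; do
    # printenv prints nothing (and fails) for variables that are not exported.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, `check_required_env host_ip HUGGINGFACEHUB_API_TOKEN NGINX_PORT` could be run after sourcing `set_env.sh` and before `docker compose up`.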
If using vLLM as the LLM serving backend:
Consult the section on [AudioQnA Service configuration](#audioqna-configuration) for information on how service specific configuration parameters affect deployments.
```
docker compose up -d
### Deploy the Services Using Docker Compose
To deploy the AudioQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file.
```bash
cd docker_compose/intel/hpu/gaudi
docker compose -f compose.yaml up -d
```
If using TGI as the LLM serving backend:
> **Note**: Developers should build the Docker images from source when:
>
> - Developing off the git main branch (as the container's ports in the repo may be different from the published Docker image).
> - Unable to download the Docker image.
> - Using a specific version of the Docker image.
```
docker compose -f compose_tgi.yaml up -d
Please refer to the table below to build different microservices from source:
| Microservice | Deployment Guide |
| ------------ | -------------------------------------------------------------------------------------------------------------------- |
| vLLM-gaudi | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker-1) |
| LLM | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms) |
| WHISPER | [Whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/asr/src#211-whisper-server-image) |
| SPEECHT5 | [SpeechT5 build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/tts/src#211-speecht5-server-image) |
| MegaService | [MegaService build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```bash
docker ps -a
```
## 🚀 Test MicroServices
For the default deployment, the following 5 containers should have started:
```
23f27dab14a5 opea/whisper-gaudi:latest "python whisper_serv…" 18 minutes ago Up 18 minutes 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-service
629da06b7fb2 opea/audioqna-ui:latest "docker-entrypoint.s…" 19 minutes ago Up 18 minutes 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp audioqna-gaudi-ui-server
8a74d9806b87 opea/audioqna:latest "python audioqna.py" 19 minutes ago Up 18 minutes 0.0.0.0:3008->8888/tcp, [::]:3008->8888/tcp audioqna-gaudi-backend-server
29324430f42e opea/vllm-gaudi:latest "python3 -m vllm.ent…" 19 minutes ago Up 19 minutes (healthy) 0.0.0.0:3006->80/tcp, [::]:3006->80/tcp vllm-gaudi-service
dbd585f0a95a opea/speecht5-gaudi:latest "python speecht5_ser…" 19 minutes ago Up 19 minutes 0.0.0.0:7055->7055/tcp, :::7055->7055/tcp speecht5-service
```
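The comparison between the expected containers and the `docker ps` listing above can also be scripted. The following is a rough sketch under stated assumptions: `missing_containers` is a hypothetical helper name, it reads container names from standard input (for example from `docker ps --format '{{.Names}}'`), and the expected names are simply the ones shown in the listing above.

```bash
# Hypothetical helper: prints each expected container name that does not
# appear among the names read from stdin; returns non-zero if any is missing.
missing_containers() {
  local running name status=0
  running="$(cat)"
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a fixed string.
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

A possible invocation is `docker ps --format '{{.Names}}' | missing_containers whisper-service speecht5-service vllm-gaudi-service audioqna-gaudi-backend-server audioqna-gaudi-ui-server`, which prints nothing when all five containers are running.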
If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
### Validate the Pipeline
Once the AudioQnA services are running, test the pipeline using the following command:
```bash
# Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the base64 string to the megaservice endpoint.
# The megaservice will return a spoken response as a base64 string. To listen to the response, decode the base64 string and save it as a .wav file.
wget https://github.com/intel/intel-extension-for-transformers/raw/refs/heads/main/intel_extension_for_transformers/neural_chat/assets/audio/sample_2.wav
base64_audio=$(base64 -w 0 sample_2.wav)
# if you are using speecht5 as the tts service, voice can be "default" or "male"
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-H "Content-Type: application/json" \
-d "{\"audio\": \"${base64_audio}\", \"max_tokens\": 64, \"voice\": \"default\"}" \
| sed 's/^"//;s/"$//' | base64 -d > output.wav
```
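The request body used above has three fields: the base64-encoded audio, `max_tokens`, and `voice`. As a sketch of how that payload could be assembled separately from the `curl` call, the hypothetical helper below (the name `build_audioqna_payload` and its argument order are assumptions, not part of the repository) prints the JSON to standard output. It assumes GNU `base64` with the `-w 0` option, as in the command above, and a `voice` value that needs no JSON escaping.

```bash
# Hypothetical helper: prints the AudioQnA request JSON for a given audio file.
# Arguments: audio file, max_tokens (default 64), voice (default "default").
build_audioqna_payload() {
  local wav_file="$1" max_tokens="${2:-64}" voice="${3:-default}"
  local audio_b64
  # -w 0 disables line wrapping so the encoded audio stays on a single line.
  audio_b64="$(base64 -w 0 "$wav_file")" || return 1
  printf '{"audio": "%s", "max_tokens": %s, "voice": "%s"}' \
    "$audio_b64" "$max_tokens" "$voice"
}
```

The output can be piped into `curl` with `-d @-`, which reads the request body from standard input and avoids placing a long base64 string on the command line.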
**Note**: Access the AudioQnA UI in a web browser at `http://${host_ip}:5173`. Make sure port `5173` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate Microservices](#validate-microservices) section.
### Cleanup the Deployment
To stop the containers associated with the deployment, execute the following command:
```bash
docker compose -f compose.yaml down
```
## AudioQnA Docker Compose Files
In the context of deploying an AudioQnA pipeline on an Intel® Gaudi® platform, we can pick and choose different large language model serving frameworks. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| -------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vLLM as the LLM serving framework |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default |
## Validate MicroServices
1. Whisper Service
```bash
curl http://${host_ip}:${WHISPER_SERVER_PORT}/v1/asr \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
-H 'Content-Type: application/json'
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav
curl http://${host_ip}:${WHISPER_SERVER_PORT}/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@./sample.wav" \
-F model="openai/whisper-small"
```
2. LLM backend Service
On first startup, this service takes longer because it has to download, load, and warm up the model. Once that has finished, the service is ready and the container (`vllm-gaudi-service` or `tgi-gaudi-service`) status shown by `docker ps` will be `healthy`. Before that, the status will be `health: starting`.
On first startup, this service takes longer because it has to download, load, and warm up the model. Once that has finished, the service is ready and the container (`vllm-service` or `tgi-service`) status shown by `docker ps` will be `healthy`. Before that, the status will be `health: starting`.
Alternatively, run the command below to check whether the LLM service is ready.
```bash
# vLLM service
docker logs vllm-gaudi-service 2>&1 | grep complete
docker logs vllm-service 2>&1 | grep complete
# If the service is ready, you will get the response like below.
INFO: Application startup complete.
```
```bash
# TGI service
docker logs tgi-gaudi-service | grep Connected
docker logs tgi-service | grep Connected
# If the service is ready, you will get the response like below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
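Since the model download and warm-up can take a while, the log checks above are often repeated until they succeed. A minimal retry sketch is shown below; `wait_until` is a hypothetical helper name, and the attempt count and delay are arbitrary values chosen by the caller rather than settings taken from the repository.

```bash
# Hypothetical helper: runs the given command up to <attempts> times, sleeping
# <delay> seconds between tries; succeeds as soon as the command succeeds.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_until 60 5 sh -c 'docker logs vllm-gaudi-service 2>&1 | grep -q "Application startup complete"'` polls the vLLM logs every 5 seconds for up to 60 attempts; replace the container name with the one reported by `docker ps` for your deployment.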
@@ -156,24 +181,11 @@ docker compose -f compose_tgi.yaml up -d
3. TTS Service
```
```bash
# speecht5 service
curl http://${host_ip}:${SPEECHT5_SERVER_PORT}/v1/tts \
-X POST \
-d '{"text": "Who are you?"}' \
-H 'Content-Type: application/json'
curl http://${host_ip}:${SPEECHT5_SERVER_PORT}/v1/audio/speech -XPOST -d '{"input": "Who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
```
## 🚀 Test MegaService
## Conclusion
Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
to the response, decode the base64 string and save it as a .wav file.
```bash
# voice can be "default" or "male"
curl http://${host_ip}:3008/v1/audioqna \
-X POST \
-d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64, "voice":"default"}' \
-H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```
This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.

@@ -62,7 +62,7 @@ services:
cap_add:
- SYS_NICE
ipc: host
command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq-len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
audioqna-gaudi-backend-server:
image: ${REGISTRY:-opea}/audioqna:${TAG:-latest}
container_name: audioqna-gaudi-backend-server

@@ -5,6 +5,8 @@ services:
audioqna:
build:
args:
IMAGE_REPO: ${REGISTRY}
BASE_TAG: ${TAG}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ${no_proxy}
@@ -26,13 +28,13 @@ services:
whisper-gaudi:
build:
context: GenAIComps
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu
dockerfile: comps/third_parties/whisper/src/Dockerfile.intel_hpu
extends: audioqna
image: ${REGISTRY:-opea}/whisper-gaudi:${TAG:-latest}
whisper:
build:
context: GenAIComps
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
dockerfile: comps/third_parties/whisper/src/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
asr:
@@ -50,13 +52,13 @@ services:
speecht5-gaudi:
build:
context: GenAIComps
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu
dockerfile: comps/third_parties/speecht5/src/Dockerfile.intel_hpu
extends: audioqna
image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
speecht5:
build:
context: GenAIComps
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile
dockerfile: comps/third_parties/speecht5/src/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
tts:
@@ -68,13 +70,13 @@ services:
gpt-sovits:
build:
context: GenAIComps
dockerfile: comps/tts/src/integrations/dependency/gpt-sovits/Dockerfile
dockerfile: comps/third_parties/gpt-sovits/src/Dockerfile
extends: audioqna
image: ${REGISTRY:-opea}/gpt-sovits:${TAG:-latest}
vllm:
build:
context: vllm
dockerfile: Dockerfile.cpu
dockerfile: docker/Dockerfile.cpu
extends: audioqna
image: ${REGISTRY:-opea}/vllm:${TAG:-latest}
vllm-gaudi:
@@ -85,10 +87,7 @@ services:
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
vllm-rocm:
build:
args:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ${no_proxy}
context: GenAIComps
dockerfile: comps/third_parties/vllm/src/Dockerfile.amd_gpu
extends: audioqna
image: ${REGISTRY:-opea}/vllm-rocm:${TAG:-latest}

@@ -0,0 +1,15 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
tgi:
enabled: false
vllm:
enabled: true
speecht5:
enabled: false
gpt-sovits:
enabled: true
image:
repository: opea/audioqna-multilang

@@ -0,0 +1,12 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
tgi:
enabled: true
vllm:
enabled: false
speecht5:
enabled: true
gpt-sovits:
enabled: false

@@ -2,4 +2,11 @@
# SPDX-License-Identifier: Apache-2.0
tgi:
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
enabled: false
vllm:
enabled: true
speecht5:
enabled: true
gpt-sovits:
enabled: false

@@ -0,0 +1,49 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
tgi:
enabled: true
accelDevice: "gaudi"
image:
repository: ghcr.io/huggingface/tgi-gaudi
tag: "2.3.1"
resources:
limits:
habana.ai/gaudi: 1
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
CUDA_GRAPHS: ""
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 1
startupProbe:
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 1
failureThreshold: 120
vllm:
enabled: false
whisper:
image:
repository: opea/whisper-gaudi
resources:
limits:
habana.ai/gaudi: 1
speecht5:
enabled: true
image:
repository: opea/speecht5-gaudi
resources:
limits:
habana.ai/gaudi: 1
gpt-sovits:
enabled: false

@@ -2,35 +2,27 @@
# SPDX-License-Identifier: Apache-2.0
tgi:
enabled: false
vllm:
enabled: true
accelDevice: "gaudi"
image:
repository: ghcr.io/huggingface/tgi-gaudi
tag: "2.3.1"
repository: opea/vllm-gaudi
startupProbe:
failureThreshold: 360
PT_HPU_ENABLE_LAZY_COLLECTIVES: "true"
OMPI_MCA_btl_vader_single_copy_mechanism: "none"
resources:
limits:
habana.ai/gaudi: 1
MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
CUDA_GRAPHS: ""
HF_HUB_DISABLE_PROGRESS_BARS: 1
HF_HUB_ENABLE_HF_TRANSFER: 0
ENABLE_HPU_GRAPH: true
LIMIT_HPU_GRAPH: true
USE_FLASH_ATTENTION: true
FLASH_ATTENTION_RECOMPUTE: true
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 1
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 1
startupProbe:
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 1
failureThreshold: 120
extraCmdArgs: [
"--tensor-parallel-size", "1",
"--block-size", "128",
"--max-num-seqs", "256",
"--max-seq-len-to-capture", "2048"
]
whisper:
image:
@@ -40,8 +32,11 @@ whisper:
habana.ai/gaudi: 1
speecht5:
enabled: true
image:
repository: opea/speecht5-gaudi
resources:
limits:
habana.ai/gaudi: 1
gpt-sovits:
enabled: false

@@ -17,23 +17,17 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
VLLM_VER="$(git describe --tags "$(git rev-list --tags --max-count=1)" )"
VLLM_VER="v0.8.3"
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
@@ -103,14 +97,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}

@@ -17,23 +17,17 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork/
VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
@@ -105,34 +99,8 @@ function validate_megaservice() {
echo "Result wrong."
exit 1
fi
}
#function validate_frontend() {
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi
docker compose -f compose.yaml stop && docker compose rm -f
@@ -140,15 +108,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
# validate_frontend
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}

@@ -9,6 +9,7 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"/var/lib/GenAI/data"}
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
@@ -17,25 +18,18 @@ export PATH="~/miniconda3/bin:$PATH"
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna audioqna-ui whisper speecht5"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
echo "docker pull ghcr.io/huggingface/text-generation-inference:2.4.1-rocm"
docker pull ghcr.io/huggingface/text-generation-inference:2.4.1-rocm
docker images && sleep 1s
}
@@ -55,8 +49,6 @@ function start_services() {
export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
# sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
# Start Docker Containers
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
n=0
@@ -87,32 +79,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
# Frontend tests are currently disabled
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/amd/gpu/rocm/
docker compose stop && docker compose rm -f
@@ -120,16 +86,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
# Frontend tests are currently disabled
# validate_frontend
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}

@@ -17,23 +17,17 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
VLLM_VER="$(git describe --tags "$(git rev-list --tags --max-count=1)" )"
VLLM_VER="v0.8.3"
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
@@ -95,31 +89,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/cpu/xeon/
docker compose -f compose.yaml stop && docker compose rm -f
@@ -127,15 +96,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
# validate_frontend
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}

View File

@@ -17,25 +17,18 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna audioqna-ui whisper-gaudi speecht5-gaudi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
docker images && sleep 1s
}
@@ -55,7 +48,6 @@ function start_services() {
export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
export host_ip=${ip_address}
# sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
# Start Docker Containers
docker compose -f compose_tgi.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
@@ -99,31 +91,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi
docker compose -f compose_tgi.yaml stop && docker compose rm -f
@@ -131,15 +98,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
# validate_frontend
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}

View File

@@ -17,25 +17,18 @@ ip_address=$(hostname -I | awk '{print $1}')
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna audioqna-ui whisper speecht5"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}
@@ -56,8 +49,6 @@ function start_services() {
export BACKEND_SERVICE_ENDPOINT=http://${ip_address}:3008/v1/audioqna
export host_ip=${ip_address}
# sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
# Start Docker Containers
docker compose -f compose_tgi.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
n=0
@@ -90,31 +81,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs=22.6.0 -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/cpu/xeon/
docker compose -f compose_tgi.yaml stop && docker compose rm -f
@@ -122,15 +88,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
# validate_frontend
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}

View File

@@ -17,19 +17,13 @@ export PATH="~/miniconda3/bin:$PATH"
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
if [[ "${opea_branch}" != "main" ]]; then
cd $WORKPATH
OLD_STRING="RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git"
NEW_STRING="RUN git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git"
find . -type f -name "Dockerfile*" | while read -r file; do
echo "Processing file: $file"
sed -i "s|$OLD_STRING|$NEW_STRING|g" "$file"
done
fi
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
pushd GenAIComps
echo "GenAIComps test commit is $(git rev-parse HEAD)"
docker build --no-cache -t ${REGISTRY}/comps-base:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
popd && sleep 1s
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="audioqna audioqna-ui whisper speecht5 vllm-rocm"
@@ -92,32 +86,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
## Frontend tests are currently disabled
# cd $WORKPATH/ui/svelte
# local conda_env_name="OPEA_e2e"
# export PATH=${HOME}/miniforge3/bin/:$PATH
## conda remove -n ${conda_env_name} --all -y
## conda create -n ${conda_env_name} python=3.12 -y
# source activate ${conda_env_name}
#
# sed -i "s/localhost/$ip_address/g" playwright.config.ts
#
## conda install -c conda-forge nodejs -y
# npm install && npm ci && npx playwright install --with-deps
# node -v && npm -v && pip list
#
# exit_status=0
# npx playwright test || exit_status=$?
#
# if [ $exit_status -ne 0 ]; then
# echo "[TEST INFO]: ---------frontend test failed---------"
# exit $exit_status
# else
# echo "[TEST INFO]: ---------frontend test passed---------"
# fi
#}
function stop_docker() {
cd $WORKPATH/docker_compose/amd/gpu/rocm/
docker compose -f compose_vllm.yaml stop && docker compose -f compose_vllm.yaml rm -f
@@ -125,16 +93,26 @@ function stop_docker() {
function main() {
echo "::group::stop_docker"
stop_docker
echo "::endgroup::"
echo "::group::build_docker_images"
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
echo "::endgroup::"
echo "::group::start_services"
start_services
echo "::endgroup::"
echo "::group::validate_megaservice"
validate_megaservice
# Frontend tests are currently disabled
# validate_frontend
echo "::endgroup::"
echo "::group::stop_docker"
stop_docker
echo y | docker system prune
docker system prune -f
echo "::endgroup::"
}
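Review note: each `main()` in these test scripts repeats the `echo "::group::…"` / `echo "::endgroup::"` pair around every step. A minimal sketch of how that pattern could be factored out, assuming a hypothetical helper named `run_group` that is not part of this change set:

```shell
# Hypothetical helper (not in the repository): wrap one step in
# GitHub Actions log-group markers and preserve the step's exit status.
run_group() {
    local name="$1"
    shift
    local status=0
    echo "::group::${name}"
    # '|| status=$?' keeps 'set -e' from aborting before the group is closed.
    "$@" || status=$?
    echo "::endgroup::"
    return "$status"
}

# Stand-in step for illustration; in the scripts above this would be
# stop_docker, start_services, validate_megaservice, and so on.
demo_step() { echo "step body"; }
run_group "demo_step" demo_step
```

Because the helper returns the wrapped step's status, a failing step would still close its log group before the script exits.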

View File

@@ -2,6 +2,8 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# ===== Deprecated =====
set -xe
USER_ID=$(whoami)
LOG_PATH=/home/$(whoami)/logs

View File

@@ -2,6 +2,8 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# ===== Deprecated =====
set -xe
USER_ID=$(whoami)
LOG_PATH=/home/$(whoami)/logs

View File

@@ -1,48 +1,8 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Stage 1: base setup used by other stages
FROM python:3.11-slim AS base
# get security updates
RUN apt-get update && apt-get upgrade -y && \
apt-get clean && rm -rf /var/lib/apt/lists/*
ENV HOME=/home/user
RUN useradd -m -s /bin/bash user && \
mkdir -p $HOME && \
chown -R user $HOME
WORKDIR $HOME
# Stage 2: latest GenAIComps sources
FROM base AS git
RUN apt-get update && apt-get install -y --no-install-recommends git
RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git
# Stage 3: common layer shared by services using GenAIComps
FROM base AS comps-base
# copy just relevant parts
COPY --from=git $HOME/GenAIComps/comps $HOME/GenAIComps/comps
COPY --from=git $HOME/GenAIComps/*.* $HOME/GenAIComps/LICENSE $HOME/GenAIComps/
WORKDIR $HOME/GenAIComps
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r $HOME/GenAIComps/requirements.txt
WORKDIR $HOME
ENV PYTHONPATH=$PYTHONPATH:$HOME/GenAIComps
USER user
# Stage 4: unique part
FROM comps-base
ARG BASE_TAG=latest
FROM opea/comps-base:$BASE_TAG
COPY ./avatarchatbot.py $HOME/avatarchatbot.py

View File

@@ -14,7 +14,7 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/whisper/src/Dockerfile .
docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/Dockerfile .
@@ -29,7 +29,7 @@ docker build --no-cache -t opea/llm-textgen:latest --build-arg https_proxy=$http
### 4. Build TTS Image
```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/speecht5/src/Dockerfile .
docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/Dockerfile .
```

View File

@@ -42,12 +42,12 @@ services:
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
image: ghcr.io/huggingface/text-generation-inference:2.4.1-rocm
container_name: tgi-service
ports:
- "${TGI_SERVICE_PORT:-3006}:80"
volumes:
- "./data:/data"
- "${MODEL_CACHE:-./data}:/data"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
@@ -66,24 +66,6 @@ services:
- seccomp:unconfined
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192
llm:
image: ${REGISTRY:-opea}/llm-textgen:${TAG:-latest}
container_name: llm-tgi-server
depends_on:
- tgi-service
ports:
- "3007:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
OPENAI_API_KEY: ${OPENAI_API_KEY}
restart: unless-stopped
wav2lip-service:
image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
container_name: wav2lip-service
@@ -125,7 +107,7 @@ services:
container_name: avatarchatbot-backend-server
depends_on:
- asr
- llm
- tgi-service
- tts
- animation
ports:

View File

@@ -30,7 +30,7 @@ export ANIMATION_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export LLM_SERVICE_PORT=3006
export ANIMATION_SERVICE_PORT=3008
export DEVICE="cpu"

View File

@@ -14,7 +14,7 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/whisper/src/Dockerfile .
```
### 3. Build LLM Image
@@ -24,7 +24,7 @@ Intel Xeon optimized image hosted in huggingface repo will be used for TGI servi
### 4. Build TTS Image
```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile .
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/speecht5/src/Dockerfile .
```
### 5. Build Animation Image

View File

@@ -31,7 +31,7 @@ services:
ports:
- "3006:80"
volumes:
- "./data:/data"
- "${MODEL_CACHE:-./data}:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
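Note on the `${MODEL_CACHE:-./data}` mount: the `:-` form substitutes the default when the variable is unset or empty, so existing setups keep using `./data` while a CI runner can point the volume at a shared model cache. The sketch below reproduces that resolution in plain shell. `resolve_model_cache` and the `/mnt/models` path are hypothetical names used only for illustration.

```shell
# Hypothetical helper illustrating the ${VAR:-default} fallback used for MODEL_CACHE.
resolve_model_cache() {
    # $1 plays the role of the caller-supplied cache path;
    # an unset or empty value falls back to ./data.
    local model_cache="${1:-}"
    echo "${model_cache:-./data}"
}

resolve_model_cache                # no value supplied
resolve_model_cache ""             # empty value
resolve_model_cache "/mnt/models"  # explicit cache directory (example path)
```

The first two calls fall back to `./data`; only the third uses the supplied path.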

View File

@@ -14,7 +14,7 @@ cd GenAIComps
### 2. Build ASR Image
```bash
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu .
docker build -t opea/whisper-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/whisper/src/Dockerfile.intel_hpu .
```
### 3. Build LLM Image
@@ -24,7 +24,7 @@ Intel Gaudi optimized image hosted in huggingface repo will be used for TGI serv
### 4. Build TTS Image
```bash
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu .
docker build -t opea/speecht5-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/speecht5/src/Dockerfile.intel_hpu .
```
### 5. Build Animation Image

View File

@@ -43,7 +43,7 @@ services:
ports:
- "3006:80"
volumes:
- "./data:/data"
- "${MODEL_CACHE:-./data}:/data"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}

View File

@@ -14,13 +14,13 @@ services:
whisper-gaudi:
build:
context: GenAIComps
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu
dockerfile: comps/third_parties/whisper/src/Dockerfile.intel_hpu
extends: avatarchatbot
image: ${REGISTRY:-opea}/whisper-gaudi:${TAG:-latest}
whisper:
build:
context: GenAIComps
dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
dockerfile: comps/third_parties/whisper/src/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
asr:
@@ -38,13 +38,13 @@ services:
speecht5-gaudi:
build:
context: GenAIComps
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile.intel_hpu
dockerfile: comps/third_parties/speecht5/src/Dockerfile.intel_hpu
extends: avatarchatbot
image: ${REGISTRY:-opea}/speecht5-gaudi:${TAG:-latest}
speecht5:
build:
context: GenAIComps
dockerfile: comps/tts/src/integrations/dependency/speecht5/Dockerfile
dockerfile: comps/third_parties/speecht5/src/Dockerfile
extends: avatarchatbot
image: ${REGISTRY:-opea}/speecht5:${TAG:-latest}
tts:

View File

@@ -9,6 +9,7 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
@@ -86,15 +87,16 @@ function start_services() {
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
n=0
until [[ "$n" -ge 200 ]]; do
docker logs tgi-gaudi-server > $LOG_PATH/tgi_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log; then
docker logs tgi-gaudi-server > $LOG_PATH/tgi_service_start.log && docker logs whisper-service 2>&1 | tee $LOG_PATH/whisper_service_start.log && docker logs speecht5-service 2>&1 | tee $LOG_PATH/speecht5_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log && grep -q running $LOG_PATH/whisper_service_start.log && grep -q running $LOG_PATH/speecht5_service_start.log; then
break
fi
sleep 5s
sleep 10s
n=$((n+1))
done
echo "All services are up and running"
sleep 5s
# sleep 5s
sleep 1m
}
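Review note: the readiness loop above now checks three log files on every pass, and it still prints "All services are up and running" when the retry budget runs out. A minimal sketch of the same polling idea with an explicit timeout result, assuming a hypothetical helper `wait_for_pattern` that is not part of this change set:

```shell
# Hypothetical helper: rerun a command until its output contains a pattern.
# Usage: wait_for_pattern PATTERN MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for_pattern() {
    local pattern="$1" max_tries="$2" delay="$3"
    shift 3
    local n=0
    while [ "$n" -lt "$max_tries" ]; do
        # Merge stderr into stdout, since a log command may write to either.
        if "$@" 2>&1 | grep -q -- "$pattern"; then
            return 0
        fi
        sleep "$delay"
        n=$((n + 1))
    done
    return 1  # pattern never appeared within the retry budget
}

# Stand-in command for illustration; a real call might look like:
#   wait_for_pattern Connected 200 10 docker logs tgi-gaudi-server
wait_for_pattern "Connected" 3 0 echo "Connected to shard" && echo "ready"
```

Returning a non-zero status on timeout lets the caller fail fast instead of proceeding to validation against services that never became ready.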

View File

@@ -9,6 +9,7 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"/var/lib/GenAI/data"}
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
@@ -26,7 +27,7 @@ function build_docker_images() {
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="avatarchatbot whisper asr llm-textgen speecht5 tts wav2lip animation"
service_list="avatarchatbot whisper asr speecht5 tts wav2lip animation"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
@@ -64,7 +65,7 @@ function start_services() {
export MEGA_SERVICE_PORT=8888
export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
export LLM_SERVICE_PORT=3006
export ANIMATION_SERVICE_PORT=3008
export DEVICE="cpu"

View File

@@ -9,6 +9,7 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
@@ -85,15 +86,16 @@ function start_services() {
# Start Docker Containers
docker compose up -d
n=0
until [[ "$n" -ge 100 ]]; do
docker logs tgi-service > $LOG_PATH/tgi_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log; then
until [[ "$n" -ge 200 ]]; do
docker logs tgi-service > $LOG_PATH/tgi_service_start.log && docker logs whisper-service 2>&1 | tee $LOG_PATH/whisper_service_start.log && docker logs speecht5-service 2>&1 | tee $LOG_PATH/speecht5_service_start.log
if grep -q Connected $LOG_PATH/tgi_service_start.log && grep -q running $LOG_PATH/whisper_service_start.log && grep -q running $LOG_PATH/speecht5_service_start.log; then
break
fi
sleep 5s
sleep 10s
n=$((n+1))
done
echo "All services are up and running"
sleep 1m
}
@@ -104,6 +106,7 @@ function validate_megaservice() {
if [[ $result == *"mp4"* ]]; then
echo "Result correct."
else
echo "Result wrong, print docker logs."
docker logs whisper-service > $LOG_PATH/whisper-service.log
docker logs speecht5-service > $LOG_PATH/speecht5-service.log
docker logs tgi-service > $LOG_PATH/tgi-service.log
@@ -117,11 +120,6 @@ function validate_megaservice() {
}
#function validate_frontend() {
#}
function stop_docker() {
cd $WORKPATH/docker_compose/intel/cpu/xeon
docker compose down
@@ -129,7 +127,6 @@ function stop_docker() {
function main() {
stop_docker
if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
start_services

View File

@@ -1,148 +1,22 @@
# ChatQnA Application
Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for chatbots development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.
Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for chatbot development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.
RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that responses generated remain factual and current. The core of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that the response generated remains factual and current. Vector databases are at the core of this architecture, enabling efficient retrieval of semantically relevant information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
# Table of contents
## Table of contents
1. [Automated Terraform Deployment](#automated-deployment-to-ubuntu-based-systemif-not-using-terraform-using-intel-optimized-cloud-modules-for-ansible)
2. [Automated Deployment to Ubuntu based system](#automated-deployment-to-ubuntu-based-systemif-not-using-terraform-using-intel-optimized-cloud-modules-for-ansible)
3. [Manually Deployment](#manually-deploy-chatqna-service)
4. [Architecture and Deploy Details](#architecture-and-deploy-details)
5. [Consume Service](#consume-chatqna-service-with-rag)
6. [Monitoring and Tracing](#monitoring-opea-service-with-prometheus-and-grafana-dashboard)
1. [Architecture](#architecture)
2. [Deployment Options](#deployment-options)
3. [Monitoring and Tracing](#monitor-and-tracing)
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
## Architecture
| Cloud Provider | Intel Architecture | Intel Optimized Cloud Module for Terraform | Comments |
| -------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| AWS | 4th Gen Intel Xeon with Intel AMX | [AWS Module](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Uses meta-llama/Meta-Llama-3-8B-Instruct by default |
| AWS Falcon2-11B | 4th Gen Intel Xeon with Intel AMX | [AWS Module with Falcon11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) | Uses TII Falcon2-11B LLM Model |
| GCP | 5th Gen Intel Xeon with Intel AMX | [GCP Module](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) | Also supports Confidential AI by using Intel® TDX with 4th Gen Xeon |
| Azure | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress |
| Intel Tiber AI Cloud | 5th Gen Intel Xeon with Intel AMX | Work-in-progress | Work-in-progress |
The ChatQnA application is a customizable end-to-end workflow that leverages the capabilities of LLMs and RAG efficiently. The ChatQnA architecture is shown below:
## Automated Deployment to Ubuntu based system (if not using Terraform) using Intel® Optimized Cloud Modules for **Ansible**
To deploy to existing Xeon Ubuntu based system, use our Intel Optimized Cloud Modules for Ansible. This is the same Ansible playbook used by Terraform.
Use this if you are not using Terraform and have provisioned your system with another tool or manually including bare metal.
| Operating System | Intel Optimized Cloud Module for Ansible |
| ---------------- | ----------------------------------------------------------------------------------------------------------------- |
| Ubuntu 20.04 | [ChatQnA Ansible Module](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon) |
| Ubuntu 22.04 | Work-in-progress |
## Manually Deploy ChatQnA Service
The ChatQnA service can be effortlessly deployed on Intel Gaudi2, Intel Xeon Scalable ProcessorsNvidia GPU and AMD GPU.
Two types of ChatQnA pipeline are supported now: `ChatQnA with/without Rerank`. And the `ChatQnA without Rerank` pipeline (including Embedding, Retrieval, and LLM) is offered for Xeon customers who can not run rerank service on HPU yet require high performance and accuracy.
Quick Start Deployment Steps:
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
Note:
1. If you do not have docker installed you can run this script to install docker : `bash docker_compose/install_docker.sh`.
2. The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) `or` you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
### Quick Start: 1.Setup Environment Variable
To set up environment variables for deploying ChatQnA services, follow these steps:
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
3. Set up other environment variables:
> Notice that you can only choose **one** hardware option below to set up envs according to your hardware. Make sure port numbers are set correctly as well.
```bash
# on Gaudi
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
source ./set_env.sh
export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,guardrails,jaeger,prometheus,grafana,gaudi-node-exporter-1
# on Xeon
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
source ./set_env.sh
export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,jaeger,prometheus,grafana,xeon-node-exporter-1
# on Nvidia GPU
cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu
source ./set_env.sh
export no_proxy="Your_No_Proxy",chatqna-ui-server,chatqna-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service
```
### Quick Start: 2.Run Docker Compose
Select the compose.yaml file that matches your hardware.
CPU example:
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
# cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
# cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/
docker compose up -d
```
To enable Open Telemetry Tracing, compose.telemetry.yaml file need to be merged along with default compose.yaml file.
CPU example with Open Telemetry feature:
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```
It will automatically download the docker image on `docker hub`:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
In following cases, you could build docker image from source by yourself.
- Failed to download the docker image.
- If you want to use a specific version of Docker image.
Please refer to the 'Build Docker Images' in [Guide](docker_compose/intel/cpu/xeon/README.md).
### QuickStart: 3.Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
## Architecture and Deploy details
ChatQnA architecture shows below:
![architecture](./assets/img/chatqna_architecture.png)
The ChatQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
This application is modular: each component is a microservice (as defined in [GenAIComps](https://github.com/opea-project/GenAIComps)) that can scale independently. It comprises data preparation, embedding, retrieval, reranker (optional), and LLM microservices. The ChatQnA megaservice stitches these microservices together and orchestrates the data flowing through them. The flow chart below shows the information flow between different microservices for this example.
```mermaid
---
@@ -218,192 +92,31 @@ flowchart LR
```
This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) or [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html).
In the below, we provide a table that describes for each microservice component in the ChatQnA architecture, the default configuration of the open source project, hardware, port, and endpoint.
## Deployment Options
Gaudi default compose.yaml
The table below lists currently available deployment options. They outline in detail the implementation of this example on selected hardware.
| MicroService | Open Source Project | HW | Port | Endpoint |
| ------------ | ------------------- | ----- | ---- | -------------------- |
| Embedding | Langchain | Xeon | 6000 | /v1/embeddings |
| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval |
| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking |
| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions |
| Dataprep | Redis, Langchain | Xeon | 6007 | /v1/dataprep/ingest |
| Category | Deployment Option | Description |
| ------------------------------------------------------------------------------------------------------------------------------ | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| On-premise Deployments | Docker compose | [ChatQnA deployment on Xeon](./docker_compose/intel/cpu/xeon/README.md) |
| | | [ChatQnA deployment on AI PC](./docker_compose/intel/cpu/aipc/README.md) |
| | | [ChatQnA deployment on Gaudi](./docker_compose/intel/hpu/gaudi/README.md) |
| | | [ChatQnA deployment on Nvidia GPU](./docker_compose/nvidia/gpu/README.md) |
| | | [ChatQnA deployment on AMD ROCm](./docker_compose/amd/gpu/rocm/README.md) |
| Cloud Platforms Deployment on AWS, GCP, Azure, IBM Cloud,Oracle Cloud, [Intel® Tiber™ AI Cloud](https://ai.cloud.intel.com/) | Docker Compose | [Getting Started Guide: Deploy the ChatQnA application across multiple cloud platforms](https://github.com/opea-project/docs/tree/main/getting-started/README.md) |
| | Kubernetes | [Helm Charts](./kubernetes/helm/README.md) |
| Automated Terraform Deployment on Cloud Service Providers | AWS | [Terraform deployment on 4th Gen Intel Xeon with Intel AMX using meta-llama/Meta-Llama-3-8B-Instruct ](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) |
| | | [Terraform deployment on 4th Gen Intel Xeon with Intel AMX using TII Falcon2-11B](https://github.com/intel/terraform-intel-aws-vm/tree/main/examples/gen-ai-xeon-opea-chatqna-falcon11B) |
| | GCP | [Terraform deployment on 5th Gen Intel Xeon with Intel AMX(support Confidential AI by using Intel® TDX](https://github.com/intel/terraform-intel-gcp-vm/tree/main/examples/gen-ai-xeon-opea-chatqna) |
| | Azure | [Terraform deployment on 4th/5th Gen Intel Xeon with Intel AMX & Intel TDX](https://github.com/intel/terraform-intel-azure-linux-vm/tree/main/examples/azure-gen-ai-xeon-opea-chatqna-tdx) |
| | Intel Tiber AI Cloud | Coming Soon |
| | Any Xeon based Ubuntu system | [ChatQnA Ansible Module for Ubuntu 20.04](https://github.com/intel/optimized-cloud-recipes/tree/main/recipes/ai-opea-chatqna-xeon). Use this if you are not using Terraform and have provisioned your system either manually or with another tool, including directly on bare metal. |
### Required Models
## Monitor and Tracing
By default, the embedding, reranking and LLM models are set to a default value as listed below:
Follow [OpenTelemetry OPEA Guide](https://opea-project.github.io/latest/tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.html) to understand how to use OpenTelemetry tracing and metrics in OPEA.
For ChatQnA specific tracing and metrics monitoring, follow [OpenTelemetry on ChatQnA](https://opea-project.github.io/latest/tutorial/OpenTelemetry/deploy/ChatQnA.html) section.
| Service | Model |
| --------- | ----------------------------------- |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| LLM | meta-llama/Meta-Llama-3-8B-Instruct |
## FAQ Generation Application
Change the `xxx_MODEL_ID` in `docker_compose/xxx/set_env.sh` for your needs.
For customers with proxy issues, the models from [ModelScope](https://www.modelscope.cn/models) are also supported in ChatQnA. Refer to [this readme](docker_compose/intel/cpu/xeon/README.md) for details.
### Deploy ChatQnA on Gaudi
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
```
To enable OpenTelemetry tracing, the compose.telemetry.yaml file needs to be merged with the default compose.yaml file.
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
### Deploy ChatQnA on Xeon
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
docker compose up -d
```
To enable OpenTelemetry tracing, the compose.telemetry.yaml file needs to be merged with the default compose.yaml file.
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
### Deploy ChatQnA on NVIDIA GPU
```bash
cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/
docker compose up -d
```
Refer to the [NVIDIA GPU Guide](./docker_compose/nvidia/gpu/README.md) for more instructions on building docker images from source.
### Deploy ChatQnA on Kubernetes using Helm Chart
Refer to the [ChatQnA helm chart](./kubernetes/helm/README.md) for instructions on deploying ChatQnA on Kubernetes.
### Deploy ChatQnA on AI PC
Refer to the [AI PC Guide](./docker_compose/intel/cpu/aipc/README.md) for instructions on deploying ChatQnA on AI PC.
### Deploy ChatQnA on Red Hat OpenShift Container Platform (RHOCP)
Refer to the [Intel Technology enabling for Openshift readme](https://github.com/intel/intel-technology-enabling-for-openshift/blob/main/workloads/opea/chatqna/README.md) for instructions to deploy ChatQnA prototype on RHOCP with [Red Hat OpenShift AI (RHOAI)](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai).
## Consume ChatQnA Service with RAG
### Check Service Status
Before consuming ChatQnA Service, make sure the vLLM/TGI service is ready, which takes some time.
```bash
# vLLM example
docker logs vllm-gaudi-server 2>&1 | grep complete
# TGI example
docker logs tgi-gaudi-server | grep Connected
```
Wait until the logs contain a line like the ones below before consuming the ChatQnA service.
```log
# vLLM
INFO: Application startup complete.
# TGI
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
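The readiness check above can also be scripted. The following is a minimal sketch, not part of the OPEA tooling: it assumes the container names and log markers shown above, and `is_ready` and `wait_until_ready` are hypothetical helper names introduced only for illustration.

```python
import subprocess
import time

# Log markers copied from the sample output above.
READY_MARKERS = {
    "vllm": "Application startup complete",
    "tgi": "Connected",
}


def is_ready(log_text: str, backend: str) -> bool:
    """Return True if the readiness marker for the backend appears in the logs."""
    return READY_MARKERS[backend] in log_text


def wait_until_ready(container: str, backend: str, timeout_s: int = 1800, poll_s: int = 10) -> bool:
    """Poll `docker logs` until the marker shows up or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # The shell example redirects stderr for vLLM, so capture both streams here.
        result = subprocess.run(
            ["docker", "logs", container],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,
        )
        if is_ready(result.stdout, backend):
            return True
        time.sleep(poll_s)
    return False
```

For example, `wait_until_ready("vllm-gaudi-server", "vllm")` returns `True` once the startup line appears. The 30-minute timeout is an arbitrary assumption; adjust it to your model size and hardware.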
### Upload RAG Files (Optional)
To chat with retrieved information, you need to upload a file using the `Dataprep` service.
Here is an example using the `Nike 2023` PDF.
```bash
# download pdf file
wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf
# upload pdf file with dataprep
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
### Consume Chat Service
Two ways of consuming ChatQnA Service:
1. Use cURL command on terminal
```bash
curl http://${host_ip}:8888/v1/chatqna \
-H "Content-Type: application/json" \
-d '{
"messages": "What is the revenue of Nike in 2023?"
}'
```
2. Access via frontend
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`
By default, the UI runs on port 5173 internally.
If you choose conversational UI, use this URL: `http://{host_ip}:5174`
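The cURL call in option 1 can also be issued from a script. The sketch below uses only the Python standard library and mirrors the endpoint, port, and payload from the cURL example. The function names `build_chatqna_request` and `ask_chatqna` are hypothetical, and the response is returned as raw text because its format is not described here.

```python
import json
import urllib.request


def build_chatqna_request(host_ip: str, question: str, port: int = 8888) -> urllib.request.Request:
    """Build the POST request that mirrors the cURL example."""
    payload = json.dumps({"messages": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host_ip}:{port}/v1/chatqna",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_chatqna(host_ip: str, question: str, timeout_s: int = 120) -> str:
    """Send the question to the ChatQnA megaservice and return the raw response body."""
    request = build_chatqna_request(host_ip, question)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")
```

A call such as `print(ask_chatqna(host_ip, "What is the revenue of Nike in 2023?"))` corresponds to the cURL command, where `host_ip` holds the same address as the `host_ip` environment variable.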
## Troubleshooting
1. If you get errors like "Access Denied", [validate micro service](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
```bash
http_proxy="" curl ${host_ip}:6006/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json'
```
2. (Docker only) If all microservices work well, check port ${host_ip}:8888. The port may already be allocated by another user; if so, change it in `compose.yaml`.
3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.
## Monitoring OPEA Service with Prometheus and Grafana dashboard
OPEA microservice deployments can easily be monitored through Grafana dashboards in conjunction with Prometheus data collection. Follow the [README](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/grafana/README.md) to set up Prometheus and Grafana servers and import dashboards to monitor the OPEA service.
![chatqna dashboards](./assets/img/chatqna_dashboards.png)
![tgi dashboard](./assets/img/tgi_dashboard.png)
## Tracing Services with OpenTelemetry Tracing and Jaeger
> NOTE: This feature is disabled by default. Please check the Deploy ChatQnA sections for how to enable this feature with the compose.telemetry.yaml file.
OPEA microservices and TGI/TEI serving can easily be traced through Jaeger dashboards together with the OpenTelemetry tracing feature. Follow the [README](https://github.com/opea-project/GenAIComps/tree/main/comps/cores/telemetry#tracing) to trace additional functions if needed.
Tracing data is exported to http://{EXTERNAL_IP}:4318/v1/traces via Jaeger.
Users can get the external IP with the command below.
```bash
ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+'
```
Access the Jaeger dashboard UI at http://{EXTERNAL_IP}:16686
For TGI serving on Gaudi, users can see different services such as opea, TEI and TGI.
![Screenshot from 2024-12-27 11-58-18](https://github.com/user-attachments/assets/6126fa70-e830-4780-bd3f-83cb6eff064e)
Here is a screenshot of one trace of a TGI serving request.
![Screenshot from 2024-12-27 11-26-25](https://github.com/user-attachments/assets/3a7c51c6-f422-41eb-8e82-c3df52cd48b8)
There are also OPEA-related traces. Users can see the time breakdown of each service request by looking into each opea:schedule operation.
![image](https://github.com/user-attachments/assets/6137068b-b374-4ff8-b345-993343c0c25f)
Some functions are asynchronous, such as `llm/MicroService_asyn_generate`. To inspect one of them, check its trace in another operation such as opea:llm_generate_stream.
![image](https://github.com/user-attachments/assets/a973d283-198f-4ce2-a7eb-58515b77503e)
The FAQ Generation Application uses large language models (LLMs) to help you interact with and comprehend complex textual data. It automatically generates comprehensive and natural-sounding frequently asked questions (FAQs) from your documents, legal texts, customer queries, and other sources. We merged FaqGen into the ChatQnA example, which utilizes LangChain to implement FAQ generation and uses Text Generation Inference for LLM inference on Intel Xeon and Gaudi2 processors.


@@ -0,0 +1,86 @@
# ChatQnA Docker Image Build
## Table of contents
1. [Build MegaService Docker Image](#Build-MegaService-Docker-Image)
2. [Build Basic UI Docker Image](#Build-Basic-UI-Docker-Image)
3. [Build Conversational React UI Docker Image](#Build-Conversational-React-UI-Docker-Image)
4. [Troubleshooting](#Troubleshooting)
## Build MegaService Docker Image
To construct the MegaService with Rerank, we utilize the [GenAIExamples](https://github.com/opea-project/GenAIExamples.git) microservice pipeline within the `chatqna.py` Python script. Build the MegaService Docker image using the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
# Enter the repository before checking out the tag
cd GenAIExamples
git fetch && git checkout tags/v1.2
cd ChatQnA
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
## Build Basic UI Docker Image
Build the Frontend Docker Image using the command below:
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
## Build Conversational React UI Docker Image (Optional)
Build a frontend Docker image for an interactive conversational UI experience with the ChatQnA MegaService.
**Export the value of the public IP address of your host machine server to the `host_ip` environment variable**
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
## Troubleshooting
1. If you get errors like "Access Denied", [validate microservices](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/cpu/xeon/README.md#validate-microservices) first. A simple example:
```bash
http_proxy="" curl ${host_ip}:6006/embed -X POST -d '{"inputs":"What is Deep Learning?"}' -H 'Content-Type: application/json'
```
2. (Docker only) If all microservices work well, check port ${host_ip}:8888. The port may already be allocated by another user; if so, change it in `compose.yaml`.
3. (Docker only) If you get errors like "The container name is in use", change the container name in `compose.yaml`.
## Monitoring OPEA Services with Prometheus and Grafana Dashboard
OPEA microservice deployments can easily be monitored through Grafana dashboards using data collected via Prometheus. Follow the [README](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/grafana/README.md) to set up Prometheus and Grafana servers and import dashboards to monitor the OPEA services.
![chatqna dashboards](./assets/img/chatqna_dashboards.png)
![tgi dashboard](./assets/img/tgi_dashboard.png)
## Tracing with OpenTelemetry and Jaeger
> NOTE: This feature is disabled by default. Please use the compose.telemetry.yaml file to enable this feature.
OPEA microservices and [TGI](https://huggingface.co/docs/text-generation-inference/en/index)/[TEI](https://huggingface.co/docs/text-embeddings-inference/en/index) serving can easily be traced through [Jaeger](https://www.jaegertracing.io/) dashboards together with the [OpenTelemetry](https://opentelemetry.io/) tracing feature. Follow the [README](https://github.com/opea-project/GenAIComps/tree/main/comps/cores/telemetry#tracing) to trace additional functions if needed.
Tracing data is exported to http://{EXTERNAL_IP}:4318/v1/traces via Jaeger.
Users can get the external IP with the command below.
```bash
ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+'
```
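The same lookup can be done from a script. The sketch below is an illustration under assumptions: it shells out to the same `ip route get 8.8.8.8` command and extracts the token after `src`, as the `grep` pattern does. The helper names `parse_src_ip` and `external_ip` are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_src_ip(route_output: str) -> Optional[str]:
    """Extract the address that follows 'src' in `ip route get` output."""
    match = re.search(r"\bsrc (\S+)", route_output)
    return match.group(1) if match else None


def external_ip() -> Optional[str]:
    """Run `ip route get 8.8.8.8` and return the source address, if any."""
    result = subprocess.run(
        ["ip", "route", "get", "8.8.8.8"],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return parse_src_ip(result.stdout)
```

The returned address can then be substituted for `{EXTERNAL_IP}` in the Jaeger URLs.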
Access the Jaeger dashboard UI at http://{EXTERNAL_IP}:16686
For TGI serving on Gaudi, users can see different services such as opea, TEI and TGI.
![Screenshot from 2024-12-27 11-58-18](https://github.com/user-attachments/assets/6126fa70-e830-4780-bd3f-83cb6eff064e)
Here is a screenshot of one trace of a TGI serving request.
![Screenshot from 2024-12-27 11-26-25](https://github.com/user-attachments/assets/3a7c51c6-f422-41eb-8e82-c3df52cd48b8)
There are also OPEA-related traces. Users can see the time breakdown of each service request by looking into each opea:schedule operation.
![image](https://github.com/user-attachments/assets/6137068b-b374-4ff8-b345-993343c0c25f)
Some functions are asynchronous, such as `llm/MicroService_asyn_generate`. To inspect one of them, check its trace in another operation such as opea:llm_generate_stream.
![image](https://github.com/user-attachments/assets/a973d283-198f-4ce2-a7eb-58515b77503e)


@@ -1,192 +0,0 @@
# ChatQnA Benchmarking
This folder contains a collection of Kubernetes manifest files for deploying the ChatQnA service across scalable nodes. It includes a comprehensive [benchmarking tool](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/README.md) that enables throughput analysis to assess inference performance.
By following this guide, you can run benchmarks on your deployment and share the results with the OPEA community.
## Purpose
We aim to run these benchmarks and share them with the OPEA community for three primary reasons:
- To offer insights on inference throughput in real-world scenarios, helping you choose the best service or deployment for your needs.
- To establish a baseline for validating optimization solutions across different implementations, providing clear guidance on which methods are most effective for your use case.
- To inspire the community to build upon our benchmarks, allowing us to better quantify new solutions in conjunction with current leading LLMs, serving frameworks, etc.
## Metrics
The benchmark reports the following metrics:
- Number of Concurrent Requests
- End-to-End Latency: P50, P90, P99 (in milliseconds)
- End-to-End First Token Latency: P50, P90, P99 (in milliseconds)
- Average Next Token Latency (in milliseconds)
- Average Token Latency (in milliseconds)
- Requests Per Second (RPS)
- Output Tokens Per Second
- Input Tokens Per Second
Results will be displayed in the terminal and saved as a CSV file named `1_stats.csv` for easy export to spreadsheets.
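To make the percentile and throughput metrics concrete, here is a small sketch of how P50/P90/P99 latency and RPS could be computed from raw samples. It is an illustration only: the benchmark tool's actual method is not described here, and the nearest-rank definition and the `percentile` and `summarize` names are assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks are not disturbed by float rounding.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, duration_s):
    """Summarize end-to-end latencies (milliseconds) collected over duration_s seconds."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p90_ms": percentile(latencies_ms, 90),
        "p99_ms": percentile(latencies_ms, 99),
        "rps": len(latencies_ms) / duration_s,
    }
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so compare results only when they were produced by the same method.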
## Table of Contents
- [Deployment](#deployment)
- [Prerequisites](#prerequisites)
- [Deployment Scenarios](#deployment-scenarios)
- [Case 1: Baseline Deployment with Rerank](#case-1-baseline-deployment-with-rerank)
- [Case 2: Baseline Deployment without Rerank](#case-2-baseline-deployment-without-rerank)
- [Case 3: Tuned Deployment with Rerank](#case-3-tuned-deployment-with-rerank)
- [Benchmark](#benchmark)
- [Test Configurations](#test-configurations)
- [Test Steps](#test-steps)
- [Upload Retrieval File](#upload-retrieval-file)
- [Run Benchmark Test](#run-benchmark-test)
- [Data collection](#data-collection)
- [Teardown](#teardown)
## Deployment
### Prerequisites
- Kubernetes installation: Use [kubespray](https://github.com/opea-project/docs/blob/main/guide/installation/k8s_install/k8s_install_kubespray.md) or other official Kubernetes installation guides:
- (Optional) [Kubernetes set up guide on Intel Gaudi product](https://github.com/opea-project/GenAIInfra/blob/main/README.md#setup-kubernetes-cluster)
- Helm installation: Follow the [Helm documentation](https://helm.sh/docs/intro/install/#helm) to install Helm.
- Set up Hugging Face Token
To access models and APIs from Hugging Face, set your token as an environment variable.
```bash
export HF_TOKEN="insert-your-huggingface-token-here"
```
- Prepare Shared Models (Optional but Strongly Recommended)
Downloading models simultaneously to multiple nodes in your cluster can overload resources such as network bandwidth, memory and storage. To prevent resource exhaustion, it's recommended to preload the models in advance.
```bash
pip install -U "huggingface_hub[cli]"
sudo mkdir -p /mnt/models
sudo chmod 777 /mnt/models
huggingface-cli download --cache-dir /mnt/models Intel/neural-chat-7b-v3-3
export MODEL_DIR=/mnt/models
```
Once the models are downloaded, you can consider the following methods for sharing them across nodes:
- Persistent Volume Claim (PVC): This is the recommended approach for production setups. For more details on using PVC, refer to [PVC](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume).
- Local Host Path: For simpler testing, follow the steps above on each node involved in the deployment to prepare the models locally. After preparing the models, use `--set global.modelUseHostPath=${MODEL_DIR}` in the deployment command.
- Label Nodes
```bash
python deploy.py --add-label --num-nodes 2
```
### Deployment Scenarios
The examples below are based on a two-node setup. You can adjust the number of nodes by using the `--num-nodes` option.
By default, these commands use the `default` namespace. To specify a different namespace, pass the `--namespace` flag to the deploy, uninstall, and kubectl commands. Also update the `namespace` field in `benchmark.yaml` before running the benchmark test.
For additional configuration options, run `python deploy.py --help`
#### Case 1: Baseline Deployment with Rerank
Deploy Command (with node number, Hugging Face token, model directory specified):
```bash
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2 --with-rerank
```
Uninstall Command:
```bash
python deploy.py --uninstall
```
#### Case 2: Baseline Deployment without Rerank
```bash
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2
```
#### Case 3: Tuned Deployment with Rerank
```bash
python deploy.py --hf-token $HF_TOKEN --model-dir $MODEL_DIR --num-nodes 2 --with-rerank --tuned
```
## Benchmark
### Test Configurations
| Key | Value |
| -------- | ------- |
| Workload | ChatQnA |
| Tag | V1.1 |
Models configuration
| Key | Value |
| ---------- | ------------------ |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| Inference | Intel/neural-chat-7b-v3-3 |
Benchmark parameters
| Key | Value |
| ---------- | ------------------ |
| LLM input tokens | 1024 |
| LLM output tokens | 128 |
Number of test requests for different scheduled node number:
| Node count | Concurrency | Query number |
| ----- | -------- | -------- |
| 1 | 128 | 640 |
| 2 | 256 | 1280 |
| 4 | 512 | 2560 |
More detailed configuration can be found in configuration file [benchmark.yaml](./benchmark.yaml).
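The table above scales linearly with the node count: 128 concurrent requests and 640 queries per node. A minimal sketch of that arithmetic follows; the per-node constants are read off the table, and `load_for_nodes` is a hypothetical helper name.

```python
CONCURRENCY_PER_NODE = 128  # from the 1-node row of the table above
QUERIES_PER_NODE = 640      # from the 1-node row of the table above


def load_for_nodes(node_count: int) -> dict:
    """Return the concurrency and query count for a given number of scheduled nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "concurrency": CONCURRENCY_PER_NODE * node_count,
        "queries": QUERIES_PER_NODE * node_count,
    }
```

For node counts not listed in the table, this assumes the same linear scaling continues, which the source does not confirm.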
### Test Steps
Use `kubectl get pods` to confirm that all pods are `READY` before starting the test.
#### Upload Retrieval File
Before testing, upload the specified file so that the LLM input has a token length of about 1k.
Get files:
```bash
# Use the raw file URL; the /tree/ page URL returns HTML rather than the file contents
wget https://raw.githubusercontent.com/opea-project/GenAIEval/main/evals/benchmark/data/upload_file.txt
```
Retrieve the `ClusterIP` of the `chatqna-data-prep` service.
```bash
kubectl get svc
```
Expected output:
```log
chatqna-data-prep ClusterIP xx.xx.xx.xx <none> 6007/TCP 51m
```
Use the following `cURL` command to upload file:
```bash
cd GenAIEval/evals/benchmark/data
curl -X POST "http://${cluster_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "chunk_size=3800" \
-F "files=@./upload_file.txt"
```
#### Run Benchmark Test
Run the benchmark test using:
```bash
bash benchmark.sh -n 2
```
The `-n` argument specifies the number of test nodes. Required dependencies will be automatically installed when running the benchmark for the first time.
#### Data collection
All test results are written to the folder `GenAIEval/evals/benchmark/benchmark_output`.
## Teardown
After completing the benchmark, use the following command to clean up the environment:
Remove Node Labels:
```bash
python deploy.py --delete-label
```


@@ -1,102 +0,0 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
deployment_type="k8s"
node_number=1
service_port=8888
query_per_node=640
benchmark_tool_path="$(pwd)/GenAIEval"
usage() {
echo "Usage: $0 [-d deployment_type] [-n node_number] [-i service_ip] [-p service_port]"
echo " -d deployment_type ChatQnA deployment type, select between k8s and docker (default: k8s)"
echo " -n node_number Test node number, required only for k8s deployment_type, (default: 1)"
echo " -i service_ip chatqna service ip, required only for docker deployment_type"
echo " -p service_port chatqna service port, required only for docker deployment_type, (default: 8888)"
exit 1
}
while getopts ":d:n:i:p:" opt; do
case ${opt} in
d )
deployment_type=$OPTARG
;;
n )
node_number=$OPTARG
;;
i )
service_ip=$OPTARG
;;
p )
service_port=$OPTARG
;;
\? )
echo "Invalid option: -$OPTARG" 1>&2
usage
;;
: )
echo "Invalid option: -$OPTARG requires an argument" 1>&2
usage
;;
esac
done
if [[ "$deployment_type" == "docker" && -z "$service_ip" ]]; then
echo "Error: service_ip is required for docker deployment_type" 1>&2
usage
fi
# service_port defaults to 8888, so only warn when it was changed from the default
if [[ "$deployment_type" == "k8s" && ( -n "$service_ip" || "$service_port" != "8888" ) ]]; then
echo "Warning: service_ip and service_port are ignored for k8s deployment_type" 1>&2
fi
function main() {
if [[ ! -d ${benchmark_tool_path} ]]; then
echo "Benchmark tool not found, setting up..."
setup_env
fi
run_benchmark
}
function setup_env() {
git clone https://github.com/opea-project/GenAIEval.git
pushd ${benchmark_tool_path}
python3 -m venv stress_venv
source stress_venv/bin/activate
pip install -r requirements.txt
popd
}
function run_benchmark() {
source ${benchmark_tool_path}/stress_venv/bin/activate
export DEPLOYMENT_TYPE=${deployment_type}
export SERVICE_IP=${service_ip:-"None"}
export SERVICE_PORT=${service_port:-"None"}
export LOAD_SHAPE=${load_shape:-"constant"}
export CONCURRENT_LEVEL=${concurrent_level:-5}
export ARRIVAL_RATE=${arrival_rate:-1.0}
if [[ -z $USER_QUERIES ]]; then
user_query=$((query_per_node*node_number))
export USER_QUERIES="[${user_query}, ${user_query}, ${user_query}, ${user_query}]"
echo "USER_QUERIES not configured, setting to: ${USER_QUERIES}."
fi
export WARMUP=$(echo $USER_QUERIES | sed -e 's/[][]//g' -e 's/,.*//')
if [[ -z $WARMUP ]]; then export WARMUP=0; fi
if [[ -z $TEST_OUTPUT_DIR ]]; then
if [[ $DEPLOYMENT_TYPE == "k8s" ]]; then
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/node_${node_number}"
else
export TEST_OUTPUT_DIR="${benchmark_tool_path}/evals/benchmark/benchmark_output/docker"
fi
echo "TEST_OUTPUT_DIR not configured, setting to: ${TEST_OUTPUT_DIR}."
fi
envsubst < ./benchmark.yaml > ${benchmark_tool_path}/evals/benchmark/benchmark.yaml
cd ${benchmark_tool_path}/evals/benchmark
python benchmark.py
}
main


@@ -1,68 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
test_suite_config: # Overall configuration settings for the test suite
examples: ["chatqna"] # The specific test cases being tested, e.g., chatqna, codegen, codetrans, faqgen, audioqna, visualqna
deployment_type: ${DEPLOYMENT_TYPE} # Default is "k8s", can also be "docker"
service_ip: ${SERVICE_IP} # Leave as None for k8s, specify for Docker
service_port: ${SERVICE_PORT} # Leave as None for k8s, specify for Docker
warm_ups: ${WARMUP} # Number of test requests for warm-up
run_time: 60m # The max total run time for the test suite
seed: # The seed for all RNGs
user_queries: ${USER_QUERIES} # Number of test requests at each concurrency level
query_timeout: 120 # Number of seconds to wait for a simulated user to complete any executing task before exiting. 120 sec by default.
random_prompt: false # Use random prompts if true, fixed prompts if false
collect_service_metric: false # Collect service metrics if true, do not collect service metrics if false
data_visualization: false # Generate data visualization if true, do not generate data visualization if false
llm_model: "Intel/neural-chat-7b-v3-3" # The LLM model used for the test
test_output_dir: "${TEST_OUTPUT_DIR}" # The directory to store the test output
load_shape: # Tenant concurrency pattern
name: ${LOAD_SHAPE} # poisson or constant(locust default load shape)
params: # Loadshape-specific parameters
constant: # Constant load shape specific parameters, activate only if load_shape.name is constant
concurrent_level: ${CONCURRENT_LEVEL} # If user_queries is specified, concurrent_level is target number of requests per user. If not, it is the number of simulated users
poisson: # Poisson load shape specific parameters, activate only if load_shape.name is poisson
arrival_rate: ${ARRIVAL_RATE} # Request arrival rate
test_cases:
chatqna:
embedding:
run_test: false
service_name: "chatqna-embedding-usvc" # Replace with your service name
embedserve:
run_test: false
service_name: "chatqna-tei" # Replace with your service name
retriever:
run_test: false
service_name: "chatqna-retriever-usvc" # Replace with your service name
parameters:
search_type: "similarity"
k: 1
fetch_k: 20
lambda_mult: 0.5
score_threshold: 0.2
reranking:
run_test: false
service_name: "chatqna-reranking-usvc" # Replace with your service name
parameters:
top_n: 1
rerankserve:
run_test: false
service_name: "chatqna-teirerank" # Replace with your service name
llm:
run_test: false
service_name: "chatqna-llm-uservice" # Replace with your service name
parameters:
max_tokens: 128
temperature: 0.01
top_k: 10
top_p: 0.95
repetition_penalty: 1.03
stream: true
llmserve:
run_test: false
service_name: "chatqna-tgi" # Replace with your service name
e2e:
run_test: true
service_name: "chatqna" # Replace with your service name
k: 1


@@ -1,278 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import argparse
import glob
import json
import os
import shutil
import subprocess
import sys
from generate_helm_values import generate_helm_values
def run_kubectl_command(command):
"""Run a kubectl command and return the output."""
try:
result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
return result.stdout
except subprocess.CalledProcessError as e:
print(f"Error running command: {command}\n{e.stderr}")
sys.exit(1)
def get_all_nodes():
"""Get the list of all nodes in the Kubernetes cluster."""
command = ["kubectl", "get", "nodes", "-o", "json"]
output = run_kubectl_command(command)
nodes = json.loads(output)
return [node["metadata"]["name"] for node in nodes["items"]]
def add_label_to_node(node_name, label):
"""Add a label to the specified node."""
command = ["kubectl", "label", "node", node_name, label, "--overwrite"]
print(f"Labeling node {node_name} with {label}...")
run_kubectl_command(command)
print(f"Label {label} added to node {node_name} successfully.")
def add_labels_to_nodes(node_count=None, label=None, node_names=None):
"""Add a label to the specified number of nodes or to specified nodes."""
if node_names:
# Add label to the specified nodes
for node_name in node_names:
add_label_to_node(node_name, label)
else:
# Fetch the node list and label the specified number of nodes
all_nodes = get_all_nodes()
if node_count is None or node_count > len(all_nodes):
print(f"Error: Node count exceeds the number of available nodes ({len(all_nodes)} available).")
sys.exit(1)
selected_nodes = all_nodes[:node_count]
for node_name in selected_nodes:
add_label_to_node(node_name, label)
def clear_labels_from_nodes(label, node_names=None):
"""Clear the specified label from specific nodes if provided, otherwise from all nodes."""
label_key = label.split("=")[0] # Extract key from 'key=value' format
# If specific nodes are provided, use them; otherwise, get all nodes
nodes_to_clear = node_names if node_names else get_all_nodes()
for node_name in nodes_to_clear:
# Check if the node has the label by inspecting its metadata
command = ["kubectl", "get", "node", node_name, "-o", "json"]
node_info = run_kubectl_command(command)
node_metadata = json.loads(node_info)
# Check if the label exists on this node
labels = node_metadata["metadata"].get("labels", {})
if label_key in labels:
# Remove the label from the node
command = ["kubectl", "label", "node", node_name, f"{label_key}-"]
print(f"Removing label {label_key} from node {node_name}...")
run_kubectl_command(command)
print(f"Label {label_key} removed from node {node_name} successfully.")
else:
print(f"Label {label_key} not found on node {node_name}, skipping.")
def install_helm_release(release_name, chart_name, namespace, values_file, device_type):
    """Deploy a Helm release with a specified name and chart.

    Parameters:
    - release_name: The name of the Helm release.
    - chart_name: The Helm chart name or path, e.g., "opea/chatqna".
    - namespace: The Kubernetes namespace for deployment.
    - values_file: The user values file for deployment.
    - device_type: The device type (e.g., "gaudi") for specific configurations (optional).
    """
    # Check if the namespace exists; if not, create it
    try:
        # Check if the namespace exists
        command = ["kubectl", "get", "namespace", namespace]
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        # Namespace does not exist, create it
        print(f"Namespace '{namespace}' does not exist. Creating it...")
        command = ["kubectl", "create", "namespace", namespace]
        subprocess.run(command, check=True)
        print(f"Namespace '{namespace}' created successfully.")

    # Handle gaudi-specific values file if device_type is "gaudi"
    hw_values_file = None
    untar_dir = None
    if device_type == "gaudi":
        print("Device type is gaudi. Pulling Helm chart to get gaudi-values.yaml...")
        # Combine chart_name with fixed prefix
        chart_pull_url = f"oci://ghcr.io/opea-project/charts/{chart_name}"
        # Pull and untar the chart
        subprocess.run(["helm", "pull", chart_pull_url, "--untar"], check=True)
        # Find the untarred directory
        untar_dirs = glob.glob(f"{chart_name}*")
        if untar_dirs:
            untar_dir = untar_dirs[0]
            hw_values_file = os.path.join(untar_dir, "gaudi-values.yaml")
            print("gaudi-values.yaml pulled and ready for use.")
        else:
            print(f"Error: Could not find untarred directory for {chart_name}")
            return

    # Prepare the Helm install command
    command = ["helm", "install", release_name, chart_name, "--namespace", namespace]
    # Append additional values file for gaudi if it exists
    if hw_values_file:
        command.extend(["-f", hw_values_file])
    # Append the main values file
    command.extend(["-f", values_file])

    # Execute the Helm install command
    try:
        print(f"Running command: {' '.join(command)}")  # Print full command for debugging
        subprocess.run(command, check=True)
        print("Deployment initiated successfully.")
    except subprocess.CalledProcessError as e:
        print(f"Error occurred while deploying Helm release: {e}")

    # Cleanup: Remove the untarred directory
    if untar_dir and os.path.isdir(untar_dir):
        print(f"Removing temporary directory: {untar_dir}")
        shutil.rmtree(untar_dir)
        print("Temporary directory removed successfully.")
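The command assembly in `install_helm_release` can be isolated into a pure function, which makes the order of the `-f` flags easy to check without calling Helm. The following is a minimal sketch, assuming a hypothetical helper name `build_helm_install_command` that is not part of the original script:

```python
def build_helm_install_command(release_name, chart_name, namespace, values_file, hw_values_file=None):
    """Return the argv list for `helm install` (hypothetical helper, for illustration only)."""
    command = ["helm", "install", release_name, chart_name, "--namespace", namespace]
    # The hardware-specific values file is passed first, so the user values
    # file, passed last, takes precedence when both set the same key.
    if hw_values_file:
        command.extend(["-f", hw_values_file])
    command.extend(["-f", values_file])
    return command
```

Helm gives priority to the right-most values file, so this ordering lets the user values override the entries in `gaudi-values.yaml`.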


def uninstall_helm_release(release_name, namespace=None):
    """Uninstall a Helm release and clean up resources, optionally delete the namespace if not 'default'."""
    # Default to 'default' namespace if none is specified
    if not namespace:
        namespace = "default"

    try:
        # Uninstall the Helm release
        command = ["helm", "uninstall", release_name, "--namespace", namespace]
        print(f"Uninstalling Helm release {release_name} in namespace {namespace}...")
        run_kubectl_command(command)
        print(f"Helm release {release_name} uninstalled successfully.")

        # If the namespace is specified and not 'default', delete it
        if namespace != "default":
            print(f"Deleting namespace {namespace}...")
            delete_namespace_command = ["kubectl", "delete", "namespace", namespace]
            run_kubectl_command(delete_namespace_command)
            print(f"Namespace {namespace} deleted successfully.")
        else:
            print("Namespace is 'default', skipping deletion.")
    except subprocess.CalledProcessError as e:
        print(f"Error occurred while uninstalling Helm release or deleting namespace: {e}")


def main():
    parser = argparse.ArgumentParser(description="Manage Helm Deployment.")
    parser.add_argument(
        "--release-name",
        type=str,
        default="chatqna",
        help="The Helm release name created during deployment (default: chatqna).",
    )
    parser.add_argument(
        "--chart-name",
        type=str,
        default="chatqna",
        help="The chart name to deploy, composed of repo name and chart name (default: chatqna).",
    )
    parser.add_argument("--namespace", default="default", help="Kubernetes namespace (default: default).")
    parser.add_argument("--hf-token", help="Hugging Face API token.")
    parser.add_argument(
        "--model-dir", help="Model directory, mounted as volumes for service access to pre-downloaded models"
    )
    parser.add_argument("--user-values", help="Path to a user-specified values.yaml file.")
    parser.add_argument(
        "--create-values-only", action="store_true", help="Only create the values.yaml file without deploying."
    )
    parser.add_argument("--uninstall", action="store_true", help="Uninstall the Helm release.")
    parser.add_argument("--num-nodes", type=int, default=1, help="Number of nodes to use (default: 1).")
    parser.add_argument("--node-names", nargs="*", help="Optional specific node names to label.")
    parser.add_argument("--add-label", action="store_true", help="Add label to specified nodes if this flag is set.")
    parser.add_argument(
        "--delete-label", action="store_true", help="Delete label from specified nodes if this flag is set."
    )
    parser.add_argument(
        "--label", default="node-type=opea-benchmark", help="Label to add/delete (default: node-type=opea-benchmark)."
    )
    parser.add_argument("--with-rerank", action="store_true", help="Include rerank service in the deployment.")
    parser.add_argument(
        "--tuned",
        action="store_true",
        help="Modify resources for services and change extraCmdArgs when creating values.yaml.",
    )
    parser.add_argument(
        "--device-type",
        type=str,
        choices=["cpu", "gaudi"],
        default="gaudi",
        help="Specify the device type for deployment (choices: 'cpu', 'gaudi'; default: gaudi).",
    )

    args = parser.parse_args()

    # Adjust num-nodes based on node-names if specified
    if args.node_names:
        num_node_names = len(args.node_names)
        if args.num_nodes != 1 and args.num_nodes != num_node_names:
            parser.error("--num-nodes must match the number of --node-names if both are specified.")
        else:
            args.num_nodes = num_node_names

    # Node labeling management
    if args.add_label:
        add_labels_to_nodes(args.num_nodes, args.label, args.node_names)
        return
    elif args.delete_label:
        clear_labels_from_nodes(args.label, args.node_names)
        return

    # Uninstall Helm release if specified
    if args.uninstall:
        uninstall_helm_release(args.release_name, args.namespace)
        return

    # Prepare values.yaml if not uninstalling
    if args.user_values:
        values_file_path = args.user_values
    else:
        if not args.hf_token:
            parser.error("--hf-token is required")
        node_selector = {args.label.split("=")[0]: args.label.split("=")[1]}
        values_file_path = generate_helm_values(
            with_rerank=args.with_rerank,
            num_nodes=args.num_nodes,
            hf_token=args.hf_token,
            model_dir=args.model_dir,
            node_selector=node_selector,
            tune=args.tuned,
        )

    # Read back the generated YAML file for verification
    with open(values_file_path, "r") as file:
        print("Generated YAML contents:")
        print(file.read())

    # Deploy unless --create-values-only is specified
    if not args.create_values_only:
        install_helm_release(args.release_name, args.chart_name, args.namespace, values_file_path, args.device_type)


if __name__ == "__main__":
    main()
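
The `--label` value is split on `=` to build the node selector, which raises an `IndexError` if the label has no `=`. A minimal sketch of a stricter parser follows; the helper name `parse_node_label` is hypothetical and does not appear in the original script:

```python
def parse_node_label(label):
    """Turn a 'key=value' label into a one-entry node selector dict (hypothetical helper)."""
    # partition() always returns three parts, so a missing '=' is easy to detect.
    key, sep, value = label.partition("=")
    if not sep or not key or not value:
        raise ValueError(f"Label must have the form key=value, got: {label!r}")
    return {key: value}
```

With the default label `node-type=opea-benchmark`, this returns `{"node-type": "opea-benchmark"}`, the same selector the script builds.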


@@ -1,164 +0,0 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import os

import yaml


def generate_helm_values(with_rerank, num_nodes, hf_token, model_dir, node_selector=None, tune=False):
    """Create a values.yaml file based on the provided configuration."""
    # Log the received parameters
    print("Received parameters:")
    print(f"with_rerank: {with_rerank}")
    print(f"num_nodes: {num_nodes}")
    print(f"node_selector: {node_selector}")  # Log the node_selector
    print(f"tune: {tune}")

    if node_selector is None:
        node_selector = {}

    # Construct the base values dictionary
    values = {
        "tei": {"nodeSelector": {key: value for key, value in node_selector.items()}},
        "tgi": {"nodeSelector": {key: value for key, value in node_selector.items()}},
        "data-prep": {"nodeSelector": {key: value for key, value in node_selector.items()}},
        "redis-vector-db": {"nodeSelector": {key: value for key, value in node_selector.items()}},
        "retriever-usvc": {"nodeSelector": {key: value for key, value in node_selector.items()}},
        "chatqna-ui": {"nodeSelector": {key: value for key, value in node_selector.items()}},
        "global": {
            "HUGGINGFACEHUB_API_TOKEN": hf_token,  # Use passed token
            "modelUseHostPath": model_dir,  # Use passed model directory
        },
        "nodeSelector": {key: value for key, value in node_selector.items()},
    }

    if with_rerank:
        values["teirerank"] = {"nodeSelector": {key: value for key, value in node_selector.items()}}
    else:
        values["image"] = {"repository": "opea/chatqna-without-rerank"}
        values["teirerank"] = {"enabled": False}

    default_replicas = [
        {"name": "chatqna", "replicaCount": 2},
        {"name": "tei", "replicaCount": 1},
        {"name": "teirerank", "replicaCount": 1} if with_rerank else None,
        {"name": "tgi", "replicaCount": 7 if with_rerank else 8},
        {"name": "data-prep", "replicaCount": 1},
        {"name": "redis-vector-db", "replicaCount": 1},
        {"name": "retriever-usvc", "replicaCount": 2},
    ]

    if num_nodes > 1:
        # Scale replicas based on number of nodes
        replicas = [
            {"name": "chatqna", "replicaCount": 1 * num_nodes},
            {"name": "tei", "replicaCount": 1 * num_nodes},
            {"name": "teirerank", "replicaCount": 1} if with_rerank else None,
            {"name": "tgi", "replicaCount": (8 * num_nodes - 1) if with_rerank else 8 * num_nodes},
            {"name": "data-prep", "replicaCount": 1},
            {"name": "redis-vector-db", "replicaCount": 1},
            {"name": "retriever-usvc", "replicaCount": 1 * num_nodes},
        ]
    else:
        replicas = default_replicas

    # Remove None values for rerank disabled
    replicas = [r for r in replicas if r]

    # Update values.yaml with replicas
    for replica in replicas:
        service_name = replica["name"]
        if service_name == "chatqna":
            values["replicaCount"] = replica["replicaCount"]
            print(replica["replicaCount"])
        elif service_name in values:
            values[service_name]["replicaCount"] = replica["replicaCount"]

    # Prepare resource configurations based on tuning
    resources = []
    if tune:
        resources = [
            {
                "name": "chatqna",
                "resources": {
                    "limits": {"cpu": "16", "memory": "8000Mi"},
                    "requests": {"cpu": "16", "memory": "8000Mi"},
                },
            },
            {
                "name": "tei",
                "resources": {
                    "limits": {"cpu": "80", "memory": "20000Mi"},
                    "requests": {"cpu": "80", "memory": "20000Mi"},
                },
            },
            {"name": "teirerank", "resources": {"limits": {"habana.ai/gaudi": 1}}} if with_rerank else None,
            {"name": "tgi", "resources": {"limits": {"habana.ai/gaudi": 1}}},
            {"name": "retriever-usvc", "resources": {"requests": {"cpu": "8", "memory": "8000Mi"}}},
        ]

    # Filter out any None values directly as part of initialization
    resources = [r for r in resources if r is not None]

    # Add resources for each service if tuning
    for resource in resources:
        service_name = resource["name"]
        if service_name == "chatqna":
            values["resources"] = resource["resources"]
        elif service_name in values:
            values[service_name]["resources"] = resource["resources"]

    # Add extraCmdArgs for tgi service with default values
    if "tgi" in values:
        values["tgi"]["extraCmdArgs"] = [
            "--max-input-length",
            "1280",
            "--max-total-tokens",
            "2048",
            "--max-batch-total-tokens",
            "65536",
            "--max-batch-prefill-tokens",
            "4096",
        ]

    yaml_string = yaml.dump(values, default_flow_style=False)

    # Determine the mode based on the 'tune' parameter
    mode = "tuned" if tune else "oob"

    # Determine the filename based on 'with_rerank' and 'num_nodes'
    if with_rerank:
        filename = f"{mode}-{num_nodes}-gaudi-with-rerank-values.yaml"
    else:
        filename = f"{mode}-{num_nodes}-gaudi-without-rerank-values.yaml"

    # Write the YAML data to the file
    with open(filename, "w") as file:
        file.write(yaml_string)

    # Get the current working directory and construct the file path
    current_dir = os.getcwd()
    filepath = os.path.join(current_dir, filename)
    print(f"YAML file {filepath} has been generated.")
    return filepath  # Optionally return the file path


# Main execution for standalone use of create_values_yaml
if __name__ == "__main__":
    # Example values for standalone execution
    with_rerank = True
    num_nodes = 2
    hftoken = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    modeldir = "/mnt/model"
    node_selector = {"node-type": "opea-benchmark"}
    tune = True

    filename = generate_helm_values(with_rerank, num_nodes, hftoken, modeldir, node_selector, tune)

    # Read back the generated YAML file for verification
    with open(filename, "r") as file:
        print("Generated YAML contents:")
        print(file.read())
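
The single-node and multi-node replica tables in `generate_helm_values` follow one rule for TGI: eight replicas per node, minus one when rerank is enabled. The output filename likewise depends only on `tune`, `num_nodes`, and `with_rerank`. A minimal sketch of both rules, using hypothetical helper names (`tgi_replica_count`, `values_filename`) that are not in the original file:

```python
def tgi_replica_count(num_nodes, with_rerank):
    """TGI replicas: 8 per node, minus one when the rerank service is enabled."""
    return 8 * num_nodes - (1 if with_rerank else 0)


def values_filename(num_nodes, with_rerank, tune):
    """Rebuild the generated values file name from the same three inputs."""
    mode = "tuned" if tune else "oob"
    suffix = "with-rerank" if with_rerank else "without-rerank"
    return f"{mode}-{num_nodes}-gaudi-{suffix}-values.yaml"
```

For the standalone example above (`num_nodes = 2`, rerank enabled, tuned), these give 15 TGI replicas and the file name `tuned-2-gaudi-with-rerank-values.yaml`.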


@@ -3,7 +3,7 @@
deploy:
device: gaudi
version: 1.2.0
version: 1.3.0
modelUseHostPath: /mnt/models
HUGGINGFACEHUB_API_TOKEN: "" # mandatory
node: [1, 2, 4, 8]


@@ -25,7 +25,7 @@ class ChatTemplate:
@staticmethod
def generate_rag_prompt(question, documents):
context_str = "\n".join(documents)
if context_str and len(re.findall("[\u4E00-\u9FFF]", context_str)) / len(context_str) >= 0.3:
if context_str and len(re.findall("[\u4e00-\u9fff]", context_str)) / len(context_str) >= 0.3:
# chinese context
template = """
### 你将扮演一个乐于助人、尊重他人并诚实的助手,你的目标是帮助用户解答问题。有效地利用来自本地知识库的搜索结果。确保你的回答中只包含相关信息。如果你不确定问题的答案,请避免分享不准确的信息。
@@ -58,6 +58,7 @@ RERANK_SERVER_PORT = int(os.getenv("RERANK_SERVER_PORT", 80))
LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 80))
LLM_MODEL = os.getenv("LLM_MODEL", "meta-llama/Meta-Llama-3-8B-Instruct")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", None)
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
@@ -239,6 +240,7 @@ class ChatQnAService:
name="llm",
host=LLM_SERVER_HOST_IP,
port=LLM_SERVER_PORT,
api_key=OPENAI_API_KEY,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
@@ -272,6 +274,7 @@ class ChatQnAService:
name="llm",
host=LLM_SERVER_HOST_IP,
port=LLM_SERVER_PORT,
api_key=OPENAI_API_KEY,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
@@ -317,6 +320,7 @@ class ChatQnAService:
name="llm",
host=LLM_SERVER_HOST_IP,
port=LLM_SERVER_PORT,
api_key=OPENAI_API_KEY,
endpoint="/v1/chat/completions",
use_remote_service=True,
service_type=ServiceType.LLM,
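
The Chinese-context check in `generate_rag_prompt` measures the share of characters that fall in the CJK Unified Ideographs range U+4E00 to U+9FFF and switches to the Chinese template at 30% or more. A minimal standalone sketch of that test follows; the function name `is_mostly_chinese` and the `threshold` parameter are illustrative assumptions, not part of the original module:

```python
import re

# Same character class as in generate_rag_prompt; \u escapes are case-insensitive,
# so "\u4E00" and "\u4e00" denote the same character.
CJK_PATTERN = re.compile("[\u4e00-\u9fff]")


def is_mostly_chinese(text, threshold=0.3):
    """Return True when at least `threshold` of the characters are CJK ideographs."""
    if not text:
        return False  # avoid dividing by zero on an empty context
    return len(CJK_PATTERN.findall(text)) / len(text) >= threshold
```

Because the two spellings of the range are equivalent, the change in this hunk affects only the casing of the escape sequences, not the behavior.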


@@ -1,163 +1,90 @@
# Build and Deploy ChatQnA Application on AMD GPU (ROCm)
# Deploying ChatQnA on AMD ROCm GPU
## Build Docker Images
This document outlines the single node deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on Intel Xeon server and AMD GPU. The steps include pulling Docker images, container deployment via Docker Compose, and service execution using microservices `llm`.
### 1. Build Docker Image
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
- #### Create application install directory and go to it:
## Table of Contents
```bash
mkdir ~/chatqna-install && cd ~/chatqna-install
```
1. [ChatQnA Quick Start Deployment](#chatqna-quick-start-deployment)
2. [ChatQnA Docker Compose Files](#chatqna-docker-compose-files)
3. [Validate Microservices](#validate-microservices)
4. [Conclusion](#conclusion)
- #### Clone the repository GenAIExamples (the default repository branch "main" is used here):
## ChatQnA Quick Start Deployment
```bash
git clone https://github.com/opea-project/GenAIExamples.git
```
This section describes how to quickly deploy and test the ChatQnA service manually on an AMD ROCm GPU. The basic steps are:
If you need to use a specific branch/tag of the GenAIExamples repository, run the following (replace v1.3 with the branch or tag you need):
1. [Access the Code](#access-the-code)
2. [Configure the Deployment Environment](#configure-the-deployment-environment)
3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
4. [Check the Deployment Status](#check-the-deployment-status)
5. [Validate the Pipeline](#validate-the-pipeline)
6. [Cleanup the Deployment](#cleanup-the-deployment)
```bash
git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
```
### Access the Code
Note that when using a specific version of the code, you need to use the README from that version:
Clone the GenAIExamples repository and access the ChatQnA AMD ROCm GPU platform Docker Compose files and supporting scripts:
- #### Go to build directory:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA
```
```bash
cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_image_build
```
Then checkout a released version, such as v1.3:
- Cleaning up the GenAIComps repository if it was previously cloned in this directory.
This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty:
```bash
git checkout v1.3
```
```bash
rm -rf GenAIComps
```
### Configure the Deployment Environment
- #### Clone the repository GenAIComps (the default repository branch "main" is used here):
To set up environment variables for deploying ChatQnA services, set up some parameters specific to the deployment environment and source the `set_env_*.sh` script in this directory:
```bash
git clone https://github.com/opea-project/GenAIComps.git
```
- If using vLLM: `set_env_vllm.sh`
- If using vLLM with FaqGen: `set_env_faqgen_vllm.sh`
- If using TGI: `set_env.sh`
- If using TGI with FaqGen: `set_env_faqgen.sh`
If you use a specific tag of the GenAIExamples repository,
then you should also use the corresponding tag for GenAIComps (replace v1.3 with the tag you need):
Set the values of the variables:
```bash
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3
```
- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world.
Note that when using a specific version of the code, you need to use the README from that version.
If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address.
- #### Setting the list of images for the build (from the build.yaml file)
If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address.
If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows:
If your server is located on an internal network and has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then set HOST_IP to the internal name/address of the server and HOST_IP_EXTERNAL to the external name/address of the proxy/firewall/load balancer behind which the server is located.
#### vLLM-based application
Set these values in the `set_env_*.sh` file.
```bash
service_list="dataprep retriever vllm-rocm chatqna chatqna-ui nginx"
```
- **Variables with names like `*_PORT`** - These variables set the IP port numbers for establishing network connections to the application services.
The values shown in `set_env.sh` or `set_env_vllm.sh` are the ones used for developing and testing the application, configured for the environment in which development was performed. Adjust them to match the network access rules of your environment's server, and make sure they do not overlap with IP ports already used by other applications.
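
One part of the port requirement above can be checked mechanically: no two `*_PORT` variables should share a value. The sketch below is a minimal illustration under that assumption; the helper name `find_port_conflicts` and the variable names in the usage note are hypothetical, and the check does not detect ports already held by other applications on the host.

```python
from collections import defaultdict


def find_port_conflicts(env):
    """Map each port value to the *_PORT variables that share it (hypothetical helper)."""
    ports = defaultdict(list)
    for name, value in env.items():
        # Only consider non-empty variables that follow the *_PORT naming pattern.
        if name.endswith("_PORT") and value:
            ports[value].append(name)
    # Keep only ports claimed by more than one variable.
    return {port: sorted(names) for port, names in ports.items() if len(names) > 1}
```

After sourcing the environment script in the same shell, calling `find_port_conflicts(dict(os.environ))` from Python (with `import os`) returns an empty dict when every exported port variable is unique.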
#### vLLM-based application with FaqGen
Setting variables in the operating system environment:
```bash
service_list="dataprep retriever vllm-rocm llm-faqgen chatqna chatqna-ui nginx"
```
```bash
export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token"
source ./set_env_*.sh # replace the script name with the appropriate one
```
#### TGI-based application
Consult the section on [ChatQnA Service configuration](#chatqna-configuration) for information on how service specific configuration parameters affect deployments.
```bash
service_list="dataprep retriever chatqna chatqna-ui nginx"
```
### Deploy the Services Using Docker Compose
#### TGI-based application with FaqGen
To deploy the ChatQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment with TGI, execute the command below. It uses the 'compose.yaml' file.
```bash
service_list="dataprep retriever llm-faqgen chatqna chatqna-ui nginx"
```
- #### Pull Docker Images
```bash
docker pull redis/redis-stack:7.2.0-v9
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
```
- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI)
```bash
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
```
- #### Build Docker Images
```bash
docker compose -f build.yaml build ${service_list} --no-cache
```
After the build, we check the list of images with the command:
```bash
docker image ls
```
The list of images should include:
##### vLLM-based application:
- redis/redis-stack:7.2.0-v9
- opea/dataprep:latest
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/retriever:latest
- opea/vllm-rocm:latest
- opea/chatqna:latest
- opea/chatqna-ui:latest
- opea/nginx:latest
##### vLLM-based application with FaqGen:
- redis/redis-stack:7.2.0-v9
- opea/dataprep:latest
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/retriever:latest
- opea/vllm-rocm:latest
- opea/llm-faqgen:latest
- opea/chatqna:latest
- opea/chatqna-ui:latest
- opea/nginx:latest
##### TGI-based application:
- redis/redis-stack:7.2.0-v9
- opea/dataprep:latest
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/retriever:latest
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- opea/chatqna:latest
- opea/chatqna-ui:latest
- opea/nginx:latest
##### TGI-based application with FaqGen:
- redis/redis-stack:7.2.0-v9
- opea/dataprep:latest
- ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- opea/retriever:latest
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
- opea/llm-faqgen:latest
- opea/chatqna:latest
- opea/chatqna-ui:latest
- opea/nginx:latest
---
## Deploy the ChatQnA Application
### Docker Compose Configuration for AMD GPUs
```bash
cd docker_compose/amd/gpu/rocm
# If using TGI
docker compose -f compose.yaml up -d
# If using TGI with FaqGen
# docker compose -f compose_faqgen.yaml up -d
# If using vLLM
# docker compose -f compose_vllm.yaml up -d
# If using vLLM with FaqGen
# docker compose -f compose_faqgen_vllm.yaml up -d
```
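
Each of the four deployment variants pairs one environment script with one Compose file. The pairing listed in this guide can be captured in a small lookup table; the sketch below is illustrative only, and the names `DEPLOYMENT_FILES` and `compose_up_command` are assumptions rather than part of the repository:

```python
# (backend, faqgen) -> (environment script, compose file), as listed in this guide.
DEPLOYMENT_FILES = {
    ("tgi", False): ("set_env.sh", "compose.yaml"),
    ("tgi", True): ("set_env_faqgen.sh", "compose_faqgen.yaml"),
    ("vllm", False): ("set_env_vllm.sh", "compose_vllm.yaml"),
    ("vllm", True): ("set_env_faqgen_vllm.sh", "compose_faqgen_vllm.yaml"),
}


def compose_up_command(backend, faqgen=False):
    """Return the env script to source and the argv for `docker compose up` (hypothetical helper)."""
    env_script, compose_file = DEPLOYMENT_FILES[(backend, faqgen)]
    return env_script, ["docker", "compose", "-f", compose_file, "up", "-d"]
```

An unknown backend raises `KeyError`, which is a reasonable failure mode for a table with only four valid combinations.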
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:
@@ -198,332 +125,103 @@ security_opt:
**How to Identify GPU Device IDs:**
Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.
### Set deploy environment variables
> **Note**: Developers should build the Docker image from source when:
>
> - Developing off the git main branch (as the container's ports in the repo may be different from the published Docker image).
> - Unable to download the Docker image.
> - Using a specific version of the Docker image.
#### Setting variables in the operating system environment:
Please refer to the table below to build different microservices from source:
##### Set variable HUGGINGFACEHUB_API_TOKEN:
| Microservice | Deployment Guide |
| --------------- | ------------------------------------------------------------------------------------------------------------------ |
| vLLM | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker) |
| TGI | [TGI project](https://github.com/huggingface/text-generation-inference.git) |
| LLM | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms) |
| Redis Vector DB | [Redis](https://github.com/redis/redis.git) |
| Dataprep | [Dataprep build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/dataprep/src/README_redis.md) |
| TEI Embedding | [TEI guide](https://github.com/huggingface/text-embeddings-inference.git) |
| Retriever | [Retriever build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers/src/README_redis.md) |
| TEI Reranking | [TEI guide](https://github.com/huggingface/text-embeddings-inference.git) |
| MegaService | [MegaService guide](../../../../README.md) |
| UI | [UI guide](../../../../ui/react/README.md) |
| Nginx | [Nginx guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/nginx) |
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```bash
### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
docker ps -a
```
#### Set variables value in set_env\*\*\*\*.sh file:
For the default deployment with TGI, the following 9 containers should have started:
Go to Docker Compose directory:
```bash
cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eaf24161aca8 opea/nginx:latest "/docker-entrypoint.…" 37 seconds ago Up 5 seconds 0.0.0.0:18104->80/tcp, [::]:18104->80/tcp chatqna-nginx-server
2fce48a4c0f4 opea/chatqna-ui:latest "docker-entrypoint.s…" 37 seconds ago Up 5 seconds 0.0.0.0:18101->5173/tcp, [::]:18101->5173/tcp chatqna-ui-server
613c384979f4 opea/chatqna:latest "bash entrypoint.sh" 37 seconds ago Up 5 seconds 0.0.0.0:18102->8888/tcp, [::]:18102->8888/tcp chatqna-backend-server
05512bd29fee opea/dataprep:latest "sh -c 'python $( [ …" 37 seconds ago Up 36 seconds (healthy) 0.0.0.0:18103->5000/tcp, [::]:18103->5000/tcp chatqna-dataprep-service
49844d339d1d opea/retriever:latest "python opea_retriev…" 37 seconds ago Up 36 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp chatqna-retriever
75b698fe7de0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18808->80/tcp, [::]:18808->80/tcp chatqna-tei-reranking-service
342f01bfdbb2 ghcr.io/huggingface/text-generation-inference:2.3.1-rocm "python3 /workspace/…" 37 seconds ago Up 36 seconds 0.0.0.0:18008->8011/tcp, [::]:18008->8011/tcp chatqna-tgi-service
6081eb1c119d redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 seconds ago Up 36 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp chatqna-redis-vector-db
eded17420782 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18090->80/tcp, [::]:18090->80/tcp chatqna-tei-embedding-service
```
The example uses the Nano text editor. You can use any convenient text editor:
If using TGI with FaqGen:
#### If you use vLLM based application
```bash
nano set_env_vllm.sh
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eaf24161aca8 opea/nginx:latest "/docker-entrypoint.…" 37 seconds ago Up 5 seconds 0.0.0.0:18104->80/tcp, [::]:18104->80/tcp chatqna-nginx-server
2fce48a4c0f4 opea/chatqna-ui:latest "docker-entrypoint.s…" 37 seconds ago Up 5 seconds 0.0.0.0:18101->5173/tcp, [::]:18101->5173/tcp chatqna-ui-server
613c384979f4 opea/chatqna:latest "bash entrypoint.sh" 37 seconds ago Up 5 seconds 0.0.0.0:18102->8888/tcp, [::]:18102->8888/tcp chatqna-backend-server
e0ef1ea67640 opea/llm-faqgen:latest "bash entrypoint.sh" 37 seconds ago Up 36 seconds 0.0.0.0:18011->9000/tcp, [::]:18011->9000/tcp chatqna-llm-faqgen
05512bd29fee opea/dataprep:latest "sh -c 'python $( [ …" 37 seconds ago Up 36 seconds (healthy) 0.0.0.0:18103->5000/tcp, [::]:18103->5000/tcp chatqna-dataprep-service
49844d339d1d opea/retriever:latest "python opea_retriev…" 37 seconds ago Up 36 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp chatqna-retriever
75b698fe7de0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18808->80/tcp, [::]:18808->80/tcp chatqna-tei-reranking-service
342f01bfdbb2 ghcr.io/huggingface/text-generation-inference:2.3.1-rocm "python3 /workspace/…" 37 seconds ago Up 36 seconds 0.0.0.0:18008->8011/tcp, [::]:18008->8011/tcp chatqna-tgi-service
6081eb1c119d redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 seconds ago Up 36 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp chatqna-redis-vector-db
eded17420782 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18090->80/tcp, [::]:18090->80/tcp chatqna-tei-embedding-service
```
#### If you use vLLM based application with FaqGen
If using vLLM:
```bash
nano set_env_faqgen_vllm.sh
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eaf24161aca8 opea/nginx:latest "/docker-entrypoint.…" 37 seconds ago Up 5 seconds 0.0.0.0:18104->80/tcp, [::]:18104->80/tcp chatqna-nginx-server
2fce48a4c0f4 opea/chatqna-ui:latest "docker-entrypoint.s…" 37 seconds ago Up 5 seconds 0.0.0.0:18101->5173/tcp, [::]:18101->5173/tcp chatqna-ui-server
613c384979f4 opea/chatqna:latest "bash entrypoint.sh" 37 seconds ago Up 5 seconds 0.0.0.0:18102->8888/tcp, [::]:18102->8888/tcp chatqna-backend-server
05512bd29fee opea/dataprep:latest "sh -c 'python $( [ …" 37 seconds ago Up 36 seconds (healthy) 0.0.0.0:18103->5000/tcp, [::]:18103->5000/tcp chatqna-dataprep-service
49844d339d1d opea/retriever:latest "python opea_retriev…" 37 seconds ago Up 36 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp chatqna-retriever
75b698fe7de0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18808->80/tcp, [::]:18808->80/tcp chatqna-tei-reranking-service
342f01bfdbb2 opea/vllm-rocm:latest "python3 /workspace/…" 37 seconds ago Up 36 seconds 0.0.0.0:18008->8011/tcp, [::]:18008->8011/tcp chatqna-vllm-service
6081eb1c119d redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 seconds ago Up 36 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp chatqna-redis-vector-db
eded17420782 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18090->80/tcp, [::]:18090->80/tcp chatqna-tei-embedding-service
```
#### If you use TGI based application
If using vLLM with FaqGen:
```bash
nano set_env.sh
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eaf24161aca8 opea/nginx:latest "/docker-entrypoint.…" 37 seconds ago Up 5 seconds 0.0.0.0:18104->80/tcp, [::]:18104->80/tcp chatqna-nginx-server
2fce48a4c0f4 opea/chatqna-ui:latest "docker-entrypoint.s…" 37 seconds ago Up 5 seconds 0.0.0.0:18101->5173/tcp, [::]:18101->5173/tcp chatqna-ui-server
613c384979f4 opea/chatqna:latest "bash entrypoint.sh" 37 seconds ago Up 5 seconds 0.0.0.0:18102->8888/tcp, [::]:18102->8888/tcp chatqna-backend-server
e0ef1ea67640 opea/llm-faqgen:latest "bash entrypoint.sh" 37 seconds ago Up 36 seconds 0.0.0.0:18011->9000/tcp, [::]:18011->9000/tcp chatqna-llm-faqgen
05512bd29fee opea/dataprep:latest "sh -c 'python $( [ …" 37 seconds ago Up 36 seconds (healthy) 0.0.0.0:18103->5000/tcp, [::]:18103->5000/tcp chatqna-dataprep-service
49844d339d1d opea/retriever:latest "python opea_retriev…" 37 seconds ago Up 36 seconds 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp chatqna-retriever
75b698fe7de0 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18808->80/tcp, [::]:18808->80/tcp chatqna-tei-reranking-service
342f01bfdbb2 opea/vllm-rocm:latest "python3 /workspace/…" 37 seconds ago Up 36 seconds 0.0.0.0:18008->8011/tcp, [::]:18008->8011/tcp chatqna-vllm-service
6081eb1c119d redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 37 seconds ago Up 36 seconds 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp chatqna-redis-vector-db
eded17420782 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "text-embeddings-rou…" 37 seconds ago Up 36 seconds 0.0.0.0:18090->80/tcp, [::]:18090->80/tcp chatqna-tei-embedding-service
```
#### If you use TGI based application with FaqGen
If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
```bash
nano set_env_faqgen.sh
```
### Validate the Pipeline
If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
```
Set the values of the variables:
- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world.
If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address.
If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address.
If your server is located on an internal network and has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then set HOST_IP to the internal name/address of the server and HOST_IP_EXTERNAL to the external name/address of the proxy/firewall/load balancer behind which the server is located.
Set these values in the `set_env_*.sh` file.
- **Variables with names like `*_PORT`** - These variables set the IP port numbers for establishing network connections to the application services.
The values shown in `set_env.sh` or `set_env_vllm.sh` are the ones used for developing and testing the application, configured for the environment in which development was performed. Adjust them to match the network access rules of your environment's server, and make sure they do not overlap with IP ports already used by other applications.
#### Set variables with script set_env\*\*\*\*.sh
#### If you use vLLM based application
```bash
. set_env_vllm.sh
```
#### If you use vLLM based application with FaqGen
```bash
. set_env_faqgen_vllm.sh
```
#### If you use TGI based application
```bash
. set_env.sh
```
#### If you use TGI based application with FaqGen
```bash
. set_env_faqgen.sh
```
### Start the services:
#### If you use vLLM based application
```bash
docker compose -f compose_vllm.yaml up -d
```
#### If you use vLLM based application with FaqGen
```bash
docker compose -f compose_faqgen_vllm.yaml up -d
```
#### If you use TGI based application
```bash
docker compose -f compose.yaml up -d
```
#### If you use TGI based application with FaqGen
```bash
docker compose -f compose_faqgen.yaml up -d
```
All containers should be running and should not restart:
##### If you use vLLM based application:
- chatqna-redis-vector-db
- chatqna-dataprep-service
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-vllm-service
- chatqna-backend-server
- chatqna-ui-server
- chatqna-nginx-server
##### If you use vLLM based application with FaqGen:
- chatqna-redis-vector-db
- chatqna-dataprep-service
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-vllm-service
- chatqna-llm-faqgen
- chatqna-backend-server
- chatqna-ui-server
- chatqna-nginx-server
##### If you use TGI based application:
- chatqna-redis-vector-db
- chatqna-dataprep-service
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-tgi-service
- chatqna-backend-server
- chatqna-ui-server
- chatqna-nginx-server
##### If you use TGI based application with FaqGen:
- chatqna-redis-vector-db
- chatqna-dataprep-service
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-tgi-service
- chatqna-llm-faqgen
- chatqna-backend-server
- chatqna-ui-server
- chatqna-nginx-server
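The container lists above can be checked mechanically. This is a minimal sketch under stated assumptions: `missing_containers` is a hypothetical helper that reads the names of running containers on standard input and prints each expected name that is absent. The usage comment relies on the Docker CLI's `--filter` and `--format` options of `docker ps`.

```bash
# Sketch only: missing_containers is a hypothetical helper.
# stdin: one running container name per line; arguments: expected names.
missing_containers() {
  running="$(cat)"
  rc=0
  for name in "$@"; do
    # -x matches whole lines only, -F treats the name as a literal string.
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "not running: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (requires the Docker CLI):
#   docker ps --filter status=running --format '{{.Names}}' |
#     missing_containers chatqna-redis-vector-db chatqna-dataprep-service \
#       chatqna-retriever chatqna-backend-server chatqna-ui-server
```

A container that keeps restarting can still show up briefly as running, so it is worth repeating the check after a short pause.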
---
## Validate the Services
### 1. Validate TEI Embedding Service
```bash
curl http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
Check the response from the service. It should resemble the following text:
```textmate
[[0.00037115702,-0.06356819,0.0024758505,..................,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
```
If the response contains a vector of floating-point values like the one above,
the TEI Embedding Service has launched successfully.
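The "looks like an embedding" check can be approximated in the shell. This is a rough sketch: `looks_like_embedding` is a hypothetical helper, and it assumes a single input string, so the response is one vector wrapped in a nested array as in the sample above. It tests only the shape of the response, not the quality of the embedding.

```bash
# Sketch only: looks_like_embedding is a hypothetical helper.
# Succeeds when stdin has the shape [[number,number,...]].
looks_like_embedding() {
  # Drop whitespace, then require "[[", a run of numeric characters, and "]]".
  tr -d ' \n\r\t' | grep -Eq '^\[\[-?[0-9][-0-9.,eE+]*\]\]$'
}

# Usage:
#   curl -s http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}/embed \
#     -X POST -d '{"inputs":"What is Deep Learning?"}' \
#     -H 'Content-Type: application/json' | looks_like_embedding && echo "embedding OK"
```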
### 2. Validate Retriever Microservice
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${HOST_IP}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval \
-X POST \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
Check the response from the service. It should resemble the following JSON:
```json
{ "id": "e191846168aed1f80b2ea12df80844d2", "retrieved_docs": [], "initial_query": "test", "top_n": 1, "metadata": [] }
```
If the response has the same structure as the JSON above, the
Retriever Microservice verification is successful.
### 3. Validate TEI Reranking Service
```bash
curl http://${HOST_IP}:${CHATQNA_TEI_RERANKING_PORT}/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
Check the response from the service. It should resemble the following JSON:
```json
[
{ "index": 1, "score": 0.94238955 },
{ "index": 0, "score": 0.120219156 }
]
```
If the response has the same structure as the JSON above, the TEI Reranking Service
verification is successful.
### 4. Validate the vLLM/TGI Service
#### If you use vLLM:
```bash
DATA='{"model": "meta-llama/Meta-Llama-3-8B-Instruct", '\
'"messages": [{"role": "user", "content": "What is a Deep Learning?"}], "max_tokens": 64}'
curl http://${HOST_IP}:${CHATQNA_VLLM_SERVICE_PORT}/v1/chat/completions \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service. It should resemble the following JSON:
```json
{
"id": "chatcmpl-91003647d1c7469a89e399958f390f67",
"object": "chat.completion",
"created": 1742877228,
"model": "meta-llama/Meta-Llama-3-8B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Deep Learning ( DL) is a subfield of Machine Learning (ML) that focuses on the design of algorithms and architectures inspired by the structure and function of the human brain. These algorithms are designed to analyze and interpret data that is presented in the form of patterns or signals, and they often mimic the way the human brain",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "length",
"stop_reason": null
}
],
"usage": { "prompt_tokens": 16, "total_tokens": 80, "completion_tokens": 64, "prompt_tokens_details": null },
"prompt_logprobs": null
}
```
If the `choices[0].message.content` field of the response contains meaningful text,
the vLLM service has launched successfully.
#### If you use TGI:
```bash
DATA='{"inputs":"What is a Deep Learning?",'\
'"parameters":{"max_new_tokens":64,"do_sample": true}}'
curl http://${HOST_IP}:${CHATQNA_TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service. It should resemble the following JSON:
```json
{
"generated_text": " What is its application in Computer Vision?\nWhat is a Deep Learning?\nDeep learning is a subfield of machine learning that involves the use of artificial neural networks to model high-level abstractions in data. It involves the use of deep neural networks, which are composed of multiple layers, to learn complex patterns in data. The"
}
```
If the `generated_text` field of the response contains meaningful text,
the TGI service has launched successfully.
### 5. Validate the LLM Service (if you use the application with FaqGen)
```bash
DATA='{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source '\
'text embeddings and sequence classification models. TEI enables high-performance extraction for the most '\
'popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens": 128}'
curl http://${HOST_IP}:${CHATQNA_LLM_FAQGEN_PORT}/v1/faqgen \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
Check the response from the service. It should resemble the following JSON:
```json
{
"id": "58f0632f5f03af31471b895b0d0d397b",
"text": " Q: What is Text Embeddings Inference (TEI)?\n A: TEI is a toolkit for deploying and serving open source text embeddings and sequence classification models.\n\n Q: What models does TEI support?\n A: TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.\n\n Q: What is the purpose of TEI?\n A: The purpose of TEI is to enable high-performance extraction for text embeddings and sequence classification models.\n\n Q: What are the benefits of using TEI?\n A: The benefits of using TEI include high",
"prompt": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
}
```
If the `text` field of the response contains meaningful text,
the LLM service has launched successfully.
### 6. Validate the MegaService
Once the ChatQnA services are running, test the pipeline using the following command:
```bash
curl http://${HOST_IP}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna \
@@ -531,91 +229,105 @@ curl http://${HOST_IP}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
Check the response from the service. It should resemble the following text:
**Note** : Access the ChatQnA UI by web browser through this URL: `http://${HOST_IP_EXTERNAL}:${CHATQNA_NGINX_PORT}`
```textmate
data: b' What'
data: b' is'
data: b' the'
data: b' revenue'
data: b' of'
data: b' Nike'
data: b' in'
data: b' '
data: b'202'
data: b'3'
data: b'?\n'
data: b' '
data: b' Answer'
data: b':'
data: b' According'
data: b' to'
data: b' the'
data: b' search'
data: b' results'
data: b','
data: b' the'
data: b' revenue'
data: b' of'
data: b''
### Cleanup the Deployment
data: [DONE]
```
If the `data` lines of the output contain meaningful words (tokens), the service
has launched successfully.
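Reading the answer token by token is tedious, so the stream can be joined back into one line. This is a minimal sketch: `join_stream_tokens` is a hypothetical helper that keeps only lines of the form `data: b'...'`, strips the wrapper, and concatenates the tokens. Escape sequences such as `\n` inside a token are left as they are, and the `data: [DONE]` marker is dropped.

```bash
# Sketch only: join_stream_tokens is a hypothetical helper.
# stdin: the streamed response; stdout: the tokens joined on one line.
join_stream_tokens() {
  # Print only the text between b' and the closing quote, then remove newlines.
  sed -n "s/^data: b'\(.*\)'\$/\1/p" | tr -d '\n'
  echo
}

# Usage (the Content-Type header is assumed to match the other requests in this guide):
#   curl -s http://${HOST_IP}:${CHATQNA_BACKEND_SERVICE_PORT}/v1/chatqna \
#     -H 'Content-Type: application/json' \
#     -d '{"messages": "What is the revenue of Nike in 2023?"}' | join_stream_tokens
```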
### 7. Validate the Frontend (UI)
To access the UI, use the URL `http://${HOST_IP_EXTERNAL}:${CHATQNA_NGINX_PORT}`.
Opening this address should display the following page:
![UI start page](../../../../assets/img/ui-starting-page.png)
If this page opens, the service is running and responding,
and we can proceed to functional UI testing.
Enter a task for the service in the "Enter prompt here" field,
for example "What is a Deep Learning?", and press Enter.
A page with the result of the task should then open:
#### If you use the application without FaqGen
![UI result page](../../../../assets/img/ui-result-page.png)
#### If you use the application with FaqGen
![UI result page](../../../../assets/img/ui-result-page-faqgen.png)
If the result shown on the page is correct, then we consider the verification of the UI service to be successful.
### 5. Stop application
#### If you use vLLM
To stop the containers associated with the deployment, execute the following command:
```bash
cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
#### If you use vLLM with FaqGen
```bash
cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_faqgen_vllm.yaml down
```
#### If you use TGI
```bash
cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm
# if used TGI
docker compose -f compose.yaml down
# if used TGI with FaqGen
# docker compose -f compose_faqgen.yaml down
# if used vLLM
# docker compose -f compose_vllm.yaml down
# if used vLLM with FaqGen
# docker compose -f compose_faqgen_vllm.yaml down
```
#### If you use TGI with FaqGen
## ChatQnA Docker Compose Files
```bash
cd ~/chatqna-install/GenAIExamples/ChatQnA/docker_compose/amd/gpu/rocm
docker compose -f compose_faqgen.yaml down
```
In the context of deploying a ChatQnA pipeline on an AMD GPU (ROCm) platform, we can pick and choose different large language model serving frameworks, with or without the FaqGen component. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------ |
| [compose.yaml](./compose.yaml) | The LLM serving framework is TGI. Default compose file using TGI as serving framework and redis as vector database |
| [compose_faqgen.yaml](./compose_faqgen.yaml) | The LLM serving framework is TGI with FaqGen. All other configurations remain the same as the default |
| [compose_vllm.yaml](./compose_vllm.yaml) | The LLM serving framework is vLLM. Compose file using vllm as serving framework and redis as vector database |
| [compose_faqgen_vllm.yaml](./compose_faqgen_vllm.yaml) | The LLM serving framework is vLLM with FaqGen. Compose file using vllm as serving framework and redis as vector database |
## Validate MicroServices
1. TEI Embedding Service
```bash
curl http://${HOST_IP}:${CHATQNA_TEI_EMBEDDING_PORT}/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```
2. Retriever Microservice
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${HOST_IP}:${CHATQNA_REDIS_RETRIEVER_PORT}/v1/retrieval \
-X POST \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```
3. TEI Reranking Service
```bash
curl http://${HOST_IP}:${CHATQNA_TEI_RERANKING_PORT}/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
4. vLLM/TGI Service
If you use vLLM:
```bash
DATA='{"model": "meta-llama/Meta-Llama-3-8B-Instruct", '\
'"messages": [{"role": "user", "content": "What is a Deep Learning?"}], "max_tokens": 64}'
curl http://${HOST_IP}:${CHATQNA_VLLM_SERVICE_PORT}/v1/chat/completions \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
If you use TGI:
```bash
DATA='{"inputs":"What is a Deep Learning?",'\
'"parameters":{"max_new_tokens":64,"do_sample": true}}'
curl http://${HOST_IP}:${CHATQNA_TGI_SERVICE_PORT}/generate \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
5. LLM Service (if you use the application with FaqGen)
```bash
DATA='{"messages":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source '\
'text embeddings and sequence classification models. TEI enables high-performance extraction for the most '\
'popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens": 128}'
curl http://${HOST_IP}:${CHATQNA_LLM_FAQGEN_PORT}/v1/faqgen \
-X POST \
-d "$DATA" \
-H 'Content-Type: application/json'
```
## Conclusion
This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment.

View File

@@ -25,9 +25,15 @@ services:
INDEX_NAME: ${CHATQNA_INDEX_NAME}
TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
chatqna-tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: chatqna-tei-embedding-service
ports:
- "${CHATQNA_TEI_EMBEDDING_PORT}:80"
@@ -62,7 +68,7 @@ services:
restart: unless-stopped
chatqna-tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: chatqna-tei-reranking-service
ports:
- "${CHATQNA_TEI_RERANKING_PORT}:80"
@@ -109,11 +115,18 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-backend-server
depends_on:
- chatqna-redis-vector-db
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-tgi-service
chatqna-redis-vector-db:
condition: service_started
chatqna-tei-embedding-service:
condition: service_started
chatqna-retriever:
condition: service_started
chatqna-tei-reranking-service:
condition: service_started
chatqna-tgi-service:
condition: service_started
chatqna-dataprep-service:
condition: service_healthy
ports:
- "${CHATQNA_BACKEND_SERVICE_PORT:-8888}:8888"
environment:
@@ -152,7 +165,7 @@ services:
chatqna-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-nginx-server
container_name: chatqna-nginx-server
depends_on:
- chatqna-backend-server
- chatqna-ui-server

View File

@@ -25,9 +25,15 @@ services:
INDEX_NAME: ${CHATQNA_INDEX_NAME}
TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
chatqna-tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: chatqna-tei-embedding-service
ports:
- "${CHATQNA_TEI_EMBEDDING_PORT}:80"
@@ -62,7 +68,7 @@ services:
restart: unless-stopped
chatqna-tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
container_name: chatqna-tei-reranking-service
ports:
- "${CHATQNA_TEI_RERANKING_PORT}:80"
@@ -128,12 +134,20 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-backend-server
depends_on:
- chatqna-redis-vector-db
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-tgi-service
- chatqna-llm-faqgen
chatqna-redis-vector-db:
condition: service_started
chatqna-tei-embedding-service:
condition: service_started
chatqna-retriever:
condition: service_started
chatqna-tei-reranking-service:
condition: service_started
chatqna-tgi-service:
condition: service_started
chatqna-llm-faqgen:
condition: service_started
chatqna-dataprep-service:
condition: service_healthy
ports:
- "${CHATQNA_BACKEND_SERVICE_PORT}:8888"
environment:
@@ -173,7 +187,7 @@ services:
chatqna-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-nginx-server
container_name: chatqna-nginx-server
depends_on:
- chatqna-backend-server
- chatqna-ui-server

View File

@@ -25,6 +25,12 @@ services:
INDEX_NAME: ${CHATQNA_INDEX_NAME}
TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
chatqna-tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
@@ -133,12 +139,20 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-backend-server
depends_on:
- chatqna-redis-vector-db
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-vllm-service
- chatqna-llm-faqgen
chatqna-redis-vector-db:
condition: service_started
chatqna-tei-embedding-service:
condition: service_started
chatqna-retriever:
condition: service_started
chatqna-tei-reranking-service:
condition: service_started
chatqna-vllm-service:
condition: service_started
chatqna-llm-faqgen:
condition: service_started
chatqna-dataprep-redis-service:
condition: service_healthy
ports:
- "${CHATQNA_BACKEND_SERVICE_PORT}:8888"
environment:
@@ -178,7 +192,7 @@ services:
chatqna-nginx-server:
image: ${REGISTRY:-opea}/nginx:${TAG:-latest}
container_name: chaqna-nginx-server
container_name: chatqna-nginx-server
depends_on:
- chatqna-backend-server
- chatqna-ui-server

View File

@@ -25,6 +25,12 @@ services:
INDEX_NAME: ${CHATQNA_INDEX_NAME}
TEI_ENDPOINT: ${CHATQNA_TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${CHATQNA_HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
chatqna-tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
@@ -111,11 +117,18 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-backend-server
depends_on:
- chatqna-redis-vector-db
- chatqna-tei-embedding-service
- chatqna-retriever
- chatqna-tei-reranking-service
- chatqna-vllm-service
chatqna-redis-vector-db:
condition: service_started
chatqna-tei-embedding-service:
condition: service_started
chatqna-retriever:
condition: service_started
chatqna-tei-reranking-service:
condition: service_started
chatqna-vllm-service:
condition: service_started
chatqna-dataprep-service:
condition: service_healthy
ports:
- "${CHATQNA_BACKEND_SERVICE_PORT}:8888"
environment:

View File

@@ -25,8 +25,14 @@ services:
INDEX_NAME: ${INDEX_NAME}
TEI_ENDPOINT: http://tei-embedding-service:80
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-server
ports:
- "6006:80"
@@ -59,7 +65,7 @@ services:
RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS"
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "8808:80"
@@ -92,11 +98,16 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-aipc-backend-server
depends_on:
- redis-vector-db
- dataprep-redis-service
- tei-embedding-service
- retriever
- tei-reranking-service
redis-vector-db:
condition: service_started
dataprep-redis-service:
condition: service_healthy
tei-embedding-service:
condition: service_started
retriever:
condition: service_started
tei-reranking-service:
condition: service_started
ports:
- "8888:8888"
environment:

View File

@@ -1,56 +1,63 @@
# Build Mega Service of ChatQnA on Xeon
# Deploying ChatQnA on Intel® Xeon® Processors
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`,`llm` and `faqgen`.
This document outlines the single node deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on Intel Xeon server. The steps include pulling Docker images, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank` and `llm`.
The default pipeline deploys with vLLM as the LLM serving component and leverages rerank component. It also provides options of not using rerank in the pipeline and using TGI backend for LLM microservice, please refer to [start-all-the-services-docker-containers](#start-all-the-services-docker-containers) section in this page. Besides, refer to [Build with Pinecone VectorDB](./README_pinecone.md) and [Build with Qdrant VectorDB](./README_qdrant.md) for other deployment variants.
## Table of contents
Quick Start:
1. [ChatQnA Quick Start Deployment](#chatqna-quick-start-deployment)
2. [ChatQnA Docker Compose file Options](#chatqna-docker-compose-files)
3. [ChatQnA with Conversational UI](#chatqna-with-conversational-ui-optional)
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
## ChatQnA Quick Start Deployment
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
This section describes how to quickly deploy and test the ChatQnA service manually on an Intel® Xeon® processor. The basic steps are:
## Quick Start: 1.Setup Environment Variable
1. [Access the Code](#access-the-code)
2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
3. [Configure the Deployment Environment](#configure-the-deployment-environment)
4. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose)
5. [Check the Deployment Status](#check-the-deployment-status)
6. [Test the Pipeline](#test-the-pipeline)
7. [Cleanup the Deployment](#cleanup-the-deployment)
To set up environment variables for deploying ChatQnA services, follow these steps:
### Access the Code
1. Set the required environment variables:
Clone the GenAIExamples repository and access the ChatQnA Intel® Xeon® platform Docker Compose files and supporting scripts:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```
```
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
Check out a released version, such as v1.2:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen
```
```
git checkout v1.2
```
3. Set up other environment variables:
### Generate a HuggingFace Access Token
```bash
source ./set_env.sh
```
Some HuggingFace resources, such as certain models, are only accessible if the developer has an access token. If you do not have a HuggingFace access token, create an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
4. Change Model for LLM serving
### Configure the Deployment Environment
By default, Meta-Llama-3-8B-Instruct is used for LLM serving. The default model can be changed to another validated LLM.
Pick one of the [validated LLM models](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/src/text-generation#validated-llm-models) from the table.
To change the default model defined in set_env.sh, either export LLM_MODEL_ID with the new model or modify set_env.sh, and then repeat step 3.
For example, switch to Llama-2-7b-chat-hf with the following command.
To set up environment variables for deploying ChatQnA services, set the parameters specific to the deployment environment and source the _set_env.sh_ script in this directory:
```bash
export LLM_MODEL_ID="meta-llama/Llama-2-7b-chat-hf"
```
```
export host_ip="External_Public_IP" #ip address of the node
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
export http_proxy="Your_HTTP_Proxy" #http proxy if any
export https_proxy="Your_HTTPs_Proxy" #https proxy if any
export no_proxy=localhost,127.0.0.1,$host_ip #additional no proxies if needed
export no_proxy=$no_proxy,chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,llm-faqgen
source ./set_env.sh
```
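The `no_proxy` list above is built by string concatenation, which can introduce duplicates when the script is sourced more than once. This is a minimal sketch of an alternative: `append_no_proxy` is a hypothetical helper (not part of `set_env.sh`) that appends a host only when it is not already listed.

```bash
# Sketch only: append_no_proxy is a hypothetical helper, not part of set_env.sh.
# Appends each argument to the comma-separated no_proxy list unless already present.
append_no_proxy() {
  for host in "$@"; do
    case ",${no_proxy:-}," in
      *",$host,"*) ;;  # already listed, nothing to do
      *) no_proxy="${no_proxy:+$no_proxy,}$host" ;;
    esac
  done
  export no_proxy
}

# Usage:
#   append_no_proxy localhost 127.0.0.1 "$host_ip" retriever tei-embedding-service
```

Matching is on exact entries, so a pattern such as `.example.com` in the list does not suppress an explicit `host.example.com`.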
## Quick Start: 2.Run Docker Compose
Consult the section on [ChatQnA Service configuration](#chatqna-configuration) for information on how service specific configuration parameters affect deployments.
### Deploy the Services Using Docker Compose
To deploy the ChatQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute the command below. It uses the 'compose.yaml' file.
```bash
docker compose up -d
@@ -66,22 +73,54 @@ CPU example with Open Telemetry feature:
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```
It will automatically download the docker image on `docker hub`:
**Note**: developers should build docker image from source when:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
- Developing off the git main branch (as the container ports in the repo may differ from those in the published docker image).
- Unable to download the docker image.
- Using a specific version of the docker image.
Please refer to the table below to build different microservices from source:
| Microservice | Deployment Guide |
| ------------ | --------------------------------------------------------------------------------------------- |
| Dataprep | https://github.com/opea-project/GenAIComps/tree/main/comps/dataprep |
| Embedding | https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings |
| Retriever | https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers |
| Reranker | https://github.com/opea-project/GenAIComps/tree/main/comps/rerankings |
| LLM | https://github.com/opea-project/GenAIComps/tree/main/comps/llms |
| Megaservice | [Megaservice build guide](../../../../README_miscellaneous.md#build-megaservice-docker-image) |
| UI | [Basic UI build guide](../../../../README_miscellaneous.md#build-ui-docker-image) |
### Check the Deployment Status
After running docker compose, check if all the containers launched via docker compose have started:
```
docker ps -a
```
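Containers can appear in `docker ps` before the service inside is ready to answer requests. The compose files poll the dataprep health endpoint every 10 seconds for up to 50 attempts; the same pattern can be sketched in the shell. `retry` is a hypothetical helper, and the URL in the usage comment is the in-container address from the compose healthcheck, so from the host it must be replaced with the published port.

```bash
# Sketch only: retry is a hypothetical helper.
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY_SECONDS between tries.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    # Give up once the attempt budget is exhausted.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage, mirroring the compose healthcheck (interval 10s, retries 50):
#   retry 50 10 curl -f http://localhost:5000/v1/health_check
```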
NB: You should build docker image from source by yourself if:
For the default deployment, the following 10 containers should have started:
- You are developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
- You can't download the docker image.
- You want to use a specific version of Docker image.
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3b5fa9a722da opea/chatqna-ui:${RELEASE_VERSION} "docker-entrypoint.s…" 32 hours ago Up 2 hours 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp chatqna-xeon-ui-server
d3b37f3d1faa opea/chatqna:${RELEASE_VERSION} "python chatqna.py" 32 hours ago Up 2 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp chatqna-xeon-backend-server
b3e1388fa2ca opea/reranking-tei:${RELEASE_VERSION} "python reranking_te…" 32 hours ago Up 2 hours 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp reranking-tei-xeon-server
24a240f8ad1c opea/retriever-redis:${RELEASE_VERSION} "python retriever_re…" 32 hours ago Up 2 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
9c0d2a2553e8 opea/embedding-tei:${RELEASE_VERSION} "python embedding_te…" 32 hours ago Up 2 hours 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding-tei-server
24cae0db1a70 opea/llm-vllm:${RELEASE_VERSION} "bash entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-vllm-server
ea3986c3cf82 opea/dataprep-redis:${RELEASE_VERSION} "python prepare_doc_…" 32 hours ago Up 2 hours 0.0.0.0:6007->6007/tcp, :::6007->6007/tcp dataprep-redis-server
e10dd14497a8 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 32 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
b98fa07a4f5c opea/vllm:${RELEASE_VERSION} "python3 -m vllm.ent…" 32 hours ago Up 2 hours 0.0.0.0:9009->80/tcp, :::9009->80/tcp vllm-service
79276cf45a47 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:6006->80/tcp, :::6006->80/tcp tei-embedding-server
4943e5f6cd80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 "text-embeddings-rou…" 32 hours ago Up 2 hours 0.0.0.0:8808->80/tcp, :::8808->80/tcp
```
Please refer to ['Build Docker Images'](#🚀-build-docker-images) below.
If any issues are encountered during deployment, refer to the [troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section.
## QuickStart: 3.Consume the ChatQnA Service
### Test the Pipeline
Once the ChatQnA services are running, test the pipeline using the following command. This will send a sample query to the ChatQnA service and return a response.
```bash
curl http://${host_ip}:8888/v1/chatqna \
@@ -91,225 +130,78 @@ curl http://${host_ip}:8888/v1/chatqna \
}'
```
## 🚀 Apply Xeon Server on AWS
**Note** : Access the ChatQnA UI by web browser through this URL: `http://${host_ip}:80`. Confirm that port `80` is open in the firewall. To validate each microservice used in the pipeline, refer to the [Validate microservices](#validate-microservices) section.
To provision a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors, which are optimized for demanding workloads.
### Cleanup the Deployment
For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.
To stop the containers associated with the deployment, execute the following command:
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
### Network Port & Security
- Access the ChatQnA UI by web browser
The UI is served on port `80`. Please confirm that port `80` is open in the firewall of the EC2 instance.
- Access the microservice by tool or API
1. Log in to the EC2 instance and access the microservice by **local IP address** and port.
This is the recommended approach, and it requires no changes to the network port settings.
2. Log in to a remote client and access the microservice by **public IP address** and port.
You need to open the microservice's port in the security group settings of the EC2 instance firewall.
For a detailed guide, please refer to [Validate Microservices](#validate-microservices).
Note that this increases the security risk, so please confirm before doing it.
## 🚀 Build Docker Images
First, build the Docker images locally and install the corresponding Python package.
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```
docker compose -f compose.yaml down
```
### 1. Build Retriever Image
## ChatQnA Docker Compose Files
```bash
docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile .
In the context of deploying a ChatQnA pipeline on an Intel® Xeon® platform, we can pick and choose different vector databases, large language model serving frameworks, and remove pieces of the pipeline such as the reranker. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File | Description |
| ------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database |
| [compose_milvus.yaml](./compose_milvus.yaml) | Uses Milvus as the vector database. All other configurations remain the same as the default |
| [compose_pinecone.yaml](./compose_pinecone.yaml) | Uses Pinecone as the vector database. All other configurations remain the same as the default. For more details, refer to [README_pinecone.md](./README_pinecone.md). |
| [compose_qdrant.yaml](./compose_qdrant.yaml) | Uses Qdrant as the vector database. All other configurations remain the same as the default. For more details, refer to [README_qdrant.md](./README_qdrant.md). |
| [compose_tgi.yaml](./compose_tgi.yaml) | Uses TGI as the LLM serving framework. All other configurations remain the same as the default |
| [compose_without_rerank.yaml](./compose_without_rerank.yaml) | Default configuration without the reranker |
| [compose_faqgen.yaml](./compose_faqgen.yaml) | Enables FAQ generation using vLLM as the LLM serving framework. For more details, refer to [README_faqgen.md](./README_faqgen.md). |
| [compose_faqgen_tgi.yaml](./compose_faqgen_tgi.yaml) | Enables FAQ generation using TGI as the LLM serving framework. For more details, refer to [README_faqgen.md](./README_faqgen.md). |
| [compose.telemetry.yaml](./compose.telemetry.yaml) | Helper file for vLLM telemetry features. Can be used along with any compose file that serves vLLM |
| [compose_tgi.telemetry.yaml](./compose_tgi.telemetry.yaml) | Helper file for TGI telemetry features. Can be used along with any compose file that serves TGI |
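The table can be read as a lookup from a configuration choice to a compose file. The following is a minimal sketch in Python, not part of the project: the helper name `compose_args` and the option names `vector_db`, `llm_backend`, and `telemetry` are hypothetical, and only the file names are taken from the table.

```python
# Hypothetical helper: maps a configuration choice to the compose files
# listed in the table. Only the file names come from the README.
COMPOSE_FILES = {
    ("redis", "vllm"): "compose.yaml",
    ("milvus", "vllm"): "compose_milvus.yaml",
    ("pinecone", "vllm"): "compose_pinecone.yaml",
    ("qdrant", "vllm"): "compose_qdrant.yaml",
    ("redis", "tgi"): "compose_tgi.yaml",
}

# Telemetry helper files are layered on top of a base compose file.
TELEMETRY_FILES = {
    "vllm": "compose.telemetry.yaml",
    "tgi": "compose_tgi.telemetry.yaml",
}


def compose_args(vector_db="redis", llm_backend="vllm", telemetry=False):
    """Return the `-f` arguments to pass to `docker compose`."""
    key = (vector_db, llm_backend)
    if key not in COMPOSE_FILES:
        raise ValueError(f"no compose file listed for {key}")
    args = ["-f", COMPOSE_FILES[key]]
    if telemetry:
        args += ["-f", TELEMETRY_FILES[llm_backend]]
    return args


# Assemble the full command line for the default pipeline with telemetry.
print(" ".join(["docker", "compose", *compose_args(telemetry=True), "up", "-d"]))
```

Combinations that the table does not list (for example Milvus with TGI) raise an error here rather than guessing a file name.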
## ChatQnA with Conversational UI (Optional)
To access the Conversational UI (react based) frontend, modify the UI service in the `compose` file used to deploy. Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below:
```yaml
chatqna-xeon-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
container_name: chatqna-xeon-conversation-ui-server
environment:
- APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT}
ports:
- "5174:80"
depends_on:
- chaqna-xeon-backend-server
ipc: host
restart: always
```
### 2. Build Dataprep Image
Once the services are up, open the following URL in the browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If the developer prefers to use a different host port to access the frontend, it can be modified by port mapping in the `compose.yaml` file as shown below:
```bash
docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
cd ..
```yaml
chaqna-gaudi-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
...
ports:
- "80:80"
```
### 3. Build FaqGen LLM Image (Optional)
Here is an example of running ChatQnA (default UI):
If you want to enable FAQ generation LLM in the pipeline, please use the below command:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/faq-generation/Dockerfile .
```
Here is an example of running ChatQnA with Conversational UI (React):
### 4. Build MegaService Docker Image
To construct the Mega Service with Rerank, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `chatqna.py` Python script. Build MegaService Docker image via below command:
```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/ChatQnA
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 5. Build UI Docker Image
Build frontend Docker image via below command:
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 6. Build Conversational React UI Docker Image (Optional)
Build frontend Docker image that enables Conversational experience with ChatQnA megaservice via below command:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
### 7. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile .
```
Then run the command `docker images`, you will have the following 5 Docker Images:
1. `opea/dataprep:latest`
2. `opea/retriever:latest`
3. `opea/chatqna:latest`
4. `opea/chatqna-ui:latest`
5. `opea/nginx:latest`
If FaqGen related docker image is built, you will find one more image:
- `opea/llm-faqgen:latest`
## 🚀 Start Microservices
### Required Models
By default, the embedding, reranking and LLM models are set to a default value as listed below:
| Service | Model |
| --------- | ----------------------------------- |
| Embedding | BAAI/bge-base-en-v1.5 |
| Reranking | BAAI/bge-reranker-base |
| LLM | meta-llama/Meta-Llama-3-8B-Instruct |
Change the `xxx_MODEL_ID` below for your needs.
For users in China who are unable to download models directly from Huggingface, you can use [ModelScope](https://www.modelscope.cn/models) or a Huggingface mirror to download models. The vLLM/TGI can load the models either online or offline as described below:
1. Online
```bash
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="meta-llama/Meta-Llama-3-8B-Instruct"
# Start vLLM LLM Service
docker run -p 8008:80 -v ./data:/root/.cache/huggingface/hub --name vllm-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 128g opea/vllm:latest --model $model_name --host 0.0.0.0 --port 80
# Start TGI LLM Service
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id $model_name
```
2. Offline
- Search your model name in ModelScope. For example, check [this page](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/files) for model `Meta-Llama-3-8B-Instruct`.
- Click on `Download this model` button, and choose one way to download the model to your local path `/path/to/model`.
- Run the following command to start the LLM service.
```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
# Start vLLM LLM Service
docker run -p 8008:80 -v $model_path:/root/.cache/huggingface/hub --name vllm-service --shm-size 128g opea/vllm:latest --model /root/.cache/huggingface/hub --host 0.0.0.0 --port 80
# Start TGI LLM Service
docker run -p 8008:80 -v $model_path:/data --name tgi-service --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id /data
```
### Setup Environment Variables
1. Set the required environment variables:
```bash
# Example: host_ip="192.168.1.1"
export host_ip="External_Public_IP"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service
```
3. Set up other environment variables:
```bash
source ./set_env.sh
```
### Start all the services Docker Containers
> Before running the docker compose command, you need to be in the folder that has the docker compose yaml file
```bash
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
```
If you use vLLM as the LLM serving backend:
```bash
# Start ChatQnA with Rerank Pipeline
docker compose -f compose.yaml up -d
# Start ChatQnA without Rerank Pipeline
docker compose -f compose_without_rerank.yaml up -d
# Start ChatQnA with Rerank Pipeline and Open Telemetry Tracing
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
# Start ChatQnA with FaqGen Pipeline
docker compose -f compose_faqgen.yaml up -d
```
If you use TGI as the LLM serving backend:
```bash
docker compose -f compose_tgi.yaml up -d
# Start ChatQnA with Open Telemetry Tracing
docker compose -f compose_tgi.yaml -f compose_tgi.telemetry.yaml up -d
# Start ChatQnA with FaqGen Pipeline
docker compose -f compose_faqgen_tgi.yaml up -d
```
![project-screenshot](../../../../assets/img/conversation_ui_response.png)
### Validate Microservices
Note, when verify the microservices by curl or API from remote client, please make sure the **ports** of the microservices are opened in the firewall of the cloud node.
Note, when verifying the microservices by curl or API from remote client, please make sure the **ports** of the microservices are opened in the firewall of the cloud node.
Follow the instructions to validate MicroServices.
For details on how to verify the correctness of the response, refer to [how-to-validate_service](../../hpu/gaudi/how_to_validate_service.md).
1. TEI Embedding Service
1. **TEI Embedding Service**
Send a test request to the TEI Embedding Service to ensure it is running correctly:
```bash
curl http://${host_ip}:6006/embed \
@@ -318,13 +210,15 @@ For details on how to verify the correctness of the response, refer to [how-to-v
-H 'Content-Type: application/json'
```
2. Retriever Microservice
If you receive a connection error, ensure that the service is running and the port 6006 is open in the firewall.
2. **Retriever Microservice**
To consume the retriever microservice, generate a mock embedding vector with a Python script. The length of the embedding vector
is determined by the embedding model.
Here we use the model `EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"`, whose vector size is 768.
Check the vector dimension of your embedding model, set `your_embedding` dimension equals to it.
Check the vector dimension of your embedding model, set `your_embedding` dimension equal to it.
```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
@@ -334,7 +228,11 @@ For details on how to verify the correctness of the response, refer to [how-to-v
-H 'Content-Type: application/json'
```
3. TEI Reranking Service
If the response indicates an invalid embedding vector, verify that the vector size matches the model's expected dimension.
3. **TEI Reranking Service**
To test the TEI Reranking Service, use the following `curl` command:
> Skip for ChatQnA without Rerank pipeline
@@ -345,7 +243,7 @@ For details on how to verify the correctness of the response, refer to [how-to-v
-H 'Content-Type: application/json'
```
4. LLM backend Service
4. **LLM Backend Service**
In the first startup, this service will take more time to download, load and warm up the model. After it's finished, the service will be ready.
@@ -375,16 +273,9 @@ For details on how to verify the correctness of the response, refer to [how-to-v
-H 'Content-Type: application/json'
```
5. FaqGen LLM Microservice (if enabled)
5. **MegaService**
```bash
curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \
-X POST \
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
-H 'Content-Type: application/json'
```
6. MegaService
Use the following `curl` command to test the MegaService:
```bash
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
@@ -392,7 +283,9 @@ curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \
}'
```
7. Nginx Service
6. **Nginx Service**
Use the following curl command to test the Nginx Service:
```bash
curl http://${host_ip}:${NGINX_PORT}/v1/chatqna \
@@ -400,84 +293,84 @@ curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
```
8. Dataprep Microservice (Optional)
7. **Dataprep Microservice (Optional)**
If you want to update the default knowledge base, you can use the following commands:
If you want to update the default knowledge base, you can use the following commands:
Update Knowledge Base via Local File [nke-10k-2023.pdf](https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf). Or
click [here](https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf) to download the file via any web browser.
Or run this command to get the file on a terminal.
Update Knowledge Base via Local File [nke-10k-2023.pdf](https://github.com/opea-project/GenAIComps/blob/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf). Or
click [here](https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf) to download the file via any web browser.
Or run this command to get the file on a terminal.
```bash
wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf
```
```bash
wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf
```
Upload:
Upload:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F "files=@./nke-10k-2023.pdf"
```
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment.
Add Knowledge Base via HTTP Links:
Add Knowledge Base via HTTP Links:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/ingest" \
-H "Content-Type: multipart/form-data" \
-F 'link_list=["https://opea.dev"]'
```
This command updates a knowledge base by submitting a list of HTTP links for processing.
This command updates a knowledge base by submitting a list of HTTP links for processing.
Also, you are able to get the file list that you uploaded:
Also, you are able to get the file list that you uploaded:
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get" \
-H "Content-Type: application/json"
```
```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep/get" \
-H "Content-Type: application/json"
```
Then you will get the response JSON like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
Then you will get the response JSON like this. Notice that the returned `name`/`id` of the uploaded link is `https://xxx.txt`.
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
```json
[
{
"name": "nke-10k-2023.pdf",
"id": "nke-10k-2023.pdf",
"type": "File",
"parent": ""
},
{
"name": "https://opea.dev.txt",
"id": "https://opea.dev.txt",
"type": "File",
"parent": ""
}
]
```
To delete the file/link you uploaded:
To delete the file/link you uploaded:
The `file_path` here should be the `id` returned by the `/v1/dataprep/get` API.
The `file_path` here should be the `id` returned by the `/v1/dataprep/get` API.
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
```bash
# delete link
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "https://opea.dev.txt"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete file
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "nke-10k-2023.pdf"}' \
-H "Content-Type: application/json"
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
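Because the delete call takes the `id` values returned by `/v1/dataprep/get`, the request bodies can be derived from that response. The following is a minimal Python sketch under that assumption: the helper name `delete_payloads` is hypothetical, the sample response is the one shown in this guide, and no network request is made.

```python
import json

# Sample response copied from the `/v1/dataprep/get` example in this guide.
response_text = """
[
  {"name": "nke-10k-2023.pdf", "id": "nke-10k-2023.pdf", "type": "File", "parent": ""},
  {"name": "https://opea.dev.txt", "id": "https://opea.dev.txt", "type": "File", "parent": ""}
]
"""


def delete_payloads(get_response):
    """Build one JSON request body per listed item for `/v1/dataprep/delete`."""
    items = json.loads(get_response)
    # Each body carries the item's `id` in the `file_path` field.
    return [json.dumps({"file_path": item["id"]}) for item in items]


for body in delete_payloads(response_text):
    print(body)
```

Each printed line can be passed to `curl` as the `-d` argument of a delete request.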
### Profile Microservices
@@ -509,7 +402,7 @@ After vLLM profiling is started, users could start asking questions and get resp
##### Stop vLLM profiling
By following command, users could stop vLLM profliing and generate a \*.pt.trace.json.gz file as profiling result
By following command, users could stop vLLM profiling and generate a \*.pt.trace.json.gz file as profiling result
under /mnt folder in vllm-service docker instance.
```bash
@@ -539,59 +432,6 @@ Open a web browser and type "chrome://tracing" or "ui.perfetto.dev", and then lo
to see the vLLM profiling result as below diagram.
![image](https://github.com/user-attachments/assets/55c7097e-5574-41dc-97a7-5e87c31bc286)
## 🚀 Launch the UI
## Conclusion
### Launch with origin port
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chaqna-gaudi-ui-server:
image: opea/chatqna-ui:latest
...
ports:
- "80:5173"
```
### Launch with Nginx
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
## 🚀 Launch the Conversational UI (Optional)
To access the Conversational UI (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below:
```yaml
chaqna-xeon-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
container_name: chatqna-xeon-conversation-ui-server
environment:
- APP_BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
- APP_DATA_PREP_SERVICE_URL=${DATAPREP_SERVICE_ENDPOINT}
ports:
- "5174:80"
depends_on:
- chaqna-xeon-backend-server
ipc: host
restart: always
```
Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chaqna-gaudi-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
...
ports:
- "80:80"
```
![project-screenshot](../../../../assets/img/chat_ui_init.png)
Here is an example of running ChatQnA:
![project-screenshot](../../../../assets/img/chat_ui_response.png)
Here is an example of running ChatQnA with Conversational UI (React):
![project-screenshot](../../../../assets/img/conversation_ui_response.png)
This guide should enable developers to deploy the default configuration or any of the other compose YAML files for different configurations. It also highlights the configurable parameters that can be set before deployment.


@@ -0,0 +1,227 @@
# Deploying FAQ Generation on Intel® Xeon® Processors
In today's data-driven world, organizations across various industries face the challenge of managing and understanding vast amounts of information. Legal documents, contracts, regulations, and customer inquiries often contain critical insights buried within dense text. Extracting and presenting these insights in a concise and accessible format is crucial for decision-making, compliance, and customer satisfaction.
Our FAQ Generation Application leverages the power of large language models (LLMs) to revolutionize the way you interact with and comprehend complex textual data. By harnessing cutting-edge natural language processing techniques, our application can automatically generate comprehensive and natural-sounding frequently asked questions (FAQs) from your documents, legal texts, customer queries, and other sources. In this example use case, we utilize LangChain to implement FAQ Generation and facilitate LLM inference using Text Generation Inference on Intel Xeon and Gaudi2 processors.
The FaqGen example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
```mermaid
---
config:
flowchart:
nodeSpacing: 400
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 50px
---
flowchart LR
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style FaqGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph FaqGen-MegaService["FaqGen MegaService "]
direction LR
LLM([LLM MicroService]):::blue
end
subgraph UserInterface[" User Interface "]
direction LR
a([User Input Query]):::orchid
UI([UI server<br>]):::orchid
end
LLM_gen{{LLM Service <br>}}
GW([FaqGen GateWay<br>]):::orange
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> FaqGen-MegaService
%% Embedding service flow
direction LR
LLM <-.-> LLM_gen
```
---
## Table of Contents
1. [Build Docker Images](#build-docker-images)
2. [Validate Microservices](#validate-microservices)
3. [Launch the UI](#launch-the-ui)
4. [Launch the Conversational UI (Optional)](#launch-the-conversational-ui-optional)
---
## Build Docker Images
First, build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.
### 1. Build vLLM Image
```bash
git clone https://github.com/vllm-project/vllm.git
cd ./vllm/
VLLM_VER="$(git describe --tags "$(git rev-list --tags --max-count=1)" )"
git checkout ${VLLM_VER}
docker build --no-cache --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile.cpu -t opea/vllm:latest --shm-size=128g .
```
### 2. Build LLM Image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-faqgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/faq-generation/Dockerfile .
```
### 3. Build MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `faqgen.py` Python script. Build the MegaService Docker image via below command:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/ChatQnA
docker build --no-cache -t opea/chatqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 4. Build UI Docker Image
Build frontend Docker image via below command:
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 5. Build Conversational React UI Docker Image (Optional)
Build frontend Docker image that enables Conversational experience with ChatQnA megaservice via below command:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/ChatQnA/ui
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
### 6. Build Nginx Docker Image
```bash
cd GenAIComps
docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile .
```
Then run `docker images`; you should see the following Docker images:
1. `opea/vllm:latest`
2. `opea/llm-faqgen:latest`
3. `opea/chatqna:latest`
4. `opea/chatqna-ui:latest`
5. `opea/nginx:latest`
## Start Microservices and MegaService
### Required Models
The default model is "meta-llama/Meta-Llama-3-8B-Instruct". Change "LLM_MODEL_ID" in the environment variable settings below if you want to use another model.
If you use gated models, you also need to provide a [huggingface token](https://huggingface.co/docs/hub/security-tokens) in the "HUGGINGFACEHUB_API_TOKEN" environment variable.
### Setup Environment Variables
Since `compose.yaml` consumes some environment variables, you need to set them up in advance as shown below.
```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export host_ip=${your_host_ip}
export LLM_ENDPOINT_PORT=8008
export LLM_SERVICE_PORT=9000
export FAQGEN_BACKEND_PORT=8888
export FAQGen_COMPONENT_NAME="OpeaFaqGenvLLM"
export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/faqgen"
```
Note: Replace `your_host_ip` with your external IP address; do not use localhost.
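The derived variables above are plain string compositions of `host_ip` and the port values. The following Python sketch mirrors that composition to show what the resulting endpoints look like. The function name `faqgen_endpoints` is hypothetical, and the localhost check simply restates the note above; it is not something the project enforces.

```python
def faqgen_endpoints(host_ip, llm_endpoint_port=8008, backend_port=8888):
    """Compose the endpoint URLs the same way the export lines do."""
    # The guide asks for an external IP address rather than localhost.
    if host_ip in ("localhost", "127.0.0.1"):
        raise ValueError("use the external IP address, not localhost")
    return {
        "LLM_ENDPOINT": f"http://{host_ip}:{llm_endpoint_port}",
        "BACKEND_SERVICE_ENDPOINT": f"http://{host_ip}:{backend_port}/v1/faqgen",
    }


# Example with the sample address used elsewhere in these guides.
for name, url in faqgen_endpoints("192.168.1.1").items():
    print(f"{name}={url}")
```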
### Start Microservice Docker Containers
```bash
cd GenAIExamples/FaqGen/docker_compose/intel/cpu/xeon
docker compose up -d
```
### Validate Microservices
1. vLLM Service
```bash
curl http://${host_ip}:${LLM_ENDPOINT_PORT}/v1/chat/completions \
-X POST \
-H "Content-Type: application/json" \
-d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```
2. LLM Microservice
```bash
curl http://${host_ip}:${LLM_SERVICE_PORT}/v1/faqgen \
-X POST \
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
-H 'Content-Type: application/json'
```
3. MegaService
```bash
curl http://${host_ip}:${FAQGEN_BACKEND_PORT}/v1/faqgen \
-H "Content-Type: multipart/form-data" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "stream=False"
```
```bash
## enable stream
curl http://${host_ip}:${FAQGEN_BACKEND_PORT}/v1/faqgen \
-H "Content-Type: multipart/form-data" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "stream=True"
```
Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service.
## Launch the UI
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chaqna-gaudi-ui-server:
image: opea/chatqna-ui:latest
...
ports:
- "80:5173"
```
## Launch the Conversational UI (Optional)
To access the Conversational UI frontend, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
chaqna-xeon-conversation-ui-server:
image: opea/chatqna-conversation-ui:latest
...
ports:
- "80:80"
```
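The `ports` entries in these snippets use the Docker Compose short syntax, where the host port comes before the colon and the container port after it. A small Python sketch of that reading follows; the helper name `parse_port_mapping` is hypothetical and it handles only the plain `HOST:CONTAINER` form used in this guide.

```python
def parse_port_mapping(mapping):
    """Split a short-syntax "HOST:CONTAINER" entry into two integers."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


# "5174:80" publishes container port 80 on host port 5174.
host, container = parse_port_mapping("5174:80")
print(f"browse to port {host}; the UI listens on port {container} inside the container")
```

Changing only the left-hand number moves the UI to a different host port without touching the container.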


@@ -1,18 +1,22 @@
# Build Mega Service of ChatQnA on Xeon
# Deploying ChatQnA with Pinecone on Intel® Xeon® Processors
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel® Xeon® servers. The pipeline integrates **Pinecone** as the vector database (VectorDB) and includes microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
The default pipeline deploys with vLLM as the LLM serving component and leverages the rerank component.
---
Quick Start:
## Table of Contents
1. Set up the environment variables.
2. Run Docker Compose.
3. Consume the ChatQnA Service.
1. [Quick Start](#quick-start)
2. [Build Docker Images](#build-docker-images)
3. [Validate Microservices](#validate-microservices)
4. [Launch the UI](#launch-the-ui)
5. [Launch the Conversational UI (Optional)](#launch-the-conversational-ui-optional)
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
---
## Quick Start: 1.Setup Environment Variable
## Quick Start
### 1.Set up Environment Variable
To set up environment variables for deploying ChatQnA services, follow these steps:
@@ -31,8 +35,8 @@ To set up environment variables for deploying ChatQnA services, follow these ste
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export https_proxy="Your_HTTPS_Proxy"
# Example: no_proxy="localhost,127.0.0.1,192.168.1.1"
export no_proxy="Your_No_Proxy",chatqna-xeon-ui-server,chatqna-xeon-backend-server,dataprep-pinecone-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service
```
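Some tools do not trim whitespace around `no_proxy` entries, which is presumably why the example above drops the spaces after the commas. As a rough sketch, the list can be assembled from individual names so that no spaces slip in. The helper name `join_no_proxy` is an assumption for illustration and is not part of `set_env.sh`.

```shell
# Hypothetical helper (not part of the repo): join all arguments with commas
# and strip any spaces, producing a value suitable for no_proxy.
join_no_proxy() {
  local IFS=','                      # "$*" joins arguments with the first IFS character
  printf '%s\n' "$*" | tr -d ' '
}

# Service names taken from the export line above; extend the list as needed.
export no_proxy="$(join_no_proxy "localhost" "127.0.0.1" "tei-embedding-service" "retriever" "vllm-service")"
echo "$no_proxy"
```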
@@ -41,28 +45,28 @@ To set up environment variables for deploying ChatQnA services, follow these ste
source ./set_env.sh
```
## Quick Start: 2.Run Docker Compose
### 2.Run Docker Compose
```bash
docker compose -f compose_pinecone.yaml up -d
```
It will automatically download the docker image on `docker hub`:
It will automatically download the Docker images from `Docker Hub`:
```bash
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```
NB: You should build docker image from source by yourself if:
Note: You should build the Docker image from source yourself if:
- You are developing off the git main branch (as the container's ports in the repo may be different from the published docker image).
- You can't download the docker image.
- You want to use a specific version of Docker image.
Please refer to ['Build Docker Images'](#🚀-build-docker-images) in below.
Please refer to ['Build Docker Images'](#build-docker-images) below.
## QuickStart: 3.Consume the ChatQnA Service
### 3.Consume the ChatQnA Service
```bash
curl http://${host_ip}:8888/v1/chatqna \
@@ -72,35 +76,7 @@ curl http://${host_ip}:8888/v1/chatqna \
}'
```
## 🚀 Apply Xeon Server on AWS
To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors that are optimized for demanding workloads.
For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
### Network Port & Security
- Access the ChatQnA UI by web browser
It supports to access by `80` port. Please confirm the `80` port is opened in the firewall of EC2 instance.
- Access the microservice by tool or API
1. Login to the EC2 instance and access by **local IP address** and port.
It's recommended and do nothing of the network port setting.
2. Login to a remote client and access by **public IP address** and port.
You need to open the port of the microservice in the security group setting of firewall of EC2 instance setting.
For detailed guide, please refer to [Validate Microservices](#validate-microservices).
Note, it will increase the risk of security, so please confirm before do it.
## 🚀 Build Docker Images
## Build Docker Images
First, build the Docker images locally and install the required Python package.
@@ -218,7 +194,7 @@ For users in China who are unable to download models directly from Huggingface,
docker run -p 8008:80 -v $model_path:/root/.cache/huggingface/hub --name vllm-service --shm-size 128g opea/vllm:latest --model /root/.cache/huggingface/hub --host 0.0.0.0 --port 80
```
### Setup Environment Variables
### Set up Environment Variables
1. Set the required environment variables:
@@ -263,7 +239,7 @@ If use vLLM backend.
docker compose -f compose_pinecone.yaml up -d
```
### Validate Microservices
## Validate Microservices
Note: When verifying the microservices with curl or an API call from a remote client, make sure the **ports** of the microservices are open in the firewall of the cloud node.
Follow the instructions below to validate the microservices.
@@ -383,12 +359,12 @@ To delete the files/link you uploaded:
```bash
# delete all uploaded files and links
curl -X POST "http://${host_ip}:6009/v1/dataprep/delete" \
curl -X POST "http://${host_ip}:6007/v1/dataprep/delete" \
-d '{"file_path": "all"}' \
-H "Content-Type: application/json"
```
## 🚀 Launch the UI
## Launch the UI
### Launch with origin port
@@ -406,7 +382,7 @@ To access the frontend, open the following URL in your browser: http://{host_ip}
If you want to launch the UI using Nginx, open this URL: `http://${host_ip}:${NGINX_PORT}` in your browser to access the frontend.
## 🚀 Launch the Conversational UI (Optional)
## Launch the Conversational UI (Optional)
To access the Conversational UI (React-based) frontend, modify the UI service in the `compose.yaml` file. Replace the `chaqna-xeon-ui-server` service with the `chatqna-xeon-conversation-ui-server` service as per the config below:


@@ -1,71 +1,19 @@
# Build Mega Service of ChatQnA (with Qdrant) on Xeon
# Deploying ChatQnA with Qdrant on Intel® Xeon® Processors
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel® Xeon® servers. The pipeline integrates **Qdrant** as the vector database (VectorDB) and includes microservices such as `embedding`, `retriever`, `rerank`, and `llm`.
The default pipeline deploys with vLLM as the LLM serving component and leverages the rerank component.
---
Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).
## Table of Contents
## 🚀 Apply Xeon Server on AWS
1. [Build Docker Images](#build-docker-images)
2. [Validate Microservices](#validate-microservices)
3. [Launch the UI](#launch-the-ui)
4. [Launch the Conversational UI (Optional)](#launch-the-conversational-ui-optional)
To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage the power of 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads.
---
For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options.
After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed.
**Certain ports in the EC2 instance need to opened up in the security group, for the microservices to work with the curl commands**
> See one example below. Please open up these ports in the EC2 instance based on the IP addresses you want to allow
```
qdrant-vector-db
===============
Port 6333 - Open to 0.0.0.0/0
Port 6334 - Open to 0.0.0.0/0
dataprep-qdrant-server
======================
Port 6043 - Open to 0.0.0.0/0
tei_embedding_service
=====================
Port 6040 - Open to 0.0.0.0/0
embedding
=========
Port 6044 - Open to 0.0.0.0/0
retriever
=========
Port 6045 - Open to 0.0.0.0/0
tei_reranking_service
================
Port 6041 - Open to 0.0.0.0/0
reranking
=========
Port 6046 - Open to 0.0.0.0/0
vllm-service
===========
Port 6042 - Open to 0.0.0.0/0
llm
===
Port 6047 - Open to 0.0.0.0/0
chaqna-xeon-backend-server
==========================
Port 8912 - Open to 0.0.0.0/0
chaqna-xeon-ui-server
=====================
Port 5173 - Open to 0.0.0.0/0
```
## 🚀 Build Docker Images
## Build Docker Images
First, build the Docker images locally and install the required Python package.
@@ -137,7 +85,7 @@ Then run the command `docker images`, you will have the following 5 Docker Image
4. `opea/chatqna-ui:latest`
5. `opea/nginx:latest`
## 🚀 Start Microservices
## Start Microservices
### Required Models
@@ -292,7 +240,7 @@ For details on how to verify the correctness of the response, refer to [how-to-v
-F 'link_list=["https://opea.dev"]'
```
## 🚀 Launch the UI
## Launch the UI
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
@@ -304,7 +252,7 @@ To access the frontend, open the following URL in your browser: http://{host_ip}
- "80:5173"
```
## 🚀 Launch the Conversational UI (react)
## Launch the Conversational UI (Optional)
To access the Conversational UI frontend, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:


@@ -32,8 +32,14 @@ services:
INDEX_NAME: ${INDEX_NAME}
TEI_ENDPOINT: http://tei-embedding-service:80
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-server
ports:
- "6006:80"
@@ -66,7 +72,7 @@ services:
RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS"
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "8808:80"
@@ -96,6 +102,7 @@ services:
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LLM_MODEL_ID: ${LLM_MODEL_ID}
VLLM_TORCH_PROFILER_DIR: "/mnt"
VLLM_CPU_KVCACHE_SPACE: 40
healthcheck:
test: ["CMD-SHELL", "curl -f http://$host_ip:9009/health || exit 1"]
interval: 10s
@@ -106,11 +113,18 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-xeon-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- retriever
- tei-reranking-service
- vllm-service
redis-vector-db:
condition: service_started
dataprep-redis-service:
condition: service_healthy
tei-embedding-service:
condition: service_started
retriever:
condition: service_started
tei-reranking-service:
condition: service_started
vllm-service:
condition: service_healthy
ports:
- "8888:8888"
environment:
@@ -124,7 +138,7 @@ services:
- RERANK_SERVER_HOST_IP=tei-reranking-service
- RERANK_SERVER_PORT=${RERANK_SERVER_PORT:-80}
- LLM_SERVER_HOST_IP=vllm-service
- LLM_SERVER_PORT=${LLM_SERVER_PORT:-80}
- LLM_SERVER_PORT=80
- LLM_MODEL=${LLM_MODEL_ID}
- LOGFLAG=${LOGFLAG}
ipc: host
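
The `healthcheck` added to the dataprep service, combined with `condition: service_healthy` under `depends_on`, means the backend is started only after the health endpoint responds. The same gating can be approximated by hand with a retry loop. This is a sketch only: the helper name `wait_healthy` is an assumption and is not part of the compose files.

```shell
# Hypothetical helper (not part of the repo): run a check command up to
# RETRIES times, sleeping INTERVAL seconds after each failed attempt.
# Returns 0 on the first successful check, 1 if every attempt fails.
wait_healthy() {
  local retries="$1" interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Example mirroring the healthcheck values above (interval 10s, timeout 5s,
# retries 50). The compose healthcheck runs inside the container, so from the
# host you would substitute the published port for 5000.
# wait_healthy 50 10 curl -f --max-time 5 http://localhost:5000/v1/health_check
```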


@@ -25,8 +25,14 @@ services:
INDEX_NAME: ${INDEX_NAME}
TEI_ENDPOINT: http://tei-embedding-service:80
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-server
ports:
- "6006:80"
@@ -59,7 +65,7 @@ services:
RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS"
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "8808:80"
@@ -121,12 +127,20 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-xeon-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- retriever
- tei-reranking-service
- vllm-service
- llm-faqgen
redis-vector-db:
condition: service_started
tei-embedding-service:
condition: service_started
retriever:
condition: service_started
tei-reranking-service:
condition: service_started
vllm-service:
condition: service_started
llm-faqgen:
condition: service_started
dataprep-redis-service:
condition: service_healthy
ports:
- ${CHATQNA_BACKEND_PORT:-8888}:8888
environment:


@@ -25,8 +25,14 @@ services:
INDEX_NAME: ${INDEX_NAME}
TEI_ENDPOINT: http://tei-embedding-service:80
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5000/v1/health_check || exit 1"]
interval: 10s
timeout: 5s
retries: 50
restart: unless-stopped
tei-embedding-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-server
ports:
- "6006:80"
@@ -59,7 +65,7 @@ services:
RETRIEVER_COMPONENT_NAME: "OPEA_RETRIEVER_REDIS"
restart: unless-stopped
tei-reranking-service:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.6
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-reranking-server
ports:
- "8808:80"
@@ -121,12 +127,20 @@ services:
image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
container_name: chatqna-xeon-backend-server
depends_on:
- redis-vector-db
- tei-embedding-service
- retriever
- tei-reranking-service
- tgi-service
- llm-faqgen
redis-vector-db:
condition: service_started
tei-embedding-service:
condition: service_started
retriever:
condition: service_started
tei-reranking-service:
condition: service_started
tgi-service:
condition: service_started
llm-faqgen:
condition: service_started
dataprep-redis-service:
condition: service_healthy
ports:
- ${CHATQNA_BACKEND_PORT:-8888}:8888
environment:

Some files were not shown because too many files have changed in this diff.