414 Commits

Author SHA1 Message Date
ZePan110
799881a3fa Remove perf test code from test scripts. (#1510)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-02-18 16:23:49 +08:00
xiguiw
0c0edffc5b update vLLM CPU to the latest stable version (#1546)
Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2025-02-17 08:26:25 +08:00
Kendall González León
80dd86f122 Make a fix in the main README.md of the ChatQnA. (#1551)
Signed-off-by: Kendall González León <kendall.gonzalez.leon@intel.com>
2025-02-14 17:00:44 +08:00
Louie Tsai
970b869838 Add a new section to change LLM model such as deepseek based on validated model table in LLM microservice (#1501)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Wang, Kai Lawrence <109344418+wangkl2@users.noreply.github.com>
Co-authored-by: xiguiw <111278656+xiguiw@users.noreply.github.com>
2025-02-12 09:34:56 +08:00
XinyaoWa
87ff149f61 Remove vllm hpu triton version fix (#1515)
vllm-fork has fixed the triton version issue, so remove the duplicated code: https://github.com/HabanaAI/vllm-fork/blob/habana_main/requirements-hpu.txt

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
2025-02-12 09:24:38 +08:00
chen, suyue
81b02bb947 Revert "HUGGINGFACEHUB_API_TOKEN environment is change to HF_TOKEN (#… (#1521)
Revert this PR since the test was not triggered properly due to the mistaken merge of a WIP CI PR, 44a689b0bf, which blocked the CI test.

This change will be submitted in another PR.
2025-02-11 18:36:12 +08:00
Louie Tsai
47069ac70c fix a test script issue due to name change for telemetry yaml files (#1516)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-02-11 17:58:42 +08:00
Louie Tsai
ad5523bac7 Enable OpenTelemtry Tracing for ChatQnA on Xeon and Gaudi by docker compose merge feature (#1488)
Signed-off-by: Louie, Tsai <louie.tsai@intel.com>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-02-10 22:58:50 -08:00
xiguiw
45d5da2ddd HUGGINGFACEHUB_API_TOKEN environment is change to HF_TOKEN (#1503)
Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
2025-02-09 20:33:06 +08:00
Louie Tsai
4c41a5db83 Update README.md for OPEA OTLP tracing (#1406)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
2025-02-05 13:03:15 -08:00
Liang Lv
9adf7a6af0 Add support for latest deepseek models on Gaudi (#1491)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-02-05 08:30:04 +08:00
bjzhjing
ed163087ba Provide unified scalable deployment and benchmarking support for exam… (#1315)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-01-24 22:27:49 +08:00
chen, suyue
259099d19f Remove kubernetes manifest related code and tests (#1466)
Remove deprecated kubernetes manifest related code and tests.
The helm-chart-based k8s implementation for those examples is targeted for the next release.

Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-24 15:23:12 +08:00
chen, suyue
9a1118730b Freeze the triton version in vllm-gaudi image to 3.1.0 (#1463)
The new triton version 3.2.0 can't work with vllm-gaudi. Freeze the triton version in vllm-gaudi image to 3.1.0.

Issue created for vllm-fork: HabanaAI/vllm-fork#732
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-24 09:50:59 +08:00
dolpher
9b0f98be8b Update ChatQnA helm chart README. (#1459)
Signed-off-by: Dolpher Du <dolpher.du@intel.com>
2025-01-23 10:54:39 +08:00
Ervin Castelino
27fdbcab58 [chore/chatqna] Missing protocol in curl command (#1447)
This PR fixes the missing protocol in the curl command mentioned in the chatqna readme for tei-embedding-service.
2025-01-22 21:41:47 +08:00
dolpher
ee0e5cc8d9 Sync value files from GenAIInfra (#1428)
All gaudi values updated with extra flags.
Added helm support for 2 new examples Text2Image and SearchQnA. Minor fix for llm-uservice.

Signed-off-by: Dolpher Du <dolpher.du@intel.com>
2025-01-22 17:44:11 +08:00
WenjiaoYue
b721c256f9 Fix Domain Access Issue in Latest Vite Version (#1444)
Fix the restriction on using domain names when users are using the latest version of Vite

When users use the new version of Vite, the UI cannot be accessed via domain names due to Vite's new rules. This fix adds the corresponding parameters according to Vite's new rules, ensuring that users can access the frontend via domain names when building the UI.

Fixes #1441

Co-authored-by: WenjiaoYue <wenjiao.yue@intel.com>
2025-01-21 23:28:37 +08:00
chen, suyue
927698e23e Simplify git clone code in CI test (#1434)
1. Simplify git clone code in CI test.
2. Replace git clone branch in Dockerfile.

Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-21 23:00:08 +08:00
Wang, Kai Lawrence
284db982be [ROCm] Fix the hf-token setting for TGI and TEI in ChatQnA (#1432)
This PR corrects the env variable names passed to the TGI and TEI docker containers in the chatqna example on the ROCm platform. TGI can parse either HF_TOKEN or HUGGING_FACE_HUB_TOKEN, while TEI can parse HF_API_TOKEN.

TGI: https://github.com/huggingface/text-generation-inference/blob/main/router/src/server.rs#L1700C1-L1702C15
TEI: https://github.com/huggingface/text-embeddings-inference/blob/main/router/src/main.rs#L112

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
2025-01-21 14:22:39 +08:00
Wang, Kai Lawrence
3d3ac59bfb [ChatQnA] Update the default LLM to llama3-8B on cpu/gpu/hpu (#1430)
Update the default LLM to llama3-8B on cpu/nvgpu/amdgpu/gaudi for docker-compose deployment to avoid the potential model serving issue or the missing chat-template issue using neural-chat-7b.

Slow serving issue of neural-chat-7b on ICX: #1420
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
2025-01-20 22:47:56 +08:00
chen, suyue
7a54064d65 remove Dockerfile.wrapper (#1429)
Remove Dockerfile.wrapper: it is no longer used and no test covers it, so removing it avoids regressions.

Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-20 20:49:18 +08:00
Liang Lv
0f7e5a37ac Adapt code for dataprep microservice refactor (#1408)
https://github.com/opea-project/GenAIComps/pull/1153

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-01-20 20:37:03 +08:00
xiguiw
2d5898244c Enchance health check in GenAIExample docker-compose (#1410)
Fix service launch issue

1. Update Gaudi TGI image from 2.0.6 to 2.3.1
2. Change the hpu-gaudi TGI health check condition.

Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
2025-01-20 20:13:13 +08:00
chen, suyue
6bfd156573 Clean up test scripts and enhance git clone (#1417)
1. Clean up test code in scripts.
2. Simplify git clone code.
3. Replace git clone branch in Dockerfile.

Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-20 16:34:28 +08:00
Wang, Kai Lawrence
742cb6ddd3 [ChatQnA] Switch to vLLM as default llm backend on Xeon (#1403)
Switching from TGI to vLLM as the default LLM serving backend on Xeon for the ChatQnA example to enhance the perf.

https://github.com/opea-project/GenAIExamples/issues/1213
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
2025-01-17 20:48:19 +08:00
Wang, Kai Lawrence
00e9da9ced [ChatQnA] Switch to vLLM as default llm backend on Gaudi (#1404)
Switching from TGI to vLLM as the default LLM serving backend on Gaudi for the ChatQnA example to enhance the perf. 

https://github.com/opea-project/GenAIExamples/issues/1213
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
2025-01-17 20:46:38 +08:00
XinyaoWa
301b5e9a69 Fix vllm hpu to a stable release (#1398)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2025-01-16 16:35:32 +08:00
Letong Han
4cabd55778 Refactor Retrievers related Examples (#1387)
Delete redundant retrievers docker image in docker_images_list.md.
Refactor Retrievers related Examples READMEs.
Change all comps/retrievers/xxx/xxx/Dockerfile paths to comps/retrievers/src/Dockerfile.

Fix the Examples CI issues of PR opea-project/GenAIComps#1138.
Signed-off-by: letonghan <letong.han@intel.com>
2025-01-16 14:21:48 +08:00
xiguiw
698a06edbf [DOC] Fix document issue (#1395)
Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
2025-01-16 11:30:07 +08:00
Eero Tamminen
0eae391fda Use staged builds to minimize final image sizes (#1031)
Staged image builds so that final images do not have redundant things like:
- Git tool and its deps
- Git repo history
- Test directories

Fixes: #225
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2025-01-16 11:14:47 +08:00
XinyaoWa
7d218b9f36 Remove vllm hpu commit id limit (#1386)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-01-14 11:05:32 +08:00
Ying Hu
91ff520baa Update README.md for add K8S cluster link for Gaudi (#1380)
2025-01-13 09:33:58 +08:00
Liang Lv
3ca78867eb Update example code for embedding dependency moving to 3rd_party (#1368)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
2025-01-10 15:36:58 +08:00
dolpher
c795ef2203 Add helm deployment instructions for GenAIExamples (#1373)
Add helm deployment instructions for ChatQnA, AgentQnA, AudioQnA, CodeTrans, DocSum, FaqGen and VisualQnA

Signed-off-by: Dolpher Du <dolpher.du@intel.com>
2025-01-10 09:55:31 +08:00
Louie Tsai
81022355a7 Enable OpenTelemetry Tracing for ChatQnA TGI serving on Gaudi (#1316)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-01-08 17:20:13 -08:00
Jaswanth Karani
ddacb7e86d fixed build issue (#1367)
2025-01-08 22:19:23 +08:00
Liang Lv
b3c405a5f6 Adapt example code for guardrails refactor (#1360)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-08 14:35:23 +08:00
chen, suyue
23117871c2 remove chatqna-conversation-ui build in CI test (#1361)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-08 12:09:33 +08:00
WenjiaoYue
9970605460 Adapt refactor comps (#1340)
Signed-off-by: WenjiaoYue
2025-01-08 10:36:24 +08:00
Pranav Singh
d2b49bbc82 [ChatQNA] Fix K8s Deployment for CPU/HPU (#1274)
Signed-off-by: Pranav Singh <pranav.singh@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-01-07 13:45:09 +08:00
ZePan110
ed2b8ed983 Exclude dockerfile under tests and exclude check Dockerfile under tests. (#1354)
Signed-off-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-01-07 09:05:01 +08:00
ZePan110
aa5c91d7ee Check duplicated dockerfile (#1289)
Signed-off-by: ZePan110 <ze.pan@intel.com>
2025-01-06 17:30:12 +08:00
XinyaoWa
464e2d3125 Rename streaming to stream to align with OpenAI API (#1332)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
2025-01-06 13:25:55 +08:00
chen, suyue
1f29eca288 fix chatqna benchmark without rerank config issue (#1341)
Signed-off-by: chensuyue <suyue.chen@intel.com>
2025-01-06 09:16:20 +08:00
chen, suyue
5c7a5bd850 Update Code and README for GenAIComps Refactor (#1285)
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Signed-off-by: letonghan <letong.han@intel.com>
Signed-off-by: ZePan110 <ze.pan@intel.com>
Signed-off-by: WenjiaoYue <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>
2025-01-02 20:03:26 +08:00
Ying Hu
597f17b979 Update set_env.sh to fix LOGFLAG warning (#1319)
2024-12-30 10:54:26 +08:00
Daniel De León
b27b48c488 Add microservice resources to no_proxy in the main ChatQnA README (#1269)
Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2024-12-27 16:14:28 +08:00
bjzhjing
7d9b34cf5e Chatqna/benchmark: Remove the deprecated directory (#1261)
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2024-12-19 10:51:01 +08:00
lkk
2af1ea0f8e remove examples gateway. (#1243)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-12-13 15:16:11 +08:00