* Pass down model id for ChatQnA
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update logic
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* update gateway
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
* update the gateway
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
* update the gateway
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix history content from agent memory.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Embedding TEI Langchain compatible with OpenAI API
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* TextDoc support list
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* support tei llama index openai compatible API
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* support mosec langchain openai compatible API
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* update UT for embedding tests
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix ut bug
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* support embedding predictionguard openai compatible API
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* support embedding multimodal clip OpenAI compatible API
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* fix bug
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* enable debug mode for embedding UT
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
* Drop dump_outputs() method that obfuscates the code
dump_outputs() method in ServiceOrchestrator:
* Is not a real method (does not use self)
* Adds a member to a dict instead of "dump"ing (dropping or outputting) something
* Obfuscates how the schedule() method return value is constructed, and
* Makes the calling code unnecessarily longer
The similar method in "ServiceOrchestratorWithYaml" is reasonable except
for the name, but drop that too for consistency.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
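To illustrate the dump_outputs() removal described above, here is a minimal sketch. The class names, signatures and response strings are hypothetical and are not copied from ServiceOrchestrator; the sketch only contrasts a helper that never uses self and merely stores a dict entry with the direct assignment that replaces it.

```python
class OrchestratorBefore:
    # Hypothetical "before" shape: the helper ignores self and only sets a dict entry.
    def dump_outputs(self, node, response, result_dict):
        result_dict[node] = response

    def schedule(self, nodes):
        result_dict = {}
        for node in nodes:
            response = f"response from {node}"
            self.dump_outputs(node, response, result_dict)
        return result_dict


class OrchestratorAfter:
    # Hypothetical "after" shape: how the return value is built is visible inline.
    def schedule(self, nodes):
        result_dict = {}
        for node in nodes:
            result_dict[node] = f"response from {node}"
        return result_dict
```

Both variants return the same dict; the second one is shorter and shows directly how the schedule() result is assembled.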
* Apply pylint simplification suggestion to execute()
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
---------
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: Sihan Chen <39623753+Spycsh@users.noreply.github.com>
* Multiple models support for langchain vLLM text-generation
Signed-off-by: sgurunat <gurunath.s@intel.com>
* Add authentication support for langchain vLLM text-generation remote endpoints
Signed-off-by: sgurunat <gurunath.s@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update gateway and docarray from mega and proto services to have model field for ChatQnAGateway and LLMParams respectively
* Add load_model_configs method in utils.py to validate and load the model_configs
* Update llms text-generation tgi file (llm.py) to support multiple models. Uses load_model_configs method from utils
* Update llms text-generation tgi template to add different templates for different models
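As a rough illustration of the "validate and load" step mentioned above, the sketch below parses a JSON string of model configs and indexes it by model name. It is not the utils.py implementation: the required keys (model_name, endpoint) and the error messages are assumptions made for the example.

```python
import json

# Hypothetical required keys; the real list lives in utils.py and may differ.
REQUIRED_KEYS = ("model_name", "endpoint")


def load_model_configs(model_configs: str) -> dict:
    """Parse a JSON list of model configs and index it by model name."""
    try:
        configs = json.loads(model_configs)
    except json.JSONDecodeError as err:
        raise ValueError(f"model_configs is not valid JSON: {err}") from err
    if not isinstance(configs, list) or not configs:
        raise ValueError("model_configs must be a non-empty JSON list")

    configs_map = {}
    for config in configs:
        if not isinstance(config, dict):
            raise ValueError("each model config must be a JSON object")
        # Reject entries where a required key is absent or empty.
        missing = [key for key in REQUIRED_KEYS if not config.get(key)]
        if missing:
            raise ValueError(f"model config is missing required keys: {missing}")
        configs_map[config["model_name"]] = config
    return configs_map
```

A caller could then look up the endpoint for the model id passed down in the request, for example configs_map["some-model"]["endpoint"], and fail early when the id is unknown.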
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fixed llm_endpoint empty string issue on error scenario
Signed-off-by: sgurunat <gurunath.s@intel.com>
* Function to get llm_endpoint and keep the code clean
Signed-off-by: sgurunat <gurunath.s@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add model parameter for DocSumGateway in gateway.py file
Signed-off-by: sgurunat <gurunath.s@intel.com>
* Add langchain vllm support for DocSum along with authentication support for vllm endpoints
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated docker_compose_llm.yaml and README file with vLLM information
Signed-off-by: sgurunat <gurunath.s@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated docsum-vllm Dockerfile into llm-compose-cd.yaml under github workflows
Signed-off-by: sgurunat <gurunath.s@intel.com>
* Updated llm-compose.yaml file to include vllm summarization docker build
Signed-off-by: sgurunat <gurunath.s@intel.com>
---------
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
* Add model parameter for FaqGenGateway in gateway.py file
Signed-off-by: sgurunat <gurunath.s@intel.com>
* Add langchain vllm support for FaqGen along with authentication support for vllm endpoints
Signed-off-by: sgurunat <gurunath.s@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated docker_compose_llm.yaml and README file with vLLM information
Signed-off-by: sgurunat <gurunath.s@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated faq-vllm Dockerfile into llm-compose-cd.yaml under github workflows
Signed-off-by: sgurunat <gurunath.s@intel.com>
* Updated llm-compose.yaml file to include vllm faqgen build
Signed-off-by: sgurunat <gurunath.s@intel.com>
---------
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add "megaservice_request_pending" metric
Unlike other megaservice ServiceOrchestrator metrics, this one can also
cover non-streaming requests, as suggested in PR review.
It avoids the issues that the prometheus-fastapi-instrumentator
"inprogress" metric had:
* Extra instances that have to be differentiated, e.g. for CI
* Reliance on a name -> suffix mapping passed through obscure kwargs calls
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
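The idea behind a pending-requests metric, as described in the commit above, can be sketched with a gauge that is incremented when a request starts and decremented when it finishes, on both the streaming and the non-streaming path. The PendingGauge class below is a stdlib stand-in for a Prometheus gauge, and the handler names are hypothetical; this is not the ServiceOrchestrator code.

```python
import threading


class PendingGauge:
    """Stdlib stand-in for a Prometheus gauge (hypothetical, for illustration)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self):
        with self._lock:
            self._value += 1

    def dec(self):
        with self._lock:
            self._value -= 1

    @property
    def value(self):
        return self._value


request_pending = PendingGauge()


def handle_request(prompt):
    """Non-streaming path: counted as pending for the duration of the call."""
    request_pending.inc()
    try:
        return prompt.upper()
    finally:
        request_pending.dec()


def handle_streaming_request(prompt):
    """Streaming path: counted as pending from the first next() call
    until the generator is exhausted or closed."""
    request_pending.inc()
    try:
        for token in prompt.split():
            yield token
    finally:
        request_pending.dec()
```

Using try/finally keeps the count balanced even when a handler raises or a client stops consuming the stream and the generator is closed.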
* Remove HTTP "inprogress" gauge as redundant
Now that ServiceOrchestrator provides a pending metric.
Reverts the "inprogress" metric part of commit a6998a1dbd.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Document megaservice metrics
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* draft a demo code for memory.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add agent short-term memory with langgraph checkpoint.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add save long-term memory func.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add save long-term memory func.
* add timeout for llm response.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix ut by adding -e HABANA_VISIBLE_DEVICES=all.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add model parameter for CodeGenGateway in gateway.py file
Signed-off-by: sgurunat <gurunath.s@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: sgurunat <gurunath.s@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* added dpo support.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* make dpo trainer compatible with newest transformers.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* added ut for dpo.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* added training success check in finetuning ut.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* updated broken link.
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
---------
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
* Fixed the issue of asynchronous call failure for MosecEmbeddings
Signed-off-by: Yao, Qing <qing.yao@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add import asyncio
Signed-off-by: Yao, Qing <qing.yao@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Yao, Qing <qing.yao@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add vllm Arc Dockerfile support
Support vllm inference on Intel ARC GPU
Signed-off-by: Li Gang <gang.g.li@intel.com>
Co-authored-by: Chen, Hu1 <hu1.chen@intel.com>
* Add vLLM ARC support
With vLLM official repo: https://github.com/vllm-project/vllm/
based on openvino backend
Dockerfile is based on Dockerfile.openvino
https://github.com/vllm-project/vllm/blob/main/Dockerfile.openvino
And add ARC support packages
Default model: meta-llama/Llama-3.2-3B-Instruct to fit ARC A770 VRAM
Signed-off-by: Li Gang <gang.g.li@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add README and .github workflow for vLLM ARC support
Signed-off-by: Li Gang <gang.g.li@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update comps/llms/text-generation/vllm/langchain/README.md
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Rename Dockerfile to meet Contribution Guidelines
Signed-off-by: Li Gang <gang.g.li@intel.com>
* Align image names as opea/vllm-arc:latest
Signed-off-by: Li Gang <gang.g.li@intel.com>
---------
Signed-off-by: Li Gang <gang.g.li@intel.com>
Co-authored-by: Chen, Hu1 <hu1.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
* support faqgen upload file in UI
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* draft static batching embedding/reranking on single gaudi card
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
* resolve segfault, deadlock and other issues
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* narrow down default timeout
* add dockerfile
* fix hpu local microservice start
* openai format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* configurable timeout
* lower timeout
* fix
* lower default timeout
* bf16
* log, pad max_len
* autocast, 128
* fix acc issue
* perf fallback with no acc drop
* revert no-padding ones
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix hpu graph wrapper
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add padding batch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* habana 1.18
* static -> dynamic
* add UT, add param in_single_process
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add docker file
* fix case doc empty, and pass model id from env
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* CI
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
* Support converting mega.yaml to docker compose yaml.
* Remove device option in opea mega exporter.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* updated manifests exporter
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* updated manifests_exporter.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* updated mega.yaml & updated manifests_exporter
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* done
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* cleancode
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* cleancode and refactor to function
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* added UT for manifests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix for UT.
* fixed the UT issue.
* merged to one file.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zhenzhong1 <zhenzhong.xu@intel.com>
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix typos in BaseStatistics method names
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Add HttpService "inprogress" (pending) request count metrics
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Add E2E Prometheus metrics to ServiceOrchestrator
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Fix: support metrics with multiple ServiceOrchestrator instances
Unlike apps, CI tests create multiple of them.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Fix: require named MicroService -> HTTPService instances
Creating multiple MicroService()s creates multiple HTTPService()s,
which in turn create multiple Prometheus FastAPI instrumentator instances.
While the latter handled that fine for ChatQnA and normal HTTP metrics,
that was not the case for its "inprogress" metrics in CI.
Therefore the MicroService constructor name argument is now mandatory, so
that it can be used to make "inprogress" metrics for HTTPService
instances unique.
PS. The instrumentator requires an HTTPService-instance-specific Starlette
instance, so it cannot be made a singleton.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
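Why a mandatory name helps, as explained in the commit above, can be sketched as follows: if every service instance registers an "inprogress" metric under the same name, the second registration collides, whereas a per-instance name suffix keeps the metric names unique. MetricRegistry and NamedService are hypothetical stand-ins, and the metric name prefix is an assumption; this is not the MicroService or HTTPService code.

```python
class MetricRegistry:
    """Stand-in for a metrics registry that rejects duplicate names (hypothetical)."""

    def __init__(self):
        self._names = set()

    def register(self, name):
        if name in self._names:
            raise ValueError(f"duplicate metric name: {name}")
        self._names.add(name)
        return name


REGISTRY = MetricRegistry()


class NamedService:
    """Hypothetical service whose mandatory name makes its metric name unique."""

    def __init__(self, name):
        if not name:
            raise ValueError("service name is mandatory")
        self.name = name
        # The instance name becomes the metric suffix, so two differently
        # named instances in one process do not collide.
        self.inprogress_metric = REGISTRY.register(f"http_requests_inprogress_{name}")
```

With this shape, a CI test that creates several services in one process only fails if two of them are given the same name.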
* Fix: update test_token_generator()
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>