Update README with new examples (#808)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Authored by lvliang-intel on 2024-09-14 09:41:51 +08:00; committed by GitHub
parent b84c98983d
commit 2d28bebac6
3 changed files with 62 additions and 16 deletions


@@ -37,17 +37,18 @@ Deployment are based on released docker images by default, check [docker image l
#### Deploy Examples
-| Use Case | Docker Compose<br/>Deployment on Xeon | Docker Compose<br/>Deployment on Gaudi | Kubernetes with Manifests | Kubernetes with Helm Charts | Kubernetes with GMC |
-| ----------- | ------------------------------------------------------------------------ | -------------------------------------------------------------------------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
-| ChatQnA | [Xeon Instructions](ChatQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](ChatQnA/docker_compose/intel/hpu/gaudi/README.md) | [ChatQnA with Manifests](ChatQnA/kubernetes/intel/README.md) | [ChatQnA with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/chatqna/README.md) | [ChatQnA with GMC](ChatQnA/kubernetes/intel/README_gmc.md) |
-| CodeGen | [Xeon Instructions](CodeGen/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](CodeGen/docker_compose/intel/hpu/gaudi/README.md) | [CodeGen with Manifests](CodeGen/kubernetes/intel/README.md) | [CodeGen with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/codegen/README.md) | [CodeGen with GMC](CodeGen/kubernetes/intel/README_gmc.md) |
-| CodeTrans | [Xeon Instructions](CodeTrans/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](CodeTrans/docker_compose/intel/hpu/gaudi/README.md) | [CodeTrans with Manifests](CodeTrans/kubernetes/intel/README.md) | [CodeTrans with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/codetrans/README.md) | [CodeTrans with GMC](CodeTrans/kubernetes/intel/README_gmc.md) |
-| DocSum | [Xeon Instructions](DocSum/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](DocSum/docker_compose/intel/hpu/gaudi/README.md) | [DocSum with Manifests](DocSum/kubernetes/intel/README.md) | [DocSum with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/docsum/README.md) | [DocSum with GMC](DocSum/kubernetes/intel/README_gmc.md) |
-| SearchQnA | [Xeon Instructions](SearchQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](SearchQnA/docker_compose/intel/hpu/gaudi/README.md) | Not Supported | Not Supported | [SearchQnA with GMC](SearchQnA/kubernetes/intel/README_gmc.md) |
-| FaqGen | [Xeon Instructions](FaqGen/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](FaqGen/docker_compose/intel/hpu/gaudi/README.md) | [FaqGen with Manifests](FaqGen/kubernetes/intel/README.md) | Not Supported | [FaqGen with GMC](FaqGen/kubernetes/intel/README_gmc.md) |
-| Translation | [Xeon Instructions](Translation/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](Translation/docker_compose/intel/hpu/gaudi/README.md) | Not Supported | Not Supported | [Translation with GMC](Translation/kubernetes/intel/README_gmc.md) |
-| AudioQnA | [Xeon Instructions](AudioQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](AudioQnA/docker_compose/intel/hpu/gaudi/README.md) | [AudioQnA with Manifests](AudioQnA/kubernetes/intel/README.md) | Not Supported | [AudioQnA with GMC](AudioQnA/kubernetes/intel/README_gmc.md) |
-| VisualQnA | [Xeon Instructions](VisualQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](VisualQnA/docker_compose/intel/hpu/gaudi/README.md) | [VisualQnA with Manifests](VisualQnA/kubernetes/intel/README.md) | Not Supported | [VisualQnA with GMC](VisualQnA/kubernetes/intel/README_gmc.md) |
+| Use Case | Docker Compose<br/>Deployment on Xeon | Docker Compose<br/>Deployment on Gaudi | Kubernetes with Manifests | Kubernetes with Helm Charts | Kubernetes with GMC |
+| ----------------- | ------------------------------------------------------------------------------ | -------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
+| ChatQnA | [Xeon Instructions](ChatQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](ChatQnA/docker_compose/intel/hpu/gaudi/README.md) | [ChatQnA with Manifests](ChatQnA/kubernetes/intel/README.md) | [ChatQnA with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/chatqna/README.md) | [ChatQnA with GMC](ChatQnA/kubernetes/intel/README_gmc.md) |
+| CodeGen | [Xeon Instructions](CodeGen/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](CodeGen/docker_compose/intel/hpu/gaudi/README.md) | [CodeGen with Manifests](CodeGen/kubernetes/intel/README.md) | [CodeGen with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/codegen/README.md) | [CodeGen with GMC](CodeGen/kubernetes/intel/README_gmc.md) |
+| CodeTrans | [Xeon Instructions](CodeTrans/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](CodeTrans/docker_compose/intel/hpu/gaudi/README.md) | [CodeTrans with Manifests](CodeTrans/kubernetes/intel/README.md) | [CodeTrans with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/codetrans/README.md) | [CodeTrans with GMC](CodeTrans/kubernetes/intel/README_gmc.md) |
+| DocSum | [Xeon Instructions](DocSum/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](DocSum/docker_compose/intel/hpu/gaudi/README.md) | [DocSum with Manifests](DocSum/kubernetes/intel/README.md) | [DocSum with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/docsum/README.md) | [DocSum with GMC](DocSum/kubernetes/intel/README_gmc.md) |
+| SearchQnA | [Xeon Instructions](SearchQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](SearchQnA/docker_compose/intel/hpu/gaudi/README.md) | Not Supported | Not Supported | [SearchQnA with GMC](SearchQnA/kubernetes/intel/README_gmc.md) |
+| FaqGen | [Xeon Instructions](FaqGen/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](FaqGen/docker_compose/intel/hpu/gaudi/README.md) | [FaqGen with Manifests](FaqGen/kubernetes/intel/README.md) | Not Supported | [FaqGen with GMC](FaqGen/kubernetes/intel/README_gmc.md) |
+| Translation | [Xeon Instructions](Translation/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](Translation/docker_compose/intel/hpu/gaudi/README.md) | Not Supported | Not Supported | [Translation with GMC](Translation/kubernetes/intel/README_gmc.md) |
+| AudioQnA | [Xeon Instructions](AudioQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](AudioQnA/docker_compose/intel/hpu/gaudi/README.md) | [AudioQnA with Manifests](AudioQnA/kubernetes/intel/README.md) | Not Supported | [AudioQnA with GMC](AudioQnA/kubernetes/intel/README_gmc.md) |
+| VisualQnA | [Xeon Instructions](VisualQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](VisualQnA/docker_compose/intel/hpu/gaudi/README.md) | [VisualQnA with Manifests](VisualQnA/kubernetes/intel/README.md) | Not Supported | [VisualQnA with GMC](VisualQnA/kubernetes/intel/README_gmc.md) |
+| ProductivitySuite | [Xeon Instructions](ProductivitySuite/docker_compose/intel/cpu/xeon/README.md) | Not Supported | [ProductivitySuite with Manifests](ProductivitySuite/kubernetes/intel/README.md) | Not Supported | Not Supported |
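
Whichever path is chosen, each deployment ultimately exposes the example's megaservice over HTTP. As a quick smoke test, a minimal Python sketch, assuming the default ChatQnA megaservice port (8888) and route (`/v1/chatqna`) from the ChatQnA README; adjust the host and port to match your deployment:

```python
import requests

# Default ChatQnA megaservice endpoint per its README; adjust for your host.
CHATQNA_URL = "http://localhost:8888/v1/chatqna"

resp = requests.post(
    CHATQNA_URL,
    json={"messages": "What is the revenue of Nike in 2023?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.text)  # answer text (may be streamed, depending on configuration)
```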
## Supported Examples


@@ -1,4 +1,4 @@
-# Build Mega Service of videoqna on Xeon
+# Build Mega Service of VideoQnA on Xeon
This document outlines the deployment process for a VideoQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `lvm`. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.
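
The mega service composes those microservices into a single pipeline. A minimal sketch of that wiring, following the `ServiceOrchestrator` pattern used across GenAIExamples; the service names, hosts, ports, and endpoints below are illustrative, not the actual VideoQnA defaults:

```python
from comps import MicroService, ServiceOrchestrator, ServiceType

# Illustrative endpoints; the real compose files define the actual hosts/ports.
embedding = MicroService(name="embedding", host="0.0.0.0", port=6000,
                         endpoint="/v1/embeddings", use_remote_service=True,
                         service_type=ServiceType.EMBEDDING)
retriever = MicroService(name="retriever", host="0.0.0.0", port=7000,
                         endpoint="/v1/retrieval", use_remote_service=True,
                         service_type=ServiceType.RETRIEVER)
rerank = MicroService(name="rerank", host="0.0.0.0", port=8000,
                      endpoint="/v1/reranking", use_remote_service=True,
                      service_type=ServiceType.RERANK)
lvm = MicroService(name="lvm", host="0.0.0.0", port=9000,
                   endpoint="/v1/lvm", use_remote_service=True,
                   service_type=ServiceType.LVM)

# Wire the stages into a DAG: embed -> retrieve -> rerank -> LVM answer.
pipeline = ServiceOrchestrator()
pipeline.add(embedding).add(retriever).add(rerank).add(lvm)
pipeline.flow_to(embedding, retriever)
pipeline.flow_to(retriever, rerank)
pipeline.flow_to(rerank, lvm)
```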


@@ -60,8 +60,53 @@ This document introduces the supported examples of GenAIExamples. The supported
[VisualQnA](./VisualQnA/README.md) is an example of a chatbot for question answering based on images.
-| LLM | HW | Description |
-| --------------------------------------------------------------- | ------ | ----------- |
-| [LLaVA-1.5-7B](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | Gaudi2 | Chatbot |
+| LVM | HW | Description |
+| --------------------------------------------------------------------------------------------- | ------ | ----------- |
+| [llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) | Gaudi2 | Chatbot |
> **_NOTE:_** The `Language Translation`, `SearchQnA`, `VisualQnA`, and other use cases not listed here are in active development. The code structure of these use cases is subject to change.
+### VideoQnA
+[VideoQnA](./VideoQnA/README.md) is an example of a chatbot for question answering over videos. It retrieves videos based on the provided user prompt, using only video embeddings for vector similarity search in Intel's VDMS vector database, and performs all operations on Intel Xeon CPUs. The pipeline supports long-form videos and time-based search. A minimal retrieval sketch follows the table below.
+By default, the embedding and LVM models are set to the values listed below:
+| Service | Model | HW | Description |
+| --------- | ----------------------------------------------------------------------------------- | ---- | ------------------------ |
+| Embedding | [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) | Xeon | Video embeddings service |
+| LVM | [DAMO-NLP-SG/Video-LLaMA](https://huggingface.co/DAMO-NLP-SG/VideoLLaMA2-7B) | Xeon | LVM service |
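
To make the retrieval step concrete, here is a minimal sketch of text-to-video matching using mean-pooled CLIP frame embeddings; the VDMS store and long-video chunking are omitted, and the in-memory index is a stand-in:

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def embed_video(frames: list[Image.Image]) -> np.ndarray:
    """Mean-pool CLIP image embeddings over frames sampled from one video."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0).numpy()

def embed_text(query: str) -> np.ndarray:
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats[0].numpy()

def search(index: dict[str, np.ndarray], query: str) -> str:
    """Return the id of the best-matching video; VDMS plays this role in the example."""
    q = embed_text(query)
    return max(index, key=lambda vid: float(index[vid] @ q))
```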
+### RerankFinetuning
+The rerank model finetuning example trains a rerank model on a dataset to improve its capability in a specific field; a finetuning sketch follows the table below.
+By default, the base model is set to the value listed below:
+| Service | Base Model | HW | Description |
+| ----------------- | ------------------------------------------------------------------------- | ---- | ------------------------------- |
+| Rerank Finetuning | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Xeon | Rerank model finetuning service |
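
One way such finetuning looks in practice, a minimal sketch using the sentence-transformers `CrossEncoder` API rather than the OPEA finetuning service itself; the training pairs are toy placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Toy labeled (query, passage) pairs; real finetuning uses a full in-domain dataset.
train_samples = [
    InputExample(texts=["what is p99 latency?",
                        "p99 latency is the 99th-percentile response time."],
                 label=1.0),
    InputExample(texts=["what is p99 latency?",
                        "Redis is an in-memory key-value store."],
                 label=0.0),
]

model = CrossEncoder("BAAI/bge-reranker-large", num_labels=1)
loader = DataLoader(train_samples, shuffle=True, batch_size=2)
model.fit(train_dataloader=loader, epochs=1, warmup_steps=10)
model.save("bge-reranker-large-finetuned")
```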
+### InstructionTuning
+The Instruction Tuning example is designed to further train large language models (LLMs) on a dataset consisting of (instruction, output) pairs using supervised learning. This process bridges the gap between the LLM's original objective of next-word prediction and the user's objective of having the model follow human instructions accurately. By leveraging Instruction Tuning, this example enhances the LLM's ability to understand and execute specific tasks, improving the model's alignment with user instructions and its overall performance. A data-preparation sketch follows the table below.
+By default, the base model is set to the value listed below:
+| Service | Base Model | HW | Description |
+| ----------------- | ------------------------------------------------------------------------------------- | ---------- | ------------------------------------ |
+| InstructionTuning | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | Xeon/Gaudi | LLM model Instruction Tuning service |
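
To show what supervised learning on (instruction, output) pairs means mechanically, a minimal data-preparation sketch; the prompt template is illustrative, and loading the tokenizer assumes access to the gated Llama-2 checkpoint:

```python
from transformers import AutoTokenizer

# One hypothetical record; a real dataset holds many such pairs.
record = {
    "instruction": "Summarize: OPEA provides GenAI reference pipelines.",
    "output": "OPEA offers reference GenAI pipelines.",
}

# Gated model: requires an authorized Hugging Face token.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# The supervised target is prompt + desired answer; the loss is masked on
# the prompt so training only rewards producing the response tokens.
prompt = f"### Instruction:\n{record['instruction']}\n\n### Response:\n"
full_text = prompt + record["output"] + tok.eos_token

input_ids = tok(full_text, return_tensors="pt").input_ids[0]
prompt_len = len(tok(prompt).input_ids)
labels = input_ids.clone()
labels[:prompt_len] = -100  # -100 is ignored by the cross-entropy loss
```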
+### DocIndexRetriever
+The DocRetriever example demonstrates how to match user queries with free-text records using various retrieval methods. It plays a key role in Retrieval-Augmented Generation (RAG) systems by dynamically fetching relevant information from external sources, ensuring responses are factual and up-to-date. Powered by vector databases, DocRetriever enables efficient, semantic retrieval by storing data as vectors and quickly identifying the most relevant documents based on similarity. A retrieval sketch follows the table below.
+| Framework | Embedding | Vector Database | Serving | HW | Description |
+| ------------------------------------------------------------------------------ | --------------------------------------------------- | -------------------------- | --------------------------------------------------------------- | ----------- | -------------------------- |
+| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Document Retrieval Service |
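
A minimal sketch of the semantic-retrieval core, assuming a TEI container serving BAAI/bge-base-en on localhost:8080 and using an in-memory list in place of Redis; `/embed` is TEI's standard embedding route:

```python
import numpy as np
import requests

# Assumes a TEI container serving BAAI/bge-base-en locally; adjust the URL.
TEI_EMBED_URL = "http://localhost:8080/embed"

def embed(texts: list[str]) -> np.ndarray:
    resp = requests.post(TEI_EMBED_URL, json={"inputs": texts}, timeout=60)
    resp.raise_for_status()
    return np.asarray(resp.json())

# Stand-in corpus; the real example stores these vectors in Redis.
docs = [
    "OPEA composes GenAI microservices into pipelines.",
    "Redis can serve as a vector database for RAG.",
    "Gaudi2 is an AI accelerator for deep learning.",
]
doc_vecs = embed(docs)

query_vec = embed(["Which component stores vectors?"])[0]
scores = doc_vecs @ query_vec  # cosine similarity; TEI normalizes by default
print(docs[int(np.argmax(scores))])
```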
+### AgentQnA
+The AgentQnA example demonstrates a hierarchical, multi-agent system designed for question-answering tasks. A supervisor agent interacts directly with the user, delegating tasks to a worker agent and utilizing various tools to gather information and generate answers. The worker agent primarily uses a retrieval tool to respond to the supervisor's queries. Additionally, the supervisor can access other tools, such as APIs to query knowledge graphs, SQL databases, or external knowledge bases, to enhance the accuracy and relevance of its responses.
+The worker agent uses an open-source web search tool (DuckDuckGo), and the agents use OpenAI GPT-4o-mini as the LLM backend.
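
As a structural illustration only, a toy sketch of the supervisor/worker split, with stub functions standing in for GPT-4o-mini and the DuckDuckGo tool:

```python
def retrieval_tool(query: str) -> str:
    """Stub retrieval tool; the real worker searches the web or a knowledge base."""
    corpus = {"opea": "OPEA is a set of open GenAI reference examples."}
    return next((v for k, v in corpus.items() if k in query.lower()),
                "no match found")

class WorkerAgent:
    """Answers the supervisor's queries using its retrieval tool."""
    def handle(self, query: str) -> str:
        return retrieval_tool(query)

class SupervisorAgent:
    """Talks to the user; delegates lookups, then composes the final answer."""
    def __init__(self, worker: WorkerAgent):
        self.worker = worker

    def answer(self, question: str) -> str:
        evidence = self.worker.handle(question)  # delegate to the worker agent
        return f"Based on retrieved context: {evidence}"

print(SupervisorAgent(WorkerAgent()).answer("What is OPEA?"))
```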
+> **_NOTE:_** This example is in active development. The code structure of this use case is subject to change.