CodeGen Examples using RAG and Agents (#1757)

Signed-off-by: Mustafa <mustafa.cetin@intel.com>
This commit is contained in:
Mustafa
2025-04-09 01:12:20 -07:00
committed by GitHub
parent 8b7cb3539e
commit 892624f539
18 changed files with 1524 additions and 239 deletions


@@ -1,6 +1,6 @@
# Code Generation Application
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models undergo training with datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they possess a comprehensive understanding of the coding context, which includes elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions.
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models undergo training with datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they possess a comprehensive understanding of the coding context, which includes elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions. Additionally, Retrieval-Augmented Generation (RAG) and Agents are part of this CodeGen example; they provide an additional layer of intelligence and adaptability, ensuring that the generated code is not only relevant but also accurate, efficient, and tailored to the specific needs of developers and programmers.
The capabilities of CodeGen LLMs include:
@@ -28,7 +28,7 @@ config:
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 50px
fontSize: 25px
---
flowchart LR
%% Colors %%
@@ -37,34 +37,56 @@ flowchart LR
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style CodeGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph CodeGen-MegaService["CodeGen MegaService "]
subgraph CodeGen-MegaService["CodeGen-MegaService"]
direction LR
LLM([LLM MicroService]):::blue
EM([Embedding<br>MicroService]):::blue
RET([Retrieval<br>MicroService]):::blue
RER([Agents]):::blue
LLM([LLM<br>MicroService]):::blue
end
subgraph UserInterface[" User Interface "]
subgraph User Interface
direction LR
a([User Input Query]):::orchid
UI([UI server<br>]):::orchid
a([Submit Query Tab]):::orchid
UI([UI server]):::orchid
Ingest([Manage Resources]):::orchid
end
CLIP_EM{{Embedding<br>service}}
VDB{{Vector DB}}
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
GW([CodeGen GateWay]):::orange
LLM_gen{{LLM Service <br>}}
GW([CodeGen GateWay<br>]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
Ingest[Ingest data] --> UI
UI --> DP
DP <-.-> CLIP_EM
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> CodeGen-MegaService
EM ==> RET
RET ==> RER
RER ==> LLM
%% Embedding service flow
direction LR
EM <-.-> CLIP_EM
RET <-.-> V_RET
LLM <-.-> LLM_gen
direction TB
%% Vector DB interaction
V_RET <-.->VDB
DP <-.->VDB
```
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
@@ -95,11 +117,11 @@ Currently we support two ways of deploying ChatQnA services with docker compose:
By default, the LLM model is set as listed below:
| Service | Model |
| ------------ | --------------------------------------------------------------------------------------- |
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) |
| ------------ | ----------------------------------------------------------------------------------------- |
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
[Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model.
Change the `LLM_MODEL_ID` below for your needs, such as: [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
[Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model if needed.
Change the `LLM_MODEL_ID` below for your needs, such as: [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct), [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
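For example, a minimal sketch of overriding the model via an environment variable before starting the services (any compatible coder model from Hugging Face can be used):
```bash
# Example override; set before bringing up the Docker Compose stack
export LLM_MODEL_ID="deepseek-ai/deepseek-coder-6.7b-instruct"
```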
If you choose to use `meta-llama/CodeLlama-7b-hf` as the LLM model, you will need to visit [here](https://huggingface.co/meta-llama/CodeLlama-7b-hf) and click the `Expand to review and access` button to request model access.
@@ -134,22 +156,44 @@ To set up environment variables for deploying ChatQnA services, follow these ste
#### Deploy CodeGen on Gaudi
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
docker compose up -d
```
TGI service:
```bash
docker compose --profile codegen-gaudi-tgi up -d
```
vLLM service:
```bash
docker compose --profile codegen-gaudi-vllm up -d
```
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
#### Deploy CodeGen on Xeon
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
docker compose up -d
```
TGI service:
```bash
docker compose --profile codegen-xeon-tgi up -d
```
vLLM service:
```bash
docker compose --profile codegen-xeon-vllm up -d
```
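After the containers come up, an optional quick sanity check that all services are running:
```bash
# List container names and their current status
docker ps --format 'table {{.Names}}\t{{.Status}}'
```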
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
@@ -170,6 +214,15 @@ Two ways of consuming CodeGen Service:
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
If you want a CodeGen service with RAG and Agents based on dedicated documentation, include the `agents_flag` and `index_name` fields:
```bash
curl http://localhost:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
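The `index_name` must refer to documents previously ingested through the dataprep microservice. A sketch of such an ingestion request, assuming the default dataprep port (6007) and placeholder file names:
```bash
curl http://localhost:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./file1.pdf" \
  -F "files=@./file2.txt" \
  -F "index_name=my_API_document"
```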
2. Access via frontend
To access the frontend, open the following URL in your browser: http://{host_ip}:5173.

(Four binary image files added, not shown: 57 KiB, 26 KiB, 51 KiB, and 34 KiB.)


@@ -1,10 +1,11 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import ast
import asyncio
import os
from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps import CustomLogger, MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps.cores.mega.utils import handle_message
from comps.cores.proto.api_protocol import (
ChatCompletionRequest,
@@ -16,20 +17,98 @@ from comps.cores.proto.api_protocol import (
from comps.cores.proto.docarray import LLMParams
from fastapi import Request
from fastapi.responses import StreamingResponse
from langchain.prompts import PromptTemplate
logger = CustomLogger("opea_codegen_microservice")
logflag = os.getenv("LOGFLAG", False)
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 7778))
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
RETRIEVAL_SERVICE_HOST_IP = os.getenv("RETRIEVAL_SERVICE_HOST_IP", "0.0.0.0")
REDIS_RETRIEVER_PORT = int(os.getenv("REDIS_RETRIEVER_PORT", 7000))
TEI_EMBEDDING_HOST_IP = os.getenv("TEI_EMBEDDING_HOST_IP", "0.0.0.0")
EMBEDDER_PORT = int(os.getenv("EMBEDDER_PORT", 6000))
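# Prompt for the lightweight grading agent: given the user question and a single
# retrieved document, the LLM is asked to answer strictly 'yes' or 'no' on relevance.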
grader_prompt = """You are a grader assessing relevance of a retrieved document to a user question. \n
Here is the user question: {question} \n
Here is the retrieved document: \n\n {document} \n\n
If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Rules:
- Do not return the question, the provided document or explanation.
- if this document is relevant to the question, return 'yes' otherwise return 'no'.
- Do not include any other details in your response.
"""
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
"""Aligns the inputs based on the service type of the current node.
Parameters:
- self: Reference to the current instance of the class.
- inputs: Dictionary containing the inputs for the current node.
- cur_node: The current node in the service orchestrator.
- runtime_graph: The runtime graph of the service orchestrator.
- llm_parameters_dict: Dictionary containing the LLM parameters.
- kwargs: Additional keyword arguments.
Returns:
- inputs: The aligned inputs for the current node.
"""
# Check if the current service type is EMBEDDING
if self.services[cur_node].service_type == ServiceType.EMBEDDING:
# Store the input query for later use
self.input_query = inputs["query"]
# Set the input for the embedding service
inputs["input"] = inputs["query"]
# Check if the current service type is RETRIEVER
if self.services[cur_node].service_type == ServiceType.RETRIEVER:
# Extract the embedding from the inputs
embedding = inputs["data"][0]["embedding"]
# Align the inputs for the retriever service
inputs = {"index_name": llm_parameters_dict["index_name"], "text": self.input_query, "embedding": embedding}
return inputs
class CodeGenService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
self.megaservice = ServiceOrchestrator()
ServiceOrchestrator.align_inputs = align_inputs
self.megaservice_llm = ServiceOrchestrator()
self.megaservice_retriever = ServiceOrchestrator()
self.megaservice_retriever_llm = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.CODE_GEN)
def add_remote_service(self):
"""Adds remote microservices to the service orchestrators and defines the flow between them."""
# Define the embedding microservice
embedding = MicroService(
name="embedding",
host=TEI_EMBEDDING_HOST_IP,
port=EMBEDDER_PORT,
endpoint="/v1/embeddings",
use_remote_service=True,
service_type=ServiceType.EMBEDDING,
)
# Define the retriever microservice
retriever = MicroService(
name="retriever",
host=RETRIEVAL_SERVICE_HOST_IP,
port=REDIS_RETRIEVER_PORT,
endpoint="/v1/retrieval",
use_remote_service=True,
service_type=ServiceType.RETRIEVER,
)
# Define the LLM microservice
llm = MicroService(
name="llm",
host=LLM_SERVICE_HOST_IP,
@@ -38,13 +117,61 @@ class CodeGenService:
use_remote_service=True,
service_type=ServiceType.LLM,
)
self.megaservice.add(llm)
# Add the microservices to the megaservice_retriever_llm orchestrator and define the flow
self.megaservice_retriever_llm.add(embedding).add(retriever).add(llm)
self.megaservice_retriever_llm.flow_to(embedding, retriever)
self.megaservice_retriever_llm.flow_to(retriever, llm)
# Add the microservices to the megaservice_retriever orchestrator and define the flow
self.megaservice_retriever.add(embedding).add(retriever)
self.megaservice_retriever.flow_to(embedding, retriever)
# Add the LLM microservice to the megaservice_llm orchestrator
self.megaservice_llm.add(llm)
async def read_streaming_response(self, response: StreamingResponse):
"""Reads the streaming response from a StreamingResponse object.
Parameters:
- self: Reference to the current instance of the class.
- response: The StreamingResponse object to read from.
Returns:
- str: The complete response body as a decoded string.
"""
body = b"" # Initialize an empty byte string to accumulate the response chunks
async for chunk in response.body_iterator:
body += chunk # Append each chunk to the body
return body.decode("utf-8") # Decode the accumulated byte string to a regular string
async def handle_request(self, request: Request):
"""Handles the incoming request, processes it through the appropriate microservices,
and returns the response.
Parameters:
- self: Reference to the current instance of the class.
- request: The incoming request object.
Returns:
- ChatCompletionResponse: The response from the LLM microservice.
"""
# Parse the incoming request data
data = await request.json()
# Get the stream option from the request data, default to True if not provided
stream_opt = data.get("stream", True)
chat_request = ChatCompletionRequest.parse_obj(data)
# Validate and parse the chat request data
chat_request = ChatCompletionRequest.model_validate(data)
# Handle the chat messages to generate the prompt
prompt = handle_message(chat_request.messages)
# Get the agents flag from the request data, default to False if not provided
agents_flag = data.get("agents_flag", False)
# Define the LLM parameters
parameters = LLMParams(
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
top_k=chat_request.top_k if chat_request.top_k else 10,
@@ -54,18 +181,90 @@ class CodeGenService:
presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
stream=stream_opt,
index_name=chat_request.index_name,
)
result_dict, runtime_graph = await self.megaservice.schedule(
initial_inputs={"query": prompt}, llm_parameters=parameters
# Initialize the initial inputs with the generated prompt
initial_inputs = {"query": prompt}
# Check if the key index name is provided in the parameters
if parameters.index_name:
if agents_flag:
# Schedule the retriever microservice
result_ret, runtime_graph = await self.megaservice_retriever.schedule(
initial_inputs=initial_inputs, llm_parameters=parameters
)
# Switch to the LLM microservice
megaservice = self.megaservice_llm
relevant_docs = []
for doc in result_ret["retriever/MicroService"]["retrieved_docs"]:
# Create the PromptTemplate
prompt_agent = PromptTemplate(template=grader_prompt, input_variables=["question", "document"])
# Format the template with the input variables
formatted_prompt = prompt_agent.format(question=prompt, document=doc["text"])
initial_inputs_grader = {"query": formatted_prompt}
# Schedule the LLM microservice for grading
grade, runtime_graph = await self.megaservice_llm.schedule(
initial_inputs=initial_inputs_grader, llm_parameters=parameters
)
for node, response in grade.items():
if isinstance(response, StreamingResponse):
# Read the streaming response
grader_response = await self.read_streaming_response(response)
# Replace null with None
grader_response = grader_response.replace("null", "None")
# Split the response by "data:" and process each part
for i in grader_response.split("data:"):
if '"text":' in i:
# Convert the string to a dictionary
r = ast.literal_eval(i)
# Check if the response text is "yes"
if r["choices"][0]["text"] == "yes":
# Append the document to the relevant_docs list
relevant_docs.append(doc)
# Update the initial inputs with the relevant documents
if len(relevant_docs) > 0:
logger.info(f"[ CodeGenService - handle_request ] {len(relevant_docs)} relevant document\s found.")
query = initial_inputs["query"]
initial_inputs = {}
initial_inputs["retrieved_docs"] = relevant_docs
initial_inputs["initial_query"] = query
else:
logger.info(
"[ CodeGenService - handle_request ] Could not find any relevant documents. The query will be used as input to the LLM."
)
else:
# Use the combined retriever and LLM microservice
megaservice = self.megaservice_retriever_llm
else:
# Use the LLM microservice only
megaservice = self.megaservice_llm
# Schedule the final megaservice
result_dict, runtime_graph = await megaservice.schedule(
initial_inputs=initial_inputs, llm_parameters=parameters
)
for node, response in result_dict.items():
# Here it suppose the last microservice in the megaservice is LLM.
# Check if the last microservice in the megaservice is LLM
if (
isinstance(response, StreamingResponse)
and node == list(self.megaservice.services.keys())[-1]
and self.megaservice.services[node].service_type == ServiceType.LLM
and node == list(megaservice.services.keys())[-1]
and megaservice.services[node].service_type == ServiceType.LLM
):
return response
# Get the response from the last node in the runtime graph
last_node = runtime_graph.all_leaves()[-1]
response = result_dict[last_node]["text"]
choices = []


@@ -13,28 +13,77 @@ After launching your instance, you can connect to it using SSH (for Linux instan
## 🚀 Start Microservices and MegaService
The CodeGen megaservice manages a single microservice called LLM within a Directed Acyclic Graph (DAG). In the diagram above, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
The CodeGen megaservice manages several microservices, including the 'Embedding MicroService', 'Retrieval MicroService', and 'LLM MicroService', within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. Data Preparation allows users to save or update documents and online resources in the vector database; users can upload files or provide URLs and manage their saved resources. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the megaservice to generate code snippets in response to the user's input query.
The mega flow of the CodeGen application, from the user's input query to the application's output response, is as follows:
```mermaid
---
config:
flowchart:
nodeSpacing: 400
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 25px
---
flowchart LR
subgraph CodeGen
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style CodeGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph CodeGen-MegaService["CodeGen-MegaService"]
direction LR
A[User] --> |Input query| B[CodeGen Gateway]
B --> |Invoke| Megaservice
subgraph Megaservice["Megaservice"]
direction TB
C((LLM<br>9000)) -. Post .-> D{{TGI Service<br>8028}}
EM([Embedding<br>MicroService]):::blue
RET([Retrieval<br>MicroService]):::blue
RER([Agents]):::blue
LLM([LLM<br>MicroService]):::blue
end
Megaservice --> |Output| E[Response]
subgraph User Interface
direction LR
a([Submit Query Tab]):::orchid
UI([UI server]):::orchid
Ingest([Manage Resources]):::orchid
end
subgraph Legend
CLIP_EM{{Embedding<br>service}}
VDB{{Vector DB}}
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
GW([CodeGen GateWay]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
G([Microservice]) ==> H([Microservice])
I([Microservice]) -.-> J{{Server API}}
end
Ingest[Ingest data] --> UI
UI --> DP
DP <-.-> CLIP_EM
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> CodeGen-MegaService
EM ==> RET
RET ==> RER
RER ==> LLM
%% Embedding service flow
direction LR
EM <-.-> CLIP_EM
RET <-.-> V_RET
LLM <-.-> LLM_gen
direction TB
%% Vector DB interaction
V_RET <-.->VDB
DP <-.->VDB
```
### Setup Environment Variables
@@ -51,38 +100,105 @@ export host_ip=${your_ip_address}
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
```
2. Set Netowork Proxy
2. Set Network Proxy
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
```bash
export no_proxy=${your_no_proxy}
export no_proxy=${no_proxy},${host_ip}
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```
### Start the Docker Containers for All Services
CodeGen support TGI service and vLLM service, you can choose start either one of them.
Start CodeGen based on TGI service:
Find the corresponding [compose.yaml](./compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
```
#### TGI service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/cpu/xeon
docker compose --profile codegen-xeon-tgi up -d
```
Start CodeGen based on vLLM service:
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
#### vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/cpu/xeon
docker compose --profile codegen-xeon-vllm up -d
```
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
- `opea/vllm`
### Building the Docker image locally
Should the Docker image you seek not yet be available on Docker Hub, you can build it locally by following the instructions below.
#### Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
#### Build the UI Gradio Image
Build the frontend Gradio image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
#### Dataprep Microservice with Redis
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
#### Embedding Microservice with TEI
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
#### LLM text generation Microservice
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
#### Retriever Microservice
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
#### Start Redis server
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
### Validate the MicroServices and MegaService
1. LLM Service (for TGI, vLLM)
@@ -90,8 +206,9 @@ docker compose --profile codegen-xeon-vllm up -d
```bash
curl http://${host_ip}:8028/v1/chat/completions \
-X POST \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
```
2. LLM Microservices
@@ -99,19 +216,58 @@ docker compose --profile codegen-xeon-vllm up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```
3. MegaService
3. Dataprep Microservice
Make sure to replace the file name placeholders with your actual file names:
```bash
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
}'
curl http://${host_ip}:6007/v1/dataprep/ingest \
-X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.pdf" \
-F "files=@./file2.txt" \
-F "index_name=my_API_document"
```
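Online resources can be ingested through the same endpoint. The repository tests post a `link_list` form field, roughly as in the sketch below (placeholder URL):
```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://modin.readthedocs.io/en/latest/index.html"]' \
  -F "index_name=my_API_document"
```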
## 🚀 Launch the UI
4. MegaService
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
To use the CodeGen service with RAG and Agents activated, based on a previously ingested index:
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
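By default the gateway streams the answer back. If a single JSON response is preferred, the `stream` flag read by `codegen.py` can be set to false, e.g. (a sketch):
```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"stream": false, "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```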
## 🚀 Launch the Gradio Based UI (Recommended)
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
Code Generation Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Resource Management Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Uploading a Knowledge Index
![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)
Here is an example of running a query in the Gradio UI using an Index:
![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)
## 🚀 Launch the Svelte Based UI (Optional)
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
@@ -224,52 +380,3 @@ For example:
- Ask question and get answer
![qna](../../../../assets/img/codegen_qna.png)
## 🚀 Download or Build Docker Images
Should the Docker image you seek not yet be available on Docker Hub, you can build the Docker image locally.
### 1. Build the LLM Docker Image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 2. Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 3. Build the UI Docker Image
Build the frontend Docker image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 4. Build CodeGen React UI Docker Image (Optional)
Build react frontend Docker image via below command:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/CodeGen/ui
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, you will have the following Docker Images:
- `opea/llm-textgen:latest`
- `opea/codegen:latest`
- `opea/codegen-ui:latest`
- `opea/codegen-react-ui:latest` (optional)


@@ -1,7 +1,8 @@
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-server
@@ -92,10 +93,14 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- RETRIEVAL_SERVICE_HOST_IP=${RETRIEVAL_SERVICE_HOST_IP}
- REDIS_RETRIEVER_PORT=${REDIS_RETRIEVER_PORT}
- TEI_EMBEDDING_HOST_IP=${TEI_EMBEDDING_HOST_IP}
- EMBEDDER_PORT=${EMBEDDER_PORT}
ipc: host
restart: always
codegen-xeon-ui-server:
image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
container_name: codegen-xeon-ui-server
depends_on:
- codegen-xeon-backend-server
@@ -106,9 +111,93 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
- host_ip=${host_ip}
- DATAPREP_ENDPOINT=${DATAPREP_ENDPOINT}
- DATAPREP_REDIS_PORT=${DATAPREP_REDIS_PORT}
ipc: host
restart: always
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "${REDIS_DB_PORT}:${REDIS_DB_PORT}"
- "${REDIS_INSIGHTS_PORT}:${REDIS_INSIGHTS_PORT}"
dataprep-redis-server:
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
container_name: dataprep-redis-server
depends_on:
- redis-vector-db
ports:
- "${DATAPREP_REDIS_PORT}:5000"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_HOST: ${host_ip}
INDEX_NAME: ${INDEX_NAME}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: true
restart: unless-stopped
tei-embedding-serving:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-serving
entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
ports:
- "${TEI_EMBEDDER_PORT:-12000}:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
host_ip: ${host_ip}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
interval: 10s
timeout: 6s
retries: 48
tei-embedding-server:
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
container_name: tei-embedding-server
ports:
- "${EMBEDDER_PORT:-10201}:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
EMBEDDING_COMPONENT_NAME: "OPEA_TEI_EMBEDDING"
depends_on:
tei-embedding-serving:
condition: service_healthy
restart: unless-stopped
retriever-redis:
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
container_name: retriever-redis
depends_on:
- redis-vector-db
ports:
- "${REDIS_RETRIEVER_PORT}:${REDIS_RETRIEVER_PORT}"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_DB_PORT: ${REDIS_DB_PORT}
REDIS_INSIGHTS_PORT: ${REDIS_INSIGHTS_PORT}
REDIS_RETRIEVER_PORT: ${REDIS_RETRIEVER_PORT}
INDEX_NAME: ${INDEX_NAME}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: ${LOGFLAG}
RETRIEVER_COMPONENT_NAME: ${RETRIEVER_COMPONENT_NAME:-OPEA_RETRIEVER_REDIS}
restart: unless-stopped
networks:
default:
driver: bridge


@@ -6,28 +6,77 @@ The default pipeline deploys with vLLM as the LLM serving component. It also pro
## 🚀 Start MicroServices and MegaService
The CodeGen megaservice manages a single microservice called LLM within a Directed Acyclic Graph (DAG). In the diagram above, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
The CodeGen megaservice manages several microservices, including the 'Embedding MicroService', 'Retrieval MicroService', and 'LLM MicroService', within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. Data Preparation allows users to save or update documents and online resources in the vector database; users can upload files or provide URLs and manage their saved resources. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the megaservice to generate code snippets in response to the user's input query.
The mega flow of the CodeGen application, from the user's input query to the application's output response, is as follows:
```mermaid
---
config:
flowchart:
nodeSpacing: 400
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 25px
---
flowchart LR
subgraph CodeGen
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style CodeGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph CodeGen-MegaService["CodeGen-MegaService"]
direction LR
A[User] --> |Input query| B[CodeGen Gateway]
B --> |Invoke| Megaservice
subgraph Megaservice["Megaservice"]
direction TB
C((LLM<br>9000)) -. Post .-> D{{TGI Service<br>8028}}
EM([Embedding<br>MicroService]):::blue
RET([Retrieval<br>MicroService]):::blue
RER([Agents]):::blue
LLM([LLM<br>MicroService]):::blue
end
Megaservice --> |Output| E[Response]
subgraph User Interface
direction LR
a([Submit Query Tab]):::orchid
UI([UI server]):::orchid
Ingest([Manage Resources]):::orchid
end
subgraph Legend
CLIP_EM{{Embedding<br>service}}
VDB{{Vector DB}}
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
GW([CodeGen GateWay]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
G([Microservice]) ==> H([Microservice])
I([Microservice]) -.-> J{{Server API}}
end
Ingest[Ingest data] --> UI
UI --> DP
DP <-.-> CLIP_EM
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> CodeGen-MegaService
EM ==> RET
RET ==> RER
RER ==> LLM
%% Embedding service flow
direction LR
EM <-.-> CLIP_EM
RET <-.-> V_RET
LLM <-.-> LLM_gen
direction TB
%% Vector DB interaction
V_RET <-.->VDB
DP <-.->VDB
```
### Setup Environment Variables
@@ -44,38 +93,107 @@ export host_ip=${your_ip_address}
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
```
2. Set Netowork Proxy
2. Set Network Proxy
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
```bash
export no_proxy=${your_no_proxy}
export no_proxy=${no_proxy},${host_ip}
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```
### Start the Docker Containers for All Services
CodeGen support TGI service and vLLM service, you can choose start either one of them.
Start CodeGen based on TGI service:
Find the corresponding [compose.yaml](./compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
```
#### TGI service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/hpu/gaudi
docker compose --profile codegen-gaudi-tgi up -d
```
Start CodeGen based on vLLM service:
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
#### vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/hpu/gaudi
docker compose --profile codegen-gaudi-vllm up -d
```
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
- `opea/vllm`
Refer to the [Gaudi Guide](./README.md) to build docker images from source.
### Building the Docker image locally
Should the Docker image you seek not yet be available on Docker Hub, you can build it locally by following the instructions below.
#### Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
#### Build the UI Gradio Image
Build the frontend Gradio image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
#### Dataprep Microservice with Redis
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
#### Embedding Microservice with TEI
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
#### LLM text generation Microservice
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
#### Retriever Microservice
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
#### Start Redis server
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
### Validate the MicroServices and MegaService
1. LLM Service (for TGI, vLLM)
@@ -83,8 +201,9 @@ docker compose --profile codegen-gaudi-vllm up -d
```bash
curl http://${host_ip}:8028/v1/chat/completions \
-X POST \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
```
2. LLM Microservices
@@ -92,19 +211,58 @@ docker compose --profile codegen-gaudi-vllm up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```
3. MegaService
3. Dataprep Microservice
Make sure to replace the file name placeholders with your actual file names:
```bash
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
}'
curl http://${host_ip}:6007/v1/dataprep/ingest \
-X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.pdf" \
-F "files=@./file2.txt" \
-F "index_name=my_API_document"
```
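Online resources can be ingested through the same endpoint. The repository tests post a `link_list` form field, roughly as in the sketch below (placeholder URL):
```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://modin.readthedocs.io/en/latest/index.html"]' \
  -F "index_name=my_API_document"
```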
## 🚀 Launch the Svelte Based UI
4. MegaService
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
To use the CodeGen service with RAG and Agents activated, based on a previously ingested index:
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
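The response length can also be capped with `max_tokens`, as the repository tests do; a sketch:
```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"agents_flag": "True", "index_name": "my_API_document", "max_tokens": 256, "messages": "def print_hello_world():"}'
```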
## 🚀 Launch the Gradio Based UI (Recommended)
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
Code Generation Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Resource Management Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Uploading a Knowledge Index
![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)
Here is an example of running a query in the Gradio UI using an Index:
![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)
## 🚀 Launch the Svelte Based UI (Optional)
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
@@ -213,52 +371,3 @@ For example:
- Ask question and get answer
![qna](../../../../assets/img/codegen_qna.png)
## 🚀 Build Docker Images
First of all, you need to build the Docker images locally. This step can be ignored after the Docker images published to the Docker Hub.
### 1. Build the LLM Docker Image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 2. Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 3. Build the UI Docker Image
Construct the frontend Docker image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 4. Build CodeGen React UI Docker Image (Optional)
Build react frontend Docker image via below command:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/CodeGen/ui
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, you will have the following Docker images:
- `opea/llm-textgen:latest`
- `opea/codegen:latest`
- `opea/codegen-ui:latest`
- `opea/codegen-react-ui:latest`


@@ -108,10 +108,15 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- RETRIEVAL_SERVICE_HOST_IP=${RETRIEVAL_SERVICE_HOST_IP}
- REDIS_RETRIEVER_PORT=${REDIS_RETRIEVER_PORT}
- TEI_EMBEDDING_HOST_IP=${TEI_EMBEDDING_HOST_IP}
- EMBEDDER_PORT=${EMBEDDER_PORT}
- host_ip=${host_ip}
ipc: host
restart: always
codegen-gaudi-ui-server:
image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
container_name: codegen-gaudi-ui-server
depends_on:
- codegen-gaudi-backend-server
@@ -122,9 +127,93 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
- host_ip=${host_ip}
- DATAPREP_ENDPOINT=${DATAPREP_ENDPOINT}
- DATAPREP_REDIS_PORT=${DATAPREP_REDIS_PORT}
ipc: host
restart: always
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "${REDIS_DB_PORT}:${REDIS_DB_PORT}"
- "${REDIS_INSIGHTS_PORT}:${REDIS_INSIGHTS_PORT}"
dataprep-redis-server:
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
container_name: dataprep-redis-server
depends_on:
- redis-vector-db
ports:
- "${DATAPREP_REDIS_PORT}:5000"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_HOST: ${host_ip}
INDEX_NAME: ${INDEX_NAME}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: true
restart: unless-stopped
tei-embedding-serving:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-serving
entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
ports:
- "${TEI_EMBEDDER_PORT:-12000}:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
host_ip: ${host_ip}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
interval: 10s
timeout: 6s
retries: 48
tei-embedding-server:
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
container_name: tei-embedding-server
ports:
- "${EMBEDDER_PORT:-10201}:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
EMBEDDING_COMPONENT_NAME: "OPEA_TEI_EMBEDDING"
depends_on:
tei-embedding-serving:
condition: service_healthy
restart: unless-stopped
retriever-redis:
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
container_name: retriever-redis
depends_on:
- redis-vector-db
ports:
- "${REDIS_RETRIEVER_PORT}:${REDIS_RETRIEVER_PORT}"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_DB_PORT: ${REDIS_DB_PORT}
REDIS_INSIGHTS_PORT: ${REDIS_INSIGHTS_PORT}
REDIS_RETRIEVER_PORT: ${REDIS_RETRIEVER_PORT}
INDEX_NAME: ${INDEX_NAME}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: ${LOGFLAG}
RETRIEVER_COMPONENT_NAME: ${RETRIEVER_COMPONENT_NAME:-OPEA_RETRIEVER_REDIS}
restart: unless-stopped
networks:
default:
driver: bridge


@@ -7,7 +7,6 @@ source .set_env.sh
popd > /dev/null
export host_ip=$(hostname -I | awk '{print $1}')
if [ -z "${HUGGINGFACEHUB_API_TOKEN}" ]; then
echo "Error: HUGGINGFACEHUB_API_TOKEN is not set. Please set HUGGINGFACEHUB_API_TOKEN"
fi
@@ -17,10 +16,35 @@ if [ -z "${host_ip}" ]; then
fi
export no_proxy=${no_proxy},${host_ip}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
export LLM_SERVICE_PORT=9000
export LLM_ENDPOINT="http://${host_ip}:8028"
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export TGI_LLM_ENDPOINT="http://${host_ip}:8028"
export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
export DATAPREP_REDIS_PORT=6007
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
export LOGFLAG=false
export MODEL_CACHE="./data"
export NUM_CARDS=1


@@ -23,6 +23,12 @@ services:
dockerfile: ./docker/Dockerfile.react
extends: codegen
image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
codegen-gradio-ui:
build:
context: ../ui
dockerfile: ./docker/Dockerfile.gradio
extends: codegen
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
llm-textgen:
build:
context: GenAIComps
@@ -46,3 +52,21 @@ services:
dockerfile: Dockerfile.hpu
extends: codegen
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
dataprep:
build:
context: GenAIComps
dockerfile: comps/dataprep/src/Dockerfile
extends: codegen
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
retriever:
build:
context: GenAIComps
dockerfile: comps/retrievers/src/Dockerfile
extends: codegen
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
embedding:
build:
context: GenAIComps
dockerfile: comps/embeddings/src/Dockerfile
extends: codegen
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}


@@ -10,11 +10,21 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export DATAPREP_REDIS_PORT=6007
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy},${ip_address}
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
@@ -31,13 +41,14 @@ function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
# Download Gaudi vllm of latest tag
git clone https://github.com/HabanaAI/vllm-fork.git && cd vllm-fork
VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="codegen codegen-ui llm-textgen vllm-gaudi"
service_list="codegen codegen-gradio-ui llm-textgen vllm-gaudi dataprep retriever embedding"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker images && sleep 1s
@@ -48,18 +59,28 @@ function start_services() {
local llm_container_name="$2"
cd $WORKPATH/docker_compose/intel/hpu/gaudi
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
export LLM_ENDPOINT="http://${ip_address}:8028"
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:${MEGA_SERVICE_PORT}/v1/codegen"
export NUM_CARDS=1
export host_ip=${ip_address}
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
export INDEX_NAME="CodeGen"
# Start Docker Containers
docker compose --profile ${compose_profile} up -d | tee ${LOG_PATH}/start_services_with_compose.log
@@ -82,6 +103,16 @@ function validate_services() {
local DOCKER_NAME="$4"
local INPUT_DATA="$5"
if [[ "$SERVICE_NAME" == "ingest" ]]; then
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -F index_name=test_redis -H 'Content-Type: multipart/form-data' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Data preparation succeeded..."
else
echo "[ $SERVICE_NAME ] Data preparation failed..."
fi
else
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
@@ -100,6 +131,7 @@ function validate_services() {
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
exit 1
fi
fi
sleep 5s
}
@@ -122,6 +154,14 @@ function validate_microservices() {
"llm-textgen-gaudi-server" \
'{"query":"def print_hello_world():"}'
# Data ingest microservice
validate_services \
"${ip_address}:6007/v1/dataprep/ingest" \
"Data preparation succeeded" \
"ingest" \
"dataprep-redis-server" \
'link_list=["https://modin.readthedocs.io/en/latest/index.html"]'
}
function validate_megaservice() {
@@ -133,6 +173,14 @@ function validate_megaservice() {
"codegen-gaudi-backend-server" \
'{"messages": "def print_hello_world():"}'
# Curl the Mega Service with index_name and agents_flag
validate_services \
"${ip_address}:7778/v1/codegen" \
"" \
"mega-codegen" \
"codegen-gaudi-backend-server" \
'{ "index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
}
function validate_frontend() {
@@ -163,6 +211,18 @@ function validate_frontend() {
fi
}
function validate_gradio() {
local URL="http://${ip_address}:5173/health"
local RESPONSE=$(curl -s "$URL")
local SERVICE_NAME="Gradio"
if [ "$RESPONSE" = '{"status":"ok"}' ]; then
echo "[ $SERVICE_NAME ] Health check returned ok. UI server is running successfully..."
else
echo "[ $SERVICE_NAME ] UI server has failed..."
fi
}
function stop_docker() {
local docker_profile="$1"
@@ -201,7 +261,7 @@ function main() {
validate_microservices "${docker_llm_container_names[${i}]}"
validate_megaservice
validate_frontend
validate_gradio
stop_docker "${docker_compose_profiles[${i}]}"
sleep 5s

View File

@@ -10,11 +10,21 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export DATAPREP_REDIS_PORT=6007
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy},${ip_address}
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
@@ -38,7 +48,8 @@ function build_docker_images() {
cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="codegen codegen-ui llm-textgen vllm"
service_list="codegen codegen-gradio-ui llm-textgen vllm dataprep retriever embedding"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
@@ -54,12 +65,21 @@ function start_services() {
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
export LLM_ENDPOINT="http://${ip_address}:8028"
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:${MEGA_SERVICE_PORT}/v1/codegen"
export host_ip=${ip_address}
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
# Start Docker Containers
docker compose --profile ${compose_profile} up -d > ${LOG_PATH}/start_services_with_compose.log
@@ -82,6 +102,16 @@ function validate_services() {
local DOCKER_NAME="$4"
local INPUT_DATA="$5"
if [[ "$SERVICE_NAME" == "ingest" ]]; then
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -F index_name=test_redis -H 'Content-Type: multipart/form-data' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Data preparation succeeded..."
else
echo "[ $SERVICE_NAME ] Data preparation failed..."
fi
else
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
@@ -100,6 +130,7 @@ function validate_services() {
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
exit 1
fi
fi
sleep 5s
}
@@ -122,6 +153,14 @@ function validate_microservices() {
"llm-textgen-server" \
'{"query":"def print_hello_world():", "max_tokens": 256}'
# Data ingest microservice
validate_services \
"${ip_address}:6007/v1/dataprep/ingest" \
"Data preparation succeeded" \
"ingest" \
"dataprep-redis-server" \
'link_list=["https://modin.readthedocs.io/en/latest/index.html"]'
}
function validate_megaservice() {
@@ -133,6 +172,14 @@ function validate_megaservice() {
"codegen-xeon-backend-server" \
'{"messages": "def print_hello_world():", "max_tokens": 256}'
# Curl the Mega Service with index_name and agents_flag
validate_services \
"${ip_address}:7778/v1/codegen" \
"" \
"mega-codegen" \
"codegen-xeon-backend-server" \
'{ "index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
}
function validate_frontend() {
@@ -163,6 +210,17 @@ function validate_frontend() {
fi
}
function validate_gradio() {
local URL="http://${ip_address}:5173/health"
local RESPONSE=$(curl -s "$URL")
local SERVICE_NAME="Gradio"
if [ "$RESPONSE" = '{"status":"ok"}' ]; then
echo "[ $SERVICE_NAME ] Health check returned ok. UI server is running successfully..."
else
echo "[ $SERVICE_NAME ] UI server has failed..."
fi
}
function stop_docker() {
local docker_profile="$1"
@@ -202,7 +260,7 @@ function main() {
validate_microservices "${docker_llm_container_names[${i}]}"
validate_megaservice
validate_frontend
validate_gradio
stop_docker "${docker_compose_profiles[${i}]}"
sleep 5s

View File

@@ -0,0 +1,33 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
FROM python:3.11-slim
ENV LANG=C.UTF-8
ARG ARCH="cpu"
RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
build-essential \
default-jre \
libgl1-mesa-glx \
libjemalloc-dev \
wget
# Install ffmpeg static build
WORKDIR /root
RUN wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz && \
mkdir ffmpeg-git-amd64-static && tar -xvf ffmpeg-git-amd64-static.tar.xz -C ffmpeg-git-amd64-static --strip-components 1 && \
export PATH=/root/ffmpeg-git-amd64-static:$PATH && \
cp /root/ffmpeg-git-amd64-static/ffmpeg /usr/local/bin/ && \
cp /root/ffmpeg-git-amd64-static/ffprobe /usr/local/bin/
RUN mkdir -p /home/user
COPY gradio /home/user/gradio
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/gradio/requirements.txt
WORKDIR /home/user/gradio
ENTRYPOINT ["python", "codegen_ui_gradio.py"]

View File

@@ -0,0 +1,65 @@
# CodeGen Gradio UI
This project provides a Gradio-based user interface for the CodeGen example. It offers a code generation tab for submitting natural-language queries to the CodeGen backend and a resource management tab for uploading files or URLs into an index used for retrieval-augmented generation.
## Docker
### Build UI Docker Image
To build the frontend Docker image, navigate to the `GenAIExamples/CodeGen/ui` directory and run the following command:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
This command builds the Docker image with the tag `opea/codegen-gradio-ui:latest`. It also passes the proxy settings as build arguments to ensure that the build process can access the internet if you are behind a corporate firewall.
### Run UI Docker Image
To run the frontend Docker image, navigate to the `GenAIExamples/CodeGen/ui/gradio` directory and execute the following commands:
```bash
cd GenAIExamples/CodeGen/ui/gradio
ip_address=$(hostname -I | awk '{print $1}')
docker run -d -p 5173:5173 --ipc=host \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
-e no_proxy=$no_proxy \
-e BACKEND_SERVICE_ENDPOINT=http://$ip_address:7778/v1/codegen \
opea/codegen-gradio-ui:latest
```
This command runs the Docker container in detached mode (`-d`), mapping port 5173 of the host to port 5173 of the container. It also sets several environment variables, including the backend service endpoint, which the frontend requires to communicate with the backend service.
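To confirm the container is up, the UI exposes a `/health` endpoint (the same one the CI tests poll). Assuming the default port mapping above, a quick check looks like:
```bash
curl http://localhost:5173/health
# Expected response: {"status":"ok"}
```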
### Python
To run the frontend application directly using Python, navigate to the `GenAIExamples/CodeGen/ui/gradio` directory and run the following command:
```bash
cd GenAIExamples/CodeGen/ui/gradio
python codegen_ui_gradio.py
```
This command starts the frontend application using Python.
## Additional Information
### Prerequisites
Ensure you have Docker installed and running on your system. Also, make sure you have the necessary proxy settings configured if you are behind a corporate firewall.
### Environment Variables
- `http_proxy`: Proxy setting for HTTP connections.
- `https_proxy`: Proxy setting for HTTPS connections.
- `no_proxy`: Comma-separated list of hosts that should be excluded from proxying.
- `BACKEND_SERVICE_ENDPOINT`: The endpoint of the backend service that the frontend will communicate with.
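A minimal sketch of setting these variables before starting the container and sanity-checking the backend (values are illustrative; the request payload mirrors the one used in the CI tests):
```bash
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy}
ip_address=$(hostname -I | awk '{print $1}')
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
# Optional: exercise the backend directly before launching the UI.
curl -X POST "$BACKEND_SERVICE_ENDPOINT" \
  -H 'Content-Type: application/json' \
  -d '{"index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
```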
### Troubleshooting
- Docker Build Issues: If you encounter issues while building the Docker image, ensure that your proxy settings are correctly configured and that you have internet access.
- Docker Run Issues: If the Docker container fails to start, check the environment variables and ensure that the backend service is running and accessible.
This README covers building and running the Dockerized Gradio frontend, running it directly with Python, and the environment variables and troubleshooting steps needed to connect it to the CodeGen backend.

View File

@@ -0,0 +1,371 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# This is a Gradio app that includes two tabs: one for code generation and another for resource management.
# The resource management tab has been updated to allow file uploads, deletion, and a table listing all the files.
# Additionally, three small text boxes have been added for the index name, chunk size, and chunk overlap used during ingestion.
import argparse
import json
import os
from pathlib import Path
from urllib.parse import urlparse
import gradio as gr
import pandas as pd
import requests
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
logflag = os.getenv("LOGFLAG", False)
# create a FastAPI app
app = FastAPI()
cur_dir = os.getcwd()
static_dir = Path(os.path.join(cur_dir, "static/"))
tmp_dir = Path(os.path.join(cur_dir, "split_tmp_videos/"))
Path(static_dir).mkdir(parents=True, exist_ok=True)
app.mount("/static", StaticFiles(directory=static_dir), name="static")
tmp_upload_folder = "/tmp/gradio/"
host_ip = os.getenv("host_ip")
DATAPREP_REDIS_PORT = os.getenv("DATAPREP_REDIS_PORT", 6007)
DATAPREP_ENDPOINT = os.getenv("DATAPREP_ENDPOINT", f"http://{host_ip}:{DATAPREP_REDIS_PORT}/v1/dataprep")
MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)
backend_service_endpoint = os.getenv("BACKEND_SERVICE_ENDPOINT", f"http://{host_ip}:{MEGA_SERVICE_PORT}/v1/codegen")
dataprep_ingest_endpoint = f"{DATAPREP_ENDPOINT}/ingest"
dataprep_get_files_endpoint = f"{DATAPREP_ENDPOINT}/get"
dataprep_delete_files_endpoint = f"{DATAPREP_ENDPOINT}/delete"
dataprep_get_indices_endpoint = f"{DATAPREP_ENDPOINT}/indices"
# Define the functions that will be used in the app
def conversation_history(prompt, index, use_agent, history):
print(f"Generating code for prompt: {prompt} using index: {index} and use_agent is {use_agent}")
history.append([prompt, ""])
response_generator = generate_code(prompt, index, use_agent)
for token in response_generator:
history[-1][-1] += token
yield history
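# Ingest newline-separated media: valid URLs are sent to the dataprep ingest endpoint as links, local .pdf/.txt files are uploaded as files, and status messages are yielded back to the upload status textbox.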
def upload_media(media, index=None, chunk_size=1500, chunk_overlap=100):
media = media.strip().split("\n")
if not chunk_size:
chunk_size = 1500
if not chunk_overlap:
chunk_overlap = 100
results = []
if type(media) is list:
for file in media:
file_ext = os.path.splitext(file)[-1]
if is_valid_url(file):
yield (
gr.Textbox(
visible=True,
value="Ingesting URL...",
)
)
value = ingest_url(file, index, chunk_size, chunk_overlap)
results.append(value)
yield value
elif file_ext in [".pdf", ".txt"]:
yield (
gr.Textbox(
visible=True,
value="Ingesting file...",
)
)
value = ingest_file(file, index, chunk_size, chunk_overlap)
results.append(value)
yield value
else:
yield (
gr.Textbox(
visible=True,
value="Your media is either an invalid URL or the file extension type is not supported. (Supports .pdf, .txt, url)",
)
)
return
yield results
else:
file_ext = os.path.splitext(media)[-1]
if is_valid_url(media):
value = ingest_url(media, index, chunk_size, chunk_overlap)
yield value
elif file_ext in [".pdf", ".txt"]:
value = ingest_file(media, index, chunk_size, chunk_overlap)
yield value
else:
yield (
gr.Textbox(
visible=True,
value="Your file extension type is not supported.",
)
)
return
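# Stream code tokens from the CodeGen backend: each server-sent "data:" line carries a JSON chunk whose choice text is yielded as it arrives, until the "[DONE]" marker; a troubleshooting hint is yielded if no response comes back.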
def generate_code(query, index=None, use_agent=False):
if index is None or index == "None":
input_dict = {"messages": query, "agents_flag": use_agent}
else:
input_dict = {"messages": query, "index_name": index, "agents_flag": use_agent}
print("Query is ", input_dict)
headers = {"Content-Type": "application/json"}
response = requests.post(url=backend_service_endpoint, headers=headers, data=json.dumps(input_dict), stream=True)
line_count = 0
for line in response.iter_lines():
line_count += 1
if line:
line = line.decode("utf-8")
if line.startswith("data: "): # Only process lines starting with "data: "
json_part = line[len("data: ") :] # Remove the "data: " prefix
else:
json_part = line
if json_part.strip() == "[DONE]": # Ignore the DONE marker
continue
try:
json_obj = json.loads(json_part) # Convert to dictionary
if "choices" in json_obj:
for choice in json_obj["choices"]:
if "text" in choice:
# Yield each token individually
yield choice["text"]
except json.JSONDecodeError:
print("Error parsing JSON:", json_part)
if line_count == 0:
yield "Something went wrong, No Response Generated! \nIf you are using an Index, try uploading your media again with a smaller chunk size to avoid exceeding the token max. \
\nOr, check the Use Agent box and try again."
def ingest_file(file, index=None, chunk_size=100, chunk_overlap=150):
headers = {
# "Content-Type: multipart/form-data"
}
file_input = {"files": open(file, "rb")}
if index:
print("Index is", index)
data = {"index_name": index, "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
else:
data = {"chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
response = requests.post(url=dataprep_ingest_endpoint, headers=headers, files=file_input, data=data)
return response.text
def ingest_url(url, index=None, chunk_size=100, chunk_overlap=150):
url = str(url)
if not is_valid_url(url):
return "Invalid URL entered. Please enter a valid URL"
headers = {
# "Content-Type: multipart/form-data"
}
if index:
url_input = {
"link_list": json.dumps([url]),
"index_name": index,
"chunk_size": chunk_size,
"chunk_overlap": chunk_overlap,
}
else:
url_input = {"link_list": json.dumps([url]), "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
response = requests.post(url=dataprep_ingest_endpoint, headers=headers, data=url_input)
return response.text
def is_valid_url(url):
url = str(url)
try:
result = urlparse(url)
return all([result.scheme, result.netloc])
except ValueError:
return False
def get_files(index=None):
headers = {
# "Content-Type: multipart/form-data"
}
if index == "All Files":
index = None
if index:
index = {"index_name": index}
response = requests.post(url=dataprep_get_files_endpoint, headers=headers, data=index)
table = response.json()
return table
else:
response = requests.post(url=dataprep_get_files_endpoint, headers=headers)
table = response.json()
return table
def update_table(index=None):
if index == "All Files":
index = None
files = get_files(index)
if len(files) == 0:
df = pd.DataFrame(files, columns=["Files"])
return df
else:
df = pd.DataFrame(files)
return df
def update_indices():
indices = get_indices()
df = pd.DataFrame(indices, columns=["File Indices"])
return df
def delete_file(file, index=None):
# Remove the selected file from the file list
headers = {
# "Content-Type: application/json"
}
if index:
file_input = {"files": open(file, "rb"), "index_name": index}
else:
file_input = {"files": open(file, "rb")}
response = requests.post(url=dataprep_delete_files_endpoint, headers=headers, data=file_input)
table = update_table()
return response.text
def delete_all_files(index=None):
# Remove all files from the file list
headers = {
# "Content-Type: application/json"
}
response = requests.post(url=dataprep_delete_files_endpoint, headers=headers, data='{"file_path": "all"}')
table = update_table()
return "Delete All status: " + response.text
def get_indices():
headers = {
# "Content-Type: application/json"
}
response = requests.post(url=dataprep_get_indices_endpoint, headers=headers)
indices = ["None"]
indices += response.json()
return indices
def update_indices_dropdown():
new_dd = gr.update(choices=get_indices(), value="None")
return new_dd
def get_file_names(files):
file_str = ""
if not files:
return file_str
for file in files:
file_str += file + "\n"
return file_str.strip()
# Define UI components
with gr.Blocks() as ui:
with gr.Tab("Code Generation"):
gr.Markdown("### Generate Code from Natural Language")
chatbot = gr.Chatbot(label="Chat History")
prompt_input = gr.Textbox(label="Enter your query")
with gr.Column():
with gr.Row(equal_height=True):
database_dropdown = gr.Dropdown(choices=get_indices(), label="Select Index", value="None", scale=10)
db_refresh_button = gr.Button("Refresh Dropdown", scale=0.1)
db_refresh_button.click(update_indices_dropdown, outputs=database_dropdown)
use_agent = gr.Checkbox(label="Use Agent", container=False)
generate_button = gr.Button("Generate Code")
generate_button.click(
conversation_history, inputs=[prompt_input, database_dropdown, use_agent, chatbot], outputs=chatbot
)
with gr.Tab("Resource Management"):
# File management components
with gr.Row():
with gr.Column(scale=1):
index_name_input = gr.Textbox(label="Index Name")
chunk_size_input = gr.Textbox(
label="Chunk Size", value="1500", placeholder="Enter an integer (default: 1500)"
)
chunk_overlap_input = gr.Textbox(
label="Chunk Overlap", value="100", placeholder="Enter an integer (default: 100)"
)
with gr.Column(scale=3):
file_upload = gr.File(label="Upload Files", file_count="multiple")
url_input = gr.Textbox(label="Media to be ingested (Append URL's in a new line)")
upload_button = gr.Button("Upload", variant="primary")
upload_status = gr.Textbox(label="Upload Status")
file_upload.change(get_file_names, inputs=file_upload, outputs=url_input)
with gr.Column(scale=1):
file_table = gr.Dataframe(interactive=False, value=update_indices())
refresh_button = gr.Button("Refresh", variant="primary", size="sm")
refresh_button.click(update_indices, outputs=file_table)
upload_button.click(
upload_media,
inputs=[url_input, index_name_input, chunk_size_input, chunk_overlap_input],
outputs=upload_status,
)
delete_all_button = gr.Button("Delete All", variant="primary", size="sm")
delete_all_button.click(delete_all_files, outputs=upload_status)
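# Liveness endpoint; the CI test (validate_gradio) polls this to confirm the UI server is up.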
@app.get("/health")
def health_check():
return {"status": "ok"}
ui.queue()
app = gr.mount_gradio_app(app, ui, path="/")
share = False
enable_queue = True
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default="0.0.0.0")
parser.add_argument("--port", type=int, default=os.getenv("UI_PORT", 5173))
parser.add_argument("--concurrency-count", type=int, default=20)
parser.add_argument("--share", action="store_true")
host_ip = os.getenv("host_ip")
DATAPREP_REDIS_PORT = os.getenv("DATAPREP_REDIS_PORT", 6007)
DATAPREP_ENDPOINT = os.getenv("DATAPREP_ENDPOINT", f"http://{host_ip}:{DATAPREP_REDIS_PORT}/v1/dataprep")
MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)
backend_service_endpoint = os.getenv("BACKEND_SERVICE_ENDPOINT", f"http://{host_ip}:{MEGA_SERVICE_PORT}/v1/codegen")
args = parser.parse_args()
global gateway_addr
gateway_addr = backend_service_endpoint
global dataprep_ingest_addr
dataprep_ingest_addr = dataprep_ingest_endpoint
global dataprep_get_files_addr
dataprep_get_files_addr = dataprep_get_files_endpoint
uvicorn.run(app, host=args.host, port=args.port)

View File

@@ -0,0 +1,4 @@
gradio==5.22.0
numpy==1.26.4
opencv-python==4.10.0.82
Pillow==10.3.0