CodeGen Examples using RAG and Agents (#1757)
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
@@ -1,6 +1,6 @@
|
|||||||
# Code Generation Application
|
# Code Generation Application
|
||||||
|
|
||||||
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models undergo training with datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they possess a comprehensive understanding of the coding context, which includes elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions.
|
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models undergo training with datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they possess a comprehensive understanding of the coding context, which includes elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions. Additionally, Retrieval-Augmented Generation (RAG) and Agents are part of the CodeGen example; they provide an additional layer of intelligence and adaptability, ensuring that the generated code is not only relevant but also accurate, efficient, and tailored to the specific needs of developers and programmers.
|
||||||
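In practice, the difference shows up in the request sent to the CodeGen gateway. The sketch below is illustrative only: the endpoint and the `messages`, `agents_flag`, and `index_name` fields follow the curl examples later in this README, and the host and port are assumed defaults for a local deployment.

```python
import requests

# Assumed local endpoint; see the curl examples later in this README.
CODEGEN_URL = "http://localhost:7778/v1/codegen"

# Plain code generation: the query goes straight to the LLM microservice.
plain = {"messages": "Implement a high-level API for a TODO list application."}

# RAG + Agents: documents from the named index are retrieved, graded for
# relevance by the agent step, and passed to the LLM as additional context.
augmented = {
    "agents_flag": "True",
    "index_name": "my_API_document",
    "messages": "Implement a high-level API for a TODO list application.",
}

for payload in (plain, augmented):
    print(requests.post(CODEGEN_URL, json=payload).text)
```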
|
|
||||||
The capabilities of CodeGen LLMs include:
|
The capabilities of CodeGen LLMs include:
|
||||||
|
|
||||||
@@ -28,7 +28,7 @@ config:
|
|||||||
rankSpacing: 100
|
rankSpacing: 100
|
||||||
curve: linear
|
curve: linear
|
||||||
themeVariables:
|
themeVariables:
|
||||||
fontSize: 50px
|
fontSize: 25px
|
||||||
---
|
---
|
||||||
flowchart LR
|
flowchart LR
|
||||||
%% Colors %%
|
%% Colors %%
|
||||||
@@ -37,34 +37,56 @@ flowchart LR
|
|||||||
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
classDef invisible fill:transparent,stroke:transparent;
|
classDef invisible fill:transparent,stroke:transparent;
|
||||||
style CodeGen-MegaService stroke:#000000
|
style CodeGen-MegaService stroke:#000000
|
||||||
|
|
||||||
%% Subgraphs %%
|
%% Subgraphs %%
|
||||||
subgraph CodeGen-MegaService["CodeGen MegaService "]
|
subgraph CodeGen-MegaService["CodeGen-MegaService"]
|
||||||
direction LR
|
direction LR
|
||||||
LLM([LLM MicroService]):::blue
|
EM([Embedding<br>MicroService]):::blue
|
||||||
|
RET([Retrieval<br>MicroService]):::blue
|
||||||
|
RER([Agents]):::blue
|
||||||
|
LLM([LLM<br>MicroService]):::blue
|
||||||
end
|
end
|
||||||
subgraph UserInterface[" User Interface "]
|
subgraph User Interface
|
||||||
direction LR
|
direction LR
|
||||||
a([User Input Query]):::orchid
|
a([Submit Query Tab]):::orchid
|
||||||
UI([UI server<br>]):::orchid
|
UI([UI server]):::orchid
|
||||||
|
Ingest([Manage Resources]):::orchid
|
||||||
end
|
end
|
||||||
|
|
||||||
|
CLIP_EM{{Embedding<br>service}}
|
||||||
|
VDB{{Vector DB}}
|
||||||
|
V_RET{{Retriever<br>service}}
|
||||||
|
Ingest{{Ingest data}}
|
||||||
|
DP([Data Preparation]):::blue
|
||||||
|
LLM_gen{{TGI Service}}
|
||||||
|
GW([CodeGen GateWay]):::orange
|
||||||
|
|
||||||
LLM_gen{{LLM Service <br>}}
|
%% Data Preparation flow
|
||||||
GW([CodeGen GateWay<br>]):::orange
|
%% Ingest data flow
|
||||||
|
direction LR
|
||||||
|
Ingest[Ingest data] --> UI
|
||||||
|
UI --> DP
|
||||||
|
DP <-.-> CLIP_EM
|
||||||
|
|
||||||
%% Questions interaction
|
%% Questions interaction
|
||||||
direction LR
|
direction LR
|
||||||
a[User Input Query] --> UI
|
a[User Input Query] --> UI
|
||||||
UI --> GW
|
UI --> GW
|
||||||
GW <==> CodeGen-MegaService
|
GW <==> CodeGen-MegaService
|
||||||
|
EM ==> RET
|
||||||
|
RET ==> RER
|
||||||
|
RER ==> LLM
|
||||||
|
|
||||||
|
|
||||||
%% Embedding service flow
|
%% Embedding service flow
|
||||||
direction LR
|
direction LR
|
||||||
|
EM <-.-> CLIP_EM
|
||||||
|
RET <-.-> V_RET
|
||||||
LLM <-.-> LLM_gen
|
LLM <-.-> LLM_gen
|
||||||
|
|
||||||
|
direction TB
|
||||||
|
%% Vector DB interaction
|
||||||
|
V_RET <-.->VDB
|
||||||
|
DP <-.->VDB
|
||||||
```
|
```
|
||||||
|
|
||||||
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
|
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
|
||||||
@@ -94,12 +116,12 @@ Currently we support two ways of deploying ChatQnA services with docker compose:
|
|||||||
|
|
||||||
By default, the LLM model is set to a default value as listed below:
|
By default, the LLM model is set as listed below:
|
||||||
|
|
||||||
| Service | Model |
|
| Service | Model |
|
||||||
| ------------ | --------------------------------------------------------------------------------------- |
|
| ------------ | ----------------------------------------------------------------------------------------- |
|
||||||
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) |
|
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
|
||||||
|
|
||||||
[Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model.
|
[Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model.
|
||||||
Change the `LLM_MODEL_ID` below for your needs, such as: [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
|
Change the `LLM_MODEL_ID` below for your needs, such as: [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct), [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
|
||||||
|
|
||||||
If you choose to use `meta-llama/CodeLlama-7b-hf` as LLM model, you will need to visit [here](https://huggingface.co/meta-llama/CodeLlama-7b-hf), click the `Expand to review and access` button to ask for model access.
|
If you choose to use `meta-llama/CodeLlama-7b-hf` as LLM model, you will need to visit [here](https://huggingface.co/meta-llama/CodeLlama-7b-hf), click the `Expand to review and access` button to ask for model access.
|
||||||
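If you are unsure whether your Hugging Face token can reach a gated model, a small pre-flight check such as the hypothetical sketch below (not part of the CodeGen services) can save a failed deployment; it assumes the `huggingface_hub` Python package is installed.

```python
import os

from huggingface_hub import model_info
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

# Hypothetical pre-flight check: verify the token can access the configured model.
model_id = os.getenv("LLM_MODEL_ID", "Qwen/Qwen2.5-Coder-32B-Instruct")
token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

try:
    info = model_info(model_id, token=token)
    print(f"Access OK: {info.id}")
except GatedRepoError:
    print(f"{model_id} is gated; request access on its Hugging Face model page.")
except RepositoryNotFoundError:
    print(f"{model_id} was not found; check the model ID and your token.")
```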
|
|
||||||
@@ -134,22 +156,44 @@ To set up environment variables for deploying ChatQnA services, follow these ste
|
|||||||
|
|
||||||
#### Deploy CodeGen on Gaudi
|
#### Deploy CodeGen on Gaudi
|
||||||
|
|
||||||
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).
|
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml). Users can start CodeGen with either the TGI or vLLM service:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
|
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
|
||||||
docker compose up -d
|
```
|
||||||
|
|
||||||
|
TGI service:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose --profile codegen-gaudi-tgi up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
vLLM service:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose --profile codegen-gaudi-vllm up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
|
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
|
||||||
|
|
||||||
#### Deploy CodeGen on Xeon
|
#### Deploy CodeGen on Xeon
|
||||||
|
|
||||||
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).
|
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml). Users can start CodeGen with either the TGI or vLLM service:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
|
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
|
||||||
docker compose up -d
|
```
|
||||||
|
|
||||||
|
TGI service:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose --profile codegen-xeon-tgi up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
vLLM service:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker compose --profile codegen-xeon-vllm up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
|
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
|
||||||
@@ -170,6 +214,15 @@ Two ways of consuming CodeGen Service:
|
|||||||
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
|
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
|
||||||
```
|
```
|
||||||
|
|
||||||
|
If you want a CodeGen service with RAG and Agents grounded in dedicated documentation, send the request below:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://localhost:7778/v1/codegen \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
|
||||||
|
|
||||||
|
```
|
||||||
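The same RAG-and-Agents request can be issued from Python. The sketch below mirrors the curl call above and simply prints the streamed response lines; host and port are the defaults used in this README.

```python
import requests

# Mirrors the curl example above; adjust host/port for your deployment.
payload = {
    "agents_flag": "True",
    "index_name": "my_API_document",
    "messages": "Implement a high-level API for a TODO list application. "
    "The API takes as input an operation request and updates the TODO list in "
    "place. If the request is invalid, raise an exception.",
}

with requests.post("http://localhost:7778/v1/codegen", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)
```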
|
|
||||||
2. Access via frontend
|
2. Access via frontend
|
||||||
|
|
||||||
To access the frontend, open the following URL in your browser: http://{host_ip}:5173.
|
To access the frontend, open the following URL in your browser: http://{host_ip}:5173.
|
||||||
|
|||||||
BIN
CodeGen/assets/img/codegen_gradio_ui_dataprep.png
Normal file
BIN
CodeGen/assets/img/codegen_gradio_ui_dataprep.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 57 KiB |
BIN
CodeGen/assets/img/codegen_gradio_ui_main.png
Normal file
BIN
CodeGen/assets/img/codegen_gradio_ui_main.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 26 KiB |
BIN
CodeGen/assets/img/codegen_gradio_ui_query.png
Normal file
BIN
CodeGen/assets/img/codegen_gradio_ui_query.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 51 KiB |
BIN
CodeGen/assets/img/codegen_gradio_ui_rm.png
Normal file
BIN
CodeGen/assets/img/codegen_gradio_ui_rm.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 34 KiB |
@@ -1,10 +1,11 @@
|
|||||||
# Copyright (C) 2024 Intel Corporation
|
# Copyright (C) 2024 Intel Corporation
|
||||||
# SPDX-License-Identifier: Apache-2.0
|
# SPDX-License-Identifier: Apache-2.0
|
||||||
|
|
||||||
|
import ast
|
||||||
import asyncio
|
import asyncio
|
||||||
import os
|
import os
|
||||||
|
|
||||||
from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
|
from comps import CustomLogger, MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
|
||||||
from comps.cores.mega.utils import handle_message
|
from comps.cores.mega.utils import handle_message
|
||||||
from comps.cores.proto.api_protocol import (
|
from comps.cores.proto.api_protocol import (
|
||||||
ChatCompletionRequest,
|
ChatCompletionRequest,
|
||||||
@@ -16,20 +17,98 @@ from comps.cores.proto.api_protocol import (
|
|||||||
from comps.cores.proto.docarray import LLMParams
|
from comps.cores.proto.docarray import LLMParams
|
||||||
from fastapi import Request
|
from fastapi import Request
|
||||||
from fastapi.responses import StreamingResponse
|
from fastapi.responses import StreamingResponse
|
||||||
|
from langchain.prompts import PromptTemplate
|
||||||
|
|
||||||
|
logger = CustomLogger("opea_codegen_microservice")
|
||||||
|
logflag = os.getenv("LOGFLAG", False)
|
||||||
|
|
||||||
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 7778))
|
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 7778))
|
||||||
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
|
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
|
||||||
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
|
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
|
||||||
|
RETRIEVAL_SERVICE_HOST_IP = os.getenv("RETRIEVAL_SERVICE_HOST_IP", "0.0.0.0")
|
||||||
|
REDIS_RETRIEVER_PORT = int(os.getenv("REDIS_RETRIEVER_PORT", 7000))
|
||||||
|
TEI_EMBEDDING_HOST_IP = os.getenv("TEI_EMBEDDING_HOST_IP", "0.0.0.0")
|
||||||
|
EMBEDDER_PORT = int(os.getenv("EMBEDDER_PORT", 6000))
|
||||||
|
|
||||||
|
grader_prompt = """You are a grader assessing relevance of a retrieved document to a user question. \n
|
||||||
|
Here is the user question: {question} \n
|
||||||
|
Here is the retrieved document: \n\n {document} \n\n
|
||||||
|
|
||||||
|
If the document contains keywords related to the user question, grade it as relevant.
|
||||||
|
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
|
||||||
|
Rules:
|
||||||
|
- Do not return the question, the provided document or explanation.
|
||||||
|
- If the document is relevant to the question, return 'yes'; otherwise return 'no'.
|
||||||
|
- Do not include any other details in your response.
|
||||||
|
"""
|
||||||
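As a standalone illustration of the grading step (assuming the `grader_prompt` defined above is in scope), this is how the prompt is rendered for one retrieved document; the deployed service sends the rendered text to the LLM microservice and keeps the document only if the model answers 'yes':

```python
from langchain.prompts import PromptTemplate

# Illustrative only: render the grader prompt for a single retrieved document.
prompt_agent = PromptTemplate(template=grader_prompt, input_variables=["question", "document"])
formatted_prompt = prompt_agent.format(
    question="Implement a high-level API for a TODO list application.",
    document="The TODO API exposes add_task, remove_task and list_tasks operations.",
)
print(formatted_prompt)  # the LLM is expected to answer only 'yes' or 'no'
```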
|
|
||||||
|
|
||||||
|
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
|
||||||
|
"""Aligns the inputs based on the service type of the current node.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- self: Reference to the current instance of the class.
|
||||||
|
- inputs: Dictionary containing the inputs for the current node.
|
||||||
|
- cur_node: The current node in the service orchestrator.
|
||||||
|
- runtime_graph: The runtime graph of the service orchestrator.
|
||||||
|
- llm_parameters_dict: Dictionary containing the LLM parameters.
|
||||||
|
- kwargs: Additional keyword arguments.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- inputs: The aligned inputs for the current node.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Check if the current service type is EMBEDDING
|
||||||
|
if self.services[cur_node].service_type == ServiceType.EMBEDDING:
|
||||||
|
# Store the input query for later use
|
||||||
|
self.input_query = inputs["query"]
|
||||||
|
# Set the input for the embedding service
|
||||||
|
inputs["input"] = inputs["query"]
|
||||||
|
|
||||||
|
# Check if the current service type is RETRIEVER
|
||||||
|
if self.services[cur_node].service_type == ServiceType.RETRIEVER:
|
||||||
|
# Extract the embedding from the inputs
|
||||||
|
embedding = inputs["data"][0]["embedding"]
|
||||||
|
# Align the inputs for the retriever service
|
||||||
|
inputs = {"index_name": llm_parameters_dict["index_name"], "text": self.input_query, "embedding": embedding}
|
||||||
|
|
||||||
|
return inputs
|
||||||
|
|
||||||
|
|
||||||
class CodeGenService:
|
class CodeGenService:
|
||||||
def __init__(self, host="0.0.0.0", port=8000):
|
def __init__(self, host="0.0.0.0", port=8000):
|
||||||
self.host = host
|
self.host = host
|
||||||
self.port = port
|
self.port = port
|
||||||
self.megaservice = ServiceOrchestrator()
|
ServiceOrchestrator.align_inputs = align_inputs
|
||||||
|
self.megaservice_llm = ServiceOrchestrator()
|
||||||
|
self.megaservice_retriever = ServiceOrchestrator()
|
||||||
|
self.megaservice_retriever_llm = ServiceOrchestrator()
|
||||||
self.endpoint = str(MegaServiceEndpoint.CODE_GEN)
|
self.endpoint = str(MegaServiceEndpoint.CODE_GEN)
|
||||||
|
|
||||||
def add_remote_service(self):
|
def add_remote_service(self):
|
||||||
|
"""Adds remote microservices to the service orchestrators and defines the flow between them."""
|
||||||
|
|
||||||
|
# Define the embedding microservice
|
||||||
|
embedding = MicroService(
|
||||||
|
name="embedding",
|
||||||
|
host=TEI_EMBEDDING_HOST_IP,
|
||||||
|
port=EMBEDDER_PORT,
|
||||||
|
endpoint="/v1/embeddings",
|
||||||
|
use_remote_service=True,
|
||||||
|
service_type=ServiceType.EMBEDDING,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Define the retriever microservice
|
||||||
|
retriever = MicroService(
|
||||||
|
name="retriever",
|
||||||
|
host=RETRIEVAL_SERVICE_HOST_IP,
|
||||||
|
port=REDIS_RETRIEVER_PORT,
|
||||||
|
endpoint="/v1/retrieval",
|
||||||
|
use_remote_service=True,
|
||||||
|
service_type=ServiceType.RETRIEVER,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Define the LLM microservice
|
||||||
llm = MicroService(
|
llm = MicroService(
|
||||||
name="llm",
|
name="llm",
|
||||||
host=LLM_SERVICE_HOST_IP,
|
host=LLM_SERVICE_HOST_IP,
|
||||||
@@ -38,13 +117,61 @@ class CodeGenService:
|
|||||||
use_remote_service=True,
|
use_remote_service=True,
|
||||||
service_type=ServiceType.LLM,
|
service_type=ServiceType.LLM,
|
||||||
)
|
)
|
||||||
self.megaservice.add(llm)
|
|
||||||
|
# Add the microservices to the megaservice_retriever_llm orchestrator and define the flow
|
||||||
|
self.megaservice_retriever_llm.add(embedding).add(retriever).add(llm)
|
||||||
|
self.megaservice_retriever_llm.flow_to(embedding, retriever)
|
||||||
|
self.megaservice_retriever_llm.flow_to(retriever, llm)
|
||||||
|
|
||||||
|
# Add the microservices to the megaservice_retriever orchestrator and define the flow
|
||||||
|
self.megaservice_retriever.add(embedding).add(retriever)
|
||||||
|
self.megaservice_retriever.flow_to(embedding, retriever)
|
||||||
|
|
||||||
|
# Add the LLM microservice to the megaservice_llm orchestrator
|
||||||
|
self.megaservice_llm.add(llm)
|
||||||
|
|
||||||
|
async def read_streaming_response(self, response: StreamingResponse):
|
||||||
|
"""Reads the streaming response from a StreamingResponse object.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- self: Reference to the current instance of the class.
|
||||||
|
- response: The StreamingResponse object to read from.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- str: The complete response body as a decoded string.
|
||||||
|
"""
|
||||||
|
body = b"" # Initialize an empty byte string to accumulate the response chunks
|
||||||
|
async for chunk in response.body_iterator:
|
||||||
|
body += chunk # Append each chunk to the body
|
||||||
|
return body.decode("utf-8") # Decode the accumulated byte string to a regular string
|
||||||
|
|
||||||
async def handle_request(self, request: Request):
|
async def handle_request(self, request: Request):
|
||||||
|
"""Handles the incoming request, processes it through the appropriate microservices,
|
||||||
|
and returns the response.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- self: Reference to the current instance of the class.
|
||||||
|
- request: The incoming request object.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- ChatCompletionResponse: The response from the LLM microservice.
|
||||||
|
"""
|
||||||
|
# Parse the incoming request data
|
||||||
data = await request.json()
|
data = await request.json()
|
||||||
|
|
||||||
|
# Get the stream option from the request data, default to True if not provided
|
||||||
stream_opt = data.get("stream", True)
|
stream_opt = data.get("stream", True)
|
||||||
chat_request = ChatCompletionRequest.parse_obj(data)
|
|
||||||
|
# Validate and parse the chat request data
|
||||||
|
chat_request = ChatCompletionRequest.model_validate(data)
|
||||||
|
|
||||||
|
# Handle the chat messages to generate the prompt
|
||||||
prompt = handle_message(chat_request.messages)
|
prompt = handle_message(chat_request.messages)
|
||||||
|
|
||||||
|
# Get the agents flag from the request data, default to False if not provided
|
||||||
|
agents_flag = data.get("agents_flag", False)
|
||||||
|
|
||||||
|
# Define the LLM parameters
|
||||||
parameters = LLMParams(
|
parameters = LLMParams(
|
||||||
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
|
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
|
||||||
top_k=chat_request.top_k if chat_request.top_k else 10,
|
top_k=chat_request.top_k if chat_request.top_k else 10,
|
||||||
@@ -54,18 +181,90 @@ class CodeGenService:
|
|||||||
presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
|
presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
|
||||||
repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
|
repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
|
||||||
stream=stream_opt,
|
stream=stream_opt,
|
||||||
|
index_name=chat_request.index_name,
|
||||||
)
|
)
|
||||||
result_dict, runtime_graph = await self.megaservice.schedule(
|
|
||||||
initial_inputs={"query": prompt}, llm_parameters=parameters
|
# Initialize the initial inputs with the generated prompt
|
||||||
|
initial_inputs = {"query": prompt}
|
||||||
|
|
||||||
|
# Check if the key index name is provided in the parameters
|
||||||
|
if parameters.index_name:
|
||||||
|
if agents_flag:
|
||||||
|
# Schedule the retriever microservice
|
||||||
|
result_ret, runtime_graph = await self.megaservice_retriever.schedule(
|
||||||
|
initial_inputs=initial_inputs, llm_parameters=parameters
|
||||||
|
)
|
||||||
|
|
||||||
|
# Switch to the LLM microservice
|
||||||
|
megaservice = self.megaservice_llm
|
||||||
|
|
||||||
|
relevant_docs = []
|
||||||
|
for doc in result_ret["retriever/MicroService"]["retrieved_docs"]:
|
||||||
|
# Create the PromptTemplate
|
||||||
|
prompt_agent = PromptTemplate(template=grader_prompt, input_variables=["question", "document"])
|
||||||
|
|
||||||
|
# Format the template with the input variables
|
||||||
|
formatted_prompt = prompt_agent.format(question=prompt, document=doc["text"])
|
||||||
|
initial_inputs_grader = {"query": formatted_prompt}
|
||||||
|
|
||||||
|
# Schedule the LLM microservice for grading
|
||||||
|
grade, runtime_graph = await self.megaservice_llm.schedule(
|
||||||
|
initial_inputs=initial_inputs_grader, llm_parameters=parameters
|
||||||
|
)
|
||||||
|
|
||||||
|
for node, response in grade.items():
|
||||||
|
if isinstance(response, StreamingResponse):
|
||||||
|
# Read the streaming response
|
||||||
|
grader_response = await self.read_streaming_response(response)
|
||||||
|
|
||||||
|
# Replace null with None
|
||||||
|
grader_response = grader_response.replace("null", "None")
|
||||||
|
|
||||||
|
# Split the response by "data:" and process each part
|
||||||
|
for i in grader_response.split("data:"):
|
||||||
|
if '"text":' in i:
|
||||||
|
# Convert the string to a dictionary
|
||||||
|
r = ast.literal_eval(i)
|
||||||
|
# Check if the response text is "yes"
|
||||||
|
if r["choices"][0]["text"] == "yes":
|
||||||
|
# Append the document to the relevant_docs list
|
||||||
|
relevant_docs.append(doc)
|
||||||
|
|
||||||
|
# Update the initial inputs with the relevant documents
|
||||||
|
if len(relevant_docs) > 0:
|
||||||
|
logger.info(f"[ CodeGenService - handle_request ] {len(relevant_docs)} relevant document\s found.")
|
||||||
|
query = initial_inputs["query"]
|
||||||
|
initial_inputs = {}
|
||||||
|
initial_inputs["retrieved_docs"] = relevant_docs
|
||||||
|
initial_inputs["initial_query"] = query
|
||||||
|
|
||||||
|
else:
|
||||||
|
logger.info(
|
||||||
|
"[ CodeGenService - handle_request ] Could not find any relevant documents. The query will be used as input to the LLM."
|
||||||
|
)
|
||||||
|
|
||||||
|
else:
|
||||||
|
# Use the combined retriever and LLM microservice
|
||||||
|
megaservice = self.megaservice_retriever_llm
|
||||||
|
else:
|
||||||
|
# Use the LLM microservice only
|
||||||
|
megaservice = self.megaservice_llm
|
||||||
|
|
||||||
|
# Schedule the final megaservice
|
||||||
|
result_dict, runtime_graph = await megaservice.schedule(
|
||||||
|
initial_inputs=initial_inputs, llm_parameters=parameters
|
||||||
)
|
)
|
||||||
|
|
||||||
for node, response in result_dict.items():
|
for node, response in result_dict.items():
|
||||||
# Here it suppose the last microservice in the megaservice is LLM.
|
# Check if the last microservice in the megaservice is LLM
|
||||||
if (
|
if (
|
||||||
isinstance(response, StreamingResponse)
|
isinstance(response, StreamingResponse)
|
||||||
and node == list(self.megaservice.services.keys())[-1]
|
and node == list(megaservice.services.keys())[-1]
|
||||||
and self.megaservice.services[node].service_type == ServiceType.LLM
|
and megaservice.services[node].service_type == ServiceType.LLM
|
||||||
):
|
):
|
||||||
return response
|
return response
|
||||||
|
|
||||||
|
# Get the response from the last node in the runtime graph
|
||||||
last_node = runtime_graph.all_leaves()[-1]
|
last_node = runtime_graph.all_leaves()[-1]
|
||||||
response = result_dict[last_node]["text"]
|
response = result_dict[last_node]["text"]
|
||||||
choices = []
|
choices = []
|
||||||
|
|||||||
@@ -13,28 +13,77 @@ After launching your instance, you can connect to it using SSH (for Linux instan
|
|||||||
|
|
||||||
## 🚀 Start Microservices and MegaService
|
## 🚀 Start Microservices and MegaService
|
||||||
|
|
||||||
The CodeGen megaservice manages a single microservice called LLM within a Directed Acyclic Graph (DAG). In the diagram above, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
|
The CodeGen megaservice manages several microservices, including the 'Embedding MicroService', 'Retrieval MicroService' and 'LLM MicroService', within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. Data Preparation allows users to save/update documents or online resources to the vector database. Users can upload files or provide URLs, and manage their saved resources. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
|
||||||
|
|
||||||
The mega flow of the CodeGen application, from user's input query to the application's output response, is as follows:
|
The mega flow of the CodeGen application, from user's input query to the application's output response, is as follows:
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
|
---
|
||||||
|
config:
|
||||||
|
flowchart:
|
||||||
|
nodeSpacing: 400
|
||||||
|
rankSpacing: 100
|
||||||
|
curve: linear
|
||||||
|
themeVariables:
|
||||||
|
fontSize: 25px
|
||||||
|
---
|
||||||
flowchart LR
|
flowchart LR
|
||||||
subgraph CodeGen
|
%% Colors %%
|
||||||
|
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
|
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
|
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
|
classDef invisible fill:transparent,stroke:transparent;
|
||||||
|
style CodeGen-MegaService stroke:#000000
|
||||||
|
%% Subgraphs %%
|
||||||
|
subgraph CodeGen-MegaService["CodeGen-MegaService"]
|
||||||
direction LR
|
direction LR
|
||||||
A[User] --> |Input query| B[CodeGen Gateway]
|
EM([Embedding<br>MicroService]):::blue
|
||||||
B --> |Invoke| Megaservice
|
RET([Retrieval<br>MicroService]):::blue
|
||||||
subgraph Megaservice["Megaservice"]
|
RER([Agents]):::blue
|
||||||
direction TB
|
LLM([LLM<br>MicroService]):::blue
|
||||||
C((LLM<br>9000)) -. Post .-> D{{TGI Service<br>8028}}
|
end
|
||||||
end
|
subgraph User Interface
|
||||||
Megaservice --> |Output| E[Response]
|
direction LR
|
||||||
|
a([Submit Query Tab]):::orchid
|
||||||
|
UI([UI server]):::orchid
|
||||||
|
Ingest([Manage Resources]):::orchid
|
||||||
end
|
end
|
||||||
|
|
||||||
subgraph Legend
|
CLIP_EM{{Embedding<br>service}}
|
||||||
direction LR
|
VDB{{Vector DB}}
|
||||||
G([Microservice]) ==> H([Microservice])
|
V_RET{{Retriever<br>service}}
|
||||||
I([Microservice]) -.-> J{{Server API}}
|
Ingest{{Ingest data}}
|
||||||
end
|
DP([Data Preparation]):::blue
|
||||||
|
LLM_gen{{TGI Service}}
|
||||||
|
GW([CodeGen GateWay]):::orange
|
||||||
|
|
||||||
|
%% Data Preparation flow
|
||||||
|
%% Ingest data flow
|
||||||
|
direction LR
|
||||||
|
Ingest[Ingest data] --> UI
|
||||||
|
UI --> DP
|
||||||
|
DP <-.-> CLIP_EM
|
||||||
|
|
||||||
|
%% Questions interaction
|
||||||
|
direction LR
|
||||||
|
a[User Input Query] --> UI
|
||||||
|
UI --> GW
|
||||||
|
GW <==> CodeGen-MegaService
|
||||||
|
EM ==> RET
|
||||||
|
RET ==> RER
|
||||||
|
RER ==> LLM
|
||||||
|
|
||||||
|
|
||||||
|
%% Embedding service flow
|
||||||
|
direction LR
|
||||||
|
EM <-.-> CLIP_EM
|
||||||
|
RET <-.-> V_RET
|
||||||
|
LLM <-.-> LLM_gen
|
||||||
|
|
||||||
|
direction TB
|
||||||
|
%% Vector DB interaction
|
||||||
|
V_RET <-.->VDB
|
||||||
|
DP <-.->VDB
|
||||||
```
|
```
|
||||||
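The DAG shown above is assembled in `codegen.py` (added in this change) with the `ServiceOrchestrator` from GenAIComps. A condensed sketch of that wiring, using the default hosts and ports read from the environment variables in this guide:

```python
from comps import MicroService, ServiceOrchestrator, ServiceType

# Condensed sketch of the wiring performed in codegen.py; hosts and ports are
# the defaults the megaservice reads from its environment variables.
embedding = MicroService(name="embedding", host="0.0.0.0", port=6000,
                         endpoint="/v1/embeddings", use_remote_service=True,
                         service_type=ServiceType.EMBEDDING)
retriever = MicroService(name="retriever", host="0.0.0.0", port=7000,
                         endpoint="/v1/retrieval", use_remote_service=True,
                         service_type=ServiceType.RETRIEVER)
llm = MicroService(name="llm", host="0.0.0.0", port=9000,
                   endpoint="/v1/chat/completions", use_remote_service=True,
                   service_type=ServiceType.LLM)

# RAG path: embedding -> retriever -> llm. The agent grading step in
# handle_request filters the retrieved documents before the final LLM call.
rag_pipeline = ServiceOrchestrator()
rag_pipeline.add(embedding).add(retriever).add(llm)
rag_pipeline.flow_to(embedding, retriever)
rag_pipeline.flow_to(retriever, llm)
```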
|
|
||||||
### Setup Environment Variables
|
### Setup Environment Variables
|
||||||
@@ -51,38 +100,105 @@ export host_ip=${your_ip_address}
|
|||||||
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
|
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Set Netowork Proxy
|
2. Set Network Proxy
|
||||||
|
|
||||||
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
|
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
export no_proxy=${your_no_proxy}
|
export no_proxy=${no_proxy},${host_ip}
|
||||||
export http_proxy=${your_http_proxy}
|
export http_proxy=${your_http_proxy}
|
||||||
export https_proxy=${your_https_proxy}
|
export https_proxy=${your_https_proxy}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Start the Docker Containers for All Services
|
### Start the Docker Containers for All Services
|
||||||
|
|
||||||
CodeGen support TGI service and vLLM service, you can choose start either one of them.
|
Find the corresponding [compose.yaml](./compose.yaml). Users can start CodeGen with either the TGI or vLLM service:
|
||||||
|
|
||||||
Start CodeGen based on TGI service:
|
```bash
|
||||||
|
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
|
||||||
|
```
|
||||||
|
|
||||||
|
#### TGI service:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd GenAIExamples/CodeGen/docker_compose
|
|
||||||
source set_env.sh
|
|
||||||
cd intel/cpu/xeon
|
|
||||||
docker compose --profile codegen-xeon-tgi up -d
|
docker compose --profile codegen-xeon-tgi up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
Start CodeGen based on vLLM service:
|
Then run the command `docker images`; you should see the following Docker images:
|
||||||
|
|
||||||
|
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
|
||||||
|
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
|
||||||
|
- `opea/codegen-gradio-ui`
|
||||||
|
- `opea/codegen`
|
||||||
|
- `opea/dataprep`
|
||||||
|
- `opea/embedding`
|
||||||
|
- `opea/llm-textgen`
|
||||||
|
- `opea/retriever`
|
||||||
|
- `redis/redis-stack`
|
||||||
|
|
||||||
|
#### vLLM service:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd GenAIExamples/CodeGen/docker_compose
|
|
||||||
source set_env.sh
|
|
||||||
cd intel/cpu/xeon
|
|
||||||
docker compose --profile codegen-xeon-vllm up -d
|
docker compose --profile codegen-xeon-vllm up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Then run the command `docker images`; you should see the following Docker images:
|
||||||
|
|
||||||
|
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
|
||||||
|
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
|
||||||
|
- `opea/codegen-gradio-ui`
|
||||||
|
- `opea/codegen`
|
||||||
|
- `opea/dataprep`
|
||||||
|
- `opea/embedding`
|
||||||
|
- `opea/llm-textgen`
|
||||||
|
- `opea/retriever`
|
||||||
|
- `redis/redis-stack`
|
||||||
|
- `opea/vllm`
|
||||||
|
|
||||||
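Once the containers are up, a quick way to confirm that the published ports are reachable is a small probe such as the sketch below; the ports are the defaults used in this guide, so adjust them if you changed the mappings in `compose.yaml`.

```python
import socket

# Illustrative readiness probe; ports are the defaults used in this guide.
ports = {
    "TGI/vLLM serving": 8028,
    "LLM microservice": 9000,
    "CodeGen megaservice": 7778,
    "Dataprep service": 6007,
    "UI": 5173,
}

for name, port in ports.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(3)
        reachable = sock.connect_ex(("localhost", port)) == 0
    print(f"{name:<20} port {port}: {'open' if reachable else 'not reachable'}")
```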
|
### Building the Docker image locally
|
||||||
|
|
||||||
|
Should the Docker image you seek not yet be available on Docker Hub, you can build the Docker image locally.
|
||||||
|
To build the Docker image locally, follow the instructions provided below.
|
||||||
|
|
||||||
|
#### Build the MegaService Docker Image
|
||||||
|
|
||||||
|
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/opea-project/GenAIExamples
|
||||||
|
cd GenAIExamples/CodeGen
|
||||||
|
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Build the UI Gradio Image
|
||||||
|
|
||||||
|
Build the frontend Gradio image via the command below:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd GenAIExamples/CodeGen/ui
|
||||||
|
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Dataprep Microservice with Redis
|
||||||
|
|
||||||
|
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
|
||||||
|
|
||||||
|
#### Embedding Microservice with TEI
|
||||||
|
|
||||||
|
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
|
||||||
|
|
||||||
|
#### LLM text generation Microservice
|
||||||
|
|
||||||
|
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
|
||||||
|
|
||||||
|
#### Retriever Microservice
|
||||||
|
|
||||||
|
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
|
||||||
|
|
||||||
|
#### Start Redis server
|
||||||
|
|
||||||
|
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
|
||||||
|
|
||||||
### Validate the MicroServices and MegaService
|
### Validate the MicroServices and MegaService
|
||||||
|
|
||||||
1. LLM Service (for TGI, vLLM)
|
1. LLM Service (for TGI, vLLM)
|
||||||
@@ -90,8 +206,9 @@ docker compose --profile codegen-xeon-vllm up -d
|
|||||||
```bash
|
```bash
|
||||||
curl http://${host_ip}:8028/v1/chat/completions \
|
curl http://${host_ip}:8028/v1/chat/completions \
|
||||||
-X POST \
|
-X POST \
|
||||||
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' \
|
-H 'Content-Type: application/json' \
|
||||||
-H 'Content-Type: application/json'
|
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
2. LLM Microservices
|
2. LLM Microservices
|
||||||
@@ -99,19 +216,58 @@ docker compose --profile codegen-xeon-vllm up -d
|
|||||||
```bash
|
```bash
|
||||||
curl http://${host_ip}:9000/v1/chat/completions\
|
curl http://${host_ip}:9000/v1/chat/completions\
|
||||||
-X POST \
|
-X POST \
|
||||||
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
|
-H 'Content-Type: application/json' \
|
||||||
-H 'Content-Type: application/json'
|
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
|
||||||
```
|
```
|
||||||
|
|
||||||
3. MegaService
|
3. Dataprep Microservice
|
||||||
|
|
||||||
|
Make sure to replace the file name placeholders with your actual file names.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
|
curl http://${host_ip}:6007/v1/dataprep/ingest \
|
||||||
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
|
-X POST \
|
||||||
}'
|
-H "Content-Type: multipart/form-data" \
|
||||||
|
-F "files=@./file1.pdf" \
|
||||||
|
-F "files=@./file2.txt" \
|
||||||
|
-F "index_name=my_API_document"
|
||||||
```
|
```
|
||||||
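The same ingestion can be scripted; the sketch below mirrors the curl call above (the file names are placeholders, and the host and port follow this guide's defaults).

```python
import requests

# Mirrors the curl example above; file names are placeholders.
host_ip = "localhost"
with open("file1.pdf", "rb") as pdf, open("file2.txt", "rb") as txt:
    files = [
        ("files", ("file1.pdf", pdf, "application/pdf")),
        ("files", ("file2.txt", txt, "text/plain")),
    ]
    resp = requests.post(
        f"http://{host_ip}:6007/v1/dataprep/ingest",
        files=files,
        data={"index_name": "my_API_document"},
    )
print(resp.status_code, resp.text)
```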
|
|
||||||
## 🚀 Launch the UI
|
4. MegaService
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://${host_ip}:7778/v1/codegen \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
|
||||||
|
```
|
||||||
|
|
||||||
|
To query the CodeGen service with RAG and Agents enabled against an ingested index:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://${host_ip}:7778/v1/codegen \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
|
||||||
|
```
|
||||||
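The megaservice streams its answer as `data:` lines. A hedged sketch of collecting the generated text on the client side is shown below; the chunk layout (`choices[0]["text"]`) follows the format parsed in `codegen.py`, so adjust it if your serving backend emits a different schema.

```python
import json
import requests

# Sketch of consuming the streamed answer; the "data:" chunk layout follows the
# format parsed in codegen.py. Adjust if your backend emits a different schema.
payload = {
    "agents_flag": "True",
    "index_name": "my_API_document",
    "messages": "Implement a high-level API for a TODO list application.",
}

pieces = []
with requests.post("http://localhost:7778/v1/codegen", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        chunk = line[len("data:"):].strip()
        if not chunk or chunk == "[DONE]":
            continue
        record = json.loads(chunk)
        pieces.append(record.get("choices", [{}])[0].get("text", ""))

print("".join(pieces))
```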
|
|
||||||
|
## 🚀 Launch the Gradio Based UI (Recommended)
|
||||||
|
|
||||||
|
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
|
||||||
|
|
||||||
|
Code Generation Tab
|
||||||
|

|
||||||
|
|
||||||
|
Resource Management Tab
|
||||||
|

|
||||||
|
|
||||||
|
Uploading a Knowledge Index
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Here is an example of running a query in the Gradio UI using an Index:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## 🚀 Launch the Svelte Based UI (Optional)
|
||||||
|
|
||||||
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
|
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
|
||||||
|
|
||||||
@@ -224,52 +380,3 @@ For example:
|
|||||||
- Ask question and get answer
|
- Ask question and get answer
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
## 🚀 Download or Build Docker Images
|
|
||||||
|
|
||||||
Should the Docker image you seek not yet be available on Docker Hub, you can build the Docker image locally.
|
|
||||||
|
|
||||||
### 1. Build the LLM Docker Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/opea-project/GenAIComps.git
|
|
||||||
cd GenAIComps
|
|
||||||
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. Build the MegaService Docker Image
|
|
||||||
|
|
||||||
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build MegaService Docker image via the command below:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/opea-project/GenAIExamples
|
|
||||||
cd GenAIExamples/CodeGen
|
|
||||||
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Build the UI Docker Image
|
|
||||||
|
|
||||||
Build the frontend Docker image via the command below:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd GenAIExamples/CodeGen/ui
|
|
||||||
docker build -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. Build CodeGen React UI Docker Image (Optional)
|
|
||||||
|
|
||||||
Build react frontend Docker image via below command:
|
|
||||||
|
|
||||||
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd GenAIExamples/CodeGen/ui
|
|
||||||
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
|
|
||||||
```
|
|
||||||
|
|
||||||
Then run the command `docker images`, you will have the following Docker Images:
|
|
||||||
|
|
||||||
- `opea/llm-textgen:latest`
|
|
||||||
- `opea/codegen:latest`
|
|
||||||
- `opea/codegen-ui:latest`
|
|
||||||
- `opea/codegen-react-ui:latest` (optional)
|
|
||||||
|
|||||||
@@ -1,7 +1,8 @@
|
|||||||
# Copyright (C) 2024 Intel Corporation
|
# Copyright (C) 2025 Intel Corporation
|
||||||
# SPDX-License-Identifier: Apache-2.0
|
# SPDX-License-Identifier: Apache-2.0
|
||||||
|
|
||||||
services:
|
services:
|
||||||
|
|
||||||
tgi-service:
|
tgi-service:
|
||||||
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
|
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
|
||||||
container_name: tgi-server
|
container_name: tgi-server
|
||||||
@@ -92,10 +93,14 @@ services:
|
|||||||
- http_proxy=${http_proxy}
|
- http_proxy=${http_proxy}
|
||||||
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
|
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
|
||||||
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
|
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
|
||||||
|
- RETRIEVAL_SERVICE_HOST_IP=${RETRIEVAL_SERVICE_HOST_IP}
|
||||||
|
- REDIS_RETRIEVER_PORT=${REDIS_RETRIEVER_PORT}
|
||||||
|
- TEI_EMBEDDING_HOST_IP=${TEI_EMBEDDING_HOST_IP}
|
||||||
|
- EMBEDDER_PORT=${EMBEDDER_PORT}
|
||||||
ipc: host
|
ipc: host
|
||||||
restart: always
|
restart: always
|
||||||
codegen-xeon-ui-server:
|
codegen-xeon-ui-server:
|
||||||
image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
|
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
|
||||||
container_name: codegen-xeon-ui-server
|
container_name: codegen-xeon-ui-server
|
||||||
depends_on:
|
depends_on:
|
||||||
- codegen-xeon-backend-server
|
- codegen-xeon-backend-server
|
||||||
@@ -106,9 +111,93 @@ services:
|
|||||||
- https_proxy=${https_proxy}
|
- https_proxy=${https_proxy}
|
||||||
- http_proxy=${http_proxy}
|
- http_proxy=${http_proxy}
|
||||||
- BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
|
- BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
|
||||||
|
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
|
||||||
|
- host_ip=${host_ip}
|
||||||
|
- DATAPREP_ENDPOINT=${DATAPREP_ENDPOINT}
|
||||||
|
- DATAPREP_REDIS_PORT=${DATAPREP_REDIS_PORT}
|
||||||
ipc: host
|
ipc: host
|
||||||
restart: always
|
restart: always
|
||||||
|
redis-vector-db:
|
||||||
|
image: redis/redis-stack:7.2.0-v9
|
||||||
|
container_name: redis-vector-db
|
||||||
|
ports:
|
||||||
|
- "${REDIS_DB_PORT}:${REDIS_DB_PORT}"
|
||||||
|
- "${REDIS_INSIGHTS_PORT}:${REDIS_INSIGHTS_PORT}"
|
||||||
|
dataprep-redis-server:
|
||||||
|
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
|
||||||
|
container_name: dataprep-redis-server
|
||||||
|
depends_on:
|
||||||
|
- redis-vector-db
|
||||||
|
ports:
|
||||||
|
- "${DATAPREP_REDIS_PORT}:5000"
|
||||||
|
environment:
|
||||||
|
no_proxy: ${no_proxy}
|
||||||
|
http_proxy: ${http_proxy}
|
||||||
|
https_proxy: ${https_proxy}
|
||||||
|
REDIS_URL: ${REDIS_URL}
|
||||||
|
REDIS_HOST: ${host_ip}
|
||||||
|
INDEX_NAME: ${INDEX_NAME}
|
||||||
|
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||||
|
LOGFLAG: true
|
||||||
|
restart: unless-stopped
|
||||||
|
tei-embedding-serving:
|
||||||
|
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
|
||||||
|
container_name: tei-embedding-serving
|
||||||
|
entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
|
||||||
|
ports:
|
||||||
|
- "${TEI_EMBEDDER_PORT:-12000}:80"
|
||||||
|
volumes:
|
||||||
|
- "./data:/data"
|
||||||
|
shm_size: 1g
|
||||||
|
environment:
|
||||||
|
no_proxy: ${no_proxy}
|
||||||
|
http_proxy: ${http_proxy}
|
||||||
|
https_proxy: ${https_proxy}
|
||||||
|
host_ip: ${host_ip}
|
||||||
|
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||||
|
healthcheck:
|
||||||
|
test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
|
||||||
|
interval: 10s
|
||||||
|
timeout: 6s
|
||||||
|
retries: 48
|
||||||
|
tei-embedding-server:
|
||||||
|
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
|
||||||
|
container_name: tei-embedding-server
|
||||||
|
ports:
|
||||||
|
- "${EMBEDDER_PORT:-10201}:6000"
|
||||||
|
ipc: host
|
||||||
|
environment:
|
||||||
|
no_proxy: ${no_proxy}
|
||||||
|
http_proxy: ${http_proxy}
|
||||||
|
https_proxy: ${https_proxy}
|
||||||
|
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
|
||||||
|
EMBEDDING_COMPONENT_NAME: "OPEA_TEI_EMBEDDING"
|
||||||
|
depends_on:
|
||||||
|
tei-embedding-serving:
|
||||||
|
condition: service_healthy
|
||||||
|
restart: unless-stopped
|
||||||
|
retriever-redis:
|
||||||
|
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
|
||||||
|
container_name: retriever-redis
|
||||||
|
depends_on:
|
||||||
|
- redis-vector-db
|
||||||
|
ports:
|
||||||
|
- "${REDIS_RETRIEVER_PORT}:${REDIS_RETRIEVER_PORT}"
|
||||||
|
ipc: host
|
||||||
|
environment:
|
||||||
|
no_proxy: ${no_proxy}
|
||||||
|
http_proxy: ${http_proxy}
|
||||||
|
https_proxy: ${https_proxy}
|
||||||
|
REDIS_URL: ${REDIS_URL}
|
||||||
|
REDIS_DB_PORT: ${REDIS_DB_PORT}
|
||||||
|
REDIS_INSIGHTS_PORT: ${REDIS_INSIGHTS_PORT}
|
||||||
|
REDIS_RETRIEVER_PORT: ${REDIS_RETRIEVER_PORT}
|
||||||
|
INDEX_NAME: ${INDEX_NAME}
|
||||||
|
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
|
||||||
|
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
|
||||||
|
LOGFLAG: ${LOGFLAG}
|
||||||
|
RETRIEVER_COMPONENT_NAME: ${RETRIEVER_COMPONENT_NAME:-OPEA_RETRIEVER_REDIS}
|
||||||
|
restart: unless-stopped
|
||||||
networks:
|
networks:
|
||||||
default:
|
default:
|
||||||
driver: bridge
|
driver: bridge
|
||||||
|
|||||||
@@ -6,28 +6,77 @@ The default pipeline deploys with vLLM as the LLM serving component. It also pro
|
|||||||
|
|
||||||
## 🚀 Start MicroServices and MegaService
|
## 🚀 Start MicroServices and MegaService
|
||||||
|
|
||||||
The CodeGen megaservice manages a single microservice called LLM within a Directed Acyclic Graph (DAG). In the diagram above, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
|
The CodeGen megaservice manages several microservices, including the 'Embedding MicroService', 'Retrieval MicroService' and 'LLM MicroService', within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. Data Preparation allows users to save/update documents or online resources to the vector database. Users can upload files or provide URLs, and manage their saved resources. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
|
||||||
|
|
||||||
The mega flow of the CodeGen application, from user's input query to the application's output response, is as follows:
|
The mega flow of the CodeGen application, from user's input query to the application's output response, is as follows:
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
|
---
|
||||||
|
config:
|
||||||
|
flowchart:
|
||||||
|
nodeSpacing: 400
|
||||||
|
rankSpacing: 100
|
||||||
|
curve: linear
|
||||||
|
themeVariables:
|
||||||
|
fontSize: 25px
|
||||||
|
---
|
||||||
flowchart LR
|
flowchart LR
|
||||||
subgraph CodeGen
|
%% Colors %%
|
||||||
|
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
|
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
|
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
|
||||||
|
classDef invisible fill:transparent,stroke:transparent;
|
||||||
|
style CodeGen-MegaService stroke:#000000
|
||||||
|
%% Subgraphs %%
|
||||||
|
subgraph CodeGen-MegaService["CodeGen-MegaService"]
|
||||||
direction LR
|
direction LR
|
||||||
A[User] --> |Input query| B[CodeGen Gateway]
|
EM([Embedding<br>MicroService]):::blue
|
||||||
B --> |Invoke| Megaservice
|
RET([Retrieval<br>MicroService]):::blue
|
||||||
subgraph Megaservice["Megaservice"]
|
RER([Agents]):::blue
|
||||||
direction TB
|
LLM([LLM<br>MicroService]):::blue
|
||||||
C((LLM<br>9000)) -. Post .-> D{{TGI Service<br>8028}}
|
end
|
||||||
end
|
subgraph User Interface
|
||||||
Megaservice --> |Output| E[Response]
|
direction LR
|
||||||
|
a([Submit Query Tab]):::orchid
|
||||||
|
UI([UI server]):::orchid
|
||||||
|
Ingest([Manage Resources]):::orchid
|
||||||
end
|
end
|
||||||
|
|
||||||
subgraph Legend
|
CLIP_EM{{Embedding<br>service}}
|
||||||
direction LR
|
VDB{{Vector DB}}
|
||||||
G([Microservice]) ==> H([Microservice])
|
V_RET{{Retriever<br>service}}
|
||||||
I([Microservice]) -.-> J{{Server API}}
|
Ingest{{Ingest data}}
|
||||||
end
|
DP([Data Preparation]):::blue
|
||||||
|
LLM_gen{{TGI Service}}
|
||||||
|
GW([CodeGen GateWay]):::orange
|
||||||
|
|
||||||
|
%% Data Preparation flow
|
||||||
|
%% Ingest data flow
|
||||||
|
direction LR
|
||||||
|
Ingest[Ingest data] --> UI
|
||||||
|
UI --> DP
|
||||||
|
DP <-.-> CLIP_EM
|
||||||
|
|
||||||
|
%% Questions interaction
|
||||||
|
direction LR
|
||||||
|
a[User Input Query] --> UI
|
||||||
|
UI --> GW
|
||||||
|
GW <==> CodeGen-MegaService
|
||||||
|
EM ==> RET
|
||||||
|
RET ==> RER
|
||||||
|
RER ==> LLM
|
||||||
|
|
||||||
|
|
||||||
|
%% Embedding service flow
|
||||||
|
direction LR
|
||||||
|
EM <-.-> CLIP_EM
|
||||||
|
RET <-.-> V_RET
|
||||||
|
LLM <-.-> LLM_gen
|
||||||
|
|
||||||
|
direction TB
|
||||||
|
%% Vector DB interaction
|
||||||
|
V_RET <-.->VDB
|
||||||
|
DP <-.->VDB
|
||||||
```
|
```
|
||||||
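How a request travels through this graph is decided in `codegen.py` (added in this change): the `index_name` and `agents_flag` fields select between the LLM-only, retriever-plus-LLM, and agent-graded paths. A condensed, non-authoritative sketch of that decision:

```python
# Condensed sketch of the routing logic in codegen.py (illustrative only).
def choose_pipeline(data: dict) -> str:
    """Return which orchestrator path handles the request."""
    index_name = data.get("index_name")
    agents_flag = data.get("agents_flag", False)

    if not index_name:
        return "llm only"                        # plain code generation
    if agents_flag:
        # Retrieve first, grade each document with the grader prompt,
        # then send only the relevant documents to the LLM.
        return "retriever -> agent grading -> llm"
    return "embedding -> retriever -> llm"       # straight RAG pipeline


print(choose_pipeline({"messages": "..."}))
print(choose_pipeline({"messages": "...", "index_name": "my_API_document"}))
print(choose_pipeline({"messages": "...", "index_name": "my_API_document", "agents_flag": "True"}))
```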
|
|
||||||
### Setup Environment Variables
|
### Setup Environment Variables
|
||||||
@@ -44,38 +93,107 @@ export host_ip=${your_ip_address}
|
|||||||
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
|
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Set Netowork Proxy
|
2. Set Network Proxy
|
||||||
|
|
||||||
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
|
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
export no_proxy=${your_no_proxy}
|
export no_proxy=${no_proxy},${host_ip}
|
||||||
export http_proxy=${your_http_proxy}
|
export http_proxy=${your_http_proxy}
|
||||||
export https_proxy=${your_https_proxy}
|
export https_proxy=${your_https_proxy}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Start the Docker Containers for All Services
|
### Start the Docker Containers for All Services
|
||||||
|
|
||||||
CodeGen support TGI service and vLLM service, you can choose start either one of them.
|
Find the corresponding [compose.yaml](./compose.yaml). Users can start CodeGen with either the TGI or vLLM service:
|
||||||
|
|
||||||
Start CodeGen based on TGI service:
|
```bash
|
||||||
|
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
|
||||||
|
```
|
||||||
|
|
||||||
|
#### TGI service:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd GenAIExamples/CodeGen/docker_compose
|
|
||||||
source set_env.sh
|
|
||||||
cd intel/hpu/gaudi
|
|
||||||
docker compose --profile codegen-gaudi-tgi up -d
|
docker compose --profile codegen-gaudi-tgi up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
Start CodeGen based on vLLM service:
|
Then run the command `docker images`; you should see the following Docker images:
|
||||||
|
|
||||||
|
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
|
||||||
|
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
|
||||||
|
- `opea/codegen-gradio-ui`
|
||||||
|
- `opea/codegen`
|
||||||
|
- `opea/dataprep`
|
||||||
|
- `opea/embedding`
|
||||||
|
- `opea/llm-textgen`
|
||||||
|
- `opea/retriever`
|
||||||
|
- `redis/redis-stack`
|
||||||
|
|
||||||
|
#### vLLM service:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd GenAIExamples/CodeGen/docker_compose
|
|
||||||
source set_env.sh
|
|
||||||
cd intel/hpu/gaudi
|
|
||||||
docker compose --profile codegen-gaudi-vllm up -d
|
docker compose --profile codegen-gaudi-vllm up -d
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Then run the command `docker images`; you should see the following Docker images:
|
||||||
|
|
||||||
|
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
|
||||||
|
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
|
||||||
|
- `opea/codegen-gradio-ui`
|
||||||
|
- `opea/codegen`
|
||||||
|
- `opea/dataprep`
|
||||||
|
- `opea/embedding`
|
||||||
|
- `opea/llm-textgen`
|
||||||
|
- `opea/retriever`
|
||||||
|
- `redis/redis-stack`
|
||||||
|
- `opea/vllm`
|
||||||
|
|
||||||
|
Refer to the [Gaudi Guide](./README.md) to build docker images from source.
|
||||||
|
|
||||||
|
### Building the Docker image locally
|
||||||
|
|
||||||
|
Should the Docker image you seek not yet be available on Docker Hub, you can build the Docker image locally.
|
||||||
|
To build the Docker image locally, follow the instructions provided below.
|
||||||
|
|
||||||
|
#### Build the MegaService Docker Image
|
||||||
|
|
||||||
|
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/opea-project/GenAIExamples
|
||||||
|
cd GenAIExamples/CodeGen
|
||||||
|
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Build the UI Gradio Image
|
||||||
|
|
||||||
|
Build the frontend Gradio image via the command below:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd GenAIExamples/CodeGen/ui
|
||||||
|
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Dataprep Microservice with Redis
|
||||||
|
|
||||||
|
Follow the instrustion provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
|
||||||
|
|
||||||
|
#### Embedding Microservice with TEI
|
||||||
|
|
||||||
|
Follow the instrustion provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
|
||||||
|
|
||||||
|
#### LLM text generation Microservice
|
||||||
|
|
||||||
|
Follow the instrustion provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
|
||||||
|
|
||||||
|
#### Retriever Microservice
|
||||||
|
|
||||||
|
Follow the instrustion provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
|
||||||
|
|
||||||
|
#### Start Redis server
|
||||||
|
|
||||||
|
Follow the instrustion provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
|
||||||
|
|
||||||
### Validate the MicroServices and MegaService

1. LLM Service (for TGI, vLLM)

@@ -83,8 +201,9 @@ docker compose --profile codegen-gaudi-vllm up -d

```bash
curl http://${host_ip}:8028/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
```

2. LLM Microservices

@@ -92,19 +211,58 @@ docker compose --profile codegen-gaudi-vllm up -d

```bash
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```

3. Dataprep Microservice

Make sure to replace the file name placeholders with your actual file names.

```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./file1.pdf" \
  -F "files=@./file2.txt" \
  -F "index_name=my_API_document"
```

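The same endpoint can also ingest web pages through the `link_list` form field; this mirrors the ingest call used by the test scripts, and the URL and index name below are only examples:

```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://modin.readthedocs.io/en/latest/index.html"]' \
  -F "index_name=my_API_document"
```
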
4. MegaService

```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```

CodeGen service with RAG and Agents activated, based on an ingested index:

```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```

## 🚀 Launch the Gradio Based UI (Recommended)

To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md).

Code Generation Tab

![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)

Resource Management Tab

![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)

Uploading a Knowledge Index

![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep_upload.png)

Here is an example of running a query in the Gradio UI using an Index:

![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)

## 🚀 Launch the Svelte Based UI (Optional)

To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:

@@ -213,52 +371,3 @@ For example:

- Ask question and get answer

![project-screenshot](https://imgur.com/aI1bRmD.png)

## 🚀 Build Docker Images

First of all, you need to build the Docker images locally. This step can be skipped once the Docker images are published to Docker Hub.

### 1. Build the LLM Docker Image

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```

### 2. Build the MegaService Docker Image

To construct the MegaService, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:

```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

### 3. Build the UI Docker Image

Construct the frontend Docker image via the command below:

```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

### 4. Build the CodeGen React UI Docker Image (Optional)

Build the React frontend Docker image via the command below.

**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**

```bash
cd GenAIExamples/CodeGen/ui
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```

Then run the command `docker images`; you should see the following Docker images:

- `opea/llm-textgen:latest`
- `opea/codegen:latest`
- `opea/codegen-ui:latest`
- `opea/codegen-react-ui:latest`

@@ -108,10 +108,15 @@ services:

```yaml
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - RETRIEVAL_SERVICE_HOST_IP=${RETRIEVAL_SERVICE_HOST_IP}
      - REDIS_RETRIEVER_PORT=${REDIS_RETRIEVER_PORT}
      - TEI_EMBEDDING_HOST_IP=${TEI_EMBEDDING_HOST_IP}
      - EMBEDDER_PORT=${EMBEDDER_PORT}
      - host_ip=${host_ip}
    ipc: host
    restart: always
  codegen-gaudi-ui-server:
    image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
    container_name: codegen-gaudi-ui-server
    depends_on:
      - codegen-gaudi-backend-server
```

@@ -122,9 +127,93 @@ services:

```yaml
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
      - MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
      - host_ip=${host_ip}
      - DATAPREP_ENDPOINT=${DATAPREP_ENDPOINT}
      - DATAPREP_REDIS_PORT=${DATAPREP_REDIS_PORT}
    ipc: host
    restart: always
  redis-vector-db:
    image: redis/redis-stack:7.2.0-v9
    container_name: redis-vector-db
    ports:
      - "${REDIS_DB_PORT}:${REDIS_DB_PORT}"
      - "${REDIS_INSIGHTS_PORT}:${REDIS_INSIGHTS_PORT}"
  dataprep-redis-server:
    image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
    container_name: dataprep-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "${DATAPREP_REDIS_PORT}:5000"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      REDIS_HOST: ${host_ip}
      INDEX_NAME: ${INDEX_NAME}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      LOGFLAG: true
    restart: unless-stopped
  tei-embedding-serving:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
    container_name: tei-embedding-serving
    entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
    ports:
      - "${TEI_EMBEDDER_PORT:-12000}:80"
    volumes:
      - "./data:/data"
    shm_size: 1g
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      host_ip: ${host_ip}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
      interval: 10s
      timeout: 6s
      retries: 48
  tei-embedding-server:
    image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
    container_name: tei-embedding-server
    ports:
      - "${EMBEDDER_PORT:-10201}:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
      EMBEDDING_COMPONENT_NAME: "OPEA_TEI_EMBEDDING"
    depends_on:
      tei-embedding-serving:
        condition: service_healthy
    restart: unless-stopped
  retriever-redis:
    image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
    container_name: retriever-redis
    depends_on:
      - redis-vector-db
    ports:
      - "${REDIS_RETRIEVER_PORT}:${REDIS_RETRIEVER_PORT}"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      REDIS_DB_PORT: ${REDIS_DB_PORT}
      REDIS_INSIGHTS_PORT: ${REDIS_INSIGHTS_PORT}
      REDIS_RETRIEVER_PORT: ${REDIS_RETRIEVER_PORT}
      INDEX_NAME: ${INDEX_NAME}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      LOGFLAG: ${LOGFLAG}
      RETRIEVER_COMPONENT_NAME: ${RETRIEVER_COMPONENT_NAME:-OPEA_RETRIEVER_REDIS}
    restart: unless-stopped
networks:
  default:
    driver: bridge
```

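The `tei-embedding-serving` healthcheck above polls the TEI `/health` route. The same check can be run manually from the host once the stack is up (assuming `TEI_EMBEDDER_PORT` is exported as in `set_env.sh`):

```bash
curl -f http://${host_ip}:${TEI_EMBEDDER_PORT}/health && echo "TEI embedding service is ready"
```
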
@@ -7,7 +7,6 @@ source .set_env.sh

```bash
popd > /dev/null

export host_ip=$(hostname -I | awk '{print $1}')

if [ -z "${HUGGINGFACEHUB_API_TOKEN}" ]; then
  echo "Error: HUGGINGFACEHUB_API_TOKEN is not set. Please set HUGGINGFACEHUB_API_TOKEN"
fi
```

@@ -17,10 +16,35 @@ if [ -z "${host_ip}" ]; then

```bash
fi

export no_proxy=${no_proxy},${host_ip}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}

export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
export LLM_SERVICE_PORT=9000
export LLM_ENDPOINT="http://${host_ip}:8028"
export LLM_SERVICE_HOST_IP=${host_ip}
export TGI_LLM_ENDPOINT="http://${host_ip}:8028"

export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"

export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"

export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"

export DATAPREP_REDIS_PORT=6007
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
export LOGFLAG=false
export MODEL_CACHE="./data"
export NUM_CARDS=1
```

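A minimal sketch (not part of the script itself) for confirming that the key variables are populated after sourcing `set_env.sh` and before running `docker compose up`:

```bash
# Print each variable so an empty value stands out before the containers are started
for var in host_ip HUGGINGFACEHUB_API_TOKEN LLM_MODEL_ID LLM_ENDPOINT REDIS_URL TEI_EMBEDDING_ENDPOINT DATAPREP_ENDPOINT; do
  echo "${var}=${!var}"
done
```
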
@@ -23,6 +23,12 @@ services:

```yaml
      dockerfile: ./docker/Dockerfile.react
    extends: codegen
    image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
  codegen-gradio-ui:
    build:
      context: ../ui
      dockerfile: ./docker/Dockerfile.gradio
    extends: codegen
    image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
  llm-textgen:
    build:
      context: GenAIComps
```

@@ -46,3 +52,21 @@ services:

```yaml
      dockerfile: Dockerfile.hpu
    extends: codegen
    image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
  dataprep:
    build:
      context: GenAIComps
      dockerfile: comps/dataprep/src/Dockerfile
    extends: codegen
    image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
  retriever:
    build:
      context: GenAIComps
      dockerfile: comps/retrievers/src/Dockerfile
    extends: codegen
    image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
  embedding:
    build:
      context: GenAIComps
      dockerfile: comps/embeddings/src/Dockerfile
    extends: codegen
    image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
```

@@ -10,11 +10,21 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"

```bash
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export DATAPREP_REDIS_PORT=6007

WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')

export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy},${ip_address}

function build_docker_images() {
  opea_branch=${opea_branch:-"main"}
  # If the opea_branch isn't main, replace the git clone branch in Dockerfile.
```

@@ -31,13 +41,14 @@ function build_docker_images() {

```bash
  cd $WORKPATH/docker_image_build
  git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git

  # Download Gaudi vllm of latest tag
  git clone https://github.com/HabanaAI/vllm-fork.git && cd vllm-fork
  VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
  echo "Check out vLLM tag ${VLLM_VER}"
  git checkout ${VLLM_VER} &> /dev/null && cd ../

  echo "Build all the images with --no-cache, check docker_image_build.log for details..."
  service_list="codegen codegen-gradio-ui llm-textgen vllm-gaudi dataprep retriever embedding"
  docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

  docker images && sleep 1s
```

@@ -48,18 +59,28 @@ function start_services() {

```bash
  local llm_container_name="$2"

  cd $WORKPATH/docker_compose/intel/hpu/gaudi
  export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
  export LLM_ENDPOINT="http://${ip_address}:8028"
  export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
  export MEGA_SERVICE_PORT=7778
  export MEGA_SERVICE_HOST_IP=${ip_address}
  export LLM_SERVICE_HOST_IP=${ip_address}
  export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:${MEGA_SERVICE_PORT}/v1/codegen"
  export NUM_CARDS=1
  export host_ip=${ip_address}

  export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
  export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
  export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
  export INDEX_NAME="CodeGen"

  export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
  export TEI_EMBEDDING_HOST_IP=${host_ip}
  export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
  export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"

  export INDEX_NAME="CodeGen"

  # Start Docker Containers
  docker compose --profile ${compose_profile} up -d | tee ${LOG_PATH}/start_services_with_compose.log
```

@@ -82,23 +103,34 @@ function validate_services() {

```bash
  local DOCKER_NAME="$4"
  local INPUT_DATA="$5"

  if [[ "$SERVICE_NAME" == "ingest" ]]; then
    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -F index_name=test_redis -H 'Content-Type: multipart/form-data' "$URL")

    if [ "$HTTP_STATUS" -eq 200 ]; then
      echo "[ $SERVICE_NAME ] HTTP status is 200. Data preparation succeeded..."
    else
      echo "[ $SERVICE_NAME ] Data preparation failed..."
    fi

  else
    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
    if [ "$HTTP_STATUS" -eq 200 ]; then
      echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."

      local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)

      if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
        echo "[ $SERVICE_NAME ] Content is as expected."
      else
        echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
        docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
        exit 1
      fi
    else
      echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
      docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
      exit 1
    fi
  fi
  sleep 5s
}
```

@@ -122,6 +154,14 @@ function validate_microservices() {

```bash
    "llm-textgen-gaudi-server" \
    '{"query":"def print_hello_world():"}'

  # Data ingest microservice
  validate_services \
    "${ip_address}:6007/v1/dataprep/ingest" \
    "Data preparation succeeded" \
    "ingest" \
    "dataprep-redis-server" \
    'link_list=["https://modin.readthedocs.io/en/latest/index.html"]'
}
```

@@ -133,6 +173,14 @@ function validate_megaservice() {

```bash
    "codegen-gaudi-backend-server" \
    '{"messages": "def print_hello_world():"}'

  # Curl the Mega Service with index_name and agents_flag
  validate_services \
    "${ip_address}:7778/v1/codegen" \
    "" \
    "mega-codegen" \
    "codegen-gaudi-backend-server" \
    '{ "index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
}

function validate_frontend() {
```

@@ -163,6 +211,18 @@ function validate_frontend() {

```bash
  fi
}

function validate_gradio() {
  local URL="http://${ip_address}:5173/health"
  local HTTP_STATUS=$(curl "$URL")
  local SERVICE_NAME="Gradio"

  if [ "$HTTP_STATUS" = '{"status":"ok"}' ]; then
    echo "[ $SERVICE_NAME ] HTTP status is 200. UI server is running successfully..."
  else
    echo "[ $SERVICE_NAME ] UI server has failed..."
  fi
}

function stop_docker() {
  local docker_profile="$1"
```

@@ -201,7 +261,7 @@ function main() {

```bash
    validate_microservices "${docker_llm_container_names[${i}]}"
    validate_megaservice
    validate_gradio

    stop_docker "${docker_compose_profiles[${i}]}"
    sleep 5s
```

@@ -10,11 +10,21 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"

```bash
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export DATAPREP_REDIS_PORT=6007

WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')

export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy},${ip_address}

function build_docker_images() {
  opea_branch=${opea_branch:-"main"}
  # If the opea_branch isn't main, replace the git clone branch in Dockerfile.
```

@@ -38,7 +48,8 @@ function build_docker_images() {

```bash
  cd ../

  echo "Build all the images with --no-cache, check docker_image_build.log for details..."
  service_list="codegen codegen-gradio-ui llm-textgen vllm dataprep retriever embedding"

  docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

  docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
```

@@ -54,12 +65,21 @@ function start_services() {

```bash
  export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
  export LLM_ENDPOINT="http://${ip_address}:8028"
  export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
  export MEGA_SERVICE_PORT=7778
  export MEGA_SERVICE_HOST_IP=${ip_address}
  export LLM_SERVICE_HOST_IP=${ip_address}
  export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:${MEGA_SERVICE_PORT}/v1/codegen"
  export host_ip=${ip_address}

  export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
  export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
  export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
  export INDEX_NAME="CodeGen"

  export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
  export TEI_EMBEDDING_HOST_IP=${host_ip}
  export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
  export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"

  # Start Docker Containers
  docker compose --profile ${compose_profile} up -d > ${LOG_PATH}/start_services_with_compose.log
```

@@ -82,23 +102,34 @@ function validate_services() {

```bash
  local DOCKER_NAME="$4"
  local INPUT_DATA="$5"

  if [[ "$SERVICE_NAME" == "ingest" ]]; then
    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -F index_name=test_redis -H 'Content-Type: multipart/form-data' "$URL")

    if [ "$HTTP_STATUS" -eq 200 ]; then
      echo "[ $SERVICE_NAME ] HTTP status is 200. Data preparation succeeded..."
    else
      echo "[ $SERVICE_NAME ] Data preparation failed..."
    fi

  else
    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
    if [ "$HTTP_STATUS" -eq 200 ]; then
      echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."

      local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)

      if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
        echo "[ $SERVICE_NAME ] Content is as expected."
      else
        echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
        docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
        exit 1
      fi
    else
      echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
      docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
      exit 1
    fi
  fi
  sleep 5s
}
```

@@ -122,6 +153,14 @@ function validate_microservices() {

```bash
    "llm-textgen-server" \
    '{"query":"def print_hello_world():", "max_tokens": 256}'

  # Data ingest microservice
  validate_services \
    "${ip_address}:6007/v1/dataprep/ingest" \
    "Data preparation succeeded" \
    "ingest" \
    "dataprep-redis-server" \
    'link_list=["https://modin.readthedocs.io/en/latest/index.html"]'
}
```

@@ -133,6 +172,14 @@ function validate_megaservice() {

```bash
    "codegen-xeon-backend-server" \
    '{"messages": "def print_hello_world():", "max_tokens": 256}'

  # Curl the Mega Service with index_name and agents_flag
  validate_services \
    "${ip_address}:7778/v1/codegen" \
    "" \
    "mega-codegen" \
    "codegen-xeon-backend-server" \
    '{ "index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
}

function validate_frontend() {
```

@@ -163,6 +210,17 @@ function validate_frontend() {

```bash
  fi
}

function validate_gradio() {
  local URL="http://${ip_address}:5173/health"
  local HTTP_STATUS=$(curl "$URL")
  local SERVICE_NAME="Gradio"

  if [ "$HTTP_STATUS" = '{"status":"ok"}' ]; then
    echo "[ $SERVICE_NAME ] HTTP status is 200. UI server is running successfully..."
  else
    echo "[ $SERVICE_NAME ] UI server has failed..."
  fi
}

function stop_docker() {
  local docker_profile="$1"
```

@@ -202,7 +260,7 @@ function main() {

```bash
    validate_microservices "${docker_llm_container_names[${i}]}"
    validate_megaservice
    validate_gradio

    stop_docker "${docker_compose_profiles[${i}]}"
    sleep 5s
```

CodeGen/ui/docker/Dockerfile.gradio (new file, 33 lines)

@@ -0,0 +1,33 @@

```dockerfile
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim

ENV LANG=C.UTF-8

ARG ARCH="cpu"

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    build-essential \
    default-jre \
    libgl1-mesa-glx \
    libjemalloc-dev \
    wget

# Install ffmpeg static build
WORKDIR /root
RUN wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz && \
    mkdir ffmpeg-git-amd64-static && tar -xvf ffmpeg-git-amd64-static.tar.xz -C ffmpeg-git-amd64-static --strip-components 1 && \
    export PATH=/root/ffmpeg-git-amd64-static:$PATH && \
    cp /root/ffmpeg-git-amd64-static/ffmpeg /usr/local/bin/ && \
    cp /root/ffmpeg-git-amd64-static/ffprobe /usr/local/bin/

RUN mkdir -p /home/user

COPY gradio /home/user/gradio

RUN pip install --no-cache-dir --upgrade pip setuptools && \
    pip install --no-cache-dir -r /home/user/gradio/requirements.txt

WORKDIR /home/user/gradio
ENTRYPOINT ["python", "codegen_ui_gradio.py"]
```

CodeGen/ui/gradio/README.md (new file, 65 lines)

@@ -0,0 +1,65 @@

# CodeGen Gradio UI

This project provides a user interface for the CodeGen service using a Dockerized Gradio frontend application. Users can generate code from natural-language prompts and upload files or URLs to build the knowledge indices used for retrieval.

## Docker

### Build UI Docker Image

To build the frontend Docker image, navigate to the `GenAIExamples/CodeGen/ui` directory and run the following command:

```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```

This command builds the Docker image with the tag `opea/codegen-gradio-ui:latest`. It also passes the proxy settings as build arguments to ensure that the build process can access the internet if you are behind a corporate firewall.

### Run UI Docker Image

To run the frontend Docker image, navigate to the `GenAIExamples/CodeGen/ui/gradio` directory and execute the following commands:

```bash
cd GenAIExamples/CodeGen/ui/gradio

ip_address=$(hostname -I | awk '{print $1}')
docker run -d -p 5173:5173 --ipc=host \
  -e http_proxy=$http_proxy \
  -e https_proxy=$https_proxy \
  -e no_proxy=$no_proxy \
  -e BACKEND_SERVICE_ENDPOINT=http://$ip_address:7778/v1/codegen \
  opea/codegen-gradio-ui:latest
```

This command runs the Docker container in detached mode, mapping port 5173 of the host to port 5173 of the container. It also sets several environment variables, including the backend service endpoint, which is required for the frontend to communicate with the backend service.

### Python

To run the frontend application directly using Python, navigate to the `GenAIExamples/CodeGen/ui/gradio` directory and run the following command:

```bash
cd GenAIExamples/CodeGen/ui/gradio
python codegen_ui_gradio.py
```

This command starts the frontend application using Python.

## Additional Information

### Prerequisites

Ensure you have Docker installed and running on your system. Also, make sure you have the necessary proxy settings configured if you are behind a corporate firewall.

### Environment Variables

- `http_proxy`: Proxy setting for HTTP connections.
- `https_proxy`: Proxy setting for HTTPS connections.
- `no_proxy`: Comma-separated list of hosts that should be excluded from proxying.
- `BACKEND_SERVICE_ENDPOINT`: The endpoint of the backend service that the frontend will communicate with.

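For reference, a shell setup along these lines can be used before `docker run` (the proxy values are placeholders; `BACKEND_SERVICE_ENDPOINT` must point at your running MegaService):

```bash
export http_proxy="http://proxy.example.com:8080"   # placeholder, only needed behind a proxy
export https_proxy="http://proxy.example.com:8080"  # placeholder, only needed behind a proxy
export no_proxy="localhost,127.0.0.1"
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
```
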
### Troubleshooting

- Docker Build Issues: If you encounter issues while building the Docker image, ensure that your proxy settings are correctly configured and that you have internet access.
- Docker Run Issues: If the Docker container fails to start, check the environment variables and ensure that the backend service is running and accessible.

This README provides instructions for building and running the Dockerized frontend application, as well as running it directly with Python, along with additional information for troubleshooting and configuring the environment.

CodeGen/ui/gradio/codegen_ui_gradio.py (new file, 371 lines)

@@ -0,0 +1,371 @@

```python
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# This is a Gradio app that includes two tabs: one for code generation and another for resource management.
# The resource management tab has been updated to allow file uploads, deletion, and a table listing all the files.
# Additionally, three small text boxes have been added for managing file dataframe parameters.

import argparse
import json
import os
from pathlib import Path
from urllib.parse import urlparse

import gradio as gr
import pandas as pd
import requests
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

logflag = os.getenv("LOGFLAG", False)

# create a FastAPI app
app = FastAPI()
cur_dir = os.getcwd()
static_dir = Path(os.path.join(cur_dir, "static/"))
tmp_dir = Path(os.path.join(cur_dir, "split_tmp_videos/"))

Path(static_dir).mkdir(parents=True, exist_ok=True)
app.mount("/static", StaticFiles(directory=static_dir), name="static")

tmp_upload_folder = "/tmp/gradio/"


host_ip = os.getenv("host_ip")
DATAPREP_REDIS_PORT = os.getenv("DATAPREP_REDIS_PORT", 6007)
DATAPREP_ENDPOINT = os.getenv("DATAPREP_ENDPOINT", f"http://{host_ip}:{DATAPREP_REDIS_PORT}/v1/dataprep")
MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)

backend_service_endpoint = os.getenv("BACKEND_SERVICE_ENDPOINT", f"http://{host_ip}:{MEGA_SERVICE_PORT}/v1/codegen")

dataprep_ingest_endpoint = f"{DATAPREP_ENDPOINT}/ingest"
dataprep_get_files_endpoint = f"{DATAPREP_ENDPOINT}/get"
dataprep_delete_files_endpoint = f"{DATAPREP_ENDPOINT}/delete"
dataprep_get_indices_endpoint = f"{DATAPREP_ENDPOINT}/indices"


# Define the functions that will be used in the app
def conversation_history(prompt, index, use_agent, history):
    print(f"Generating code for prompt: {prompt} using index: {index} and use_agent is {use_agent}")
    history.append([prompt, ""])
    response_generator = generate_code(prompt, index, use_agent)
    for token in response_generator:
        history[-1][-1] += token
        yield history


def upload_media(media, index=None, chunk_size=1500, chunk_overlap=100):
    media = media.strip().split("\n")
    if not chunk_size:
        chunk_size = 1500
    if not chunk_overlap:
        chunk_overlap = 100

    requests = []
    if type(media) is list:
        for file in media:
            file_ext = os.path.splitext(file)[-1]
            if is_valid_url(file):
                yield (
                    gr.Textbox(
                        visible=True,
                        value="Ingesting URL...",
                    )
                )
                value = ingest_url(file, index, chunk_size, chunk_overlap)
                requests.append(value)
                yield value
            elif file_ext in [".pdf", ".txt"]:
                yield (
                    gr.Textbox(
                        visible=True,
                        value="Ingesting file...",
                    )
                )
                value = ingest_file(file, index, chunk_size, chunk_overlap)
                requests.append(value)
                yield value
            else:
                yield (
                    gr.Textbox(
                        visible=True,
                        value="Your media is either an invalid URL or the file extension type is not supported. (Supports .pdf, .txt, url)",
                    )
                )
                return
        yield requests

    else:
        file_ext = os.path.splitext(media)[-1]
        if is_valid_url(media):
            value = ingest_url(media, index, chunk_size, chunk_overlap)
            yield value
        elif file_ext in [".pdf", ".txt"]:
            value = ingest_file(media, index, chunk_size, chunk_overlap)
            yield value
        else:
            yield (
                gr.Textbox(
                    visible=True,
                    value="Your file extension type is not supported.",
                )
            )
            return


def generate_code(query, index=None, use_agent=False):
    if index is None or index == "None":
        input_dict = {"messages": query, "agents_flag": use_agent}
    else:
        input_dict = {"messages": query, "index_name": index, "agents_flag": use_agent}

    print("Query is ", input_dict)
    headers = {"Content-Type": "application/json"}

    response = requests.post(url=backend_service_endpoint, headers=headers, data=json.dumps(input_dict), stream=True)

    line_count = 0
    for line in response.iter_lines():
        line_count += 1
        if line:
            line = line.decode("utf-8")
            if line.startswith("data: "):  # Only process lines starting with "data: "
                json_part = line[len("data: ") :]  # Remove the "data: " prefix
            else:
                json_part = line
            if json_part.strip() == "[DONE]":  # Ignore the DONE marker
                continue
            try:
                json_obj = json.loads(json_part)  # Convert to dictionary
                if "choices" in json_obj:
                    for choice in json_obj["choices"]:
                        if "text" in choice:
                            # Yield each token individually
                            yield choice["text"]
            except json.JSONDecodeError:
                print("Error parsing JSON:", json_part)

    if line_count == 0:
        yield "Something went wrong, No Response Generated! \nIf you are using an Index, try uploading your media again with a smaller chunk size to avoid exceeding the token max. \
            \nOr, check the Use Agent box and try again."


def ingest_file(file, index=None, chunk_size=100, chunk_overlap=150):
    headers = {
        # "Content-Type: multipart/form-data"
    }
    file_input = {"files": open(file, "rb")}

    if index:
        print("Index is", index)
        data = {"index_name": index, "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
    else:
        data = {"chunk_size": chunk_size, "chunk_overlap": chunk_overlap}

    response = requests.post(url=dataprep_ingest_endpoint, headers=headers, files=file_input, data=data)

    return response.text


def ingest_url(url, index=None, chunk_size=100, chunk_overlap=150):
    url = str(url)
    if not is_valid_url(url):
        return "Invalid URL entered. Please enter a valid URL"

    headers = {
        # "Content-Type: multipart/form-data"
    }

    if index:
        url_input = {
            "link_list": json.dumps([url]),
            "index_name": index,
            "chunk_size": chunk_size,
            "chunk_overlap": chunk_overlap,
        }
    else:
        url_input = {"link_list": json.dumps([url]), "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
    response = requests.post(url=dataprep_ingest_endpoint, headers=headers, data=url_input)

    return response.text


def is_valid_url(url):
    url = str(url)
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False


def get_files(index=None):
    headers = {
        # "Content-Type: multipart/form-data"
    }
    if index == "All Files":
        index = None

    if index:
        index = {"index_name": index}
        response = requests.post(url=dataprep_get_files_endpoint, headers=headers, data=index)
        table = response.json()
        return table
    else:
        response = requests.post(url=dataprep_get_files_endpoint, headers=headers)
        table = response.json()
        return table


def update_table(index=None):
    if index == "All Files":
        index = None
    files = get_files(index)
    if len(files) == 0:
        df = pd.DataFrame(files, columns=["Files"])
        return df
    else:
        df = pd.DataFrame(files)
        return df


def update_indices():
    indices = get_indices()
    df = pd.DataFrame(indices, columns=["File Indices"])
    return df


def delete_file(file, index=None):
    # Remove the selected file from the file list
    headers = {
        # "Content-Type: application/json"
    }
    if index:
        file_input = {"files": open(file, "rb"), "index_name": index}
    else:
        file_input = {"files": open(file, "rb")}
    response = requests.post(url=dataprep_delete_files_endpoint, headers=headers, data=file_input)
    table = update_table()
    return response.text


def delete_all_files(index=None):
    # Remove all files from the file list
    headers = {
        # "Content-Type: application/json"
    }
    response = requests.post(url=dataprep_delete_files_endpoint, headers=headers, data='{"file_path": "all"}')
    table = update_table()

    return "Delete All status: " + response.text


def get_indices():
    headers = {
        # "Content-Type: application/json"
    }
    response = requests.post(url=dataprep_get_indices_endpoint, headers=headers)
    indices = ["None"]
    indices += response.json()
    return indices


def update_indices_dropdown():
    new_dd = gr.update(choices=get_indices(), value="None")
    return new_dd


def get_file_names(files):
    file_str = ""
    if not files:
        return file_str

    for file in files:
        file_str += file + "\n"
    file_str.strip()
    return file_str


# Define UI components
with gr.Blocks() as ui:
    with gr.Tab("Code Generation"):
        gr.Markdown("### Generate Code from Natural Language")
        chatbot = gr.Chatbot(label="Chat History")
        prompt_input = gr.Textbox(label="Enter your query")
        with gr.Column():
            with gr.Row(equal_height=True):
                database_dropdown = gr.Dropdown(choices=get_indices(), label="Select Index", value="None", scale=10)
                db_refresh_button = gr.Button("Refresh Dropdown", scale=0.1)
                db_refresh_button.click(update_indices_dropdown, outputs=database_dropdown)
            use_agent = gr.Checkbox(label="Use Agent", container=False)

        generate_button = gr.Button("Generate Code")
        generate_button.click(
            conversation_history, inputs=[prompt_input, database_dropdown, use_agent, chatbot], outputs=chatbot
        )

    with gr.Tab("Resource Management"):
        # File management components
        with gr.Row():
            with gr.Column(scale=1):
                index_name_input = gr.Textbox(label="Index Name")
                chunk_size_input = gr.Textbox(
                    label="Chunk Size", value="1500", placeholder="Enter an integer (default: 1500)"
                )
                chunk_overlap_input = gr.Textbox(
                    label="Chunk Overlap", value="100", placeholder="Enter an integer (default: 100)"
                )
            with gr.Column(scale=3):
                file_upload = gr.File(label="Upload Files", file_count="multiple")
                url_input = gr.Textbox(label="Media to be ingested (Append URL's in a new line)")
                upload_button = gr.Button("Upload", variant="primary")
                upload_status = gr.Textbox(label="Upload Status")
                file_upload.change(get_file_names, inputs=file_upload, outputs=url_input)
            with gr.Column(scale=1):
                file_table = gr.Dataframe(interactive=False, value=update_indices())
                refresh_button = gr.Button("Refresh", variant="primary", size="sm")
                refresh_button.click(update_indices, outputs=file_table)
        upload_button.click(
            upload_media,
            inputs=[url_input, index_name_input, chunk_size_input, chunk_overlap_input],
            outputs=upload_status,
        )

        delete_all_button = gr.Button("Delete All", variant="primary", size="sm")
        delete_all_button.click(delete_all_files, outputs=upload_status)


@app.get("/health")
def health_check():
    return {"status": "ok"}


ui.queue()
app = gr.mount_gradio_app(app, ui, path="/")
share = False
enable_queue = True

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=os.getenv("UI_PORT", 5173))
    parser.add_argument("--concurrency-count", type=int, default=20)
    parser.add_argument("--share", action="store_true")

    host_ip = os.getenv("host_ip")
    DATAPREP_REDIS_PORT = os.getenv("DATAPREP_REDIS_PORT", 6007)
    DATAPREP_ENDPOINT = os.getenv("DATAPREP_ENDPOINT", f"http://{host_ip}:{DATAPREP_REDIS_PORT}/v1/dataprep")
    MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)

    backend_service_endpoint = os.getenv("BACKEND_SERVICE_ENDPOINT", f"http://{host_ip}:{MEGA_SERVICE_PORT}/v1/codegen")

    args = parser.parse_args()
    global gateway_addr
    gateway_addr = backend_service_endpoint
    global dataprep_ingest_addr
    dataprep_ingest_addr = dataprep_ingest_endpoint
    global dataprep_get_files_addr
    dataprep_get_files_addr = dataprep_get_files_endpoint

    uvicorn.run(app, host=args.host, port=args.port)
```

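Once the container is running, the FastAPI `/health` route defined in `codegen_ui_gradio.py` can be used to confirm the UI is serving (port 5173 is the default used by the compose files and test scripts):

```bash
curl http://${host_ip}:5173/health
# Expected output: {"status":"ok"}
```
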
CodeGen/ui/gradio/requirements.txt (new file, 4 lines)

@@ -0,0 +1,4 @@

```
gradio==5.22.0
numpy==1.26.4
opencv-python==4.10.0.82
Pillow==10.3.0
```