CodeGen Examples using RAG and Agents (#1757)

Signed-off-by: Mustafa <mustafa.cetin@intel.com>
This commit is contained in:
Mustafa
2025-04-09 01:12:20 -07:00
committed by GitHub
parent 8b7cb3539e
commit 892624f539
18 changed files with 1524 additions and 239 deletions


@@ -1,6 +1,6 @@
# Code Generation Application
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models undergo training with datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they possess a comprehensive understanding of the coding context, which includes elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions.
Code Generation (CodeGen) Large Language Models (LLMs) are specialized AI models designed for the task of generating computer code. Such models undergo training with datasets that encompass repositories, specialized documentation, programming code, relevant web content, and other related data. They possess a deep understanding of various programming languages, coding patterns, and software development concepts. CodeGen LLMs are engineered to assist developers and programmers. When these LLMs are seamlessly integrated into the developer's Integrated Development Environment (IDE), they possess a comprehensive understanding of the coding context, which includes elements such as comments, function names, and variable names. This contextual awareness empowers them to provide more refined and contextually relevant coding suggestions. Additionally, Retrieval-Augmented Generation (RAG) and Agents are part of this CodeGen example; they provide an additional layer of intelligence and adaptability, ensuring that the generated code is not only relevant but also accurate, efficient, and tailored to the specific needs of developers and programmers.
The capabilities of CodeGen LLMs include:
@@ -28,7 +28,7 @@ config:
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 50px
fontSize: 25px
---
flowchart LR
%% Colors %%
@@ -37,34 +37,56 @@ flowchart LR
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style CodeGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph CodeGen-MegaService["CodeGen MegaService "]
subgraph CodeGen-MegaService["CodeGen-MegaService"]
direction LR
LLM([LLM MicroService]):::blue
EM([Embedding<br>MicroService]):::blue
RET([Retrieval<br>MicroService]):::blue
RER([Agents]):::blue
LLM([LLM<br>MicroService]):::blue
end
subgraph UserInterface[" User Interface "]
subgraph User Interface
direction LR
a([User Input Query]):::orchid
UI([UI server<br>]):::orchid
a([Submit Query Tab]):::orchid
UI([UI server]):::orchid
Ingest([Manage Resources]):::orchid
end
CLIP_EM{{Embedding<br>service}}
VDB{{Vector DB}}
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
GW([CodeGen GateWay]):::orange
LLM_gen{{LLM Service <br>}}
GW([CodeGen GateWay<br>]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
Ingest[Ingest data] --> UI
UI --> DP
DP <-.-> CLIP_EM
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> CodeGen-MegaService
EM ==> RET
RET ==> RER
RER ==> LLM
%% Embedding service flow
direction LR
EM <-.-> CLIP_EM
RET <-.-> V_RET
LLM <-.-> LLM_gen
direction TB
%% Vector DB interaction
V_RET <-.->VDB
DP <-.->VDB
```
## 🤖 Automated Terraform Deployment using Intel® Optimized Cloud Modules for **Terraform**
@@ -95,11 +117,11 @@ Currently we support two ways of deploying ChatQnA services with docker compose:
By default, the LLM model is set as listed below:
| Service | Model |
| ------------ | --------------------------------------------------------------------------------------- |
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) |
| ------------ | ----------------------------------------------------------------------------------------- |
| LLM_MODEL_ID | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
[Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model.
Change the `LLM_MODEL_ID` below for your needs, such as: [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
[Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) may be a gated model that requires submitting an access request through Hugging Face. You can replace it with another model if needed.
Change the `LLM_MODEL_ID` below for your needs, such as: [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct), [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
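For example, a minimal sketch of overriding the model via an environment variable before starting the services (any compatible coder model from Hugging Face can be used):
```bash
# Example override; set before bringing up the Docker Compose stack
export LLM_MODEL_ID="deepseek-ai/deepseek-coder-6.7b-instruct"
```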
If you choose to use `meta-llama/CodeLlama-7b-hf` as the LLM model, you will need to visit [here](https://huggingface.co/meta-llama/CodeLlama-7b-hf) and click the `Expand to review and access` button to request model access.
@@ -134,22 +156,44 @@ To set up environment variables for deploying ChatQnA services, follow these ste
#### Deploy CodeGen on Gaudi
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
docker compose up -d
```
TGI service:
```bash
docker compose --profile codegen-gaudi-tgi up -d
```
vLLM service:
```bash
docker compose --profile codegen-gaudi-vllm up -d
```
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
#### Deploy CodeGen on Xeon
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
docker compose up -d
```
TGI service:
```bash
docker compose --profile codegen-xeon-tgi up -d
```
vLLM service:
```bash
docker compose --profile codegen-xeon-vllm up -d
```
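After the containers come up, an optional quick sanity check that all services are running:
```bash
# List container names and their current status
docker ps --format 'table {{.Names}}\t{{.Status}}'
```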
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
@@ -170,6 +214,15 @@ Two ways of consuming CodeGen Service:
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
If you want a CodeGen service with RAG and Agents based on dedicated documentation, include the `agents_flag` and `index_name` fields:
```bash
curl http://localhost:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
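The `index_name` must refer to documents previously ingested through the dataprep microservice. A sketch of such an ingestion request, assuming the default dataprep port (6007) and placeholder file names:
```bash
curl http://localhost:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./file1.pdf" \
  -F "files=@./file2.txt" \
  -F "index_name=my_API_document"
```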
2. Access via frontend
To access the frontend, open the following URL in your browser: http://{host_ip}:5173.

(Four binary image files added, not shown: 57 KiB, 26 KiB, 51 KiB, and 34 KiB.)


@@ -1,10 +1,11 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import ast
import asyncio
import os
from comps import MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps import CustomLogger, MegaServiceEndpoint, MicroService, ServiceOrchestrator, ServiceRoleType, ServiceType
from comps.cores.mega.utils import handle_message
from comps.cores.proto.api_protocol import (
ChatCompletionRequest,
@@ -16,20 +17,98 @@ from comps.cores.proto.api_protocol import (
from comps.cores.proto.docarray import LLMParams
from fastapi import Request
from fastapi.responses import StreamingResponse
from langchain.prompts import PromptTemplate
logger = CustomLogger("opea_codegen_microservice")
logflag = os.getenv("LOGFLAG", False)
MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 7778))
LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
RETRIEVAL_SERVICE_HOST_IP = os.getenv("RETRIEVAL_SERVICE_HOST_IP", "0.0.0.0")
REDIS_RETRIEVER_PORT = int(os.getenv("REDIS_RETRIEVER_PORT", 7000))
TEI_EMBEDDING_HOST_IP = os.getenv("TEI_EMBEDDING_HOST_IP", "0.0.0.0")
EMBEDDER_PORT = int(os.getenv("EMBEDDER_PORT", 6000))
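# Prompt for the lightweight grading agent: given the user question and a single
# retrieved document, the LLM is asked to answer strictly 'yes' or 'no' on relevance.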
grader_prompt = """You are a grader assessing relevance of a retrieved document to a user question. \n
Here is the user question: {question} \n
Here is the retrieved document: \n\n {document} \n\n
If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Rules:
- Do not return the question, the provided document or explanation.
- if this document is relevant to the question, return 'yes' otherwise return 'no'.
- Do not include any other details in your response.
"""
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
"""Aligns the inputs based on the service type of the current node.
Parameters:
- self: Reference to the current instance of the class.
- inputs: Dictionary containing the inputs for the current node.
- cur_node: The current node in the service orchestrator.
- runtime_graph: The runtime graph of the service orchestrator.
- llm_parameters_dict: Dictionary containing the LLM parameters.
- kwargs: Additional keyword arguments.
Returns:
- inputs: The aligned inputs for the current node.
"""
# Check if the current service type is EMBEDDING
if self.services[cur_node].service_type == ServiceType.EMBEDDING:
# Store the input query for later use
self.input_query = inputs["query"]
# Set the input for the embedding service
inputs["input"] = inputs["query"]
# Check if the current service type is RETRIEVER
if self.services[cur_node].service_type == ServiceType.RETRIEVER:
# Extract the embedding from the inputs
embedding = inputs["data"][0]["embedding"]
# Align the inputs for the retriever service
inputs = {"index_name": llm_parameters_dict["index_name"], "text": self.input_query, "embedding": embedding}
return inputs
class CodeGenService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
self.megaservice = ServiceOrchestrator()
ServiceOrchestrator.align_inputs = align_inputs
self.megaservice_llm = ServiceOrchestrator()
self.megaservice_retriever = ServiceOrchestrator()
self.megaservice_retriever_llm = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.CODE_GEN)
def add_remote_service(self):
"""Adds remote microservices to the service orchestrators and defines the flow between them."""
# Define the embedding microservice
embedding = MicroService(
name="embedding",
host=TEI_EMBEDDING_HOST_IP,
port=EMBEDDER_PORT,
endpoint="/v1/embeddings",
use_remote_service=True,
service_type=ServiceType.EMBEDDING,
)
# Define the retriever microservice
retriever = MicroService(
name="retriever",
host=RETRIEVAL_SERVICE_HOST_IP,
port=REDIS_RETRIEVER_PORT,
endpoint="/v1/retrieval",
use_remote_service=True,
service_type=ServiceType.RETRIEVER,
)
# Define the LLM microservice
llm = MicroService(
name="llm",
host=LLM_SERVICE_HOST_IP,
@@ -38,13 +117,61 @@ class CodeGenService:
use_remote_service=True,
service_type=ServiceType.LLM,
)
self.megaservice.add(llm)
# Add the microservices to the megaservice_retriever_llm orchestrator and define the flow
self.megaservice_retriever_llm.add(embedding).add(retriever).add(llm)
self.megaservice_retriever_llm.flow_to(embedding, retriever)
self.megaservice_retriever_llm.flow_to(retriever, llm)
# Add the microservices to the megaservice_retriever orchestrator and define the flow
self.megaservice_retriever.add(embedding).add(retriever)
self.megaservice_retriever.flow_to(embedding, retriever)
# Add the LLM microservice to the megaservice_llm orchestrator
self.megaservice_llm.add(llm)
async def read_streaming_response(self, response: StreamingResponse):
"""Reads the streaming response from a StreamingResponse object.
Parameters:
- self: Reference to the current instance of the class.
- response: The StreamingResponse object to read from.
Returns:
- str: The complete response body as a decoded string.
"""
body = b"" # Initialize an empty byte string to accumulate the response chunks
async for chunk in response.body_iterator:
body += chunk # Append each chunk to the body
return body.decode("utf-8") # Decode the accumulated byte string to a regular string
async def handle_request(self, request: Request):
"""Handles the incoming request, processes it through the appropriate microservices,
and returns the response.
Parameters:
- self: Reference to the current instance of the class.
- request: The incoming request object.
Returns:
- ChatCompletionResponse: The response from the LLM microservice.
"""
# Parse the incoming request data
data = await request.json()
# Get the stream option from the request data, default to True if not provided
stream_opt = data.get("stream", True)
chat_request = ChatCompletionRequest.parse_obj(data)
# Validate and parse the chat request data
chat_request = ChatCompletionRequest.model_validate(data)
# Handle the chat messages to generate the prompt
prompt = handle_message(chat_request.messages)
# Get the agents flag from the request data, default to False if not provided
agents_flag = data.get("agents_flag", False)
# Define the LLM parameters
parameters = LLMParams(
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
top_k=chat_request.top_k if chat_request.top_k else 10,
@@ -54,18 +181,90 @@ class CodeGenService:
presence_penalty=chat_request.presence_penalty if chat_request.presence_penalty else 0.0,
repetition_penalty=chat_request.repetition_penalty if chat_request.repetition_penalty else 1.03,
stream=stream_opt,
index_name=chat_request.index_name,
)
result_dict, runtime_graph = await self.megaservice.schedule(
initial_inputs={"query": prompt}, llm_parameters=parameters
# Initialize the initial inputs with the generated prompt
initial_inputs = {"query": prompt}
# Check if the key index name is provided in the parameters
if parameters.index_name:
if agents_flag:
# Schedule the retriever microservice
result_ret, runtime_graph = await self.megaservice_retriever.schedule(
initial_inputs=initial_inputs, llm_parameters=parameters
)
# Switch to the LLM microservice
megaservice = self.megaservice_llm
relevant_docs = []
for doc in result_ret["retriever/MicroService"]["retrieved_docs"]:
# Create the PromptTemplate
prompt_agent = PromptTemplate(template=grader_prompt, input_variables=["question", "document"])
# Format the template with the input variables
formatted_prompt = prompt_agent.format(question=prompt, document=doc["text"])
initial_inputs_grader = {"query": formatted_prompt}
# Schedule the LLM microservice for grading
grade, runtime_graph = await self.megaservice_llm.schedule(
initial_inputs=initial_inputs_grader, llm_parameters=parameters
)
for node, response in grade.items():
if isinstance(response, StreamingResponse):
# Read the streaming response
grader_response = await self.read_streaming_response(response)
# Replace null with None
grader_response = grader_response.replace("null", "None")
# Split the response by "data:" and process each part
for i in grader_response.split("data:"):
if '"text":' in i:
# Convert the string to a dictionary
r = ast.literal_eval(i)
# Check if the response text is "yes"
if r["choices"][0]["text"] == "yes":
# Append the document to the relevant_docs list
relevant_docs.append(doc)
# Update the initial inputs with the relevant documents
if len(relevant_docs) > 0:
logger.info(f"[ CodeGenService - handle_request ] {len(relevant_docs)} relevant document\s found.")
query = initial_inputs["query"]
initial_inputs = {}
initial_inputs["retrieved_docs"] = relevant_docs
initial_inputs["initial_query"] = query
else:
logger.info(
"[ CodeGenService - handle_request ] Could not find any relevant documents. The query will be used as input to the LLM."
)
else:
# Use the combined retriever and LLM microservice
megaservice = self.megaservice_retriever_llm
else:
# Use the LLM microservice only
megaservice = self.megaservice_llm
# Schedule the final megaservice
result_dict, runtime_graph = await megaservice.schedule(
initial_inputs=initial_inputs, llm_parameters=parameters
)
for node, response in result_dict.items():
# Here it suppose the last microservice in the megaservice is LLM.
# Check if the last microservice in the megaservice is LLM
if (
isinstance(response, StreamingResponse)
and node == list(self.megaservice.services.keys())[-1]
and self.megaservice.services[node].service_type == ServiceType.LLM
and node == list(megaservice.services.keys())[-1]
and megaservice.services[node].service_type == ServiceType.LLM
):
return response
# Get the response from the last node in the runtime graph
last_node = runtime_graph.all_leaves()[-1]
response = result_dict[last_node]["text"]
choices = []


@@ -13,28 +13,77 @@ After launching your instance, you can connect to it using SSH (for Linux instan
## 🚀 Start Microservices and MegaService
The CodeGen megaservice manages a single microservice called LLM within a Directed Acyclic Graph (DAG). In the diagram above, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
The CodeGen megaservice manages several microservices, including the 'Embedding MicroService', 'Retrieval MicroService', and 'LLM MicroService', within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. Data Preparation allows users to save or update documents and online resources in the vector database; users can upload files or provide URLs and manage their saved resources. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the megaservice to generate code snippets in response to the user's input query.
The mega flow of the CodeGen application, from the user's input query to the application's output response, is as follows:
```mermaid
---
config:
flowchart:
nodeSpacing: 400
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 25px
---
flowchart LR
subgraph CodeGen
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style CodeGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph CodeGen-MegaService["CodeGen-MegaService"]
direction LR
A[User] --> |Input query| B[CodeGen Gateway]
B --> |Invoke| Megaservice
subgraph Megaservice["Megaservice"]
direction TB
C((LLM<br>9000)) -. Post .-> D{{TGI Service<br>8028}}
EM([Embedding<br>MicroService]):::blue
RET([Retrieval<br>MicroService]):::blue
RER([Agents]):::blue
LLM([LLM<br>MicroService]):::blue
end
Megaservice --> |Output| E[Response]
subgraph User Interface
direction LR
a([Submit Query Tab]):::orchid
UI([UI server]):::orchid
Ingest([Manage Resources]):::orchid
end
subgraph Legend
CLIP_EM{{Embedding<br>service}}
VDB{{Vector DB}}
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
GW([CodeGen GateWay]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
G([Microservice]) ==> H([Microservice])
I([Microservice]) -.-> J{{Server API}}
end
Ingest[Ingest data] --> UI
UI --> DP
DP <-.-> CLIP_EM
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> CodeGen-MegaService
EM ==> RET
RET ==> RER
RER ==> LLM
%% Embedding service flow
direction LR
EM <-.-> CLIP_EM
RET <-.-> V_RET
LLM <-.-> LLM_gen
direction TB
%% Vector DB interaction
V_RET <-.->VDB
DP <-.->VDB
```
### Setup Environment Variables
@@ -51,38 +100,105 @@ export host_ip=${your_ip_address}
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
```
2. Set Netowork Proxy
2. Set Network Proxy
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
```bash
export no_proxy=${your_no_proxy}
export no_proxy=${no_proxy},${host_ip}
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```
### Start the Docker Containers for All Services
CodeGen support TGI service and vLLM service, you can choose start either one of them.
Start CodeGen based on TGI service:
Find the corresponding [compose.yaml](./compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
```
#### TGI service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/cpu/xeon
docker compose --profile codegen-xeon-tgi up -d
```
Start CodeGen based on vLLM service:
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
#### vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/cpu/xeon
docker compose --profile codegen-xeon-vllm up -d
```
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
- `opea/vllm`
### Building the Docker image locally
Should the Docker image you seek not yet be available on Docker Hub, you can build it locally by following the instructions below.
#### Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
#### Build the UI Gradio Image
Build the frontend Gradio image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
#### Dataprep Microservice with Redis
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
#### Embedding Microservice with TEI
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
#### LLM text generation Microservice
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
#### Retriever Microservice
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
#### Start Redis server
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
### Validate the MicroServices and MegaService
1. LLM Service (for TGI, vLLM)
@@ -90,8 +206,9 @@ docker compose --profile codegen-xeon-vllm up -d
```bash
curl http://${host_ip}:8028/v1/chat/completions \
-X POST \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
```
2. LLM Microservices
@@ -99,19 +216,58 @@ docker compose --profile codegen-xeon-vllm up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```
3. MegaService
3. Dataprep Microservice
Make sure to replace the file name placeholders with your actual file names:
```bash
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
}'
curl http://${host_ip}:6007/v1/dataprep/ingest \
-X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.pdf" \
-F "files=@./file2.txt" \
-F "index_name=my_API_document"
```
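Online resources can be ingested through the same endpoint. The repository tests post a `link_list` form field, roughly as in the sketch below (placeholder URL):
```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://modin.readthedocs.io/en/latest/index.html"]' \
  -F "index_name=my_API_document"
```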
## 🚀 Launch the UI
4. MegaService
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
To use the CodeGen service with RAG and Agents activated, based on a previously ingested index:
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
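By default the gateway streams the answer back. If a single JSON response is preferred, the `stream` flag read by `codegen.py` can be set to false, e.g. (a sketch):
```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"stream": false, "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```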
## 🚀 Launch the Gradio Based UI (Recommended)
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
Code Generation Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Resource Management Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Uploading a Knowledge Index
![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)
Here is an example of running a query in the Gradio UI using an Index:
![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)
## 🚀 Launch the Svelte Based UI (Optional)
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
@@ -224,52 +380,3 @@ For example:
- Ask question and get answer
![qna](../../../../assets/img/codegen_qna.png)
## 🚀 Download or Build Docker Images
Should the Docker image you seek not yet be available on Docker Hub, you can build the Docker image locally.
### 1. Build the LLM Docker Image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 2. Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 3. Build the UI Docker Image
Build the frontend Docker image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 4. Build CodeGen React UI Docker Image (Optional)
Build react frontend Docker image via below command:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/CodeGen/ui
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, you will have the following Docker Images:
- `opea/llm-textgen:latest`
- `opea/codegen:latest`
- `opea/codegen-ui:latest`
- `opea/codegen-react-ui:latest` (optional)


@@ -1,7 +1,8 @@
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-server
@@ -92,10 +93,14 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- RETRIEVAL_SERVICE_HOST_IP=${RETRIEVAL_SERVICE_HOST_IP}
- REDIS_RETRIEVER_PORT=${REDIS_RETRIEVER_PORT}
- TEI_EMBEDDING_HOST_IP=${TEI_EMBEDDING_HOST_IP}
- EMBEDDER_PORT=${EMBEDDER_PORT}
ipc: host
restart: always
codegen-xeon-ui-server:
image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
container_name: codegen-xeon-ui-server
depends_on:
- codegen-xeon-backend-server
@@ -106,9 +111,93 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
- host_ip=${host_ip}
- DATAPREP_ENDPOINT=${DATAPREP_ENDPOINT}
- DATAPREP_REDIS_PORT=${DATAPREP_REDIS_PORT}
ipc: host
restart: always
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "${REDIS_DB_PORT}:${REDIS_DB_PORT}"
- "${REDIS_INSIGHTS_PORT}:${REDIS_INSIGHTS_PORT}"
dataprep-redis-server:
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
container_name: dataprep-redis-server
depends_on:
- redis-vector-db
ports:
- "${DATAPREP_REDIS_PORT}:5000"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_HOST: ${host_ip}
INDEX_NAME: ${INDEX_NAME}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: true
restart: unless-stopped
tei-embedding-serving:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-serving
entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
ports:
- "${TEI_EMBEDDER_PORT:-12000}:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
host_ip: ${host_ip}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
interval: 10s
timeout: 6s
retries: 48
tei-embedding-server:
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
container_name: tei-embedding-server
ports:
- "${EMBEDDER_PORT:-10201}:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
EMBEDDING_COMPONENT_NAME: "OPEA_TEI_EMBEDDING"
depends_on:
tei-embedding-serving:
condition: service_healthy
restart: unless-stopped
retriever-redis:
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
container_name: retriever-redis
depends_on:
- redis-vector-db
ports:
- "${REDIS_RETRIEVER_PORT}:${REDIS_RETRIEVER_PORT}"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_DB_PORT: ${REDIS_DB_PORT}
REDIS_INSIGHTS_PORT: ${REDIS_INSIGHTS_PORT}
REDIS_RETRIEVER_PORT: ${REDIS_RETRIEVER_PORT}
INDEX_NAME: ${INDEX_NAME}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: ${LOGFLAG}
RETRIEVER_COMPONENT_NAME: ${RETRIEVER_COMPONENT_NAME:-OPEA_RETRIEVER_REDIS}
restart: unless-stopped
networks:
default:
driver: bridge


@@ -6,28 +6,77 @@ The default pipeline deploys with vLLM as the LLM serving component. It also pro
## 🚀 Start MicroServices and MegaService
The CodeGen megaservice manages a single microservice called LLM within a Directed Acyclic Graph (DAG). In the diagram above, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the Megaservice to generate code snippets in response to the user's input query.
The CodeGen megaservice manages several microservices, including the 'Embedding MicroService', 'Retrieval MicroService', and 'LLM MicroService', within a Directed Acyclic Graph (DAG). In the diagram below, the LLM microservice is a language model microservice that generates code snippets based on the user's input query. The TGI service serves as a text generation interface, providing a RESTful API for the LLM microservice. Data Preparation allows users to save or update documents and online resources in the vector database; users can upload files or provide URLs and manage their saved resources. The CodeGen Gateway acts as the entry point for the CodeGen application, invoking the megaservice to generate code snippets in response to the user's input query.
The mega flow of the CodeGen application, from the user's input query to the application's output response, is as follows:
```mermaid
---
config:
flowchart:
nodeSpacing: 400
rankSpacing: 100
curve: linear
themeVariables:
fontSize: 25px
---
flowchart LR
subgraph CodeGen
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef invisible fill:transparent,stroke:transparent;
style CodeGen-MegaService stroke:#000000
%% Subgraphs %%
subgraph CodeGen-MegaService["CodeGen-MegaService"]
direction LR
A[User] --> |Input query| B[CodeGen Gateway]
B --> |Invoke| Megaservice
subgraph Megaservice["Megaservice"]
direction TB
C((LLM<br>9000)) -. Post .-> D{{TGI Service<br>8028}}
EM([Embedding<br>MicroService]):::blue
RET([Retrieval<br>MicroService]):::blue
RER([Agents]):::blue
LLM([LLM<br>MicroService]):::blue
end
Megaservice --> |Output| E[Response]
subgraph User Interface
direction LR
a([Submit Query Tab]):::orchid
UI([UI server]):::orchid
Ingest([Manage Resources]):::orchid
end
subgraph Legend
CLIP_EM{{Embedding<br>service}}
VDB{{Vector DB}}
V_RET{{Retriever<br>service}}
Ingest{{Ingest data}}
DP([Data Preparation]):::blue
LLM_gen{{TGI Service}}
GW([CodeGen GateWay]):::orange
%% Data Preparation flow
%% Ingest data flow
direction LR
G([Microservice]) ==> H([Microservice])
I([Microservice]) -.-> J{{Server API}}
end
Ingest[Ingest data] --> UI
UI --> DP
DP <-.-> CLIP_EM
%% Questions interaction
direction LR
a[User Input Query] --> UI
UI --> GW
GW <==> CodeGen-MegaService
EM ==> RET
RET ==> RER
RER ==> LLM
%% Embedding service flow
direction LR
EM <-.-> CLIP_EM
RET <-.-> V_RET
LLM <-.-> LLM_gen
direction TB
%% Vector DB interaction
V_RET <-.->VDB
DP <-.->VDB
```
### Setup Environment Variables
@@ -44,38 +93,107 @@ export host_ip=${your_ip_address}
export HUGGINGFACEHUB_API_TOKEN=you_huggingface_token
```
2. Set Netowork Proxy
2. Set Network Proxy
**If you access public network through proxy, set the network proxy, otherwise, skip this step**
```bash
export no_proxy=${your_no_proxy}
export no_proxy=${no_proxy},${host_ip}
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```
### Start the Docker Containers for All Services
CodeGen support TGI service and vLLM service, you can choose start either one of them.
Start CodeGen based on TGI service:
Find the corresponding [compose.yaml](./compose.yaml). You can start CodeGen with either the TGI or the vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
```
#### TGI service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/hpu/gaudi
docker compose --profile codegen-gaudi-tgi up -d
```
Start CodeGen based on vLLM service:
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
#### vLLM service:
```bash
cd GenAIExamples/CodeGen/docker_compose
source set_env.sh
cd intel/hpu/gaudi
docker compose --profile codegen-gaudi-vllm up -d
```
Then run the command `docker images`; you will see the following Docker images:
- `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5`
- `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
- `opea/codegen-gradio-ui`
- `opea/codegen`
- `opea/dataprep`
- `opea/embedding`
- `opea/llm-textgen`
- `opea/retriever`
- `redis/redis-stack`
- `opea/vllm`
Refer to the [Gaudi Guide](./README.md) to build docker images from source.
### Building the Docker image locally
Should the Docker image you seek not yet be available on Docker Hub, you can build it locally by following the instructions below.
#### Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
#### Build the UI Gradio Image
Build the frontend Gradio image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
#### Dataprep Microservice with Redis
Follow the instructions provided here: [opea/dataprep](https://github.com/MSCetin37/GenAIComps/blob/main/comps/dataprep/src/README_redis.md)
#### Embedding Microservice with TEI
Follow the instructions provided here: [opea/embedding](https://github.com/MSCetin37/GenAIComps/blob/main/comps/embeddings/src/README_tei.md)
#### LLM text generation Microservice
Follow the instructions provided here: [opea/llm-textgen](https://github.com/MSCetin37/GenAIComps/tree/main/comps/llms/src/text-generation)
#### Retriever Microservice
Follow the instructions provided here: [opea/retriever](https://github.com/MSCetin37/GenAIComps/blob/main/comps/retrievers/src/README_redis.md)
#### Start Redis server
Follow the instructions provided here: [redis/redis-stack](https://github.com/MSCetin37/GenAIComps/tree/main/comps/third_parties/redis/src)
### Validate the MicroServices and MegaService
1. LLM Service (for TGI, vLLM)
@@ -83,8 +201,9 @@ docker compose --profile codegen-gaudi-vllm up -d
```bash
curl http://${host_ip}:8028/v1/chat/completions \
-X POST \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
```
2. LLM Microservices
@@ -92,19 +211,58 @@ docker compose --profile codegen-gaudi-vllm up -d
```bash
curl http://${host_ip}:9000/v1/chat/completions\
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}' \
-H 'Content-Type: application/json'
-H 'Content-Type: application/json' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```
3. MegaService
3. Dataprep Microservice
Make sure to replace the file name placeholders with your actual file names:
```bash
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
}'
curl http://${host_ip}:6007/v1/dataprep/ingest \
-X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.pdf" \
-F "files=@./file2.txt" \
-F "index_name=my_API_document"
```
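Online resources can be ingested through the same endpoint. The repository tests post a `link_list` form field, roughly as in the sketch below (placeholder URL):
```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://modin.readthedocs.io/en/latest/index.html"]' \
  -F "index_name=my_API_document"
```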
## 🚀 Launch the Svelte Based UI
4. MegaService
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
To use the CodeGen service with RAG and Agents activated, based on a previously ingested index:
```bash
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```
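The response length can also be capped with `max_tokens`, as the repository tests do; a sketch:
```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"agents_flag": "True", "index_name": "my_API_document", "max_tokens": 256, "messages": "def print_hello_world():"}'
```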
## 🚀 Launch the Gradio Based UI (Recommended)
To access the Gradio frontend URL, follow the steps in [this README](../../../../ui/gradio/README.md)
Code Generation Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Resource Management Tab
![project-screenshot](../../../../assets/img/codegen_gradio_ui_main.png)
Uploading a Knowledge Index
![project-screenshot](../../../../assets/img/codegen_gradio_ui_dataprep.png)
Here is an example of running a query in the Gradio UI using an Index:
![project-screenshot](../../../../assets/img/codegen_gradio_ui_query.png)
## 🚀 Launch the Svelte Based UI (Optional)
To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
@@ -213,52 +371,3 @@ For example:
- Ask question and get answer
![qna](../../../../assets/img/codegen_qna.png)
## 🚀 Build Docker Images
First of all, you need to build the Docker images locally. This step can be ignored after the Docker images published to the Docker Hub.
### 1. Build the LLM Docker Image
```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```
### 2. Build the MegaService Docker Image
To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `codegen.py` Python script. Build the MegaService Docker image via the command below:
```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/CodeGen
docker build -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```
### 3. Build the UI Docker Image
Construct the frontend Docker image via the command below:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```
### 4. Build CodeGen React UI Docker Image (Optional)
Build react frontend Docker image via below command:
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
```bash
cd GenAIExamples/CodeGen/ui
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```
Then run the command `docker images`, you will have the following Docker images:
- `opea/llm-textgen:latest`
- `opea/codegen:latest`
- `opea/codegen-ui:latest`
- `opea/codegen-react-ui:latest`


@@ -108,10 +108,15 @@ services:
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
- LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
- RETRIEVAL_SERVICE_HOST_IP=${RETRIEVAL_SERVICE_HOST_IP}
- REDIS_RETRIEVER_PORT=${REDIS_RETRIEVER_PORT}
- TEI_EMBEDDING_HOST_IP=${TEI_EMBEDDING_HOST_IP}
- EMBEDDER_PORT=${EMBEDDER_PORT}
- host_ip=${host_ip}
ipc: host
restart: always
codegen-gaudi-ui-server:
image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
container_name: codegen-gaudi-ui-server
depends_on:
- codegen-gaudi-backend-server
@@ -122,9 +127,93 @@ services:
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- BASIC_URL=${BACKEND_SERVICE_ENDPOINT}
- MEGA_SERVICE_PORT=${MEGA_SERVICE_PORT}
- host_ip=${host_ip}
- DATAPREP_ENDPOINT=${DATAPREP_ENDPOINT}
- DATAPREP_REDIS_PORT=${DATAPREP_REDIS_PORT}
ipc: host
restart: always
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "${REDIS_DB_PORT}:${REDIS_DB_PORT}"
- "${REDIS_INSIGHTS_PORT}:${REDIS_INSIGHTS_PORT}"
dataprep-redis-server:
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
container_name: dataprep-redis-server
depends_on:
- redis-vector-db
ports:
- "${DATAPREP_REDIS_PORT}:5000"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_HOST: ${host_ip}
INDEX_NAME: ${INDEX_NAME}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: true
restart: unless-stopped
tei-embedding-serving:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-serving
entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
ports:
- "${TEI_EMBEDDER_PORT:-12000}:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
host_ip: ${host_ip}
HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
interval: 10s
timeout: 6s
retries: 48
tei-embedding-server:
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
container_name: tei-embedding-server
ports:
- "${EMBEDDER_PORT:-10201}:6000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
EMBEDDING_COMPONENT_NAME: "OPEA_TEI_EMBEDDING"
depends_on:
tei-embedding-serving:
condition: service_healthy
restart: unless-stopped
retriever-redis:
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
container_name: retriever-redis
depends_on:
- redis-vector-db
ports:
- "${REDIS_RETRIEVER_PORT}:${REDIS_RETRIEVER_PORT}"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL: ${REDIS_URL}
REDIS_DB_PORT: ${REDIS_DB_PORT}
REDIS_INSIGHTS_PORT: ${REDIS_INSIGHTS_PORT}
REDIS_RETRIEVER_PORT: ${REDIS_RETRIEVER_PORT}
INDEX_NAME: ${INDEX_NAME}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
LOGFLAG: ${LOGFLAG}
RETRIEVER_COMPONENT_NAME: ${RETRIEVER_COMPONENT_NAME:-OPEA_RETRIEVER_REDIS}
restart: unless-stopped
networks:
default:
driver: bridge


@@ -7,7 +7,6 @@ source .set_env.sh
popd > /dev/null
export host_ip=$(hostname -I | awk '{print $1}')
if [ -z "${HUGGINGFACEHUB_API_TOKEN}" ]; then
echo "Error: HUGGINGFACEHUB_API_TOKEN is not set. Please set HUGGINGFACEHUB_API_TOKEN"
fi
@@ -17,10 +16,35 @@ if [ -z "${host_ip}" ]; then
fi
export no_proxy=${no_proxy},${host_ip}
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
export LLM_SERVICE_PORT=9000
export LLM_ENDPOINT="http://${host_ip}:8028"
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export TGI_LLM_ENDPOINT="http://${host_ip}:8028"
export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
export DATAPREP_REDIS_PORT=6007
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
export LOGFLAG=false
export MODEL_CACHE="./data"
export NUM_CARDS=1


@@ -23,6 +23,12 @@ services:
dockerfile: ./docker/Dockerfile.react
extends: codegen
image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
codegen-gradio-ui:
build:
context: ../ui
dockerfile: ./docker/Dockerfile.gradio
extends: codegen
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
llm-textgen:
build:
context: GenAIComps
@@ -46,3 +52,21 @@ services:
dockerfile: Dockerfile.hpu
extends: codegen
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
dataprep:
build:
context: GenAIComps
dockerfile: comps/dataprep/src/Dockerfile
extends: codegen
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
retriever:
build:
context: GenAIComps
dockerfile: comps/retrievers/src/Dockerfile
extends: codegen
image: ${REGISTRY:-opea}/retriever:${TAG:-latest}
embedding:
build:
context: GenAIComps
dockerfile: comps/embeddings/src/Dockerfile
extends: codegen
image: ${REGISTRY:-opea}/embedding:${TAG:-latest}


@@ -10,11 +10,21 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export DATAPREP_REDIS_PORT=6007
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy},${ip_address}
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
@@ -31,13 +41,14 @@ function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone --depth 1 --branch ${opea_branch} https://github.com/opea-project/GenAIComps.git
# Download Gaudi vllm of latest tag
git clone https://github.com/HabanaAI/vllm-fork.git && cd vllm-fork
VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER} &> /dev/null && cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="codegen codegen-ui llm-textgen vllm-gaudi"
service_list="codegen codegen-gradio-ui llm-textgen vllm-gaudi dataprep retriever embedding"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker images && sleep 1s
@@ -48,18 +59,28 @@ function start_services() {
local llm_container_name="$2"
cd $WORKPATH/docker_compose/intel/hpu/gaudi
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
export LLM_ENDPOINT="http://${ip_address}:8028"
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:${MEGA_SERVICE_PORT}/v1/codegen"
export NUM_CARDS=1
export host_ip=${ip_address}
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
export INDEX_NAME="CodeGen"
# Start Docker Containers
docker compose --profile ${compose_profile} up -d | tee ${LOG_PATH}/start_services_with_compose.log
@@ -82,6 +103,16 @@ function validate_services() {
local DOCKER_NAME="$4"
local INPUT_DATA="$5"
if [[ "$SERVICE_NAME" == "ingest" ]]; then
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -F index_name=test_redis -H 'Content-Type: multipart/form-data' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Data preparation succeeded..."
else
echo "[ $SERVICE_NAME ] Data preparation failed..."
fi
else
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
@@ -100,6 +131,7 @@ function validate_services() {
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
exit 1
fi
fi
sleep 5s
}
@@ -122,6 +154,14 @@ function validate_microservices() {
"llm-textgen-gaudi-server" \
'{"query":"def print_hello_world():"}'
# Data ingest microservice
validate_services \
"${ip_address}:6007/v1/dataprep/ingest" \
"Data preparation succeeded" \
"ingest" \
"dataprep-redis-server" \
'link_list=["https://modin.readthedocs.io/en/latest/index.html"]'
}
function validate_megaservice() {
@@ -133,6 +173,14 @@ function validate_megaservice() {
"codegen-gaudi-backend-server" \
'{"messages": "def print_hello_world():"}'
# Curl the Mega Service with index_name and agents_flag
validate_services \
"${ip_address}:7778/v1/codegen" \
"" \
"mega-codegen" \
"codegen-gaudi-backend-server" \
'{ "index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
}
function validate_frontend() {
@@ -163,6 +211,18 @@ function validate_frontend() {
fi
}
function validate_gradio() {
local URL="http://${ip_address}:5173/health"
local RESPONSE=$(curl -s "$URL")
local SERVICE_NAME="Gradio"
if [ "$RESPONSE" = '{"status":"ok"}' ]; then
echo "[ $SERVICE_NAME ] Health check returned ok. UI server is running successfully..."
else
echo "[ $SERVICE_NAME ] UI server has failed..."
fi
}
function stop_docker() {
local docker_profile="$1"
@@ -201,7 +261,7 @@ function main() {
validate_microservices "${docker_llm_container_names[${i}]}"
validate_megaservice
validate_frontend
validate_gradio
stop_docker "${docker_compose_profiles[${i}]}"
sleep 5s

View File

@@ -10,11 +10,21 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MODEL_CACHE=${model_cache:-"./data"}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_RETRIEVER_PORT=7000
export EMBEDDER_PORT=6000
export TEI_EMBEDDER_PORT=8090
export DATAPREP_REDIS_PORT=6007
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy},${ip_address}
function build_docker_images() {
opea_branch=${opea_branch:-"main"}
# If the opea_branch isn't main, replace the git clone branch in Dockerfile.
@@ -38,7 +48,8 @@ function build_docker_images() {
cd ../
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="codegen codegen-ui llm-textgen vllm"
service_list="codegen codegen-gradio-ui llm-textgen vllm dataprep retriever embedding"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
@@ -54,12 +65,21 @@ function start_services() {
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
export LLM_ENDPOINT="http://${ip_address}:8028"
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export MEGA_SERVICE_PORT=7778
export MEGA_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:${MEGA_SERVICE_PORT}/v1/codegen"
export host_ip=${ip_address}
sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export RETRIEVAL_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_COMPONENT_NAME="OPEA_RETRIEVER_REDIS"
export INDEX_NAME="CodeGen"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:${TEI_EMBEDDER_PORT}"
export DATAPREP_ENDPOINT="http://${host_ip}:${DATAPREP_REDIS_PORT}/v1/dataprep"
# Start Docker Containers
docker compose --profile ${compose_profile} up -d > ${LOG_PATH}/start_services_with_compose.log
@@ -82,6 +102,16 @@ function validate_services() {
local DOCKER_NAME="$4"
local INPUT_DATA="$5"
if [[ "$SERVICE_NAME" == "ingest" ]]; then
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -F index_name=test_redis -H 'Content-Type: multipart/form-data' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Data preparation succeeded..."
else
echo "[ $SERVICE_NAME ] Data preparation failed..."
fi
else
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
@@ -100,6 +130,7 @@ function validate_services() {
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
exit 1
fi
fi
sleep 5s
}
@@ -122,6 +153,14 @@ function validate_microservices() {
"llm-textgen-server" \
'{"query":"def print_hello_world():", "max_tokens": 256}'
# Data ingest microservice
validate_services \
"${ip_address}:6007/v1/dataprep/ingest" \
"Data preparation succeeded" \
"ingest" \
"dataprep-redis-server" \
'link_list=["https://modin.readthedocs.io/en/latest/index.html"]'
}
function validate_megaservice() {
@@ -133,6 +172,14 @@ function validate_megaservice() {
"codegen-xeon-backend-server" \
'{"messages": "def print_hello_world():", "max_tokens": 256}'
# Curl the Mega Service with index_name and agents_flag
validate_services \
"${ip_address}:7778/v1/codegen" \
"" \
"mega-codegen" \
"codegen-xeon-backend-server" \
'{ "index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
}
function validate_frontend() {
@@ -163,6 +210,17 @@ function validate_frontend() {
fi
}
function validate_gradio() {
local URL="http://${ip_address}:5173/health"
local RESPONSE=$(curl -s "$URL")
local SERVICE_NAME="Gradio"
if [ "$RESPONSE" = '{"status":"ok"}' ]; then
echo "[ $SERVICE_NAME ] Health check returned ok. UI server is running successfully..."
else
echo "[ $SERVICE_NAME ] UI server has failed..."
fi
}
function stop_docker() {
local docker_profile="$1"
@@ -202,7 +260,7 @@ function main() {
validate_microservices "${docker_llm_container_names[${i}]}"
validate_megaservice
validate_frontend
validate_gradio
stop_docker "${docker_compose_profiles[${i}]}"
sleep 5s

View File

@@ -0,0 +1,33 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
FROM python:3.11-slim
ENV LANG=C.UTF-8
ARG ARCH="cpu"
RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
build-essential \
default-jre \
libgl1-mesa-glx \
libjemalloc-dev \
wget
# Install ffmpeg static build
WORKDIR /root
RUN wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz && \
mkdir ffmpeg-git-amd64-static && tar -xvf ffmpeg-git-amd64-static.tar.xz -C ffmpeg-git-amd64-static --strip-components 1 && \
export PATH=/root/ffmpeg-git-amd64-static:$PATH && \
cp /root/ffmpeg-git-amd64-static/ffmpeg /usr/local/bin/ && \
cp /root/ffmpeg-git-amd64-static/ffprobe /usr/local/bin/
RUN mkdir -p /home/user
COPY gradio /home/user/gradio
RUN pip install --no-cache-dir --upgrade pip setuptools && \
pip install --no-cache-dir -r /home/user/gradio/requirements.txt
WORKDIR /home/user/gradio
ENTRYPOINT ["python", "codegen_ui_gradio.py"]

View File

@@ -0,0 +1,65 @@
# CodeGen Gradio UI
This project provides a Gradio-based user interface for the CodeGen example. It offers a code generation tab for submitting natural-language queries to the CodeGen backend and a resource management tab for uploading files or URLs into an index used for retrieval-augmented generation.
## Docker
### Build UI Docker Image
To build the frontend Docker image, navigate to the `GenAIExamples/CodeGen/ui` directory and run the following command:
```bash
cd GenAIExamples/CodeGen/ui
docker build -t opea/codegen-gradio-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile.gradio .
```
This command builds the Docker image with the tag `opea/codegen-gradio-ui:latest`. It also passes the proxy settings as build arguments to ensure that the build process can access the internet if you are behind a corporate firewall.
### Run UI Docker Image
To run the frontend Docker image, navigate to the `GenAIExamples/CodeGen/ui/gradio` directory and execute the following commands:
```bash
cd GenAIExamples/CodeGen/ui/gradio
ip_address=$(hostname -I | awk '{print $1}')
docker run -d -p 5173:5173 --ipc=host \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
-e no_proxy=$no_proxy \
-e BACKEND_SERVICE_ENDPOINT=http://$ip_address:7778/v1/codegen \
opea/codegen-gradio-ui:latest
```
This command runs the Docker container in detached mode (`-d`), mapping port 5173 of the host to port 5173 of the container. It also sets several environment variables, including the backend service endpoint, which the frontend requires to communicate with the backend service.
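To confirm the container is up, the UI exposes a `/health` endpoint (the same one the CI tests poll). Assuming the default port mapping above, a quick check looks like:
```bash
curl http://localhost:5173/health
# Expected response: {"status":"ok"}
```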
### Python
To run the frontend application directly using Python, navigate to the `GenAIExamples/CodeGen/ui/gradio` directory and run the following command:
```bash
cd GenAIExamples/CodeGen/ui/gradio
python codegen_ui_gradio.py
```
This command starts the frontend application using Python.
## Additional Information
### Prerequisites
Ensure you have Docker installed and running on your system. Also, make sure you have the necessary proxy settings configured if you are behind a corporate firewall.
### Environment Variables
- `http_proxy`: Proxy setting for HTTP connections.
- `https_proxy`: Proxy setting for HTTPS connections.
- `no_proxy`: Comma-separated list of hosts that should be excluded from proxying.
- `BACKEND_SERVICE_ENDPOINT`: The endpoint of the backend service that the frontend will communicate with.
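A minimal sketch of setting these variables before starting the container and sanity-checking the backend (values are illustrative; the request payload mirrors the one used in the CI tests):
```bash
export http_proxy=${http_proxy}
export https_proxy=${https_proxy}
export no_proxy=${no_proxy}
ip_address=$(hostname -I | awk '{print $1}')
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:7778/v1/codegen"
# Optional: exercise the backend directly before launching the UI.
curl -X POST "$BACKEND_SERVICE_ENDPOINT" \
  -H 'Content-Type: application/json' \
  -d '{"index_name": "test_redis", "agents_flag": "True", "messages": "def print_hello_world():", "max_tokens": 256}'
```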
### Troubleshooting
- Docker Build Issues: If you encounter issues while building the Docker image, ensure that your proxy settings are correctly configured and that you have internet access.
- Docker Run Issues: If the Docker container fails to start, check the environment variables and ensure that the backend service is running and accessible.
This README covers building and running the Dockerized Gradio frontend, running it directly with Python, and the environment variables and troubleshooting steps needed to connect it to the CodeGen backend.

View File

@@ -0,0 +1,371 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# This is a Gradio app that includes two tabs: one for code generation and another for resource management.
# The resource management tab has been updated to allow file uploads, deletion, and a table listing all the files.
# Additionally, three small text boxes have been added for the index name, chunk size, and chunk overlap used during ingestion.
import argparse
import json
import os
from pathlib import Path
from urllib.parse import urlparse
import gradio as gr
import pandas as pd
import requests
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
logflag = os.getenv("LOGFLAG", False)
# create a FastAPI app
app = FastAPI()
cur_dir = os.getcwd()
static_dir = Path(os.path.join(cur_dir, "static/"))
tmp_dir = Path(os.path.join(cur_dir, "split_tmp_videos/"))
Path(static_dir).mkdir(parents=True, exist_ok=True)
app.mount("/static", StaticFiles(directory=static_dir), name="static")
tmp_upload_folder = "/tmp/gradio/"
host_ip = os.getenv("host_ip")
DATAPREP_REDIS_PORT = os.getenv("DATAPREP_REDIS_PORT", 6007)
DATAPREP_ENDPOINT = os.getenv("DATAPREP_ENDPOINT", f"http://{host_ip}:{DATAPREP_REDIS_PORT}/v1/dataprep")
MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)
backend_service_endpoint = os.getenv("BACKEND_SERVICE_ENDPOINT", f"http://{host_ip}:{MEGA_SERVICE_PORT}/v1/codegen")
dataprep_ingest_endpoint = f"{DATAPREP_ENDPOINT}/ingest"
dataprep_get_files_endpoint = f"{DATAPREP_ENDPOINT}/get"
dataprep_delete_files_endpoint = f"{DATAPREP_ENDPOINT}/delete"
dataprep_get_indices_endpoint = f"{DATAPREP_ENDPOINT}/indices"
# Define the functions that will be used in the app
def conversation_history(prompt, index, use_agent, history):
print(f"Generating code for prompt: {prompt} using index: {index} and use_agent is {use_agent}")
history.append([prompt, ""])
response_generator = generate_code(prompt, index, use_agent)
for token in response_generator:
history[-1][-1] += token
yield history
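# Ingest newline-separated media: valid URLs are sent to the dataprep ingest endpoint as links, local .pdf/.txt files are uploaded as files, and status messages are yielded back to the upload status textbox.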
def upload_media(media, index=None, chunk_size=1500, chunk_overlap=100):
media = media.strip().split("\n")
if not chunk_size:
chunk_size = 1500
if not chunk_overlap:
chunk_overlap = 100
results = []
if type(media) is list:
for file in media:
file_ext = os.path.splitext(file)[-1]
if is_valid_url(file):
yield (
gr.Textbox(
visible=True,
value="Ingesting URL...",
)
)
value = ingest_url(file, index, chunk_size, chunk_overlap)
results.append(value)
yield value
elif file_ext in [".pdf", ".txt"]:
yield (
gr.Textbox(
visible=True,
value="Ingesting file...",
)
)
value = ingest_file(file, index, chunk_size, chunk_overlap)
results.append(value)
yield value
else:
yield (
gr.Textbox(
visible=True,
value="Your media is either an invalid URL or the file extension type is not supported. (Supports .pdf, .txt, url)",
)
)
return
yield results
else:
file_ext = os.path.splitext(media)[-1]
if is_valid_url(media):
value = ingest_url(media, index, chunk_size, chunk_overlap)
yield value
elif file_ext in [".pdf", ".txt"]:
value = ingest_file(media, index, chunk_size, chunk_overlap)
yield value
else:
yield (
gr.Textbox(
visible=True,
value="Your file extension type is not supported.",
)
)
return
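# Stream code tokens from the CodeGen backend: each server-sent "data:" line carries a JSON chunk whose choice text is yielded as it arrives, until the "[DONE]" marker; a troubleshooting hint is yielded if no response comes back.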
def generate_code(query, index=None, use_agent=False):
if index is None or index == "None":
input_dict = {"messages": query, "agents_flag": use_agent}
else:
input_dict = {"messages": query, "index_name": index, "agents_flag": use_agent}
print("Query is ", input_dict)
headers = {"Content-Type": "application/json"}
response = requests.post(url=backend_service_endpoint, headers=headers, data=json.dumps(input_dict), stream=True)
line_count = 0
for line in response.iter_lines():
line_count += 1
if line:
line = line.decode("utf-8")
if line.startswith("data: "): # Only process lines starting with "data: "
json_part = line[len("data: ") :] # Remove the "data: " prefix
else:
json_part = line
if json_part.strip() == "[DONE]": # Ignore the DONE marker
continue
try:
json_obj = json.loads(json_part) # Convert to dictionary
if "choices" in json_obj:
for choice in json_obj["choices"]:
if "text" in choice:
# Yield each token individually
yield choice["text"]
except json.JSONDecodeError:
print("Error parsing JSON:", json_part)
if line_count == 0:
yield "Something went wrong, No Response Generated! \nIf you are using an Index, try uploading your media again with a smaller chunk size to avoid exceeding the token max. \
\nOr, check the Use Agent box and try again."
def ingest_file(file, index=None, chunk_size=100, chunk_overlap=150):
headers = {
# "Content-Type: multipart/form-data"
}
file_input = {"files": open(file, "rb")}
if index:
print("Index is", index)
data = {"index_name": index, "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
else:
data = {"chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
response = requests.post(url=dataprep_ingest_endpoint, headers=headers, files=file_input, data=data)
return response.text
def ingest_url(url, index=None, chunk_size=100, chunk_overlap=150):
url = str(url)
if not is_valid_url(url):
return "Invalid URL entered. Please enter a valid URL"
headers = {
# "Content-Type: multipart/form-data"
}
if index:
url_input = {
"link_list": json.dumps([url]),
"index_name": index,
"chunk_size": chunk_size,
"chunk_overlap": chunk_overlap,
}
else:
url_input = {"link_list": json.dumps([url]), "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
response = requests.post(url=dataprep_ingest_endpoint, headers=headers, data=url_input)
return response.text
def is_valid_url(url):
url = str(url)
try:
result = urlparse(url)
return all([result.scheme, result.netloc])
except ValueError:
return False
def get_files(index=None):
headers = {
# "Content-Type: multipart/form-data"
}
if index == "All Files":
index = None
if index:
index = {"index_name": index}
response = requests.post(url=dataprep_get_files_endpoint, headers=headers, data=index)
table = response.json()
return table
else:
response = requests.post(url=dataprep_get_files_endpoint, headers=headers)
table = response.json()
return table
def update_table(index=None):
if index == "All Files":
index = None
files = get_files(index)
if len(files) == 0:
df = pd.DataFrame(files, columns=["Files"])
return df
else:
df = pd.DataFrame(files)
return df
def update_indices():
indices = get_indices()
df = pd.DataFrame(indices, columns=["File Indices"])
return df
def delete_file(file, index=None):
# Remove the selected file from the file list
headers = {
# "Content-Type: application/json"
}
if index:
file_input = {"files": open(file, "rb"), "index_name": index}
else:
file_input = {"files": open(file, "rb")}
response = requests.post(url=dataprep_delete_files_endpoint, headers=headers, data=file_input)
table = update_table()
return response.text
def delete_all_files(index=None):
# Remove all files from the file list
headers = {
# "Content-Type: application/json"
}
response = requests.post(url=dataprep_delete_files_endpoint, headers=headers, data='{"file_path": "all"}')
table = update_table()
return "Delete All status: " + response.text
def get_indices():
headers = {
# "Content-Type: application/json"
}
response = requests.post(url=dataprep_get_indices_endpoint, headers=headers)
indices = ["None"]
indices += response.json()
return indices
def update_indices_dropdown():
new_dd = gr.update(choices=get_indices(), value="None")
return new_dd
def get_file_names(files):
file_str = ""
if not files:
return file_str
for file in files:
file_str += file + "\n"
return file_str.strip()
# Define UI components
with gr.Blocks() as ui:
with gr.Tab("Code Generation"):
gr.Markdown("### Generate Code from Natural Language")
chatbot = gr.Chatbot(label="Chat History")
prompt_input = gr.Textbox(label="Enter your query")
with gr.Column():
with gr.Row(equal_height=True):
database_dropdown = gr.Dropdown(choices=get_indices(), label="Select Index", value="None", scale=10)
db_refresh_button = gr.Button("Refresh Dropdown", scale=0.1)
db_refresh_button.click(update_indices_dropdown, outputs=database_dropdown)
use_agent = gr.Checkbox(label="Use Agent", container=False)
generate_button = gr.Button("Generate Code")
generate_button.click(
conversation_history, inputs=[prompt_input, database_dropdown, use_agent, chatbot], outputs=chatbot
)
with gr.Tab("Resource Management"):
# File management components
with gr.Row():
with gr.Column(scale=1):
index_name_input = gr.Textbox(label="Index Name")
chunk_size_input = gr.Textbox(
label="Chunk Size", value="1500", placeholder="Enter an integer (default: 1500)"
)
chunk_overlap_input = gr.Textbox(
label="Chunk Overlap", value="100", placeholder="Enter an integer (default: 100)"
)
with gr.Column(scale=3):
file_upload = gr.File(label="Upload Files", file_count="multiple")
url_input = gr.Textbox(label="Media to be ingested (Append URL's in a new line)")
upload_button = gr.Button("Upload", variant="primary")
upload_status = gr.Textbox(label="Upload Status")
file_upload.change(get_file_names, inputs=file_upload, outputs=url_input)
with gr.Column(scale=1):
file_table = gr.Dataframe(interactive=False, value=update_indices())
refresh_button = gr.Button("Refresh", variant="primary", size="sm")
refresh_button.click(update_indices, outputs=file_table)
upload_button.click(
upload_media,
inputs=[url_input, index_name_input, chunk_size_input, chunk_overlap_input],
outputs=upload_status,
)
delete_all_button = gr.Button("Delete All", variant="primary", size="sm")
delete_all_button.click(delete_all_files, outputs=upload_status)
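# Liveness endpoint; the CI test (validate_gradio) polls this to confirm the UI server is up.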
@app.get("/health")
def health_check():
return {"status": "ok"}
ui.queue()
app = gr.mount_gradio_app(app, ui, path="/")
share = False
enable_queue = True
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default="0.0.0.0")
parser.add_argument("--port", type=int, default=os.getenv("UI_PORT", 5173))
parser.add_argument("--concurrency-count", type=int, default=20)
parser.add_argument("--share", action="store_true")
host_ip = os.getenv("host_ip")
DATAPREP_REDIS_PORT = os.getenv("DATAPREP_REDIS_PORT", 6007)
DATAPREP_ENDPOINT = os.getenv("DATAPREP_ENDPOINT", f"http://{host_ip}:{DATAPREP_REDIS_PORT}/v1/dataprep")
MEGA_SERVICE_PORT = os.getenv("MEGA_SERVICE_PORT", 7778)
backend_service_endpoint = os.getenv("BACKEND_SERVICE_ENDPOINT", f"http://{host_ip}:{MEGA_SERVICE_PORT}/v1/codegen")
args = parser.parse_args()
global gateway_addr
gateway_addr = backend_service_endpoint
global dataprep_ingest_addr
dataprep_ingest_addr = dataprep_ingest_endpoint
global dataprep_get_files_addr
dataprep_get_files_addr = dataprep_get_files_endpoint
uvicorn.run(app, host=args.host, port=args.port)

View File

@@ -0,0 +1,4 @@
gradio==5.22.0
numpy==1.26.4
opencv-python==4.10.0.82
Pillow==10.3.0