add initial examples

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
This commit is contained in:
lvliang-intel
2024-03-21 10:17:09 +08:00
parent bc7c18f68d
commit fabff168ff
147 changed files with 23216 additions and 0 deletions

ChatQnA/README.md Normal file

@@ -0,0 +1,155 @@
This ChatQnA use case performs RAG using LangChain, the Redis vector database, and Text Generation Inference (TGI) on Intel Gaudi2. The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular for LLMs. Please visit [Habana AI products](https://habana.ai/products) for more details.
# Environment Setup
To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, please follow these steps:
## Build TGI Gaudi Docker Image
```bash
bash ./serving/tgi_gaudi/build_docker.sh
```
## Launch TGI Gaudi Service
### Launch a local server instance on 1 Gaudi card:
```bash
bash ./serving/tgi_gaudi/launch_tgi_service.sh
```
For gated models such as `LLAMA-2`, you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the `docker run` command above with a valid Hugging Face Hub read token.
Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get an access token and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.
```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
```
### Launch a local server instance on 8 Gaudi cards:
```bash
bash ./serving/tgi_gaudi/launch_tgi_service.sh 8
```
### Customize TGI Gaudi Service
The `./serving/tgi_gaudi/launch_tgi_service.sh` script accepts three parameters:
- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
- model_name: The model name utilized for LLM, with the default set to "Intel/neural-chat-7b-v3-3".
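For example, a two-card launch on port 8080 with the default model might look like this (the values below are purely illustrative):
```bash
# num_cards=2, port_number=8080, model_name=Intel/neural-chat-7b-v3-3
bash ./serving/tgi_gaudi/launch_tgi_service.sh 2 8080 "Intel/neural-chat-7b-v3-3"
```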
You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_ENDPOINT`:
```bash
export TGI_ENDPOINT="http://xxx.xxx.xxx.xxx:8080"
```
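Once the TGI Gaudi service is running, you can optionally send it a quick test request. The snippet below is only a sanity-check sketch and assumes the endpoint is reachable at localhost on port 8080:
```bash
# Verify the TGI Gaudi endpoint responds (adjust host/port to your deployment)
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}'
```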
## Enable TGI Gaudi FP8 for higher throughput
TGI Gaudi uses BFLOAT16 optimization by default. If you aim to achieve higher throughput, you can enable FP8 quantization on TGI Gaudi. According to our test results, FP8 quantization yields approximately a 1.8x performance gain compared to BFLOAT16. Please follow the steps below to enable FP8 quantization.
### Prepare Metadata for FP8 Quantization
Enter the TGI Gaudi Docker container, then run the following commands:
```bash
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana/examples/text-generation
pip install -r requirements_lm_eval.txt
QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py run_lm_eval.py -o acc_7b_bs1_measure.txt --model_name_or_path meta-llama/Llama-2-7b-hf --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 1
QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py run_lm_eval.py -o acc_7b_bs1_quant.txt --model_name_or_path meta-llama/Llama-2-7b-hf --attn_softmax_bf16 --use_hpu_graphs --trim_logits --use_kv_cache --reuse_cache --bf16 --batch_size 1 --fp8
```
After the above commands finish, the quantization metadata is generated. Copy the metadata directory `./hqt_output/` and the quantization JSON file to the host (under …/data). Please adapt the commands below with your container ID and directory path.
```bash
docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/hqt_output data/
docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/quantization_config/maxabs_quant.json data/
```
### Restart the TGI Gaudi server with all the metadata mapped
```bash
docker run -d -p 8080:80 -e QUANT_CONFIG=/data/maxabs_quant.json -e HUGGING_FACE_HUB_TOKEN=<your HuggingFace token> -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES="4,5,6" -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id meta-llama/Llama-2-7b-hf
```
TGI Gaudi will now serve the model with FP8 by default. Please note that currently only Llama 2 and Mistral models support FP8 quantization.
## Launch Redis
```bash
docker pull redis/redis-stack:latest
docker compose -f langchain/docker/docker-compose-redis.yml up -d
```
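Before ingesting data, you can optionally confirm that the vector database is up. The container name `redis-vector-db` below comes from the Redis compose file; the `redis-cli ping` call is a simple liveness check and should print `PONG`:
```bash
# Optional liveness check for the Redis vector database container
docker ps --filter name=redis-vector-db
docker exec redis-vector-db redis-cli ping
```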
## Launch LangChain Docker
### Build LangChain Docker Image
```bash
cd langchain/docker/
bash ./build_docker.sh
```
### Launch LangChain Docker
Update the `HUGGINGFACEHUB_API_TOKEN` environment variable with your Hugging Face token in `docker-compose-langchain.yml`.
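For example, assuming `HUGGINGFACEHUB_API_TOKEN` is already exported in your shell and the file still contains the `<update-your-hugging-face-token>` placeholder, you could substitute it in place with a one-liner like the sketch below (editing the file by hand works just as well):
```bash
# Replace the token placeholder in the compose file with the exported value (illustrative)
sed -i "s|<update-your-hugging-face-token>|${HUGGINGFACEHUB_API_TOKEN}|" docker-compose-langchain.yml
```
Then launch the container: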
```bash
docker compose -f docker-compose-langchain.yml up -d
cd ../../
```
## Ingest data into Redis
Each time the Redis container is launched, the data needs to be ingested into the container. Follow these steps:
```bash
docker exec -it qna-rag-redis-server bash
cd /ws
python ingest.py
```
Note: `ingest.py` will download the embedding model; please set the proxy environment variables first if necessary.
# Start LangChain Server
## Start the Backend Service
Make sure the TGI Gaudi service is running and the data has been populated into Redis, then launch the backend service:
```bash
docker exec -it qna-rag-redis-server bash
nohup python app/server.py &
```
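Once the backend is up, you can optionally verify the `/v1/rag/chat` route directly. The sketch below assumes the service listens on port 8000 of the host (the default in `app/server.py`) and queries the default knowledge base:
```bash
# Query the RAG backend directly (adjust the host/port to your deployment)
curl http://localhost:8000/v1/rag/chat \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query":"What is the revenue of Nike in 2023?","knowledge_base_id":"default"}'
```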
## Start the Frontend Service
Navigate to the "ui" folder and execute the following commands to start the frontend GUI:
```bash
cd ui
sudo apt-get install npm && \
npm install -g n && \
n stable && \
hash -r && \
npm install -g npm@latest
```
For CentOS, please use the following commands instead:
```bash
curl -sL https://rpm.nodesource.com/setup_20.x | sudo bash -
sudo yum install -y nodejs
```
Update the `DOC_BASE_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address.
Run the following command to install the required dependencies:
```bash
npm install
```
Start the development server by executing the following command:
```bash
nohup npm run dev &
```
This will initiate the frontend service and launch the application.


@@ -0,0 +1 @@
Will update soon.


@@ -0,0 +1,34 @@
import requests
import json
import argparse
import concurrent.futures
import random
def extract_qText(json_data):
    try:
        # Pick a random question from the local devtest.json dataset
        with open('devtest.json') as file:
            data = json.load(file)
        json_data = json.loads(json_data)
        json_data["inputs"] = data[random.randint(0, len(data) - 1)]["qText"]
        return json.dumps(json_data)
    except (json.JSONDecodeError, KeyError, IndexError):
        return None
def send_request(url, json_data):
headers = {'Content-Type': 'application/json'}
response = requests.post(url, data=json_data, headers=headers)
print(f"Question: {json_data} Response: {response.status_code} - {response.text}")
def main(url, json_data, concurrency):
with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
future_to_url = {executor.submit(send_request, url, extract_qText(json_data)): url for _ in range(concurrency*2)}
for future in concurrent.futures.as_completed(future_to_url):
_ = future_to_url[future]
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Concurrent client to send POST requests")
parser.add_argument("--url", type=str, default="http://localhost:12345", help="URL to send requests to")
parser.add_argument("--json_data", type=str, default='{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"do_sample": true}}', help="JSON data to send")
parser.add_argument("--concurrency", type=int, default=100, help="Concurrency level")
args = parser.parse_args()
main(args.url, args.json_data, args.concurrency)
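# Example usage (endpoint and script name are illustrative; a devtest.json file with
# "qText" entries must be present in the working directory):
#   python client.py --url http://localhost:8080/generate --concurrency 16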

File diff suppressed because one or more lines are too long


@@ -0,0 +1,7 @@
HUGGING_FACE_HUB_TOKEN=<your-hf-token>
volume=./data
model=meta-llama/Llama-2-13b-chat-hf
MAX_TOTAL_TOKENS=2000
ENABLE_HPU_GRAPH=True
PT_HPU_ENABLE_LAZY_COLLECTIVES=true
OMPI_MCA_btl_vader_single_copy_mechanism=none


@@ -0,0 +1,9 @@
## Launch 8 models on 8 separate Gaudi2 cards:
Add your Hugging Face access token in `.env`. <br/>
Optionally, change the model name and the linked volume directory used to store the downloaded model.<br/><br/>
Run the following command in your terminal to launch the nginx load balancer and 8 instances of tgi_gaudi containers (one for each Gaudi card):
```
docker compose -f docker-compose.yml up -d
```
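After the stack is up, nginx listens on port 80 and distributes requests across the eight TGI instances. A quick check could look like the sketch below (host and prompt are just examples):
```
curl http://localhost:80/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}'
```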


@@ -0,0 +1,135 @@
version: '3'
services:
gaudi0:
image: tgi_gaudi
runtime: habana
ports:
- "8081:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=0
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi1:
image: tgi_gaudi
runtime: habana
ports:
- "8082:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=1
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi2:
image: tgi_gaudi
runtime: habana
ports:
- "8083:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=2
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi3:
image: tgi_gaudi
runtime: habana
ports:
- "8084:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=3
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi4:
image: tgi_gaudi
runtime: habana
ports:
- "8085:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=4
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi5:
image: tgi_gaudi
runtime: habana
ports:
- "8086:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=5
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi6:
image: tgi_gaudi
runtime: habana
ports:
- "8087:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=6
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
gaudi7:
image: tgi_gaudi
runtime: habana
ports:
- "8088:80"
env_file:
- .env
environment:
- HABANA_VISIBLE_DEVICES=7
volumes:
- $volume:/data
cap_add:
- sys_nice
ipc: "host"
command: ["--model-id", "$model"]
nginx:
build: ./nginx
ports:
- "80:80"
depends_on:
- gaudi0
- gaudi1
- gaudi2
- gaudi3
- gaudi4
- gaudi5
- gaudi6
- gaudi7


@@ -0,0 +1,11 @@
# FROM nginx
# RUN rm /etc/nginx/conf.d/default.conf
# COPY nginx.conf /etc/nginx/conf.d/default.conf
FROM nginx:latest
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]


@@ -0,0 +1,23 @@
upstream backend {
least_conn;
server gaudi0:80 max_fails=3 fail_timeout=30s;
server gaudi1:80 max_fails=3 fail_timeout=30s;
server gaudi2:80 max_fails=3 fail_timeout=30s;
server gaudi3:80 max_fails=3 fail_timeout=30s;
server gaudi4:80 max_fails=3 fail_timeout=30s;
server gaudi5:80 max_fails=3 fail_timeout=30s;
server gaudi6:80 max_fails=3 fail_timeout=30s;
server gaudi7:80 max_fails=3 fail_timeout=30s;
}
server {
listen 80;
location / {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}


@@ -0,0 +1,37 @@
FROM langchain/langchain
ARG http_proxy
ARG https_proxy
ENV http_proxy=$http_proxy
ENV https_proxy=$https_proxy
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y \
libgl1-mesa-glx \
libjemalloc-dev
RUN pip install --upgrade pip \
sentence-transformers \
redis \
unstructured \
unstructured[pdf] \
langchain-cli \
pydantic==1.10.13 \
langchain==0.1.12 \
poetry \
pymupdf \
easyocr \
langchain_benchmarks \
pyarrow \
jupyter \
intel-extension-for-pytorch \
intel-openmp
ENV PYTHONPATH=/ws:/qna-app/app
COPY qna-app /qna-app
COPY qna-app-no-rag /qna-app-no-rag
WORKDIR /qna-app
ENTRYPOINT ["/usr/bin/sleep", "infinity"]


@@ -0,0 +1,3 @@
#!/bin/bash
docker build . -t qna-rag-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy


@@ -0,0 +1,18 @@
version: '3'
services:
qna-rag-redis-server:
image: qna-rag-redis:latest
container_name: qna-rag-redis-server
environment:
- "REDIS_PORT=6379"
- "EMBED_MODEL=BAAI/bge-base-en-v1.5"
- "REDIS_SCHEMA=schema_dim_768.yml"
- "HUGGINGFACEHUB_API_TOKEN=<update-your-hugging-face-token>"
ulimits:
memlock:
soft: -1 # Set memlock to unlimited (no soft or hard limit)
hard: -1
volumes:
- ../redis:/ws
- ../test:/test
network_mode: "host"


@@ -0,0 +1,8 @@
version: '3'
services:
redis-vector-db:
image: redis/redis-stack:latest
container_name: redis-vector-db
ports:
- "6379:6379"
- "8001:8001"


@@ -0,0 +1,21 @@
FROM python:3.11-slim
RUN pip install poetry==1.6.1
RUN poetry config virtualenvs.create false
WORKDIR /code
COPY ./pyproject.toml ./README.md ./poetry.lock* ./
COPY ./package[s] ./packages
RUN poetry install --no-interaction --no-ansi --no-root
COPY ./app ./app
RUN poetry install --no-interaction --no-ansi
EXPOSE 8080
CMD exec uvicorn app.server:app --host 0.0.0.0 --port 8080


@@ -0,0 +1,79 @@
# my-app
## Installation
Install the LangChain CLI if you haven't yet
```bash
pip install -U langchain-cli
```
## Adding packages
```bash
# adding packages from
# https://github.com/langchain-ai/langchain/tree/master/templates
langchain app add $PROJECT_NAME
# adding custom GitHub repo packages
langchain app add --repo $OWNER/$REPO
# or with whole git string (supports other git providers):
# langchain app add git+https://github.com/hwchase17/chain-of-verification
# with a custom api mount point (defaults to `/{package_name}`)
langchain app add $PROJECT_NAME --api_path=/my/custom/path/rag
```
Note: you remove packages by their api path
```bash
langchain app remove my/custom/path/rag
```
## Setup LangSmith (Optional)
LangSmith will help us trace, monitor and debug LangChain applications.
LangSmith is currently in private beta, you can sign up [here](https://smith.langchain.com/).
If you don't have access, you can skip this section
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
```
## Launch LangServe
```bash
langchain serve
```
## Running in Docker
This project folder includes a Dockerfile that allows you to easily build and host your LangServe app.
### Building the Image
To build the image, you simply:
```shell
docker build . -t my-langserve-app
```
If you tag your image with something other than `my-langserve-app`,
note it for use in the next step.
### Running the Image Locally
To run the image, you'll need to include any environment variables
necessary for your application.
In the below example, we inject the `OPENAI_API_KEY` environment
variable with the value set in my local environment
(`$OPENAI_API_KEY`)
We also expose port 8080 with the `-p 8080:8080` option.
```shell
docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -p 8080:8080 my-langserve-app
```


@@ -0,0 +1,51 @@
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# ========= Raw Q&A template prompt =========
template = """
Use the following pieces of context from retrieved
dataset to answer the question. Do not make up an answer if there is no
context provided to help answer it. Include the 'source' and 'start_index'
from the metadata included in the context you used to answer the question
Context:
---------
{context}
---------
Question: {question}
---------
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
# ========= contextualize prompt =========
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
[
("system", contextualize_q_system_prompt),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{question}"),
]
)
# ========= Q&A with history prompt =========
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\
{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
[
("system", qa_system_prompt),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{question}"),
]
)


@@ -0,0 +1,252 @@
import os
from fastapi import FastAPI, APIRouter, Request, UploadFile, File
from fastapi.responses import RedirectResponse, StreamingResponse, JSONResponse
from langserve import add_routes
from rag_redis.chain import chain as qna_rag_redis_chain
from starlette.middleware.cors import CORSMiddleware
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Redis
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from rag_redis.config import EMBED_MODEL, INDEX_NAME, REDIS_URL, INDEX_SCHEMA
from utils import (
create_retriever_from_files, reload_retriever, create_kb_folder,
get_current_beijing_time, create_retriever_from_links
)
from prompts import contextualize_q_prompt, qa_prompt
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"])
class RAGAPIRouter(APIRouter):
def __init__(self, upload_dir, entrypoint) -> None:
super().__init__()
self.upload_dir = upload_dir
self.entrypoint = entrypoint
print(f"[rag - router] Initializing API Router, params:\n \
upload_dir={upload_dir}, entrypoint={entrypoint}")
# Define LLM
self.llm = HuggingFaceEndpoint(
endpoint_url=entrypoint,
max_new_tokens=512,
top_k=10,
top_p=0.95,
typical_p=0.95,
temperature=0.01,
repetition_penalty=1.03,
streaming=True,
)
print("[rag - router] LLM initialized.")
# Define LLM Chain
self.embeddings = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL)
rds = Redis.from_existing_index(
self.embeddings,
index_name=INDEX_NAME,
redis_url=REDIS_URL,
schema=INDEX_SCHEMA,
)
retriever = rds.as_retriever(search_type="mmr")
# Define contextualize chain
self.contextualize_q_chain = contextualize_q_prompt | self.llm | StrOutputParser()
# Define LLM chain
self.llm_chain = (
RunnablePassthrough.assign(
context=self.contextualized_question | retriever
)
| qa_prompt
| self.llm
)
print("[rag - router] LLM chain initialized.")
# Define chat history
self.chat_history = []
def contextualized_question(self, input: dict):
if input.get("chat_history"):
return self.contextualize_q_chain
else:
return input["question"]
def handle_rag_chat(self, query: str):
response = self.llm_chain.invoke({"question": query, "chat_history": self.chat_history})
result = response.split("</s>")[0]
self.chat_history.extend([HumanMessage(content=query), response])
return result
upload_dir = os.getenv("RAG_UPLOAD_DIR", "./upload_dir")
tgi_endpoint = os.getenv("TGI_ENDPOINT", "http://localhost:8080")
router = RAGAPIRouter(upload_dir, tgi_endpoint)
@router.post("/v1/rag/chat")
async def rag_chat(request: Request):
params = await request.json()
print(f"[rag - chat] POST request: /v1/rag/chat, params:{params}")
query = params['query']
kb_id = params.get("knowledge_base_id", "default")
print(f"[rag - chat] history: {router.chat_history}")
if kb_id == "default":
print(f"[rag - chat] use default knowledge base")
retriever = reload_retriever(router.embeddings, INDEX_NAME)
router.llm_chain = (
RunnablePassthrough.assign(
context=router.contextualized_question | retriever
)
| qa_prompt
| router.llm
)
elif kb_id.startswith("kb"):
new_index_name = INDEX_NAME + kb_id
print(f"[rag - chat] use knowledge base {kb_id}, index name is {new_index_name}")
retriever = reload_retriever(router.embeddings, new_index_name)
router.llm_chain = (
RunnablePassthrough.assign(
context=router.contextualized_question | retriever
)
| qa_prompt
| router.llm
)
else:
return JSONResponse(status_code=400, content={"message":"Wrong knowledge base id."})
return router.handle_rag_chat(query=query)
@router.post("/v1/rag/chat_stream")
async def rag_chat_stream(request: Request):
params = await request.json()
print(f"[rag - chat_stream] POST request: /v1/rag/chat_stream, params:{params}")
query = params['query']
kb_id = params.get("knowledge_base_id", "default")
print(f"[rag - chat_stream] history: {router.chat_history}")
if kb_id == "default":
retriever = reload_retriever(router.embeddings, INDEX_NAME)
router.llm_chain = (
RunnablePassthrough.assign(
context=router.contextualized_question | retriever
)
| qa_prompt
| router.llm
)
elif kb_id.startswith("kb"):
new_index_name = INDEX_NAME + kb_id
retriever = reload_retriever(router.embeddings, new_index_name)
router.llm_chain = (
RunnablePassthrough.assign(
context=router.contextualized_question | retriever
)
| qa_prompt
| router.llm
)
else:
return JSONResponse(status_code=400, content={"message":"Wrong knowledge base id."})
def stream_generator():
for text in router.llm_chain.stream({"question": query, "chat_history": router.chat_history}):
# print(f"[rag - chat_stream] text: {text}")
if text == " ":
yield f"data: @#$\n\n"
continue
if text.isspace():
continue
if "\n" in text:
yield f"data: <br/>\n\n"
new_text = text.replace(" ", "@#$")
yield f"data: {new_text}\n\n"
yield f"data: [DONE]\n\n"
return StreamingResponse(stream_generator(), media_type="text/event-stream")
@router.post("/v1/rag/create")
async def rag_create(file: UploadFile = File(...)):
filename = file.filename
if '/' in filename:
filename = filename.split('/')[-1]
print(f"[rag - create] POST request: /v1/rag/create, filename:{filename}")
kb_id, user_upload_dir, user_persist_dir = create_kb_folder(router.upload_dir)
# save file to local path
cur_time = get_current_beijing_time()
save_file_name = str(user_upload_dir) + '/' + cur_time + '-' + filename
with open(save_file_name, 'wb') as fout:
content = await file.read()
fout.write(content)
print(f"[rag - create] file saved to local path: {save_file_name}")
# create new retriever
try:
# get retrieval instance and reload db with new knowledge base
print("[rag - create] starting to create local db...")
index_name = INDEX_NAME + kb_id
retriever = create_retriever_from_files(save_file_name, router.embeddings, index_name)
router.llm_chain = (
RunnablePassthrough.assign(
context=router.contextualized_question | retriever
)
| qa_prompt
| router.llm
)
print(f"[rag - create] kb created successfully")
except Exception as e:
print(f"[rag - create] create knowledge base failed! {e}")
return JSONResponse(status_code=500, content={"message":"Fail to create new knowledge base."})
return {"knowledge_base_id": kb_id}
@router.post("/v1/rag/upload_link")
async def rag_upload_link(request: Request):
params = await request.json()
link_list = params['link_list']
print(f"[rag - upload_link] POST request: /v1/rag/upload_link, link list:{link_list}")
kb_id, user_upload_dir, user_persist_dir = create_kb_folder(router.upload_dir)
# create new retriever
try:
print("[rag - upload_link] starting to create local db...")
index_name = INDEX_NAME + kb_id
retriever = create_retriever_from_links(router.embeddings, link_list, index_name)
router.llm_chain = (
RunnablePassthrough.assign(
context=router.contextualized_question | retriever
)
| qa_prompt
| router.llm
)
print(f"[rag - upload_link] kb created successfully")
except Exception as e:
print(f"[rag - upload_link] create knowledge base failed! {e}")
return JSONResponse(status_code=500, content={"message":"Fail to create new knowledge base."})
return {"knowledge_base_id": kb_id}
app.include_router(router)
@app.get("/")
async def redirect_root_to_docs():
return RedirectResponse("/docs")
add_routes(app, qna_rag_redis_chain, path="/rag-redis")
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)


@@ -0,0 +1,327 @@
import os
import re
import uuid
import requests
import unicodedata
import multiprocessing
from pathlib import Path
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urlunparse
from datetime import timedelta, timezone, datetime
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.vectorstores import Redis
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from rag_redis.config import INDEX_SCHEMA, REDIS_URL
def get_current_beijing_time():
SHA_TZ = timezone(
timedelta(hours=8),
name='Asia/Shanghai'
)
utc_now = datetime.utcnow().replace(tzinfo=timezone.utc)
beijing_time = utc_now.astimezone(SHA_TZ).strftime("%Y-%m-%d-%H:%M:%S")
return beijing_time
def create_kb_folder(upload_dir):
kb_id = f"kb_{str(uuid.uuid1())[:8]}"
path_prefix = upload_dir
    # create a local folder for retrieval
cur_path = Path(path_prefix) / kb_id
os.makedirs(path_prefix, exist_ok=True)
cur_path.mkdir(parents=True, exist_ok=True)
user_upload_dir = Path(path_prefix) / f"{kb_id}/upload_dir"
user_persist_dir = Path(path_prefix) / f"{kb_id}/persist_dir"
user_upload_dir.mkdir(parents=True, exist_ok=True)
user_persist_dir.mkdir(parents=True, exist_ok=True)
print(f"[rag - create kb folder] upload path: {user_upload_dir}, persist path: {user_persist_dir}")
return kb_id, str(user_upload_dir), str(user_persist_dir)
class Crawler:
def __init__(self, pool=None):
if pool:
assert isinstance(pool, (str, list, tuple)), 'url pool should be str, list or tuple'
self.pool = pool
self.headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng, \
*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, \
like Gecko) Chrome/113.0.0.0 Safari/537.36'
}
self.fetched_pool = set()
def get_sublinks(self, soup):
sublinks = []
for links in soup.find_all('a'):
sublinks.append(str(links.get('href')))
return sublinks
def get_hyperlink(self, soup, base_url):
sublinks = []
for links in soup.find_all('a'):
link = str(links.get('href'))
if link.startswith('#') or link is None or link == 'None':
continue
suffix = link.split('/')[-1]
if '.' in suffix and suffix.split('.')[-1] not in ['html', 'htmld']:
continue
link_parse = urlparse(link)
base_url_parse = urlparse(base_url)
if link_parse.path == '':
continue
if link_parse.netloc != '':
# keep crawler works in the same domain
if link_parse.netloc != base_url_parse.netloc:
continue
sublinks.append(link)
else:
sublinks.append(urlunparse((base_url_parse.scheme,
base_url_parse.netloc,
link_parse.path,
link_parse.params,
link_parse.query,
link_parse.fragment)))
return sublinks
def fetch(self, url, headers=None, max_times=5):
if not headers:
headers = self.headers
while max_times:
            if not url.startswith('http://') and not url.startswith('https://'):
url = 'http://' + url
print('start fetch %s...', url)
try:
response = requests.get(url, headers=headers, verify=True)
if response.status_code != 200:
print('fail to fetch %s, response status code: %s', url, response.status_code)
else:
return response
except Exception as e:
print('fail to fetch %s, caused by %s', url, e)
raise Exception(e)
max_times -= 1
return None
def process_work(self, sub_url, work):
response = self.fetch(sub_url)
if response is None:
return []
self.fetched_pool.add(sub_url)
soup = self.parse(response.text)
base_url = self.get_base_url(sub_url)
sublinks = self.get_hyperlink(soup, base_url)
if work:
work(sub_url, soup)
return sublinks
def crawl(self, pool, work=None, max_depth=10, workers=10):
url_pool = set()
for url in pool:
base_url = self.get_base_url(url)
response = self.fetch(url)
soup = self.parse(response.text)
sublinks = self.get_hyperlink(soup, base_url)
self.fetched_pool.add(url)
url_pool.update(sublinks)
depth = 0
while len(url_pool) > 0 and depth < max_depth:
print('current depth %s...', depth)
mp = multiprocessing.Pool(processes=workers)
results = []
for sub_url in url_pool:
if sub_url not in self.fetched_pool:
results.append(mp.apply_async(self.process_work, (sub_url, work)))
mp.close()
mp.join()
url_pool = set()
for result in results:
sublinks = result.get()
url_pool.update(sublinks)
depth += 1
def parse(self, html_doc):
soup = BeautifulSoup(html_doc, 'lxml')
return soup
def download(self, url, file_name):
print('download %s into %s...', url, file_name)
try:
r = requests.get(url, stream=True, headers=self.headers, verify=True)
f = open(file_name, "wb")
for chunk in r.iter_content(chunk_size=512):
if chunk:
f.write(chunk)
except Exception as e:
print('fail to download %s, caused by %s', url, e)
def get_base_url(self, url):
result = urlparse(url)
return urlunparse((result.scheme, result.netloc, '', '', '', ''))
def clean_text(self, text):
text = text.strip().replace('\r', '\n')
text = re.sub(' +', ' ', text)
text = re.sub('\n+', '\n', text)
text = text.split('\n')
return '\n'.join([i for i in text if i and i != ' '])
def uni_pro(text):
"""Check if the character is ASCII or falls in the category of non-spacing marks."""
normalized_text = unicodedata.normalize('NFKD', text)
filtered_text = ''
for char in normalized_text:
if ord(char) < 128 or unicodedata.category(char) == 'Mn':
filtered_text += char
return filtered_text
def load_html_data(url):
crawler = Crawler()
res = crawler.fetch(url)
if res == None:
return None
soup = crawler.parse(res.text)
all_text = crawler.clean_text(soup.select_one('body').text)
main_content = ''
for element_name in ['main', 'container']:
main_block = None
if soup.select(f'.{element_name}'):
main_block = soup.select(f'.{element_name}')
elif soup.select(f'#{element_name}'):
main_block = soup.select(f'#{element_name}')
if main_block:
for element in main_block:
text = crawler.clean_text(element.text)
if text not in main_content:
main_content += f'\n{text}'
main_content = crawler.clean_text(main_content)
main_content = main_content.replace('\n', '')
main_content = main_content.replace('\n\n', '')
main_content = uni_pro(main_content)
main_content = re.sub(r'\s+', ' ', main_content)
# {'text': all_text, 'main_content': main_content}
return main_content
def get_chuck_data(content, max_length, min_length, input):
"""Process the context to make it maintain a suitable length for the generation."""
sentences = re.split('(?<=[!.?])', content)
paragraphs = []
current_length = 0
count = 0
current_paragraph = ""
for sub_sen in sentences:
count +=1
sentence_length = len(sub_sen)
if current_length + sentence_length <= max_length:
current_paragraph += sub_sen
current_length += sentence_length
if count == len(sentences) and len(current_paragraph.strip())>min_length:
paragraphs.append([current_paragraph.strip() ,input])
else:
paragraphs.append([current_paragraph.strip() ,input])
current_paragraph = sub_sen
current_length = sentence_length
return paragraphs
def parse_html(input):
"""
Parse the uploaded file.
"""
chucks = []
for link in input:
if re.match(r'^https?:/{2}\w.+$', link):
content = load_html_data(link)
if content == None:
continue
chuck = [[content.strip(), link]]
chucks += chuck
else:
print("The given link/str {} cannot be parsed.".format(link))
return chucks
def document_transfer(data_collection):
"Transfer the raw document into langchain supported format."
documents = []
for data, meta in data_collection:
doc_id = str(uuid.uuid4())
metadata = {"source": meta, "identify_id":doc_id}
doc = Document(page_content=data, metadata=metadata)
documents.append(doc)
return documents
def create_retriever_from_files(doc, embeddings, index_name: str):
print(f"[rag - create retriever] create with index: {index_name}")
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1500, chunk_overlap=100, add_start_index=True
)
loader = UnstructuredFileLoader(doc, mode="single", strategy="fast")
chunks = loader.load_and_split(text_splitter)
rds = Redis.from_texts(
texts=[chunk.page_content for chunk in chunks],
metadatas=[chunk.metadata for chunk in chunks],
embedding=embeddings,
index_name=index_name,
redis_url=REDIS_URL,
index_schema=INDEX_SCHEMA,
)
retriever = rds.as_retriever(search_type="mmr")
return retriever
def create_retriever_from_links(embeddings, link_list: list, index_name):
data_collection = parse_html(link_list)
texts = []
metadatas = []
for data, meta in data_collection:
doc_id = str(uuid.uuid4())
metadata = {"source": meta, "identify_id":doc_id}
texts.append(data)
metadatas.append(metadata)
rds = Redis.from_texts(
texts=texts,
metadatas=metadatas,
embedding=embeddings,
index_name=index_name,
redis_url=REDIS_URL,
index_schema=INDEX_SCHEMA,
)
retriever = rds.as_retriever(search_type="mmr")
return retriever
def reload_retriever(embeddings, index_name):
print(f"[rag - reload retriever] reload with index: {index_name}")
rds = Redis.from_existing_index(
embeddings,
index_name=index_name,
redis_url=REDIS_URL,
schema=INDEX_SCHEMA,
)
retriever = rds.as_retriever(search_type="mmr")
return retriever


@@ -0,0 +1,23 @@
[tool.poetry]
name = "my-app"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
readme = "README.md"
packages = [
{ include = "app" },
]
[tool.poetry.dependencies]
python = "^3.11"
uvicorn = "^0.23.2"
langserve = {extras = ["server"], version = ">=0.0.30"}
pydantic = "<2"
[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2023 LangChain, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Binary file not shown.

Binary file not shown.


@@ -0,0 +1,82 @@
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Redis
from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL
from PIL import Image
import numpy as np
import io
def pdf_loader(file_path):
try:
import fitz # noqa:F401
import easyocr
except ImportError:
raise ImportError(
"`PyMuPDF` or 'easyocr' package is not found, please install it with "
"`pip install pymupdf or pip install easyocr.`"
)
doc = fitz.open(file_path)
reader = easyocr.Reader(['en'])
result =''
for i in range(doc.page_count):
page = doc.load_page(i)
pagetext = page.get_text().strip()
if pagetext:
result=result+pagetext
if len(doc.get_page_images(i)) > 0 :
for img in doc.get_page_images(i):
if img:
pageimg=''
xref = img[0]
img_data = doc.extract_image(xref)
img_bytes = img_data['image']
pil_image = Image.open(io.BytesIO(img_bytes))
img = np.array(pil_image)
img_result = reader.readtext(img, paragraph=True, detail=0)
pageimg=pageimg + ', '.join(img_result).strip()
if pageimg.endswith('!') or pageimg.endswith('?') or pageimg.endswith('.'):
pass
else:
pageimg=pageimg+'.'
result=result+pageimg
return result
def ingest_documents():
"""
Ingest PDF to Redis from the data/ directory that
contains Edgar 10k filings data for Nike.
"""
# Load list of pdfs
company_name = "Nike"
data_path = "data/"
doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0]
print("Parsing 10k filing doc for NIKE", doc_path) # noqa: T201
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1500, chunk_overlap=100, add_start_index=True
)
content = pdf_loader(doc_path)
chunks = text_splitter.split_text(content)
print("Done preprocessing. Created ", len(chunks), " chunks of the original pdf") # noqa: T201
# Create vectorstore
embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
_ = Redis.from_texts(
# appending this little bit can sometimes help with semantic retrieval
# especially with multiple companies
texts=[f"Company: {company_name}. " + chunk for chunk in chunks],
embedding=embedder,
index_name=INDEX_NAME,
index_schema=INDEX_SCHEMA,
redis_url=REDIS_URL,
)
if __name__ == "__main__":
ingest_documents()


@@ -0,0 +1,31 @@
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Redis
from langchain_community.document_loaders import TextLoader
from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL
loader = DirectoryLoader('/ws/txt_files', glob="**/*.txt", show_progress=True, use_multithreading=True, loader_cls=TextLoader)
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1500, chunk_overlap=100, add_start_index=True
)
chunks = loader.load_and_split(text_splitter)
print("Done preprocessing. Created", len(chunks), "chunks of the original data") # noqa: T201
# Create vectorstore
embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
company_name = "Intel"
_ = Redis.from_texts(
# appending this little bit can sometimes help with semantic retrieval
# especially with multiple companies
texts=[f"Company: {company_name}. " + chunk.page_content for chunk in chunks],
metadatas=[chunk.metadata for chunk in chunks],
embedding=embedder,
index_name=INDEX_NAME,
index_schema=INDEX_SCHEMA,
redis_url=REDIS_URL,
)


@@ -0,0 +1,82 @@
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Redis
from rag_redis.config import EMBED_MODEL, INDEX_NAME, INDEX_SCHEMA, REDIS_URL
from PIL import Image
import numpy as np
import io
def pdf_loader(file_path):
try:
import fitz # noqa:F401
import easyocr
except ImportError:
raise ImportError(
"`PyMuPDF` or 'easyocr' package is not found, please install it with "
"`pip install pymupdf or pip install easyocr.`"
)
doc = fitz.open(file_path)
reader = easyocr.Reader(['en'])
result =''
for i in range(doc.page_count):
page = doc.load_page(i)
pagetext = page.get_text().strip()
if pagetext:
result=result+pagetext
if len(doc.get_page_images(i)) > 0 :
for img in doc.get_page_images(i):
if img:
pageimg=''
xref = img[0]
img_data = doc.extract_image(xref)
img_bytes = img_data['image']
pil_image = Image.open(io.BytesIO(img_bytes))
img = np.array(pil_image)
img_result = reader.readtext(img, paragraph=True, detail=0)
pageimg=pageimg + ', '.join(img_result).strip()
if pageimg.endswith('!') or pageimg.endswith('?') or pageimg.endswith('.'):
pass
else:
pageimg=pageimg+'.'
result=result+pageimg
return result
def ingest_documents():
"""
Ingest PDF to Redis from the data/ directory that
contains Intel manuals.
"""
# Load list of pdfs
company_name = "Intel"
data_path = "data_intel/"
doc_path = [os.path.join(data_path, file) for file in os.listdir(data_path)][0]
print("Parsing Intel architecture manuals", doc_path) # noqa: T201
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1500, chunk_overlap=100, add_start_index=True
)
content = pdf_loader(doc_path)
chunks = text_splitter.split_text(content)
print("Done preprocessing. Created", len(chunks), "chunks of the original pdf") # noqa: T201
# Create vectorstore
embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
_ = Redis.from_texts(
# appending this little bit can sometimes help with semantic retrieval
# especially with multiple companies
texts=[f"Company: {company_name}. " + chunk for chunk in chunks],
embedding=embedder,
index_name=INDEX_NAME,
index_schema=INDEX_SCHEMA,
redis_url=REDIS_URL,
)
if __name__ == "__main__":
ingest_documents()


@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "681a5d1e",
"metadata": {},
"source": [
"## Connect to RAG App\n",
"\n",
"Assuming you are already running this server:\n",
"```bash\n",
"langserve start\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "d774be2a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Nike's revenue in 2023 was $51.2 billion. \n",
"\n",
"Source: 'data/nke-10k-2023.pdf', Start Index: '146100'\n"
]
}
],
"source": [
"from langserve.client import RemoteRunnable\n",
"\n",
"rag_redis = RemoteRunnable(\"http://localhost:8000/rag-redis\")\n",
"\n",
"print(rag_redis.invoke(\"What was Nike's revenue in 2023?\"))"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "07ae0005",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"As of May 31, 2023, Nike had approximately 83,700 employees worldwide. This information can be found in the first piece of context provided. (source: data/nke-10k-2023.pdf, start_index: 32532)\n"
]
}
],
"source": [
"print(rag_redis.invoke(\"How many employees work at Nike?\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a6b9f00",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,82 @@
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Redis
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_community.llms import HuggingFaceEndpoint
import intel_extension_for_pytorch as ipex
import torch
from rag_redis.config import (
EMBED_MODEL,
INDEX_NAME,
INDEX_SCHEMA,
REDIS_URL,
TGI_ENDPOINT,
)
# Make this look better in the docs.
class Question(BaseModel):
__root__: str
# Init Embeddings
embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
embedder.client= ipex.optimize(embedder.client.eval(), dtype=torch.bfloat16)
#Setup semantic cache for LLM
from langchain.cache import RedisSemanticCache
from langchain.globals import set_llm_cache
set_llm_cache(RedisSemanticCache(
embedding=embedder,
redis_url=REDIS_URL
))
# Connect to pre-loaded vectorstore
# run the ingest.py script to populate this
vectorstore = Redis.from_existing_index(
embedding=embedder, index_name=INDEX_NAME, schema=INDEX_SCHEMA, redis_url=REDIS_URL
)
# TODO allow user to change parameters
retriever = vectorstore.as_retriever(search_type="mmr")
# Define our prompt
template = """
Use the following pieces of context from retrieved
dataset to answer the question. Do not make up an answer if there is no
context provided to help answer it. Include the 'source' and 'start_index'
from the metadata included in the context you used to answer the question
Context:
---------
{context}
---------
Question: {question}
---------
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
# RAG Chain
model = HuggingFaceEndpoint(
endpoint_url=TGI_ENDPOINT,
max_new_tokens=512,
top_k=10,
top_p=0.95,
typical_p=0.95,
temperature=0.01,
repetition_penalty=1.03,
streaming=True,
truncate=1024
)
chain = (
RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
| prompt
| model
| StrOutputParser()
).with_types(input_type=Question)


@@ -0,0 +1,59 @@
from langchain_community.chat_models import ChatOpenAI
from langchain_community.llms import Ollama
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Redis
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_community.llms import HuggingFaceEndpoint
from langchain.callbacks import streaming_stdout
import intel_extension_for_pytorch as ipex
import torch
from rag_redis.config import (
EMBED_MODEL,
INDEX_NAME,
INDEX_SCHEMA,
REDIS_URL,
TGI_ENDPOINT_NO_RAG,
)
# Make this look better in the docs.
class Question(BaseModel):
__root__: str
# Define our prompt
template = """
Answer the question
---------
Question: {question}
---------
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
# RAG Chain
callbacks = [streaming_stdout.StreamingStdOutCallbackHandler()]
model = HuggingFaceEndpoint(
endpoint_url=TGI_ENDPOINT_NO_RAG,
max_new_tokens=512,
top_k=10,
top_p=0.95,
typical_p=0.95,
temperature=0.01,
repetition_penalty=1.03,
streaming=True,
truncate=1024
)
chain = (
RunnableParallel({"question": RunnablePassthrough()})
| prompt
| model
| StrOutputParser()
).with_types(input_type=Question)


@@ -0,0 +1,81 @@
import os
def get_boolean_env_var(var_name, default_value=False):
"""Retrieve the boolean value of an environment variable.
Args:
var_name (str): The name of the environment variable to retrieve.
default_value (bool): The default value to return if the variable
is not found.
Returns:
bool: The value of the environment variable, interpreted as a boolean.
"""
true_values = {"true", "1", "t", "y", "yes"}
false_values = {"false", "0", "f", "n", "no"}
# Retrieve the environment variable's value
value = os.getenv(var_name, "").lower()
# Decide the boolean value based on the content of the string
if value in true_values:
return True
elif value in false_values:
return False
else:
return default_value
# Check for openai API key
#if "OPENAI_API_KEY" not in os.environ:
# raise Exception("Must provide an OPENAI_API_KEY as an env var.")
# Whether or not to enable langchain debugging
DEBUG = get_boolean_env_var("DEBUG", False)
# Set DEBUG env var to "true" if you wish to enable LC debugging module
if DEBUG:
import langchain
langchain.debug = True
# Embedding model
EMBED_MODEL = os.getenv("EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
# Redis Connection Information
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", 6379))
def format_redis_conn_from_env():
redis_url = os.getenv("REDIS_URL", None)
if redis_url:
return redis_url
else:
using_ssl = get_boolean_env_var("REDIS_SSL", False)
start = "rediss://" if using_ssl else "redis://"
# if using RBAC
password = os.getenv("REDIS_PASSWORD", None)
username = os.getenv("REDIS_USERNAME", "default")
if password is not None:
start += f"{username}:{password}@"
return start + f"{REDIS_HOST}:{REDIS_PORT}"
REDIS_URL = format_redis_conn_from_env()
# Vector Index Configuration
INDEX_NAME = os.getenv("INDEX_NAME", "rag-redis")
current_file_path = os.path.abspath(__file__)
parent_dir = os.path.dirname(current_file_path)
REDIS_SCHEMA = os.getenv("REDIS_SCHEMA", "schema.yml")
schema_path = os.path.join(parent_dir, REDIS_SCHEMA)
INDEX_SCHEMA = schema_path
TGI_ENDPOINT = os.getenv("TGI_ENDPOINT", "http://localhost:8080")
TGI_ENDPOINT_NO_RAG = os.getenv("TGI_ENDPOINT_NO_RAG", "http://localhost:8081")


@@ -0,0 +1,11 @@
text:
- name: content
- name: source
numeric:
- name: start_index
vector:
- name: content_vector
algorithm: HNSW
datatype: FLOAT32
dims: 384
distance_metric: COSINE


@@ -0,0 +1,11 @@
text:
- name: content
- name: source
numeric:
- name: start_index
vector:
- name: content_vector
algorithm: HNSW
datatype: FLOAT32
dims: 1024
distance_metric: COSINE


@@ -0,0 +1,11 @@
text:
- name: content
- name: source
numeric:
- name: start_index
vector:
- name: content_vector
algorithm: HNSW
datatype: FLOAT32
dims: 768
distance_metric: COSINE


@@ -0,0 +1,15 @@
text:
- name: content
- name: changefreq
- name: description
- name: language
- name: loc
- name: priority
- name: source
- name: title
vector:
- name: content_vector
algorithm: HNSW
datatype: FLOAT32
dims: 768
distance_metric: COSINE


@@ -0,0 +1,18 @@
## Performance measurements of chain with LangSmith
Prerequisite: Sign up for LangSmith [https://www.langchain.com/langsmith] and get the API token <br />
### Steps to run perf measurements
1. Build the langchain-rag container with the most up-to-date Dockerfile
2. Start the TGI server on a system with Gaudi
3. Start the Redis container with docker-compose-redis.yml
4. Add your Hugging Face access token in docker-compose-langchain.yml and start the langchain-rag-server container
5. Enter the langchain-rag-server container and start the Jupyter notebook server (you can specify the needed IP address; Jupyter will run on port 8888):
```
docker exec -it langchain-rag-server bash
cd /test
jupyter notebook --allow-root --ip=X.X.X.X
```
6. Launch Jupyter notebook in your browser and open the tgi_gaudi.ipynb notebook
7. Add your LangSmith API key in the first cell of the notebook [os.environ["LANGCHAIN_API_KEY"] = "add-your-langsmith-key" # Your API key]; alternatively, export the variables before launching Jupyter as shown below
8. Clear all the cells and run all the cells
9. The output of the last cell, which calls client.run_on_dataset(), runs the LangChain Q&A test and captures measurements on the LangSmith server. The URL for accessing the test results can be obtained from the output of that command
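As an alternative to editing the notebook cell in step 7, the same settings can be exported in the container shell before launching Jupyter (variable names follow the LangSmith section earlier in this repository; values are placeholders):
```
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-langsmith-api-key>
export LANGCHAIN_PROJECT=<your-project>  # optional, defaults to "default"
```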

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1 @@
Will update soon.


@@ -0,0 +1,5 @@
#!/bin/bash
git clone https://github.com/huggingface/tgi-gaudi.git
cd ./tgi-gaudi/
docker build -t tgi_gaudi . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy


@@ -0,0 +1,36 @@
#!/bin/bash
# Set default values
default_port=8080
default_model="Intel/neural-chat-7b-v3-3"
default_num_cards=1
# Check if all required arguments are provided
if [ "$#" -lt 0 ] || [ "$#" -gt 3 ]; then
echo "Usage: $0 [num_cards] [port_number] [model_name]"
exit 1
fi
# Assign arguments to variables
num_cards=${1:-$default_num_cards}
port_number=${2:-$default_port}
model_name=${3:-$default_model}
# Check if num_cards is within the valid range (1-8)
if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then
echo "Error: num_cards must be between 1 and 8."
exit 1
fi
# Set the volume variable
volume=$PWD/data
# Build the Docker run command based on the number of cards
if [ "$num_cards" -eq 1 ]; then
docker_cmd="docker run -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy tgi_gaudi --model-id $model_name"
else
docker_cmd="docker run -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy tgi_gaudi --model-id $model_name --sharded true --num-shard $num_cards"
fi
# Execute the Docker run command
eval $docker_cmd


@@ -0,0 +1 @@
Will update soon.

ChatQnA/ui/.editorconfig Normal file

@@ -0,0 +1,10 @@
[*]
indent_style = tab
[package.json]
indent_style = space
indent_size = 2
[*.md]
indent_style = space
indent_size = 2

ChatQnA/ui/.env Normal file

@@ -0,0 +1 @@
DOC_BASE_URL = 'http://xxx.xxx.xxx.xxx:8000/v1/rag'

ChatQnA/ui/.eslintignore Normal file

@@ -0,0 +1,13 @@
.DS_Store
node_modules
/build
/.svelte-kit
/package
.env
.env.*
!.env.example
# Ignore files for PNPM, NPM and YARN
pnpm-lock.yaml
package-lock.json
yarn.lock

ChatQnA/ui/.eslintrc.cjs Normal file

@@ -0,0 +1,24 @@
module.exports = {
root: true,
parser: "@typescript-eslint/parser",
extends: [
"eslint:recommended",
"plugin:@typescript-eslint/recommended",
"prettier",
],
plugins: ["svelte3", "@typescript-eslint", "neverthrow"],
ignorePatterns: ["*.cjs"],
overrides: [{ files: ["*.svelte"], processor: "svelte3/svelte3" }],
settings: {
"svelte3/typescript": () => require("typescript"),
},
parserOptions: {
sourceType: "module",
ecmaVersion: 2020,
},
env: {
browser: true,
es2017: true,
node: true,
},
};


@@ -0,0 +1,13 @@
.DS_Store
node_modules
/build
/.svelte-kit
/package
.env
.env.*
!.env.example
# Ignore files for PNPM, NPM and YARN
pnpm-lock.yaml
package-lock.json
yarn.lock

ChatQnA/ui/.prettierrc Normal file

@@ -0,0 +1,13 @@
{
"pluginSearchDirs": [
"."
],
"overrides": [
{
"files": "*.svelte",
"options": {
"parser": "svelte"
}
}
]
}

ChatQnA/ui/README.md Normal file

@@ -0,0 +1,34 @@
<h1 align="center" id="title"> ChatQnA Customized UI</h1>
### 📸 Project Screenshots
![project-screenshot](https://i.imgur.com/26zMnEr.png)
![project-screenshot](https://i.imgur.com/fZbOiTk.png)
![project-screenshot](https://i.imgur.com/FnY3MuU.png)
<h2>🧐 Features</h2>
Here are some of the project's features:
- Start a Text Chat: Initiate a text chat by typing your input; the dialogue content can also be customized based on uploaded files.
- Upload File: Choose between uploading a local file or pasting a remote link, then chat against the uploaded knowledge base.
- Clear: Clear the record of the current dialog box without retaining the contents of the dialog box.
- Chat History: Historical chat records are retained after refreshing, making it easier for users to review the context.
- Scroll to Bottom / Top: The chat automatically scrolls to the bottom. Users can also click the top icon to jump to the top of the chat record.
- End-to-End Time: Shows the time spent on the current conversation.
<h2>🛠️ Get it Running:</h2>
1. Clone the repo.
2. `cd` into this folder.
3. Modify the required .env variables.
```
DOC_BASE_URL = ''
```
4. Execute `npm install` to install the corresponding dependencies.
5. Execute `npm run dev` in both environments (see the consolidated sketch below).
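Putting the steps above together, a typical setup could look like the following sketch; the repository path and the backend address are placeholders for your environment:
```
cd ChatQnA/ui
# Point the UI at your backend server (format follows the provided .env file)
echo "DOC_BASE_URL = 'http://<backend-host>:8000/v1/rag'" > .env
npm install
npm run dev
```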

ChatQnA/ui/package-lock.json generated Normal file

File diff suppressed because it is too large

ChatQnA/ui/package.json Normal file

@@ -0,0 +1,58 @@
{
"name": "sveltekit-auth-example",
"version": "0.0.1",
"private": true,
"scripts": {
"dev": "vite dev --port 80 --host 0.0.0.0",
"build": "vite build",
"preview": "vite preview",
"check": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json",
"check:watch": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json --watch",
"lint": "prettier --check . && eslint .",
"format": "prettier --write ."
},
"devDependencies": {
"@fortawesome/free-solid-svg-icons": "6.2.0",
"@sveltejs/adapter-auto": "1.0.0-next.75",
"@sveltejs/kit": "^1.20.1",
"@tailwindcss/typography": "0.5.7",
"@types/debug": "4.1.7",
"@typescript-eslint/eslint-plugin": "^5.27.0",
"@typescript-eslint/parser": "^5.27.0",
"autoprefixer": "^10.4.7",
"daisyui": "3.5.1",
"date-picker-svelte": "^2.6.0",
"debug": "4.3.4",
"eslint": "^8.16.0",
"eslint-config-prettier": "^8.3.0",
"eslint-plugin-neverthrow": "1.1.4",
"eslint-plugin-svelte3": "^4.0.0",
"flowbite-svelte": "^0.44.4",
"postcss": "^8.4.23",
"postcss-load-config": "^4.0.1",
"postcss-preset-env": "^8.3.2",
"prettier": "^2.8.8",
"prettier-plugin-svelte": "^2.7.0",
"prettier-plugin-tailwindcss": "^0.3.0",
"svelte": "^3.59.1",
"svelte-check": "^2.7.1",
"svelte-fa": "3.0.3",
"svelte-preprocess": "^4.10.7",
"tailwindcss": "^3.1.5",
"tslib": "^2.3.1",
"typescript": "^4.7.4",
"vite": "^4.3.9"
},
"type": "module",
"dependencies": {
"date-fns": "^2.30.0",
"driver.js": "^1.3.0",
"flowbite-svelte-icons": "^1.4.0",
"fuse.js": "^6.6.2",
"lodash": "^4.17.21",
"ramda": "^0.29.0",
"sse.js": "^0.6.1",
"svelte-notifications": "^0.9.98",
"svrollbar": "^0.12.0"
}
}


@@ -0,0 +1,13 @@
const tailwindcss = require('tailwindcss');
const autoprefixer = require('autoprefixer');
const config = {
plugins: [
//Some plugins, like tailwindcss/nesting, need to run before Tailwind,
tailwindcss(),
//But others, like autoprefixer, need to run after,
autoprefixer
]
};
module.exports = config;

5
ChatQnA/ui/src/app.d.ts vendored Normal file
View File

@@ -0,0 +1,5 @@
// See: https://kit.svelte.dev/docs/types#app
// import { Result} from "neverthrow";
interface Window {
deviceType: string;
}

14
ChatQnA/ui/src/app.html Normal file
View File

@@ -0,0 +1,14 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<link rel="icon" href="%sveltekit.assets%/favicon.png" />
<meta name="viewport" content="width=device-width" />
%sveltekit.head%
</head>
<body>
<div class="h-full w-full">
%sveltekit.body%
</div>
</body>
</html>

View File

@@ -0,0 +1,86 @@
/* Write your global styles here, in PostCSS syntax */
@tailwind base;
@tailwind components;
@tailwind utilities;
html, body {
height: 100%;
}
.btn {
@apply flex-nowrap;
}
a.btn {
@apply no-underline;
}
.input {
@apply text-base;
}
.bg-dark-blue {
background-color: #004a86;
}
.bg-light-blue {
background-color: #0068b5;
}
.bg-turquoise {
background-color: #00a3f6;
}
.bg-header {
background-color: #ffffff;
}
.bg-button {
background-color: #0068b5;
}
.bg-title {
background-color: #f7f7f7;
}
.text-header {
color: #0068b5;
}
.text-button {
color: #252e47;
}
.text-title-color {
color: rgb(38,38,38);
}
.font-intel {
font-family: "intel-clear","tahoma",Helvetica,"helvetica",Arial,sans-serif;
}
.font-title-intel {
font-family: "intel-one","intel-clear",Helvetica,Arial,sans-serif;
}
.bg-footer {
background-color: #e7e7e7;
}
.bg-light-green {
background-color: #d7f3a1;
}
.bg-purple {
background-color: #653171;
}
.bg-dark-blue {
background-color: #224678;
}
.border-input-color {
border-color: #605e5c;
}
.w-12\/12 {
width: 100%
}

View File

@@ -0,0 +1,14 @@
<script lang="ts">
import { createEventDispatcher } from "svelte";
let dispatch = createEventDispatcher();
</script>
<!-- svelte-ignore a11y-click-events-have-key-events -->
<svg
class="absolute top-0 right-0 hover:opacity-70"
on:click={() => {
dispatch('DeleteAvatar') }}
viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" width="20" height="20">
<path d="M512 832c-176.448 0-320-143.552-320-320S335.552 192 512 192s320 143.552 320 320-143.552 320-320 320m0-704C300.256 128 128 300.256 128 512s172.256 384 384 384 384-172.256 384-384S723.744 128 512 128" fill="#bbbbbb"></path><path d="M649.824 361.376a31.968 31.968 0 0 0-45.248 0L505.6 460.352l-98.976-98.976a31.968 31.968 0 1 0-45.248 45.248l98.976 98.976-98.976 98.976a32 32 0 0 0 45.248 45.248l98.976-98.976 98.976 98.976a31.904 31.904 0 0 0 45.248 0 31.968 31.968 0 0 0 0-45.248L550.848 505.6l98.976-98.976a31.968 31.968 0 0 0 0-45.248" fill="#bbbbbb"></path>
</svg>

View File

@@ -0,0 +1,28 @@
<!-- <svg
width="35"
height="35"
viewBox="0 0 48 48"
fill="none"
xmlns="http://www.w3.org/2000/svg"
>
<g clip-path="url(#clip0_16_93)">
<rect x="0.5" y="0.238312" width="47" height="47" fill="#0068B5" />
<path
d="M39.51 0.238312H8.49C4.0955 0.238312 0.5 3.83381 0.5 8.22831V39.2483C0.5 43.6428 4.0955 47.2383 8.49 47.2383H39.51C43.9045 47.2383 47.5 43.6428 47.5 39.2483V8.22831C47.5 3.83381 43.9045 0.238312 39.51 0.238312ZM44.915 39.2483C44.915 42.2328 42.4945 44.6533 39.51 44.6533H8.49C5.5055 44.6533 3.085 42.2328 3.085 39.2483V8.22831C3.085 5.24381 5.5055 2.82331 8.49 2.82331H39.51C42.4945 2.82331 44.915 5.24381 44.915 8.22831V39.2483Z"
fill="#0068B5"
/>
<path
d="M9.52393 21.3178H11.7094L11.7094 29.3548H9.52393V21.3178ZM20.3574 22.2108C20.1694 21.9523 19.8874 21.7408 19.4879 21.5763C19.1119 21.4118 18.6889 21.3178 18.2424 21.3178C17.2084 21.3178 16.3389 21.7643 15.6574 22.6338V21.4823H13.7304V29.3078H15.7984V25.7593C15.7984 24.8898 15.8454 24.2788 15.9629 23.9498C16.0569 23.6208 16.2684 23.3623 16.5504 23.1743C16.8324 22.9863 17.1614 22.8688 17.5139 22.8688C17.7959 22.8688 18.0309 22.9393 18.2424 23.0803C18.4304 23.2213 18.5949 23.4093 18.6654 23.6678C18.7594 23.9263 18.8064 24.4668 18.8064 25.3128V29.3078H20.8744V24.4433C20.8744 23.8323 20.8274 23.3858 20.7569 23.0568C20.6864 22.7513 20.5689 22.4693 20.3574 22.2108ZM25.7389 27.8038C25.5979 27.8038 25.4804 27.7803 25.3864 27.7098C25.2924 27.6393 25.2219 27.5453 25.1984 27.4513C25.1749 27.3573 25.1514 26.9813 25.1514 26.3233V23.1508H26.5614V21.5058H25.1514V18.7563L23.0834 19.9548V21.5058V23.1508V26.5583C23.0834 27.2868 23.1069 27.7803 23.1539 28.0153C23.2009 28.3443 23.2949 28.6263 23.4359 28.8143C23.5769 29.0023 23.7884 29.1668 24.0939 29.3078C24.3994 29.4253 24.7284 29.4958 25.1044 29.4958C25.7154 29.4958 26.2559 29.4018 26.7494 29.1903L26.5614 27.5923C26.2089 27.7333 25.9269 27.8038 25.7389 27.8038ZM33.7524 22.4928C33.0709 21.7173 32.1544 21.3413 31.0029 21.3413C29.9689 21.3413 29.0994 21.7173 28.4414 22.4458C27.7599 23.1743 27.4309 24.1848 27.4309 25.5008C27.4309 26.5818 27.6894 27.4748 28.2064 28.2033C28.8644 29.0963 29.8749 29.5428 31.2379 29.5428C32.1074 29.5428 32.8124 29.3548 33.3764 28.9553C33.9404 28.5558 34.3634 27.9918 34.6219 27.2163L32.5539 26.8638C32.4364 27.2633 32.2719 27.5453 32.0604 27.7098C31.8489 27.8743 31.5669 27.9683 31.2379 27.9683C30.7679 27.9683 30.3684 27.8038 30.0394 27.4513C29.7104 27.0988 29.5459 26.6288 29.5459 26.0178H34.7394C34.7394 24.4433 34.4339 23.2448 33.7524 22.4928ZM29.5694 24.7488C29.5694 24.1848 29.7104 23.7383 29.9924 23.4093C30.2979 23.0803 30.6504 22.9158 31.1204 22.9158C31.5434 22.9158 31.8959 23.0803 32.2014 23.3858C32.5069 23.6913 32.6479 24.1613 32.6714 24.7488H29.5694ZM36.4079 18.5448H38.4759V29.3548H36.4079V18.5448Z"
fill="white"
/>
<path
d="M9.52393 18.5448H11.7094L11.7094 20.5654H9.52393V18.5448ZM39.2058 53.1889C59.7131 70.5741 37.9465 53.1367 37.547 52.9722C60.5267 71.228 41.5876 53.1889 41.1411 53.1889C40.1071 53.1889 54.2638 57.2959 53.5823 58.1654L44.3775 54.0099L42.8 56.0803L44.9335 56.0763L43.617 55.1029L49.2888 57.4321C49.2888 56.5626 69.0838 68.5409 41.665 52.9722C67.9574 69.2353 48.7539 58.3534 49.0359 58.1654C49.3179 57.9774 72.2331 77.3305 48.0529 59.0448C73.8431 77.373 40.6532 52.2185 40.8647 52.3595C64.5928 69.3279 66.2469 69.734 44.0477 53.3531C68.4587 70.8049 45.1808 54.42 45.1808 55.266L49.6436 57.6191L50.8176 56.2254L46.645 54.7317C46.645 54.1207 47.0599 55.184 46.9894 54.855C46.9189 54.5495 63.0924 72.6928 39.2058 53.1889ZM45.3834 56.0442C45.2424 56.0442 60.49 64.1373 43.0764 53.1889C59.6606 67.1938 58.0346 62.1756 40.8647 50.7007C58.8678 64.6804 43.7296 53.3942 43.7296 52.7362L43.617 55.1029L43.3529 52.3595L44.7353 53.7418L43.0764 53.1889L44.244 54.855L46.1176 55.6771L42.8 57.336L45.5647 53.1889L41.9705 49.5948L46.1176 55.1029L46.3941 55.6771C46.3941 56.4056 44.3403 54.3363 44.3873 54.5713C65.2775 66.4664 68.0297 70.4029 45.348 56.6803C69.965 73.7705 43.9793 55.5361 44.2848 55.6771C44.5903 55.7946 60.4832 66.2088 41.9705 53.7418C42.5815 53.7418 44.8545 53.1837 45.348 52.9722L43.7511 52.3595C43.3986 52.5005 45.5714 56.0442 45.3834 56.0442ZM44.0342 56.5108C43.3527 55.7353 45.3338 56.783 44.1823 56.783C43.1483 56.783 44.9043 55.6048 44.2463 56.3333C43.5648 57.0618 43.7511 51.0435 43.7511 52.3595C43.7511 53.4405 43.6653 53.0133 44.1823 53.7418C44.8403 54.6348 41.7134 54.2598 43.0764 54.2598C43.9459 54.2598 43.4702 56.9103 44.0342 56.5108C44.5982 56.1113 44.1288 57.5428 44.3873 56.7673L43.7511 56.2254C55.3795 71.8986 44.3938 54.9384 44.1823 55.1029C43.9708 55.2674 44.0801 54.2598 43.7511 54.2598C56.2643 69.3767 58.4567 71.4935 44.1823 55.1029C57.894 68.7712 44.3873 57.3783 44.3873 56.7673L44.1823 56.945C44.1823 55.3705 44.7157 57.2628 44.0342 56.5108ZM44.3873 54.5713C44.3873 54.0073 43.7522 56.8398 44.0342 56.5108C44.3397 56.1818 43.495 56.2254 43.965 56.2254C44.388 56.2254 55.4258 75.7185 43.7511 56.2254C44.0566 56.5309 44.1588 56.1955 44.1823 56.783L44.3873 54.5713Z"
fill="#00C7FD"
/>
</g>
<defs>
<clipPath id="clip0_16_93">
<rect x="0.5" y="0.238312" width="47" height="47" fill="white" />
</clipPath>
</defs>
</svg> -->

View File

@@ -0,0 +1,52 @@
<script lang="ts">
export let overrideClasses = "";
const classes = overrideClasses ? overrideClasses : `w-5 h-5 text-gray-400`;
</script>
<!-- <svg
class={classes}
width="10"
height="10"
fill="none"
viewBox="0 0 18 18"
style="min-width: 18px; min-height: 18px;"
><g
><path
fill="#3369FF"
d="M15.71 8.019 3.835 1.368a1.125 1.125 0 0 0-1.61 1.36l2.04 5.71h5.298a.562.562 0 1 1 0 1.125H4.264l-2.04 5.71a1.128 1.128 0 0 0 1.058 1.506c.194 0 .384-.05.552-.146l11.877-6.65a1.125 1.125 0 0 0 0-1.964Z"
/></g
></svg
> -->
<!--
<svg
class={classes}
xmlns="http://www.w3.org/2000/svg"
fill="none"
viewBox="0 0 24 24"
stroke-width="1.5"
stroke="currentColor"
>
<path
stroke-linecap="round"
stroke-linejoin="round"
d="M6 12L3.269 3.126A59.768 59.768 0 0121.485 12 59.77 59.77 0 013.27 20.876L5.999 12zm0 0h7.5"
/>
</svg> -->
<svg
t="1708926517502"
class={classes}
viewBox="0 0 1024 1024"
version="1.1"
xmlns="http://www.w3.org/2000/svg"
p-id="4586"
id="mx_n_1708926517503"
width="200"
height="200"
><path
d="M0 1024l106.496-474.112 588.8-36.864-588.8-39.936-106.496-473.088 1024 512z"
p-id="4587"
fill="#0068b5"
/></svg
>

View File

@@ -0,0 +1,10 @@
<!-- <svg
viewBox="0 0 1024 1024"
version="1.1"
xmlns="http://www.w3.org/2000/svg"
width="32"
height="32"
>
<path d="M512 512c93.866667 0 170.666667-76.8 170.666667-170.666667 0-93.866667-76.8-170.666667-170.666667-170.666667C418.133333 170.666667 341.333333 247.466667 341.333333 341.333333 341.333333 435.2 418.133333 512 512 512zM512 597.333333c-115.2 0-341.333333 55.466667-341.333333 170.666667l0 85.333333 682.666667 0 0-85.333333C853.333333 652.8 627.2 597.333333 512 597.333333z" p-id="4050" fill="#ffffff"></path></svg> -->
<svg t="1708914168912" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="1581" width="200" height="200"><path d="M447.13 46.545h101.818v930.91H447.13V46.545z" fill="#0068b5" p-id="1582" data-spm-anchor-id="a313x.search_index.0.i0.12a13a81x9rPe6" class="selected"></path></svg>


View File

@@ -0,0 +1,87 @@
.driverjs-theme {
background: transparent;
color: #fff;
box-shadow: none;
padding: 0;
}
.driver-popover-arrow {
border: 10px solid transparent;
animation: blink 1s 3 steps(1);
}
@keyframes blink {
0% { opacity: 1; }
50% { opacity: 0.2; }
100% { opacity: 1; }
}
.driver-popover.driverjs-theme .driver-popover-arrow-side-left.driver-popover-arrow {
border-left-color: #174ed1;
}
.driver-popover.driverjs-theme .driver-popover-arrow-side-right.driver-popover-arrow {
border-right-color: #174ed1;
}
.driver-popover.driverjs-theme .driver-popover-arrow-side-top.driver-popover-arrow {
border-top-color: #174ed1;
}
.driver-popover.driverjs-theme .driver-popover-arrow-side-bottom.driver-popover-arrow {
border-bottom-color: #174ed1;
}
.driver-popover-footer {
background: transparent;
color: #fff;
}
.driver-popover-title {
border-top-left-radius: 5px;
border-top-right-radius: 5px;
}
.driver-popover-title, .driver-popover-description {
display: block;
padding: 15px 15px 7px 15px;
background: #174ed1;
border: none;
}
.driver-popover-close-btn {
color: #fff
}
.driver-popover-footer button:hover, .driver-popover-footer button:focus {
background: #174ed1;
color: #fff;
}
.driver-popover-description {
padding: 5px 15px;
border-bottom-left-radius: 5px;
border-bottom-right-radius: 5px;
}
.driver-popover-title[style*=block]+.driver-popover-description {
margin: 0;
}
.driver-popover-progress-text {
color: #fff;
}
.driver-popover-footer button {
background: #174ed1;
border: 2px #174ed1 dashed;
color: #fff;
border-radius: 50%;
text-shadow: none;
}
.driver-popover-close-btn:hover, .driver-popover-close-btn:focus {
color: #fff;
}
.driver-popover-navigation-btns button+button {
margin-left: 10px;
}

View File

@@ -0,0 +1,16 @@
<svg
class="h-4 w-4 text-white rtl:rotate-180 dark:text-white-800"
aria-hidden="true"
xmlns="http://www.w3.org/2000/svg"
fill="none"
viewBox="0 0 6 10"
>
<path
stroke="currentColor"
stroke-linecap="round"
stroke-linejoin="round"
stroke-width="2"
d="m1 9 4-4-4-4"
/>
</svg>


View File

@@ -0,0 +1,15 @@
<svg
class="h-4 w-4 text-white rtl:rotate-180 dark:text-white-800"
aria-hidden="true"
xmlns="http://www.w3.org/2000/svg"
fill="none"
viewBox="0 0 6 10"
>
<path
stroke="currentColor"
stroke-linecap="round"
stroke-linejoin="round"
stroke-width="2"
d="M5 1 1 5l4 4"
/>
</svg>


View File

@@ -0,0 +1 @@
<?xml version="1.0" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"><svg t="1699596229588" class="icon" viewBox="0 0 1024 1024" version="1.1" xmlns="http://www.w3.org/2000/svg" p-id="20460" xmlns:xlink="http://www.w3.org/1999/xlink" width="32" height="32"><path d="M576 128a96 96 0 0 1 96 96v128h-224a96 96 0 0 0-95.84 90.368L352 448v224H224a96 96 0 0 1-96-96V224a96 96 0 0 1 96-96h352z" fill="#CCD9FF" p-id="20461"></path><path d="M576 96a128 128 0 0 1 128 128v128h-64V224a64 64 0 0 0-59.2-63.84L576 160H224a64 64 0 0 0-64 64v352a64 64 0 0 0 64 64h128v64H224a128 128 0 0 1-128-128V224a128 128 0 0 1 128-128z" fill="#3671FD" p-id="20462"></path><path d="M800 320H448a128 128 0 0 0-128 128v352a128 128 0 0 0 128 128h352a128 128 0 0 0 128-128V448a128 128 0 0 0-128-128z m-352 64h352a64 64 0 0 1 64 64v352a64 64 0 0 1-64 64H448a64 64 0 0 1-64-64V448a64 64 0 0 1 64-64z" fill="#3671FD" p-id="20463"></path><path d="M128 736a32 32 0 0 1 32 32 96 96 0 0 0 90.368 95.84L256 864a32 32 0 0 1 0 64 160 160 0 0 1-160-160 32 32 0 0 1 32-32z" fill="#FE9C23" p-id="20464"></path></svg>


File diff suppressed because one or more lines are too long


View File

@@ -0,0 +1,52 @@
<script lang="ts">
import MessageAvatar from "$lib/modules/chat/MessageAvatar.svelte";
import type { Message } from "$lib/shared/constant/Interface";
import MessageTimer from "./MessageTimer.svelte";
import { createEventDispatcher } from "svelte";
let dispatch = createEventDispatcher();
export let msg: Message;
export let time: string = "";
console.log("msg", msg);
</script>
<div
class={msg.role === 0
? "flex w-full gap-3"
: "flex w-full items-center gap-3"}
>
<div
class={msg.role === 0
? "flex aspect-square w-[3px] items-center justify-center rounded bg-[#0597ff] max-sm:hidden"
: "flex aspect-square h-10 w-[3px] items-center justify-center rounded bg-[#000] max-sm:hidden"}
>
<MessageAvatar role={msg.role} />
</div>
<div class="group relative items-center">
<div>
<p
class=" max-w-[60vw] items-center whitespace-pre-line break-keep text-[0.8rem] leading-5 sm:max-w-[50rem]"
>
{@html msg.content}
</p>
</div>
</div>
</div>
{#if time}
<div>
<MessageTimer
{time}
on:handleTop={() => {
dispatch("scrollTop");
}}
/>
</div>
{/if}
<style>
.wrap-style {
word-wrap: break-word;
word-break: break-all;
}
</style>

View File

@@ -0,0 +1,14 @@
<script lang="ts">
import AssistantIcon from "$lib/assets/chat/svelte/Assistant.svelte";
import PersonOutlined from "$lib/assets/chat/svelte/PersonOutlined.svelte";
import { MessageRole } from "$lib/shared/constant/Interface";
export let role: MessageRole;
</script>
{#if role === MessageRole.User}
<PersonOutlined />
{:else}
<AssistantIcon />
{/if}

View File

@@ -0,0 +1,51 @@
<script lang="ts">
export let time: string;
import { createEventDispatcher } from "svelte";
let dispatch = createEventDispatcher();
</script>
<div class="ml-2 flex flex-col">
<div class="my-4 flex items-center justify-end gap-2 space-x-2">
<div class="ml-2 w-min cursor-pointer" data-state="closed">
<!-- svelte-ignore a11y-click-events-have-key-events -->
<svg
xmlns="http://www.w3.org/2000/svg"
xml:space="preserve"
viewBox="0 0 21.6 21.6"
width="24"
height="24"
class="w-5 fill-[#0597ff] hover:fill-[#0597ff]"
on:click={() => {
dispatch("handleTop");
}}
><path
d="M2.2 3.6V.8h17.2v2.8zm7.2 17.2V10.4L5.8 14l-1.9-1.9 6.9-6.9 6.9 6.9-1.9 1.9-3.6-3.6v10.4z"
/></svg
>
</div>
<div
class="inline-block w-0.5 self-stretch bg-gray-300 opacity-100 dark:opacity-50"
/>
<div class="w-min cursor-pointer" data-state="closed">
<svg
xmlns="http://www.w3.org/2000/svg"
xml:space="preserve"
viewBox="0 0 21.6 21.6"
width="24"
height="24"
class="w-5 fill-[#0597ff] hover:fill-[#0597ff]"
><path d="M12.3 17.1V7.6H7.6v2.8h1.9v6.7H6.4v2.7h8.8v-2.7z" /><circle
cx="10.8"
cy="3.6"
r="1.9"
/></svg
>
</div>
<div class="flex items-center space-x-1 text-base text-gray-800">
<strong>End to End Time: </strong>
<p>{time}s</p>
</div>
</div>
<div class="ml-2 flex flex-col" />
</div>

View File

@@ -0,0 +1,32 @@
<script lang="ts">
import { onMount } from "svelte";
import { page } from "$app/stores";
import { browser } from "$app/environment";
import { open } from "$lib/shared/stores/common/Store";
import Scrollbar from "$lib/shared/components/scrollbar/Scrollbar.svelte";
let root: HTMLElement
onMount(() => {
document.getElementsByTagName("body").item(0)!.removeAttribute("tabindex");
// root.style.height = document.documentElement.clientHeight + 'px'
});
if (browser) {
page.subscribe(() => {
// close side navigation when route changes
if (window.innerWidth > 768) {
$open = true;
}
});
}
</script>
<div bind:this={root} class='h-full overflow-hidden relative'>
<div class="h-full flex items-start">
<div class='relative flex flex-col h-full pl-0 w-full bg-white'>
<Scrollbar className="h-0 grow " classLayout="h-full" alwaysVisible={false}>
<slot />
</Scrollbar>
</div>
</div>
</div>

View File

@@ -0,0 +1,24 @@
import { env } from "$env/dynamic/public";
import { SSE } from "sse.js";
const DOC_BASE_URL = env.DOC_BASE_URL;
export async function fetchTextStream(
query: string,
knowledge_base_id: string,
) {
let payload = {};
let url = "";
payload = {
query: query,
knowledge_base_id: knowledge_base_id,
};
url = `${DOC_BASE_URL}/chat_stream`;
return new SSE(url, {
headers: { "Content-Type": "application/json" },
payload: JSON.stringify(payload),
});
}
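For reference, the request issued by `fetchTextStream` can be reproduced from the command line. The sketch below uses a hypothetical backend address and the default knowledge base id; the backend replies with a server-sent event stream terminated by a `[DONE]` message.

```bash
# Hypothetical backend address; DOC_BASE_URL must point at the chat backend
DOC_BASE_URL="http://your-backend-host:8000"

curl "${DOC_BASE_URL}/chat_stream" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "What is this document about?", "knowledge_base_id": "default"}'
```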

View File

@@ -0,0 +1,44 @@
import { env } from "$env/dynamic/public";
const DOC_BASE_URL = env.DOC_BASE_URL;
export async function fetchKnowledgeBaseId(file: Blob, fileName: string) {
const url = `${DOC_BASE_URL}/create`;
const formData = new FormData();
formData.append("file", file, fileName);
const init: RequestInit = {
method: "POST",
body: formData,
};
try {
const response = await fetch(url, init);
if (!response.ok) throw response.status;
return await response.json();
} catch (error) {
console.error("network error: ", error);
return undefined;
}
}
export async function fetchKnowledgeBaseIdByPaste(pasteUrlList: any, urlType: string | undefined) {
const url = `${DOC_BASE_URL}/upload_link`;
const data = {
link_list: pasteUrlList,
};
const init: RequestInit = {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(data),
};
try {
const response = await fetch(url, init);
if (!response.ok) throw response.status;
return await response.json();
} catch (error) {
console.error("network error: ", error);
return undefined;
}
}
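The two upload helpers above can likewise be exercised directly with curl. This is a sketch under the same hypothetical `DOC_BASE_URL`; the file name and link are illustrative.

```bash
DOC_BASE_URL="http://your-backend-host:8000"

# Upload a local file as multipart form data (mirrors fetchKnowledgeBaseId)
curl "${DOC_BASE_URL}/create" \
  -X POST \
  -F "file=@./example.pdf"

# Register one or more remote links (mirrors fetchKnowledgeBaseIdByPaste)
curl "${DOC_BASE_URL}/upload_link" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"link_list": ["https://example.com/page"]}'
```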

View File

@@ -0,0 +1,43 @@
export function scrollToBottom(scrollToDiv: HTMLElement) {
if (scrollToDiv) {
setTimeout(
() =>
scrollToDiv.scroll({
behavior: "auto",
top: scrollToDiv.scrollHeight,
}),
100
);
}
}
export function scrollToTop(scrollToDiv: HTMLElement) {
if (scrollToDiv) {
setTimeout(
() =>
scrollToDiv.scroll({
behavior: "auto",
top: 0,
}),
100
);
}
}
export function getCurrentTimeStamp() {
return Math.floor(new Date().getTime())
}
export function fromTimeStampToTime(timeStamp: number) {
return new Date(timeStamp * 1000).toTimeString().slice(0, 8)
}
export function formatTime(seconds: number) {
const hours = String(Math.floor(seconds / 3600)).padStart(2, '0');
const minutes = String(Math.floor((seconds % 3600) / 60)).padStart(2, '0');
const remainingSeconds = String(seconds % 60).padStart(2, '0');
return `${hours}:${minutes}:${remainingSeconds}`;
}

View File

@@ -0,0 +1,140 @@
<script lang="ts">
import Scrollbar from "$lib/shared/components/scrollbar/Scrollbar.svelte";
import ChatMessage from "$lib/modules/chat/ChatMessage.svelte";
import "driver.js/dist/driver.css";
import "$lib/assets/layout/css/driver.css";
import Previous from "$lib/assets/upload/previous.svelte";
import Next from "$lib/assets/upload/next.svelte";
import { scrollToBottom } from "$lib/shared/Utils";
import { onMount } from "svelte";
let scrollToDiv: HTMLDivElement;
export let items;
export let label: string;
export let scrollName: string;
onMount(async () => {
scrollToDiv = document
.querySelector(scrollName)
?.querySelector(".svlr-viewport")!;
console.log(
"scrollToDiv",
scrollName,
document,
document.querySelector("chat-scrollbar1")
);
});
// gallery
let currentIndex = 0;
function nextItem() {
currentIndex = (currentIndex + 1) % items.length;
console.log("nextItem", currentIndex);
}
function prevItem() {
currentIndex = (currentIndex - 1 + items.length) % items.length;
console.log("prevItem", currentIndex);
}
$: currentItem = items[currentIndex];
$: {
if (items) {
scrollToBottom(scrollToDiv);
}
}
// gallery
</script>
<div
id="custom-controls-gallery"
class="relative mb-8 h-0 w-full w-full grow px-2 {scrollName}"
data-carousel="slide"
>
<!-- Carousel wrapper -->
<!-- Display current item -->
{#if currentItem}
<Scrollbar
classLayout="flex flex-col gap-5"
className=" h-0 w-full grow px-2 mt-3 ml-10"
>
{#each currentItem.content as message, i}
<ChatMessage msg={message} />
{/each}
</Scrollbar>
<!-- Loading text -->
{/if}
<div class="radius absolute left-0 p-2">
<!-- Display end to end time -->
<label for="" class="mr-2 text-xs font-bold text-blue-700">{label} </label>
</div>
{#if currentItem.time !== "0s"}
<div class="radius absolute right-0 p-2">
<!-- Display end to end time -->
<label for="" class="mr-2 text-xs font-bold text-blue-700"
>End to End Time:
</label>
<label for="" class="text-xs">{currentItem.time}</label>
</div>
{/if}
<div class="flex items-center justify-between">
<div class="justify-left ml-2 flex items-center">
<!-- Previous button -->
<button
type="button"
class="group absolute start-0 top-0 z-30 flex h-full
cursor-pointer items-center justify-center
focus:outline-none"
on:click={prevItem}
>
<span
class="group-focus:ring-gray dark:group-hover:bg-[#000]-800/60 dark:group-focus:ring-[#000]-800/70 inline-flex h-7
w-7 items-center justify-center
rounded-full bg-[#000]/10
group-hover:bg-[#000]/50 group-focus:bg-[#000]/50
group-focus:outline-none
group-focus:ring-4 dark:bg-gray-800/30"
>
<Previous />
<span class="sr-only">Previous</span>
</span>
</button>
<!-- Next button -->
<button
type="button"
class="group absolute end-0 top-0 z-30 flex h-full cursor-pointer items-center justify-center focus:outline-none"
on:click={nextItem}
>
<span
class="group-focus:ring-gray dark:group-hover:bg-[#000]-800/60 dark:group-focus:ring-[#000]-800/70 inline-flex h-7
w-7 items-center justify-center
rounded-full bg-[#000]/10
group-hover:bg-[#000]/50 group-focus:bg-[#000]/50
group-focus:outline-none
group-focus:ring-4 dark:bg-gray-800/30"
>
<Next />
<span class="sr-only">Next</span>
</span>
</button>
</div>
</div>
</div>
<style>
.row::-webkit-scrollbar {
display: none;
}
.row {
scrollbar-width: none;
}
.row {
-ms-overflow-style: none;
}
</style>

View File

@@ -0,0 +1,32 @@
<div
class="mb-6 flex items-center justify-center self-center bg-black text-sm text-gray-500"
/>
<div class="flex items-center justify-center gap-3">
<div class="relative inline-flex">
<div class="h-2 w-2 rounded-full bg-blue-600" />
<div
class="absolute left-0 top-0 h-2 w-2 animate-[ping_1s_infinite_100ms] rounded-full bg-blue-600"
/>
<div
class="duration-800 absolute left-0 top-0 h-2 w-2 animate-pulse rounded-full bg-blue-600"
/>
</div>
<div class="relative inline-flex">
<div class="h-2 w-2 rounded-full bg-blue-600" />
<div
class="absolute left-0 top-0 h-2 w-2 animate-[ping_1s_infinite_300ms] rounded-full bg-blue-600"
/>
<div
class="absolute left-0 top-0 h-2 w-2 animate-pulse rounded-full bg-blue-600"
/>
</div>
<div class="relative inline-flex">
<div class="h-2 w-2 rounded-full bg-blue-600" />
<div
class="absolute left-0 top-0 h-2 w-2 animate-[ping_1s_infinite_500ms] rounded-full bg-blue-600"
/>
<div
class="absolute left-0 top-0 h-2 w-2 animate-pulse rounded-full bg-blue-600"
/>
</div>
</div>

View File

@@ -0,0 +1,32 @@
<script lang="ts">
import { Svroller } from "svrollbar";
export let className: string = "";
export let classLayout: string = "";
export let alwaysVisible = true;
</script>
<div class={className}>
<Svroller height="100%" width="100%" {alwaysVisible}>
<div class={classLayout}>
<slot></slot>
</div>
</Svroller>
</div>
<style>
:global(.svlr-contents) {
height: 100%;
}
.row::-webkit-scrollbar {
display: none;
}
.row {
scrollbar-width: none;
}
.row {
-ms-overflow-style: none;
}
</style>

View File

@@ -0,0 +1,33 @@
<script lang="ts">
import { Button, Helper, Input, Label, Modal } from "flowbite-svelte";
import { createEventDispatcher } from "svelte";
const dispatch = createEventDispatcher();
let formModal = false;
let urlValue = "";
function handelPasteURL() {
const pasteUrlList = urlValue.split(";").map((url) => url.trim());
dispatch("paste", { pasteUrlList });
formModal = false;
}
</script>
<Label class="space-y-1">
<div class="grid grid-cols-3">
<Input
class="col-span-2 rounded-none rounded-l-lg focus:border-blue-700 focus:ring-blue-700"
type="text"
name="text"
placeholder="URL"
bind:value={urlValue}
/>
<Button
type="submit"
class="w-full rounded-none rounded-r-lg bg-blue-700"
on:click={() => handelPasteURL()}>Confirm</Button
>
</div>
<Helper>Use semicolons (;) to separate multiple URLs.</Helper>
</Label>

View File

@@ -0,0 +1,32 @@
<script lang="ts">
import { Fileupload, Label } from "flowbite-svelte";
import { createEventDispatcher } from "svelte";
const dispatch = createEventDispatcher();
let value;
function handleInput(event: Event) {
const file = (event.target as HTMLInputElement).files![0];
if (!file) return;
const reader = new FileReader();
reader.onloadend = () => {
if (!reader.result) return;
const src = reader.result.toString();
dispatch("upload", { src: src, fileName: file.name });
};
reader.readAsDataURL(file);
}
</script>
<div>
<Label class="space-y-2 mb-2">
<Fileupload
bind:value
on:change={handleInput}
class="focus:border-blue-700 foucs:ring-0"
/>
</Label>
</div>

View File

@@ -0,0 +1,151 @@
<script lang="ts">
import { Drawer, Button, CloseButton, Tabs, TabItem } from "flowbite-svelte";
import { InfoCircleSolid } from "flowbite-svelte-icons";
import { sineIn } from "svelte/easing";
import UploadFile from "./upload-knowledge.svelte";
import PasteURL from "./PasteKnowledge.svelte";
import {
knowledge1,
knowledgeName,
} from "$lib/shared/stores/common/Store";
import DeleteIcon from "$lib/assets/avatar/svelte/Delete.svelte";
import { getNotificationsContext } from "svelte-notifications";
import {
fetchKnowledgeBaseId,
fetchKnowledgeBaseIdByPaste,
} from "$lib/network/upload/Network";
const { addNotification } = getNotificationsContext();
console.log("allKnowledges", $knowledgeName);
let hidden6 = true;
let selectKnowledge = -1;
let transitionParamsRight = {
x: 320,
duration: 200,
easing: sineIn,
};
async function handleKnowledgePaste(
e: CustomEvent<{ pasteUrlList: string[] }>
) {
let knowledge_id = "";
// let knowledge_id2 = "";
try {
const pasteUrlList = e.detail.pasteUrlList;
const res = await fetchKnowledgeBaseIdByPaste(pasteUrlList, "url1");
// sihan
knowledge_id = res.knowledge_base_id ? res.knowledge_base_id : "default";
} catch {
knowledge_id = "default";
}
knowledge1.set({ id: knowledge_id });
knowledgeName.set('knowledge_base');
addNotification({
text: "Uploaded successfully",
position: "top-left",
type: "success",
removeAfter: 3000,
});
}
async function handleKnowledgeUpload(e: CustomEvent<any>) {
let knowledge_id = "";
// let knowledge_id2 = "";
try {
const blob = await fetch(e.detail.src).then((r) => r.blob());
const fileName = e.detail.fileName;
// letong
const res = await fetchKnowledgeBaseId(blob, fileName);
// sihan
knowledge_id = res.knowledge_base_id ? res.knowledge_base_id : "default";
// knowledge_id2 = res2.knowledge_base_id ? res2.knowledge_base_id : "default";
console.log("knowledge_id", knowledge_id);
} catch {
knowledge_id = "default";
// knowledge_id2 = "default";
}
knowledge1.set({ id: knowledge_id });
knowledgeName.set(e.detail.fileName);
addNotification({
text: "Uploaded successfully",
position: "top-left",
type: "success",
removeAfter: 3000,
});
}
function handleKnowledgeDelete() {
knowledge1.set({ id: "default" });
knowledgeName.set("");
}
</script>
<div class="text-center">
<Button
on:click={() => (hidden6 = false)}
class="bg-transparent focus-within:ring-gray-300 hover:bg-transparent focus:ring-0"
>
<svg
aria-hidden="true"
class="h-7 w-7 text-blue-700"
fill="none"
stroke="currentColor"
viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"
><path
stroke-linecap="round"
stroke-linejoin="round"
stroke-width="2"
d="M7 16a4 4 0 01-.88-7.903A5 5 0 1115.9 6L16 6a5 5 0 011 9.9M15 13l-3-3m0 0l-3 3m3-3v12"
/></svg
>
</Button>
</div>
<Drawer
backdrop={false}
placement="right"
transitionType="fly"
transitionParams={transitionParamsRight}
bind:hidden={hidden6}
class=" shadow border-2 border-r-0 border-b-0"
id="sidebar6"
>
<div class="flex items-center">
<h5
id="drawer-label"
class="mb-4 inline-flex items-center text-base font-semibold text-gray-500 dark:text-gray-400"
>
<InfoCircleSolid class="me-2.5 h-4 w-4" />Data Source
</h5>
<CloseButton
on:click={() => (hidden6 = true)}
class="mb-4 dark:text-white"
/>
</div>
<p class="mb-6 text-sm text-gray-500 dark:text-gray-400">
Please upload your local file or paste a remote file link, and Chat will
respond based on the content of the uploaded file.
</p>
<Tabs
style="full"
defaultClass="flex rounded-lg divide-x rtl:divide-x-reverse divide-gray-200 shadow dark:divide-gray-700 foucs:ring-0"
>
<TabItem class="w-full" open>
<span slot="title">Upload File</span>
<UploadFile on:upload={handleKnowledgeUpload} />
</TabItem>
<TabItem class="w-full">
<span slot="title">Paste Link</span>
<PasteURL on:paste={handleKnowledgePaste} />
</TabItem>
</Tabs>
{#if ($knowledgeName) && ($knowledgeName !== "")}
<div class="relative">
<p class="border-b p-6 pb-2">{$knowledgeName}</p>
<DeleteIcon on:DeleteAvatar={() => handleKnowledgeDelete()} />
</div>
{/if}
</Drawer>

View File

@@ -0,0 +1,25 @@
export enum MessageRole {
Assistant, User
}
export enum MessageType {
Text, SingleAudio, AudioList, SingleImage, ImageList, singleVideo
}
type Map<T> = T extends MessageType.Text | MessageType.SingleAudio ? string :
T extends MessageType.AudioList ? string[] :
T extends MessageType.SingleImage ? { imgSrc: string; imgId: string; } :
{ imgSrc: string; imgId: string; }[];
export interface Message {
role: MessageRole,
type: MessageType,
content: Map<Message['type']>,
time: number,
}
export enum LOCAL_STORAGE_KEY {
STORAGE_CHAT_KEY = 'chatMessages',
STORAGE_TIME_KEY = 'initTime',
}

View File

@@ -0,0 +1,25 @@
import { writable } from "svelte/store";
export let open = writable(true);
export let knowledgeAccess = writable(true);
export let showTemplate = writable(false);
export let showSidePage = writable(false);
export let droppedObj = writable({});
export let isLoading = writable(false);
export let newUploadNum = writable(0);
export let ifStoreMsg = writable(true);
export const resetControl = writable(false);
export const knowledge1 = writable<{
id: string;
}>();
export const knowledgeName = writable("");

View File

@@ -0,0 +1,32 @@
<script>
import "tailwindcss/tailwind.css";
import "../app.postcss";
import Notifications from "svelte-notifications";
import Layout from "$lib/modules/frame/Layout.svelte";
import { onMount } from "svelte";
onMount(() => {
window.deviceType = window.innerWidth > 640 ? "pc" : "mobile";
window.onresize = () => {
window.deviceType = window.innerWidth > 640 ? "pc" : "mobile";
};
window.addEventListener("load", function () {
setTimeout(function () {
// This hides the address bar:
window.scrollTo(0, 1);
}, 0);
});
});
</script>
<Notifications>
<Layout>
<div class="flex h-full flex-col">
<div class="h-0 grow bg-white lg:rounded-tl-3xl">
<slot />
</div>
</div>
</Layout>
</Notifications>

View File

@@ -0,0 +1,249 @@
<script lang="ts">
export let data;
import { ifStoreMsg, knowledge1 } from "$lib/shared/stores/common/Store";
import { onMount } from "svelte";
import {
LOCAL_STORAGE_KEY,
MessageRole,
MessageType,
type Message,
} from "$lib/shared/constant/Interface";
import {
fromTimeStampToTime,
getCurrentTimeStamp,
scrollToBottom,
scrollToTop,
} from "$lib/shared/Utils";
import { fetchTextStream } from "$lib/network/chat/Network";
import LoadingAnimation from "$lib/shared/components/loading/Loading.svelte";
import { browser } from "$app/environment";
import "driver.js/dist/driver.css";
import "$lib/assets/layout/css/driver.css";
import UploadFile from "$lib/shared/components/upload/uploadFile.svelte";
import PaperAirplane from "$lib/assets/chat/svelte/PaperAirplane.svelte";
import Gallery from "$lib/shared/components/chat/gallery.svelte";
import Scrollbar from "$lib/shared/components/scrollbar/Scrollbar.svelte";
import ChatMessage from "$lib/modules/chat/ChatMessage.svelte";
let query: string = "";
let loading: boolean = false;
let scrollToDiv: HTMLDivElement;
// ·········
let chatMessages: Message[] = data.chatMsg ? data.chatMsg : [];
console.log("chatMessages", chatMessages);
// ··············
$: knowledge_1 = $knowledge1?.id ? $knowledge1.id : "default";
onMount(async () => {
scrollToDiv = document
.querySelector(".chat-scrollbar")
?.querySelector(".svlr-viewport")!;
});
function handleTop() {
console.log("top");
scrollToTop(scrollToDiv);
}
function storeMessages() {
console.log('localStorage', chatMessages);
localStorage.setItem(
LOCAL_STORAGE_KEY.STORAGE_CHAT_KEY,
JSON.stringify(chatMessages)
);
}
const callTextStream = async (query: string) => {
const eventSource = await fetchTextStream(query, knowledge_1);
eventSource.addEventListener("message", (e: any) => {
let currentMsg = e.data;
currentMsg = currentMsg.replace("@#$", " ")
console.log("currentMsg", currentMsg);
if (currentMsg == "[DONE]") {
console.log("done getCurrentTimeStamp", getCurrentTimeStamp);
let startTime = chatMessages[chatMessages.length - 1].time;
loading = false;
let totalTime = parseFloat(((getCurrentTimeStamp() - startTime) / 1000).toFixed(2));
console.log("done totalTime", totalTime);
console.log(
"chatMessages[chatMessages.length - 1]",
chatMessages[chatMessages.length - 1]
);
if (chatMessages.length - 1 !== -1) {
chatMessages[chatMessages.length - 1].time = totalTime;
}
console.log("done chatMessages", chatMessages);
storeMessages();
} else {
if (chatMessages[chatMessages.length - 1].role == MessageRole.User) {
console.log("?", getCurrentTimeStamp());
chatMessages = [
...chatMessages,
{
role: MessageRole.Assistant,
type: MessageType.Text,
content: currentMsg,
time: getCurrentTimeStamp(),
},
];
console.log("? chatMessages", chatMessages);
} else {
let content = chatMessages[chatMessages.length - 1].content as string;
chatMessages[chatMessages.length - 1].content =
content + currentMsg;
}
scrollToBottom(scrollToDiv);
}
});
eventSource.stream();
};
const handleTextSubmit = async () => {
console.log("handleTextSubmit");
loading = true;
const newMessage = {
role: MessageRole.User,
type: MessageType.Text,
content: query,
time: 0,
};
chatMessages = [...chatMessages, newMessage];
scrollToBottom(scrollToDiv);
storeMessages();
query = "";
await callTextStream(newMessage.content);
scrollToBottom(scrollToDiv);
storeMessages();
};
function handelClearHistory() {
localStorage.removeItem(LOCAL_STORAGE_KEY.STORAGE_CHAT_KEY);
chatMessages = [];
}
function isEmptyObject(obj: any): boolean {
for (let key in obj) {
if (obj.hasOwnProperty(key)) {
return false;
}
}
return true;
}
</script>
<!-- <DropZone on:drop={handleImageSubmit}> -->
<div
class="h-full items-center gap-5 bg-white sm:flex sm:pb-2 lg:rounded-tl-3xl"
>
<div class="mx-auto flex h-full w-full flex-col sm:mt-0 sm:w-[72%]">
<div class="flex justify-between p-2">
<p class="text-[1.7rem] font-bold tracking-tight">ChatQnA</p>
<UploadFile />
</div>
<div
class="fixed relative flex w-full flex-col items-center justify-between bg-white p-2 pb-0"
>
<div class="relative my-4 flex w-full flex-row justify-center">
<div class="foucs:border-none relative w-full">
<input
class="text-md block w-full border-0 border-b-2 border-gray-300 px-1 py-4
text-gray-900 focus:border-gray-300 focus:ring-0 dark:border-gray-600 dark:bg-gray-700 dark:text-white dark:placeholder-gray-400 dark:focus:border-blue-500 dark:focus:ring-blue-500"
type="text"
placeholder="Enter prompt here"
disabled={loading}
maxlength="1200"
bind:value={query}
on:keydown={(event) => {
if (event.key === "Enter" && !event.shiftKey && query) {
event.preventDefault();
handleTextSubmit();
}
}}
/>
<button
on:click={() => {
if (query) {
handleTextSubmit();
}
}}
type="submit"
class="absolute bottom-2.5 end-2.5 px-4 py-2 text-sm font-medium text-white dark:bg-blue-600 dark:hover:bg-blue-700 dark:focus:ring-blue-800"
><PaperAirplane /></button
>
</div>
</div>
</div>
<!-- clear -->
{#if Array.isArray(chatMessages) && chatMessages.length > 0 && !loading}
<div class="flex w-full justify-between pr-5">
<div class="flex items-center">
<button
class="bg-primary text-primary-foreground hover:bg-primary/90 group flex items-center justify-center space-x-2 p-2"
type="button"
on:click={() => handelClearHistory()}
><svg
xmlns="http://www.w3.org/2000/svg"
viewBox="0 0 20 20"
width="24"
height="24"
class="fill-[#0597ff] group-hover:fill-[#0597ff]"
><path
d="M12.6 12 10 9.4 7.4 12 6 10.6 8.6 8 6 5.4 7.4 4 10 6.6 12.6 4 14 5.4 11.4 8l2.6 2.6zm7.4 8V2q0-.824-.587-1.412A1.93 1.93 0 0 0 18 0H2Q1.176 0 .588.588A1.93 1.93 0 0 0 0 2v12q0 .825.588 1.412Q1.175 16 2 16h14zm-3.15-6H2V2h16v13.125z"
/></svg
><span class="font-medium text-[#0597ff]">CLEAR</span></button
>
</div>
</div>
{/if}
<!-- clear -->
<div class="mx-auto flex h-full w-full flex-col">
<Scrollbar
classLayout="flex flex-col gap-1 mr-4"
className="chat-scrollbar h-0 w-full grow px-2 pt-2 mt-3 mr-5"
>
{#each chatMessages as message, i}
<ChatMessage
on:scrollTop={() => handleTop()}
msg={message}
time={i === 0 || (message.time > 0 && message.time < 100)
? message.time
: ""}
/>
{/each}
</Scrollbar>
<!-- Loading text -->
{#if loading}
<LoadingAnimation />
{/if}
</div>
<!-- gallery -->
</div>
</div>
<style>
.row::-webkit-scrollbar {
display: none;
}
.row {
scrollbar-width: none;
}
.row {
-ms-overflow-style: none;
}
</style>

View File

@@ -0,0 +1,12 @@
import { browser } from '$app/environment';
import { LOCAL_STORAGE_KEY } from '$lib/shared/constant/Interface';
export const load = async () => {
if (browser) {
const chat = localStorage.getItem(LOCAL_STORAGE_KEY.STORAGE_CHAT_KEY);
return {
chatMsg: JSON.parse(chat || '[]')
}
}
};

Binary file not shown.


View File

@@ -0,0 +1,25 @@
import adapter from '@sveltejs/adapter-auto';
import preprocess from 'svelte-preprocess';
import postcssPresetEnv from 'postcss-preset-env';
/** @type {import('@sveltejs/kit').Config} */
const config = {
// Consult https://github.com/sveltejs/svelte-preprocess
// for more information about preprocessors
preprocess: preprocess({
sourceMap: true,
postcss: {
plugins: [postcssPresetEnv({ features: { 'nesting-rules': true } })]
}
}),
kit: {
adapter: adapter(),
env: {
publicPrefix: ''
}
}
};
export default config;

View File

@@ -0,0 +1,30 @@
const config = {
content: ["./src/**/*.{html,js,svelte,ts}",
"./node_modules/flowbite-svelte/**/*.{html,js,svelte,ts}",],
plugins: [require('flowbite/plugin')],
darkMode: 'class',
theme: {
extend: {
colors: {
// flowbite-svelte
primary: {
50: '#FFF5F2',
100: '#FFF1EE',
200: '#FFE4DE',
300: '#FFD5CC',
400: '#FFBCAD',
500: '#FE795D',
600: '#EF562F',
700: '#EB4F27',
800: '#CC4522',
900: '#A5371B'
}
}
}
}
};
module.exports = config;

17
ChatQnA/ui/tsconfig.json Normal file
View File

@@ -0,0 +1,17 @@
{
"extends": "./.svelte-kit/tsconfig.json",
"compilerOptions": {
"allowJs": true,
"checkJs": true,
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"skipLibCheck": true,
"sourceMap": true,
"strict": true
}
// Path aliases are handled by https://kit.svelte.dev/docs/configuration#alias
//
// If you want to overwrite includes/excludes, make sure to copy over the relevant includes/excludes
// from the referenced tsconfig.json - TypeScript does not merge them in
}

10
ChatQnA/ui/vite.config.ts Normal file
View File

@@ -0,0 +1,10 @@
import { sveltekit } from '@sveltejs/kit/vite';
import type { UserConfig } from 'vite';
const config: UserConfig = {
plugins: [sveltekit()],
server: {}
};
export default config;

136
CodeGen/README.md Normal file
View File

@@ -0,0 +1,136 @@
Code generation is a noteworthy application of Large Language Model (LLM) technology. In this example, we present a Copilot application to showcase how code generation can be executed on the Intel Gaudi2 platform. This CodeGen use case performs code generation with open source models such as "m-a-p/OpenCodeInterpreter-DS-6.7B" and "deepseek-ai/deepseek-coder-33b-instruct", served with Text Generation Inference on Intel Gaudi2.
# Environment Setup
To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Intel Gaudi2, please follow these steps:
## Build TGI Gaudi Docker Image
```bash
bash ./tgi_gaudi/build_docker.sh
```
## Launch TGI Gaudi Service
### Launch a local server instance on 1 Gaudi card:
```bash
bash ./tgi_gaudi/launch_tgi_service.sh
```
### Launch a local server instance on 4 Gaudi cards:
```bash
bash ./tgi_gaudi/launch_tgi_service.sh 4 9000 "deepseek-ai/deepseek-coder-33b-instruct"
```
### Customize TGI Gaudi Service
The ./tgi_gaudi/launch_tgi_service.sh script accepts three parameters:
- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
- model_name: The model name utilized for LLM, with the default set to "m-a-p/OpenCodeInterpreter-DS-6.7B".
You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_ENDPOINT`:
```bash
export TGI_ENDPOINT="xxx.xxx.xxx.xxx:8080"
```
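Before wiring the endpoint into the Copilot backend, you can sanity-check that TGI is reachable by sending a small request to its `/generate` route. This is only an illustrative sketch; replace the address with your own endpoint and adjust the prompt and parameters as needed.

```bash
curl ${TGI_ENDPOINT}/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "def fibonacci(n):", "parameters": {"max_new_tokens": 32}}'
```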
## Launch Copilot Docker
### Build Copilot Docker Image
```bash
cd codegen
bash ./build_docker.sh
cd ..
```
### Launch Copilot Docker
```bash
docker run -it --net=host --ipc=host -v /var/run/docker.sock:/var/run/docker.sock copilot:latest
```
# Start Copilot Server
## Start the Backend Service
Make sure the TGI Gaudi service is running before starting the backend service.
Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get an access token and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token, then launch the backend service:
```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
nohup python server.py &
```
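Once the backend is up, a quick way to verify it is to call the non-streaming code generation endpoint defined in `server.py`. The host, port (8000 is the default in `server.py`), and prompt below are illustrative; any request fields that are omitted fall back to the defaults in `ChatCompletionRequest`.

```bash
curl http://localhost:8000/v1/code_generation \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"prompt": "# write a function that reverses a string", "max_new_tokens": 128}'
```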
## Install Copilot VSCode extension offline
Copy the `copilot-0.0.1.vsix` file to your local machine and install it in VS Code as shown below.
![Install-screenshot](https://i.imgur.com/JXQ3rqE.jpg)
We will also release the plugin on the Visual Studio Code Marketplace to make installation easier.
# How to use
## Service URL setting
Please adjust the service URL in the extension settings based on the endpoint of the code generation backend service.
![Setting-screenshot](https://i.imgur.com/4hjvKPu.png)
![Setting-screenshot](https://i.imgur.com/JfJVFV3.png)
## Customize
The Copilot lets users enter their own sensitive information and tokens in the user settings as needed. This customization tailors the accuracy and output content to individual requirements.
![Customize](https://i.imgur.com/PkObak9.png)
## Code suggestion
To trigger inline completion, type `# {your keyword}` (start with your programming language's comment keyword, such as `//` in C++ or `#` in Python). Make sure Inline Suggest is enabled in the VS Code settings.
For example:
![code suggestion](https://i.imgur.com/sH5UoTO.png)
To provide programmers with a smooth experience, the Copilot supports multiple ways to trigger inline code suggestions. If you are interested in the details, they are summarized as follows:
- Generate code from single-line comments: the simplest way, as introduced above.
- Generate code from consecutive single-line comments:
![codegen from single-line comments](https://i.imgur.com/GZsQywX.png)
- Generate code from multi-line comments (this will not be triggered until there is at least one `space` outside the multi-line comment):
![codegen from multi-line comments](https://i.imgur.com/PzhiWrG.png)
- Automatically complete multi-line comments:
![auto complete](https://i.imgur.com/cJO3PQ0.jpg)
## Chat with AI assistant
You can start a conversation with the AI programming assistant by clicking on the robot icon in the plugin bar on the left:
![icon](https://i.imgur.com/f7rzfCQ.png)
Then you can see the conversation window on the left, where you can chat with AI assistant:
![dialog](https://i.imgur.com/aiYzU60.png)
There are four areas worth noting:
- Enter and submit your question
- Your previous questions
- Answers from the AI assistant (code is highlighted according to the programming language it is written in, and streaming output is supported)
- Copy or replace code with one click (note that you need to select the code in the editor first and then click "replace"; otherwise the code will be inserted)
You can also select code in the editor and ask the AI assistant questions about it.
For example:
- Select code
![select code](https://i.imgur.com/grvrtY6.png)
- Ask question and get answer
![qna](https://i.imgur.com/8Kdpld7.png)

View File

@@ -0,0 +1,25 @@
# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM langchain/langchain
RUN apt-get update && apt-get -y install libgl1-mesa-glx
RUN pip install -U langchain-cli pydantic==1.10.13
RUN pip install langchain==0.1.11
RUN pip install shortuuid
RUN pip install huggingface_hub
RUN mkdir -p /ws
ENV PYTHONPATH=/ws
COPY codegen-app /codegen-app
WORKDIR /codegen-app
CMD ["/bin/bash"]

View File

@@ -0,0 +1,17 @@
#!/bin/bash

# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
docker build . -t copilot:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy

View File

@@ -0,0 +1,42 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Code source from FastChat's OpenAI protocol:
https://github.com/lm-sys/FastChat/blob/main/fastchat/protocol/openai_api_protocol.py
"""
from typing import Optional, List, Any, Union
import time
import shortuuid
# pylint: disable=E0611
from pydantic import BaseModel, Field
class ChatCompletionRequest(BaseModel):
prompt: Union[str, List[Any]]
device: Optional[str] = 'cpu'
temperature: Optional[float] = 0.7
top_p: Optional[float] = 1.0
top_k: Optional[int] = 1
repetition_penalty: Optional[float] = 1.0
max_new_tokens: Optional[int] = 128
stream: Optional[bool] = False
class ChatCompletionResponse(BaseModel):
id: str = Field(default_factory=lambda: f"chatcmpl-{shortuuid.random()}")
object: str = "chat.completion"
created: int = Field(default_factory=lambda: int(time.time()))
response: str

View File

@@ -0,0 +1,229 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import requests
import json
import types
from concurrent import futures
from typing import Optional
from fastapi import FastAPI, APIRouter
from fastapi.responses import RedirectResponse, StreamingResponse
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import HuggingFaceEndpoint
from langchain_core.pydantic_v1 import BaseModel
from starlette.middleware.cors import CORSMiddleware
from openai_protocol import ChatCompletionRequest, ChatCompletionResponse
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"])
class CodeGenAPIRouter(APIRouter):
def __init__(self, entrypoint) -> None:
super().__init__()
self.entrypoint = entrypoint
print(f"[codegen - router] Initializing API Router, entrypoint={entrypoint}")
# NOTE: the request handlers below reference these attributes; the defaults
# here are assumptions for single-process (non-DeepSpeed) serving.
self.use_deepspeed = False
self.host = "localhost"
self.port = 8000
self.world_size = 1
# Define LLM
self.llm = HuggingFaceEndpoint(
endpoint_url=entrypoint,
max_new_tokens=512,
top_k=10,
top_p=0.95,
typical_p=0.95,
temperature=0.01,
repetition_penalty=1.03,
streaming=True,
)
print("[codegen - router] LLM initialized.")
def is_generator(self, obj):
return isinstance(obj, types.GeneratorType)
def handle_chat_completion_request(self, request: ChatCompletionRequest):
try:
print(f"Predicting chat completion using prompt '{request.prompt}'")
buffered_texts = ""
if request.stream:
generator = self.llm(request.prompt, callbacks=[StreamingStdOutCallbackHandler()])
if not self.is_generator(generator):
generator = (generator,)
def stream_generator():
nonlocal buffered_texts
for output in generator:
yield f"data: {output}\n\n"
yield f"data: [DONE]\n\n"
return StreamingResponse(stream_generator(), media_type="text/event-stream")
else:
response = self.llm(request.prompt)
except Exception as e:
print(f"An error occurred: {e}")
else:
print("Chat completion finished.")
return ChatCompletionResponse(response=response)
tgi_endpoint = os.getenv("TGI_ENDPOINT", "http://localhost:8080")
router = CodeGenAPIRouter(tgi_endpoint)
app.include_router(router)
def check_completion_request(request: BaseModel) -> Optional[str]:
if request.temperature is not None and request.temperature < 0:
return f"Param Error: {request.temperature} is less than the minimum of 0 --- 'temperature'"
if request.temperature is not None and request.temperature > 2:
return f"Param Error: {request.temperature} is greater than the maximum of 2 --- 'temperature'"
if request.top_p is not None and request.top_p < 0:
return f"Param Error: {request.top_p} is less than the minimum of 0 --- 'top_p'"
if request.top_p is not None and request.top_p > 1:
return f"Param Error: {request.top_p} is greater than the maximum of 1 --- 'top_p'"
if request.top_k is not None and (not isinstance(request.top_k, int)):
return f"Param Error: {request.top_k} is not valid under any of the given schemas --- 'top_k'"
if request.top_k is not None and request.top_k < 1:
return f"Param Error: {request.top_k} is greater than the minimum of 1 --- 'top_k'"
if request.max_new_tokens is not None and (not isinstance(request.max_new_tokens, int)):
return f"Param Error: {request.max_new_tokens} is not valid under any of the given schemas --- 'max_new_tokens'"
return None
def filter_code_format(code):
language_prefixes = {
"go": "```go",
"c": "```c",
"cpp": "```cpp",
"java": "```java",
"python": "```python",
"typescript": "```typescript"
}
suffix = "\n```"
# Find the first occurrence of a language prefix
first_prefix_pos = len(code)
for prefix in language_prefixes.values():
pos = code.find(prefix)
if pos != -1 and pos < first_prefix_pos:
first_prefix_pos = pos + len(prefix) + 1
# Find the first occurrence of the suffix after the first language prefix
first_suffix_pos = code.find(suffix, first_prefix_pos + 1)
# Extract the code block
if first_prefix_pos != -1 and first_suffix_pos != -1:
return code[first_prefix_pos:first_suffix_pos]
elif first_prefix_pos != -1:
return code[first_prefix_pos:]
return code
# router /v1/code_generation only supports non-streaming mode.
@router.post("/v1/code_generation")
async def code_generation_endpoint(chat_request: ChatCompletionRequest):
if router.use_deepspeed:
responses = []
def send_request(port):
try:
url = f'http://{router.host}:{port}/v1/code_generation'
response = requests.post(url, json=chat_request.dict())
response.raise_for_status()
json_response = json.loads(response.content)
cleaned_code = filter_code_format(json_response['response'])
chat_completion_response = ChatCompletionResponse(response=cleaned_code)
responses.append(chat_completion_response)
except requests.exceptions.RequestException as e:
print(f"Error sending/receiving on port {port}: {e}")
with futures.ThreadPoolExecutor(max_workers=router.world_size) as executor:
worker_ports = [router.port + i + 1 for i in range(router.world_size)]
executor.map(send_request, worker_ports)
if responses:
return responses[0]
else:
ret = check_completion_request(chat_request)
if ret is not None:
raise RuntimeError("Invalid parameter.")
return router.handle_chat_completion_request(chat_request)
# router /v1/code_chat supports both non-streaming and streaming mode.
@router.post("/v1/code_chat")
async def code_chat_endpoint(chat_request: ChatCompletionRequest):
if router.use_deepspeed:
if chat_request.stream:
responses = []
def generate_stream(port):
url = f'http://{router.host}:{port}/v1/code_generation'
response = requests.post(url, json=chat_request.dict(), stream=True, timeout=1000)
responses.append(response)
with futures.ThreadPoolExecutor(max_workers=router.world_size) as executor:
worker_ports = [router.port + i + 1 for i in range(router.world_size)]
executor.map(generate_stream, worker_ports)
while not responses:
pass
def generate():
if responses[0]:
for chunk in responses[0].iter_lines(decode_unicode=False, delimiter=b"\0"):
if chunk:
yield f"data: {chunk}\n\n"
yield f"data: [DONE]\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")
else:
responses = []
def send_request(port):
try:
url = f'http://{router.host}:{port}/v1/code_generation'
response = requests.post(url, json=chat_request.dict())
response.raise_for_status()
json_response = json.loads(response.content)
chat_completion_response = ChatCompletionResponse(response=json_response['response'])
responses.append(chat_completion_response)
except requests.exceptions.RequestException as e:
print(f"Error sending/receiving on port {port}: {e}")
with futures.ThreadPoolExecutor(max_workers=router.world_size) as executor:
worker_ports = [router.port + i + 1 for i in range(router.world_size)]
executor.map(send_request, worker_ports)
if responses:
return responses[0]
else:
ret = check_completion_request(chat_request)
if ret is not None:
raise RuntimeError("Invalid parameter.")
return router.handle_chat_completion_request(chat_request)
@app.get("/")
async def redirect_root_to_docs():
return RedirectResponse("/docs")
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)

BIN
CodeGen/copilot-0.0.1.vsix Normal file

Binary file not shown.

View File

@@ -0,0 +1,19 @@
#!/bin/bash

# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
git clone https://github.com/huggingface/tgi-gaudi.git
cd ./tgi-gaudi/
docker build -t tgi_gaudi_codegen . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy

View File

@@ -0,0 +1,50 @@
#!/bin/bash

# Copyright (c) 2024 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set default values
default_port=8080
default_model="m-a-p/OpenCodeInterpreter-DS-6.7B"
default_num_cards=1
# All arguments are optional; accept at most three positional parameters
if [ "$#" -gt 3 ]; then
    echo "Usage: $0 [num_cards] [port_number] [model_name]"
    exit 1
fi
# Assign arguments to variables
num_cards=${1:-$default_num_cards}
port_number=${2:-$default_port}
model_name=${3:-$default_model}
# Check if num_cards is within the valid range (1-8)
if [ "$num_cards" -lt 1 ] || [ "$num_cards" -gt 8 ]; then
    echo "Error: num_cards must be between 1 and 8."
    exit 1
fi
# Set the volume variable
volume=$PWD/data
# Build the Docker run command based on the number of cards
if [ "$num_cards" -eq 1 ]; then
    docker_cmd="docker run -p $port_number:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy tgi_gaudi_codegen --model-id $model_name"
else
    docker_cmd="docker run -p $port_number:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$http_proxy tgi_gaudi_codegen --model-id $model_name --sharded true --num-shard $num_cards"
fi
# Execute the Docker run command
eval $docker_cmd
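As a quick sanity check (the card count, port, and prompt below are illustrative), you can launch the service with explicit arguments and query the standard text-generation-inference `/generate` route once the container is up:

```bash
# Run from the directory containing this script: 2 Gaudi cards, port 8085, default code model
bash ./launch_tgi_service.sh 2 8085 m-a-p/OpenCodeInterpreter-DS-6.7B

# Verify the endpoint responds
curl http://localhost:8085/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "def fibonacci(n):", "parameters": {"max_new_tokens": 64}}'
```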

101
DocSum/README.md Normal file
View File

@@ -0,0 +1,101 @@
Text summarization is an NLP task that creates a concise and informative summary of a longer text. LLMs can be used to create summaries of news articles, research papers, technical documents, and other types of text. Suppose you have a set of documents (PDFs, Notion pages, customer questions, etc.) and you want to summarize the content. In this example use case, we use LangChain to apply some summarization strategies and run LLM inference using Text Generation Inference on Intel Gaudi2.
# Environment Setup
To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, please follow these steps:
## Build TGI Gaudi Docker Image
```bash
bash ./serving/tgi_gaudi/build_docker.sh
```
## Launch TGI Gaudi Service
### Launch a local server instance on 1 Gaudi card:
```bash
bash ./serving/tgi_gaudi/launch_tgi_service.sh
```
For gated models such as `LLAMA-2`, you will have to pass -e HUGGING_FACE_HUB_TOKEN=\<token\> to the docker run command above with a valid Hugging Face Hub read token.
Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get an access token and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.
```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
```
### Launch a local server instance on 8 Gaudi cards:
```bash
bash ./serving/tgi_gaudi/launch_tgi_service.sh 8
```
### Customize TGI Gaudi Service
The ./serving/tgi_gaudi/launch_tgi_service.sh script accepts three parameters:
- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
- model_name: The model name utilized for LLM, with the default set to "Intel/neural-chat-7b-v3-3".
You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_ENDPOINT`:
```bash
export TGI_ENDPOINT="http://xxx.xxx.xxx.xxx:8080"
```
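To confirm the TGI Gaudi service is reachable before starting the summarization backend, you can send a test request to its `generate` route (the prompt and token budget below are arbitrary):

```bash
curl ${TGI_ENDPOINT}/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'
```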
## Launch Document Summary Docker
### Build Document Summary Docker Image
```bash
cd langchain/docker/
bash ./build_docker.sh
cd ../../
```
### Launch Document Summary Docker
```bash
docker run -it --net=host --ipc=host -v /var/run/docker.sock:/var/run/docker.sock document-summarize:latest
```
# Start Document Summary Server
## Start the Backend Service
Make sure the TGI Gaudi service is running, then launch the backend service:
```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
nohup python app/server.py &
```
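You can then send a quick request to verify the backend is up. The route and port below are assumptions for illustration only; check `app/server.py` for the actual values:

```bash
# Hypothetical route and port; confirm both in app/server.py
curl http://localhost:8000/v1/text_summarize \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"text": "Text Generation Inference on Intel Gaudi2 serves large language models for summarization."}'
```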
## Start the Frontend Service
Navigate to the "ui" folder and execute the following commands to start the frontend GUI:
```bash
cd ui
sudo apt-get install npm && \
npm install -g n && \
n stable && \
hash -r && \
npm install -g npm@latest
```
For CentOS, please use the following commands instead:
```bash
curl -sL https://rpm.nodesource.com/setup_20.x | sudo bash -
sudo yum install -y nodejs
```
Update the `BASIC_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address.
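For example, the placeholder can be replaced in one step (the IP address below is illustrative; substitute your host's address):

```bash
sed -i 's/127.0.0.1/192.168.1.100/g' .env
```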
Run the following command to install the required dependencies:
```bash
npm install
```
Start the development server by executing the following command:
```bash
nohup npm run dev &
```
This will initiate the frontend service and launch the application.

View File

@@ -0,0 +1,35 @@
FROM langchain/langchain
ARG http_proxy
ARG https_proxy
ENV http_proxy=$http_proxy
ENV https_proxy=$https_proxy
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
        libgl1-mesa-glx \
        libjemalloc-dev

RUN pip install --upgrade pip \
    sentence-transformers \
    langchain-cli \
    pydantic==1.10.13 \
    langchain==0.1.12 \
    poetry \
    langchain_benchmarks \
    pyarrow \
    jupyter \
    docx2txt \
    pypdf \
    beautifulsoup4 \
    python-multipart \
    intel-extension-for-pytorch \
    intel-openmp
ENV PYTHONPATH=/ws:/summarize-app/app
COPY summarize-app /summarize-app
WORKDIR /summarize-app
CMD ["/bin/bash"]

View File

@@ -0,0 +1,3 @@
#!/bin/bash
docker build . -t document-summarize:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy

Some files were not shown because too many files have changed in this diff