Support Long context for DocSum (#1255)

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
This commit is contained in:
XinyaoWa
2024-12-20 19:17:10 +08:00
committed by GitHub
parent 05365b6140
commit 50dd959d60
15 changed files with 861 additions and 267 deletions

View File

@@ -27,7 +27,7 @@ services:
security_opt:
- seccomp:unconfined
ipc: host
command: --model-id ${DOCSUM_LLM_MODEL_ID}
command: --model-id ${DOCSUM_LLM_MODEL_ID} --max-input-length ${MAX_INPUT_TOKENS} --max-total-tokens ${MAX_TOTAL_TOKENS}
docsum-llm-server:
image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
@@ -53,6 +53,9 @@ services:
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: "http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
HUGGINGFACEHUB_API_TOKEN: ${DOCSUM_HUGGINGFACEHUB_API_TOKEN}
MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
LLM_MODEL_ID: ${DOCSUM_LLM_MODEL_ID}
restart: unless-stopped
whisper:

View File

@@ -3,6 +3,8 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0
export MAX_INPUT_TOKENS=2048
export MAX_TOTAL_TOKENS=4096
export DOCSUM_TGI_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export DOCSUM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HOST_IP=${host_ip}

View File

@@ -223,11 +223,12 @@ You will have the following Docker Images:
Text:
```bash
## json input
curl -X POST http://${host_ip}:8888/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
# Use English mode (default).
# form input, use English mode (default).
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
@@ -290,6 +291,93 @@ You will have the following Docker Images:
-F "stream=true"
```
7. MegaService with long context
If you want to deal with long context, you can set the following parameters and select a suitable summary type.
- "summary_type": one of "auto", "stuff", "truncate", "map_reduce", or "refine"; default is "auto"
- "chunk_size": maximum token length of each chunk. The default value depends on "summary_type".
- "chunk_overlap": number of overlapping tokens between adjacent chunks; default is 0.1\*chunk_size
**summary_type=auto**
"summary_type" is set to be "auto" by default, in this mode we will check input token length, if it exceed `MAX_INPUT_TOKENS`, `summary_type` will automatically be set to `refine` mode, otherwise will be set to `stuff` mode.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=auto"
```
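The selection rule above can be illustrated with the following minimal sketch. The function and variable names are illustrative only and do not come from the actual llm-docsum microservice code; `count_tokens` stands for any tokenizer-based helper you supply.

```python
# Illustrative sketch only: the real check lives in the llm-docsum microservice.
import os

MAX_INPUT_TOKENS = int(os.getenv("MAX_INPUT_TOKENS", "1024"))

def choose_summary_type(input_text: str, count_tokens) -> str:
    """Return 'refine' for inputs longer than MAX_INPUT_TOKENS, else 'stuff'.

    count_tokens is any callable mapping text to a token count, e.g. the
    tokenizer of the deployed model (hypothetical helper, not provided here).
    """
    if count_tokens(input_text) > MAX_INPUT_TOKENS:
        return "refine"
    return "stuff"
```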
**summary_type=stuff**
In this mode the LLM generates the summary from the complete input text. Please set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` carefully according to your model and device memory; otherwise, long inputs may exceed the LLM context limit and raise an error.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=stuff"
```
**summary_type=truncate**
Truncate mode truncates the input text and keeps only the first chunk, whose length equals `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=truncate"
```
**summary_type=map_reduce**
Map_reduce mode splits the input into multiple chunks, maps each chunk to an individual summary, and then consolidates those summaries into a single global summary. `streaming=True` is not allowed here.
In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`. A conceptual sketch of this flow is shown after the example below.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=map_reduce"
```
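Conceptually, the map_reduce flow described above works like the sketch below. This is an illustration only, not the implementation used by the llm-docsum service; `summarize` stands for any call to the deployed LLM.

```python
# Conceptual illustration of map_reduce summarization (not the service code).
from typing import Callable, List

def map_reduce_summary(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Map each chunk to a partial summary, then reduce them to one summary."""
    partial_summaries = [summarize(chunk) for chunk in chunks]   # map step
    return summarize("\n".join(partial_summaries))               # reduce step
```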
**summary_type=refine**
Refine mode splits the input into multiple chunks, generates a summary for the first chunk, combines it with the second chunk, and then loops over the remaining chunks to produce the final summary.
In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS)`. A sketch of the refine loop and of the default chunk sizes is shown after the example below.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=refine"
```
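The refine loop and the default `chunk_size` values quoted above can be illustrated as follows. This is a sketch based only on the formulas stated in this section, not the actual llm-docsum implementation; `summarize` and `refine` stand for prompts sent to the deployed LLM.

```python
# Illustrative sketch of the refine strategy and the default chunk sizes.
from typing import Callable, List

def refine_summary(chunks: List[str],
                   summarize: Callable[[str], str],
                   refine: Callable[[str, str], str]) -> str:
    """Summarize the first chunk, then fold every remaining chunk into it."""
    summary = summarize(chunks[0])
    for chunk in chunks[1:]:
        summary = refine(summary, chunk)
    return summary

def default_chunk_size(summary_type: str, max_input_tokens: int,
                       max_total_tokens: int, max_new_tokens: int) -> int:
    """Default chunk_size per summary_type, following the formulas above."""
    if summary_type in ("truncate", "map_reduce"):
        return min(max_total_tokens - max_new_tokens - 50, max_input_tokens)
    if summary_type == "refine":
        return min(max_total_tokens - 2 * max_new_tokens - 128, max_input_tokens)
    raise ValueError(f"no default chunk_size for summary_type={summary_type}")

# Example with MAX_INPUT_TOKENS=1024, MAX_TOTAL_TOKENS=2048, max_tokens=32:
# default_chunk_size("refine", 1024, 2048, 32) == min(1856, 1024) == 1024
```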
## 🚀 Launch the UI
Several UI options are provided. If you need to work with multimedia documents, .doc, or .pdf files, it is suggested to use the Gradio UI.

View File

@@ -2,9 +2,9 @@
# SPDX-License-Identifier: Apache-2.0
services:
tgi-service:
tgi-server:
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
container_name: tgi-server
ports:
- "8008:80"
environment:
@@ -16,13 +16,13 @@ services:
volumes:
- "./data:/data"
shm_size: 1g
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0
command: --model-id ${LLM_MODEL_ID} --cuda-graphs 0 --max-input-length ${MAX_INPUT_TOKENS} --max-total-tokens ${MAX_TOTAL_TOKENS}
llm-docsum-tgi:
image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
container_name: llm-docsum-server
depends_on:
- tgi-service
- tgi-server
ports:
- "9000:9000"
ipc: host
@@ -32,11 +32,15 @@ services:
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
LLM_MODEL_ID: ${LLM_MODEL_ID}
LOGFLAG: True
restart: unless-stopped
whisper:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
container_name: whisper-server
ports:
- "7066:7066"
ipc: host
@@ -48,7 +52,7 @@ services:
dataprep-audio2text:
image: ${REGISTRY:-opea}/dataprep-audio2text:${TAG:-latest}
container_name: dataprep-audio2text-service
container_name: dataprep-audio2text-server
ports:
- "9099:9099"
ipc: host
@@ -57,7 +61,7 @@ services:
dataprep-video2audio:
image: ${REGISTRY:-opea}/dataprep-video2audio:${TAG:-latest}
container_name: dataprep-video2audio-service
container_name: dataprep-video2audio-server
ports:
- "7078:7078"
ipc: host
@@ -78,7 +82,7 @@ services:
image: ${REGISTRY:-opea}/docsum:${TAG:-latest}
container_name: docsum-xeon-backend-server
depends_on:
- tgi-service
- tgi-server
- llm-docsum-tgi
- dataprep-multimedia2text
- dataprep-video2audio

View File

@@ -207,18 +207,19 @@ You will have the following Docker Images:
Text:
```bash
## json input
curl -X POST http://${host_ip}:8888/v1/docsum \
-H "Content-Type: application/json" \
-d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
# Use English mode (default).
# form input. Use English mode (default).
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
-F "stream=True"
# Use Chinese mode.
curl http://${host_ip}:8888/v1/docsum \
@@ -227,7 +228,7 @@ You will have the following Docker Images:
-F "messages=2024年9月26日北京——今日英特尔正式发布英特尔® 至强® 6性能核处理器代号Granite Rapids为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
-F "max_tokens=32" \
-F "language=zh" \
-F "stream=true"
-F "stream=True"
# Upload file
curl http://${host_ip}:8888/v1/docsum \
@@ -237,7 +238,6 @@ You will have the following Docker Images:
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
```
> Audio and video file uploads are not supported in DocSum with curl requests; please use the Gradio UI.
@@ -255,7 +255,7 @@ You will have the following Docker Images:
-F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
-F "stream=True"
```
Video:
@@ -271,7 +271,94 @@ You will have the following Docker Images:
-F "messages=convert your video to base64 data type" \
-F "max_tokens=32" \
-F "language=en" \
-F "stream=true"
-F "stream=True"
```
7. MegaService with long context
If you want to deal with long context, you can set the following parameters and select a suitable summary type.
- "summary_type": one of "auto", "stuff", "truncate", "map_reduce", or "refine"; default is "auto"
- "chunk_size": maximum token length of each chunk. The default value depends on "summary_type".
- "chunk_overlap": number of overlapping tokens between adjacent chunks; default is 0.1\*chunk_size
**summary_type=auto**
"summary_type" is set to be "auto" by default, in this mode we will check input token length, if it exceed `MAX_INPUT_TOKENS`, `summary_type` will automatically be set to `refine` mode, otherwise will be set to `stuff` mode.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=auto"
```
**summary_type=stuff**
In this mode the LLM generates the summary from the complete input text. Please set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` carefully according to your model and device memory; otherwise, long inputs may exceed the LLM context limit and raise an error.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=stuff"
```
**summary_type=truncate**
Truncate mode truncates the input text and keeps only the first chunk, whose length equals `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=truncate"
```
**summary_type=map_reduce**
Map_reduce mode splits the input into multiple chunks, maps each chunk to an individual summary, and then consolidates those summaries into a single global summary. `streaming=True` is not allowed here.
In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=map_reduce"
```
**summary_type=refine**
Refine mode splits the input into multiple chunks, generates a summary for the first chunk, combines it with the second chunk, and then loops over the remaining chunks to produce the final summary.
In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
-H "Content-Type: multipart/form-data" \
-F "type=text" \
-F "messages=" \
-F "max_tokens=32" \
-F "files=@/path to your file (.txt, .docx, .pdf)" \
-F "language=en" \
-F "summary_type=refine"
```
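The same form requests can also be sent from Python. The sketch below mirrors the curl examples in this section using the `requests` library; the host address and file path are placeholders.

```python
# Python equivalent of the multipart form curl examples above (illustrative).
import requests

host_ip = "localhost"  # replace with your host IP
url = f"http://{host_ip}:8888/v1/docsum"

with open("/path/to/your/file.txt", "rb") as f:
    response = requests.post(
        url,
        files={"files": f},
        data={
            "type": "text",
            "messages": "",
            "max_tokens": 32,
            "language": "en",
            "summary_type": "auto",  # or stuff / truncate / map_reduce / refine
        },
    )
print(response.text)
```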
> More detailed tests can be found in `GenAIExamples/DocSum/test`.

View File

@@ -2,7 +2,7 @@
# SPDX-License-Identifier: Apache-2.0
services:
tgi-service:
tgi-server:
image: ghcr.io/huggingface/tgi-gaudi:2.0.6
container_name: tgi-gaudi-server
ports:
@@ -23,13 +23,13 @@ services:
cap_add:
- SYS_NICE
ipc: host
command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
command: --model-id ${LLM_MODEL_ID} --max-input-length ${MAX_INPUT_TOKENS} --max-total-tokens ${MAX_TOTAL_TOKENS}
llm-docsum-tgi:
image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
container_name: llm-docsum-gaudi-server
depends_on:
- tgi-service
- tgi-server
ports:
- "9000:9000"
ipc: host
@@ -39,11 +39,15 @@ services:
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT}
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
LLM_MODEL_ID: ${LLM_MODEL_ID}
LOGFLAG: True
restart: unless-stopped
whisper:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
container_name: whisper-server
ports:
- "7066:7066"
ipc: host
@@ -60,7 +64,7 @@ services:
dataprep-audio2text:
image: ${REGISTRY:-opea}/dataprep-audio2text:${TAG:-latest}
container_name: dataprep-audio2text-service
container_name: dataprep-audio2text-server
ports:
- "9199:9099"
ipc: host
@@ -69,7 +73,7 @@ services:
dataprep-video2audio:
image: ${REGISTRY:-opea}/dataprep-video2audio:${TAG:-latest}
container_name: dataprep-video2audio-service
container_name: dataprep-video2audio-server
ports:
- "7078:7078"
ipc: host
@@ -90,7 +94,7 @@ services:
image: ${REGISTRY:-opea}/docsum:${TAG:-latest}
container_name: docsum-gaudi-backend-server
depends_on:
- tgi-service
- tgi-server
- llm-docsum-tgi
- dataprep-multimedia2text
- dataprep-video2audio

View File

@@ -6,6 +6,9 @@ pushd "../../" > /dev/null
source .set_env.sh
popd > /dev/null
export MAX_INPUT_TOKENS=1024
export MAX_TOTAL_TOKENS=2048
export no_proxy="${no_proxy},${host_ip}"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export MEGA_SERVICE_HOST_IP=${host_ip}

View File

@@ -14,7 +14,7 @@ from comps.cores.proto.api_protocol import (
ChatMessage,
UsageInfo,
)
from comps.cores.proto.docarray import LLMParams
from comps.cores.proto.docarray import DocSumLLMParams
from fastapi import File, Request, UploadFile
from fastapi.responses import StreamingResponse
@@ -27,6 +27,16 @@ LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **kwargs):
if self.services[cur_node].service_type == ServiceType.LLM:
docsum_parameters = kwargs.get("docsum_parameters", None)
if docsum_parameters:
docsum_parameters = docsum_parameters.model_dump()
del docsum_parameters["query"]
inputs.update(docsum_parameters)
return inputs
def read_pdf(file):
from langchain.document_loaders import PyPDFLoader
@@ -66,6 +76,7 @@ class DocSumService:
def __init__(self, host="0.0.0.0", port=8000):
self.host = host
self.port = port
ServiceOrchestrator.align_inputs = align_inputs
self.megaservice = ServiceOrchestrator()
self.endpoint = str(MegaServiceEndpoint.DOC_SUMMARY)
@@ -97,6 +108,9 @@ class DocSumService:
if "application/json" in request.headers.get("content-type"):
data = await request.json()
stream_opt = data.get("stream", True)
summary_type = data.get("summary_type", "auto")
chunk_size = data.get("chunk_size", -1)
chunk_overlap = data.get("chunk_overlap", -1)
chat_request = ChatCompletionRequest.model_validate(data)
prompt = handle_message(chat_request.messages)
@@ -105,6 +119,9 @@ class DocSumService:
elif "multipart/form-data" in request.headers.get("content-type"):
data = await request.form()
stream_opt = data.get("stream", True)
summary_type = data.get("summary_type", "auto")
chunk_size = data.get("chunk_size", -1)
chunk_overlap = data.get("chunk_overlap", -1)
chat_request = ChatCompletionRequest.model_validate(data)
data_type = data.get("type")
@@ -148,7 +165,8 @@ class DocSumService:
else:
raise ValueError(f"Unknown request type: {request.headers.get('content-type')}")
parameters = LLMParams(
docsum_parameters = DocSumLLMParams(
query="",
max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
top_k=chat_request.top_k if chat_request.top_k else 10,
top_p=chat_request.top_p if chat_request.top_p else 0.95,
@@ -159,10 +177,13 @@ class DocSumService:
streaming=stream_opt,
model=chat_request.model if chat_request.model else None,
language=chat_request.language if chat_request.language else "auto",
summary_type=summary_type,
chunk_overlap=chunk_overlap,
chunk_size=chunk_size,
)
result_dict, runtime_graph = await self.megaservice.schedule(
initial_inputs=initial_inputs_data, llm_parameters=parameters
initial_inputs=initial_inputs_data, docsum_parameters=docsum_parameters
)
for node, response in result_dict.items():

View File

@@ -0,0 +1,99 @@
Intel Corporation[note 1] is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware.[5] Intel designs, manufactures, and sells computer components and related products for business and consumer markets. It is considered one of the world's largest semiconductor chip manufacturers by revenue[6][7] and ranked in the Fortune 500 list of the largest United States corporations by revenue for nearly a decade, from 2007 to 2016 fiscal years, until it was removed from the ranking in 2018.[8] In 2020, it was reinstated and ranked 45th, being the 7th-largest technology company in the ranking.
Intel supplies microprocessors for most manufacturers of computer systems, and is one of the developers of the x86 series of instruction sets found in most personal computers (PCs). It also manufactures chipsets, network interface controllers, flash memory, graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and other devices related to communications and computing. Intel has a strong presence in the high-performance general-purpose and gaming PC market with its Intel Core line of CPUs, whose high-end models are among the fastest consumer CPUs, as well as its Intel Arc series of GPUs. The Open Source Technology Center at Intel hosts PowerTOP and LatencyTOP, and supports other open source projects such as Wayland, Mesa, Threading Building Blocks (TBB), and Xen.[9]
Intel was founded on July 18, 1968, by semiconductor pioneers Gordon Moore (of Moore's law) and Robert Noyce, along with investor Arthur Rock, and is associated with the executive leadership and vision of Andrew Grove.[10] The company was a key component of the rise of Silicon Valley as a high-tech center,[11] as well as being an early developer of SRAM and DRAM memory chips, which represented the majority of its business until 1981. Although Intel created the world's first commercial microprocessor chip—the Intel 4004—in 1971, it was not until the success of the PC in the early 1990s that this became its primary business.
During the 1990s, the partnership between Microsoft Windows and Intel, known as "Wintel", became instrumental in shaping the PC landscape[12][13] and solidified Intel's position on the market. As a result, Intel invested heavily in new microprocessor designs in the mid to late 1990s, fostering the rapid growth of the computer industry. During this period, it became the dominant supplier of PC microprocessors, with a market share of 90%,[14] and was known for aggressive and anti-competitive tactics in defense of its market position, particularly against AMD, as well as a struggle with Microsoft for control over the direction of the PC industry.[15][16]
Since the 2000s and especially since the late 2010s, Intel has faced increasing competition, which has led to a reduction in Intel's dominance and market share in the PC market.[17] Nevertheless, with a 68.4% market share as of 2023, Intel still leads the x86 market by a wide margin.[18] In addition, Intel's ability to design and manufacture its own chips is considered a rarity in the semiconductor industry,[19] as most chip designers do not have their own production facilities and instead rely on contract manufacturers (e.g. TSMC, Foxconn and Samsung ).[20]
Industries
Operating segments
Client Computing Group (51.8% of 2020 revenues) produces PC processors and related components.[21][22]
Data Center Group (33.7% of 2020 revenues) produces hardware components used in server, network, and storage platforms.[21]
Internet of Things Group (5.2% of 2020 revenues) offers platforms designed for retail, transportation, industrial, buildings and home use.[21]
Programmable Solutions Group (2.4% of 2020 revenues) manufactures programmable semiconductors (primarily FPGAs).[21]
Customers
In 2023, Dell accounted for about 19% of Intel's total revenues, Lenovo accounted for 11% of total revenues, and HP Inc. accounted for 10% of total revenues.[3] As of May 2024, the U.S. Department of Defense is another large customer for Intel.[23][24][25][26] In September 2024, Intel reportedly qualified for as much as $3.5 billion in federal grants to make semiconductors for the Defense Department.[27]
Market share
According to IDC, while Intel enjoyed the biggest market share in both the overall worldwide PC microprocessor market (73.3%) and the mobile PC microprocessor (80.4%) in the second quarter of 2011, the numbers decreased by 1.5% and 1.9% compared to the first quarter of 2011.[28][29]
Intel's market share decreased significantly in the enthusiast market as of 2019,[30] and they have faced delays for their 10 nm products. According to former Intel CEO Bob Swan, the delay was caused by the company's overly aggressive strategy for moving to its next node.[31]
Historical market share
In the 1980s, Intel was among the world's top ten sellers of semiconductors (10th in 1987[32]). Along with Microsoft Windows, it was part of the "Wintel" personal computer domination in the 1990s and early 2000s. In 1992, Intel became the biggest semiconductor chip maker by revenue and held the position until 2018 when Samsung Electronics surpassed it, but Intel returned to its former position the year after.[33][34] Other major semiconductor companies include TSMC, GlobalFoundries, Texas Instruments, ASML, STMicroelectronics, United Microelectronics Corporation (UMC), Micron, SK Hynix, Kioxia, and SMIC.
Major competitors
Intel's competitors in PC chipsets included AMD, VIA Technologies, Silicon Integrated Systems, and Nvidia. Intel's competitors in networking include NXP Semiconductors, Infineon,[needs update] Broadcom Limited, Marvell Technology Group and Applied Micro Circuits Corporation, and competitors in flash memory included Spansion, Samsung Electronics, Qimonda, Kioxia, STMicroelectronics, Micron, and SK Hynix.
The only major competitor in the x86 processor market is AMD, with which Intel has had full cross-licensing agreements since 1976: each partner can use the other's patented technological innovations without charge after a certain time.[35] However, the cross-licensing agreement is canceled in the event of an AMD bankruptcy or takeover.[36]
Some smaller competitors, such as VIA Technologies, produce low-power x86 processors for small factor computers and portable equipment. However, the advent of such mobile computing devices, in particular, smartphones, has led to a decline in PC sales.[37] Since over 95% of the world's smartphones currently use processors cores designed by Arm, using the Arm instruction set, Arm has become a major competitor for Intel's processor market. Arm is also planning to make attempts at setting foot into the PC and server market, with Ampere and IBM each individually designing CPUs for servers and supercomputers.[38] The only other major competitor in processor instruction sets is RISC-V, which is an open source CPU instruction set. The major Chinese phone and telecommunications manufacturer Huawei has released chips based on the RISC-V instruction set due to US sanctions against China.[39]
Intel has been involved in several disputes regarding the violation of antitrust laws, which are noted below.
Carbon footprint
Intel reported total CO2e emissions (direct + indirect) for the twelve months ending December 31, 2020, at 2,882 Kt (+94/+3.4% y-o-y).[40] Intel plans to reduce carbon emissions 10% by 2030 from a 2020 base year.[41]
Intel's annual total CO2e emissions (direct + indirect) in kilotonnes
Dec. 2017 Dec. 2018 Dec. 2019 Dec. 2020 Dec. 2021
2,461[42] 2,578[43] 2,788[44] 2,882[40] 3,274[45]
Manufacturing locations
Intel has self-reported that they have Wafer fabrication plants in the United States, Ireland, and Israel. They have also self-reported that they have assembly and testing sites mostly in China, Costa Rica, Malaysia, and Vietnam, and one site in the United States.[46][47]
Corporate history
For a chronological guide, see Timeline of Intel.
Origins
Andy Grove, Robert Noyce and Gordon Moore in 1978
Intel was incorporated in Mountain View, California, on July 18, 1968, by Gordon E. Moore (known for "Moore's law"), a chemist; Robert Noyce, a physicist and co-inventor of the integrated circuit; and Arthur Rock, an investor and venture capitalist.[48][49][50] Moore and Noyce had left Fairchild Semiconductor, where they were part of the "traitorous eight" who founded it. There were originally 500,000 shares outstanding of which Dr. Noyce bought 245,000 shares, Dr. Moore 245,000 shares, and Mr. Rock 10,000 shares; all at $1 per share. Rock offered $2,500,000 of convertible debentures to a limited group of private investors (equivalent to $21 million in 2022), convertible at $5 per share.[51][52] Just 2 years later, Intel became a public company via an initial public offering (IPO), raising $6.8 million ($23.50 per share). Intel was one of the very first companies to be listed on the then-newly established National Association of Securities Dealers Automated Quotations (NASDAQ) stock exchange.[53] Intel's third employee was Andy Grove,[note 2] a chemical engineer, who later ran the company through much of the 1980s and the high-growth 1990s.
In deciding on a name, Moore and Noyce quickly rejected "Moore Noyce",[54] near homophone for "more noise" an ill-suited name for an electronics company, since noise in electronics is usually undesirable and typically associated with bad interference. Instead, they founded the company as NM Electronics on July 18, 1968, but by the end of the month had changed the name to Intel, which stood for Integrated Electronics.[note 3] Since "Intel" was already trademarked by the hotel chain Intelco, they had to buy the rights for the name.[53][60]
Early history
At its founding, Intel was distinguished by its ability to make logic circuits using semiconductor devices. The founders' goal was the semiconductor memory market, widely predicted to replace magnetic-core memory. Its first product, a quick entry into the small, high-speed memory market in 1969, was the 3101 Schottky TTL bipolar 64-bit static random-access memory (SRAM), which was nearly twice as fast as earlier Schottky diode implementations by Fairchild and the Electrotechnical Laboratory in Tsukuba, Japan.[61][62] In the same year, Intel also produced the 3301 Schottky bipolar 1024-bit read-only memory (ROM)[63] and the first commercial metaloxidesemiconductor field-effect transistor (MOSFET) silicon gate SRAM chip, the 256-bit 1101.[53][64][65]
While the 1101 was a significant advance, its complex static cell structure made it too slow and costly for mainframe memories. The three-transistor cell implemented in the first commercially available dynamic random-access memory (DRAM), the 1103 released in 1970, solved these issues. The 1103 was the bestselling semiconductor memory chip in the world by 1972, as it replaced core memory in many applications.[66][67] Intel's business grew during the 1970s as it expanded and improved its manufacturing processes and produced a wider range of products, still dominated by various memory devices.
Federico Faggin, designer of the Intel 4004
Intel created the first commercially available microprocessor, the Intel 4004, in 1971.[53] The microprocessor represented a notable advance in the technology of integrated circuitry, as it miniaturized the central processing unit of a computer, which then made it possible for small machines to perform calculations that in the past only very large machines could do. Considerable technological innovation was needed before the microprocessor could actually become the basis of what was first known as a "mini computer" and then known as a "personal computer".[68] Intel also created one of the first microcomputers in 1973.[64][69]
Intel opened its first international manufacturing facility in 1972, in Malaysia, which would host multiple Intel operations, before opening assembly facilities and semiconductor plants in Singapore and Jerusalem in the early 1980s, and manufacturing and development centers in China, India, and Costa Rica in the 1990s.[70] By the early 1980s, its business was dominated by DRAM chips. However, increased competition from Japanese semiconductor manufacturers had, by 1983, dramatically reduced the profitability of this market. The growing success of the IBM personal computer, based on an Intel microprocessor, was among factors that convinced Gordon Moore (CEO since 1975) to shift the company's focus to microprocessors and to change fundamental aspects of that business model. Moore's decision to sole-source Intel's 386 chip played into the company's continuing success.
By the end of the 1980s, buoyed by its fortuitous position as microprocessor supplier to IBM and IBM's competitors within the rapidly growing personal computer market, Intel embarked on a 10-year period of unprecedented growth as the primary and most profitable hardware supplier to the PC industry, part of the winning 'Wintel' combination. Moore handed over his position as CEO to Andy Grove in 1987. By launching its Intel Inside marketing campaign in 1991, Intel was able to associate brand loyalty with consumer selection, so that by the end of the 1990s, its line of Pentium processors had become a household name.
Challenges to dominance (2000s)
After 2000, growth in demand for high-end microprocessors slowed. Competitors, most notably AMD (Intel's largest competitor in its primary x86 architecture market), garnered significant market share, initially in low-end and mid-range processors but ultimately across the product range, and Intel's dominant position in its core market was greatly reduced,[71] mostly due to controversial NetBurst microarchitecture. In the early 2000s then-CEO, Craig Barrett attempted to diversify the company's business beyond semiconductors, but few of these activities were ultimately successful.
Litigation
Bob had also for a number of years been embroiled in litigation. U.S. law did not initially recognize intellectual property rights related to microprocessor topology (circuit layouts), until the Semiconductor Chip Protection Act of 1984, a law sought by Intel and the Semiconductor Industry Association (SIA).[72] During the late 1980s and 1990s (after this law was passed), Intel also sued companies that tried to develop competitor chips to the 80386 CPU.[73] The lawsuits were noted to significantly burden the competition with legal bills, even if Intel lost the suits.[73] Antitrust allegations had been simmering since the early 1990s and had been the cause of one lawsuit against Intel in 1991. In 2004 and 2005, AMD brought further claims against Intel related to unfair competition.
Reorganization and success with Intel Core (2005-2015)
In 2005, CEO Paul Otellini reorganized the company to refocus its core processor and chipset business on platforms (enterprise, digital home, digital health, and mobility).
On June 6, 2005, Steve Jobs, then CEO of Apple, announced that Apple would be using Intel's x86 processors for its Macintosh computers, switching from the PowerPC architecture developed by the AIM alliance.[74] This was seen as a win for Intel;[75] an analyst called the move "risky" and "foolish", as Intel's current offerings at the time were considered to be behind those of AMD and IBM.[76]
In 2006, Intel unveiled its Core microarchitecture to widespread critical acclaim; the product range was perceived as an exceptional leap in processor performance that at a stroke regained much of its leadership of the field.[77][78] In 2008, Intel had another "tick" when it introduced the Penryn microarchitecture, fabricated using the 45 nm process node. Later that year, Intel released a processor with the Nehalem architecture to positive reception.[79]
On June 27, 2006, the sale of Intel's XScale assets was announced. Intel agreed to sell the XScale processor business to Marvell Technology Group for an estimated $600 million and the assumption of unspecified liabilities. The move was intended to permit Intel to focus its resources on its core x86 and server businesses, and the acquisition completed on November 9, 2006.[80]
In 2008, Intel spun off key assets of a solar startup business effort to form an independent company, SpectraWatt Inc. In 2011, SpectraWatt filed for bankruptcy.[81]
In February 2011, Intel began to build a new microprocessor manufacturing facility in Chandler, Arizona, completed in 2013 at a cost of $5 billion.[82] The building is now the 10 nm-certified Fab 42 and is connected to the other Fabs (12, 22, 32) on Ocotillo Campus via an enclosed bridge known as the Link.[83][84][85][86] The company produces three-quarters of its products in the United States, although three-quarters of its revenue come from overseas.[87]
The Alliance for Affordable Internet (A4AI) was launched in October 2013 and Intel is part of the coalition of public and private organizations that also includes Facebook, Google, and Microsoft. Led by Sir Tim Berners-Lee, the A4AI seeks to make Internet access more affordable so that access is broadened in the developing world, where only 31% of people are online. Google will help to decrease Internet access prices so that they fall below the UN Broadband Commission's worldwide target of 5% of monthly income.[88]
Attempts at entering the smartphone market
In April 2011, Intel began a pilot project with ZTE Corporation to produce smartphones using the Intel Atom processor for China's domestic market. In December 2011, Intel announced that it reorganized several of its business units into a new mobile and communications group[89] that would be responsible for the company's smartphone, tablet, and wireless efforts. Intel planned to introduce Medfield a processor for tablets and smartphones to the market in 2012, as an effort to compete with Arm.[90] As a 32-nanometer processor, Medfield is designed to be energy-efficient, which is one of the core features in Arm's chips.[91]
At the Intel Developers Forum (IDF) 2011 in San Francisco, Intel's partnership with Google was announced. In January 2012, Google announced Android 2.3, supporting Intel's Atom microprocessor.[92][93][94] In 2013, Intel's Kirk Skaugen said that Intel's exclusive focus on Microsoft platforms was a thing of the past and that they would now support all "tier-one operating systems" such as Linux, Android, iOS, and Chrome.[95]
In 2014, Intel cut thousands of employees in response to "evolving market trends",[96] and offered to subsidize manufacturers for the extra costs involved in using Intel chips in their tablets. In April 2016, Intel cancelled the SoFIA platform and the Broxton Atom SoC for smartphones,[97][98][99][100] effectively leaving the smartphone market.[101][102]
Intel custom foundry
Finding itself with excess fab capacity after the failure of the Ultrabook to gain market traction and with PC sales declining, in 2013 Intel reached a foundry agreement to produce chips for Altera using a 14 nm process. General Manager of Intel's custom foundry division Sunit Rikhi indicated that Intel would pursue further such deals in the future.[103] This was after poor sales of Windows 8 hardware caused a major retrenchment for most of the major semiconductor manufacturers, except for Qualcomm, which continued to see healthy purchases from its largest customer, Apple.[104]
As of July 2013, five companies were using Intel's fabs via the Intel Custom Foundry division: Achronix, Tabula, Netronome, Microsemi, and Panasonic most are field-programmable gate array (FPGA) makers, but Netronome designs network processors. Only Achronix began shipping chips made by Intel using the 22 nm Tri-Gate process.[105][106] Several other customers also exist but were not announced at the time.[107]

View File

@@ -0,0 +1 @@
Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.

View File

@@ -2,7 +2,7 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
set -e
set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
@@ -14,7 +14,8 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MAX_INPUT_TOKENS=2048
export MAX_TOTAL_TOKENS=4096
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
@@ -28,7 +29,7 @@ export V2A_ENDPOINT=http://$host_ip:7078
export A2T_ENDPOINT=http://$host_ip:7066
export A2T_SERVICE_HOST_IP=${host_ip}
export A2T_SERVICE_PORT=9099
export A2T_SERVICE_PORT=9199
export DATA_ENDPOINT=http://$host_ip:7079
export DATA_SERVICE_HOST_IP=${host_ip}
@@ -69,7 +70,32 @@ function start_services() {
done
}
function validate_services() {
get_base64_str() {
local file_name=$1
base64 -w 0 "$file_name"
}
# Function to generate input data for testing based on the document type
input_data_for_test() {
local document_type=$1
case $document_type in
("text")
echo "THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco of education week reports it takes just 10 minutes to cross through gillette wyoming this small city sits in the northeast corner of the state surrounded by 100s of miles of prairie but schools here in campbell county are on the edge of something big the next generation science standards you are going to build a strand of dna and you are going to decode it and figure out what that dna actually says for christy mathis at sage valley junior high school the new standards are about learning to think like a scientist there is a lot of really good stuff in them every standard is a performance task it is not you know the child needs to memorize these things it is the student needs to be able to do some pretty intense stuff we are analyzing we are critiquing we are."
;;
("audio")
get_base64_str "$ROOT_FOLDER/data/test.wav"
;;
("video")
get_base64_str "$ROOT_FOLDER/data/test.mp4"
;;
(*)
echo "Invalid document type" >&2
exit 1
;;
esac
}
function validate_services_json() {
local URL="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
@@ -100,115 +126,23 @@ function validate_services() {
sleep 1s
}
get_base64_str() {
local file_name=$1
base64 -w 0 "$file_name"
}
function validate_services_form() {
local URL="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
local DOCKER_NAME="$4"
local FORM_DATA1="$5"
local FORM_DATA2="$6"
local FORM_DATA3="$7"
local FORM_DATA4="$8"
local FORM_DATA5="$9"
# Function to generate input data for testing based on the document type
input_data_for_test() {
local document_type=$1
case $document_type in
("text")
echo "THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco of education week reports it takes just 10 minutes to cross through gillette wyoming this small city sits in the northeast corner of the state surrounded by 100s of miles of prairie but schools here in campbell county are on the edge of something big the next generation science standards you are going to build a strand of dna and you are going to decode it and figure out what that dna actually says for christy mathis at sage valley junior high school the new standards are about learning to think like a scientist there is a lot of really good stuff in them every standard is a performance task it is not you know the child needs to memorize these things it is the student needs to be able to do some pretty intense stuff we are analyzing we are critiquing we are."
;;
("audio")
get_base64_str "$ROOT_FOLDER/data/test.wav"
;;
("video")
get_base64_str "$ROOT_FOLDER/data/test.mp4"
;;
(*)
echo "Invalid document type" >&2
exit 1
;;
esac
}
function validate_microservices() {
# Check if the microservices are running correctly.
# whisper microservice
ulimit -s 65536
validate_services \
"${host_ip}:7066/v1/asr" \
'{"asr_result":"well"}' \
"whisper" \
"whisper-service" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Audio2Text service
validate_services \
"${host_ip}:9199/v1/audio/transcriptions" \
'"query":"well"' \
"dataprep-audio2text" \
"dataprep-audio2text-service" \
"{\"byte_str\": \"$(input_data_for_test "audio")\"}"
# Video2Audio service
validate_services \
"${host_ip}:7078/v1/video2audio" \
"SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//tQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASW5mbwAAAA8AAAAIAAAN3wAtLS0tLS0tLS0tLS1LS0tLS0tLS0tLS0tpaWlpaWlpaWlpaWlph4eHh4eHh4eHh4eHpaWlpaWlpaWlpaWlpcPDw8PDw8PDw8PDw+Hh4eHh4eHh4eHh4eH///////////////8AAAAATGF2YzU4LjU0AAAAAAAAAAAAAAAAJAYwAAAAAAAADd95t4qPAAAAAAAAAAAAAAAAAAAAAP/7kGQAAAMhClSVMEACMOAabaCMAREA" \
"dataprep-video2audio" \
"dataprep-video2audio-service" \
"{\"byte_str\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - video
validate_services \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"video\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - audio
validate_services \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Docsum Data service - text
validate_services \
"${host_ip}:7079/v1/multimedia2text" \
"THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"text\": \"$(input_data_for_test "text")\"}"
# tgi for llm service
validate_services \
"${host_ip}:8008/generate" \
"generated_text" \
"tgi-gaudi" \
"tgi-gaudi-server" \
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
# llm microservice
validate_services \
"${host_ip}:9000/v1/chat/docsum" \
"data: " \
"llm-docsum-tgi" \
"llm-docsum-gaudi-server" \
'{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}
function validate_megaservice() {
local SERVICE_NAME="docsum-gaudi-backend-server"
local DOCKER_NAME="docsum-gaudi-backend-server"
local EXPECTED_RESULT="[DONE]"
local INPUT_DATA="messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
local URL="${host_ip}:8888/v1/docsum"
local DATA_TYPE="type=text"
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$DATA_TYPE" -F "$INPUT_DATA" -H 'Content-Type: multipart/form-data' "$URL")
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$FORM_DATA1" -F "$FORM_DATA2" -F "$FORM_DATA3" -F "$FORM_DATA4" -F "$FORM_DATA5" -H 'Content-Type: multipart/form-data' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
local CONTENT=$(curl -s -X POST -F "$DATA_TYPE" -F "$INPUT_DATA" -H 'Content-Type: multipart/form-data' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
local CONTENT=$(curl -s -X POST -F "$FORM_DATA1" -F "$FORM_DATA2" -F "$FORM_DATA3" -F "$FORM_DATA4" -F "$FORM_DATA5" -H 'Content-Type: multipart/form-data' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
echo "[ $SERVICE_NAME ] Content is as expected."
@@ -225,32 +159,224 @@ function validate_megaservice() {
sleep 1s
}
function validate_megaservice_json() {
# Curl the Mega Service
echo ">>> Checking text data with Content-Type: application/json"
validate_services \
function validate_microservices() {
# Check if the microservices are running correctly.
# tgi for llm service
validate_services_json \
"${host_ip}:8008/generate" \
"generated_text" \
"tgi-gaudi" \
"tgi-gaudi-server" \
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
# llm microservice
validate_services_json \
"${host_ip}:9000/v1/chat/docsum" \
"data: " \
"llm-docsum-tgi" \
"llm-docsum-gaudi-server" \
'{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
# whisper microservice
ulimit -s 65536
validate_services_json \
"${host_ip}:7066/v1/asr" \
'{"asr_result":"well"}' \
"whisper" \
"whisper-server" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Audio2Text service
validate_services_json \
"${host_ip}:9199/v1/audio/transcriptions" \
'"query":"well"' \
"dataprep-audio2text" \
"dataprep-audio2text-server" \
"{\"byte_str\": \"$(input_data_for_test "audio")\"}"
# Video2Audio service
validate_services_json \
"${host_ip}:7078/v1/video2audio" \
"SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//tQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASW5mbwAAAA8AAAAIAAAN3wAtLS0tLS0tLS0tLS1LS0tLS0tLS0tLS0tpaWlpaWlpaWlpaWlph4eHh4eHh4eHh4eHpaWlpaWlpaWlpaWlpcPDw8PDw8PDw8PDw+Hh4eHh4eHh4eHh4eH///////////////8AAAAATGF2YzU4LjU0AAAAAAAAAAAAAAAAJAYwAAAAAAAADd95t4qPAAAAAAAAAAAAAAAAAAAAAP/7kGQAAAMhClSVMEACMOAabaCMAREA" \
"dataprep-video2audio" \
"dataprep-video2audio-server" \
"{\"byte_str\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - video
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"video\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - audio
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Docsum Data service - text
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"text\": \"$(input_data_for_test "text")\"}"
}
function validate_megaservice_text() {
echo ">>> Checking text data in json format"
validate_services_json \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
'{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
echo ">>> Checking audio data"
validate_services \
echo ">>> Checking text data in form format, set language=en"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
"max_tokens=32" \
"language=en" \
"stream=True"
echo ">>> Checking text data in form format, set language=zh"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=2024年9月26日北京——今日英特尔正式发布英特尔® 至强® 6性能核处理器代号Granite Rapids为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
"max_tokens=32" \
"language=zh" \
"stream=True"
echo ">>> Checking text data in form format, upload file"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/short.txt" \
"max_tokens=32" \
"language=en"
}
function validate_megaservice_multimedia() {
echo ">>> Checking audio data in json format"
validate_services_json \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"{\"type\": \"audio\", \"messages\": \"$(input_data_for_test "audio")\"}"
echo ">>> Checking video data"
validate_services \
echo ">>> Checking audio data in form format"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=audio" \
"messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
"max_tokens=32" \
"language=en" \
"stream=True"
echo ">>> Checking video data in json format"
validate_services_json \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"{\"type\": \"video\", \"messages\": \"$(input_data_for_test "video")\"}"
echo ">>> Checking video data in form format"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=video" \
"messages=\"$(input_data_for_test "video")\"" \
"max_tokens=32" \
"language=en" \
"stream=True"
}
function validate_megaservice_long_text() {
echo ">>> Checking long text data in form format, set summary_type=auto"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=auto"
echo ">>> Checking long text data in form format, set summary_type=stuff"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=stuff"
echo ">>> Checking long text data in form format, set summary_type=truncate"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=truncate"
echo ">>> Checking long text data in form format, set summary_type=map_reduce"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=map_reduce"
echo ">>> Checking long text data in form format, set summary_type=refine"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-gaudi-backend-server" \
"docsum-gaudi-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=refine"
}
function stop_docker() {
@@ -278,10 +404,16 @@ function main() {
validate_microservices
echo "==========================================="
echo ">>>> Validating megaservice..."
validate_megaservice
echo ">>>> Validating validate_megaservice_json..."
validate_megaservice_json
echo ">>>> Validating megaservice for text..."
validate_megaservice_text
echo "==========================================="
echo ">>>> Validating megaservice for multimedia..."
validate_megaservice_multimedia
echo "==========================================="
echo ">>>> Validating megaservice for long text..."
validate_megaservice_long_text
echo "==========================================="
echo ">>>> Stopping Docker containers..."

View File

@@ -13,7 +13,8 @@ echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export MAX_INPUT_TOKENS=1024
export MAX_TOTAL_TOKENS=2048
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export DOCSUM_TGI_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"

View File

@@ -14,7 +14,8 @@ echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}
export MAX_INPUT_TOKENS=2048
export MAX_TOTAL_TOKENS=4096
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
@@ -59,7 +60,7 @@ function start_services() {
sleep 3m
until [[ "$n" -ge 100 ]]; do
docker logs tgi-service > ${LOG_PATH}/tgi_service_start.log
docker logs tgi-server > ${LOG_PATH}/tgi_service_start.log
if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
break
fi
@@ -68,7 +69,32 @@ function start_services() {
done
}
function validate_services() {
get_base64_str() {
local file_name=$1
base64 -w 0 "$file_name"
}
# Function to generate input data for testing based on the document type
input_data_for_test() {
local document_type=$1
case $document_type in
("text")
echo "THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco of education week reports it takes just 10 minutes to cross through gillette wyoming this small city sits in the northeast corner of the state surrounded by 100s of miles of prairie but schools here in campbell county are on the edge of something big the next generation science standards you are going to build a strand of dna and you are going to decode it and figure out what that dna actually says for christy mathis at sage valley junior high school the new standards are about learning to think like a scientist there is a lot of really good stuff in them every standard is a performance task it is not you know the child needs to memorize these things it is the student needs to be able to do some pretty intense stuff we are analyzing we are critiquing we are."
;;
("audio")
get_base64_str "$ROOT_FOLDER/data/test.wav"
;;
("video")
get_base64_str "$ROOT_FOLDER/data/test.mp4"
;;
(*)
echo "Invalid document type" >&2
exit 1
;;
esac
}
function validate_services_json() {
local URL="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
@@ -102,115 +128,23 @@ function validate_services() {
sleep 1s
}
get_base64_str() {
local file_name=$1
base64 -w 0 "$file_name"
}
function validate_services_form() {
local URL="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
local DOCKER_NAME="$4"
local FORM_DATA1="$5"
local FORM_DATA2="$6"
local FORM_DATA3="$7"
local FORM_DATA4="$8"
local FORM_DATA5="$9"
# Function to generate input data for testing based on the document type
input_data_for_test() {
local document_type=$1
case $document_type in
("text")
echo "THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco of education week reports it takes just 10 minutes to cross through gillette wyoming this small city sits in the northeast corner of the state surrounded by 100s of miles of prairie but schools here in campbell county are on the edge of something big the next generation science standards you are going to build a strand of dna and you are going to decode it and figure out what that dna actually says for christy mathis at sage valley junior high school the new standards are about learning to think like a scientist there is a lot of really good stuff in them every standard is a performance task it is not you know the child needs to memorize these things it is the student needs to be able to do some pretty intense stuff we are analyzing we are critiquing we are."
;;
("audio")
get_base64_str "$ROOT_FOLDER/data/test.wav"
;;
("video")
get_base64_str "$ROOT_FOLDER/data/test.mp4"
;;
(*)
echo "Invalid document type" >&2
exit 1
;;
esac
}
function validate_microservices() {
# Check if the microservices are running correctly.
# whisper microservice
ulimit -s 65536
validate_services \
"${host_ip}:7066/v1/asr" \
'{"asr_result":"well"}' \
"whisper-service" \
"whisper-service" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Audio2Text service
validate_services \
"${host_ip}:9099/v1/audio/transcriptions" \
'"query":"well"' \
"dataprep-audio2text" \
"dataprep-audio2text-service" \
"{\"byte_str\": \"$(input_data_for_test "audio")\"}"
# Video2Audio service
validate_services \
"${host_ip}:7078/v1/video2audio" \
"SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//tQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASW5mbwAAAA8AAAAIAAAN3wAtLS0tLS0tLS0tLS1LS0tLS0tLS0tLS0tpaWlpaWlpaWlpaWlph4eHh4eHh4eHh4eHpaWlpaWlpaWlpaWlpcPDw8PDw8PDw8PDw+Hh4eHh4eHh4eHh4eH///////////////8AAAAATGF2YzU4LjU0AAAAAAAAAAAAAAAAJAYwAAAAAAAADd95t4qPAAAAAAAAAAAAAAAAAAAAAP/7kGQAAAMhClSVMEACMOAabaCMAREA" \
"dataprep-video2audio" \
"dataprep-video2audio-service" \
"{\"byte_str\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - video
validate_services \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text-service" \
"dataprep-multimedia2text" \
"{\"video\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - audio
validate_services \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text-service" \
"dataprep-multimedia2text" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Docsum Data service - text
validate_services \
"${host_ip}:7079/v1/multimedia2text" \
"THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco" \
"dataprep-multimedia2text-service" \
"dataprep-multimedia2text" \
"{\"text\": \"$(input_data_for_test "text")\"}"
# tgi for llm service
validate_services \
"${host_ip}:8008/generate" \
"generated_text" \
"tgi-llm" \
"tgi-service" \
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
# llm microservice
validate_services \
"${host_ip}:9000/v1/chat/docsum" \
"data: " \
"llm" \
"llm-docsum-server" \
'{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}
function validate_megaservice() {
local SERVICE_NAME="docsum-xeon-backend-server"
local DOCKER_NAME="docsum-xeon-backend-server"
local EXPECTED_RESULT="[DONE]"
local INPUT_DATA="messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
local URL="${host_ip}:8888/v1/docsum"
local DATA_TYPE="type=text"
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$DATA_TYPE" -F "$INPUT_DATA" -H 'Content-Type: multipart/form-data' "$URL")
local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$FORM_DATA1" -F "$FORM_DATA2" -F "$FORM_DATA3" -F "$FORM_DATA4" -F "$FORM_DATA5" -H 'Content-Type: multipart/form-data' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
local CONTENT=$(curl -s -X POST -F "$DATA_TYPE" -F "$INPUT_DATA" -H 'Content-Type: multipart/form-data' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
local CONTENT=$(curl -s -X POST -F "$FORM_DATA1" -F "$FORM_DATA2" -F "$FORM_DATA3" -F "$FORM_DATA4" -F "$FORM_DATA5" -H 'Content-Type: multipart/form-data' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
echo "[ $SERVICE_NAME ] Content is as expected."
@@ -227,33 +161,224 @@ function validate_megaservice() {
sleep 1s
}
function validate_megaservice_json() {
# Curl the Mega Service
echo ""
echo ">>> Checking text data with Content-Type: application/json"
validate_services \
function validate_microservices() {
# Check if the microservices are running correctly.
# tgi for llm service
validate_services_json \
"${host_ip}:8008/generate" \
"generated_text" \
"tgi" \
"tgi-server" \
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'
# llm microservice
validate_services_json \
"${host_ip}:9000/v1/chat/docsum" \
"data: " \
"llm-docsum-tgi" \
"llm-docsum-server" \
'{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
# whisper microservice
ulimit -s 65536
validate_services_json \
"${host_ip}:7066/v1/asr" \
'{"asr_result":"well"}' \
"whisper" \
"whisper-server" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Audio2Text service
validate_services_json \
"${host_ip}:9099/v1/audio/transcriptions" \
'"query":"well"' \
"dataprep-audio2text" \
"dataprep-audio2text-server" \
"{\"byte_str\": \"$(input_data_for_test "audio")\"}"
# Video2Audio service
validate_services_json \
"${host_ip}:7078/v1/video2audio" \
"SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//tQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASW5mbwAAAA8AAAAIAAAN3wAtLS0tLS0tLS0tLS1LS0tLS0tLS0tLS0tpaWlpaWlpaWlpaWlph4eHh4eHh4eHh4eHpaWlpaWlpaWlpaWlpcPDw8PDw8PDw8PDw+Hh4eHh4eHh4eHh4eH///////////////8AAAAATGF2YzU4LjU0AAAAAAAAAAAAAAAAJAYwAAAAAAAADd95t4qPAAAAAAAAAAAAAAAAAAAAAP/7kGQAAAMhClSVMEACMOAabaCMAREA" \
"dataprep-video2audio" \
"dataprep-video2audio-server" \
"{\"byte_str\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - video
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"video\": \"$(input_data_for_test "video")\"}"
# Docsum Data service - audio
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"
# Docsum Data service - text
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"THIS IS A TEST >>>> and a number of states are starting to adopt them voluntarily special correspondent john delenco" \
"dataprep-multimedia2text" \
"dataprep-multimedia2text" \
"{\"text\": \"$(input_data_for_test "text")\"}"
}
function validate_megaservice_text() {
echo ">>> Checking text data in json format"
validate_services_json \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
'{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
echo ">>> Checking audio data"
validate_services \
echo ">>> Checking text data in form format, set language=en"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
"max_tokens=32" \
"language=en" \
"stream=True"
echo ">>> Checking text data in form format, set language=zh"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=2024年9月26日北京——今日英特尔正式发布英特尔® 至强® 6性能核处理器代号Granite Rapids为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
"max_tokens=32" \
"language=zh" \
"stream=True"
echo ">>> Checking text data in form format, upload file"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/short.txt" \
"max_tokens=32" \
"language=en"
}
function validate_megaservice_multimedia() {
echo ">>> Checking audio data in json format"
validate_services_json \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"{\"type\": \"audio\", \"messages\": \"$(input_data_for_test "audio")\"}"
echo ">>> Checking video data"
validate_services \
echo ">>> Checking audio data in form format"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=audio" \
"messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
"max_tokens=32" \
"language=en" \
"stream=True"
echo ">>> Checking video data in json format"
validate_services_json \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"{\"type\": \"video\", \"messages\": \"$(input_data_for_test "video")\"}"
echo ">>> Checking video data in form format"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=video" \
"messages=\"$(input_data_for_test "video")\"" \
"max_tokens=32" \
"language=en" \
"stream=True"
}
function validate_megaservice_long_text() {
echo ">>> Checking long text data in form format, set summary_type=auto"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=auto"
echo ">>> Checking long text data in form format, set summary_type=stuff"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=stuff"
echo ">>> Checking long text data in form format, set summary_type=truncate"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=truncate"
echo ">>> Checking long text data in form format, set summary_type=map_reduce"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=map_reduce"
echo ">>> Checking long text data in form format, set summary_type=refine"
validate_services_form \
"${host_ip}:8888/v1/docsum" \
"[DONE]" \
"docsum-xeon-backend-server" \
"docsum-xeon-backend-server" \
"type=text" \
"messages=" \
"files=@$ROOT_FOLDER/data/long.txt" \
"max_tokens=128" \
"summary_type=refine"
}
function stop_docker() {
@@ -281,10 +406,16 @@ function main() {
validate_microservices
echo "==========================================="
echo ">>>> Validating megaservice..."
validate_megaservice
echo ">>>> Validating validate_megaservice_json..."
validate_megaservice_json
echo ">>>> Validating megaservice for text..."
validate_megaservice_text
echo "==========================================="
echo ">>>> Validating megaservice for multimedia..."
validate_megaservice_multimedia
echo "==========================================="
echo ">>>> Validating megaservice for long text..."
validate_megaservice_long_text
echo "==========================================="
echo ">>>> Stopping Docker containers..."

View File

@@ -122,6 +122,15 @@ docker compose up -d
-F "stream=False"
```
```bash
## enable streaming
curl http://${host_ip}:8888/v1/faqgen \
-H "Content-Type: multipart/form-data" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "stream=True"
```
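A quick way to sanity-check the streaming endpoint is to scan the SSE output for its terminator. The sketch below is only an illustration: it assumes the stream ends with a `data: [DONE]` line, which is what the megaservice validation tests in this commit expect for streamed responses; adjust the marker if your deployment emits a different terminator.

```bash
# Sketch: confirm the streamed response completes.
# Assumption: the SSE stream ends with a "data: [DONE]" line.
curl -s http://${host_ip}:8888/v1/faqgen \
  -H "Content-Type: multipart/form-data" \
  -F "messages=What is Deep Learning?" \
  -F "max_tokens=32" \
  -F "stream=True" | grep -q "\[DONE\]" && echo "stream completed" || echo "stream incomplete"
```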
Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service.
## 🚀 Launch the UI

View File

@@ -123,6 +123,15 @@ docker compose up -d
-F "stream=False"
```
```bash
## enable streaming
curl http://${host_ip}:8888/v1/faqgen \
-H "Content-Type: multipart/form-data" \
-F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
-F "max_tokens=32" \
-F "stream=True"
```
## 🚀 Launch the UI
Open this URL `http://{host_ip}:5173` in your browser to access the frontend.