Support Long context for DocSum (#1255)

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
2024-12-20 19:17:10 +08:00
parent 05365b6140
commit 50dd959d60
15 changed files with 861 additions and 267 deletions
--- a/DocSum/docker_compose/intel/hpu/gaudi/README.md
+++ b/DocSum/docker_compose/intel/hpu/gaudi/README.md
@@ -207,18 +207,19 @@ You will have the following Docker Images:
   Text:

   ```bash
+   # JSON input
   curl -X POST http://${host_ip}:8888/v1/docsum \
        -H "Content-Type: application/json" \
        -d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'

-   # Use English mode (default).
+   # Form input. Use English mode (default).
   curl http://${host_ip}:8888/v1/docsum \
       -H "Content-Type: multipart/form-data" \
       -F "type=text" \
       -F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
       -F "max_tokens=32" \
       -F "language=en" \
-       -F "stream=true"
+       -F "stream=True"

   # Use Chinese mode.
   curl http://${host_ip}:8888/v1/docsum \
@@ -227,7 +228,7 @@ You will have the following Docker Images:
       -F "messages=2024年9月26日，北京——今日，英特尔正式发布英特尔® 至强® 6性能核处理器（代号Granite Rapids），为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
       -F "max_tokens=32" \
       -F "language=zh" \
-       -F "stream=true"
+       -F "stream=True"

   # Upload file
   curl http://${host_ip}:8888/v1/docsum \
@@ -237,7 +238,6 @@ You will have the following Docker Images:
      -F "files=@/path to your file (.txt, .docx, .pdf)" \
      -F "max_tokens=32" \
-      -F "language=en" \
-      -F "stream=true"
+      -F "language=en"
   ```

   > Audio and Video file uploads are not supported in docsum with curl request, please use the Gradio-UI.
@@ -255,7 +255,7 @@ You will have the following Docker Images:
      -F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
      -F "max_tokens=32" \
      -F "language=en" \
-      -F "stream=true"
+      -F "stream=True"
   ```

   Video:
@@ -271,7 +271,94 @@ You will have the following Docker Images:
      -F "messages=convert your video to base64 data type" \
      -F "max_tokens=32" \
      -F "language=en" \
-      -F "stream=true"
+      -F "stream=True"
+   ```
+
+7. MegaService with long context
+
+   To handle long context, set the following parameters and select a suitable summary type.
+
+   - "summary_type": one of "auto", "stuff", "truncate", "map_reduce", or "refine"; the default is "auto".
+   - "chunk_size": maximum token length of each chunk. The default value depends on "summary_type".
+   - "chunk_overlap": number of overlapping tokens between adjacent chunks; the default is 0.1\*chunk_size.
+
+   **summary_type=auto**
+
+   "summary_type" defaults to "auto". In this mode the input token length is checked: if it exceeds `MAX_INPUT_TOKENS`, `summary_type` is automatically set to `refine`; otherwise it is set to `stuff`.
+
+   ```bash
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=text" \
+      -F "messages=" \
+      -F "max_tokens=32" \
+      -F "files=@/path to your file (.txt, .docx, .pdf)" \
+      -F "language=en" \
+      -F "summary_type=auto"
+   ```
+
+   **summary_type=stuff**
+
+   In this mode the LLM generates the summary from the complete input text. Set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` carefully according to your model and device memory; otherwise a long input may exceed the LLM context limit and raise an error.
+
+   ```bash
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=text" \
+      -F "messages=" \
+      -F "max_tokens=32" \
+      -F "files=@/path to your file (.txt, .docx, .pdf)" \
+      -F "language=en" \
+      -F "summary_type=stuff"
+   ```
+
+   **summary_type=truncate**
+
+   Truncate mode truncates the input text and keeps only the first chunk, whose length is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
+
+   ```bash
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=text" \
+      -F "messages=" \
+      -F "max_tokens=32" \
+      -F "files=@/path to your file (.txt, .docx, .pdf)" \
+      -F "language=en" \
+      -F "summary_type=truncate"
+   ```
+
+   **summary_type=map_reduce**
+
+   Map_reduce mode splits the input into multiple chunks, maps each chunk to an individual summary, then consolidates those summaries into a single global summary. `streaming=True` is not allowed here.
+
+   In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
+
+   ```bash
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=text" \
+      -F "messages=" \
+      -F "max_tokens=32" \
+      -F "files=@/path to your file (.txt, .docx, .pdf)" \
+      -F "language=en" \
+      -F "summary_type=map_reduce"
+   ```
+
+   **summary_type=refine**
+
+   Refine mode splits the input into multiple chunks, generates a summary for the first chunk, combines that summary with the second chunk, and repeats over the remaining chunks to produce the final summary.
+
+   In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS)`.
+
+   ```bash
+   curl http://${host_ip}:8888/v1/docsum \
+      -H "Content-Type: multipart/form-data" \
+      -F "type=text" \
+      -F "messages=" \
+      -F "max_tokens=32" \
+      -F "files=@/path to your file (.txt, .docx, .pdf)" \
+      -F "language=en" \
+      -F "summary_type=refine"
   ```

 > More detailed tests can be found here `cd GenAIExamples/DocSum/test`
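The mode selection and default `chunk_size` rules documented in the hunk above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the function names `resolve_summary_type`, `default_chunk_size`, and `default_chunk_overlap` are hypothetical, the limits are passed in as plain integers rather than read from `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS`, and rounding the overlap down to an integer is an assumption. The README states no default `chunk_size` for `stuff`, so the sketch raises an error for it.

```python
def resolve_summary_type(summary_type: str, input_tokens: int, max_input_tokens: int) -> str:
    """Resolve "auto" to a concrete mode; other modes pass through unchanged."""
    if summary_type != "auto":
        return summary_type
    # Per the README: inputs longer than MAX_INPUT_TOKENS use refine, others use stuff.
    return "refine" if input_tokens > max_input_tokens else "stuff"


def default_chunk_size(summary_type: str, max_tokens: int,
                       max_input_tokens: int, max_total_tokens: int) -> int:
    """Default chunk size for the modes whose formula the README gives."""
    if summary_type in ("truncate", "map_reduce"):
        return min(max_total_tokens - max_tokens - 50, max_input_tokens)
    if summary_type == "refine":
        return min(max_total_tokens - 2 * max_tokens - 128, max_input_tokens)
    raise ValueError(f"no default chunk_size documented for {summary_type!r}")


def default_chunk_overlap(chunk_size: int) -> int:
    """0.1 * chunk_size, rounded down (the rounding is an assumption)."""
    return chunk_size // 10


if __name__ == "__main__":
    # Hypothetical limits chosen only for illustration.
    mode = resolve_summary_type("auto", input_tokens=5000, max_input_tokens=2000)
    size = default_chunk_size(mode, max_tokens=32, max_input_tokens=2000, max_total_tokens=2048)
    print(mode, size, default_chunk_overlap(size))
```

With these illustrative limits, the refine budget `2048 - 2 * 32 - 128` is smaller than `MAX_INPUT_TOKENS`, so the total-token limit is what caps the chunk size. With a smaller `MAX_INPUT_TOKENS`, the input limit would apply instead.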