Support Long context for DocSum (#1255)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
@@ -207,18 +207,19 @@ You will have the following Docker Images:
Text:

```bash
## json input
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'

# Use English mode (default).
# form input. Use English mode (default).
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
  -F "stream=True"

# Use Chinese mode.
curl http://${host_ip}:8888/v1/docsum \
@@ -227,7 +228,7 @@ You will have the following Docker Images:
  -F "messages=2024年9月26日,北京——今日,英特尔正式发布英特尔® 至强® 6性能核处理器(代号Granite Rapids),为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
  -F "max_tokens=32" \
  -F "language=zh" \
  -F "stream=true"
  -F "stream=True"

# Upload file
curl http://${host_ip}:8888/v1/docsum \
@@ -237,7 +238,6 @@ You will have the following Docker Images:
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
```

> Audio and video file uploads are not supported in DocSum with curl requests; please use the Gradio UI.
@@ -255,7 +255,7 @@ You will have the following Docker Images:
  -F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
  -F "stream=True"
```

Video:
@@ -271,7 +271,94 @@ You will have the following Docker Images:
  -F "messages=convert your video to base64 data type" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=true"
  -F "stream=True"
```
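
The audio and video requests pass the media as base64 text in the `messages` field. A minimal sketch of producing that string from a local file, assuming the standard `base64` utility is installed; the helper name `encode_b64` and the sample path are hypothetical and not part of DocSum:

```bash
# Hypothetical helper: print the base64 encoding of a file on a single line.
# Piping through tr strips the line wraps that GNU base64 inserts by default.
encode_b64() {
  base64 < "$1" | tr -d '\n'
}

# Example usage (the path is a placeholder):
# video_b64=$(encode_b64 ./sample.mp4)
# then pass it to curl as: -F "messages=${video_b64}"
```

For large files the encoded string can exceed the shell's argument length limit. In that case curl's `-F "messages=<encoded.txt"` form, which reads the field value from a file, may be a better fit.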

7. MegaService with long context

If you want to summarize long context, set the following parameters and select a suitable summary type.

- "summary_type": one of "auto", "stuff", "truncate", "map_reduce", or "refine". The default is "auto".
- "chunk_size": maximum token length of each chunk. The default value depends on "summary_type".
- "chunk_overlap": number of overlapping tokens between adjacent chunks. The default is 0.1\*chunk_size.

**summary_type=auto**

"summary_type" is "auto" by default. In this mode the input token length is checked: if it exceeds `MAX_INPUT_TOKENS`, `summary_type` is automatically set to `refine`; otherwise it is set to `stuff`.

```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=auto"
```
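
The selection rule for auto mode can be sketched as a small shell function. This only illustrates the rule as described here and is not the service's actual code; the function name `pick_summary_type` and the token counts are hypothetical, and the input token count is assumed to be known already.

```bash
# Sketch of the auto-mode rule: refine when the input exceeds
# MAX_INPUT_TOKENS, stuff otherwise.
# Arguments: <input token count> <MAX_INPUT_TOKENS>
pick_summary_type() {
  local input_tokens=$1 max_input_tokens=$2
  if [ "$input_tokens" -gt "$max_input_tokens" ]; then
    echo "refine"
  else
    echo "stuff"
  fi
}

pick_summary_type 5000 1024   # prints: refine
pick_summary_type 800 1024    # prints: stuff
```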

**summary_type=stuff**

In this mode the LLM generates the summary from the complete input text. Set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` carefully according to your model and device memory; otherwise a long input may exceed the LLM context limit and raise an error.

```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=stuff"
```
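
Before sending a request in stuff mode, a rough pre-flight check can help decide whether the input is likely to fit. The sketch below assumes the input token count is already known, and it assumes a request fits when the input is within `MAX_INPUT_TOKENS` and input plus `max_tokens` is within `MAX_TOTAL_TOKENS`. Those two conditions are an assumption modeled on how such limits are commonly enforced, not something this README states; `fits_stuff_mode` and the numbers are hypothetical.

```bash
# Hypothetical pre-flight check for stuff mode.
# Arguments: <input tokens> <max_tokens> <MAX_INPUT_TOKENS> <MAX_TOTAL_TOKENS>
# Returns success (0) only when both assumed limits are respected.
fits_stuff_mode() {
  local input_tokens=$1 max_tokens=$2 max_input=$3 max_total=$4
  [ "$input_tokens" -le "$max_input" ] &&
    [ $((input_tokens + max_tokens)) -le "$max_total" ]
}

if fits_stuff_mode 900 32 1024 2048; then
  echo "stuff mode should fit"
else
  echo "consider truncate, map_reduce, or refine"
fi
```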

**summary_type=truncate**

Truncate mode truncates the input text and keeps only the first chunk, whose length is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.

```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=truncate"
```
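
To see how much of the input survives truncation, the length formula can be evaluated directly. A minimal sketch; the function name `truncate_length` and the example limits (2048 and 1024) are hypothetical values chosen for illustration:

```bash
# Evaluate min(MAX_TOTAL_TOKENS - max_tokens - 50, MAX_INPUT_TOKENS).
# Arguments: <MAX_TOTAL_TOKENS> <MAX_INPUT_TOKENS> <max_tokens>
truncate_length() {
  local max_total=$1 max_input=$2 max_tokens=$3
  local budget=$((max_total - max_tokens - 50))
  if [ "$budget" -lt "$max_input" ]; then
    echo "$budget"
  else
    echo "$max_input"
  fi
}

truncate_length 2048 1024 32   # prints: 1024  (min(1966, 1024))
truncate_length 2048 2048 32   # prints: 1966  (min(1966, 2048))
```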

**summary_type=map_reduce**

Map_reduce mode splits the input into multiple chunks, maps each chunk to an individual summary, then consolidates those summaries into a single global summary. `streaming=True` is not allowed here.

In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.

```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=map_reduce"
```
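
Because every chunk costs one LLM call in the map step, it can be useful to estimate the chunk count up front. The sketch below assumes a plain sliding window that advances by `chunk_size - chunk_overlap` tokens, with the overlap defaulting to one tenth of `chunk_size` rounded down. The real splitter may break on text boundaries and produce a different count, so treat this as an approximation; `estimate_chunks` and the token counts are hypothetical.

```bash
# Estimate how many chunks a sliding-window split would produce.
# Arguments: <total tokens> <chunk_size> [chunk_overlap]
estimate_chunks() {
  local total_tokens=$1 chunk_size=$2
  local overlap=${3:-$((chunk_size / 10))}
  local stride=$((chunk_size - overlap))
  if [ "$total_tokens" -le "$chunk_size" ]; then
    echo 1
  else
    # One chunk for the first window, then ceil(remaining / stride) more.
    echo $((1 + (total_tokens - chunk_size + stride - 1) / stride))
  fi
}

chunks=$(estimate_chunks 5000 1024)
echo "map calls: ${chunks}, plus the reduce step"   # prints: map calls: 6, plus the reduce step
```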

**summary_type=refine**

Refine mode splits the input into multiple chunks, generates a summary for the first chunk, combines that summary with the second chunk, and loops over the remaining chunks to produce the final summary.

In this mode, the default `chunk_size` is `min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS)`.

```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=refine"
```
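
The refine default reserves `2 * max_tokens`, presumably because each step carries the previous summary in the prompt and also generates a new one, though the README does not state the reason. A minimal sketch of the calculation; `refine_chunk_size` and the example limits are hypothetical:

```bash
# Evaluate min(MAX_TOTAL_TOKENS - 2 * max_tokens - 128, MAX_INPUT_TOKENS).
# Arguments: <MAX_TOTAL_TOKENS> <MAX_INPUT_TOKENS> <max_tokens>
refine_chunk_size() {
  local max_total=$1 max_input=$2 max_tokens=$3
  local budget=$((max_total - 2 * max_tokens - 128))
  if [ "$budget" -lt "$max_input" ]; then
    echo "$budget"
  else
    echo "$max_input"
  fi
}

refine_chunk_size 2048 1024 32   # prints: 1024  (min(1856, 1024))
refine_chunk_size 2048 2048 32   # prints: 1856  (min(1856, 2048))
```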

> More detailed tests can be found in the test directory: `cd GenAIExamples/DocSum/test`