Enable vLLM Gaudi support for the LLM service based on the official Habana vLLM release (#137)

Signed-off-by: tianyil1 <tianyi.liu@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Author: Tianyi Liu
Date: 2024-06-12 11:28:52 +08:00
Committed by: GitHub
Parent commit: 7f4f1b1158
Commit: 0dedc28af3
6 changed files with 91 additions and 26 deletions

@@ -134,8 +134,8 @@ The initially supported `Microservices` are described in the below table. More `
 <td>Dataprep on Xeon CPU</td>
 </tr>
 <tr>
-<td rowspan="5"><a href="./comps/llms/README.md">LLM</a></td>
-<td rowspan="5"><a href="https://www.langchain.com">LangChain</a></td>
+<td rowspan="6"><a href="./comps/llms/README.md">LLM</a></td>
+<td rowspan="6"><a href="https://www.langchain.com">LangChain</a></td>
 <td rowspan="2"><a href="https://huggingface.co/Intel/neural-chat-7b-v3-3">Intel/neural-chat-7b-v3-3</a></td>
 <td><a href="https://github.com/huggingface/tgi-gaudi">TGI Gaudi</a></td>
 <td>Gaudi2</td>
@@ -147,7 +147,7 @@ The initially supported `Microservices` are described in the below table. More `
 <td>LLM on Xeon CPU</td>
 </tr>
 <tr>
-<td rowspan="2"><a href="https://huggingface.co/meta-llama/Llama-2-7b-chat-hf">meta-llama/Llama-2-7b-chat-hf</a></td>
+<td rowspan="2"><a href="https://huggingface.co/Intel/neural-chat-7b-v3-3">Intel/neural-chat-7b-v3-3</a></td>
 <td rowspan="2"><a href="https://github.com/ray-project/ray">Ray Serve</a></td>
 <td>Gaudi2</td>
 <td>LLM on Gaudi2</td>
@@ -157,8 +157,12 @@ The initially supported `Microservices` are described in the below table. More `
 <td>LLM on Xeon CPU</td>
 </tr>
 <tr>
-<td><a href="https://huggingface.co/mistralai/Mistral-7B-v0.1">mistralai/Mistral-7B-v0.1</a></td>
-<td><a href="https://github.com/vllm-project/vllm/">vLLM</a></td>
+<td rowspan="2"><a href="https://huggingface.co/Intel/neural-chat-7b-v3-3">Intel/neural-chat-7b-v3-3</a></td>
+<td rowspan="2"><a href="https://github.com/vllm-project/vllm/">vLLM</a></td>
 <td>Gaudi2</td>
 <td>LLM on Gaudi2</td>
 </tr>
+<tr>
+<td>Xeon</td>
+<td>LLM on Xeon CPU</td>
+</tr>