Update README.md of model/port change (#1969)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@@ -52,18 +52,29 @@ This uses the default vLLM-based deployment profile (`codegen-xeon-vllm`).
```bash
# Replace with your host's external IP address (do not use localhost or 127.0.0.1)
-export HOST_IP="your_external_ip_address"
+export host_ip="your_external_ip_address"
# Replace with your Hugging Face Hub API token
export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

# Optional: Configure proxy if needed
# export http_proxy="your_http_proxy"
# export https_proxy="your_https_proxy"
-# export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
+# export no_proxy="localhost,127.0.0.1,${host_ip}" # Add other hosts if necessary
+source ../../../set_env.sh
```
-_Note: The compose file might read additional variables from a `.env` file or expect them defined elsewhere. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
+_Note: The compose file might read additional variables from `set_env.sh`. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
+The default model,
+
+```
+export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
+```
+
+can be changed to a smaller model if needed:
+
+```
+export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
+```
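
As a quick sanity check before moving on, you can confirm the required variables are exported (a minimal sketch; variable names as used above):

```bash
# Run as a script: fail early if a required variable is missing
for var in host_ip HUGGINGFACEHUB_API_TOKEN; do
  [ -n "${!var}" ] || { echo "ERROR: $var is not set" >&2; exit 1; }
done
echo "host_ip=${host_ip}"
```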
2. **Start Services (vLLM Profile):**
@@ -91,7 +102,7 @@ The `compose.yaml` file uses Docker Compose profiles to select the LLM serving b
- **Services Deployed:** `codegen-tgi-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
- **To Run:**

```bash
-# Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
+# Ensure environment variables (host_ip, HUGGINGFACEHUB_API_TOKEN) are set
docker compose --profile codegen-xeon-tgi up -d
```
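
To verify the profile's containers came up, a quick check (service names as listed above):

```bash
# List this profile's containers and their status
docker compose --profile codegen-xeon-tgi ps
# Follow the model server logs until it reports readiness
docker compose logs -f codegen-tgi-server
```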
@@ -103,14 +114,14 @@ Key parameters are configured via environment variables set before running `dock
| Environment Variable | Description | Default (Set Externally) |
| :--- | :--- | :--- |
-| `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
+| `host_ip` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
| `HUGGINGFACEHUB_API_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
| `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within `compose.yaml` environment. | `BAAI/bge-base-en-v1.5` |
| `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`. | `http://codegen-tgi-server:80/generate` or `http://codegen-vllm-server:8000/v1/chat/completions` |
| `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in `compose.yaml`. | `http://codegen-tei-embedding-server:80/embed` |
| `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in `compose.yaml`. | `http://codegen-dataprep-server:80/dataprep` |
-| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
+| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `host_ip` and port `7778`. | `http://${host_ip}:7778/v1/codegen` |
| `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`. | N/A |
| `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required). | `""` |
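
Any of these defaults can be overridden by exporting the variable before `docker compose up`. For example, to pin a smaller model and keep the remaining defaults (a sketch; `LLM_MODEL_ID` as documented above):

```bash
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
docker compose --profile codegen-xeon-vllm up -d
```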
@@ -150,23 +161,23 @@ Check logs for specific services: `docker compose logs <service_name>`
### Run Validation Script/Commands

-Use `curl` commands to test the main service endpoints. Ensure `HOST_IP` is correctly set in your environment.
+Use `curl` commands to test the main service endpoints. Ensure `host_ip` is correctly set in your environment.

-1. **Validate LLM Serving Endpoint (Example for vLLM on default port 8000 internally, exposed differently):**
+1. **Validate LLM Serving Endpoint (Example for vLLM on default port 9000 internally, exposed differently):**

```bash
# This command structure targets the OpenAI-compatible vLLM endpoint
-curl http://${HOST_IP}:8000/v1/chat/completions \
+curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
-  -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
+  -d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
```

- **Expected Output:** A JSON response with generated code in `choices[0].message.content`.
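
To pull just the generated code out of that JSON response, the output can be piped through `jq` (a sketch, assuming `jq` is installed on the host):

```bash
curl -s http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}' \
  | jq -r '.choices[0].message.content'
```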
2. **Validate CodeGen Gateway (MegaService on default port 7778):**

```bash
-curl http://${HOST_IP}:7778/v1/codegen \
+curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Write a Python function that adds two numbers."}'
```
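
When only liveness matters, checking the HTTP status code is a lighter probe (standard `curl` options; `200` means the gateway accepted the request):

```bash
curl -s -o /dev/null -w "%{http_code}\n" http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Write a Python function that adds two numbers."}'
```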
@@ -179,8 +190,8 @@ Multiple UI options can be configured via the `compose.yaml`.
### Gradio UI (Default)

Access the default Gradio UI by navigating to:
-`http://{HOST_IP}:8080`
-_(Port `8080` is the default host mapping for `codegen-gradio-ui-server`)_
+`http://{host_ip}:5173`
+_(Port `5173` is the default host mapping for `codegen-gradio-ui-server`)_

-![Gradio UI](../../../../assets/img/codegen_ui_gradio.png)
+![Gradio UI](../../../../assets/img/CodeGen-UI.png)
@@ -189,7 +200,7 @@ _(Port `8080` is the default host mapping for `codegen-gradio-ui-server`)_
1. Modify `compose.yaml`: Comment out the `codegen-gradio-ui-server` service and uncomment/add the `codegen-xeon-ui-server` (Svelte) service definition, ensuring the port mapping is correct (e.g., `"- 5173:5173"`).
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
-3. Access: `http://{HOST_IP}:5173` (or the host port you mapped).
+3. Access: `http://{host_ip}:5173` (or the host port you mapped).

![Svelte UI Init](../../../../assets/img/codeGen_ui_init.jpg)
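
After editing `compose.yaml`, only the UI service needs recreating (a sketch; the profile name depends on which backend you deployed):

```bash
# Recreate just the swapped UI service, leaving the rest of the stack running
docker compose --profile codegen-xeon-vllm up -d --force-recreate codegen-xeon-ui-server
```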
@@ -197,7 +208,7 @@ _(Port `8080` is the default host mapping for `codegen-gradio-ui-server`)_
1. Modify `compose.yaml`: Comment out the default UI service and uncomment/add the `codegen-xeon-react-ui-server` definition, ensuring correct port mapping (e.g., `"- 5174:80"`).
2. Restart Docker Compose: `docker compose --profile <profile_name> up -d`
-3. Access: `http://{HOST_IP}:5174` (or the host port you mapped).
+3. Access: `http://{host_ip}:5174` (or the host port you mapped).

![React UI](../../../../assets/img/codegen_react.png)
@@ -207,7 +218,7 @@ Users can interact with the backend service using the `Neural Copilot` VS Code e
1. **Install:** Find and install `Neural Copilot` from the VS Code Marketplace.
   ![Install Copilot](../../../../assets/img/codegen_copilot.png)
-2. **Configure:** Set the "Service URL" in the extension settings to your CodeGen backend endpoint: `http://${HOST_IP}:7778/v1/codegen` (use the correct port if changed).
+2. **Configure:** Set the "Service URL" in the extension settings to your CodeGen backend endpoint: `http://${host_ip}:7778/v1/codegen` (use the correct port if changed).
   ![Configure Endpoint](../../../../assets/img/codegen_endpoint.png)
3. **Usage:**
   - **Inline Suggestion:** Type a comment describing the code you want (e.g., `# Python function to read a file`) and wait for suggestions.
@@ -218,7 +229,7 @@ Users can interact with the backend service using the `Neural Copilot` VS Code e
## Troubleshooting

- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`. Ensure internet connectivity or correct proxy settings. Check logs of `tgi-service`/`vllm-service` and `tei-embedding-server`. Gated models need prior Hugging Face access.
-- **Connection Errors:** Verify `HOST_IP` is correct and accessible. Check `docker ps` for port mappings. Ensure `no_proxy` includes `HOST_IP` if using a proxy. Check logs of the service failing to connect (e.g., `codegen-backend-server` logs if it can't reach `codegen-llm-server`).
+- **Connection Errors:** Verify `host_ip` is correct and accessible. Check `docker ps` for port mappings. Ensure `no_proxy` includes `host_ip` if using a proxy. Check logs of the service failing to connect (e.g., `codegen-backend-server` logs if it can't reach `codegen-llm-server`).
- **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in `compose.yaml`.
- **Resource Issues:** CodeGen models can be memory-intensive. Monitor host RAM usage. Increase Docker resources if needed.
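
A few generic diagnostics cover most of the cases above (a sketch; service names as used in this guide):

```bash
# Port mappings and container state at a glance
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
# Recent logs from a service that fails to connect
docker compose logs --tail 100 codegen-backend-server
# Host memory headroom (CodeGen models are memory-hungry)
free -h
```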
@@ -53,18 +53,29 @@ This uses the default vLLM-based deployment profile (`codegen-gaudi-vllm`).
```bash
# Replace with your host's external IP address (do not use localhost or 127.0.0.1)
-export HOST_IP="your_external_ip_address"
+export host_ip="your_external_ip_address"
# Replace with your Hugging Face Hub API token
export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token"

# Optional: Configure proxy if needed
# export http_proxy="your_http_proxy"
# export https_proxy="your_https_proxy"
-# export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
+# export no_proxy="localhost,127.0.0.1,${host_ip}" # Add other hosts if necessary
+source ../../../set_env.sh
```
-_Note: Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
+_Note: The compose file might read additional variables from `set_env.sh`. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
+The default model,
+
+```
+export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-32B-Instruct"
+```
+
+can be changed to a smaller model if needed:
+
+```
+export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"
+```
2. **Start Services (vLLM Profile):**
@@ -94,7 +105,7 @@ The `compose.yaml` file uses Docker Compose profiles to select the LLM serving b
- **Other Services:** Same CPU-based services as the vLLM profile.
- **To Run:**

```bash
-# Ensure environment variables (HOST_IP, HUGGINGFACEHUB_API_TOKEN) are set
+# Ensure environment variables (host_ip, HUGGINGFACEHUB_API_TOKEN) are set
docker compose --profile codegen-gaudi-tgi up -d
```
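
Before bringing a Gaudi profile up, it can help to confirm the accelerators and the Habana container runtime are visible (a sketch, assuming the Habana host tools are installed):

```bash
# List Gaudi devices and their utilization
hl-smi
# Confirm Docker registered the habana runtime referenced in compose.yaml
docker info | grep -i runtime
```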
@@ -106,14 +117,14 @@ Key parameters are configured via environment variables set before running `dock
| Environment Variable | Description | Default (Set Externally) |
| :--- | :--- | :--- |
-| `HOST_IP` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
+| `host_ip` | External IP address of the host machine. **Required.** | `your_external_ip_address` |
| `HUGGINGFACEHUB_API_TOKEN` | Your Hugging Face Hub token for model access. **Required.** | `your_huggingface_token` |
-| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
+| `LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM (used by TGI/vLLM service). Configured within `compose.yaml` environment. | `Qwen/Qwen2.5-Coder-32B-Instruct` |
| `EMBEDDING_MODEL_ID` | Hugging Face model ID for the embedding model (used by TEI service). Configured within `compose.yaml` environment. | `BAAI/bge-base-en-v1.5` |
| `LLM_ENDPOINT` | Internal URL for the LLM serving endpoint (used by `codegen-llm-server`). Configured in `compose.yaml`. | `http://codegen-tgi-server:80/generate` or `http://codegen-vllm-server:8000/v1/chat/completions` |
| `TEI_EMBEDDING_ENDPOINT` | Internal URL for the Embedding service. Configured in `compose.yaml`. | `http://codegen-tei-embedding-server:80/embed` |
| `DATAPREP_ENDPOINT` | Internal URL for the Data Preparation service. Configured in `compose.yaml`. | `http://codegen-dataprep-server:80/dataprep` |
-| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `HOST_IP` and port `7778`. | `http://${HOST_IP}:7778/v1/codegen` |
+| `BACKEND_SERVICE_ENDPOINT` | External URL for the CodeGen Gateway (MegaService). Derived from `host_ip` and port `7778`. | `http://${host_ip}:7778/v1/codegen` |
| `*_PORT` (Internal) | Internal container ports (e.g., `80`, `6379`). Defined in `compose.yaml`. | N/A |
| `http_proxy` / `https_proxy` / `no_proxy` | Network proxy settings (if required). | `""` |
@@ -170,21 +181,21 @@ Check logs: `docker compose logs <service_name>`. Pay attention to `vllm-gaudi-s
### Run Validation Script/Commands

-Use `curl` commands targeting the main service endpoints. Ensure `HOST_IP` is correctly set.
+Use `curl` commands targeting the main service endpoints. Ensure `host_ip` is correctly set.

-1. **Validate LLM Serving Endpoint (Example for vLLM on default port 8000 internally, exposed differently):**
+1. **Validate LLM Serving Endpoint (Example for vLLM on default port 9000 internally, exposed differently):**

```bash
# This command structure targets the OpenAI-compatible vLLM endpoint
-curl http://${HOST_IP}:8000/v1/chat/completions \
+curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
-  -d '{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
+  -d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct", "messages": [{"role": "user", "content": "Implement a basic Python class"}], "max_tokens":32}'
```

2. **Validate CodeGen Gateway (MegaService, default host port 7778):**

```bash
-curl http://${HOST_IP}:7778/v1/codegen \
+curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Implement a sorting algorithm in Python."}'
```
@@ -197,8 +208,8 @@ UI options are similar to the Xeon deployment.
### Gradio UI (Default)

Access the default Gradio UI:
-`http://{HOST_IP}:8080`
-_(Port `8080` is the default host mapping)_
+`http://{host_ip}:5173`
+_(Port `5173` is the default host mapping)_

![Gradio UI](../../../../assets/img/codegen_gradio_ui_main.png)
@@ -206,17 +217,17 @@ _(Port `8080` is the default host mapping)_
1. Modify `compose.yaml`: Swap Gradio service for Svelte (`codegen-gaudi-ui-server`), check port map (e.g., `5173:5173`).
2. Restart: `docker compose --profile <profile_name> up -d`
-3. Access: `http://{HOST_IP}:5173`
+3. Access: `http://{host_ip}:5173`

### React UI (Optional)

1. Modify `compose.yaml`: Swap Gradio service for React (`codegen-gaudi-react-ui-server`), check port map (e.g., `5174:80`).
2. Restart: `docker compose --profile <profile_name> up -d`
-3. Access: `http://{HOST_IP}:5174`
+3. Access: `http://{host_ip}:5174`

### VS Code Extension (Optional)

-Use the `Neural Copilot` extension configured with the CodeGen backend URL: `http://${HOST_IP}:7778/v1/codegen`. (See Xeon README for detailed setup screenshots).
+Use the `Neural Copilot` extension configured with the CodeGen backend URL: `http://${host_ip}:7778/v1/codegen`. (See Xeon README for detailed setup screenshots).

## Troubleshooting
@@ -226,7 +237,7 @@ Use the `Neural Copilot` extension configured with the CodeGen backend URL: `htt
- Verify `runtime: habana` and volume mounts in `compose.yaml`.
- Gaudi initialization can take significant time and memory. Monitor resource usage.
- **Model Download Issues:** Check `HUGGINGFACEHUB_API_TOKEN`, internet access, proxy settings. Check LLM service logs.
-- **Connection Errors:** Verify `HOST_IP`, ports, and proxy settings. Use `docker ps` and check service logs.
+- **Connection Errors:** Verify `host_ip`, ports, and proxy settings. Use `docker ps` and check service logs.

## Stopping the Application
@@ -2,7 +2,11 @@
DocRetriever is one of the most widely adopted use cases, leveraging different methodologies to match a user query against a set of free-text records. DocRetriever is essential to RAG systems, which bridge the knowledge gap by dynamically fetching relevant information from external sources, ensuring that generated responses remain factual and current. At the core of this architecture are vector databases, which enable efficient, semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.

-## 1. Build Images for necessary microservices. (Optional after docker image release)
+_Note: As the related docker images were published to Docker Hub, you can skip steps 1 and 2 below and start directly from step 3._
+
+## 1. Build Images for necessary microservices. (Optional)

- Embedding TEI Image
@@ -30,7 +34,7 @@ DocRetriever are the most widely adopted use case for leveraging the different m
docker build -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
```

-## 2. Build Images for MegaService
+## 2. Build Images for MegaService (Optional)

```bash
cd ..
@@ -44,6 +48,19 @@ docker build --no-cache -t opea/doc-index-retriever:latest --build-arg https_pro
```bash
export host_ip="YOUR IP ADDR"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
```

+Set environment variables by running:
+
+```
+cd GenAIExamples/DocIndexRetriever/docker_compose/intel/cpu/xeon
+source set_env.sh
+```
+
+_Note: `set_env.sh` sets all required variables. Ensure variables such as ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
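
As an alternative, the same variables can be set manually, as shown below. Either way, once the services are up, a quick check that the TEI embedding endpoint responds (a sketch, assuming TEI's standard `/embed` route on the port from `TEI_EMBEDDING_ENDPOINT`):

```bash
curl http://${host_ip}:6006/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?"}'
```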
+Or set environment variables manually:

```
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"