Deploy CodeGen Application on AMD GPU (ROCm) with Docker Compose
This README provides instructions for deploying the CodeGen application using Docker Compose on a system equipped with AMD GPUs supporting ROCm, detailing the steps to configure, run, and validate the services. This guide defaults to using the vLLM backend for LLM serving.
Table of Contents
- Steps to Run with Docker Compose (Default vLLM)
- Service Overview
- Available Deployment Options
- Configuration Parameters and Usage
- Building Docker Images Locally (Optional)
- Validate Service Health
- How to Open the UI
- Troubleshooting
- Stopping the Application
- Next Steps
Steps to Run with Docker Compose (Default vLLM)
This section assumes you are using pre-built images and targets the default vLLM deployment.
- Set Deploy Environment Variables:

  - Go to the Docker Compose directory:

    ```bash
    # Adjust path if your GenAIExamples clone is located elsewhere
    cd GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
    ```

  - Set the `HUGGINGFACEHUB_API_TOKEN` variable in the operating system environment:

    ```bash
    ### Replace the string 'your_huggingfacehub_token' with your Hugging Face Hub repository access token.
    export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
    ```

  - Edit the environment script for the vLLM deployment (`set_env_vllm.sh`):

    ```bash
    nano set_env_vllm.sh
    ```

    Configure `HOST_IP`, `EXTERNAL_HOST_IP`, `*_PORT` variables, and proxies (`http_proxy`, `https_proxy`, `no_proxy`) as described in the Configuration section below.

  - Source the environment variables:

    ```bash
    . set_env_vllm.sh
    ```
- Start the Services (vLLM):

  ```bash
  docker compose -f compose_vllm.yaml up -d
  ```

- Verify: After allowing time for the services to start, confirm the containers are up (see the quick check below), then proceed to the Validate Service Health section.
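Before running the full health checks, a quick way to confirm the containers came up and the model has finished loading is to inspect the compose status and logs. This is a minimal sketch; the service name `codegen-vllm-service` matches the Service Overview table below.

```bash
# List the containers started from compose_vllm.yaml with their state and ports
docker compose -f compose_vllm.yaml ps

# Follow the vLLM serving logs; the first start may take several minutes
# while model weights are downloaded from Hugging Face
docker compose -f compose_vllm.yaml logs -f codegen-vllm-service
```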
Service Overview
When using the default compose_vllm.yaml (vLLM-based), the following services are deployed:
| Service Name | Default Port (Host) | Internal Port | Purpose |
|---|---|---|---|
| codegen-vllm-service | `${CODEGEN_VLLM_SERVICE_PORT}` (e.g., 8028) | 8000 | LLM Serving (vLLM on ROCm) |
| codegen-llm-server | `${CODEGEN_LLM_SERVICE_PORT}` (e.g., 9000) | 80 | LLM Microservice Wrapper |
| codegen-backend-server | `${CODEGEN_BACKEND_SERVICE_PORT}` (e.g., 7778) | 80 | CodeGen MegaService/Gateway |
| codegen-ui-server | `${CODEGEN_UI_SERVICE_PORT}` (e.g., 5173) | 80 | Frontend User Interface |
(Note: Ports are configurable via `set_env_vllm.sh`; check the script for the actual defaults used.)
(Note: The TGI deployment (`compose.yaml`) uses `codegen-tgi-service` instead of `codegen-vllm-service`.)
Available Deployment Options
This directory provides different Docker Compose files:
compose_vllm.yaml (vLLM - Default)
- Description: Deploys the CodeGen application using vLLM optimized for ROCm as the backend LLM service. This is the default setup.
- Services Deployed: `codegen-vllm-service`, `codegen-llm-server`, `codegen-backend-server`, `codegen-ui-server`. Requires `set_env_vllm.sh`.
compose.yaml (TGI)
- Description: Deploys the CodeGen application using Text Generation Inference (TGI) optimized for ROCm as the backend LLM service.
- Services Deployed: `codegen-tgi-service`, `codegen-llm-server`, `codegen-backend-server`, `codegen-ui-server`. Requires `set_env.sh`.
Configuration Parameters and Usage
Docker Compose GPU Configuration
To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose files (compose.yaml, compose_vllm.yaml) for the LLM serving container:
```yaml
# Example for vLLM service in compose_vllm.yaml
# Note: Modern docker compose might use deploy.resources syntax instead.
# Check your docker version and compose file.
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/:/dev/dri/
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```
This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderDN` device IDs (e.g., `/dev/dri/card0:/dev/dri/card0`, `/dev/dri/renderD128:/dev/dri/renderD128`). For example:
```yaml
shm_size: 1g
devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
  - SYS_PTRACE
group_add:
  - video
security_opt:
  - seccomp:unconfined
```
How to Identify GPU Device IDs:
Use AMD GPU driver utilities or the device nodes under `/dev/dri` to determine the correct `cardN` and `renderDN` IDs for your GPU, as sketched below.
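One way to map a physical GPU to its device nodes, assuming the ROCm utilities are installed on the host:

```bash
# Each GPU exposes a cardN and a renderDN node under /dev/dri
ls -l /dev/dri/

# The by-path symlinks map PCI bus addresses to card/render nodes
ls -l /dev/dri/by-path/

# Cross-reference with the GPUs reported by the ROCm stack
rocm-smi
```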
Environment Variables (set_env*.sh)
These scripts (set_env_vllm.sh for vLLM, set_env.sh for TGI) configure crucial parameters passed to the containers.
| Environment Variable | Description | Example Value (Edit in Script) |
|---|---|---|
| `HUGGINGFACEHUB_API_TOKEN` | Your Hugging Face Hub token for model access. Required. | `your_huggingfacehub_token` |
| `HOST_IP` | Internal/primary IP address of the host machine. Used for inter-service communication. Required. | `192.168.1.100` |
| `EXTERNAL_HOST_IP` | External IP/hostname used to access the UI from outside. Same as `HOST_IP` if no proxy/LB. Required. | `192.168.1.100` |
| `CODEGEN_LLM_MODEL_ID` | Hugging Face model ID for the CodeGen LLM. | `Qwen/Qwen2.5-Coder-7B-Instruct` |
| `CODEGEN_VLLM_SERVICE_PORT` | Host port mapping for the vLLM serving endpoint (in `set_env_vllm.sh`). | `8028` |
| `CODEGEN_TGI_SERVICE_PORT` | Host port mapping for the TGI serving endpoint (in `set_env.sh`). | `8028` |
| `CODEGEN_LLM_SERVICE_PORT` | Host port mapping for the LLM Microservice wrapper. | `9000` |
| `CODEGEN_BACKEND_SERVICE_PORT` | Host port mapping for the CodeGen MegaService/Gateway. | `7778` |
| `CODEGEN_UI_SERVICE_PORT` | Host port mapping for the UI service. | `5173` |
| `http_proxy` | Network HTTP proxy URL (if required). | `Your_HTTP_Proxy` |
| `https_proxy` | Network HTTPS proxy URL (if required). | `Your_HTTPs_Proxy` |
| `no_proxy` | Comma-separated list of hosts to bypass the proxy. Should include `localhost,127.0.0.1,$HOST_IP`. | `localhost,127.0.0.1` |
How to Use: Edit the relevant `set_env*.sh` file (`set_env_vllm.sh` for the default) with your values, then source it (e.g., `. ./set_env_vllm.sh`) before running `docker compose`.
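For orientation, here is a minimal sketch of the kind of exports such a script contains; the variable names come from the table above, but the concrete values are placeholders and the real `set_env_vllm.sh` in the repository defines the authoritative list:

```bash
# Hypothetical excerpt of set_env_vllm.sh -- adjust every value to your environment
export HOST_IP='192.168.1.100'           # primary IP of the host machine
export EXTERNAL_HOST_IP='192.168.1.100'  # address used to reach the UI from outside
export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token'
export CODEGEN_LLM_MODEL_ID='Qwen/Qwen2.5-Coder-7B-Instruct'
export CODEGEN_VLLM_SERVICE_PORT=8028
export CODEGEN_LLM_SERVICE_PORT=9000
export CODEGEN_BACKEND_SERVICE_PORT=7778
export CODEGEN_UI_SERVICE_PORT=5173
export no_proxy="localhost,127.0.0.1,${HOST_IP}"
```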
Building Docker Images Locally (Optional)
Follow these steps if you need to build the Docker images from source instead of using pre-built ones.
1. Setup Build Environment
- Create the application install directory and go to it:

  ```bash
  mkdir ~/codegen-install && cd ~/codegen-install
  ```
2. Clone Repositories
- Clone the GenAIExamples repository (the default branch "main" is used here):

  ```bash
  git clone https://github.com/opea-project/GenAIExamples.git
  ```

  If you need a specific branch/tag of the GenAIExamples repository (replace v1.3 with the desired version):

  ```bash
  git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3
  ```

  Note: when using a specific version of the code, use the README from that version.
- Go to the build directory:

  ```bash
  cd ~/codegen-install/GenAIExamples/CodeGen/docker_image_build
  ```

- Clean up the GenAIComps repository if it was previously cloned into this directory. This is necessary if a build was performed earlier and the GenAIComps folder exists and is not empty:

  ```bash
  rm -rf GenAIComps
  ```
- Clone the GenAIComps repository (the default branch "main" is used here):

  ```bash
  git clone https://github.com/opea-project/GenAIComps.git
  ```

  If you use a specific tag of the GenAIExamples repository, use the corresponding tag for GenAIComps (replace v1.3 with the desired version):

  ```bash
  git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3
  ```

  Note: when using a specific version of the code, use the README from that version.
3. Select Services and Build
- Set the list of images for the build (from the build file `build.yaml`).

  Select the services corresponding to your desired deployment (vLLM is the default):

  vLLM-based application (Default):

  ```bash
  service_list="vllm-rocm llm-textgen codegen codegen-ui"
  ```

  TGI-based application:

  ```bash
  service_list="llm-textgen codegen codegen-ui"
  ```
- Optional: Pull the TGI Docker image (do this if you plan to build/use the TGI variant):

  ```bash
  docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
  ```
- Build Docker Images:

  Ensure you are in the `~/codegen-install/GenAIExamples/CodeGen/docker_image_build` directory.

  ```bash
  docker compose -f build.yaml build ${service_list} --no-cache
  ```

  After the build, check the list of images with the command:

  ```bash
  docker image ls
  ```

  The list of images should include (depending on `service_list`):

  vLLM-based application:
- opea/vllm-rocm:latest
- opea/llm-textgen:latest
- opea/codegen:latest
- opea/codegen-ui:latest
TGI-based application:
- ghcr.io/huggingface/text-generation-inference:2.3.1-rocm (if pulled)
- opea/llm-textgen:latest
- opea/codegen:latest
- opea/codegen-ui:latest
After building, ensure the `image:` tags in the main `compose_vllm.yaml` or `compose.yaml` (in the `amd/gpu/rocm` directory) match these built images (e.g., `opea/vllm-rocm:latest`); a quick cross-check is sketched below.
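To compare the image references declared in the compose file with what was actually built, one option (assuming the repository layout used above, with the compose files under `../docker_compose/amd/gpu/rocm` relative to the build directory) is:

```bash
# Image references declared in the vLLM compose file
grep -n 'image:' ../docker_compose/amd/gpu/rocm/compose_vllm.yaml

# Locally built / pulled images relevant to this deployment
docker image ls | grep -E 'opea/|text-generation-inference'
```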
Validate Service Health
Run these checks after starting the services to ensure they are operational. Focus on the vLLM checks first as it's the default.
1. Validate the vLLM/TGI Service
If you use vLLM (Default - using compose_vllm.yaml and set_env_vllm.sh)
- How Tested: Send a POST request with a sample prompt to the vLLM endpoint.

- CURL Command:

  ```bash
  DATA='{"model": "Qwen/Qwen2.5-Coder-7B-Instruct", '\
  '"messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. '\
  'The API takes as input an operation request and updates the TODO list in place. '\
  'If the request is invalid, raise an exception."}], "max_tokens": 256}'

  curl http://${HOST_IP}:${CODEGEN_VLLM_SERVICE_PORT}/v1/chat/completions \
    -X POST \
    -d "$DATA" \
    -H 'Content-Type: application/json'
  ```
- Sample Output:

  ```json
  {
    "id": "chatcmpl-142f34ef35b64a8db3deedd170fed951",
    "object": "chat.completion"
    // ... (rest of output) ...
  }
  ```
- Expected Result: A JSON response with a `choices[0].message.content` field containing meaningful generated code (see the extraction sketch below).
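If `jq` is installed on the host (an assumption, it is not part of this deployment), the generated code can be extracted from the response directly, reusing the `DATA` payload defined above:

```bash
curl -s http://${HOST_IP}:${CODEGEN_VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.choices[0].message.content'
```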
If you use TGI (using compose.yaml and set_env.sh)
- How Tested: Send a POST request with a sample prompt to the TGI endpoint.

- CURL Command:

  ```bash
  DATA='{"inputs":"Implement a high-level API for a TODO list application. '\
  # ... (data payload as before) ...
  '"parameters":{"max_new_tokens":256,"do_sample": true}}'

  curl http://${HOST_IP}:${CODEGEN_TGI_SERVICE_PORT}/generate \
    -X POST \
    -d "$DATA" \
    -H 'Content-Type: application/json'
  ```
- Sample Output:

  ```json
  {
    "generated_text": " The supported operations are \"add_task\", \"complete_task\", and \"remove_task\". # ... (generated code) ..."
  }
  ```

- Expected Result: A JSON response with a `generated_text` field containing meaningful generated code.
2. Validate the LLM Service
- Service Name: `codegen-llm-server`

- How Tested: Send a POST request to the LLM microservice wrapper endpoint.

- CURL Command:

  ```bash
  DATA='{"query":"Implement a high-level API for a TODO list application. '\
  # ... (data payload as before) ...
  '"repetition_penalty":1.03,"stream":false}'

  curl http://${HOST_IP}:${CODEGEN_LLM_SERVICE_PORT}/v1/chat/completions \
    -X POST \
    -d "$DATA" \
    -H 'Content-Type: application/json'
  ```
- Sample Output: (structure may vary slightly depending on whether vLLM or TGI is the backend)

  ```json
  {
    "id": "cmpl-4e89a590b1af46bfb37ce8f12b2996f8" // Example ID
    // ... (output structure depends on backend, check original validation) ...
  }
  ```

- Expected Result: A JSON response containing meaningful generated code within the `choices` array.
3. Validate the MegaService (Backend)
- Service Name: `codegen-backend-server`

- How Tested: Send a POST request to the main CodeGen gateway endpoint.

- CURL Command:

  ```bash
  DATA='{"messages": "Implement a high-level API for a TODO list application. '\
  # ... (data payload as before) ...
  'If the request is invalid, raise an exception."}'

  curl http://${HOST_IP}:${CODEGEN_BACKEND_SERVICE_PORT}/v1/codegen \
    -H "Content-Type: application/json" \
    -d "$DATA"
  ```
- Sample Output:

  ```
  data: {"id":"cmpl-...", ...}
  # ... more data chunks ...
  data: [DONE]
  ```
- Expected Result: A stream of server-sent events (SSE) containing JSON data with generated code tokens, ending with `data: [DONE]` (a streaming variant of the command is sketched below).
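Because the gateway streams its answer as SSE, adding `curl`'s `-N` (no-buffer) flag to the command above makes the tokens visible as they arrive, which is a convenient way to confirm streaming works end to end:

```bash
# Reuses the DATA payload defined for the gateway check above
curl -N http://${HOST_IP}:${CODEGEN_BACKEND_SERVICE_PORT}/v1/codegen \
  -H "Content-Type: application/json" \
  -d "$DATA"
```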
4. Validate the Frontend (UI)
- Service Name: `codegen-ui-server`
- How Tested: Access the UI URL in a web browser and perform a test query.
- Steps: See How to Open the UI.
- Expected Result: The UI loads correctly, and submitting a prompt results in generated code displayed on the page.
How to Open the UI
- Determine the UI access URL using the `EXTERNAL_HOST_IP` and `CODEGEN_UI_SERVICE_PORT` variables defined in your sourced `set_env*.sh` file (use `set_env_vllm.sh` for the default vLLM deployment). The default URL format is `http://${EXTERNAL_HOST_IP}:${CODEGEN_UI_SERVICE_PORT}` (e.g., `http://192.168.1.100:5173`).

- Open this URL in your web browser.

- Enter a prompt in the input field (e.g., "Write Python code that returns the current time and date") and press Enter or click the submit button.
Troubleshooting
Common issues to check:

- Check container logs (`docker compose -f <file> logs <service_name>`), especially for `codegen-vllm-service` or `codegen-tgi-service`.
- Ensure `HUGGINGFACEHUB_API_TOKEN` is correct.
- Verify ROCm drivers and Docker setup for GPU access (see the checks sketched after this list).
- Confirm network connectivity and proxy settings.
- Ensure `HOST_IP` and `EXTERNAL_HOST_IP` are correctly set and accessible.
- If building locally, ensure the build steps completed without error and the image tags match the compose file.
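A few host-side checks can help confirm the GPU is reachable from containers; this is a minimal sketch, assuming the ROCm driver stack is installed on the host:

```bash
# GPUs visible to the ROCm stack on the host
rocm-smi

# Device nodes that the compose files forward into the serving container
ls -l /dev/kfd /dev/dri

# The user running docker typically needs to be in the video (and often render) group
groups $USER
```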
Stopping the Application
If you use vLLM (Default)
```bash
# Ensure you are in the correct directory
# cd GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
docker compose -f compose_vllm.yaml down
```
If you use TGI
```bash
# Ensure you are in the correct directory
# cd GenAIExamples/CodeGen/docker_compose/amd/gpu/rocm
docker compose -f compose.yaml down
```
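After changing values in the environment script, a clean restart cycle (shown here for the default vLLM deployment) is:

```bash
docker compose -f compose_vllm.yaml down
. set_env_vllm.sh
docker compose -f compose_vllm.yaml up -d
```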
Next Steps
- Explore the alternative TGI deployment option if needed.
- Refer to the main CodeGen README for architecture details and links to other deployment methods (Kubernetes, Xeon).
- Consult the OPEA GenAIComps repository for details on individual microservices.

