Merge branch 'main' into ft

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-07 15:05:27 +08:00 · 2025-03-06 00:43:57 -05:00 · 2025-03-04 01:13:31 -05:00
12 changed files with 354 additions and 422 deletions
--- a/InstructionTuning/README.md
+++ b/InstructionTuning/README.md
@@ -1,21 +1,23 @@
-# Instruction Tuning
+# Finetuning

-Instruction tuning is the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. This implementation deploys a Ray cluster for the task.
+This example covers two tasks: instruction tuning and rerank model finetuning. Instruction tuning is the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. Rerank model finetuning is the process of further training a rerank model on a domain-specific dataset to improve its performance in that domain. This example deploys a Ray cluster to run both tasks.

-## Deploy Instruction Tuning Service
+## Deploy Finetuning Service

-### Deploy Instruction Tuning Service on Xeon
+### Deploy Finetuning Service on Xeon

 Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for detail.

-### Deploy Instruction Tuning Service on Gaudi
+### Deploy Finetuning Service on Gaudi

 Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for detail.

-## Consume Instruction Tuning Service
+## Consume Finetuning Service

 ### 1. Upload a training file

+#### Instruction tuning dataset example
+
 Download a training file `alpaca_data.json` and upload it to the server with below command, this file can be downloaded in [here](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json):

 ```bash
@@ -23,8 +25,19 @@ Download a training file `alpaca_data.json` and upload it to the server with bel
 curl http://${your_ip}:8015/v1/files -X POST -H "Content-Type: multipart/form-data" -F "file=@./alpaca_data.json" -F purpose="fine-tune"
 ```

+#### Rerank model finetuning dataset example
+
+Download the toy training file `toy_finetune_data.jsonl` from [here](https://github.com/FlagOpen/FlagEmbedding/blob/JUNJIE99-patch-1/examples/finetune/toy_finetune_data.jsonl) and upload it to the server with the command below:
+
+```bash
+# upload a training file
+curl http://${your_ip}:8015/v1/files -X POST -H "Content-Type: multipart/form-data" -F "file=@./toy_finetune_data.jsonl" -F purpose="fine-tune"
+```
+
 ### 2. Create fine-tuning job

+#### Instruction tuning
+
 After a training file like `alpaca_data.json` is uploaded, use the following command to launch a finetuning job using `meta-llama/Llama-2-7b-chat-hf` as base model:

 ```bash
@@ -40,6 +53,25 @@ curl http://${your_ip}:8015/v1/fine_tuning/jobs \

 The outputs of the finetune job (adapter_model.safetensors, adapter_config,json... ) are stored in `/home/user/comps/finetuning/src/output` and other execution logs are stored in `/home/user/ray_results`

+#### Rerank model finetuning
+
+After uploading the training file `toy_finetune_data.jsonl`, use the following command to launch a finetuning job with `BAAI/bge-reranker-large` as the base model:
+
+```bash
+# create a finetuning job
+curl http://${your_ip}:8015/v1/fine_tuning/jobs \
+  -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "training_file": "toy_finetune_data.jsonl",
+    "model": "BAAI/bge-reranker-large",
+    "General":{
+      "task":"rerank",
+      "lora_config":null
+    }
+  }'
+```
+
 ### 3. Manage fine-tuning job

 Below commands show how to list finetuning jobs, retrieve a finetuning job, cancel a finetuning job and list checkpoints of a finetuning job.
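The retrieve, cancel, and list-checkpoints endpoints all take the job ID in a JSON body. The sketch below builds that body with the ID quoted as a JSON string and extracts the ID from a create-job response. It assumes job IDs are plain strings (the test scripts in this PR expect the create-job response to begin with `{"id":"ft-job`); `FINETUNING_URL`, `job_id_payload`, `extract_job_id`, and `job_action` are hypothetical names, not part of GenAIComps.

```bash
#!/usr/bin/env bash
# Sketch: helpers for the finetuning job-management endpoints.
# Assumption: job IDs are plain strings (e.g. "ft-job-..."), so they are
# sent as quoted JSON strings. All names below are hypothetical.

FINETUNING_URL="${FINETUNING_URL:-http://localhost:8015}"

# Print the JSON body used by retrieve/cancel/list_checkpoints.
job_id_payload() {
  printf '{"fine_tuning_job_id": "%s"}\n' "$1"
}

# Read a create-job response on stdin and print the first "id" value.
extract_job_id() {
  grep -o '"id": *"[^"]*"' | head -n 1 | cut -d'"' -f4
}

# POST the quoted job ID to a management endpoint path, for example
# /v1/fine_tuning/jobs/retrieve, /v1/fine_tuning/jobs/cancel or
# /v1/finetune/list_checkpoints.
job_action() {
  local path="$1" job_id="$2"
  curl --silent -X POST -H "Content-Type: application/json" \
    -d "$(job_id_payload "$job_id")" "${FINETUNING_URL}${path}"
}

# Example usage against a running service (not executed here):
# job_id=$(curl --silent "${FINETUNING_URL}/v1/fine_tuning/jobs" -X POST \
#   -H "Content-Type: application/json" \
#   -d '{"training_file": "toy_finetune_data.jsonl", "model": "BAAI/bge-reranker-large", "General": {"task": "rerank", "lora_config": null}}' \
#   | extract_job_id)
# job_action /v1/fine_tuning/jobs/retrieve "$job_id"
```

Quoting the ID in the payload keeps the request valid JSON; substituting a bare `ft-job-...` value into an unquoted `{"fine_tuning_job_id": ...}` body would not be.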
--- a/InstructionTuning/docker_compose/intel/cpu/xeon/README.md
+++ b/InstructionTuning/docker_compose/intel/cpu/xeon/README.md
@@ -1,6 +1,6 @@
-# Deploy Instruction Tuning Service on Xeon
+# Deploy Finetuning Service on Xeon

-This document outlines the deployment process for a Instruction Tuning Service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice on Intel Xeon server. The steps include Docker image creation, container deployment. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service.
+This document outlines how to deploy a finetuning service using the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice on an Intel Xeon server. The steps include building the Docker image and deploying the container. We will publish the Docker images to Docker Hub to simplify deploying this service.

 ## 🚀 Build Docker Images

--- a/InstructionTuning/docker_compose/intel/hpu/gaudi/README.md
+++ b/InstructionTuning/docker_compose/intel/hpu/gaudi/README.md
@@ -1,6 +1,6 @@
-# Deploy Instruction Tuning Service on Gaudi
+# Deploy Finetuning Service on Gaudi

-This document outlines the deployment process for a Instruction Tuning Service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice on Intel Gaudi server. The steps include Docker image creation, container deployment. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service.
+This document outlines how to deploy a finetuning service using the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice on an Intel Gaudi server. The steps include building the Docker image and deploying the container. We will publish the Docker images to Docker Hub to simplify deploying this service.

 ## 🚀 Build Docker Images

--- a/RerankFinetuning/docker_image_build/build.yaml
+++ b/RerankFinetuning/docker_image_build/build.yaml
--- a/Finetuning/tests/test_compose_on_gaudi.sh
+++ b/Finetuning/tests/test_compose_on_gaudi.sh
--- a/InstructionTuning/tests/test_compose_on_xeon.sh
+++ b/InstructionTuning/tests/test_compose_on_xeon.sh
--- a/InstructionTuning/docker_image_build/build.yaml
+++ b/InstructionTuning/docker_image_build/build.yaml
@@ -1,13 +0,0 @@
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-services:
-  finetuning:
-    build:
-      args:
-        http_proxy: ${http_proxy}
-        https_proxy: ${https_proxy}
-        no_proxy: ${no_proxy}
-      context: GenAIComps
-      dockerfile: comps/finetuning/src/Dockerfile
-    image: ${REGISTRY:-opea}/finetuning:${TAG:-latest}
--- a/RerankFinetuning/README.md
+++ b/RerankFinetuning/README.md
@@ -1,61 +0,0 @@
-# Rerank Model Finetuning
-
-Rerank model finetuning is the process of further training rerank model on a dataset for improving its capability on specific field.
-
-## Deploy Rerank Model Finetuning Service
-
-### Deploy Rerank Model Finetuning Service on Xeon
-
-Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for detail.
-
-### Deploy Rerank Model Finetuning Service on Gaudi
-
-Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for detail.
-
-## Consume Rerank Model Finetuning Service
-
-### 1. Upload a training file
-
-Download a toy example training file `toy_finetune_data.jsonl` and upload it to the server with below command, this file can be downloaded in [here](https://github.com/FlagOpen/FlagEmbedding/blob/master/examples/finetune/toy_finetune_data.jsonl):
-
-```bash
-# upload a training file
-curl http://${your_ip}:8015/v1/files -X POST -H "Content-Type: multipart/form-data" -F "file=@./toy_finetune_data.jsonl" -F purpose="fine-tune"
-```
-
-### 2. Create fine-tuning job
-
-After a training file `toy_finetune_data.jsonl` is uploaded, use the following command to launch a finetuning job using `BAAI/bge-reranker-large` as base model:
-
-```bash
-# create a finetuning job
-curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-  -X POST \
-  -H "Content-Type: application/json" \
-  -d '{
-    "training_file": "toy_finetune_data.jsonl",
-    "model": "BAAI/bge-reranker-large",
-    "General":{
-      "task":"rerank",
-      "lora_config":null
-    }
-  }'
-```
-
-### 3. Manage fine-tuning job
-
-Below commands show how to list finetuning jobs, retrieve a finetuning job, cancel a finetuning job and list checkpoints of a finetuning job.
-
-```bash
-# list finetuning jobs
-curl http://${your_ip}:8015/v1/fine_tuning/jobs -X GET
-
-# retrieve one finetuning job
-curl http://${your_ip}:8015/v1/fine_tuning/jobs/retrieve -X POST -H "Content-Type: application/json" -d '{"fine_tuning_job_id": ${fine_tuning_job_id}}'
-
-# cancel one finetuning job
-curl http://${your_ip}:8015/v1/fine_tuning/jobs/cancel -X POST -H "Content-Type: application/json" -d '{"fine_tuning_job_id": ${fine_tuning_job_id}}'
-
-# list checkpoints of a finetuning job
-curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Type: application/json" -d '{"fine_tuning_job_id": ${fine_tuning_job_id}}'
-```
--- a/RerankFinetuning/docker_compose/intel/cpu/xeon/README.md
+++ b/RerankFinetuning/docker_compose/intel/cpu/xeon/README.md
@@ -1,26 +0,0 @@
-# Deploy Rerank Model Finetuning Service on Xeon
-
-This document outlines the deployment process for a rerank model finetuning service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice on Intel Xeon server. The steps include Docker image creation, container deployment. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service.
-
-## 🚀 Build Docker Images
-
-First of all, you need to build Docker Images locally. This step can be ignored after the Docker images published to Docker hub.
-
-### 1. Build Docker Image
-
-Build docker image with below command:
-
-```bash
-git clone https://github.com/opea-project/GenAIComps.git
-cd GenAIComps
-export HF_TOKEN=${your_huggingface_token}
-docker build -t opea/finetuning:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg HF_TOKEN=$HF_TOKEN -f comps/finetuning/src/Dockerfile .
-```
-
-### 2. Run Docker with CLI
-
-Start docker container with below command:
-
-```bash
-docker run -d --name="finetuning-server" -p 8015:8015 --runtime=runc --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/finetuning:latest
-```
--- a/RerankFinetuning/docker_compose/intel/hpu/gaudi/README.md
+++ b/RerankFinetuning/docker_compose/intel/hpu/gaudi/README.md
@@ -1,26 +0,0 @@
-# Deploy Rerank Model Finetuning Service on Gaudi
-
-This document outlines the deployment process for a rerank model finetuning service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice on Intel Xeon server. The steps include Docker image creation, container deployment. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service.
-
-## 🚀 Build Docker Images
-
-First of all, you need to build Docker Images locally. This step can be ignored after the Docker images published to Docker hub.
-
-### 1. Build Docker Image
-
-Build docker image with below command:
-
-```bash
-git clone https://github.com/opea-project/GenAIComps.git
-cd GenAIComps
-docker build -t opea/finetuning-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/finetuning/src/Dockerfile.intel_hpu .
-```
-
-### 2. Run Docker with CLI
-
-Start docker container with below command:
-
-```bash
-export HF_TOKEN=${your_huggingface_token}
-docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p 8015:8015 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e no_proxy=$no_proxy -e HF_TOKEN=$HF_TOKEN opea/finetuning-gaudi:latest
-```
--- a/RerankFinetuning/tests/test_compose_on_gaudi.sh
+++ b/RerankFinetuning/tests/test_compose_on_gaudi.sh
@@ -1,131 +0,0 @@
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-set -x
-IMAGE_REPO=${IMAGE_REPO:-"opea"}
-IMAGE_TAG=${IMAGE_TAG:-"latest"}
-echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
-echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
-export REGISTRY=${IMAGE_REPO}
-export TAG=${IMAGE_TAG}
-
-WORKPATH=$(dirname "$PWD")
-LOG_PATH="$WORKPATH/tests"
-ip_address=$(hostname -I | awk '{print $1}')
-finetuning_service_port=8015
-ray_port=8265
-service_name=finetuning-gaudi
-
-function build_docker_images() {
-    cd $WORKPATH/docker_image_build
-    if [ ! -d "GenAIComps" ] ; then
-        git clone --depth 1 --branch ${opea_branch:-"main"} https://github.com/opea-project/GenAIComps.git
-    fi
-    docker compose -f build.yaml build ${service_name} --no-cache > ${LOG_PATH}/docker_image_build.log
-}
-
-function start_service() {
-    export no_proxy="localhost,127.0.0.1,"${ip_address}
-    docker run -d --name="finetuning-server" -p $finetuning_service_port:$finetuning_service_port -p $ray_port:$ray_port --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy ${IMAGE_REPO}/finetuning-gaudi:${IMAGE_TAG}
-    sleep 1m
-}
-
-function validate_microservice() {
-    cd $LOG_PATH
-    export no_proxy="localhost,127.0.0.1,"${ip_address}
-
-    # test /v1/dataprep upload file
-    URL="http://${ip_address}:$finetuning_service_port/v1/files"
-    cat <<EOF > test_data.json
-{"query": "Five women walk along a beach wearing flip-flops.", "pos": ["Some women with flip-flops on, are walking along the beach"], "neg": ["The 4 women are sitting on the beach.", "There was a reform in 1996.", "She's not going to court to clear her record.", "The man is talking about hawaii.", "A woman is standing outside.", "The battle was over. ", "A group of people plays volleyball."]}
-{"query": "A woman standing on a high cliff on one leg looking over a river.", "pos": ["A woman is standing on a cliff."], "neg": ["A woman sits on a chair.", "George Bush told the Republicans there was no way he would let them even consider this foolish idea, against his top advisors advice.", "The family was falling apart.", "no one showed up to the meeting", "A boy is sitting outside playing in the sand.", "Ended as soon as I received the wire.", "A child is reading in her bedroom."]}
-{"query": "Two woman are playing instruments; one a clarinet, the other a violin.", "pos": ["Some people are playing a tune."], "neg": ["Two women are playing a guitar and drums.", "A man is skiing down a mountain.", "The fatal dose was not taken when the murderer thought it would be.", "Person on bike", "The girl is standing, leaning against the archway.", "A group of women watch soap operas.", "No matter how old people get they never forget. "]}
-{"query": "A girl with a blue tank top sitting watching three dogs.", "pos": ["A girl is wearing blue."], "neg": ["A girl is with three cats.", "The people are watching a funeral procession.", "The child is wearing black.", "Financing is an issue for us in public schools.", "Kids at a pool.", "It is calming to be assaulted.", "I face a serious problem at eighteen years old. "]}
-{"query": "A yellow dog running along a forest path.", "pos": ["a dog is running"], "neg": ["a cat is running", "Steele did not keep her original story.", "The rule discourages people to pay their child support.", "A man in a vest sits in a car.", "Person in black clothing, with white bandanna and sunglasses waits at a bus stop.", "Neither the Globe or Mail had comments on the current state of Canada's road system. ", "The Spring Creek facility is old and outdated."]}
-{"query": "It sets out essential activities in each phase along with critical factors related to those activities.", "pos": ["Critical factors for essential activities are set out."], "neg": ["It lays out critical activities but makes no provision for critical factors related to those activities.", "People are assembled in protest.", "The state would prefer for you to do that.", "A girl sits beside a boy.", "Two males are performing.", "Nobody is jumping", "Conrad was being plotted against, to be hit on the head."]}
-EOF
-    HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -F 'file=@./test_data.json' -F purpose="fine-tune" -H 'Content-Type: multipart/form-data' "$URL")
-    HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
-    RESPONSE_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g')
-    SERVICE_NAME="finetuning-server - upload - file"
-
-    # Parse the JSON response
-    purpose=$(echo "$RESPONSE_BODY" | jq -r '.purpose')
-    filename=$(echo "$RESPONSE_BODY" | jq -r '.filename')
-
-    # Define expected values
-    expected_purpose="fine-tune"
-    expected_filename="test_data.json"
-
-    if [ "$HTTP_STATUS" -ne "200" ]; then
-        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_upload_file.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
-    fi
-    # Check if the parsed values match the expected values
-    if [[ "$purpose" != "$expected_purpose" || "$filename" != "$expected_filename" ]]; then
-        echo "[ $SERVICE_NAME ] Content does not match the expected result: $RESPONSE_BODY"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_upload_file.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] Content is as expected."
-    fi
-
-    # test /v1/fine_tuning/jobs
-    URL="http://${ip_address}:$finetuning_service_port/v1/fine_tuning/jobs"
-    HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -H 'Content-Type: application/json' -d '{"training_file": "test_data.json","model": "BAAI/bge-reranker-base","General":{"task":"rerank","lora_config":null}}' "$URL")
-    HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
-    RESPONSE_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g')
-    SERVICE_NAME="finetuning-server - create finetuning job"
-
-    if [ "$HTTP_STATUS" -ne "200" ]; then
-        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_create.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
-    fi
-    if [[ "$RESPONSE_BODY" != *'{"id":"ft-job'* ]]; then
-        echo "[ $SERVICE_NAME ] Content does not match the expected result: $RESPONSE_BODY"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_create.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] Content is as expected."
-    fi
-
-    sleep 3m
-
-    docker logs finetuning-server 2>&1 | tee ${LOG_PATH}/finetuning-server_create.log
-    FINETUNING_LOG=$(grep "succeeded" ${LOG_PATH}/finetuning-server_create.log)
-    if [[ "$FINETUNING_LOG" != *'succeeded'* ]]; then
-        echo "Finetuning failed."
-        RAY_JOBID=$(grep "Submitted Ray job" ${LOG_PATH}/finetuning-server_create.log | sed 's/.*raysubmit/raysubmit/' | cut -d' ' -f 1)
-        docker exec finetuning-server python -c "import os;os.environ['RAY_ADDRESS']='http://localhost:8265';from ray.job_submission import JobSubmissionClient;client = JobSubmissionClient();print(client.get_job_logs('${RAY_JOBID}'))" 2>&1 | tee ${LOG_PATH}/finetuning.log
-        exit 1
-    else
-        echo "Finetuning succeeded."
-    fi
-}
-
-function stop_docker() {
-    cid=$(docker ps -aq --filter "name=finetuning-server*")
-    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
-}
-
-function main() {
-
-    stop_docker
-
-    build_docker_images
-    start_service
-
-    validate_microservice
-
-    stop_docker
-    echo y | docker system prune
-
-}
-
-main
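The rerank test data written in the script above is JSON Lines: one object per line with a `query` string, a `pos` list, and a `neg` list. Before uploading your own file, a quick heuristic check can catch lines that are missing one of these keys. This sketch is not a JSON parser: it only looks for each quoted key name on every non-empty line, and `check_rerank_jsonl` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Heuristic pre-upload check for rerank training data in JSON Lines form.
# Assumption: each record needs "query", "pos" and "neg" keys, as in the
# test data above. This only searches for the quoted key names; it does
# not validate JSON syntax or value types.
check_rerank_jsonl() {
  local file="$1" lineno=0 bad=0 line key
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    # Skip blank lines.
    [ -z "$line" ] && continue
    for key in query pos neg; do
      case "$line" in
        *"\"$key\""*) ;;
        *)
          echo "line $lineno: missing \"$key\"" >&2
          bad=1
          ;;
      esac
    done
  done < "$file"
  return "$bad"
}

# Example: check_rerank_jsonl toy_finetune_data.jsonl && echo "looks OK"
```

A real JSON validator (for example `jq`, which the test scripts already use) would also catch syntax errors; the pure-shell version trades that for having no extra dependency.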
--- a/RerankFinetuning/tests/test_compose_on_xeon.sh
+++ b/RerankFinetuning/tests/test_compose_on_xeon.sh
@@ -1,131 +0,0 @@
-# Copyright (C) 2024 Intel Corporation
-# SPDX-License-Identifier: Apache-2.0
-
-set -x
-IMAGE_REPO=${IMAGE_REPO:-"opea"}
-IMAGE_TAG=${IMAGE_TAG:-"latest"}
-echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
-echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
-export REGISTRY=${IMAGE_REPO}
-export TAG=${IMAGE_TAG}
-
-WORKPATH=$(dirname "$PWD")
-LOG_PATH="$WORKPATH/tests"
-ip_address=$(hostname -I | awk '{print $1}')
-finetuning_service_port=8015
-ray_port=8265
-service_name=finetuning
-
-function build_docker_images() {
-    cd $WORKPATH/docker_image_build
-    if [ ! -d "GenAIComps" ] ; then
-        git clone --depth 1 --branch ${opea_branch:-"main"} https://github.com/opea-project/GenAIComps.git
-    fi
-    docker compose -f build.yaml build ${service_name} --no-cache > ${LOG_PATH}/docker_image_build.log
-}
-
-function start_service() {
-    export no_proxy="localhost,127.0.0.1,"${ip_address}
-    docker run -d --name="finetuning-server" -p $finetuning_service_port:$finetuning_service_port -p $ray_port:$ray_port --runtime=runc --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy ${IMAGE_REPO}/finetuning:${IMAGE_TAG}
-    sleep 1m
-}
-
-function validate_microservice() {
-    cd $LOG_PATH
-    export no_proxy="localhost,127.0.0.1,"${ip_address}
-
-    # test /v1/dataprep upload file
-    URL="http://${ip_address}:$finetuning_service_port/v1/files"
-    cat <<EOF > test_data.json
-{"query": "Five women walk along a beach wearing flip-flops.", "pos": ["Some women with flip-flops on, are walking along the beach"], "neg": ["The 4 women are sitting on the beach.", "There was a reform in 1996.", "She's not going to court to clear her record.", "The man is talking about hawaii.", "A woman is standing outside.", "The battle was over. ", "A group of people plays volleyball."]}
-{"query": "A woman standing on a high cliff on one leg looking over a river.", "pos": ["A woman is standing on a cliff."], "neg": ["A woman sits on a chair.", "George Bush told the Republicans there was no way he would let them even consider this foolish idea, against his top advisors advice.", "The family was falling apart.", "no one showed up to the meeting", "A boy is sitting outside playing in the sand.", "Ended as soon as I received the wire.", "A child is reading in her bedroom."]}
-{"query": "Two woman are playing instruments; one a clarinet, the other a violin.", "pos": ["Some people are playing a tune."], "neg": ["Two women are playing a guitar and drums.", "A man is skiing down a mountain.", "The fatal dose was not taken when the murderer thought it would be.", "Person on bike", "The girl is standing, leaning against the archway.", "A group of women watch soap operas.", "No matter how old people get they never forget. "]}
-{"query": "A girl with a blue tank top sitting watching three dogs.", "pos": ["A girl is wearing blue."], "neg": ["A girl is with three cats.", "The people are watching a funeral procession.", "The child is wearing black.", "Financing is an issue for us in public schools.", "Kids at a pool.", "It is calming to be assaulted.", "I face a serious problem at eighteen years old. "]}
-{"query": "A yellow dog running along a forest path.", "pos": ["a dog is running"], "neg": ["a cat is running", "Steele did not keep her original story.", "The rule discourages people to pay their child support.", "A man in a vest sits in a car.", "Person in black clothing, with white bandanna and sunglasses waits at a bus stop.", "Neither the Globe or Mail had comments on the current state of Canada's road system. ", "The Spring Creek facility is old and outdated."]}
-{"query": "It sets out essential activities in each phase along with critical factors related to those activities.", "pos": ["Critical factors for essential activities are set out."], "neg": ["It lays out critical activities but makes no provision for critical factors related to those activities.", "People are assembled in protest.", "The state would prefer for you to do that.", "A girl sits beside a boy.", "Two males are performing.", "Nobody is jumping", "Conrad was being plotted against, to be hit on the head."]}
-EOF
-    HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -F 'file=@./test_data.json' -F purpose="fine-tune" -H 'Content-Type: multipart/form-data' "$URL")
-    HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
-    RESPONSE_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g')
-    SERVICE_NAME="finetuning-server - upload - file"
-
-    # Parse the JSON response
-    purpose=$(echo "$RESPONSE_BODY" | jq -r '.purpose')
-    filename=$(echo "$RESPONSE_BODY" | jq -r '.filename')
-
-    # Define expected values
-    expected_purpose="fine-tune"
-    expected_filename="test_data.json"
-
-    if [ "$HTTP_STATUS" -ne "200" ]; then
-        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_upload_file.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
-    fi
-    # Check if the parsed values match the expected values
-    if [[ "$purpose" != "$expected_purpose" || "$filename" != "$expected_filename" ]]; then
-        echo "[ $SERVICE_NAME ] Content does not match the expected result: $RESPONSE_BODY"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_upload_file.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] Content is as expected."
-    fi
-
-    # test /v1/fine_tuning/jobs
-    URL="http://${ip_address}:$finetuning_service_port/v1/fine_tuning/jobs"
-    HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST -H 'Content-Type: application/json' -d '{"training_file": "test_data.json","model": "BAAI/bge-reranker-base","General":{"task":"rerank","lora_config":null}}' "$URL")
-    HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
-    RESPONSE_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g')
-    SERVICE_NAME="finetuning-server - create finetuning job"
-
-    if [ "$HTTP_STATUS" -ne "200" ]; then
-        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_create.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."
-    fi
-    if [[ "$RESPONSE_BODY" != *'{"id":"ft-job'* ]]; then
-        echo "[ $SERVICE_NAME ] Content does not match the expected result: $RESPONSE_BODY"
-        docker logs finetuning-server >> ${LOG_PATH}/finetuning-server_create.log
-        exit 1
-    else
-        echo "[ $SERVICE_NAME ] Content is as expected."
-    fi
-
-    sleep 3m
-
-    docker logs finetuning-server 2>&1 | tee ${LOG_PATH}/finetuning-server_create.log
-    FINETUNING_LOG=$(grep "succeeded" ${LOG_PATH}/finetuning-server_create.log)
-    if [[ "$FINETUNING_LOG" != *'succeeded'* ]]; then
-        echo "Finetuning failed."
-        RAY_JOBID=$(grep "Submitted Ray job" ${LOG_PATH}/finetuning-server_create.log | sed 's/.*raysubmit/raysubmit/' | cut -d' ' -f 1)
-        docker exec finetuning-server python -c "import os;os.environ['RAY_ADDRESS']='http://localhost:8265';from ray.job_submission import JobSubmissionClient;client = JobSubmissionClient();print(client.get_job_logs('${RAY_JOBID}'))" 2>&1 | tee ${LOG_PATH}/finetuning.log
-        exit 1
-    else
-        echo "Finetuning succeeded."
-    fi
-}
-
-function stop_docker() {
-    cid=$(docker ps -aq --filter "name=finetuning-server*")
-    if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
-}
-
-function main() {
-
-    stop_docker
-
-    build_docker_images
-    start_service
-
-    validate_microservice
-
-    stop_docker
-    echo y | docker system prune
-
-}
-
-main
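Both test scripts wait a fixed `sleep 1m` after `docker run` before sending requests. One alternative is to poll the service until it answers, sketched below under the assumption that any HTTP response from the list-jobs call (`GET /v1/fine_tuning/jobs`) means the server is accepting requests. `wait_for_service` is a hypothetical helper, and the attempt count and interval are placeholder values.

```bash
#!/usr/bin/env bash
# Poll a URL until it returns any HTTP response, or give up.
# Arguments: URL, max attempts (default 60), seconds between attempts (default 5).
# Assumption: a response of any status means the server is up; curl exits
# non-zero only on connection-level failures because --fail is not used.
wait_for_service() {
  local url="$1" max_tries="${2:-60}" interval="${3:-5}" i=1
  while [ "$i" -le "$max_tries" ]; do
    if curl --silent --output /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Only sleep if another attempt follows.
    if [ "$i" -lt "$max_tries" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example, replacing the fixed sleep in start_service:
# wait_for_service "http://${ip_address}:${finetuning_service_port}/v1/fine_tuning/jobs" 60 5 || exit 1
```

Polling returns as soon as the server is reachable and fails explicitly after the last attempt, instead of always paying the full fixed delay.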
Author	SHA1	Message	Date
chen, suyue	de96cd4dcf	Merge branch 'main' into ft	2025-03-07 15:05:27 +08:00
pre-commit-ci[bot]	769105b986	[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci	2025-03-06 00:43:57 -05:00
Ye, Xinyu	641f60c76c	merged InstructionTuning and RerankFinetuning into Finetuning. Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>	2025-03-04 01:13:31 -05:00