Compare commits

..

10 Commits

Author  SHA1        Message                                                    Date
Joel    333aae6795  fix: others headless change                                2025-01-23 15:11:05 +08:00
Joel    e797e97b18  chore: some menu                                           2025-01-23 14:01:14 +08:00
Joel    4ad152f525  chore: change some                                         2025-01-23 10:30:49 +08:00
Joel    58403b8238  fix: build error                                           2025-01-22 16:14:04 +08:00
Joel    a465f092eb  fix: headers to v15                                        2025-01-22 15:51:44 +08:00
Joel    f981eb4640  fix: params to v15                                         2025-01-22 15:41:32 +08:00
Joel    25aaf53375  chore: temp                                                2025-01-21 18:30:35 +08:00
Joel    5ec7920c42  fix: fe check                                              2025-01-14 14:38:55 +08:00
Joel    4c49e48465  chore: template and config                                 2025-01-09 16:23:25 +08:00
Joel    ed93954add  feat: support config max chunk length by env in frontend   2025-01-09 14:42:01 +08:00
429 changed files with 7234 additions and 13638 deletions

View File

@@ -8,7 +8,7 @@ inputs:
poetry-version:
description: Poetry version to set up
required: true
default: '2.0.1'
default: '1.8.4'
poetry-lockfile:
description: Path to the Poetry lockfile to restore cache from
required: true

View File

@@ -42,23 +42,25 @@ jobs:
run: poetry install -C api --with dev
- name: Check dependencies in pyproject.toml
run: poetry run -P api bash dev/pytest/pytest_artifacts.sh
run: poetry run -C api bash dev/pytest/pytest_artifacts.sh
- name: Run Unit tests
run: poetry run -P api bash dev/pytest/pytest_unit_tests.sh
run: poetry run -C api bash dev/pytest/pytest_unit_tests.sh
- name: Run ModelRuntime
run: poetry run -P api bash dev/pytest/pytest_model_runtime.sh
run: poetry run -C api bash dev/pytest/pytest_model_runtime.sh
- name: Run dify config tests
run: poetry run -P api python dev/pytest/pytest_config_tests.py
run: poetry run -C api python dev/pytest/pytest_config_tests.py
- name: Run Tool
run: poetry run -P api bash dev/pytest/pytest_tools.sh
run: poetry run -C api bash dev/pytest/pytest_tools.sh
- name: Run mypy
run: |
poetry run -C api python -m mypy --install-types --non-interactive .
pushd api
poetry run python -m mypy --install-types --non-interactive .
popd
- name: Set up dotenvs
run: |
@@ -78,4 +80,4 @@ jobs:
ssrf_proxy
- name: Run Workflow
run: poetry run -P api bash dev/pytest/pytest_workflow.sh
run: poetry run -C api bash dev/pytest/pytest_workflow.sh
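
In the workflow hunks above, the side of the compare that pins Poetry 2.0.1 invokes `poetry run -P api …`, while the side that pins 1.8.4 uses `poetry run -C api …`. The flag flip tracks the Poetry major version: `--project/-P` was only introduced in Poetry 2.x, and `--directory/-C` is the 1.x spelling for pointing Poetry at the `api` sub-project. A minimal sketch of that relationship (the helper name and version switch are illustrative; the command arguments are taken from the hunks):

```python
# Sketch only: build the "poetry run" command line used in these workflows,
# choosing the sub-project flag by the Poetry major version that is pinned.
def poetry_run(args: list[str], poetry_major: int) -> list[str]:
    # Poetry 2.x understands --project/-P; Poetry 1.x only has --directory/-C.
    flag = "-P" if poetry_major >= 2 else "-C"
    return ["poetry", "run", flag, "api", *args]

print(poetry_run(["bash", "dev/pytest/pytest_unit_tests.sh"], poetry_major=2))
# ['poetry', 'run', '-P', 'api', 'bash', 'dev/pytest/pytest_unit_tests.sh']
print(poetry_run(["bash", "dev/pytest/pytest_unit_tests.sh"], poetry_major=1))
# ['poetry', 'run', '-C', 'api', 'bash', 'dev/pytest/pytest_unit_tests.sh']
```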

View File

@@ -38,12 +38,12 @@ jobs:
if: steps.changed-files.outputs.any_changed == 'true'
run: |
poetry run -C api ruff --version
poetry run -C api ruff check ./
poetry run -C api ruff format --check ./
poetry run -C api ruff check ./api
poetry run -C api ruff format --check ./api
- name: Dotenv check
if: steps.changed-files.outputs.any_changed == 'true'
run: poetry run -P api dotenv-linter ./api/.env.example ./web/.env.example
run: poetry run -C api dotenv-linter ./api/.env.example ./web/.env.example
- name: Lint hints
if: failure()

View File

@@ -70,4 +70,4 @@ jobs:
tidb
- name: Test Vector Stores
run: poetry run -P api bash dev/pytest/pytest_vdb.sh
run: poetry run -C api bash dev/pytest/pytest_vdb.sh

View File

@@ -53,12 +53,10 @@ ignore = [
"FURB152", # math-constant
"UP007", # non-pep604-annotation
"UP032", # f-string
"UP045", # non-pep604-annotation-optional
"B005", # strip-with-multi-characters
"B006", # mutable-argument-default
"B007", # unused-loop-control-variable
"B026", # star-arg-unpacking-after-keyword-arg
"B903", # class-as-data-structure
"B904", # raise-without-from-inside-except
"B905", # zip-without-explicit-strict
"N806", # non-lowercase-variable-in-function

View File

@@ -4,7 +4,7 @@ FROM python:3.12-slim-bookworm AS base
WORKDIR /app/api
# Install Poetry
ENV POETRY_VERSION=2.0.1
ENV POETRY_VERSION=1.8.4
# if you located in China, you can use aliyun mirror to speed up
# RUN pip install --no-cache-dir poetry==${POETRY_VERSION} -i https://mirrors.aliyun.com/pypi/simple/
@@ -52,14 +52,12 @@ RUN apt-get update \
&& apt-get install -y --no-install-recommends curl nodejs libgmp-dev libmpfr-dev libmpc-dev \
# if you located in China, you can use aliyun mirror to speed up
# && echo "deb http://mirrors.aliyun.com/debian testing main" > /etc/apt/sources.list \
&& echo "deb http://deb.debian.org/debian bookworm main" > /etc/apt/sources.list \
&& echo "deb http://deb.debian.org/debian testing main" > /etc/apt/sources.list \
&& apt-get update \
# For Security
&& apt-get install -y --no-install-recommends expat libldap-2.5-0 perl libsqlite3-0 zlib1g \
&& apt-get install -y --no-install-recommends expat=2.6.4-1 libldap-2.5-0=2.5.19+dfsg-1 perl=5.40.0-8 libsqlite3-0=3.46.1-1 zlib1g=1:1.3.dfsg+really1.3.1-1+b1 \
# install a chinese font to support the use of tools like matplotlib
&& apt-get install -y fonts-noto-cjk \
# install libmagic to support the use of python-magic guess MIMETYPE
&& apt-get install -y libmagic1 \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -79,5 +79,5 @@
2. Run the tests locally with mocked system environment variables in `tool.pytest_env` section in `pyproject.toml`
```bash
poetry run -P api bash dev/pytest/pytest_all_tests.sh
poetry run -C api bash dev/pytest/pytest_all_tests.sh
```

View File

@@ -146,7 +146,7 @@ class EndpointConfig(BaseSettings):
)
CONSOLE_WEB_URL: str = Field(
description="Base URL for the console web interface,used for frontend references and CORS configuration",
description="Base URL for the console web interface," "used for frontend references and CORS configuration",
default="",
)

View File

@@ -181,7 +181,7 @@ class HostedFetchAppTemplateConfig(BaseSettings):
"""
HOSTED_FETCH_APP_TEMPLATES_MODE: str = Field(
description="Mode for fetching app templates: remote, db, or builtin default to remote,",
description="Mode for fetching app templates: remote, db, or builtin" " default to remote,",
default="remote",
)

View File

@@ -9,7 +9,7 @@ class PackagingInfo(BaseSettings):
CURRENT_VERSION: str = Field(
description="Dify version",
default="0.15.2",
default="0.15.0",
)
COMMIT_SHA: str = Field(

View File

@@ -1,32 +1,12 @@
import mimetypes
import os
import platform
import re
import urllib.parse
import warnings
from collections.abc import Mapping
from typing import Any
from uuid import uuid4
import httpx
try:
import magic
except ImportError:
if platform.system() == "Windows":
warnings.warn(
"To use python-magic guess MIMETYPE, you need to run `pip install python-magic-bin`", stacklevel=2
)
elif platform.system() == "Darwin":
warnings.warn("To use python-magic guess MIMETYPE, you need to run `brew install libmagic`", stacklevel=2)
elif platform.system() == "Linux":
warnings.warn(
"To use python-magic guess MIMETYPE, you need to run `sudo apt-get install libmagic1`", stacklevel=2
)
else:
warnings.warn("To use python-magic guess MIMETYPE, you need to install `libmagic`", stacklevel=2)
magic = None # type: ignore
from pydantic import BaseModel
from configs import dify_config
@@ -67,13 +47,6 @@ def guess_file_info_from_response(response: httpx.Response):
# If guessing fails, use Content-Type from response headers
mimetype = response.headers.get("Content-Type", "application/octet-stream")
# Use python-magic to guess MIME type if still unknown or generic
if mimetype == "application/octet-stream" and magic is not None:
try:
mimetype = magic.from_buffer(response.content[:1024], mime=True)
except magic.MagicException:
pass
extension = os.path.splitext(filename)[1]
# Ensure filename has an extension

View File

@@ -56,7 +56,7 @@ class InsertExploreAppListApi(Resource):
app = App.query.filter(App.id == args["app_id"]).first()
if not app:
raise NotFound(f"App '{args['app_id']}' is not found")
raise NotFound(f'App \'{args["app_id"]}\' is not found')
site = app.site
if not site:
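
Several hunks in this compare (here, in the HuggingFace Hub embedding model, and in the Azure OpenAI validators) only flip the f-string quoting style; both spellings produce the same string at runtime, so these lines are most likely formatter preference churn between the two toolchain versions rather than behaviour changes. A quick check with a hypothetical `args` value:

```python
# Both spellings seen in these hunks evaluate to the same string; only the
# quote style differs (the app_id value here is made up for illustration).
args = {"app_id": "d9c3e5a0"}

msg_double = f"App '{args['app_id']}' is not found"
msg_single = f'App \'{args["app_id"]}\' is not found'

assert msg_double == msg_single == "App 'd9c3e5a0' is not found"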

View File

@@ -22,7 +22,7 @@ from controllers.console.wraps import account_initialization_required, setup_req
from core.errors.error import ModelCurrentlyNotSupportError, ProviderTokenNotInitError, QuotaExceededError
from core.model_runtime.errors.invoke import InvokeError
from libs.login import login_required
from models import App, AppMode
from models.model import AppMode
from services.audio_service import AudioService
from services.errors.audio import (
AudioTooLargeServiceError,
@@ -79,7 +79,7 @@ class ChatMessageTextApi(Resource):
@login_required
@account_initialization_required
@get_app_model
def post(self, app_model: App):
def post(self, app_model):
from werkzeug.exceptions import InternalServerError
try:
@@ -98,13 +98,9 @@ class ChatMessageTextApi(Resource):
and app_model.workflow.features_dict
):
text_to_speech = app_model.workflow.features_dict.get("text_to_speech")
if text_to_speech is None:
raise ValueError("TTS is not enabled")
voice = args.get("voice") or text_to_speech.get("voice")
else:
try:
if app_model.app_model_config is None:
raise ValueError("AppModelConfig not found")
voice = args.get("voice") or app_model.app_model_config.text_to_speech_dict.get("voice")
except Exception:
voice = None

View File

@@ -52,12 +52,12 @@ class DatasetListApi(Resource):
# provider = request.args.get("provider", default="vendor")
search = request.args.get("keyword", default=None, type=str)
tag_ids = request.args.getlist("tag_ids")
include_all = request.args.get("include_all", default="false").lower() == "true"
if ids:
datasets, total = DatasetService.get_datasets_by_ids(ids, current_user.current_tenant_id)
else:
datasets, total = DatasetService.get_datasets(
page, limit, current_user.current_tenant_id, current_user, search, tag_ids, include_all
page, limit, current_user.current_tenant_id, current_user, search, tag_ids
)
# check embedding setting
@@ -457,7 +457,7 @@ class DatasetIndexingEstimateApi(Resource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider " "in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -619,7 +619,9 @@ class DatasetRetrievalSettingApi(Resource):
vector_type = dify_config.VECTOR_STORE
match vector_type:
case (
VectorType.RELYT
VectorType.MILVUS
| VectorType.RELYT
| VectorType.PGVECTOR
| VectorType.TIDB_VECTOR
| VectorType.CHROMA
| VectorType.TENCENT
@@ -643,7 +645,6 @@ class DatasetRetrievalSettingApi(Resource):
| VectorType.TIDB_ON_QDRANT
| VectorType.LINDORM
| VectorType.COUCHBASE
| VectorType.MILVUS
):
return {
"retrieval_method": [

View File

@@ -350,7 +350,8 @@ class DatasetInitApi(Resource):
)
except InvokeAuthorizationError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -525,7 +526,8 @@ class DocumentBatchIndexingEstimateApi(DocumentResource):
return response.model_dump(), 200
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)

View File

@@ -168,7 +168,8 @@ class DatasetDocumentSegmentApi(Resource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -216,7 +217,8 @@ class DatasetDocumentSegmentAddApi(Resource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -265,7 +267,8 @@ class DatasetDocumentSegmentUpdateApi(Resource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -365,9 +368,9 @@ class DatasetDocumentSegmentBatchImportApi(Resource):
result = []
for index, row in df.iterrows():
if document.doc_form == "qa_model":
data = {"content": row.iloc[0], "answer": row.iloc[1]}
data = {"content": row[0], "answer": row[1]}
else:
data = {"content": row.iloc[0]}
data = {"content": row[0]}
result.append(data)
if len(result) == 0:
raise ValueError("The CSV file is empty.")
@@ -434,7 +437,8 @@ class ChildChunkAddApi(Resource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)

View File

@@ -32,7 +32,7 @@ class ConversationListApi(InstalledAppResource):
pinned = None
if "pinned" in args and args["pinned"] is not None:
pinned = args["pinned"] == "true"
pinned = True if args["pinned"] == "true" else False
try:
with Session(db.engine) as session:

View File

@@ -50,7 +50,7 @@ class MessageListApi(InstalledAppResource):
try:
return MessageService.pagination_by_first_id(
app_model, current_user, args["conversation_id"], args["first_id"], args["limit"]
app_model, current_user, args["conversation_id"], args["first_id"], args["limit"], "desc"
)
except services.errors.conversation.ConversationNotExistsError:
raise NotFound("Conversation Not Exists.")

View File

@@ -1,5 +1,3 @@
import json
from flask_restful import Resource, reqparse # type: ignore
from controllers.console.wraps import setup_required
@@ -31,34 +29,4 @@ class EnterpriseWorkspace(Resource):
return {"message": "enterprise workspace created."}
class EnterpriseWorkspaceNoOwnerEmail(Resource):
@setup_required
@inner_api_only
def post(self):
parser = reqparse.RequestParser()
parser.add_argument("name", type=str, required=True, location="json")
args = parser.parse_args()
tenant = TenantService.create_tenant(args["name"], is_from_dashboard=True)
tenant_was_created.send(tenant)
resp = {
"id": tenant.id,
"name": tenant.name,
"encrypt_public_key": tenant.encrypt_public_key,
"plan": tenant.plan,
"status": tenant.status,
"custom_config": json.loads(tenant.custom_config) if tenant.custom_config else {},
"created_at": tenant.created_at.isoformat() if tenant.created_at else None,
"updated_at": tenant.updated_at.isoformat() if tenant.updated_at else None,
}
return {
"message": "enterprise workspace created.",
"tenant": resp,
}
api.add_resource(EnterpriseWorkspace, "/enterprise/workspace")
api.add_resource(EnterpriseWorkspaceNoOwnerEmail, "/enterprise/workspace/ownerless")

View File

@@ -7,4 +7,4 @@ api = ExternalApi(bp)
from . import index
from .app import app, audio, completion, conversation, file, message, workflow
from .dataset import dataset, document, hit_testing, segment, upload_file
from .dataset import dataset, document, hit_testing, segment

View File

@@ -31,11 +31,8 @@ class DatasetListApi(DatasetApiResource):
# provider = request.args.get("provider", default="vendor")
search = request.args.get("keyword", default=None, type=str)
tag_ids = request.args.getlist("tag_ids")
include_all = request.args.get("include_all", default="false").lower() == "true"
datasets, total = DatasetService.get_datasets(
page, limit, tenant_id, current_user, search, tag_ids, include_all
)
datasets, total = DatasetService.get_datasets(page, limit, tenant_id, current_user, search, tag_ids)
# check embedding setting
provider_manager = ProviderManager()
configurations = provider_manager.get_configurations(tenant_id=current_user.current_tenant_id)

View File

@@ -18,7 +18,6 @@ from controllers.service_api.app.error import (
from controllers.service_api.dataset.error import (
ArchivedDocumentImmutableError,
DocumentIndexingError,
InvalidMetadataError,
)
from controllers.service_api.wraps import DatasetApiResource, cloud_edition_billing_resource_check
from core.errors.error import ProviderTokenNotInitError
@@ -51,9 +50,6 @@ class DocumentAddByTextApi(DatasetApiResource):
"indexing_technique", type=str, choices=Dataset.INDEXING_TECHNIQUE_LIST, nullable=False, location="json"
)
parser.add_argument("retrieval_model", type=dict, required=False, nullable=False, location="json")
parser.add_argument("doc_type", type=str, required=False, nullable=True, location="json")
parser.add_argument("doc_metadata", type=dict, required=False, nullable=True, location="json")
args = parser.parse_args()
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
@@ -65,28 +61,6 @@ class DocumentAddByTextApi(DatasetApiResource):
if not dataset.indexing_technique and not args["indexing_technique"]:
raise ValueError("indexing_technique is required.")
# Validate metadata if provided
if args.get("doc_type") or args.get("doc_metadata"):
if not args.get("doc_type") or not args.get("doc_metadata"):
raise InvalidMetadataError("Both doc_type and doc_metadata must be provided when adding metadata")
if args["doc_type"] not in DocumentService.DOCUMENT_METADATA_SCHEMA:
raise InvalidMetadataError(
"Invalid doc_type. Must be one of: " + ", ".join(DocumentService.DOCUMENT_METADATA_SCHEMA.keys())
)
if not isinstance(args["doc_metadata"], dict):
raise InvalidMetadataError("doc_metadata must be a dictionary")
# Validate metadata schema based on doc_type
if args["doc_type"] != "others":
metadata_schema = DocumentService.DOCUMENT_METADATA_SCHEMA[args["doc_type"]]
for key, value in args["doc_metadata"].items():
if key in metadata_schema and not isinstance(value, metadata_schema[key]):
raise InvalidMetadataError(f"Invalid type for metadata field {key}")
# set to MetaDataConfig
args["metadata"] = {"doc_type": args["doc_type"], "doc_metadata": args["doc_metadata"]}
text = args.get("text")
name = args.get("name")
if text is None or name is None:
@@ -133,8 +107,6 @@ class DocumentUpdateByTextApi(DatasetApiResource):
"doc_language", type=str, default="English", required=False, nullable=False, location="json"
)
parser.add_argument("retrieval_model", type=dict, required=False, nullable=False, location="json")
parser.add_argument("doc_type", type=str, required=False, nullable=True, location="json")
parser.add_argument("doc_metadata", type=dict, required=False, nullable=True, location="json")
args = parser.parse_args()
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
@@ -143,32 +115,6 @@ class DocumentUpdateByTextApi(DatasetApiResource):
if not dataset:
raise ValueError("Dataset is not exist.")
# indexing_technique is already set in dataset since this is an update
args["indexing_technique"] = dataset.indexing_technique
# Validate metadata if provided
if args.get("doc_type") or args.get("doc_metadata"):
if not args.get("doc_type") or not args.get("doc_metadata"):
raise InvalidMetadataError("Both doc_type and doc_metadata must be provided when adding metadata")
if args["doc_type"] not in DocumentService.DOCUMENT_METADATA_SCHEMA:
raise InvalidMetadataError(
"Invalid doc_type. Must be one of: " + ", ".join(DocumentService.DOCUMENT_METADATA_SCHEMA.keys())
)
if not isinstance(args["doc_metadata"], dict):
raise InvalidMetadataError("doc_metadata must be a dictionary")
# Validate metadata schema based on doc_type
if args["doc_type"] != "others":
metadata_schema = DocumentService.DOCUMENT_METADATA_SCHEMA[args["doc_type"]]
for key, value in args["doc_metadata"].items():
if key in metadata_schema and not isinstance(value, metadata_schema[key]):
raise InvalidMetadataError(f"Invalid type for metadata field {key}")
# set to MetaDataConfig
args["metadata"] = {"doc_type": args["doc_type"], "doc_metadata": args["doc_metadata"]}
if args["text"]:
text = args.get("text")
name = args.get("name")
@@ -215,30 +161,6 @@ class DocumentAddByFileApi(DatasetApiResource):
args["doc_form"] = "text_model"
if "doc_language" not in args:
args["doc_language"] = "English"
# Validate metadata if provided
if args.get("doc_type") or args.get("doc_metadata"):
if not args.get("doc_type") or not args.get("doc_metadata"):
raise InvalidMetadataError("Both doc_type and doc_metadata must be provided when adding metadata")
if args["doc_type"] not in DocumentService.DOCUMENT_METADATA_SCHEMA:
raise InvalidMetadataError(
"Invalid doc_type. Must be one of: " + ", ".join(DocumentService.DOCUMENT_METADATA_SCHEMA.keys())
)
if not isinstance(args["doc_metadata"], dict):
raise InvalidMetadataError("doc_metadata must be a dictionary")
# Validate metadata schema based on doc_type
if args["doc_type"] != "others":
metadata_schema = DocumentService.DOCUMENT_METADATA_SCHEMA[args["doc_type"]]
for key, value in args["doc_metadata"].items():
if key in metadata_schema and not isinstance(value, metadata_schema[key]):
raise InvalidMetadataError(f"Invalid type for metadata field {key}")
# set to MetaDataConfig
args["metadata"] = {"doc_type": args["doc_type"], "doc_metadata": args["doc_metadata"]}
# get dataset info
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
@@ -306,29 +228,6 @@ class DocumentUpdateByFileApi(DatasetApiResource):
if "doc_language" not in args:
args["doc_language"] = "English"
# Validate metadata if provided
if args.get("doc_type") or args.get("doc_metadata"):
if not args.get("doc_type") or not args.get("doc_metadata"):
raise InvalidMetadataError("Both doc_type and doc_metadata must be provided when adding metadata")
if args["doc_type"] not in DocumentService.DOCUMENT_METADATA_SCHEMA:
raise InvalidMetadataError(
"Invalid doc_type. Must be one of: " + ", ".join(DocumentService.DOCUMENT_METADATA_SCHEMA.keys())
)
if not isinstance(args["doc_metadata"], dict):
raise InvalidMetadataError("doc_metadata must be a dictionary")
# Validate metadata schema based on doc_type
if args["doc_type"] != "others":
metadata_schema = DocumentService.DOCUMENT_METADATA_SCHEMA[args["doc_type"]]
for key, value in args["doc_metadata"].items():
if key in metadata_schema and not isinstance(value, metadata_schema[key]):
raise InvalidMetadataError(f"Invalid type for metadata field {key}")
# set to MetaDataConfig
args["metadata"] = {"doc_type": args["doc_type"], "doc_metadata": args["doc_metadata"]}
# get dataset info
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)

View File

@@ -53,7 +53,8 @@ class SegmentApi(DatasetApiResource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -94,7 +95,8 @@ class SegmentApi(DatasetApiResource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)
@@ -173,7 +175,8 @@ class DatasetSegmentApi(DatasetApiResource):
)
except LLMBadRequestError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."
"No Embedding Model available. Please configure a valid provider "
"in the Settings -> Model Provider."
)
except ProviderTokenNotInitError as ex:
raise ProviderNotInitializeError(ex.description)

View File

@@ -1,54 +0,0 @@
from werkzeug.exceptions import NotFound
from controllers.service_api import api
from controllers.service_api.wraps import (
DatasetApiResource,
)
from core.file import helpers as file_helpers
from extensions.ext_database import db
from models.dataset import Dataset
from models.model import UploadFile
from services.dataset_service import DocumentService
class UploadFileApi(DatasetApiResource):
def get(self, tenant_id, dataset_id, document_id):
"""Get upload file."""
# check dataset
dataset_id = str(dataset_id)
tenant_id = str(tenant_id)
dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
if not dataset:
raise NotFound("Dataset not found.")
# check document
document_id = str(document_id)
document = DocumentService.get_document(dataset.id, document_id)
if not document:
raise NotFound("Document not found.")
# check upload file
if document.data_source_type != "upload_file":
raise ValueError(f"Document data source type ({document.data_source_type}) is not upload_file.")
data_source_info = document.data_source_info_dict
if data_source_info and "upload_file_id" in data_source_info:
file_id = data_source_info["upload_file_id"]
upload_file = db.session.query(UploadFile).filter(UploadFile.id == file_id).first()
if not upload_file:
raise NotFound("UploadFile not found.")
else:
raise ValueError("Upload file id not found in document data source info.")
url = file_helpers.get_signed_file_url(upload_file_id=upload_file.id)
return {
"id": upload_file.id,
"name": upload_file.name,
"size": upload_file.size,
"extension": upload_file.extension,
"url": url,
"download_url": f"{url}&as_attachment=true",
"mime_type": upload_file.mime_type,
"created_by": upload_file.created_by,
"created_at": upload_file.created_at.timestamp(),
}, 200
api.add_resource(UploadFileApi, "/datasets/<uuid:dataset_id>/documents/<uuid:document_id>/upload-file")

View File

@@ -195,11 +195,7 @@ def validate_and_get_api_token(scope: str | None = None):
with Session(db.engine, expire_on_commit=False) as session:
update_stmt = (
update(ApiToken)
.where(
ApiToken.token == auth_token,
(ApiToken.last_used_at.is_(None) | (ApiToken.last_used_at < cutoff_time)),
ApiToken.type == scope,
)
.where(ApiToken.token == auth_token, ApiToken.last_used_at < cutoff_time, ApiToken.type == scope)
.values(last_used_at=current_time)
.returning(ApiToken)
)
@@ -240,7 +236,7 @@ def create_or_update_end_user_for_user_id(app_model: App, user_id: Optional[str]
tenant_id=app_model.tenant_id,
app_id=app_model.id,
type="service_api",
is_anonymous=user_id == "DEFAULT-USER",
is_anonymous=True if user_id == "DEFAULT-USER" else False,
session_id=user_id,
)
db.session.add(end_user)

View File

@@ -39,7 +39,7 @@ class ConversationListApi(WebApiResource):
pinned = None
if "pinned" in args and args["pinned"] is not None:
pinned = args["pinned"] == "true"
pinned = True if args["pinned"] == "true" else False
try:
with Session(db.engine) as session:

View File

@@ -91,7 +91,7 @@ class MessageListApi(WebApiResource):
try:
return MessageService.pagination_by_first_id(
app_model, end_user, args["conversation_id"], args["first_id"], args["limit"]
app_model, end_user, args["conversation_id"], args["first_id"], args["limit"], "desc"
)
except services.errors.conversation.ConversationNotExistsError:
raise NotFound("Conversation Not Exists.")

View File

@@ -172,7 +172,7 @@ class CotAgentRunner(BaseAgentRunner, ABC):
self.save_agent_thought(
agent_thought=agent_thought,
tool_name=(scratchpad.action.action_name if scratchpad.action and not scratchpad.is_final() else ""),
tool_name=scratchpad.action.action_name if scratchpad.action else "",
tool_input={scratchpad.action.action_name: scratchpad.action.action_input} if scratchpad.action else {},
tool_invoke_meta={},
thought=scratchpad.thought or "",

View File

@@ -202,7 +202,7 @@ class AgentChatAppRunner(AppRunner):
# change function call strategy based on LLM model
llm_model = cast(LargeLanguageModel, model_instance.model_type_instance)
model_schema = llm_model.get_model_schema(model_instance.model, model_instance.credentials)
if not model_schema:
if not model_schema or not model_schema.features:
raise ValueError("Model schema not found")
if {ModelFeature.MULTI_TOOL_CALL, ModelFeature.TOOL_CALL}.intersection(model_schema.features or []):

View File

@@ -167,7 +167,8 @@ class AppQueueManager:
else:
if isinstance(data, DeclarativeMeta) or hasattr(data, "_sa_instance_state"):
raise TypeError(
"Critical Error: Passing SQLAlchemy Model instances that cause thread safety issues is not allowed."
"Critical Error: Passing SQLAlchemy Model instances "
"that cause thread safety issues is not allowed."
)

View File

@@ -89,7 +89,6 @@ class MessageBasedAppGenerator(BaseAppGenerator):
Conversation.id == conversation_id,
Conversation.app_id == app_model.id,
Conversation.status == "normal",
Conversation.is_deleted.is_(False),
]
if isinstance(user, Account):

View File

@@ -145,7 +145,7 @@ class MessageCycleManage:
# get extension
if "." in message_file.url:
extension = f".{message_file.url.split('.')[-1]}"
extension = f'.{message_file.url.split(".")[-1]}'
if len(extension) > 10:
extension = ".bin"
else:

View File

@@ -62,9 +62,8 @@ class ApiExternalDataTool(ExternalDataTool):
if not api_based_extension:
raise ValueError(
"[External data tool] API query failed, variable: {}, error: api_based_extension_id is invalid".format(
self.variable
)
"[External data tool] API query failed, variable: {}, "
"error: api_based_extension_id is invalid".format(self.variable)
)
# decrypt api_key

View File

@@ -90,7 +90,7 @@ class File(BaseModel):
def markdown(self) -> str:
url = self.generate_url()
if self.type == FileType.IMAGE:
text = f"![{self.filename or ''}]({url})"
text = f'![{self.filename or ""}]({url})'
else:
text = f"[{self.filename or url}]({url})"

View File

@@ -530,6 +530,7 @@ class IndexingRunner:
# chunk nodes by chunk size
indexing_start_at = time.perf_counter()
tokens = 0
chunk_size = 10
if dataset_document.doc_form != IndexType.PARENT_CHILD_INDEX:
# create keyword index
create_keyword_thread = threading.Thread(
@@ -538,22 +539,11 @@ class IndexingRunner:
)
create_keyword_thread.start()
max_workers = 10
if dataset.indexing_technique == "high_quality":
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
futures = []
# Distribute documents into multiple groups based on the hash values of page_content
# This is done to prevent multiple threads from processing the same document,
# Thereby avoiding potential database insertion deadlocks
document_groups: list[list[Document]] = [[] for _ in range(max_workers)]
for document in documents:
hash = helper.generate_text_hash(document.page_content)
group_index = int(hash, 16) % max_workers
document_groups[group_index].append(document)
for chunk_documents in document_groups:
if len(chunk_documents) == 0:
continue
for i in range(0, len(documents), chunk_size):
chunk_documents = documents[i : i + chunk_size]
futures.append(
executor.submit(
self._process_chunk,
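
One side of the hunk above distributes documents across workers by a hash of `page_content` instead of slicing them into fixed-size chunks; the in-diff comment gives the reason (keep identical documents on a single worker so concurrent inserts cannot deadlock). A self-contained sketch of that grouping step, with a stand-in hash helper since `helper.generate_text_hash` is not shown in this compare:

```python
# Sketch of the hash-based grouping described in the hunk's comments: identical
# page_content always lands in the same group, so no two workers insert the
# same document concurrently. generate_text_hash is a stand-in helper here.
import hashlib

def generate_text_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def group_documents(contents: list[str], max_workers: int = 10) -> list[list[str]]:
    groups: list[list[str]] = [[] for _ in range(max_workers)]
    for content in contents:
        group_index = int(generate_text_hash(content), 16) % max_workers
        groups[group_index].append(content)
    # Empty groups are skipped before submitting work to the thread pool.
    return groups

docs = ["chunk a", "chunk b", "chunk a"]   # duplicate content on purpose
groups = group_documents(docs)
print([g for g in groups if g])            # both "chunk a" items share one group
```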

View File

@@ -131,7 +131,7 @@ JAVASCRIPT_CODE_GENERATOR_PROMPT_TEMPLATE = (
SUGGESTED_QUESTIONS_AFTER_ANSWER_INSTRUCTION_PROMPT = (
"Please help me predict the three most likely questions that human would ask, "
"and keeping each question under 20 characters.\n"
"MAKE SURE your output is the SAME language as the Assistant's latest response. "
"MAKE SURE your output is the SAME language as the Assistant's latest response"
"The output must be an array in JSON format following the specified schema:\n"
'["question1","question2","question3"]\n'
)

View File

@@ -221,12 +221,13 @@ class AIModel(ABC):
:param credentials: model credentials
:return: model schema
"""
# Try to get model schema from predefined models
for predefined_model in self.predefined_models():
if model == predefined_model.model:
return predefined_model
# get predefined models (predefined_models)
models = self.predefined_models()
model_map = {model.model: model for model in models}
if model in model_map:
return model_map[model]
# Try to get model schema from credentials
if credentials:
model_schema = self.get_customizable_model_schema_from_credentials(model, credentials)
if model_schema:

View File

@@ -1,9 +1,6 @@
import logging
from threading import Lock
from typing import Any
logger = logging.getLogger(__name__)
_tokenizer: Any = None
_lock = Lock()
@@ -46,6 +43,5 @@ class GPT2Tokenizer:
base_path = abspath(__file__)
gpt2_tokenizer_path = join(dirname(base_path), "gpt2")
_tokenizer = TransformerGPT2Tokenizer.from_pretrained(gpt2_tokenizer_path)
logger.info("Fallback to Transformers' GPT-2 tokenizer from tiktoken")
return _tokenizer

View File

@@ -53,9 +53,6 @@ model_credential_schema:
type: select
required: true
options:
- label:
en_US: 2024-12-01-preview
value: 2024-12-01-preview
- label:
en_US: 2024-10-01-preview
value: 2024-10-01-preview

View File

@@ -108,7 +108,7 @@ class AzureOpenAILargeLanguageModel(_CommonAzureOpenAI, LargeLanguageModel):
ai_model_entity = self._get_ai_model_entity(base_model_name=base_model_name, model=model)
if not ai_model_entity:
raise CredentialsValidateFailedError(f"Base Model Name {credentials['base_model_name']} is invalid")
raise CredentialsValidateFailedError(f'Base Model Name {credentials["base_model_name"]} is invalid')
try:
client = AzureOpenAI(**self._to_credential_kwargs(credentials))

View File

@@ -130,7 +130,7 @@ class AzureOpenAITextEmbeddingModel(_CommonAzureOpenAI, TextEmbeddingModel):
raise CredentialsValidateFailedError("Base Model Name is required")
if not self._get_ai_model_entity(credentials["base_model_name"], model):
raise CredentialsValidateFailedError(f"Base Model Name {credentials['base_model_name']} is invalid")
raise CredentialsValidateFailedError(f'Base Model Name {credentials["base_model_name"]} is invalid')
try:
credentials_kwargs = self._to_credential_kwargs(credentials)

View File

@@ -44,7 +44,6 @@ provider_credential_schema:
label:
en_US: AWS Region
zh_Hans: AWS 地区
ja_JP: AWS リージョン
type: select
default: us-east-1
options:
@@ -52,77 +51,62 @@ provider_credential_schema:
label:
en_US: US East (N. Virginia)
zh_Hans: 美国东部 (弗吉尼亚北部)
ja_JP: 米国 (バージニア北部)
- value: us-east-2
label:
en_US: US East (Ohio)
zh_Hans: 美国东部 (俄亥俄)
ja_JP: 米国 (オハイオ)
zh_Hans: 美国东部 (弗吉尼亚北部)
- value: us-west-2
label:
en_US: US West (Oregon)
zh_Hans: 美国西部 (俄勒冈州)
ja_JP: 米国 (オレゴン)
- value: ap-south-1
label:
en_US: Asia Pacific (Mumbai)
zh_Hans: 亚太地区(孟买)
ja_JP: アジアパシフィック (ムンバイ)
- value: ap-southeast-1
label:
en_US: Asia Pacific (Singapore)
zh_Hans: 亚太地区 (新加坡)
ja_JP: アジアパシフィック (シンガポール)
- value: ap-southeast-2
label:
en_US: Asia Pacific (Sydney)
zh_Hans: 亚太地区 (悉尼)
ja_JP: アジアパシフィック (シドニー)
- value: ap-northeast-1
label:
en_US: Asia Pacific (Tokyo)
zh_Hans: 亚太地区 (东京)
ja_JP: アジアパシフィック (東京)
- value: ap-northeast-2
label:
en_US: Asia Pacific (Seoul)
zh_Hans: 亚太地区(首尔)
ja_JP: アジアパシフィック (ソウル)
- value: ca-central-1
label:
en_US: Canada (Central)
zh_Hans: 加拿大(中部)
ja_JP: カナダ (中部)
- value: eu-central-1
label:
en_US: Europe (Frankfurt)
zh_Hans: 欧洲 (法兰克福)
ja_JP: 欧州 (フランクフルト)
- value: eu-west-1
label:
en_US: Europe (Ireland)
zh_Hans: 欧洲(爱尔兰)
ja_JP: 欧州 (アイルランド)
- value: eu-west-2
label:
en_US: Europe (London)
zh_Hans: 欧洲西部 (伦敦)
ja_JP: 欧州 (ロンドン)
- value: eu-west-3
label:
en_US: Europe (Paris)
zh_Hans: 欧洲(巴黎)
ja_JP: 欧州 (パリ)
- value: sa-east-1
label:
en_US: South America (São Paulo)
zh_Hans: 南美洲(圣保罗)
ja_JP: 南米 (サンパウロ)
- value: us-gov-west-1
label:
en_US: AWS GovCloud (US-West)
zh_Hans: AWS GovCloud (US-West)
ja_JP: AWS GovCloud (米国西部)
- variable: model_for_validation
required: false
label:

View File

@@ -70,7 +70,7 @@ class BedrockRerankModel(RerankModel):
rerankingConfiguration = {
"type": "BEDROCK_RERANKING_MODEL",
"bedrockRerankingConfiguration": {
"numberOfResults": min(top_n, len(text_sources)),
"numberOfResults": top_n,
"modelConfiguration": {
"modelArn": model_package_arn,
},

View File

@@ -677,17 +677,16 @@ class CohereLargeLanguageModel(LargeLanguageModel):
:return: model schema
"""
mode = credentials.get("mode")
base_model_schema = None
for predefined_model in self.predefined_models():
if (
mode == "chat" and predefined_model.model == "command-light-chat"
) or predefined_model.model == "command-light":
base_model_schema = predefined_model
break
# get model schema
models = self.predefined_models()
model_map = {model.model: model for model in models}
if not base_model_schema:
raise ValueError("Model not found")
mode = credentials.get("mode")
if mode == "chat":
base_model_schema = model_map["command-light-chat"]
else:
base_model_schema = model_map["command-light"]
base_model_schema = cast(AIModelEntity, base_model_schema)

View File

@@ -1,3 +1,2 @@
- deepseek-chat
- deepseek-coder
- deepseek-reasoner

View File

@@ -10,7 +10,7 @@ features:
- stream-tool-call
model_properties:
mode: chat
context_size: 64000
context_size: 128000
parameter_rules:
- name: temperature
use_template: temperature

View File

@@ -10,7 +10,7 @@ features:
- stream-tool-call
model_properties:
mode: chat
context_size: 64000
context_size: 128000
parameter_rules:
- name: temperature
use_template: temperature

View File

@@ -1,21 +0,0 @@
model: deepseek-reasoner
label:
zh_Hans: deepseek-reasoner
en_US: deepseek-reasoner
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 64000
parameter_rules:
- name: max_tokens
use_template: max_tokens
min: 1
max: 8192
default: 4096
pricing:
input: "4"
output: "16"
unit: "0.000001"
currency: RMB

View File

@@ -1,13 +1,10 @@
import json
from collections.abc import Generator
from typing import Optional, Union
import requests
from yarl import URL
from core.model_runtime.entities.llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta
from core.model_runtime.entities.llm_entities import LLMMode, LLMResult
from core.model_runtime.entities.message_entities import (
AssistantPromptMessage,
PromptMessage,
PromptMessageTool,
)
@@ -27,6 +24,9 @@ class DeepseekLargeLanguageModel(OAIAPICompatLargeLanguageModel):
user: Optional[str] = None,
) -> Union[LLMResult, Generator]:
self._add_custom_parameters(credentials)
# {"response_format": "xx"} need convert to {"response_format": {"type": "xx"}}
if "response_format" in model_parameters:
model_parameters["response_format"] = {"type": model_parameters.get("response_format")}
return super()._invoke(model, credentials, prompt_messages, model_parameters, tools, stop, stream)
def validate_credentials(self, model: str, credentials: dict) -> None:
@@ -39,208 +39,3 @@ class DeepseekLargeLanguageModel(OAIAPICompatLargeLanguageModel):
credentials["mode"] = LLMMode.CHAT.value
credentials["function_calling_type"] = "tool_call"
credentials["stream_function_calling"] = "support"
def _handle_generate_stream_response(
self, model: str, credentials: dict, response: requests.Response, prompt_messages: list[PromptMessage]
) -> Generator:
"""
Handle llm stream response
:param model: model name
:param credentials: model credentials
:param response: streamed response
:param prompt_messages: prompt messages
:return: llm response chunk generator
"""
full_assistant_content = ""
chunk_index = 0
is_reasoning_started = False # Add flag to track reasoning state
def create_final_llm_result_chunk(
id: Optional[str], index: int, message: AssistantPromptMessage, finish_reason: str, usage: dict
) -> LLMResultChunk:
# calculate num tokens
prompt_tokens = usage and usage.get("prompt_tokens")
if prompt_tokens is None:
prompt_tokens = self._num_tokens_from_string(model, prompt_messages[0].content)
completion_tokens = usage and usage.get("completion_tokens")
if completion_tokens is None:
completion_tokens = self._num_tokens_from_string(model, full_assistant_content)
# transform usage
usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
return LLMResultChunk(
id=id,
model=model,
prompt_messages=prompt_messages,
delta=LLMResultChunkDelta(index=index, message=message, finish_reason=finish_reason, usage=usage),
)
# delimiter for stream response, need unicode_escape
import codecs
delimiter = credentials.get("stream_mode_delimiter", "\n\n")
delimiter = codecs.decode(delimiter, "unicode_escape")
tools_calls: list[AssistantPromptMessage.ToolCall] = []
def increase_tool_call(new_tool_calls: list[AssistantPromptMessage.ToolCall]):
def get_tool_call(tool_call_id: str):
if not tool_call_id:
return tools_calls[-1]
tool_call = next((tool_call for tool_call in tools_calls if tool_call.id == tool_call_id), None)
if tool_call is None:
tool_call = AssistantPromptMessage.ToolCall(
id=tool_call_id,
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
)
tools_calls.append(tool_call)
return tool_call
for new_tool_call in new_tool_calls:
# get tool call
tool_call = get_tool_call(new_tool_call.function.name)
# update tool call
if new_tool_call.id:
tool_call.id = new_tool_call.id
if new_tool_call.type:
tool_call.type = new_tool_call.type
if new_tool_call.function.name:
tool_call.function.name = new_tool_call.function.name
if new_tool_call.function.arguments:
tool_call.function.arguments += new_tool_call.function.arguments
finish_reason = None # The default value of finish_reason is None
message_id, usage = None, None
for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
chunk = chunk.strip()
if chunk:
# ignore sse comments
if chunk.startswith(":"):
continue
decoded_chunk = chunk.strip().removeprefix("data:").lstrip()
if decoded_chunk == "[DONE]": # Some provider returns "data: [DONE]"
continue
try:
chunk_json: dict = json.loads(decoded_chunk)
# stream ended
except json.JSONDecodeError as e:
yield create_final_llm_result_chunk(
id=message_id,
index=chunk_index + 1,
message=AssistantPromptMessage(content=""),
finish_reason="Non-JSON encountered.",
usage=usage,
)
break
# handle the error here. for issue #11629
if chunk_json.get("error") and chunk_json.get("choices") is None:
raise ValueError(chunk_json.get("error"))
if chunk_json:
if u := chunk_json.get("usage"):
usage = u
if not chunk_json or len(chunk_json["choices"]) == 0:
continue
choice = chunk_json["choices"][0]
finish_reason = chunk_json["choices"][0].get("finish_reason")
message_id = chunk_json.get("id")
chunk_index += 1
if "delta" in choice:
delta = choice["delta"]
is_reasoning = delta.get("reasoning_content")
delta_content = delta.get("content") or delta.get("reasoning_content")
assistant_message_tool_calls = None
if "tool_calls" in delta and credentials.get("function_calling_type", "no_call") == "tool_call":
assistant_message_tool_calls = delta.get("tool_calls", None)
elif (
"function_call" in delta
and credentials.get("function_calling_type", "no_call") == "function_call"
):
assistant_message_tool_calls = [
{"id": "tool_call_id", "type": "function", "function": delta.get("function_call", {})}
]
# assistant_message_function_call = delta.delta.function_call
# extract tool calls from response
if assistant_message_tool_calls:
tool_calls = self._extract_response_tool_calls(assistant_message_tool_calls)
increase_tool_call(tool_calls)
if delta_content is None or delta_content == "":
continue
# Add markdown quote markers for reasoning content
if is_reasoning:
if not is_reasoning_started:
delta_content = "> 💭 " + delta_content
is_reasoning_started = True
elif "\n\n" in delta_content:
delta_content = delta_content.replace("\n\n", "\n> ")
elif "\n" in delta_content:
delta_content = delta_content.replace("\n", "\n> ")
elif is_reasoning_started:
# If we were in reasoning mode but now getting regular content,
# add \n\n to close the reasoning block
delta_content = "\n\n" + delta_content
is_reasoning_started = False
# transform assistant message to prompt message
assistant_prompt_message = AssistantPromptMessage(
content=delta_content,
)
# reset tool calls
tool_calls = []
full_assistant_content += delta_content
elif "text" in choice:
choice_text = choice.get("text", "")
if choice_text == "":
continue
# transform assistant message to prompt message
assistant_prompt_message = AssistantPromptMessage(content=choice_text)
full_assistant_content += choice_text
else:
continue
yield LLMResultChunk(
id=message_id,
model=model,
prompt_messages=prompt_messages,
delta=LLMResultChunkDelta(
index=chunk_index,
message=assistant_prompt_message,
),
)
chunk_index += 1
if tools_calls:
yield LLMResultChunk(
id=message_id,
model=model,
prompt_messages=prompt_messages,
delta=LLMResultChunkDelta(
index=chunk_index,
message=AssistantPromptMessage(tool_calls=tools_calls, content=""),
),
)
yield create_final_llm_result_chunk(
id=message_id,
index=chunk_index,
message=AssistantPromptMessage(content=""),
finish_reason=finish_reason,
usage=usage,
)

View File

@@ -1,6 +1,5 @@
- gemini-2.0-flash-exp
- gemini-2.0-flash-thinking-exp-1219
- gemini-2.0-flash-thinking-exp-01-21
- gemini-1.5-pro
- gemini-1.5-pro-latest
- gemini-1.5-pro-001

View File

@@ -1,39 +0,0 @@
model: gemini-2.0-flash-thinking-exp-01-21
label:
en_US: Gemini 2.0 Flash Thinking Exp 01-21
model_type: llm
features:
- agent-thought
- vision
- document
- video
- audio
model_properties:
mode: chat
context_size: 32767
parameter_rules:
- name: temperature
use_template: temperature
- name: top_p
use_template: top_p
- name: top_k
label:
zh_Hans: 取样数量
en_US: Top k
type: int
help:
zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
en_US: Only sample from the top K options for each subsequent token.
required: false
- name: max_output_tokens
use_template: max_tokens
default: 8192
min: 1
max: 8192
- name: json_schema
use_template: json_schema
pricing:
input: '0.00'
output: '0.00'
unit: '0.000001'
currency: USD

View File

@@ -162,9 +162,9 @@ class HuggingfaceHubTextEmbeddingModel(_CommonHuggingfaceHub, TextEmbeddingModel
@staticmethod
def _check_endpoint_url_model_repository_name(credentials: dict, model_name: str):
try:
url = f"{HUGGINGFACE_ENDPOINT_API}{credentials['huggingface_namespace']}"
url = f'{HUGGINGFACE_ENDPOINT_API}{credentials["huggingface_namespace"]}'
headers = {
"Authorization": f"Bearer {credentials['huggingfacehub_api_token']}",
"Authorization": f'Bearer {credentials["huggingfacehub_api_token"]}',
"Content-Type": "application/json",
}

View File

@@ -34,7 +34,6 @@ from core.model_runtime.model_providers.minimax.llm.types import MinimaxMessage
class MinimaxLargeLanguageModel(LargeLanguageModel):
model_apis = {
"minimax-text-01": MinimaxChatCompletionPro,
"abab7-chat-preview": MinimaxChatCompletionPro,
"abab6.5t-chat": MinimaxChatCompletionPro,
"abab6.5s-chat": MinimaxChatCompletionPro,

View File

@@ -1,46 +0,0 @@
model: minimax-text-01
label:
en_US: Minimax-Text-01
model_type: llm
features:
- agent-thought
- tool-call
- stream-tool-call
model_properties:
mode: chat
context_size: 1000192
parameter_rules:
- name: temperature
use_template: temperature
min: 0.01
max: 1
default: 0.1
- name: top_p
use_template: top_p
min: 0.01
max: 1
default: 0.95
- name: max_tokens
use_template: max_tokens
required: true
default: 2048
min: 1
max: 1000192
- name: mask_sensitive_info
type: boolean
default: true
label:
zh_Hans: 隐私保护
en_US: Moderate
help:
zh_Hans: 对输出中易涉及隐私问题的文本信息进行打码目前包括但不限于邮箱、域名、链接、证件号、家庭住址等默认true即开启打码
en_US: Mask the sensitive info of the generated content, such as email/domain/link/address/phone/id..
- name: presence_penalty
use_template: presence_penalty
- name: frequency_penalty
use_template: frequency_penalty
pricing:
input: '0.001'
output: '0.008'
unit: '0.001'
currency: RMB

View File

@@ -44,6 +44,9 @@ class MoonshotLargeLanguageModel(OAIAPICompatLargeLanguageModel):
self._add_custom_parameters(credentials)
self._add_function_call(model, credentials)
user = user[:32] if user else None
# {"response_format": "json_object"} need convert to {"response_format": {"type": "json_object"}}
if "response_format" in model_parameters:
model_parameters["response_format"] = {"type": model_parameters.get("response_format")}
return super()._invoke(model, credentials, prompt_messages, model_parameters, tools, stop, stream, user)
def validate_credentials(self, model: str, credentials: dict) -> None:

View File

@@ -1,11 +1,19 @@
<svg width="88" height="24" viewBox="0 0 88 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_1923_1287)">
<path d="M24 18.8323V18.8326H14.3246L9.16716 13.6751V18.8326H0V18.8314L9.16716 9.66422V4H9.16774L24 18.8323Z" fill="black"/>
</g>
<path fill-rule="evenodd" clip-rule="evenodd" d="M73.2505 16.8061H76.5869V18.9145H73.9391C72.0857 18.9145 70.9202 17.8952 70.9202 15.9977V10.3921H69.0316V8.26609H70.9202L71.4677 5.47209H73.2329V8.26609H76.5869V10.3921H73.2505V16.8061ZM33.8133 4.85699L38.6679 15.681H38.809V4.85699H41.3333V18.9145H37.52L32.6654 8.09046H32.5243V18.9145H30V4.85699H33.8133ZM47.812 19.1254C44.7225 19.1254 42.7457 16.9641 42.7457 13.6079C42.7457 10.2517 44.6873 8.05518 47.812 8.05518C50.9367 8.05518 52.8429 10.1635 52.8429 13.6079C52.8429 17.0523 50.9014 19.1254 47.812 19.1254ZM47.812 17.017C49.1891 17.017 50.3363 16.5423 50.3715 15.1894V12.0265C50.3715 10.6383 49.2068 10.1635 47.812 10.1635C46.4172 10.1635 45.2171 10.6383 45.2171 12.0265V15.1894C45.2524 16.5599 46.4348 17.017 47.812 17.017ZM55.5444 8.24846L58.2979 16.6826H58.439L61.1926 8.24846H63.7346L59.9389 18.8968H56.7966L53.0186 8.24846H55.5429H55.5444ZM65.0419 8.26609H67.3722V18.9145H65.0419V8.26609ZM64.9001 4.85699H67.5126V6.86027H64.9001V4.85699ZM82.3064 19.143C79.4639 19.143 77.6458 16.9817 77.6458 13.6079C77.6458 10.2341 79.4286 8.07282 82.3064 8.07282C83.6483 8.07282 84.7425 8.59973 85.3958 9.58373H85.5369L85.9962 8.26609H87.7614V18.9145H85.9962L85.5369 17.6314H85.3958C84.6896 18.5625 83.5072 19.1423 82.3064 19.1423V19.143ZM82.7826 17.017C84.1774 17.017 85.3951 16.5776 85.4304 15.1894V12.0265C85.4304 10.603 84.159 10.1988 82.7297 10.1988C81.3004 10.1988 80.1172 10.6383 80.1172 12.0265V15.1894C80.1525 16.5952 81.3709 17.017 82.7826 17.017Z" fill="black"/>
<svg width="162" height="36" viewBox="0 0 162 36" fill="none" xmlns="http://www.w3.org/2000/svg">
<path fill-rule="evenodd" clip-rule="evenodd" d="M2 0C0.895431 0 0 0.895432 0 2V29.1891C0 30.2937 0.895433 31.1891 2 31.1891H5.51171L16.0608 35.1377C16.7145 35.3824 17.4114 34.8991 17.4114 34.2012V11.3669C17.4114 10.533 16.894 9.78665 16.1131 9.49405L5.51171 5.52152H25.58V31.1891H29.0917C30.1963 31.1891 31.0917 30.2937 31.0917 29.1891V2C31.0917 0.895431 30.1963 0 29.0917 0H2ZM14.6022 23.7351C15.0558 23.956 15.4239 23.6812 15.4239 23.1185C15.4239 22.5557 15.0558 21.9204 14.6022 21.6995C14.1486 21.4775 13.7804 21.7545 13.7804 22.3161C13.7804 22.8777 14.1486 23.513 14.6022 23.7351Z" fill="white"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M2 0C0.895431 0 0 0.895432 0 2V29.1891C0 30.2937 0.895433 31.1891 2 31.1891H5.51171L16.0608 35.1377C16.7145 35.3824 17.4114 34.8991 17.4114 34.2012V11.3669C17.4114 10.533 16.894 9.78665 16.1131 9.49405L5.51171 5.52152H25.58V31.1891H29.0917C30.1963 31.1891 31.0917 30.2937 31.0917 29.1891V2C31.0917 0.895431 30.1963 0 29.0917 0H2ZM14.6022 23.7351C15.0558 23.956 15.4239 23.6812 15.4239 23.1185C15.4239 22.5557 15.0558 21.9204 14.6022 21.6995C14.1486 21.4775 13.7804 21.7545 13.7804 22.3161C13.7804 22.8777 14.1486 23.513 14.6022 23.7351Z" fill="url(#paint0_linear_1473_71)"/>
<path d="M55.9397 27.8804H59.0566V19.0803C59.0566 14.9105 56.381 12.7172 52.8228 12.7172C51.0023 12.7172 49.3197 13.4483 48.2991 14.6668V12.9609H45.1546V27.8804H48.2991V19.5406C48.2991 16.8059 49.8162 15.3978 52.1332 15.3978C54.4226 15.3978 55.9397 16.8059 55.9397 19.5406V27.8804Z" fill="#11101A"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M69.7881 12.7172C74.1187 12.7172 77.539 15.7228 77.539 20.4071C77.539 25.0915 74.0083 28.1241 69.6502 28.1241C65.3196 28.1241 62.0372 25.0915 62.0372 20.4071C62.0372 15.7228 65.4575 12.7172 69.7881 12.7172ZM69.7342 15.3979C67.362 15.3979 65.2381 17.0225 65.2381 20.4071C65.2381 23.7918 67.2793 25.4435 69.6514 25.4435C71.996 25.4435 74.313 23.7918 74.313 20.4071C74.313 17.0225 72.0788 15.3979 69.7342 15.3979Z" fill="#11101A"/>
<path d="M78.861 12.9609L84.6259 27.8804H88.3772L94.1697 12.9609H90.8321L86.5291 25.1185L82.2261 12.9609H78.861Z" fill="#11101A"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M100.13 9.00761C100.13 10.1178 99.2477 10.9842 98.1443 10.9842C97.0134 10.9842 96.1308 10.1178 96.1308 9.00761C96.1308 7.89745 97.0134 7.03098 98.1443 7.03098C99.2477 7.03098 100.13 7.89745 100.13 9.00761ZM99.6882 27.8804H96.5437V12.9609H99.6882V27.8804Z" fill="#11101A"/>
<path d="M104.322 23.7376C104.322 26.7702 106.004 27.8804 108.708 27.8804H111.19V25.308H109.259C107.935 25.308 107.494 24.8477 107.494 23.7376V15.479H111.19V12.9609H107.494V9.25128H104.322V12.9609H102.529V15.479H104.322V23.7376Z" fill="#11101A"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M120.154 28.1241C116.209 28.1241 113.037 24.9561 113.037 20.353C113.037 15.7498 116.209 12.7172 120.209 12.7172C122.774 12.7172 124.539 13.9086 125.477 15.1271V12.9609H128.649V27.8804H125.477V25.6601C124.512 26.9327 122.691 28.1241 120.154 28.1241ZM120.87 25.4435C123.242 25.4435 125.476 23.6293 125.476 20.4071C125.476 17.212 123.242 15.3979 120.87 15.3979C118.526 15.3979 116.264 17.1308 116.264 20.353C116.264 23.5752 118.526 25.4435 120.87 25.4435Z" fill="#11101A"/>
<path d="M136.043 26.0933C136.043 24.9832 135.16 24.1167 134.057 24.1167C132.926 24.1167 132.043 24.9832 132.043 26.0933C132.043 27.2035 132.926 28.07 134.057 28.07C135.16 28.07 136.043 27.2035 136.043 26.0933Z" fill="#11101A"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M145.502 28.1241C141.558 28.1241 138.386 24.9561 138.386 20.353C138.386 15.7498 141.558 12.7172 145.557 12.7172C148.123 12.7172 149.888 13.9086 150.826 15.1271V12.9609H153.998V27.8804H150.826V25.6601C149.86 26.9327 148.04 28.1241 145.502 28.1241ZM146.219 25.4435C148.591 25.4435 150.825 23.6293 150.825 20.4071C150.825 17.212 148.591 15.3979 146.219 15.3979C143.874 15.3979 141.612 17.1308 141.612 20.353C141.612 23.5752 143.874 25.4435 146.219 25.4435Z" fill="#11101A"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M161.722 9.00761C161.722 10.1178 160.84 10.9842 159.736 10.9842C158.605 10.9842 157.723 10.1178 157.723 9.00761C157.723 7.89745 158.605 7.03098 159.736 7.03098C160.84 7.03098 161.722 7.89745 161.722 9.00761ZM161.28 27.8804H158.136V12.9609H161.28V27.8804Z" fill="#11101A"/>
<defs>
<clipPath id="clip0_1923_1287">
<rect width="24" height="14.8326" fill="white" transform="translate(0 4)"/>
</clipPath>
<linearGradient id="paint0_linear_1473_71" x1="31" y1="-2" x2="0.975591" y2="14.2625" gradientUnits="userSpaceOnUse">
<stop stop-color="#2622FF"/>
<stop offset="1" stop-color="#A717FF"/>
</linearGradient>
</defs>
</svg>

Before: 1.9 KiB | After: 4.5 KiB

View File

@@ -1,3 +1,10 @@
<svg width="24" height="15" viewBox="0 0 24 15" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M24 14.8323V14.8326H14.3246L9.16716 9.67507V14.8326H0V14.8314L9.16716 5.66422V0H9.16774L24 14.8323Z" fill="black"/>
<svg width="32" height="36" viewBox="0 0 32 36" fill="none" xmlns="http://www.w3.org/2000/svg">
<path fill-rule="evenodd" clip-rule="evenodd" d="M2 0C0.895431 0 0 0.895432 0 2V29.1891C0 30.2937 0.895433 31.1891 2 31.1891H5.51171L16.0608 35.1377C16.7145 35.3824 17.4114 34.8991 17.4114 34.2012V11.3669C17.4114 10.533 16.894 9.78665 16.1131 9.49405L5.51171 5.52152H25.58V31.1891H29.0917C30.1963 31.1891 31.0917 30.2937 31.0917 29.1891V2C31.0917 0.895431 30.1963 0 29.0917 0H2ZM14.6022 23.7351C15.0558 23.956 15.4239 23.6812 15.4239 23.1185C15.4239 22.5557 15.0558 21.9204 14.6022 21.6995C14.1486 21.4775 13.7804 21.7545 13.7804 22.3161C13.7804 22.8777 14.1486 23.513 14.6022 23.7351Z" fill="white"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M2 0C0.895431 0 0 0.895432 0 2V29.1891C0 30.2937 0.895433 31.1891 2 31.1891H5.51171L16.0608 35.1377C16.7145 35.3824 17.4114 34.8991 17.4114 34.2012V11.3669C17.4114 10.533 16.894 9.78665 16.1131 9.49405L5.51171 5.52152H25.58V31.1891H29.0917C30.1963 31.1891 31.0917 30.2937 31.0917 29.1891V2C31.0917 0.895431 30.1963 0 29.0917 0H2ZM14.6022 23.7351C15.0558 23.956 15.4239 23.6812 15.4239 23.1185C15.4239 22.5557 15.0558 21.9204 14.6022 21.6995C14.1486 21.4775 13.7804 21.7545 13.7804 22.3161C13.7804 22.8777 14.1486 23.513 14.6022 23.7351Z" fill="url(#paint0_linear_1473_97)"/>
<defs>
<linearGradient id="paint0_linear_1473_97" x1="31" y1="-2" x2="0.975591" y2="14.2625" gradientUnits="userSpaceOnUse">
<stop stop-color="#2622FF"/>
<stop offset="1" stop-color="#A717FF"/>
</linearGradient>
</defs>
</svg>

Before: 228 B | After: 1.5 KiB

View File

@@ -1,41 +0,0 @@
model: Sao10K/L3-8B-Stheno-v3.2
label:
zh_Hans: L3 8B Stheno V3.2
en_US: L3 8B Stheno V3.2
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 8192
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0005'
output: '0.0005'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: qwen/qwen-2-vl-72b-instruct
model: Nous-Hermes-2-Mixtral-8x7B-DPO
label:
zh_Hans: Qwen 2 VL 72B Instruct
en_US: Qwen 2 VL 72B Instruct
zh_Hans: Nous-Hermes-2-Mixtral-8x7B-DPO
en_US: Nous-Hermes-2-Mixtral-8x7B-DPO
model_type: llm
features:
- agent-thought
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0045'
output: '0.0045'
input: '0.0027'
output: '0.0027'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
# Deepseek Models
- deepseek/deepseek-r1
- deepseek/deepseek_v3
# LLaMA Models
- meta-llama/llama-3.3-70b-instruct
- meta-llama/llama-3.2-11b-vision-instruct
- meta-llama/llama-3.2-3b-instruct
- meta-llama/llama-3.2-1b-instruct
- meta-llama/llama-3.1-70b-instruct
- meta-llama/llama-3.1-8b-instruct
- meta-llama/llama-3.1-8b-instruct-max
- meta-llama/llama-3.1-8b-instruct-bf16
- meta-llama/llama-3-70b-instruct
- meta-llama/llama-3-8b-instruct
# Mistral Models
- mistralai/mistral-nemo
- mistralai/mistral-7b-instruct
# Qwen Models
- qwen/qwen-2.5-72b-instruct
- qwen/qwen-2-72b-instruct
- qwen/qwen-2-vl-72b-instruct
- qwen/qwen-2-7b-instruct
# Other Models
- sao10k/L3-8B-Stheno-v3.2
- sao10k/l3-70b-euryale-v2.1
- sao10k/l31-70b-euryale-v2.2
- sao10k/l3-8b-lunaris
- jondurbin/airoboros-l2-70b
- cognitivecomputations/dolphin-mixtral-8x22b
- google/gemma-2-9b-it
- nousresearch/hermes-2-pro-llama-3-8b
- sophosympatheia/midnight-rose-70b
- gryphe/mythomax-l2-13b
- nousresearch/nous-hermes-llama2-13b
- openchat/openchat-7b
- teknium/openhermes-2.5-mistral-7b
- microsoft/wizardlm-2-8x22b

View File

@@ -1,7 +1,7 @@
model: jondurbin/airoboros-l2-70b
label:
zh_Hans: Airoboros L2 70B
en_US: Airoboros L2 70B
zh_Hans: jondurbin/airoboros-l2-70b
en_US: jondurbin/airoboros-l2-70b
model_type: llm
features:
- agent-thought

View File

@@ -1,41 +0,0 @@
model: deepseek/deepseek-r1
label:
zh_Hans: DeepSeek R1
en_US: DeepSeek R1
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 64000
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.04'
output: '0.04'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: deepseek/deepseek_v3
label:
zh_Hans: DeepSeek V3
en_US: DeepSeek V3
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 64000
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0089'
output: '0.0089'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: cognitivecomputations/dolphin-mixtral-8x22b
label:
zh_Hans: Dolphin Mixtral 8x22B
en_US: Dolphin Mixtral 8x22B
zh_Hans: cognitivecomputations/dolphin-mixtral-8x22b
en_US: cognitivecomputations/dolphin-mixtral-8x22b
model_type: llm
features:
- agent-thought

View File

@@ -1,7 +1,7 @@
model: google/gemma-2-9b-it
label:
zh_Hans: Gemma 2 9B
en_US: Gemma 2 9B
zh_Hans: google/gemma-2-9b-it
en_US: google/gemma-2-9b-it
model_type: llm
features:
- agent-thought

View File

@@ -1,7 +1,7 @@
model: nousresearch/hermes-2-pro-llama-3-8b
label:
zh_Hans: Hermes 2 Pro Llama 3 8B
en_US: Hermes 2 Pro Llama 3 8B
zh_Hans: nousresearch/hermes-2-pro-llama-3-8b
en_US: nousresearch/hermes-2-pro-llama-3-8b
model_type: llm
features:
- agent-thought

View File

@@ -1,7 +1,7 @@
model: sao10k/l3-70b-euryale-v2.1
label:
zh_Hans: "L3 70B Euryale V2.1\t"
en_US: "L3 70B Euryale V2.1\t"
zh_Hans: sao10k/l3-70b-euryale-v2.1
en_US: sao10k/l3-70b-euryale-v2.1
model_type: llm
features:
- agent-thought

View File

@@ -1,41 +0,0 @@
model: sao10k/l3-8b-lunaris
label:
zh_Hans: "Sao10k L3 8B Lunaris"
en_US: "Sao10k L3 8B Lunaris"
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 8192
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0005'
output: '0.0005'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: sao10k/l31-70b-euryale-v2.2
label:
zh_Hans: L31 70B Euryale V2.2
en_US: L31 70B Euryale V2.2
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 16000
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0148'
output: '0.0148'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: meta-llama/llama-3-70b-instruct
label:
zh_Hans: Llama3 70b Instruct
en_US: Llama3 70b Instruct
zh_Hans: meta-llama/llama-3-70b-instruct
en_US: meta-llama/llama-3-70b-instruct
model_type: llm
features:
- agent-thought

View File

@@ -1,7 +1,7 @@
model: meta-llama/llama-3-8b-instruct
label:
zh_Hans: Llama 3 8B Instruct
en_US: Llama 3 8B Instruct
zh_Hans: meta-llama/llama-3-8b-instruct
en_US: meta-llama/llama-3-8b-instruct
model_type: llm
features:
- agent-thought
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0004'
output: '0.0004'
input: '0.00063'
output: '0.00063'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: meta-llama/llama-3.2-3b-instruct
model: meta-llama/llama-3.1-405b-instruct
label:
zh_Hans: Llama 3.2 3B Instruct
en_US: Llama 3.2 3B Instruct
zh_Hans: meta-llama/llama-3.1-405b-instruct
en_US: meta-llama/llama-3.1-405b-instruct
model_type: llm
features:
- agent-thought
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0003'
output: '0.0005'
input: '0.03'
output: '0.05'
unit: '0.0001'
currency: USD

View File

@@ -1,13 +1,13 @@
model: meta-llama/llama-3.1-70b-instruct
label:
zh_Hans: Llama 3.1 70B Instruct
en_US: Llama 3.1 70B Instruct
zh_Hans: meta-llama/llama-3.1-70b-instruct
en_US: meta-llama/llama-3.1-70b-instruct
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 32768
context_size: 8192
parameter_rules:
- name: temperature
use_template: temperature
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0034'
output: '0.0039'
input: '0.0055'
output: '0.0076'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: meta-llama/llama-3.1-8b-instruct-bf16
label:
zh_Hans: Llama 3.1 8B Instruct BF16
en_US: Llama 3.1 8B Instruct BF16
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 8192
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0006'
output: '0.0006'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: meta-llama/llama-3.1-8b-instruct-max
label:
zh_Hans: "Llama3.1 8B Instruct Max\t"
en_US: "Llama3.1 8B Instruct Max\t"
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 16384
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0005'
output: '0.0005'
unit: '0.0001'
currency: USD

View File

@@ -1,13 +1,13 @@
model: meta-llama/llama-3.1-8b-instruct
label:
zh_Hans: Llama 3.1 8B Instruct
en_US: Llama 3.1 8B Instruct
zh_Hans: meta-llama/llama-3.1-8b-instruct
en_US: meta-llama/llama-3.1-8b-instruct
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 16384
context_size: 8192
parameter_rules:
- name: temperature
use_template: temperature
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0005'
output: '0.0005'
input: '0.001'
output: '0.001'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: meta-llama/llama-3.2-11b-vision-instruct
label:
zh_Hans: "Llama 3.2 11B Vision Instruct\t"
en_US: "Llama 3.2 11B Vision Instruct\t"
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 32768
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0006'
output: '0.0006'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: meta-llama/llama-3.2-1b-instruct
label:
zh_Hans: "Llama 3.2 1B Instruct\t"
en_US: "Llama 3.2 1B Instruct\t"
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 131000
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0002'
output: '0.0002'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: meta-llama/llama-3.3-70b-instruct
label:
zh_Hans: Llama 3.3 70B Instruct
en_US: Llama 3.3 70B Instruct
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 131072
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0039'
output: '0.0039'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: openchat/openchat-7b
model: lzlv_70b
label:
zh_Hans: OpenChat 7B
en_US: OpenChat 7B
zh_Hans: lzlv_70b
en_US: lzlv_70b
model_type: llm
features:
- agent-thought
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0006'
output: '0.0006'
input: '0.0058'
output: '0.0078'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: sophosympatheia/midnight-rose-70b
label:
zh_Hans: Midnight Rose 70B
en_US: Midnight Rose 70B
zh_Hans: sophosympatheia/midnight-rose-70b
en_US: sophosympatheia/midnight-rose-70b
model_type: llm
features:
- agent-thought

View File

@@ -1,7 +1,7 @@
model: mistralai/mistral-7b-instruct
label:
zh_Hans: Mistral 7B Instruct
en_US: Mistral 7B Instruct
zh_Hans: mistralai/mistral-7b-instruct
en_US: mistralai/mistral-7b-instruct
model_type: llm
features:
- agent-thought

View File

@@ -1,41 +0,0 @@
model: mistralai/mistral-nemo
label:
zh_Hans: Mistral Nemo
en_US: Mistral Nemo
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 131072
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0017'
output: '0.0017'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: gryphe/mythomax-l2-13b
label:
zh_Hans: Mythomax L2 13B
en_US: Mythomax L2 13B
zh_Hans: gryphe/mythomax-l2-13b
en_US: gryphe/mythomax-l2-13b
model_type: llm
features:
- agent-thought
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0009'
output: '0.0009'
input: '0.00119'
output: '0.00119'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: nousresearch/nous-hermes-llama2-13b
label:
zh_Hans: Nous Hermes Llama2 13B
en_US: Nous Hermes Llama2 13B
zh_Hans: nousresearch/nous-hermes-llama2-13b
en_US: nousresearch/nous-hermes-llama2-13b
model_type: llm
features:
- agent-thought

View File

@@ -1,7 +1,7 @@
model: teknium/openhermes-2.5-mistral-7b
label:
zh_Hans: Openhermes2.5 Mistral 7B
en_US: Openhermes2.5 Mistral 7B
zh_Hans: teknium/openhermes-2.5-mistral-7b
en_US: teknium/openhermes-2.5-mistral-7b
model_type: llm
features:
- agent-thought

View File

@@ -1,41 +0,0 @@
model: qwen/qwen-2-72b-instruct
label:
zh_Hans: Qwen2 72B Instruct
en_US: Qwen2 72B Instruct
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 32768
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0034'
output: '0.0039'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: qwen/qwen-2-7b-instruct
label:
zh_Hans: Qwen 2 7B Instruct
en_US: Qwen 2 7B Instruct
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 32768
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.00054'
output: '0.00054'
unit: '0.0001'
currency: USD

View File

@@ -1,41 +0,0 @@
model: qwen/qwen-2.5-72b-instruct
label:
zh_Hans: Qwen 2.5 72B Instruct
en_US: Qwen 2.5 72B Instruct
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 32000
parameter_rules:
- name: temperature
use_template: temperature
min: 0
max: 2
default: 1
- name: top_p
use_template: top_p
min: 0
max: 1
default: 1
- name: max_tokens
use_template: max_tokens
min: 1
max: 2048
default: 512
- name: frequency_penalty
use_template: frequency_penalty
min: -2
max: 2
default: 0
- name: presence_penalty
use_template: presence_penalty
min: -2
max: 2
default: 0
pricing:
input: '0.0038'
output: '0.004'
unit: '0.0001'
currency: USD

View File

@@ -1,7 +1,7 @@
model: microsoft/wizardlm-2-8x22b
label:
zh_Hans: Wizardlm 2 8x22B
en_US: Wizardlm 2 8x22B
zh_Hans: microsoft/wizardlm-2-8x22b
en_US: microsoft/wizardlm-2-8x22b
model_type: llm
features:
- agent-thought
@@ -35,7 +35,7 @@ parameter_rules:
max: 2
default: 0
pricing:
input: '0.0062'
output: '0.0062'
input: '0.0064'
output: '0.0064'
unit: '0.0001'
currency: USD

View File

@@ -1,6 +1,6 @@
provider: novita
label:
en_US: Novita AI
en_US: novita.ai
description:
en_US: An LLM API that matches various application scenarios with high cost-effectiveness.
zh_Hans: 适配多种海外应用场景的高性价比 LLM API
@@ -8,13 +8,13 @@ icon_small:
en_US: icon_s_en.svg
icon_large:
en_US: icon_l_en.svg
background: "#c7fce2"
background: "#eadeff"
help:
title:
en_US: Get your API key from Novita AI
zh_Hans: Novita AI 获取 API Key
en_US: Get your API key from novita.ai
zh_Hans: novita.ai 获取 API Key
url:
en_US: https://novita.ai/settings/key-management?utm_source=dify&utm_medium=ch&utm_campaign=api
en_US: https://novita.ai/settings#key-management?utm_source=dify&utm_medium=ch&utm_campaign=api
supported_model_types:
- llm
configurate_methods:

View File

@@ -1,6 +1,5 @@
import json
import logging
import re
from collections.abc import Generator
from typing import Any, Optional, Union, cast
@@ -341,6 +340,9 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
:param credentials: provider credentials
:return:
"""
# get predefined models
predefined_models = self.predefined_models()
predefined_models_map = {model.model: model for model in predefined_models}
# transform credentials to kwargs for model instance
credentials_kwargs = self._to_credential_kwargs(credentials)
@@ -356,10 +358,9 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
base_model = model.id.split(":")[1]
base_model_schema = None
for predefined_model in self.predefined_models():
if predefined_model.model in base_model:
for predefined_model_name, predefined_model in predefined_models_map.items():
if predefined_model_name in base_model:
base_model_schema = predefined_model
break
if not base_model_schema:
continue
@@ -620,19 +621,11 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
prompt_messages = self._clear_illegal_prompt_messages(model, prompt_messages)
# o1 compatibility
block_as_stream = False
if model.startswith("o1"):
if "max_tokens" in model_parameters:
model_parameters["max_completion_tokens"] = model_parameters["max_tokens"]
del model_parameters["max_tokens"]
if re.match(r"^o1(-\d{4}-\d{2}-\d{2})?$", model):
if stream:
block_as_stream = True
stream = False
if "stream_options" in extra_model_kwargs:
del extra_model_kwargs["stream_options"]
if "stop" in extra_model_kwargs:
del extra_model_kwargs["stop"]
@@ -649,45 +642,7 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
if stream:
return self._handle_chat_generate_stream_response(model, credentials, response, prompt_messages, tools)
block_result = self._handle_chat_generate_response(model, credentials, response, prompt_messages, tools)
if block_as_stream:
return self._handle_chat_block_as_stream_response(block_result, prompt_messages, stop)
return block_result
def _handle_chat_block_as_stream_response(
self,
block_result: LLMResult,
prompt_messages: list[PromptMessage],
stop: Optional[list[str]] = None,
) -> Generator[LLMResultChunk, None, None]:
"""
Handle llm chat response
:param model: model name
:param credentials: credentials
:param response: response
:param prompt_messages: prompt messages
:param tools: tools for tool calling
:return: llm response chunk generator
"""
text = block_result.message.content
text = cast(str, text)
if stop:
text = self.enforce_stop_tokens(text, stop)
yield LLMResultChunk(
model=block_result.model,
prompt_messages=prompt_messages,
system_fingerprint=block_result.system_fingerprint,
delta=LLMResultChunkDelta(
index=0,
message=block_result.message,
finish_reason="stop",
usage=block_result.usage,
),
)
return self._handle_chat_generate_response(model, credentials, response, prompt_messages, tools)
def _handle_chat_generate_response(
self,
@@ -1184,15 +1139,13 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
base_model = model.split(":")[1]
# get model schema
base_model_schema = None
for predefined_model in self.predefined_models():
if base_model == predefined_model.model:
base_model_schema = predefined_model
break
if not base_model_schema:
models = self.predefined_models()
model_map = {model.model: model for model in models}
if base_model not in model_map:
raise ValueError(f"Base model {base_model} not found")
base_model_schema = model_map[base_model]
base_model_schema_features = base_model_schema.features or []
base_model_schema_model_properties = base_model_schema.model_properties
base_model_schema_parameters_rules = base_model_schema.parameter_rules
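
As a rough illustration of the lookup change in this hunk (replacing the repeated predefined_models() scan with a prebuilt map), here is a minimal, self-contained sketch; the schema dict and the fine-tune id format are assumptions for illustration, not the project's actual entities:

# Sketch of the map-based base-model lookup; PREDEFINED_SCHEMAS and the
# "ft:<base>:<org>:<suffix>:<id>" id format are illustrative assumptions.
PREDEFINED_SCHEMAS = {
    "gpt-3.5-turbo": {"context_size": 16385},
    "gpt-4o-mini": {"context_size": 128000},
}

def resolve_base_schema(fine_tuned_id: str) -> dict:
    """Map an id such as 'ft:gpt-3.5-turbo-0125:acme::abc123' to its base model schema."""
    base_model = fine_tuned_id.split(":")[1]
    for name, schema in PREDEFINED_SCHEMAS.items():
        # substring match, because fine-tune ids usually carry a snapshot suffix
        if name in base_model:
            return schema
    raise ValueError(f"Base model {base_model} not found")

if __name__ == "__main__":
    print(resolve_base_schema("ft:gpt-3.5-turbo-0125:acme::abc123"))

Building the map once avoids re-enumerating the predefined models for every fine-tuned entry, and the exact-match case later in this diff can then use a direct dict lookup instead of a loop.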

View File

@@ -1,13 +1,29 @@
import json
import time
from decimal import Decimal
from typing import Optional
from urllib.parse import urljoin
import numpy as np
import requests
from core.entities.embedding_type import EmbeddingInputType
from core.model_runtime.entities.text_embedding_entities import TextEmbeddingResult
from core.model_runtime.model_providers.openai_api_compatible.text_embedding.text_embedding import (
OAICompatEmbeddingModel,
from core.model_runtime.entities.common_entities import I18nObject
from core.model_runtime.entities.model_entities import (
AIModelEntity,
FetchFrom,
ModelPropertyKey,
ModelType,
PriceConfig,
PriceType,
)
from core.model_runtime.entities.text_embedding_entities import EmbeddingUsage, TextEmbeddingResult
from core.model_runtime.errors.validate import CredentialsValidateFailedError
from core.model_runtime.model_providers.__base.text_embedding_model import TextEmbeddingModel
from core.model_runtime.model_providers.openai_api_compatible._common import _CommonOaiApiCompat
class PerfXCloudEmbeddingModel(OAICompatEmbeddingModel):
class OAICompatEmbeddingModel(_CommonOaiApiCompat, TextEmbeddingModel):
"""
Model class for an OpenAI API-compatible text embedding model.
"""
@@ -31,10 +47,86 @@ class PerfXCloudEmbeddingModel(OAICompatEmbeddingModel):
:return: embeddings result
"""
if "endpoint_url" not in credentials or credentials["endpoint_url"] == "":
credentials["endpoint_url"] = "https://cloud.perfxlab.cn/v1/"
# Prepare headers and payload for the request
headers = {"Content-Type": "application/json"}
return OAICompatEmbeddingModel._invoke(self, model, credentials, texts, user, input_type)
api_key = credentials.get("api_key")
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
endpoint_url: Optional[str]
if "endpoint_url" not in credentials or credentials["endpoint_url"] == "":
endpoint_url = "https://cloud.perfxlab.cn/v1/"
else:
endpoint_url = credentials.get("endpoint_url")
assert endpoint_url is not None, "endpoint_url is required in credentials"
if not endpoint_url.endswith("/"):
endpoint_url += "/"
assert isinstance(endpoint_url, str)
endpoint_url = urljoin(endpoint_url, "embeddings")
extra_model_kwargs = {}
if user:
extra_model_kwargs["user"] = user
extra_model_kwargs["encoding_format"] = "float"
# get model properties
context_size = self._get_context_size(model, credentials)
max_chunks = self._get_max_chunks(model, credentials)
inputs = []
indices = []
used_tokens = 0
for i, text in enumerate(texts):
# Here token count is only an approximation based on the GPT2 tokenizer
# TODO: Optimize for better token estimation and chunking
num_tokens = self._get_num_tokens_by_gpt2(text)
if num_tokens >= context_size:
cutoff = int(np.floor(len(text) * (context_size / num_tokens)))
# if num tokens is larger than context length, only use the start
inputs.append(text[0:cutoff])
else:
inputs.append(text)
indices += [i]
batched_embeddings = []
_iter = range(0, len(inputs), max_chunks)
for i in _iter:
# Prepare the payload for the request
payload = {"input": inputs[i : i + max_chunks], "model": model, **extra_model_kwargs}
# Make the request to the OpenAI API
response = requests.post(endpoint_url, headers=headers, data=json.dumps(payload), timeout=(10, 300))
response.raise_for_status() # Raise an exception for HTTP errors
response_data = response.json()
# Extract embeddings and used tokens from the response
embeddings_batch = [data["embedding"] for data in response_data["data"]]
embedding_used_tokens = response_data["usage"]["total_tokens"]
used_tokens += embedding_used_tokens
batched_embeddings += embeddings_batch
# calc usage
usage = self._calc_response_usage(model=model, credentials=credentials, tokens=used_tokens)
return TextEmbeddingResult(embeddings=batched_embeddings, usage=usage, model=model)
def get_num_tokens(self, model: str, credentials: dict, texts: list[str]) -> int:
"""
Approximate number of tokens for given messages using GPT2 tokenizer
:param model: model name
:param credentials: model credentials
:param texts: texts to embed
:return:
"""
return sum(self._get_num_tokens_by_gpt2(text) for text in texts)
def validate_credentials(self, model: str, credentials: dict) -> None:
"""
@@ -44,7 +136,93 @@ class PerfXCloudEmbeddingModel(OAICompatEmbeddingModel):
:param credentials: model credentials
:return:
"""
if "endpoint_url" not in credentials or credentials["endpoint_url"] == "":
credentials["endpoint_url"] = "https://cloud.perfxlab.cn/v1/"
try:
headers = {"Content-Type": "application/json"}
OAICompatEmbeddingModel.validate_credentials(self, model, credentials)
api_key = credentials.get("api_key")
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
endpoint_url: Optional[str]
if "endpoint_url" not in credentials or credentials["endpoint_url"] == "":
endpoint_url = "https://cloud.perfxlab.cn/v1/"
else:
endpoint_url = credentials.get("endpoint_url")
assert endpoint_url is not None, "endpoint_url is required in credentials"
if not endpoint_url.endswith("/"):
endpoint_url += "/"
assert isinstance(endpoint_url, str)
endpoint_url = urljoin(endpoint_url, "embeddings")
payload = {"input": "ping", "model": model}
response = requests.post(url=endpoint_url, headers=headers, data=json.dumps(payload), timeout=(10, 300))
if response.status_code != 200:
raise CredentialsValidateFailedError(
f"Credentials validation failed with status code {response.status_code}"
)
try:
json_result = response.json()
except json.JSONDecodeError as e:
raise CredentialsValidateFailedError("Credentials validation failed: JSON decode error")
if "model" not in json_result:
raise CredentialsValidateFailedError("Credentials validation failed: invalid response")
except CredentialsValidateFailedError:
raise
except Exception as ex:
raise CredentialsValidateFailedError(str(ex))
def get_customizable_model_schema(self, model: str, credentials: dict) -> AIModelEntity:
"""
generate custom model entities from credentials
"""
entity = AIModelEntity(
model=model,
label=I18nObject(en_US=model),
model_type=ModelType.TEXT_EMBEDDING,
fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
model_properties={
ModelPropertyKey.CONTEXT_SIZE: int(credentials.get("context_size", 512)),
ModelPropertyKey.MAX_CHUNKS: 1,
},
parameter_rules=[],
pricing=PriceConfig(
input=Decimal(credentials.get("input_price", 0)),
unit=Decimal(credentials.get("unit", 0)),
currency=credentials.get("currency", "USD"),
),
)
return entity
def _calc_response_usage(self, model: str, credentials: dict, tokens: int) -> EmbeddingUsage:
"""
Calculate response usage
:param model: model name
:param credentials: model credentials
:param tokens: input tokens
:return: usage
"""
# get input price info
input_price_info = self.get_price(
model=model, credentials=credentials, price_type=PriceType.INPUT, tokens=tokens
)
# transform usage
usage = EmbeddingUsage(
tokens=tokens,
total_tokens=tokens,
unit_price=input_price_info.unit_price,
price_unit=input_price_info.unit,
total_price=input_price_info.total_amount,
currency=input_price_info.currency,
latency=time.perf_counter() - self.started_at,
)
return usage
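
The truncate-then-batch strategy in the _invoke method above can be summarized in a short standalone sketch; the whitespace token count stands in for the GPT-2 estimate, and the context_size/max_chunks values are illustrative defaults, not the provider's real limits:

import math

def approx_tokens(text: str) -> int:
    # naive stand-in for the GPT-2 token estimate used by the real model class
    return max(1, len(text.split()))

def prepare_batches(texts: list[str], context_size: int = 512, max_chunks: int = 32) -> list[list[str]]:
    trimmed = []
    for text in texts:
        n = approx_tokens(text)
        if n >= context_size:
            # keep a proportional prefix so the input fits the context window
            cutoff = int(math.floor(len(text) * (context_size / n)))
            trimmed.append(text[:cutoff])
        else:
            trimmed.append(text)
    # one HTTP request per batch of at most max_chunks inputs
    return [trimmed[i : i + max_chunks] for i in range(0, len(trimmed), max_chunks)]

if __name__ == "__main__":
    batches = prepare_batches(["short text", "word " * 1000], context_size=64, max_chunks=2)
    print(len(batches), [len(b) for b in batches])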

View File

@@ -1,16 +1,9 @@
import json
from collections.abc import Generator
from typing import Optional, Union
import requests
from core.model_runtime.entities.common_entities import I18nObject
from core.model_runtime.entities.llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta
from core.model_runtime.entities.message_entities import (
AssistantPromptMessage,
PromptMessage,
PromptMessageTool,
)
from core.model_runtime.entities.llm_entities import LLMMode, LLMResult
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool
from core.model_runtime.entities.model_entities import (
AIModelEntity,
FetchFrom,
@@ -36,6 +29,9 @@ class SiliconflowLargeLanguageModel(OAIAPICompatLargeLanguageModel):
user: Optional[str] = None,
) -> Union[LLMResult, Generator]:
self._add_custom_parameters(credentials)
# {"response_format": "json_object"} need convert to {"response_format": {"type": "json_object"}}
if "response_format" in model_parameters:
model_parameters["response_format"] = {"type": model_parameters.get("response_format")}
return super()._invoke(model, credentials, prompt_messages, model_parameters, tools, stop, stream)
def validate_credentials(self, model: str, credentials: dict) -> None:
@@ -96,208 +92,3 @@ class SiliconflowLargeLanguageModel(OAIAPICompatLargeLanguageModel):
),
],
)
def _handle_generate_stream_response(
self, model: str, credentials: dict, response: requests.Response, prompt_messages: list[PromptMessage]
) -> Generator:
"""
Handle llm stream response
:param model: model name
:param credentials: model credentials
:param response: streamed response
:param prompt_messages: prompt messages
:return: llm response chunk generator
"""
full_assistant_content = ""
chunk_index = 0
is_reasoning_started = False # Add flag to track reasoning state
def create_final_llm_result_chunk(
id: Optional[str], index: int, message: AssistantPromptMessage, finish_reason: str, usage: dict
) -> LLMResultChunk:
# calculate num tokens
prompt_tokens = usage and usage.get("prompt_tokens")
if prompt_tokens is None:
prompt_tokens = self._num_tokens_from_string(model, prompt_messages[0].content)
completion_tokens = usage and usage.get("completion_tokens")
if completion_tokens is None:
completion_tokens = self._num_tokens_from_string(model, full_assistant_content)
# transform usage
usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
return LLMResultChunk(
id=id,
model=model,
prompt_messages=prompt_messages,
delta=LLMResultChunkDelta(index=index, message=message, finish_reason=finish_reason, usage=usage),
)
# delimiter for stream response, need unicode_escape
import codecs
delimiter = credentials.get("stream_mode_delimiter", "\n\n")
delimiter = codecs.decode(delimiter, "unicode_escape")
tools_calls: list[AssistantPromptMessage.ToolCall] = []
def increase_tool_call(new_tool_calls: list[AssistantPromptMessage.ToolCall]):
def get_tool_call(tool_call_id: str):
if not tool_call_id:
return tools_calls[-1]
tool_call = next((tool_call for tool_call in tools_calls if tool_call.id == tool_call_id), None)
if tool_call is None:
tool_call = AssistantPromptMessage.ToolCall(
id=tool_call_id,
type="function",
function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
)
tools_calls.append(tool_call)
return tool_call
for new_tool_call in new_tool_calls:
# get tool call
tool_call = get_tool_call(new_tool_call.function.name)
# update tool call
if new_tool_call.id:
tool_call.id = new_tool_call.id
if new_tool_call.type:
tool_call.type = new_tool_call.type
if new_tool_call.function.name:
tool_call.function.name = new_tool_call.function.name
if new_tool_call.function.arguments:
tool_call.function.arguments += new_tool_call.function.arguments
finish_reason = None # The default value of finish_reason is None
message_id, usage = None, None
for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
chunk = chunk.strip()
if chunk:
# ignore sse comments
if chunk.startswith(":"):
continue
decoded_chunk = chunk.strip().removeprefix("data:").lstrip()
if decoded_chunk == "[DONE]": # Some provider returns "data: [DONE]"
continue
try:
chunk_json: dict = json.loads(decoded_chunk)
# stream ended
except json.JSONDecodeError as e:
yield create_final_llm_result_chunk(
id=message_id,
index=chunk_index + 1,
message=AssistantPromptMessage(content=""),
finish_reason="Non-JSON encountered.",
usage=usage,
)
break
# handle the error here. for issue #11629
if chunk_json.get("error") and chunk_json.get("choices") is None:
raise ValueError(chunk_json.get("error"))
if chunk_json:
if u := chunk_json.get("usage"):
usage = u
if not chunk_json or len(chunk_json["choices"]) == 0:
continue
choice = chunk_json["choices"][0]
finish_reason = chunk_json["choices"][0].get("finish_reason")
message_id = chunk_json.get("id")
chunk_index += 1
if "delta" in choice:
delta = choice["delta"]
delta_content = delta.get("content")
assistant_message_tool_calls = None
if "tool_calls" in delta and credentials.get("function_calling_type", "no_call") == "tool_call":
assistant_message_tool_calls = delta.get("tool_calls", None)
elif (
"function_call" in delta
and credentials.get("function_calling_type", "no_call") == "function_call"
):
assistant_message_tool_calls = [
{"id": "tool_call_id", "type": "function", "function": delta.get("function_call", {})}
]
# assistant_message_function_call = delta.delta.function_call
# extract tool calls from response
if assistant_message_tool_calls:
tool_calls = self._extract_response_tool_calls(assistant_message_tool_calls)
increase_tool_call(tool_calls)
if delta_content is None or delta_content == "":
continue
# Check for think tags
if "<think>" in delta_content:
is_reasoning_started = True
# Remove <think> tag and add markdown quote
delta_content = "> 💭 " + delta_content.replace("<think>", "")
elif "</think>" in delta_content:
# Remove </think> tag and add newlines to end quote block
delta_content = delta_content.replace("</think>", "") + "\n\n"
is_reasoning_started = False
elif is_reasoning_started:
# Add quote markers for content within thinking block
if "\n\n" in delta_content:
delta_content = delta_content.replace("\n\n", "\n> ")
elif "\n" in delta_content:
delta_content = delta_content.replace("\n", "\n> ")
# transform assistant message to prompt message
assistant_prompt_message = AssistantPromptMessage(
content=delta_content,
)
# reset tool calls
tool_calls = []
full_assistant_content += delta_content
elif "text" in choice:
choice_text = choice.get("text", "")
if choice_text == "":
continue
# transform assistant message to prompt message
assistant_prompt_message = AssistantPromptMessage(content=choice_text)
full_assistant_content += choice_text
else:
continue
yield LLMResultChunk(
id=message_id,
model=model,
prompt_messages=prompt_messages,
delta=LLMResultChunkDelta(
index=chunk_index,
message=assistant_prompt_message,
),
)
chunk_index += 1
if tools_calls:
yield LLMResultChunk(
id=message_id,
model=model,
prompt_messages=prompt_messages,
delta=LLMResultChunkDelta(
index=chunk_index,
message=AssistantPromptMessage(tool_calls=tools_calls, content=""),
),
)
yield create_final_llm_result_chunk(
id=message_id,
index=chunk_index,
message=AssistantPromptMessage(content=""),
finish_reason=finish_reason,
usage=usage,
)
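
The stream handler removed in this hunk rewrote <think>…</think> reasoning spans into markdown block quotes as deltas arrived. A minimal, stateful sketch of that rewrite (the chunk strings below are illustrative):

def make_think_rewriter():
    in_reasoning = False

    def rewrite(delta: str) -> str:
        nonlocal in_reasoning
        if "<think>" in delta:
            # open the quote block and drop the tag
            in_reasoning = True
            return "> 💭 " + delta.replace("<think>", "")
        if "</think>" in delta:
            # close the quote block with a blank line
            in_reasoning = False
            return delta.replace("</think>", "") + "\n\n"
        if in_reasoning:
            # keep subsequent reasoning lines inside the quote block
            if "\n\n" in delta:
                return delta.replace("\n\n", "\n> ")
            if "\n" in delta:
                return delta.replace("\n", "\n> ")
        return delta

    return rewrite

if __name__ == "__main__":
    rewrite = make_think_rewriter()
    for chunk in ["<think>planning", " the answer\nstep 2", "</think>", "final answer"]:
        print(repr(rewrite(chunk)))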

View File

@@ -21,7 +21,7 @@ class SparkLLMClient:
domain = api_domain
model_api_configs = {
"spark-lite": {"version": "v1.1", "chat_domain": "lite"},
"spark-lite": {"version": "v1.1", "chat_domain": "general"},
"spark-pro": {"version": "v3.1", "chat_domain": "generalv3"},
"spark-pro-128k": {"version": "pro-128k", "chat_domain": "pro-128k"},
"spark-max": {"version": "v3.5", "chat_domain": "generalv3.5"},

View File

@@ -33,8 +33,6 @@
- qwen2.5-3b-instruct
- qwen2.5-1.5b-instruct
- qwen2.5-0.5b-instruct
- qwen2.5-14b-instruct-1m
- qwen2.5-7b-instruct-1m
- qwen2.5-coder-7b-instruct
- qwen2-math-72b-instruct
- qwen2-math-7b-instruct

View File

@@ -219,12 +219,8 @@ class TongyiLargeLanguageModel(LargeLanguageModel):
if response.status_code not in {200, HTTPStatus.OK}:
raise ServiceUnavailableError(response.message)
# transform assistant message to prompt message
resp_content = response.output.choices[0].message.content
# special for qwen-vl
if isinstance(resp_content, list):
resp_content = resp_content[0]["text"]
assistant_prompt_message = AssistantPromptMessage(
content=resp_content,
content=response.output.choices[0].message.content,
)
# transform usage
@@ -261,7 +257,8 @@ class TongyiLargeLanguageModel(LargeLanguageModel):
for index, response in enumerate(responses):
if response.status_code not in {200, HTTPStatus.OK}:
raise ServiceUnavailableError(
f"Failed to invoke model {model}, status code: {response.status_code}, message: {response.message}"
f"Failed to invoke model {model}, status code: {response.status_code}, "
f"message: {response.message}"
)
resp_finish_reason = response.output.choices[0].finish_reason

View File

@@ -1,75 +0,0 @@
# for more details, please refer to https://help.aliyun.com/zh/model-studio/getting-started/models
model: qwen2.5-14b-instruct-1m
label:
en_US: qwen2.5-14b-instruct-1m
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 1000000
parameter_rules:
- name: temperature
use_template: temperature
type: float
default: 0.3
min: 0.0
max: 2.0
help:
zh_Hans: 用于控制随机性和多样性的程度。具体来说temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值使得更多的低概率词被选择生成结果更加多样化而较低的temperature值则会增强概率分布的峰值使得高概率词更容易被选择生成结果更加确定。
en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
- name: max_tokens
use_template: max_tokens
type: int
default: 8192
min: 1
max: 8192
help:
zh_Hans: 用于指定模型在生成内容时token的最大数量它定义了生成的上限但不保证每次都会生成到这个数量。
en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
- name: top_p
use_template: top_p
type: float
default: 0.8
min: 0.1
max: 0.9
help:
zh_Hans: 生成过程中核采样方法概率阈值例如取值为0.8时仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为0,1.0),取值越大,生成的随机性越高;取值越低,生成的确定性越高。
en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
- name: top_k
type: int
min: 0
max: 99
label:
zh_Hans: 取样数量
en_US: Top k
help:
zh_Hans: 生成时采样候选集的大小。例如取值为50时仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大生成的随机性越高取值越小生成的确定性越高。
en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
- name: seed
required: false
type: int
default: 1234
label:
zh_Hans: 随机种子
en_US: Random seed
help:
zh_Hans: 生成时使用的随机数种子用户控制模型生成内容的随机性。支持无符号64位整数默认值为 1234。在使用seed时模型将尽可能生成相同或相似的结果但目前不保证每次生成的结果完全相同。
en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
- name: repetition_penalty
required: false
type: float
default: 1.1
label:
zh_Hans: 重复惩罚
en_US: Repetition penalty
help:
zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
- name: response_format
use_template: response_format
pricing:
input: '0.001'
output: '0.003'
unit: '0.001'
currency: RMB

View File

@@ -1,75 +0,0 @@
# for more details, please refer to https://help.aliyun.com/zh/model-studio/getting-started/models
model: qwen2.5-7b-instruct-1m
label:
en_US: qwen2.5-7b-instruct-1m
model_type: llm
features:
- agent-thought
model_properties:
mode: chat
context_size: 1000000
parameter_rules:
- name: temperature
use_template: temperature
type: float
default: 0.3
min: 0.0
max: 2.0
help:
zh_Hans: 用于控制随机性和多样性的程度。具体来说temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值使得更多的低概率词被选择生成结果更加多样化而较低的temperature值则会增强概率分布的峰值使得高概率词更容易被选择生成结果更加确定。
en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
- name: max_tokens
use_template: max_tokens
type: int
default: 8192
min: 1
max: 8192
help:
zh_Hans: 用于指定模型在生成内容时token的最大数量它定义了生成的上限但不保证每次都会生成到这个数量。
en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
- name: top_p
use_template: top_p
type: float
default: 0.8
min: 0.1
max: 0.9
help:
zh_Hans: 生成过程中核采样方法概率阈值例如取值为0.8时仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为0,1.0),取值越大,生成的随机性越高;取值越低,生成的确定性越高。
en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
- name: top_k
type: int
min: 0
max: 99
label:
zh_Hans: 取样数量
en_US: Top k
help:
zh_Hans: 生成时采样候选集的大小。例如取值为50时仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大生成的随机性越高取值越小生成的确定性越高。
en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
- name: seed
required: false
type: int
default: 1234
label:
zh_Hans: 随机种子
en_US: Random seed
help:
zh_Hans: 生成时使用的随机数种子用户控制模型生成内容的随机性。支持无符号64位整数默认值为 1234。在使用seed时模型将尽可能生成相同或相似的结果但目前不保证每次生成的结果完全相同。
en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
- name: repetition_penalty
required: false
type: float
default: 1.1
label:
zh_Hans: 重复惩罚
en_US: Repetition penalty
help:
zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
- name: response_format
use_template: response_format
pricing:
input: '0.0005'
output: '0.001'
unit: '0.001'
currency: RMB
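
The removed model cards above describe parameter_rules with min/max/default bounds and localized help text. As a rough sketch of how such rules are typically applied (missing values fall back to the default, supplied values are clamped to the advertised range); the bounds copy the YAML, while the helper itself is illustrative and not part of the provider code:

RULES = {
    "temperature": {"min": 0.0, "max": 2.0, "default": 0.3},
    "top_p": {"min": 0.1, "max": 0.9, "default": 0.8},
    "max_tokens": {"min": 1, "max": 8192, "default": 8192},
}

def apply_rules(params: dict) -> dict:
    resolved = {}
    for name, rule in RULES.items():
        value = params.get(name, rule["default"])
        # clamp into the advertised [min, max] range
        resolved[name] = min(max(value, rule["min"]), rule["max"])
    return resolved

if __name__ == "__main__":
    print(apply_rules({"temperature": 5, "max_tokens": 100}))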

View File

@@ -146,7 +146,7 @@ class TritonInferenceAILargeLanguageModel(LargeLanguageModel):
elif credentials["completion_type"] == "completion":
completion_type = LLMMode.COMPLETION.value
else:
raise ValueError(f"completion_type {credentials['completion_type']} is not supported")
raise ValueError(f'completion_type {credentials["completion_type"]} is not supported')
entity = AIModelEntity(
model=model,

Some files were not shown because too many files have changed in this diff.