Compare commits

...

13 Commits

Author SHA1 Message Date
dependabot[bot]
fdd07c7e03 chore(deps): bump the python-packages group across 1 directory with 7 updates
Updates the requirements on bs4, [pypdfium2](https://github.com/pypdfium2-team/pypdfium2), [pyyaml](https://github.com/yaml/pyyaml), [unstructured](https://github.com/Unstructured-IO/unstructured), [pypandoc](https://github.com/JessicaTegner/pypandoc), [httpx-sse](https://github.com/florimondmanca/httpx-sse) and [nltk](https://github.com/nltk/nltk) to permit the latest version.

Updates `bs4` to 0.0.2

Updates `pypdfium2` from 5.6.0 to 5.7.0
- [Release notes](https://github.com/pypdfium2-team/pypdfium2/releases)
- [Commits](https://github.com/pypdfium2-team/pypdfium2/compare/5.6.0...5.7.0)

Updates `pyyaml` to 6.0.3
- [Release notes](https://github.com/yaml/pyyaml/releases)
- [Changelog](https://github.com/yaml/pyyaml/blob/6.0.3/CHANGES)
- [Commits](https://github.com/yaml/pyyaml/compare/6.0.1...6.0.3)

Updates `unstructured` to 0.22.18
- [Release notes](https://github.com/Unstructured-IO/unstructured/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Unstructured-IO/unstructured/compare/0.21.5...0.22.18)

Updates `pypandoc` to 1.17
- [Release notes](https://github.com/JessicaTegner/pypandoc/releases)
- [Changelog](https://github.com/JessicaTegner/pypandoc/blob/master/release.md)
- [Commits](https://github.com/JessicaTegner/pypandoc/compare/v1.13...v1.17)

Updates `httpx-sse` to 0.4.3
- [Release notes](https://github.com/florimondmanca/httpx-sse/releases)
- [Changelog](https://github.com/florimondmanca/httpx-sse/blob/master/CHANGELOG.md)
- [Commits](https://github.com/florimondmanca/httpx-sse/compare/0.4.0...0.4.3)

Updates `nltk` to 3.9.4
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.9.1...3.9.4)

---
updated-dependencies:
- dependency-name: bs4
  dependency-version: 0.0.2
  dependency-type: direct:production
  dependency-group: python-packages
- dependency-name: pypdfium2
  dependency-version: 5.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pyyaml
  dependency-version: 6.0.3
  dependency-type: direct:production
  dependency-group: python-packages
- dependency-name: unstructured
  dependency-version: 0.22.18
  dependency-type: direct:production
  dependency-group: python-packages
- dependency-name: pypandoc
  dependency-version: '1.17'
  dependency-type: direct:production
  dependency-group: python-packages
- dependency-name: httpx-sse
  dependency-version: 0.4.3
  dependency-type: direct:production
  dependency-group: python-packages
- dependency-name: nltk
  dependency-version: 3.9.4
  dependency-type: direct:development
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-13 02:55:36 +00:00
dependabot[bot]
b7b03f8594 chore(deps): bump the python-packages group across 1 directory with 18 updates (#35023)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 02:43:41 +00:00
dependabot[bot]
61ef255809 chore(deps): bump the github-actions-dependencies group with 4 updates (#35018)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 02:27:38 +00:00
dependabot[bot]
08426376ac chore(deps): bump opentelemetry-propagator-b3 from 1.40.0 to 1.41.0 in /api in the opentelemetry group (#35017)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 02:01:41 +00:00
dependabot[bot]
d0262c899e chore(deps): bump the storage group in /api with 2 updates (#35020)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 01:59:03 +00:00
dependabot[bot]
152433d88a chore(deps-dev): bump the vdb group in /api with 4 updates (#35021)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 01:58:54 +00:00
dependabot[bot]
dece58d1a5 chore(deps-dev): bump the dev group in /api with 40 updates (#35022)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 01:58:43 +00:00
dependabot[bot]
70be474aac chore(deps): bump the storage group in /api with 3 updates (#35014)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-04-13 01:58:12 +00:00
dependabot[bot]
a852cbe7f2 chore(deps): bump the database group in /api with 2 updates (#35013)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-04-13 01:58:04 +00:00
dependabot[bot]
7df38d35c1 chore(deps): bump the google group in /api with 5 updates (#35010)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-04-13 01:57:18 +00:00
dependabot[bot]
ef29a5ee3d chore(deps): bump the flask group in /api with 3 updates (#35007)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-04-13 01:57:11 +00:00
dependabot[bot]
9a7fe7ef16 chore(deps): bump the llm group in /api with 4 updates (#35019)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 01:56:31 +00:00
-LAN-
8c4ea5c898 fix: external dataset tenant checks for bound knowledge APIs (#34734)
2026-04-13 01:47:57 +00:00
20 changed files with 547 additions and 452 deletions

View File

@@ -54,7 +54,7 @@ jobs:
         run: uv run --project api bash dev/pytest/pytest_unit_tests.sh
       - name: Upload unit coverage data
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: api-coverage-unit
           path: coverage-unit
@@ -129,7 +129,7 @@ jobs:
             api/tests/test_containers_integration_tests
       - name: Upload integration coverage data
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: api-coverage-integration
           path: coverage-integration

View File

@@ -81,7 +81,7 @@ jobs:
       - name: Build Docker image
         id: build
-        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
+        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
         with:
           context: ${{ matrix.build_context }}
           file: ${{ matrix.file }}
@@ -101,7 +101,7 @@ jobs:
           touch "/tmp/digests/${sanitized_digest}"
       - name: Upload digest
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: digests-${{ matrix.artifact_context }}-${{ env.PLATFORM_PAIR }}
           path: /tmp/digests/*

View File

@@ -50,7 +50,7 @@ jobs:
         uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
       - name: Build Docker Image
-        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
+        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
         with:
           push: false
           context: ${{ matrix.context }}

View File

@@ -21,7 +21,7 @@ jobs:
     if: ${{ github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.pull_requests[0].head.repo.full_name != github.repository }}
     steps:
       - name: Download pyrefly diff artifact
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
@@ -49,7 +49,7 @@ jobs:
         run: unzip -o pyrefly_diff.zip
       - name: Post comment
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |

View File

@@ -66,7 +66,7 @@ jobs:
           echo ${{ github.event.pull_request.number }} > pr_number.txt
       - name: Upload pyrefly diff
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: pyrefly_diff
           path: |
@@ -75,7 +75,7 @@ jobs:
       - name: Comment PR with pyrefly diff
         if: ${{ github.event.pull_request.head.repo.full_name == github.repository && steps.line_count_check.outputs.same == 'false' }}
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |

View File

@@ -32,7 +32,7 @@ jobs:
         run: uv sync --project api --dev
       - name: Download type coverage artifact
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
@@ -73,7 +73,7 @@ jobs:
           } > /tmp/type_coverage_comment.md
       - name: Post comment
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |

View File

@@ -71,7 +71,7 @@ jobs:
           echo ${{ github.event.pull_request.number }} > pr_number.txt
       - name: Upload type coverage artifact
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: pyrefly_type_coverage
           path: |
@@ -81,7 +81,7 @@ jobs:
       - name: Comment PR with type coverage
         if: ${{ github.event.pull_request.head.repo.full_name == github.repository }}
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |

View File

@@ -158,7 +158,7 @@ jobs:
       - name: Run Claude Code for Translation Sync
         if: steps.context.outputs.CHANGED_FILES != ''
-        uses: anthropics/claude-code-action@6e2bd52842c65e914eba5c8badd17560bd26b5de # v1.0.89
+        uses: anthropics/claude-code-action@b47fd721da662d48c5680e154ad16a73ed74d2e0 # v1.0.93
         with:
           anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
           github_token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -56,7 +56,7 @@ jobs:
       - name: Trigger i18n sync workflow
         if: steps.detect.outputs.has_changes == 'true'
-        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
+        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
         env:
           BASE_SHA: ${{ steps.detect.outputs.base_sha }}
           HEAD_SHA: ${{ steps.detect.outputs.head_sha }}

View File

@@ -53,7 +53,7 @@ jobs:
       - name: Upload Cucumber report
         if: ${{ !cancelled() }}
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: cucumber-report
           path: e2e/cucumber-report
@@ -61,7 +61,7 @@ jobs:
       - name: Upload E2E logs
         if: ${{ !cancelled() }}
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: e2e-logs
           path: e2e/.logs

View File

@@ -43,7 +43,7 @@ jobs:
       - name: Upload blob report
         if: ${{ !cancelled() }}
-        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
           name: blob-report-${{ matrix.shardIndex }}
           path: web/.vitest-reports/*

View File

@@ -353,13 +353,17 @@ class Dataset(Base):
         if self.provider != "external":
             return None
         external_knowledge_binding = db.session.scalar(
-            select(ExternalKnowledgeBindings).where(ExternalKnowledgeBindings.dataset_id == self.id)
+            select(ExternalKnowledgeBindings).where(
+                ExternalKnowledgeBindings.dataset_id == self.id,
+                ExternalKnowledgeBindings.tenant_id == self.tenant_id,
+            )
         )
         if not external_knowledge_binding:
             return None
         external_knowledge_api = db.session.scalar(
             select(ExternalKnowledgeApis).where(
-                ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id
+                ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id,
+                ExternalKnowledgeApis.tenant_id == self.tenant_id,
             )
         )
         if external_knowledge_api is None or external_knowledge_api.settings is None:
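
Note: the hunk above adds a `tenant_id` condition to both lookups, so a row owned by another tenant is treated as missing. A minimal standalone sketch of that fail-closed pattern follows. It uses the standard-library `sqlite3` module instead of the project's SQLAlchemy models, and the table layout and the helper name `find_api_template` are assumptions made only for illustration.

```python
import sqlite3


def find_api_template(conn, api_id, tenant_id):
    """Return the template row only when it belongs to tenant_id, else None."""
    return conn.execute(
        "SELECT id, tenant_id, settings FROM external_knowledge_apis "
        "WHERE id = ? AND tenant_id = ?",
        (api_id, tenant_id),
    ).fetchone()


# Hypothetical schema and data, not the project's real table definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE external_knowledge_apis (id TEXT PRIMARY KEY, tenant_id TEXT, settings TEXT)"
)
conn.execute("INSERT INTO external_knowledge_apis VALUES ('api-1', 'tenant-a', '{}')")

own = find_api_template(conn, "api-1", "tenant-a")      # row is returned
foreign = find_api_template(conn, "api-1", "tenant-b")  # None: same id, wrong tenant
```

Filtering inside the query, rather than loading the row and comparing tenants afterwards, means a caller cannot distinguish "does not exist" from "belongs to someone else".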

View File

@@ -4,42 +4,42 @@ version = "1.13.3"
 requires-python = "~=3.12.0"
 dependencies = [
-    "aliyun-log-python-sdk~=0.9.37",
+    "aliyun-log-python-sdk~=0.9.44",
     "arize-phoenix-otel~=0.15.0",
     "azure-identity==1.25.3",
     "beautifulsoup4==4.14.3",
-    "boto3==1.42.83",
-    "bs4~=0.0.1",
-    "cachetools~=5.3.0",
-    "celery~=5.6.2",
-    "charset-normalizer>=3.4.4",
-    "flask~=3.1.2",
-    "flask-compress>=1.17,<1.25",
-    "flask-cors~=6.0.0",
+    "boto3==1.42.88",
+    "bs4~=0.0.2",
+    "cachetools~=7.0.5",
+    "celery~=5.6.3",
+    "charset-normalizer>=3.4.7",
+    "flask~=3.1.3",
+    "flask-compress>=1.24,<1.25",
+    "flask-cors~=6.0.2",
     "flask-login~=0.6.3",
     "flask-migrate~=4.1.0",
     "flask-orjson~=2.0.0",
     "flask-sqlalchemy~=3.1.1",
-    "gevent~=25.9.1",
+    "gevent~=26.4.0",
     "gmpy2~=2.3.0",
-    "google-api-core>=2.19.1",
-    "google-api-python-client==2.193.0",
-    "google-auth>=2.47.0",
+    "google-api-core>=2.30.3",
+    "google-api-python-client==2.194.0",
+    "google-auth>=2.49.2",
     "google-auth-httplib2==0.3.1",
-    "google-cloud-aiplatform>=1.123.0",
-    "googleapis-common-protos>=1.65.0",
+    "google-cloud-aiplatform>=1.147.0",
+    "googleapis-common-protos>=1.74.0",
     "graphon>=0.1.2",
     "gunicorn~=25.3.0",
-    "httpx[socks]~=0.28.0",
+    "httpx[socks]~=0.28.1",
     "jieba==0.42.1",
-    "json-repair>=0.55.1",
-    "langfuse>=3.0.0,<5.0.0",
-    "langsmith~=0.7.16",
+    "json-repair>=0.59.2",
+    "langfuse>=4.2.0,<5.0.0",
+    "langsmith~=0.7.30",
     "markdown~=3.10.2",
-    "mlflow-skinny>=3.0.0",
-    "numpy~=1.26.4",
+    "mlflow-skinny>=3.11.1",
+    "numpy~=2.4.4",
     "openpyxl~=3.1.5",
-    "opik~=1.10.37",
+    "opik~=1.11.2",
     "litellm==1.83.0", # Pinned to avoid madoka dependency issue
     "opentelemetry-api==1.40.0",
     "opentelemetry-distro==0.61b0",
@@ -53,41 +53,41 @@ dependencies = [
     "opentelemetry-instrumentation-httpx==0.61b0",
     "opentelemetry-instrumentation-redis==0.61b0",
     "opentelemetry-instrumentation-sqlalchemy==0.61b0",
-    "opentelemetry-propagator-b3==1.40.0",
+    "opentelemetry-propagator-b3==1.41.0",
     "opentelemetry-proto==1.40.0",
     "opentelemetry-sdk==1.40.0",
     "opentelemetry-semantic-conventions==0.61b0",
     "opentelemetry-util-http==0.61b0",
-    "pandas[excel,output-formatting,performance]~=3.0.1",
+    "pandas[excel,output-formatting,performance]~=3.0.2",
     "psycogreen~=1.0.2",
-    "psycopg2-binary~=2.9.6",
+    "psycopg2-binary~=2.9.11",
     "pycryptodome==3.23.0",
     "pydantic~=2.12.5",
     "pydantic-settings~=2.13.1",
-    "pyjwt~=2.12.0",
-    "pypdfium2==5.6.0",
+    "pyjwt~=2.12.1",
+    "pypdfium2==5.7.0",
     "python-docx~=1.2.0",
     "python-dotenv==1.2.2",
-    "pyyaml~=6.0.1",
+    "pyyaml~=6.0.3",
     "readabilipy~=0.3.0",
     "redis[hiredis]~=7.4.0",
-    "resend~=2.26.0",
-    "sentry-sdk[flask]~=2.55.0",
-    "sqlalchemy~=2.0.29",
+    "resend~=2.27.0",
+    "sentry-sdk[flask]~=2.57.0",
+    "sqlalchemy~=2.0.49",
     "starlette==1.0.0",
     "tiktoken~=0.12.0",
     "transformers~=5.3.0",
-    "unstructured[docx,epub,md,ppt,pptx]~=0.21.5",
-    "pypandoc~=1.13",
+    "unstructured[docx,epub,md,ppt,pptx]~=0.22.18",
+    "pypandoc~=1.17",
     "yarl~=1.23.0",
     "sseclient-py~=1.9.0",
-    "httpx-sse~=0.4.0",
-    "sendgrid~=6.12.3",
+    "httpx-sse~=0.4.3",
+    "sendgrid~=6.12.5",
     "flask-restx~=1.3.2",
-    "packaging~=23.2",
-    "croniter>=6.0.0",
-    "weaviate-client==4.20.4",
-    "apscheduler>=3.11.0",
+    "packaging~=26.0",
+    "croniter>=6.2.2",
+    "weaviate-client==4.20.5",
+    "apscheduler>=3.11.2",
     "weave>=0.52.16",
     "fastopenapi[flask]>=0.7.0",
     "bleach~=6.3.0",
@@ -111,16 +111,16 @@ package = false
 dev = [
     "coverage~=7.13.4",
     "dotenv-linter~=0.7.0",
-    "faker~=40.12.0",
+    "faker~=40.13.0",
     "lxml-stubs~=0.5.1",
     "basedpyright~=1.39.0",
-    "ruff~=0.15.5",
-    "pytest~=9.0.2",
+    "ruff~=0.15.10",
+    "pytest~=9.0.3",
     "pytest-benchmark~=5.2.3",
     "pytest-cov~=7.1.0",
     "pytest-env~=1.6.0",
     "pytest-mock~=3.15.1",
-    "testcontainers~=4.14.1",
+    "testcontainers~=4.14.2",
     "types-aiofiles~=25.1.0",
     "types-beautifulsoup4~=4.12.0",
     "types-cachetools~=6.2.0",
@@ -130,8 +130,8 @@ dev = [
     "types-docutils~=0.22.3",
     "types-flask-cors~=6.0.0",
     "types-flask-migrate~=4.1.0",
-    "types-gevent~=25.9.0",
-    "types-greenlet~=3.3.0",
+    "types-gevent~=26.4.0",
+    "types-greenlet~=3.4.0",
     "types-html5lib~=1.1.11",
     "types-markdown~=3.10.2",
     "types-oauthlib~=3.3.0",
@@ -149,20 +149,20 @@ dev = [
     "types-pyyaml~=6.0.12",
     "types-regex~=2026.4.4",
     "types-shapely~=2.1.0",
-    "types-simplejson>=3.20.0",
-    "types-six>=1.17.0",
-    "types-tensorflow>=2.18.0",
-    "types-tqdm>=4.67.0",
+    "types-simplejson>=3.20.0.20260408",
+    "types-six>=1.17.0.20260408",
+    "types-tensorflow>=2.18.0.20260408",
+    "types-tqdm>=4.67.3.20260408",
     "types-ujson>=5.10.0",
-    "boto3-stubs>=1.38.20",
-    "types-jmespath>=1.0.2.20240106",
-    "hypothesis>=6.131.15",
+    "boto3-stubs>=1.42.88",
+    "types-jmespath>=1.1.0.20260408",
+    "hypothesis>=6.151.12",
     "types_pyOpenSSL>=24.1.0",
-    "types_cffi>=1.17.0",
-    "types_setuptools>=80.9.0",
+    "types_cffi>=2.0.0.20260408",
+    "types_setuptools>=82.0.0.20260408",
     "pandas-stubs~=3.0.0",
     "scipy-stubs>=1.15.3.0",
-    "types-python-http-client>=3.3.7.20240910",
+    "types-python-http-client>=3.3.7.20260408",
     "import-linter>=2.3",
     "types-redis>=4.6.0.20241004",
     "celery-types>=0.23.0",
@@ -180,10 +180,10 @@ dev = [
 ############################################################
 storage = [
     "azure-storage-blob==12.28.0",
-    "bce-python-sdk~=0.9.23",
+    "bce-python-sdk~=0.9.69",
     "cos-python-sdk-v5==1.9.41",
     "esdk-obs-python==3.26.2",
-    "google-cloud-storage>=3.0.0",
+    "google-cloud-storage>=3.10.1",
     "opendal~=0.46.0",
     "oss2==2.19.1",
     "supabase~=2.18.1",
@@ -193,7 +193,7 @@ storage = [
 ############################################################
 # [ Tools ] dependency group
 ############################################################
-tools = ["cloudscraper~=1.2.71", "nltk~=3.9.1"]
+tools = ["cloudscraper~=1.2.71", "nltk~=3.9.4"]
 ############################################################
 # [ VDB ] dependency group
@@ -209,23 +209,23 @@ vdb = [
     "elasticsearch==8.14.0",
     "opensearch-py==3.1.0",
     "oracledb==3.4.2",
-    "pgvecto-rs[sqlalchemy]~=0.2.1",
+    "pgvecto-rs[sqlalchemy]~=0.2.2",
     "pgvector==0.4.2",
-    "pymilvus~=2.6.10",
+    "pymilvus~=2.6.12",
     "pymochow==2.4.0",
     "pyobvector~=0.2.17",
     "qdrant-client==1.9.0",
     "intersystems-irispython>=5.1.0",
-    "tablestore==6.4.3",
+    "tablestore==6.4.4",
     "tcvectordb~=2.1.0",
     "tidb-vector==0.0.15",
     "upstash-vector==0.8.0",
     "volcengine-compat~=1.0.0",
-    "weaviate-client==4.20.4",
+    "weaviate-client==4.20.5",
     "xinference-client~=2.4.0",
     "mo-vector~=0.1.13",
     "mysql-connector-python>=9.3.0",
-    "holo-search-sdk>=0.4.1",
+    "holo-search-sdk>=0.4.2",
 ]
 [tool.pyrefly]
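
Note: many of the requirements bumped above use the compatible-release operator `~=`. Under PEP 440, `~=X.Y.Z` is equivalent to `>=X.Y.Z, ==X.Y.*`, so raising the floor also moves the allowed window (for example, `numpy~=1.26.4` to `numpy~=2.4.4` changes the permitted major version). The sketch below computes that window for plain numeric versions only. The helper name `compatible_release_bounds` is hypothetical, and it does not handle pre-releases, post-releases, or epochs.

```python
def compatible_release_bounds(spec: str) -> tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for a '~=' specifier."""
    parts = [int(p) for p in spec.removeprefix("~=").split(".")]
    if len(parts) < 2:
        raise ValueError("~= needs at least two release segments")
    # Drop the last segment, then bump the new last segment by one.
    upper = parts[:-1]
    upper[-1] += 1
    return ".".join(map(str, parts)), ".".join(map(str, upper))


old_numpy = compatible_release_bounds("~=1.26.4")  # ("1.26.4", "1.27")
new_numpy = compatible_release_bounds("~=2.4.4")   # ("2.4.4", "2.5")
packaging = compatible_release_bounds("~=26.0")    # ("26.0", "27")
```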

View File

@@ -528,6 +528,8 @@ class DatasetService:
             raise ValueError("External knowledge id is required.")
         if not external_knowledge_api_id:
             raise ValueError("External knowledge api id is required.")
+        # Ensure the referenced external API template exists and belongs to the dataset tenant.
+        ExternalDatasetService.get_external_knowledge_api(external_knowledge_api_id, dataset.tenant_id)
         # Update metadata fields
         dataset.updated_by = user.id if user else None
         dataset.updated_at = naive_utc_now()
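
Note: the added call runs before any field on the dataset is touched, so a foreign template id aborts the update without leaving partial changes. A rough sketch of that validate-then-mutate ordering is below. `DatasetStub`, `TEMPLATE_OWNERS`, and both function bodies are stand-ins invented for illustration; only the ordering and the error message mirror the diff and its tests.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-in for the template table: api_id -> owning tenant.
TEMPLATE_OWNERS = {"api-1": "tenant-a", "api-2": "tenant-b"}


@dataclass
class DatasetStub:
    tenant_id: str
    external_knowledge_api_id: Optional[str] = None


def get_external_knowledge_api(api_id: str, tenant_id: str) -> str:
    """Raise unless the template exists and is owned by tenant_id."""
    if TEMPLATE_OWNERS.get(api_id) != tenant_id:
        raise ValueError("api template not found")
    return api_id


def update_external_dataset(dataset: DatasetStub, api_id: str) -> DatasetStub:
    # Validate first: nothing below runs if the template is foreign or missing.
    get_external_knowledge_api(api_id, dataset.tenant_id)
    dataset.external_knowledge_api_id = api_id
    return dataset


dataset = DatasetStub(tenant_id="tenant-a")
update_external_dataset(dataset, "api-1")  # accepted: same tenant
try:
    update_external_dataset(dataset, "api-2")  # owned by tenant-b
    rejected = False
except ValueError:
    rejected = True
```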

View File

@@ -317,7 +317,10 @@ class ExternalDatasetService:
         external_knowledge_api = db.session.scalar(
             select(ExternalKnowledgeApis)
-            .where(ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id)
+            .where(
+                ExternalKnowledgeApis.id == external_knowledge_binding.external_knowledge_api_id,
+                ExternalKnowledgeApis.tenant_id == tenant_id,
+            )
             .limit(1)
         )
         if external_knowledge_api is None or external_knowledge_api.settings is None:

View File

@@ -1,3 +1,4 @@
+import json
 from unittest.mock import Mock, patch
 from uuid import uuid4
@@ -7,7 +8,7 @@ from sqlalchemy.orm import Session
 from core.rag.index_processor.constant.index_type import IndexTechniqueType
 from models.account import Account, Tenant, TenantAccountJoin, TenantAccountRole
-from models.dataset import Dataset, ExternalKnowledgeBindings
+from models.dataset import Dataset, ExternalKnowledgeApis, ExternalKnowledgeBindings
 from models.enums import DataSourceType
 from services.dataset_service import DatasetService
 from services.errors.account import NoPermissionError
@@ -103,6 +104,34 @@ class DatasetUpdateTestDataFactory:
         db_session_with_containers.commit()
         return binding
+    @staticmethod
+    def create_external_knowledge_api(
+        db_session_with_containers: Session,
+        tenant_id: str,
+        created_by: str,
+        api_id: str | None = None,
+        name: str = "test-api",
+    ) -> ExternalKnowledgeApis:
+        """Create a real external knowledge API template for tenant-scoped update validation."""
+        external_api = ExternalKnowledgeApis(
+            tenant_id=tenant_id,
+            created_by=created_by,
+            updated_by=created_by,
+            name=name,
+            description="test description",
+            settings=json.dumps(
+                {
+                    "endpoint": "https://example.com",
+                    "api_key": "test-api-key",
+                }
+            ),
+        )
+        if api_id is not None:
+            external_api.id = api_id
+        db_session_with_containers.add(external_api)
+        db_session_with_containers.commit()
+        return external_api
 class TestDatasetServiceUpdateDataset:
     """
@@ -138,6 +167,11 @@ class TestDatasetServiceUpdateDataset:
         )
         binding_id = binding.id
         db_session_with_containers.expunge(binding)
+        external_api = DatasetUpdateTestDataFactory.create_external_knowledge_api(
+            db_session_with_containers,
+            tenant_id=tenant.id,
+            created_by=user.id,
+        )
         update_data = {
             "name": "new_name",
@@ -145,7 +179,7 @@ class TestDatasetServiceUpdateDataset:
             "external_retrieval_model": "new_model",
             "permission": "only_me",
             "external_knowledge_id": "new_knowledge_id",
-            "external_knowledge_api_id": str(uuid4()),
+            "external_knowledge_api_id": external_api.id,
         }
         result = DatasetService.update_dataset(dataset.id, update_data, user)
@@ -218,11 +252,16 @@ class TestDatasetServiceUpdateDataset:
             created_by=user.id,
             provider="external",
         )
+        external_api = DatasetUpdateTestDataFactory.create_external_knowledge_api(
+            db_session_with_containers,
+            tenant_id=tenant.id,
+            created_by=user.id,
+        )
         update_data = {
             "name": "new_name",
             "external_knowledge_id": "knowledge_id",
-            "external_knowledge_api_id": str(uuid4()),
+            "external_knowledge_api_id": external_api.id,
         }
         with pytest.raises(ValueError) as context:

View File

@@ -12,7 +12,7 @@ This test suite covers:
 import json
 import pickle
 from datetime import UTC, datetime
-from unittest.mock import patch
+from unittest.mock import Mock, patch
 from uuid import uuid4
 from core.rag.index_processor.constant.index_type import IndexTechniqueType
@@ -25,6 +25,7 @@ from models.dataset import (
     Document,
     DocumentSegment,
     Embedding,
+    ExternalKnowledgeBindings,
 )
 from models.enums import (
     DataSourceType,
@@ -180,6 +181,24 @@ class TestDatasetModelValidation:
         assert result["top_k"] == 2
         assert result["score_threshold"] == 0.0
+    def test_dataset_external_knowledge_info_returns_none_for_cross_tenant_template(self):
+        """Test external datasets fail closed when the bound template is outside the tenant."""
+        dataset = Dataset(
+            tenant_id=str(uuid4()),
+            name="External Dataset",
+            data_source_type=DataSourceType.UPLOAD_FILE,
+            created_by=str(uuid4()),
+            provider="external",
+        )
+        binding = Mock(spec=ExternalKnowledgeBindings)
+        binding.external_knowledge_id = "knowledge-1"
+        binding.external_knowledge_api_id = str(uuid4())
+        with patch("models.dataset.db") as mock_db:
+            mock_db.session.scalar.side_effect = [binding, None]
+            assert dataset.external_knowledge_info is None
     def test_dataset_retrieval_model_dict_property(self):
         """Test retrieval_model_dict property with default values."""
         # Arrange
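
Note: the new test above relies on `side_effect` being given a list, in which case each call to the mock returns the next item. The first `scalar` call yields the binding and the second yields `None`, which simulates a template filtered out by the tenant condition. A small standalone sketch of that mechanism, with a hypothetical `external_knowledge_info` function standing in for the model property:

```python
from unittest.mock import Mock

session = Mock()
# Successive calls to session.scalar return these values in order.
session.scalar.side_effect = ["binding-row", None]


def external_knowledge_info(session):
    """Hypothetical stand-in: two lookups, failing closed if either is empty."""
    binding = session.scalar("select binding")
    if not binding:
        return None
    api = session.scalar("select api template")
    if api is None:
        return None
    return {"binding": binding, "api": api}


result = external_knowledge_info(session)
call_count = session.scalar.call_count
```

Here `result` is `None` because the second lookup came back empty, and `call_count` is 2, which confirms both queries were issued.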

View File

@@ -532,6 +532,9 @@ class TestDatasetServiceCreationAndUpdate:
         with (
             patch.object(DatasetService, "_update_external_knowledge_binding") as update_binding,
+            patch(
+                "services.dataset_service.ExternalDatasetService.get_external_knowledge_api", return_value=object()
+            ) as get_external_knowledge_api,
             patch("services.dataset_service.naive_utc_now", return_value=now),
             patch("services.dataset_service.db") as mock_db,
         ):
@@ -557,6 +560,7 @@ class TestDatasetServiceCreationAndUpdate:
         assert dataset.permission == DatasetPermissionEnum.PARTIAL_TEAM
         assert dataset.updated_by == "user-1"
         assert dataset.updated_at is now
+        get_external_knowledge_api.assert_called_once_with("api-1", dataset.tenant_id)
         update_binding.assert_called_once_with("dataset-1", "knowledge-1", "api-1")
         mock_db.session.add.assert_called_once_with(dataset)
         mock_db.session.commit.assert_called_once()
@@ -574,6 +578,31 @@ class TestDatasetServiceCreationAndUpdate:
         with pytest.raises(ValueError, match=message):
             DatasetService._update_external_dataset(dataset, payload, SimpleNamespace(id="user-1"))
+    def test_update_external_dataset_rejects_cross_tenant_external_api_id(self):
+        dataset = DatasetServiceUnitDataFactory.create_dataset_mock(dataset_id="dataset-1")
+        with (
+            patch(
+                "services.dataset_service.ExternalDatasetService.get_external_knowledge_api",
+                side_effect=ValueError("api template not found"),
+            ) as get_external_knowledge_api,
+            patch.object(DatasetService, "_update_external_knowledge_binding") as update_binding,
+            patch("services.dataset_service.db") as mock_db,
+        ):
+            with pytest.raises(ValueError, match="api template not found"):
+                DatasetService._update_external_dataset(
+                    dataset,
+                    {
+                        "external_knowledge_id": "knowledge-1",
+                        "external_knowledge_api_id": "foreign-api",
+                    },
+                    SimpleNamespace(id="user-1"),
+                )
+        get_external_knowledge_api.assert_called_once_with("foreign-api", dataset.tenant_id)
+        update_binding.assert_not_called()
+        mock_db.session.commit.assert_not_called()
     def test_update_external_knowledge_binding_updates_changed_binding_values(self):
         binding = SimpleNamespace(external_knowledge_id="old-knowledge", external_knowledge_api_id="old-api")
         session = MagicMock()

View File

@@ -1560,6 +1560,17 @@ class TestExternalDatasetServiceFetchRetrieval:
         with pytest.raises(ValueError, match="external knowledge binding not found"):
             ExternalDatasetService.fetch_external_knowledge_retrieval("tenant-123", "dataset-123", "query", {})
+    @patch("services.external_knowledge_service.db")
+    def test_fetch_external_knowledge_retrieval_cross_tenant_api_template_error(self, mock_db, factory):
+        """Test error when a binding points to an API template outside the dataset tenant."""
+        # Arrange
+        binding = factory.create_external_knowledge_binding_mock()
+        mock_db.session.scalar.side_effect = [binding, None]
+        # Act & Assert
+        with pytest.raises(ValueError, match="external api template not found"):
+            ExternalDatasetService.fetch_external_knowledge_retrieval("tenant-123", "dataset-123", "query", {})
     @patch("services.external_knowledge_service.ExternalDatasetService.process_external_api")
     @patch("services.external_knowledge_service.db")
     def test_fetch_external_knowledge_retrieval_empty_results(self, mock_db, mock_process, factory):

api/uv.lock (generated): 716 lines changed

File diff suppressed because it is too large.