mirror of
https://github.com/langgenius/dify.git
synced 2026-02-28 12:25:14 +00:00
Merge branch 'main' into 2-25-vinext
This commit is contained in:
168
.agents/skills/backend-code-review/SKILL.md
Normal file
168
.agents/skills/backend-code-review/SKILL.md
Normal file
@@ -0,0 +1,168 @@
|
||||
---
|
||||
name: backend-code-review
|
||||
description: Review backend code for quality, security, maintainability, and best practices based on established checklist rules. Use when the user requests a review, analysis, or improvement of backend files (e.g., `.py`) under the `api/` directory. Do NOT use for frontend files (e.g., `.tsx`, `.ts`, `.js`). Supports pending-change review, code snippets review, and file-focused review.
|
||||
---
|
||||
|
||||
# Backend Code Review
|
||||
|
||||
## When to use this skill
|
||||
|
||||
Use this skill whenever the user asks to **review, analyze, or improve** backend code (e.g., `.py`) under the `api/` directory. Supports the following review modes:
|
||||
|
||||
- **Pending-change review**: when the user asks to review current changes (inspect staged/working-tree files slated for commit to get the changes).
|
||||
- **Code snippets review**: when the user pastes code snippets (e.g., a function/class/module excerpt) into the chat and asks for a review.
|
||||
- **File-focused review**: when the user points to specific files and asks for a review of those files (one file or a small, explicit set of files, e.g., `api/...`, `api/app.py`).
|
||||
|
||||
Do NOT use this skill when:
|
||||
|
||||
- The request is about frontend code or UI (e.g., `.tsx`, `.ts`, `.js`, `web/`).
|
||||
- The user is not asking for a review/analysis/improvement of backend code.
|
||||
- The scope is not under `api/` (unless the user explicitly asks to review backend-related changes outside `api/`).
|
||||
|
||||
## How to use this skill
|
||||
|
||||
Follow these steps when using this skill:
|
||||
|
||||
1. **Identify the review mode** (pending-change vs snippet vs file-focused) based on the user’s input. Keep the scope tight: review only what the user provided or explicitly referenced.
|
||||
2. Follow the rules defined in **Checklist** to perform the review. If no Checklist rule matches, apply **General Review Rules** as a fallback to perform the best-effort review.
|
||||
3. Compose the final output strictly follow the **Required Output Format**.
|
||||
|
||||
Notes when using this skill:
|
||||
- Always include actionable fixes or suggestions (including possible code snippets).
|
||||
- Use best-effort `File:Line` references when a file path and line numbers are available; otherwise, use the most specific identifier you can.
|
||||
|
||||
## Checklist
|
||||
|
||||
- db schema design: if the review scope includes code/files under `api/models/` or `api/migrations/`, follow [references/db-schema-rule.md](references/db-schema-rule.md) to perform the review
|
||||
- architecture: if the review scope involves controller/service/core-domain/libs/model layering, dependency direction, or moving responsibilities across modules, follow [references/architecture-rule.md](references/architecture-rule.md) to perform the review
|
||||
- repositories abstraction: if the review scope contains table/model operations (e.g., `select(...)`, `session.execute(...)`, joins, CRUD) and is not under `api/repositories`, `api/core/repositories`, or `api/extensions/*/repositories/`, follow [references/repositories-rule.md](references/repositories-rule.md) to perform the review
|
||||
- sqlalchemy patterns: if the review scope involves SQLAlchemy session/query usage, db transaction/crud usage, or raw SQL usage, follow [references/sqlalchemy-rule.md](references/sqlalchemy-rule.md) to perform the review
|
||||
|
||||
## General Review Rules
|
||||
|
||||
### 1. Security Review
|
||||
|
||||
Check for:
|
||||
- SQL injection vulnerabilities
|
||||
- Server-Side Request Forgery (SSRF)
|
||||
- Command injection
|
||||
- Insecure deserialization
|
||||
- Hardcoded secrets/credentials
|
||||
- Improper authentication/authorization
|
||||
- Insecure direct object references
|
||||
|
||||
### 2. Performance Review
|
||||
|
||||
Check for:
|
||||
- N+1 queries
|
||||
- Missing database indexes
|
||||
- Memory leaks
|
||||
- Blocking operations in async code
|
||||
- Missing caching opportunities
|
||||
|
||||
### 3. Code Quality Review
|
||||
|
||||
Check for:
|
||||
- Code forward compatibility
|
||||
- Code duplication (DRY violations)
|
||||
- Functions doing too much (SRP violations)
|
||||
- Deep nesting / complex conditionals
|
||||
- Magic numbers/strings
|
||||
- Poor naming
|
||||
- Missing error handling
|
||||
- Incomplete type coverage
|
||||
|
||||
### 4. Testing Review
|
||||
|
||||
Check for:
|
||||
- Missing test coverage for new code
|
||||
- Tests that don't test behavior
|
||||
- Flaky test patterns
|
||||
- Missing edge cases
|
||||
|
||||
## Required Output Format
|
||||
|
||||
When this skill invoked, the response must exactly follow one of the two templates:
|
||||
|
||||
### Template A (any findings)
|
||||
|
||||
```markdown
|
||||
# Code Review Summary
|
||||
|
||||
Found <X> critical issues need to be fixed:
|
||||
|
||||
## 🔴 Critical (Must Fix)
|
||||
|
||||
### 1. <brief description of the issue>
|
||||
|
||||
FilePath: <path> line <line>
|
||||
<relevant code snippet or pointer>
|
||||
|
||||
#### Explanation
|
||||
|
||||
<detailed explanation and references of the issue>
|
||||
|
||||
#### Suggested Fix
|
||||
|
||||
1. <brief description of suggested fix>
|
||||
2. <code example> (optional, omit if not applicable)
|
||||
|
||||
---
|
||||
... (repeat for each critical issue) ...
|
||||
|
||||
Found <Y> suggestions for improvement:
|
||||
|
||||
## 🟡 Suggestions (Should Consider)
|
||||
|
||||
### 1. <brief description of the suggestion>
|
||||
|
||||
FilePath: <path> line <line>
|
||||
<relevant code snippet or pointer>
|
||||
|
||||
#### Explanation
|
||||
|
||||
<detailed explanation and references of the suggestion>
|
||||
|
||||
#### Suggested Fix
|
||||
|
||||
1. <brief description of suggested fix>
|
||||
2. <code example> (optional, omit if not applicable)
|
||||
|
||||
---
|
||||
... (repeat for each suggestion) ...
|
||||
|
||||
Found <Z> optional nits:
|
||||
|
||||
## 🟢 Nits (Optional)
|
||||
### 1. <brief description of the nit>
|
||||
|
||||
FilePath: <path> line <line>
|
||||
<relevant code snippet or pointer>
|
||||
|
||||
#### Explanation
|
||||
|
||||
<explanation and references of the optional nit>
|
||||
|
||||
#### Suggested Fix
|
||||
|
||||
- <minor suggestions>
|
||||
|
||||
---
|
||||
... (repeat for each nits) ...
|
||||
|
||||
## ✅ What's Good
|
||||
|
||||
- <Positive feedback on good patterns>
|
||||
```
|
||||
|
||||
- If there are no critical issues or suggestions or option nits or good points, just omit that section.
|
||||
- If the issue number is more than 10, summarize as "Found 10+ critical issues/suggestions/optional nits" and only output the first 10 items.
|
||||
- Don't compress the blank lines between sections; keep them as-is for readability.
|
||||
- If there is any issue requires code changes, append a brief follow-up question to ask whether the user wants to apply the fix(es) after the structured output. For example: "Would you like me to use the Suggested fix(es) to address these issues?"
|
||||
|
||||
### Template B (no issues)
|
||||
|
||||
```markdown
|
||||
## Code Review Summary
|
||||
✅ No issues found.
|
||||
```
|
||||
@@ -0,0 +1,91 @@
|
||||
# Rule Catalog — Architecture
|
||||
|
||||
## Scope
|
||||
- Covers: controller/service/core-domain/libs/model layering, dependency direction, responsibility placement, observability-friendly flow.
|
||||
|
||||
## Rules
|
||||
|
||||
### Keep business logic out of controllers
|
||||
- Category: maintainability
|
||||
- Severity: critical
|
||||
- Description: Controllers should parse input, call services, and return serialized responses. Business decisions inside controllers make behavior hard to reuse and test.
|
||||
- Suggested fix: Move domain/business logic into the service or core/domain layer. Keep controller handlers thin and orchestration-focused.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
@bp.post("/apps/<app_id>/publish")
|
||||
def publish_app(app_id: str):
|
||||
payload = request.get_json() or {}
|
||||
if payload.get("force") and current_user.role != "admin":
|
||||
raise ValueError("only admin can force publish")
|
||||
app = App.query.get(app_id)
|
||||
app.status = "published"
|
||||
db.session.commit()
|
||||
return {"result": "ok"}
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
@bp.post("/apps/<app_id>/publish")
|
||||
def publish_app(app_id: str):
|
||||
payload = PublishRequest.model_validate(request.get_json() or {})
|
||||
app_service.publish_app(app_id=app_id, force=payload.force, actor_id=current_user.id)
|
||||
return {"result": "ok"}
|
||||
```
|
||||
|
||||
### Preserve layer dependency direction
|
||||
- Category: best practices
|
||||
- Severity: critical
|
||||
- Description: Controllers may depend on services, and services may depend on core/domain abstractions. Reversing this direction (for example, core importing controller/web modules) creates cycles and leaks transport concerns into domain code.
|
||||
- Suggested fix: Extract shared contracts into core/domain or service-level modules and make upper layers depend on lower, not the reverse.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
# core/policy/publish_policy.py
|
||||
from controllers.console.app import request_context
|
||||
|
||||
def can_publish() -> bool:
|
||||
return request_context.current_user.is_admin
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
# core/policy/publish_policy.py
|
||||
def can_publish(role: str) -> bool:
|
||||
return role == "admin"
|
||||
|
||||
# service layer adapts web/user context to domain input
|
||||
allowed = can_publish(role=current_user.role)
|
||||
```
|
||||
|
||||
### Keep libs business-agnostic
|
||||
- Category: maintainability
|
||||
- Severity: critical
|
||||
- Description: Modules under `api/libs/` should remain reusable, business-agnostic building blocks. They must not encode product/domain-specific rules, workflow orchestration, or business decisions.
|
||||
- Suggested fix:
|
||||
- If business logic appears in `api/libs/`, extract it into the appropriate `services/` or `core/` module and keep `libs` focused on generic, cross-cutting helpers.
|
||||
- Keep `libs` dependencies clean: avoid importing service/controller/domain-specific modules into `api/libs/`.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
# api/libs/conversation_filter.py
|
||||
from services.conversation_service import ConversationService
|
||||
|
||||
def should_archive_conversation(conversation, tenant_id: str) -> bool:
|
||||
# Domain policy and service dependency are leaking into libs.
|
||||
service = ConversationService()
|
||||
if service.has_paid_plan(tenant_id):
|
||||
return conversation.idle_days > 90
|
||||
return conversation.idle_days > 30
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
# api/libs/datetime_utils.py (business-agnostic helper)
|
||||
def older_than_days(idle_days: int, threshold_days: int) -> bool:
|
||||
return idle_days > threshold_days
|
||||
|
||||
# services/conversation_service.py (business logic stays in service/core)
|
||||
from libs.datetime_utils import older_than_days
|
||||
|
||||
def should_archive_conversation(conversation, tenant_id: str) -> bool:
|
||||
threshold_days = 90 if has_paid_plan(tenant_id) else 30
|
||||
return older_than_days(conversation.idle_days, threshold_days)
|
||||
```
|
||||
157
.agents/skills/backend-code-review/references/db-schema-rule.md
Normal file
157
.agents/skills/backend-code-review/references/db-schema-rule.md
Normal file
@@ -0,0 +1,157 @@
|
||||
# Rule Catalog — DB Schema Design
|
||||
|
||||
## Scope
|
||||
- Covers: model/base inheritance, schema boundaries in model properties, tenant-aware schema design, index redundancy checks, dialect portability in models, and cross-database compatibility in migrations.
|
||||
- Does NOT cover: session lifecycle, transaction boundaries, and query execution patterns (handled by `sqlalchemy-rule.md`).
|
||||
|
||||
## Rules
|
||||
|
||||
### Do not query other tables inside `@property`
|
||||
- Category: [maintainability, performance]
|
||||
- Severity: critical
|
||||
- Description: A model `@property` must not open sessions or query other tables. This hides dependencies across models, tightly couples schema objects to data access, and can cause N+1 query explosions when iterating collections.
|
||||
- Suggested fix:
|
||||
- Keep model properties pure and local to already-loaded fields.
|
||||
- Move cross-table data fetching to service/repository methods.
|
||||
- For list/batch reads, fetch required related data explicitly (join/preload/bulk query) before rendering derived values.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
class Conversation(TypeBase):
|
||||
__tablename__ = "conversations"
|
||||
|
||||
@property
|
||||
def app_name(self) -> str:
|
||||
with Session(db.engine, expire_on_commit=False) as session:
|
||||
app = session.execute(select(App).where(App.id == self.app_id)).scalar_one()
|
||||
return app.name
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
class Conversation(TypeBase):
|
||||
__tablename__ = "conversations"
|
||||
|
||||
@property
|
||||
def display_title(self) -> str:
|
||||
return self.name or "Untitled"
|
||||
|
||||
|
||||
# Service/repository layer performs explicit batch fetch for related App rows.
|
||||
```
|
||||
|
||||
### Prefer including `tenant_id` in model definitions
|
||||
- Category: maintainability
|
||||
- Severity: suggestion
|
||||
- Description: In multi-tenant domains, include `tenant_id` in schema definitions whenever the entity belongs to tenant-owned data. This improves data isolation safety and keeps future partitioning/sharding strategies practical as data volume grows.
|
||||
- Suggested fix:
|
||||
- Add a `tenant_id` column and ensure related unique/index constraints include tenant dimension when applicable.
|
||||
- Propagate `tenant_id` through service/repository contracts to keep access paths tenant-aware.
|
||||
- Exception: if a table is explicitly designed as non-tenant-scoped global metadata, document that design decision clearly.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
from sqlalchemy.orm import Mapped
|
||||
|
||||
class Dataset(TypeBase):
|
||||
__tablename__ = "datasets"
|
||||
id: Mapped[str] = mapped_column(StringUUID, primary_key=True)
|
||||
name: Mapped[str] = mapped_column(sa.String(255), nullable=False)
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
from sqlalchemy.orm import Mapped
|
||||
|
||||
class Dataset(TypeBase):
|
||||
__tablename__ = "datasets"
|
||||
id: Mapped[str] = mapped_column(StringUUID, primary_key=True)
|
||||
tenant_id: Mapped[str] = mapped_column(StringUUID, nullable=False, index=True)
|
||||
name: Mapped[str] = mapped_column(sa.String(255), nullable=False)
|
||||
```
|
||||
|
||||
### Detect and avoid duplicate/redundant indexes
|
||||
- Category: performance
|
||||
- Severity: suggestion
|
||||
- Description: Review index definitions for leftmost-prefix redundancy. For example, index `(a, b, c)` can safely cover most lookups for `(a, b)`. Keeping both may increase write overhead and can mislead the optimizer into suboptimal execution plans.
|
||||
- Suggested fix:
|
||||
- Before adding an index, compare against existing composite indexes by leftmost-prefix rules.
|
||||
- Drop or avoid creating redundant prefixes unless there is a proven query-pattern need.
|
||||
- Apply the same review standard in both model `__table_args__` and migration index DDL.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
__table_args__ = (
|
||||
sa.Index("idx_msg_tenant_app", "tenant_id", "app_id"),
|
||||
sa.Index("idx_msg_tenant_app_created", "tenant_id", "app_id", "created_at"),
|
||||
)
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
__table_args__ = (
|
||||
# Keep the wider index unless profiling proves a dedicated short index is needed.
|
||||
sa.Index("idx_msg_tenant_app_created", "tenant_id", "app_id", "created_at"),
|
||||
)
|
||||
```
|
||||
|
||||
### Avoid PostgreSQL-only dialect usage in models; wrap in `models.types`
|
||||
- Category: maintainability
|
||||
- Severity: critical
|
||||
- Description: Model/schema definitions should avoid PostgreSQL-only constructs directly in business models. When database-specific behavior is required, encapsulate it in `api/models/types.py` using both PostgreSQL and MySQL dialect implementations, then consume that abstraction from model code.
|
||||
- Suggested fix:
|
||||
- Do not directly place dialect-only types/operators in model columns when a portable wrapper can be used.
|
||||
- Add or extend wrappers in `models.types` (for example, `AdjustedJSON`, `LongText`, `BinaryData`) to normalize behavior across PostgreSQL and MySQL.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
from sqlalchemy.dialects.postgresql import JSONB
|
||||
from sqlalchemy.orm import Mapped
|
||||
|
||||
class ToolConfig(TypeBase):
|
||||
__tablename__ = "tool_configs"
|
||||
config: Mapped[dict] = mapped_column(JSONB, nullable=False)
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
from sqlalchemy.orm import Mapped
|
||||
|
||||
from models.types import AdjustedJSON
|
||||
|
||||
class ToolConfig(TypeBase):
|
||||
__tablename__ = "tool_configs"
|
||||
config: Mapped[dict] = mapped_column(AdjustedJSON(), nullable=False)
|
||||
```
|
||||
|
||||
### Guard migration incompatibilities with dialect checks and shared types
|
||||
- Category: maintainability
|
||||
- Severity: critical
|
||||
- Description: Migration scripts under `api/migrations/versions/` must account for PostgreSQL/MySQL incompatibilities explicitly. For dialect-sensitive DDL or defaults, branch on the active dialect (for example, `conn.dialect.name == "postgresql"`), and prefer reusable compatibility abstractions from `models.types` where applicable.
|
||||
- Suggested fix:
|
||||
- In migration upgrades/downgrades, bind connection and branch by dialect for incompatible SQL fragments.
|
||||
- Reuse `models.types` wrappers in column definitions when that keeps behavior aligned with runtime models.
|
||||
- Avoid one-dialect-only migration logic unless there is a documented, deliberate compatibility exception.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
with op.batch_alter_table("dataset_keyword_tables") as batch_op:
|
||||
batch_op.add_column(
|
||||
sa.Column(
|
||||
"data_source_type",
|
||||
sa.String(255),
|
||||
server_default=sa.text("'database'::character varying"),
|
||||
nullable=False,
|
||||
)
|
||||
)
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
def _is_pg(conn) -> bool:
|
||||
return conn.dialect.name == "postgresql"
|
||||
|
||||
|
||||
conn = op.get_bind()
|
||||
default_expr = sa.text("'database'::character varying") if _is_pg(conn) else sa.text("'database'")
|
||||
|
||||
with op.batch_alter_table("dataset_keyword_tables") as batch_op:
|
||||
batch_op.add_column(
|
||||
sa.Column("data_source_type", sa.String(255), server_default=default_expr, nullable=False)
|
||||
)
|
||||
```
|
||||
@@ -0,0 +1,61 @@
|
||||
# Rule Catalog - Repositories Abstraction
|
||||
|
||||
## Scope
|
||||
- Covers: when to reuse existing repository abstractions, when to introduce new repositories, and how to preserve dependency direction between service/core and infrastructure implementations.
|
||||
- Does NOT cover: SQLAlchemy session lifecycle and query-shape specifics (handled by `sqlalchemy-rule.md`), and table schema/migration design (handled by `db-schema-rule.md`).
|
||||
|
||||
## Rules
|
||||
|
||||
### Introduce repositories abstraction
|
||||
- Category: maintainability
|
||||
- Severity: suggestion
|
||||
- Description: If a table/model already has a repository abstraction, all reads/writes/queries for that table should use the existing repository. If no repository exists, introduce one only when complexity justifies it, such as large/high-volume tables, repeated complex query logic, or likely storage-strategy variation.
|
||||
- Suggested fix:
|
||||
- First check `api/repositories`, `api/core/repositories`, and `api/extensions/*/repositories/` to verify whether the table/model already has a repository abstraction. If it exists, route all operations through it and add missing repository methods instead of bypassing it with ad-hoc SQLAlchemy access.
|
||||
- If no repository exists, add one only when complexity warrants it (for example, repeated complex queries, large data domains, or multiple storage strategies), while preserving dependency direction (service/core depends on abstraction; infra provides implementation).
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
# Existing repository is ignored and service uses ad-hoc table queries.
|
||||
class AppService:
|
||||
def archive_app(self, app_id: str, tenant_id: str) -> None:
|
||||
app = self.session.execute(
|
||||
select(App).where(App.id == app_id, App.tenant_id == tenant_id)
|
||||
).scalar_one()
|
||||
app.archived = True
|
||||
self.session.commit()
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
# Case A: Existing repository must be reused for all table operations.
|
||||
class AppService:
|
||||
def archive_app(self, app_id: str, tenant_id: str) -> None:
|
||||
app = self.app_repo.get_by_id(app_id=app_id, tenant_id=tenant_id)
|
||||
app.archived = True
|
||||
self.app_repo.save(app)
|
||||
|
||||
# If the query is missing, extend the existing abstraction.
|
||||
active_apps = self.app_repo.list_active_for_tenant(tenant_id=tenant_id)
|
||||
```
|
||||
- Bad:
|
||||
```python
|
||||
# No repository exists, but large-domain query logic is scattered in service code.
|
||||
class ConversationService:
|
||||
def list_recent_for_app(self, app_id: str, tenant_id: str, limit: int) -> list[Conversation]:
|
||||
...
|
||||
# many filters/joins/pagination variants duplicated across services
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
# Case B: Introduce repository for large/complex domains or storage variation.
|
||||
class ConversationRepository(Protocol):
|
||||
def list_recent_for_app(self, app_id: str, tenant_id: str, limit: int) -> list[Conversation]: ...
|
||||
|
||||
class SqlAlchemyConversationRepository:
|
||||
def list_recent_for_app(self, app_id: str, tenant_id: str, limit: int) -> list[Conversation]:
|
||||
...
|
||||
|
||||
class ConversationService:
|
||||
def __init__(self, conversation_repo: ConversationRepository):
|
||||
self.conversation_repo = conversation_repo
|
||||
```
|
||||
139
.agents/skills/backend-code-review/references/sqlalchemy-rule.md
Normal file
139
.agents/skills/backend-code-review/references/sqlalchemy-rule.md
Normal file
@@ -0,0 +1,139 @@
|
||||
# Rule Catalog — SQLAlchemy Patterns
|
||||
|
||||
## Scope
|
||||
- Covers: SQLAlchemy session and transaction lifecycle, query construction, tenant scoping, raw SQL boundaries, and write-path concurrency safeguards.
|
||||
- Does NOT cover: table/model schema and migration design details (handled by `db-schema-rule.md`).
|
||||
|
||||
## Rules
|
||||
|
||||
### Use Session context manager with explicit transaction control behavior
|
||||
- Category: best practices
|
||||
- Severity: critical
|
||||
- Description: Session and transaction lifecycle must be explicit and bounded on write paths. Missing commits can silently drop intended updates, while ad-hoc or long-lived transactions increase contention, lock duration, and deadlock risk.
|
||||
- Suggested fix:
|
||||
- Use **explicit `session.commit()`** after completing a related write unit.
|
||||
- Or use **`session.begin()` context manager** for automatic commit/rollback on a scoped block.
|
||||
- Keep transaction windows short: avoid network I/O, heavy computation, or unrelated work inside the transaction.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
# Missing commit: write may never be persisted.
|
||||
with Session(db.engine, expire_on_commit=False) as session:
|
||||
run = session.get(WorkflowRun, run_id)
|
||||
run.status = "cancelled"
|
||||
|
||||
# Long transaction: external I/O inside a DB transaction.
|
||||
with Session(db.engine, expire_on_commit=False) as session, session.begin():
|
||||
run = session.get(WorkflowRun, run_id)
|
||||
run.status = "cancelled"
|
||||
call_external_api()
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
# Option 1: explicit commit.
|
||||
with Session(db.engine, expire_on_commit=False) as session:
|
||||
run = session.get(WorkflowRun, run_id)
|
||||
run.status = "cancelled"
|
||||
session.commit()
|
||||
|
||||
# Option 2: scoped transaction with automatic commit/rollback.
|
||||
with Session(db.engine, expire_on_commit=False) as session, session.begin():
|
||||
run = session.get(WorkflowRun, run_id)
|
||||
run.status = "cancelled"
|
||||
|
||||
# Keep non-DB work outside transaction scope.
|
||||
call_external_api()
|
||||
```
|
||||
|
||||
### Enforce tenant_id scoping on shared-resource queries
|
||||
- Category: security
|
||||
- Severity: critical
|
||||
- Description: Reads and writes against shared tables must be scoped by `tenant_id` to prevent cross-tenant data leakage or corruption.
|
||||
- Suggested fix: Add `tenant_id` predicate to all tenant-owned entity queries and propagate tenant context through service/repository interfaces.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
stmt = select(Workflow).where(Workflow.id == workflow_id)
|
||||
workflow = session.execute(stmt).scalar_one_or_none()
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
stmt = select(Workflow).where(
|
||||
Workflow.id == workflow_id,
|
||||
Workflow.tenant_id == tenant_id,
|
||||
)
|
||||
workflow = session.execute(stmt).scalar_one_or_none()
|
||||
```
|
||||
|
||||
### Prefer SQLAlchemy expressions over raw SQL by default
|
||||
- Category: maintainability
|
||||
- Severity: suggestion
|
||||
- Description: Raw SQL should be exceptional. ORM/Core expressions are easier to evolve, safer to compose, and more consistent with the codebase.
|
||||
- Suggested fix: Rewrite straightforward raw SQL into SQLAlchemy `select/update/delete` expressions; keep raw SQL only when required by clear technical constraints.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
row = session.execute(
|
||||
text("SELECT * FROM workflows WHERE id = :id AND tenant_id = :tenant_id"),
|
||||
{"id": workflow_id, "tenant_id": tenant_id},
|
||||
).first()
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
stmt = select(Workflow).where(
|
||||
Workflow.id == workflow_id,
|
||||
Workflow.tenant_id == tenant_id,
|
||||
)
|
||||
row = session.execute(stmt).scalar_one_or_none()
|
||||
```
|
||||
|
||||
### Protect write paths with concurrency safeguards
|
||||
- Category: quality
|
||||
- Severity: critical
|
||||
- Description: Multi-writer paths without explicit concurrency control can silently overwrite data. Choose the safeguard based on contention level, lock scope, and throughput cost instead of defaulting to one strategy.
|
||||
- Suggested fix:
|
||||
- **Optimistic locking**: Use when contention is usually low and retries are acceptable. Add a version (or updated_at) guard in `WHERE` and treat `rowcount == 0` as a conflict.
|
||||
- **Redis distributed lock**: Use when the critical section spans multiple steps/processes (or includes non-DB side effects) and you need cross-worker mutual exclusion.
|
||||
- **SELECT ... FOR UPDATE**: Use when contention is high on the same rows and strict in-transaction serialization is required. Keep transactions short to reduce lock wait/deadlock risk.
|
||||
- In all cases, scope by `tenant_id` and verify affected row counts for conditional writes.
|
||||
- Example:
|
||||
- Bad:
|
||||
```python
|
||||
# No tenant scope, no conflict detection, and no lock on a contested write path.
|
||||
session.execute(update(WorkflowRun).where(WorkflowRun.id == run_id).values(status="cancelled"))
|
||||
session.commit() # silently overwrites concurrent updates
|
||||
```
|
||||
- Good:
|
||||
```python
|
||||
# 1) Optimistic lock (low contention, retry on conflict)
|
||||
result = session.execute(
|
||||
update(WorkflowRun)
|
||||
.where(
|
||||
WorkflowRun.id == run_id,
|
||||
WorkflowRun.tenant_id == tenant_id,
|
||||
WorkflowRun.version == expected_version,
|
||||
)
|
||||
.values(status="cancelled", version=WorkflowRun.version + 1)
|
||||
)
|
||||
if result.rowcount == 0:
|
||||
raise WorkflowStateConflictError("stale version, retry")
|
||||
|
||||
# 2) Redis distributed lock (cross-worker critical section)
|
||||
lock_name = f"workflow_run_lock:{tenant_id}:{run_id}"
|
||||
with redis_client.lock(lock_name, timeout=20):
|
||||
session.execute(
|
||||
update(WorkflowRun)
|
||||
.where(WorkflowRun.id == run_id, WorkflowRun.tenant_id == tenant_id)
|
||||
.values(status="cancelled")
|
||||
)
|
||||
session.commit()
|
||||
|
||||
# 3) Pessimistic lock with SELECT ... FOR UPDATE (high contention)
|
||||
run = session.execute(
|
||||
select(WorkflowRun)
|
||||
.where(WorkflowRun.id == run_id, WorkflowRun.tenant_id == tenant_id)
|
||||
.with_for_update()
|
||||
).scalar_one()
|
||||
run.status = "cancelled"
|
||||
session.commit()
|
||||
```
|
||||
1
.claude/skills/backend-code-review
Symbolic link
1
.claude/skills/backend-code-review
Symbolic link
@@ -0,0 +1 @@
|
||||
../../.agents/skills/backend-code-review
|
||||
18
.github/workflows/pyrefly-diff.yml
vendored
18
.github/workflows/pyrefly-diff.yml
vendored
@@ -74,14 +74,16 @@ jobs:
|
||||
}
|
||||
|
||||
const body = diff.trim()
|
||||
? `### Pyrefly Diff
|
||||
<details>
|
||||
<summary>base → PR</summary>
|
||||
|
||||
\`\`\`diff
|
||||
${diff}
|
||||
\`\`\`
|
||||
</details>`
|
||||
? [
|
||||
'### Pyrefly Diff',
|
||||
'<details>',
|
||||
'<summary>base → PR</summary>',
|
||||
'',
|
||||
'```diff',
|
||||
diff,
|
||||
'```',
|
||||
'</details>',
|
||||
].join('\n')
|
||||
: '### Pyrefly Diff\nNo changes detected.';
|
||||
|
||||
await github.rest.issues.createComment({
|
||||
|
||||
@@ -669,16 +669,14 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
|
||||
) -> Generator[StreamResponse, None, None]:
|
||||
"""Handle retriever resources events."""
|
||||
self._message_cycle_manager.handle_retriever_resources(event)
|
||||
return
|
||||
yield # Make this a generator
|
||||
yield from ()
|
||||
|
||||
def _handle_annotation_reply_event(
|
||||
self, event: QueueAnnotationReplyEvent, **kwargs
|
||||
) -> Generator[StreamResponse, None, None]:
|
||||
"""Handle annotation reply events."""
|
||||
self._message_cycle_manager.handle_annotation_reply(event)
|
||||
return
|
||||
yield # Make this a generator
|
||||
yield from ()
|
||||
|
||||
def _handle_message_replace_event(
|
||||
self, event: QueueMessageReplaceEvent, **kwargs
|
||||
|
||||
Reference in New Issue
Block a user