Memory Module

This module provides memory management for LLM conversations, enabling context retention across dialogue turns.

Overview

The memory module contains two types of memory implementations:

  1. TokenBufferMemory - Conversation-level memory (existing)
  2. NodeTokenBufferMemory - Node-level memory (Chatflow only)

Note: NodeTokenBufferMemory is only available in Chatflow (advanced-chat mode), because it requires both conversation_id and node_id, which exist only in Chatflow. Standard Workflow mode has no conversation_id and therefore cannot use node-level memory.

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Memory Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────-┐   │
│  │                      TokenBufferMemory                               │   │
│  │  Scope: Conversation                                                 │   │
│  │  Storage: Database (Message table)                                   │   │
│  │  Key: conversation_id                                                │   │
│  └─────────────────────────────────────────────────────────────────────-┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────-┐   │
│  │                    NodeTokenBufferMemory                             │   │
│  │  Scope: Node within Conversation                                     │   │
│  │  Storage: WorkflowNodeExecutionModel.outputs["context"]              │   │
│  │  Key: (conversation_id, node_id, workflow_run_id)                    │   │
│  └─────────────────────────────────────────────────────────────────────-┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

TokenBufferMemory (Existing)

Purpose

TokenBufferMemory retrieves conversation history from the Message table and converts it to PromptMessage objects for LLM context.

Key Features

  • Conversation-scoped: All messages within a conversation are candidates
  • Thread-aware: Uses parent_message_id to extract only the current thread (supports regeneration scenarios)
  • Token-limited: Truncates history to fit within max_token_limit (see the sketch after this list)
  • File support: Handles MessageFile attachments (images, documents, etc.)
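
A minimal sketch of the token-limited truncation, assuming a token-counting helper such as ModelInstance.get_llm_num_tokens (the real TokenBufferMemory counts tokens with the bound model; the helper name here is an assumption):

def truncate_by_token_limit(prompt_messages, model_instance, max_token_limit):
    """Drop the oldest messages until the remaining history fits the budget.

    Sketch only: assumes model_instance exposes get_llm_num_tokens(messages).
    """
    pruned = list(prompt_messages)
    while len(pruned) > 1 and model_instance.get_llm_num_tokens(pruned) > max_token_limit:
        pruned.pop(0)  # oldest message goes first
    return pruned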

Data Flow

Message Table                    TokenBufferMemory              LLM
     │                                  │                        │
     │  SELECT * FROM messages          │                        │
     │  WHERE conversation_id = ?       │                        │
     │  ORDER BY created_at DESC        │                        │
     ├─────────────────────────────────▶│                        │
     │                                  │                        │
     │                    extract_thread_messages()              │
     │                                  │                        │
     │                    build_prompt_message_with_files()      │
     │                                  │                        │
     │                    truncate by max_token_limit            │
     │                                  │                        │
     │                                  │  Sequence[PromptMessage]
     │                                  ├───────────────────────▶│
     │                                  │                        │

Thread Extraction

When a user regenerates a response, a new thread is created:

Message A (user)
    ├── Message A' (assistant)
    │       └── Message B (user)
    │               └── Message B' (assistant)
    └── Message A'' (assistant, regenerated)  ← New thread
            └── Message C (user)
                    └── Message C' (assistant)

extract_thread_messages() traces back from the latest message using parent_message_id to get only the current thread: [A, A'', C, C']
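
A simplified sketch of this traversal (the message objects are assumed to expose id and parent_message_id; the real helper lives in the message utilities and handles more edge cases):

def extract_thread_messages(messages):
    """Keep only the chain reachable from the newest message.

    `messages` is expected newest-first, as returned by the conversation query.
    Branches abandoned by regeneration are dropped because they are never
    reached when walking parent_message_id pointers backwards.
    """
    by_id = {m.id: m for m in messages}
    thread = []
    current = messages[0] if messages else None  # newest message in the conversation
    while current is not None:
        thread.append(current)
        current = by_id.get(current.parent_message_id) if current.parent_message_id else None
    thread.reverse()  # oldest-first, ready for prompt construction
    return thread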

Usage

from core.memory.token_buffer_memory import TokenBufferMemory

memory = TokenBufferMemory(conversation=conversation, model_instance=model_instance)
history = memory.get_history_prompt_messages(max_token_limit=2000, message_limit=100)

NodeTokenBufferMemory

Purpose

NodeTokenBufferMemory provides node-scoped memory within a conversation. Each LLM node in a workflow can maintain its own independent conversation history.

Use Cases

  1. Multi-LLM Workflows: Different LLM nodes need separate context
  2. Iterative Processing: An LLM node in a loop needs to accumulate context across iterations
  3. Specialized Agents: Each agent node maintains its own dialogue history

Design: Zero Extra Storage

Key insight: the LLM node already saves its complete dialogue context in outputs["context"].

Each LLM node execution outputs:

outputs = {
    "text": clean_text,
    "context": self._build_context(prompt_messages, clean_text),  # Complete dialogue history!
    ...
}

This outputs["context"] contains:

  • All previous user/assistant messages (excluding system prompt)
  • The current assistant response

No separate storage is needed: the memory simply reads the last completed execution's outputs["context"].
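
For illustration, the saved value might look roughly like this (the exact field names depend on how _build_context serializes PromptMessage objects, so treat this shape as an assumption):

outputs["context"] = [
    {"role": "user", "content": "Summarize the report."},
    {"role": "assistant", "content": "Here is the summary: ..."},
    {"role": "user", "content": "Now list the three main risks."},
    {"role": "assistant", "content": "1. ... 2. ... 3. ..."},  # current response is included
]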

Benefits

| Aspect      | Old Design (Object Storage) | New Design (outputs["context"])        |
|-------------|-----------------------------|----------------------------------------|
| Storage     | Separate JSON file          | Already in WorkflowNodeExecutionModel  |
| Concurrency | Race condition risk         | No issue (each execution is an INSERT) |
| Cleanup     | Needs separate cleanup task | Follows node execution lifecycle       |
| Migration   | Required                    | None                                   |
| Complexity  | High                        | Low                                    |

Data Flow

WorkflowNodeExecutionModel        NodeTokenBufferMemory           LLM Node
     │                                  │                           │
     │                                  │◀── get_history_prompt_messages()
     │                                  │                           │
     │  SELECT outputs FROM             │                           │
     │  workflow_node_executions        │                           │
     │  WHERE workflow_run_id = ?       │                           │
     │  AND node_id = ?                 │                           │
     │◀─────────────────────────────────┤                           │
     │                                  │                           │
     │  outputs["context"]              │                           │
     ├─────────────────────────────────▶│                           │
     │                                  │                           │
     │                    deserialize PromptMessages                │
     │                                  │                           │
     │                    truncate by max_token_limit               │
     │                                  │                           │
     │                                  │  Sequence[PromptMessage]  │
     │                                  ├──────────────────────────▶│
     │                                  │                           │

Thread Tracking

Thread extraction still relies on the Message table's parent_message_id structure (sketched below):

  1. Query the Message table for the conversation and extract the current thread's workflow_run_ids
  2. Take the last completed workflow_run_id in the thread
  3. Query WorkflowNodeExecutionModel for that execution's outputs["context"]
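
A rough sketch of those three steps, assuming SQLAlchemy-style queries and that Message rows carry the workflow_run_id of the run that produced them (ORM imports are omitted; model and helper names are illustrative, not the exact implementation):

import json

def load_last_node_context(session, conversation_id, node_id):
    """Read the saved context of this node's last completed run in the thread."""
    # 1. Collect the conversation's messages and narrow them to the current thread.
    messages = (
        session.query(Message)
        .filter(Message.conversation_id == conversation_id)
        .order_by(Message.created_at.desc())
        .all()
    )
    thread = extract_thread_messages(messages)

    # 2. Take the most recent message in the thread that has a workflow run.
    last_run_id = next((m.workflow_run_id for m in reversed(thread) if m.workflow_run_id), None)
    if last_run_id is None:
        return []

    # 3. Read outputs["context"] from that run's execution record for this node.
    execution = (
        session.query(WorkflowNodeExecutionModel)
        .filter(
            WorkflowNodeExecutionModel.workflow_run_id == last_run_id,
            WorkflowNodeExecutionModel.node_id == node_id,
        )
        .order_by(WorkflowNodeExecutionModel.created_at.desc())
        .first()
    )
    if execution is None or not execution.outputs:
        return []
    outputs = json.loads(execution.outputs)  # outputs may be stored as a JSON string
    return outputs.get("context", [])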

API

class NodeTokenBufferMemory:
    def __init__(
        self,
        app_id: str,
        conversation_id: str,
        node_id: str,
        tenant_id: str,
        model_instance: ModelInstance,
    ):
        """Initialize node-level memory."""
        ...

    def get_history_prompt_messages(
        self,
        *,
        max_token_limit: int = 2000,
        message_limit: int | None = None,
    ) -> Sequence[PromptMessage]:
        """
        Retrieve history as PromptMessage sequence.
        
        Reads from last completed execution's outputs["context"].
        """
        ...

    # Legacy methods (no-op, kept for compatibility)
    def add_messages(self, *args, **kwargs) -> None: pass
    def flush(self) -> None: pass
    def clear(self) -> None: pass
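
Usage mirrors TokenBufferMemory (the import path is assumed to sit next to token_buffer_memory; adjust it to the actual module location):

from core.memory.node_token_buffer_memory import NodeTokenBufferMemory

memory = NodeTokenBufferMemory(
    app_id=app_id,
    conversation_id=conversation_id,
    node_id=node_id,
    tenant_id=tenant_id,
    model_instance=model_instance,
)
history = memory.get_history_prompt_messages(max_token_limit=2000)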

Configuration

Add to MemoryConfig in core/workflow/nodes/llm/entities.py:

from enum import StrEnum  # new import, if not already present in entities.py

class MemoryMode(StrEnum):
    CONVERSATION = "conversation"  # Use TokenBufferMemory (default)
    NODE = "node"                  # Use NodeTokenBufferMemory (Chatflow only)

class MemoryConfig(BaseModel):
    role_prefix: RolePrefix | None = None
    window: MemoryWindowConfig | None = None
    query_prompt_template: str | None = None
    mode: MemoryMode = MemoryMode.CONVERSATION
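
For example, an LLM node opting into per-node memory would carry a config along these lines; existing configs are untouched because conversation remains the default mode:

node_memory = MemoryConfig(mode=MemoryMode.NODE)  # Chatflow only
default_memory = MemoryConfig()                   # unchanged behaviour
assert default_memory.mode == MemoryMode.CONVERSATION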

Mode Behavior:

| Mode         | Memory Class          | Scope                    | Availability  |
|--------------|-----------------------|--------------------------|---------------|
| conversation | TokenBufferMemory     | Entire conversation      | All app modes |
| node         | NodeTokenBufferMemory | Per-node in conversation | Chatflow only |

When mode=node is used in a non-Chatflow context (no conversation_id), it falls back to no memory.
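
The fallback could be wired up roughly like this (a sketch, not the actual node code; load_conversation is a hypothetical lookup helper):

def build_memory(memory_config, *, app_id, tenant_id, conversation_id, node_id, model_instance):
    """Pick the memory implementation for an LLM node; None means no memory."""
    if memory_config is None:
        return None
    if memory_config.mode == MemoryMode.NODE:
        if conversation_id is None:
            return None  # standard Workflow: no conversation_id, so node memory is unavailable
        return NodeTokenBufferMemory(
            app_id=app_id,
            conversation_id=conversation_id,
            node_id=node_id,
            tenant_id=tenant_id,
            model_instance=model_instance,
        )
    conversation = load_conversation(conversation_id)  # hypothetical helper
    if conversation is None:
        return None
    return TokenBufferMemory(conversation=conversation, model_instance=model_instance)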


Comparison

| Feature        | TokenBufferMemory        | NodeTokenBufferMemory              |
|----------------|--------------------------|------------------------------------|
| Scope          | Conversation             | Node within Conversation           |
| Storage        | Database (Message table) | WorkflowNodeExecutionModel.outputs |
| Thread Support | Yes                      | Yes                                |
| File Support   | Yes (via MessageFile)    | Yes (via context serialization)    |
| Token Limit    | Yes                      | Yes                                |
| Use Case       | Standard chat apps       | Complex workflows                  |

Extending to Other Nodes

Currently, only the LLM node writes context into its outputs. To enable node memory for other nodes (see the sketch after this list):

  1. Add outputs["context"] = self._build_context(prompt_messages, response) in the node
  2. The NodeTokenBufferMemory will automatically pick it up
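
For example, a hypothetical change inside another node's _run() might look like this (it assumes the node can reuse or reimplement the LLM node's _build_context helper):

# Inside the node's _run(), after the model call has produced `response`:
outputs = {
    "result": response,
    # Persisting the full dialogue lets NodeTokenBufferMemory read it back on
    # the next turn, exactly as the LLM node does today.
    "context": self._build_context(prompt_messages, response),
}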

Nodes that could potentially support this:

  • question_classifier
  • parameter_extractor
  • agent

Future Considerations

  1. Cleanup: Node memory lifecycle follows WorkflowNodeExecutionModel, which already has cleanup mechanisms
  2. Compression: For very long conversations, consider summarization strategies
  3. Extension: Other nodes may benefit from node-level memory