Memory Module

This module provides memory management for LLM conversations, enabling context retention across dialogue turns.

Overview

The memory module contains two types of memory implementations:

  1. TokenBufferMemory - Conversation-level memory (existing)
  2. NodeTokenBufferMemory - Node-level memory (Chatflow only)

Note: NodeTokenBufferMemory is only available in Chatflow (advanced-chat mode), because it requires both conversation_id and node_id, which exist only in Chatflow. Standard Workflow mode has no conversation_id and therefore cannot use node-level memory.

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Memory Architecture                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────-┐   │
│  │                      TokenBufferMemory                               │   │
│  │  Scope: Conversation                                                 │   │
│  │  Storage: Database (Message table)                                   │   │
│  │  Key: conversation_id                                                │   │
│  └─────────────────────────────────────────────────────────────────────-┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────-┐   │
│  │                    NodeTokenBufferMemory                             │   │
│  │  Scope: Node within Conversation                                     │   │
│  │  Storage: WorkflowNodeExecutionModel.outputs["context"]              │   │
│  │  Key: (conversation_id, node_id, workflow_run_id)                    │   │
│  └─────────────────────────────────────────────────────────────────────-┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

TokenBufferMemory (Existing)

Purpose

TokenBufferMemory retrieves conversation history from the Message table and converts it to PromptMessage objects for LLM context.

Key Features

  • Conversation-scoped: All messages within a conversation are candidates
  • Thread-aware: Uses parent_message_id to extract only the current thread (supports regeneration scenarios)
  • Token-limited: Truncates history to fit within max_token_limit (see the sketch after this list)
  • File support: Handles MessageFile attachments (images, documents, etc.)
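
A minimal sketch of the token-limited truncation, assuming a token-counting helper such as ModelInstance.get_llm_num_tokens (the real TokenBufferMemory counts tokens with the bound model; the helper name here is an assumption):

def truncate_by_token_limit(prompt_messages, model_instance, max_token_limit):
    """Drop the oldest messages until the remaining history fits the budget.

    Sketch only: assumes model_instance exposes get_llm_num_tokens(messages).
    """
    pruned = list(prompt_messages)
    while len(pruned) > 1 and model_instance.get_llm_num_tokens(pruned) > max_token_limit:
        pruned.pop(0)  # oldest message goes first
    return pruned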

Data Flow

Message Table                    TokenBufferMemory              LLM
     │                                  │                        │
     │  SELECT * FROM messages          │                        │
     │  WHERE conversation_id = ?       │                        │
     │  ORDER BY created_at DESC        │                        │
     ├─────────────────────────────────▶│                        │
     │                                  │                        │
     │                    extract_thread_messages()              │
     │                                  │                        │
     │                    build_prompt_message_with_files()      │
     │                                  │                        │
     │                    truncate by max_token_limit            │
     │                                  │                        │
     │                                  │  Sequence[PromptMessage]
     │                                  ├───────────────────────▶│
     │                                  │                        │

Thread Extraction

When a user regenerates a response, a new thread is created:

Message A (user)
    ├── Message A' (assistant)
    │       └── Message B (user)
    │               └── Message B' (assistant)
    └── Message A'' (assistant, regenerated)  ← New thread
            └── Message C (user)
                    └── Message C' (assistant)

extract_thread_messages() traces back from the latest message using parent_message_id to get only the current thread: [A, A'', C, C']
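
A simplified sketch of this traversal (the message objects are assumed to expose id and parent_message_id; the real helper lives in the message utilities and handles more edge cases):

def extract_thread_messages(messages):
    """Keep only the chain reachable from the newest message.

    `messages` is expected newest-first, as returned by the conversation query.
    Branches abandoned by regeneration are dropped because they are never
    reached when walking parent_message_id pointers backwards.
    """
    by_id = {m.id: m for m in messages}
    thread = []
    current = messages[0] if messages else None  # newest message in the conversation
    while current is not None:
        thread.append(current)
        current = by_id.get(current.parent_message_id) if current.parent_message_id else None
    thread.reverse()  # oldest-first, ready for prompt construction
    return thread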

Usage

from core.memory.token_buffer_memory import TokenBufferMemory

memory = TokenBufferMemory(conversation=conversation, model_instance=model_instance)
history = memory.get_history_prompt_messages(max_token_limit=2000, message_limit=100)

NodeTokenBufferMemory

Purpose

NodeTokenBufferMemory provides node-scoped memory within a conversation. Each LLM node in a workflow can maintain its own independent conversation history.

Use Cases

  1. Multi-LLM Workflows: Different LLM nodes need separate context
  2. Iterative Processing: An LLM node in a loop needs to accumulate context across iterations
  3. Specialized Agents: Each agent node maintains its own dialogue history

Design: Zero Extra Storage

Key insight: the LLM node already saves its complete dialogue context in outputs["context"].

Each LLM node execution outputs:

outputs = {
    "text": clean_text,
    "context": self._build_context(prompt_messages, clean_text),  # Complete dialogue history!
    ...
}

This outputs["context"] contains:

  • All previous user/assistant messages (excluding system prompt)
  • The current assistant response

No separate storage is needed: the memory simply reads the last completed execution's outputs["context"].
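
For illustration, the saved value might look roughly like this (the exact field names depend on how _build_context serializes PromptMessage objects, so treat this shape as an assumption):

outputs["context"] = [
    {"role": "user", "content": "Summarize the report."},
    {"role": "assistant", "content": "Here is the summary: ..."},
    {"role": "user", "content": "Now list the three main risks."},
    {"role": "assistant", "content": "1. ... 2. ... 3. ..."},  # current response is included
]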

Benefits

| Aspect      | Old Design (Object Storage) | New Design (outputs["context"])        |
|-------------|-----------------------------|----------------------------------------|
| Storage     | Separate JSON file          | Already in WorkflowNodeExecutionModel  |
| Concurrency | Race condition risk         | No issue (each execution is an INSERT) |
| Cleanup     | Needs separate cleanup task | Follows node execution lifecycle       |
| Migration   | Required                    | None                                   |
| Complexity  | High                        | Low                                    |

Data Flow

WorkflowNodeExecutionModel        NodeTokenBufferMemory           LLM Node
     │                                  │                           │
     │                                  │◀── get_history_prompt_messages()
     │                                  │                           │
     │  SELECT outputs FROM             │                           │
     │  workflow_node_executions        │                           │
     │  WHERE workflow_run_id = ?       │                           │
     │  AND node_id = ?                 │                           │
     │◀─────────────────────────────────┤                           │
     │                                  │                           │
     │  outputs["context"]              │                           │
     ├─────────────────────────────────▶│                           │
     │                                  │                           │
     │                    deserialize PromptMessages                │
     │                                  │                           │
     │                    truncate by max_token_limit               │
     │                                  │                           │
     │                                  │  Sequence[PromptMessage]  │
     │                                  ├──────────────────────────▶│
     │                                  │                           │

Thread Tracking

Thread extraction still relies on the Message table's parent_message_id structure (sketched below):

  1. Query the Message table for the conversation and extract the current thread's workflow_run_ids
  2. Take the last completed workflow_run_id in the thread
  3. Query WorkflowNodeExecutionModel for that execution's outputs["context"]
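
A rough sketch of those three steps, assuming SQLAlchemy-style queries and that Message rows carry the workflow_run_id of the run that produced them (ORM imports are omitted; model and helper names are illustrative, not the exact implementation):

import json

def load_last_node_context(session, conversation_id, node_id):
    """Read the saved context of this node's last completed run in the thread."""
    # 1. Collect the conversation's messages and narrow them to the current thread.
    messages = (
        session.query(Message)
        .filter(Message.conversation_id == conversation_id)
        .order_by(Message.created_at.desc())
        .all()
    )
    thread = extract_thread_messages(messages)

    # 2. Take the most recent message in the thread that has a workflow run.
    last_run_id = next((m.workflow_run_id for m in reversed(thread) if m.workflow_run_id), None)
    if last_run_id is None:
        return []

    # 3. Read outputs["context"] from that run's execution record for this node.
    execution = (
        session.query(WorkflowNodeExecutionModel)
        .filter(
            WorkflowNodeExecutionModel.workflow_run_id == last_run_id,
            WorkflowNodeExecutionModel.node_id == node_id,
        )
        .order_by(WorkflowNodeExecutionModel.created_at.desc())
        .first()
    )
    if execution is None or not execution.outputs:
        return []
    outputs = json.loads(execution.outputs)  # outputs may be stored as a JSON string
    return outputs.get("context", [])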

API

class NodeTokenBufferMemory:
    def __init__(
        self,
        app_id: str,
        conversation_id: str,
        node_id: str,
        tenant_id: str,
        model_instance: ModelInstance,
    ):
        """Initialize node-level memory."""
        ...

    def get_history_prompt_messages(
        self,
        *,
        max_token_limit: int = 2000,
        message_limit: int | None = None,
    ) -> Sequence[PromptMessage]:
        """
        Retrieve history as PromptMessage sequence.
        
        Reads from last completed execution's outputs["context"].
        """
        ...

    # Legacy methods (no-op, kept for compatibility)
    def add_messages(self, *args, **kwargs) -> None: pass
    def flush(self) -> None: pass
    def clear(self) -> None: pass
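
Usage mirrors TokenBufferMemory (the import path is assumed to sit next to token_buffer_memory; adjust it to the actual module location):

from core.memory.node_token_buffer_memory import NodeTokenBufferMemory

memory = NodeTokenBufferMemory(
    app_id=app_id,
    conversation_id=conversation_id,
    node_id=node_id,
    tenant_id=tenant_id,
    model_instance=model_instance,
)
history = memory.get_history_prompt_messages(max_token_limit=2000)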

Configuration

Add to MemoryConfig in core/workflow/nodes/llm/entities.py:

from enum import StrEnum  # new import, if not already present in entities.py

class MemoryMode(StrEnum):
    CONVERSATION = "conversation"  # Use TokenBufferMemory (default)
    NODE = "node"                  # Use NodeTokenBufferMemory (Chatflow only)

class MemoryConfig(BaseModel):
    role_prefix: RolePrefix | None = None
    window: MemoryWindowConfig | None = None
    query_prompt_template: str | None = None
    mode: MemoryMode = MemoryMode.CONVERSATION
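
For example, an LLM node opting into per-node memory would carry a config along these lines; existing configs are untouched because conversation remains the default mode:

node_memory = MemoryConfig(mode=MemoryMode.NODE)  # Chatflow only
default_memory = MemoryConfig()                   # unchanged behaviour
assert default_memory.mode == MemoryMode.CONVERSATION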

Mode Behavior:

| Mode         | Memory Class          | Scope                    | Availability  |
|--------------|-----------------------|--------------------------|---------------|
| conversation | TokenBufferMemory     | Entire conversation      | All app modes |
| node         | NodeTokenBufferMemory | Per-node in conversation | Chatflow only |

When mode=node is used in a non-Chatflow context (no conversation_id), it falls back to no memory.
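
The fallback could be wired up roughly like this (a sketch, not the actual node code; load_conversation is a hypothetical lookup helper):

def build_memory(memory_config, *, app_id, tenant_id, conversation_id, node_id, model_instance):
    """Pick the memory implementation for an LLM node; None means no memory."""
    if memory_config is None:
        return None
    if memory_config.mode == MemoryMode.NODE:
        if conversation_id is None:
            return None  # standard Workflow: no conversation_id, so node memory is unavailable
        return NodeTokenBufferMemory(
            app_id=app_id,
            conversation_id=conversation_id,
            node_id=node_id,
            tenant_id=tenant_id,
            model_instance=model_instance,
        )
    conversation = load_conversation(conversation_id)  # hypothetical helper
    if conversation is None:
        return None
    return TokenBufferMemory(conversation=conversation, model_instance=model_instance)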


Comparison

| Feature        | TokenBufferMemory        | NodeTokenBufferMemory              |
|----------------|--------------------------|------------------------------------|
| Scope          | Conversation             | Node within Conversation           |
| Storage        | Database (Message table) | WorkflowNodeExecutionModel.outputs |
| Thread Support | Yes                      | Yes                                |
| File Support   | Yes (via MessageFile)    | Yes (via context serialization)    |
| Token Limit    | Yes                      | Yes                                |
| Use Case       | Standard chat apps       | Complex workflows                  |

Extending to Other Nodes

Currently, only the LLM node writes context into its outputs. To enable node memory for other nodes (see the sketch after this list):

  1. Add outputs["context"] = self._build_context(prompt_messages, response) in the node
  2. The NodeTokenBufferMemory will automatically pick it up
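
For example, a hypothetical change inside another node's _run() might look like this (it assumes the node can reuse or reimplement the LLM node's _build_context helper):

# Inside the node's _run(), after the model call has produced `response`:
outputs = {
    "result": response,
    # Persisting the full dialogue lets NodeTokenBufferMemory read it back on
    # the next turn, exactly as the LLM node does today.
    "context": self._build_context(prompt_messages, response),
}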

Nodes that could potentially support this:

  • question_classifier
  • parameter_extractor
  • agent

Future Considerations

  1. Cleanup: Node memory lifecycle follows WorkflowNodeExecutionModel, which already has cleanup mechanisms
  2. Compression: For very long conversations, consider summarization strategies
  3. Extension: Other nodes may benefit from node-level memory