# Memory Module
This module provides memory management for LLM conversations, enabling context retention across dialogue turns.
## Overview
The memory module contains two types of memory implementations:
- `TokenBufferMemory` - Conversation-level memory (existing)
- `NodeTokenBufferMemory` - Node-level memory (Chatflow only)
> **Note:** `NodeTokenBufferMemory` is only available in Chatflow (advanced-chat mode). This is because it requires both `conversation_id` and `node_id`, which are only present in Chatflow. Standard Workflow mode does not have `conversation_id` and therefore cannot use node-level memory.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Memory Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────-┐ │
│ │ TokenBufferMemory │ │
│ │ Scope: Conversation │ │
│ │ Storage: Database (Message table) │ │
│ │ Key: conversation_id │ │
│ └─────────────────────────────────────────────────────────────────────-┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────-┐ │
│ │ NodeTokenBufferMemory │ │
│ │ Scope: Node within Conversation │ │
│ │ Storage: WorkflowNodeExecutionModel.outputs["context"] │ │
│ │ Key: (conversation_id, node_id, workflow_run_id) │ │
│ └─────────────────────────────────────────────────────────────────────-┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

## TokenBufferMemory (Existing)
### Purpose
`TokenBufferMemory` retrieves conversation history from the `Message` table and converts it to `PromptMessage` objects for LLM context.
### Key Features

- Conversation-scoped: All messages within a conversation are candidates
- Thread-aware: Uses `parent_message_id` to extract only the current thread (supports regeneration scenarios)
- Token-limited: Truncates history to fit within `max_token_limit`
- File support: Handles `MessageFile` attachments (images, documents, etc.)
### Data Flow

```
Message Table TokenBufferMemory LLM
│ │ │
│ SELECT * FROM messages │ │
│ WHERE conversation_id = ? │ │
│ ORDER BY created_at DESC │ │
├─────────────────────────────────▶│ │
│ │ │
│ extract_thread_messages() │
│ │ │
│ build_prompt_message_with_files() │
│ │ │
│ truncate by max_token_limit │
│ │ │
│ │ Sequence[PromptMessage]
│ ├───────────────────────▶│
│ │ │
```

### Thread Extraction
When a user regenerates a response, a new thread is created:

```
Message A (user)
└── Message A' (assistant)
└── Message B (user)
└── Message B' (assistant)
└── Message A'' (assistant, regenerated) ← New thread
└── Message C (user)
└── Message C' (assistant)
```

`extract_thread_messages()` traces back from the latest message using `parent_message_id` to get only the current thread: `[A, A'', C, C']`.
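The traversal is simple parent-pointer chasing. A minimal sketch of the idea (not the actual implementation; attribute names are assumed):

```python
def extract_thread_messages(messages):
    """Return only the messages on the current thread, oldest first.

    Assumes `messages` is ordered newest-first and each message exposes
    `id` and `parent_message_id` (simplified sketch).
    """
    by_id = {m.id: m for m in messages}
    thread = []
    current = messages[0] if messages else None  # start from the newest message
    while current is not None:
        thread.append(current)
        parent_id = current.parent_message_id
        current = by_id.get(parent_id) if parent_id else None
    thread.reverse()  # oldest-first, ready for prompt construction
    return thread
```

In the regeneration example above, walking back from `C'` yields `C' → C → A'' → A`, which reversed gives `[A, A'', C, C']`.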
### Usage

```python
from core.memory.token_buffer_memory import TokenBufferMemory
memory = TokenBufferMemory(conversation=conversation, model_instance=model_instance)
history = memory.get_history_prompt_messages(max_token_limit=2000, message_limit=100)
```

## NodeTokenBufferMemory
### Purpose
`NodeTokenBufferMemory` provides node-scoped memory within a conversation. Each LLM node in a workflow can maintain its own independent conversation history.
### Use Cases
- Multi-LLM Workflows: Different LLM nodes need separate context
- Iterative Processing: An LLM node in a loop needs to accumulate context across iterations
- Specialized Agents: Each agent node maintains its own dialogue history
### Design: Zero Extra Storage
Key insight: the LLM node already saves the complete dialogue context in `outputs["context"]`.
Each LLM node execution outputs:
```python
outputs = {
    "text": clean_text,
    "context": self._build_context(prompt_messages, clean_text),  # Complete dialogue history!
    # ... other output fields
}
```
This `outputs["context"]` contains:
- All previous user/assistant messages (excluding system prompt)
- The current assistant response
No separate storage is needed: we simply read from the last execution's `outputs["context"]`.
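Retrieval is therefore a single lookup plus a JSON parse. A minimal sketch, assuming SQLAlchemy-style access to the project's `WorkflowNodeExecutionModel` and that `outputs` is stored as a JSON string; the helper itself and column names such as `created_at` are assumptions, not the actual implementation:

```python
import json

from sqlalchemy import select
from sqlalchemy.orm import Session


def load_node_context(session: Session, workflow_run_id: str, node_id: str) -> list:
    """Read the dialogue history saved by this node's execution in the given run."""
    stmt = (
        select(WorkflowNodeExecutionModel)
        .where(
            WorkflowNodeExecutionModel.workflow_run_id == workflow_run_id,
            WorkflowNodeExecutionModel.node_id == node_id,
        )
        .order_by(WorkflowNodeExecutionModel.created_at.desc())
        .limit(1)
    )
    execution = session.scalars(stmt).first()
    if execution is None or not execution.outputs:
        return []  # no prior execution for this node -> empty history
    outputs = json.loads(execution.outputs)
    return outputs.get("context", [])
```

The returned entries are then deserialized back into `PromptMessage` objects and truncated to `max_token_limit`, mirroring `TokenBufferMemory`.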
### Benefits
| Aspect | Old Design (Object Storage) | New Design (`outputs["context"]`) |
|---|---|---|
| Storage | Separate JSON file | Already in `WorkflowNodeExecutionModel` |
| Concurrency | Race condition risk | No issue (each execution is an INSERT) |
| Cleanup | Need separate cleanup task | Follows node execution lifecycle |
| Migration | Required | None |
| Complexity | High | Low |
### Data Flow

```
WorkflowNodeExecutionModel NodeTokenBufferMemory LLM Node
│ │ │
│ │◀── get_history_prompt_messages()
│ │ │
│ SELECT outputs FROM │ │
│ workflow_node_executions │ │
│ WHERE workflow_run_id = ? │ │
│ AND node_id = ? │ │
│◀─────────────────────────────────┤ │
│ │ │
│ outputs["context"] │ │
├─────────────────────────────────▶│ │
│ │ │
│ deserialize PromptMessages │
│ │ │
│ truncate by max_token_limit │
│ │ │
│ │ Sequence[PromptMessage] │
│ ├──────────────────────────▶│
│ │ │
```

### Thread Tracking
Thread extraction still uses the `Message` table's `parent_message_id` structure:

- Query the `Message` table for the conversation → get the thread's `workflow_run_id`s
- Get the last completed `workflow_run_id` in the thread
- Query `WorkflowNodeExecutionModel` for that execution's `outputs["context"]` (see the sketch below)
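Put together, resolving which execution to read might look roughly like this (a sketch; it reuses the thread extraction shown earlier and assumes the project's `Message` model carries a `workflow_run_id` column):

```python
from sqlalchemy import select


def resolve_last_workflow_run_id(session, conversation_id: str) -> str | None:
    """Find the most recent run in the current thread that has node executions to read."""
    stmt = (
        select(Message)
        .where(Message.conversation_id == conversation_id)
        .order_by(Message.created_at.desc())
    )
    messages = list(session.scalars(stmt))
    thread = extract_thread_messages(messages)  # same thread logic as TokenBufferMemory
    for message in reversed(thread):            # walk the thread newest-first
        if message.workflow_run_id:             # skip messages without an associated run
            return message.workflow_run_id
    return None
```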
### API
```python
class NodeTokenBufferMemory:
    def __init__(
        self,
        app_id: str,
        conversation_id: str,
        node_id: str,
        tenant_id: str,
        model_instance: ModelInstance,
    ):
        """Initialize node-level memory."""
        ...

    def get_history_prompt_messages(
        self,
        *,
        max_token_limit: int = 2000,
        message_limit: int | None = None,
    ) -> Sequence[PromptMessage]:
        """
        Retrieve history as PromptMessage sequence.

        Reads from last completed execution's outputs["context"].
        """
        ...

    # Legacy methods (no-op, kept for compatibility)
    def add_messages(self, *args, **kwargs) -> None: pass
    def flush(self) -> None: pass
    def clear(self) -> None: pass
```
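Usage mirrors `TokenBufferMemory`. A sketch using the constructor above (the import path is assumed to follow the `token_buffer_memory` convention):

```python
from core.memory.node_token_buffer_memory import NodeTokenBufferMemory

memory = NodeTokenBufferMemory(
    app_id=app_id,
    conversation_id=conversation_id,
    node_id=node_id,
    tenant_id=tenant_id,
    model_instance=model_instance,
)
history = memory.get_history_prompt_messages(max_token_limit=2000)
```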
## Configuration
Add to `MemoryConfig` in `core/workflow/nodes/llm/entities.py`:
```python
class MemoryMode(StrEnum):
    CONVERSATION = "conversation"  # Use TokenBufferMemory (default)
    NODE = "node"                  # Use NodeTokenBufferMemory (Chatflow only)


class MemoryConfig(BaseModel):
    role_prefix: RolePrefix | None = None
    window: MemoryWindowConfig | None = None
    query_prompt_template: str | None = None
    mode: MemoryMode = MemoryMode.CONVERSATION
```
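For illustration, switching an LLM node to node-level memory is just a matter of setting the new field; the other fields keep their defaults:

```python
# Default: conversation-scoped memory (TokenBufferMemory)
conversation_scoped = MemoryConfig()

# Node-scoped memory for this LLM node (Chatflow only)
node_scoped = MemoryConfig(mode=MemoryMode.NODE)
```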
**Mode Behavior:**

| Mode | Memory Class | Scope | Availability |
|---|---|---|---|
| `conversation` | `TokenBufferMemory` | Entire conversation | All app modes |
| `node` | `NodeTokenBufferMemory` | Per-node in conversation | Chatflow only |
When `mode=node` is used in a non-Chatflow context (no `conversation_id`), it falls back to no memory.
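The resulting selection logic in the LLM node can be summarised as follows (a hedged sketch, not the actual node code; the helper name is hypothetical):

```python
def build_memory(memory_config, *, conversation, app_id, node_id, tenant_id, model_instance):
    """Pick a memory implementation for an LLM node run (simplified sketch)."""
    if memory_config is None or conversation is None:
        # No memory configured, or no conversation (standard Workflow): fall back to no memory.
        return None
    if memory_config.mode == MemoryMode.NODE:
        return NodeTokenBufferMemory(
            app_id=app_id,
            conversation_id=conversation.id,
            node_id=node_id,
            tenant_id=tenant_id,
            model_instance=model_instance,
        )
    # Default: conversation-scoped memory.
    return TokenBufferMemory(conversation=conversation, model_instance=model_instance)
```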
## Comparison
| Feature | `TokenBufferMemory` | `NodeTokenBufferMemory` |
|---|---|---|
| Scope | Conversation | Node within Conversation |
| Storage | Database (`Message` table) | `WorkflowNodeExecutionModel.outputs` |
| Thread Support | Yes | Yes |
| File Support | Yes (via `MessageFile`) | Yes (via context serialization) |
| Token Limit | Yes | Yes |
| Use Case | Standard chat apps | Complex workflows |
## Extending to Other Nodes
Currently only the LLM node writes `context` into its `outputs`. To enable node memory for other nodes:

- Add `outputs["context"] = self._build_context(prompt_messages, response)` in the node
- `NodeTokenBufferMemory` will automatically pick it up

Nodes that could potentially support this (see the sketch below):

- `question_classifier`
- `parameter_extractor`
- `agent`
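For example, a `question_classifier` node that already calls an LLM could expose its history the same way (a hypothetical sketch; output keys other than `context` are illustrative):

```python
# Inside the node's run logic, alongside its normal outputs:
outputs = {
    "class_name": selected_class,  # the node's regular result (illustrative key)
    "context": self._build_context(prompt_messages, raw_llm_text),  # history for node memory
}
```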
## Future Considerations

- Cleanup: Node memory lifecycle follows `WorkflowNodeExecutionModel`, which already has cleanup mechanisms
- Compression: For very long conversations, consider summarization strategies
- Extension: Other nodes may benefit from node-level memory