Noorle agents use a three-tier memory system that balances efficiency, context retention, and cost. This design prevents token waste and ensures agents remember important information across long conversations.

Why Three Tiers?

Without memory management, long conversations become expensive: every inference resends the entire message history, so token usage grows with every turn. The three-tier design keeps only recent messages in the prompt, compresses older history into summaries, and archives the rest. Result: conversations up to 25x more efficient, same context quality.

The Three Tiers

Tier 1: Working Memory (Recent Context)

Storage: Redis (in-memory cache)
Lifetime: Hours to days
Scope: Last N messages (configurable, default 20)
Access: Every inference

What it contains:
  • Most recent user messages
  • Agent responses
  • Tool call results
  • Current context and state
Example:
Working Memory (last 5 messages)
├─ User: "Search for AI trends"
├─ Agent: "I'll search the web..."
├─ Tool Call: web_search(query="AI trends")
├─ Tool Result: ["Article 1: ...", "Article 2: ..."]
└─ Agent: "Based on the search, here are trends..."
Why fast access matters:
  • Included in every LLM prompt
  • If latency > 100ms, user perceives slowness
  • Redis provides sub-millisecond access
  • In-process cache provides fallback
Configuration:
{
  "memory_config": {
    "working_memory_size": 20,        // messages to keep
    "working_memory_ttl": 86400       // seconds (1 day)
  }
}
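A minimal in-process sketch of a bounded working memory honoring `working_memory_size` and `working_memory_ttl`. The production tier uses Redis; the `WorkingMemory` class and its methods here are illustrative, not the Noorle API:

```python
import time
from collections import deque

class WorkingMemory:
    """Illustrative sketch: keeps only the last N messages, like working_memory_size."""

    def __init__(self, size=20, ttl=86400):
        self.messages = deque(maxlen=size)  # oldest entries drop automatically
        self.ttl = ttl                      # seconds, like working_memory_ttl

    def append(self, role, content):
        self.messages.append({"role": role, "content": content, "ts": time.time()})

    def context(self):
        """Messages still within the TTL, oldest first, ready for the next prompt."""
        cutoff = time.time() - self.ttl
        return [m for m in self.messages if m["ts"] >= cutoff]

memory = WorkingMemory(size=5)
for i in range(8):
    memory.append("user", f"message {i}")
print(len(memory.context()))  # 5 -- only the most recent messages survive
```

The `deque(maxlen=...)` mirrors Redis `LPUSH` + `LTRIM`, and the TTL filter mirrors key expiry; both evictions happen without any explicit cleanup code.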

Tier 2: Summary Memory (Compressed History)

Storage: Object storage + Redis cache
Lifetime: Days to months
Scope: Summarized sessions and key insights
Access: As needed (cached for 1 hour)

What happens: When working memory reaches capacity (20 messages), the oldest messages are summarized:
Situation:
  Working memory is full (20 messages)
  New message arrives
  Need to make room

Process:
  1. Compress oldest 10 messages into summary
     "User asked about AI trends. Agent researched
      and identified 3 key developments: 1) Multimodal
      models improving, 2) Cost decreasing, 3) Enterprise
      adoption accelerating."

  2. Store summary in object storage with metadata
     {
       "id": "summary-123",
       "time_range": "2024-03-01 to 2024-03-05",
       "summary": "...",
       "key_facts": ["...", "..."],
       "embedding": [0.12, 0.34, ...]
     }

  3. Cache summary in Redis

  4. Discard original messages from working memory

  5. Working memory now has space for new message
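The five steps above can be sketched as follows. The `summarize` function stands in for an LLM call, and plain dicts stand in for object storage and the Redis cache; none of these names are the actual Noorle API:

```python
def summarize(messages):
    # Placeholder for an LLM summarization request.
    return f"Summary of {len(messages)} messages, starting with: {messages[0]['content']}"

def compact(working_memory, summary_cache, object_store, window=10):
    """Compress the oldest `window` messages into one summary (steps 1-5)."""
    oldest = working_memory[:window]                    # step 1: pick the oldest messages
    summary = {"id": f"summary-{len(object_store)}",
               "summary": summarize(oldest)}
    object_store[summary["id"]] = summary               # step 2: persist in object storage
    summary_cache.append(summary)                       # step 3: cache in Redis
    del working_memory[:window]                         # step 4: discard the originals
    return working_memory                               # step 5: room for new messages

wm = [{"role": "user", "content": f"m{i}"} for i in range(20)]
store, cache = {}, []
compact(wm, cache, store)
print(len(wm), len(cache))  # 10 1
```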
Token efficiency:
10 original messages = 5,000 tokens
1 LLM-generated summary = 500 tokens

Compression ratio: 10x reduction
Retrieval: The agent can fetch relevant summaries on demand, checking the Redis cache first and falling back to object storage.
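A cache-first retrieval sketch, again with dicts standing in for Redis and object storage (`get_summary` is illustrative, not the real API):

```python
def get_summary(summary_id, cache, object_store):
    """Cache-first lookup: fast Redis-style cache, then the durable object store."""
    summary = cache.get(summary_id)
    if summary is not None:
        return summary                   # cache hit: sub-millisecond in production
    summary = object_store[summary_id]   # cache miss: slower, durable tier
    cache[summary_id] = summary          # warm the cache (~1 hour TTL in production)
    return summary

store = {"summary-123": {"summary": "User asked about AI trends..."}}
cache = {}
first = get_summary("summary-123", cache, store)   # miss: reads object storage
second = get_summary("summary-123", cache, store)  # hit: served from cache
print(first is second)  # True
```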

Tier 3: Archive (Long-term Storage)

Storage: Object storage (immutable)
Lifetime: Forever (compliance)
Scope: All history beyond summaries
Access: Rarely (audit, compliance)

What it contains:
  • Message journal (all original messages)
  • Summaries older than 1 month
  • Complete conversation transcript
  • Legal hold data (if applicable)
Never deleted (except by explicit request). Used for:
  • Compliance: Audit trail of all agent activity
  • Legal discovery: Retrieve conversations from specific date range
  • Debugging: Understand what happened in past conversations
  • ML training: Fine-tune models on real conversations (with consent)
Example access:
# Retrieve all messages from agent during March 2024
GET /api/agents/{agent_id}/archive?start=2024-03-01&end=2024-03-31

Response:
{
  "messages": [
    {
      "timestamp": "2024-03-05T10:30:00Z",
      "role": "user",
      "content": "What are market trends?"
    },
    {
      "timestamp": "2024-03-05T10:31:00Z",
      "role": "assistant",
      "content": "..."
    }
  ]
}
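A client-side sketch of the archive query above, using only the standard library to construct the request URL. The endpoint path comes from the example; the base URL and agent ID are placeholders, and authentication is omitted:

```python
from urllib.parse import urlencode

def archive_url(base, agent_id, start, end):
    """Builds the GET URL for retrieving archived messages in a date range."""
    query = urlencode({"start": start, "end": end})
    return f"{base}/api/agents/{agent_id}/archive?{query}"

url = archive_url("https://api.example.com", "agent-42", "2024-03-01", "2024-03-31")
print(url)
# https://api.example.com/api/agents/agent-42/archive?start=2024-03-01&end=2024-03-31
```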

Memory Flow Over Time

New message ─▶ Working Memory (Redis, last 20 messages)
                    │ at capacity: oldest 10 messages summarized
                    ▼
               Summary Memory (object storage + Redis cache)
                    │ summaries older than 1 month
                    ▼
               Archive (immutable object storage, retained ~7 years)

Semantic Search in Memory

Memory summaries are embedded as vectors for semantic search.

How it works:
  1. Summary is created
  2. LLM extracts key topics
  3. Topics are embedded as vectors
  4. Vectors stored with summary
  5. Semantic queries find relevant summaries
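Step 5 can be sketched as a cosine-similarity ranking over stored summary embeddings. The 3-dimensional vectors below are toy values for illustration; real embeddings come from a model and have hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, summaries, top_k=1):
    """Rank stored summaries by similarity to the query embedding."""
    ranked = sorted(summaries, key=lambda s: cosine(query_vec, s["embedding"]),
                    reverse=True)
    return ranked[:top_k]

summaries = [
    {"summary": "Discussed pricing models", "embedding": [0.9, 0.1, 0.0]},
    {"summary": "Researched AI trends",     "embedding": [0.1, 0.9, 0.2]},
]
hits = search([0.8, 0.2, 0.1], summaries)  # query vector close to "pricing"
print(hits[0]["summary"])  # Discussed pricing models
```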

Configuration

Fine-tune memory behavior per agent:
{
  "memory_config": {
    "working_memory_size": 20,          // messages
    "working_memory_ttl": 86400,        // 1 day (seconds)
    "enable_summarization": true,
    "summarization_threshold": 20,      // summarize when this many messages
    "summary_window_size": 10,          // compress 10 messages at a time
    "enable_semantic_search": true,     // enable vector search
    "archive_retention_days": 2555      // keep archive for 7 years
  }
}
Trade-offs:
| Setting | Cheap | Expensive |
| --- | --- | --- |
| working_memory_size | 5 messages | 100 messages |
| enable_summarization | false | true |
| summarization_threshold | 10 (frequent) | 100 (rare) |
| enable_semantic_search | false | true |
Increase for more context, decrease for cost savings.
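A back-of-the-envelope estimator for this trade-off, using the 10x compression figure from Tier 2. The per-message token count and the function itself are illustrative assumptions, not measured values:

```python
def prompt_tokens(history_msgs, working_size=20, summarize=True,
                  tokens_per_msg=500, compression=10):
    """Rough tokens sent per inference for a conversation with history_msgs messages."""
    recent = min(history_msgs, working_size)
    older = history_msgs - recent
    if summarize:
        # Older messages travel as summaries at ~1/compression of their original size.
        return recent * tokens_per_msg + (older * tokens_per_msg) // compression
    return history_msgs * tokens_per_msg  # everything resent verbatim

without = prompt_tokens(1000, summarize=False)  # 500,000 tokens per inference
with_sum = prompt_tokens(1000, summarize=True)  # 59,000 tokens per inference
print(without // with_sum)  # ~8x fewer tokens for a 1000-message history
```

The savings grow with conversation length, since the verbatim path scales linearly while the summarized path scales at roughly a tenth of that rate.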

Example: Long Conversation

User conversation over 1 month: 1000+ messages

Day 1-5:  Working memory captures 20 messages
          Discussions about AI trends, pricing, implementation

Day 6-10: Messages M1-M10 summarized
          Summaries stored, original messages removed
          New messages M21-M30 added to working memory

Day 20:   User asks: "What was the conclusion about pricing?"
          Agent searches summaries (not working memory)
          Finds: "User concerned about costs. Recommended
                  tiered pricing model."
          Agent answers: "Based on our discussion, we
                        concluded that tiered pricing
                        balances cost and features."

Month 2:  All summaries and archives available
          but working memory only contains recent chat
          Token usage: minimal, cost: low

Year 2:   Audit request: "Show all conversations from 2024"
          Retrieve from archive: 365 days of history
          Complete transcript with all messages
          Used for compliance verification

Memory Best Practices

Enable Summarization

Enable summarization for conversations longer than about an hour; it reduces token costs roughly 10x.

Set Appropriate TTL

Match the working-memory TTL to the use case: 1 day for customer support agents, 1 hour for real-time agents.

Use Semantic Search

Enable semantic search for agents that need historical context; it helps them find relevant past summaries.

Archive for Compliance

Always keep archives to meet regulatory requirements; 7 years of retention is typical.

Memory Costs

For current pricing, see Pricing.
Next: Learn about Knowledge and RAG for semantic search.