OpenClaw 记忆机制深度解析

OpenClaw 内存系统完整技术文档（基于源码分析）

📚 目录

核心概念
架构设计
源码分析
配置系统
搜索流程
高级特性
调试与测试

核心概念

记忆的本质

OpenClaw 的记忆是纯 Markdown 文件，文件是唯一真相源（source of truth），模型只"记住"写入磁盘的内容。

记忆文件结构

~/.openclaw/workspace/
├── MEMORY.md              # 长期记忆（仅主会话加载）
└── memory/
    └── YYYY-MM-DD.md     # 每日记录（今日+昨日自动加载）

两个核心工具

工具	用途	输入	输出
`memory_search`	语义搜索	query → snippets + 路径 + 行号
`memory_get`	定向读取	path + from + lines → 文本片段

触发时机

会话启动时：自动读取 memory/YYYY-MM-DD.md（今日）和 memory/YYYY-MM-DD.md（昨日）
MEMORY.md：仅在主会话中加载，群聊/渠道不加载（安全边界）
写入：需要手动触发，AI 不会自动写入（除非 compaction flush）

架构设计

组件关系

┌─────────────────────────────────────────────┐
│         Agent Workspace               │
│  ~/.openclaw/workspace/              │
│  ├── MEMORY.md                      │
│  └── memory/                       │
│      └── YYYY-MM-DD.md             │
└──────────────┬──────────────────────────┘
               │
               ↓ 文件监听
┌──────────────▼──────────────────────────┐
│    Memory Search Manager              │
│  (src/memory/index.ts)               │
│                                    │
│  ┌──────────────────────────────┐      │
│  │  Config Resolver        │      │
│  │  (memory-search.ts)       │      │
│  └──────────────────────────────┘      │
│                                    │
│  ┌──────────────────────────────┐      │
│  │  SQLite Indexer         │      │
│  │  (memory-schema.ts)        │      │
│  └──────────────────────────────┘      │
│                                    │
│  ┌──────────────────────────────┐      │
│  │  Embedding Providers    │      │
│  │  • OpenAI               │      │
│  │  • Gemini              │      │
│  │  • Local (GGUF)        │      │
│  │  • Voyage / Mistral     │      │
│  └──────────────────────────────┘      │
└──────────────┬──────────────────────────┘
               │
               ↓ 工具接口
┌──────────────▼──────────────────────────┐
│    Agent Tools (memory-tool.ts)       │
│                                    │
│  • memory_search                    │
│  • memory_get                      │
└────────────────────────────────────────┘

数据库 Schema

SQLite 表结构

files 表：

sql

CREATE TABLE files (
  path TEXT PRIMARY KEY,      -- 文件路径
  source TEXT,                 -- 来源：'memory' | 'sessions'
  hash TEXT,                   -- 文件内容哈希
  mtime INTEGER,               -- 修改时间
  size INTEGER                  -- 文件大小
);

chunks 表：

sql

CREATE TABLE chunks (
  id TEXT PRIMARY KEY,          -- 块唯一 ID
  path TEXT,                   -- 所属文件
  source TEXT,                 -- 来源：'memory' | 'sessions'
  start_line INTEGER,            -- 起始行号
  end_line INTEGER,              -- 结束行号
  hash TEXT,                   -- 内容哈希
  model TEXT,                   -- 嵌入模型
  text TEXT,                   -- 原始文本
  embedding TEXT,               -- 向量数据（JSON）
  updated_at INTEGER             -- 更新时间
);

embedding_cache 表：

sql

CREATE TABLE embedding_cache (
  provider TEXT,                -- 提供者：openai | gemini | local...
  model TEXT,                  -- 模型名称
  provider_key TEXT,           -- 提供者配置键
  hash TEXT,                   -- 文本哈希
  embedding TEXT,              -- 嵌入向量
  dims INTEGER,                 -- 向量维度
  updated_at INTEGER,           -- 缓存时间
  PRIMARY KEY (provider, model, provider_key, hash)
);

FTS5 虚拟表（全文搜索）：

sql

CREATE VIRTUAL TABLE chunks_fts USING fts5 (
  text,                    -- 可搜索文本
  id UNINDEXED,           -- 块 ID（不索引）
  path UNINDEXED,          -- 文件路径（不索引）
  source UNINDEXED,         -- 来源（不索引）
  model UNINDEXED,          -- 模型（不索引）
  start_line UNINDEXED,      -- 起始行（不索引）
  end_line UNINDEXED         -- 结束行（不索引）
);

源码分析

1. 配置解析 (`src/agents/memory-search.ts`)

核心默认值

typescript

const DEFAULT_CHUNK_TOKENS = 400;      // 每个块的 token 数量
const DEFAULT_CHUNK_OVERLAP = 80;       // 块之间的重叠 token 数
const DEFAULT_MAX_RESULTS = 6;          // 返回的最大结果数
const DEFAULT_MIN_SCORE = 0.35;        // 最小相关性分数（0-1）
const DEFAULT_HYBRID_ENABLED = true;   // 混合搜索开关
const DEFAULT_VECTOR_WEIGHT = 0.7;       // 向量搜索权重（语义）
const DEFAULT_TEXT_WEIGHT = 0.3;         // 全文搜索权重（关键词）
const DEFAULT_MMR_ENABLED = false;      // MMR 去重开关
const DEFAULT_TEMPORAL_DECAY_ENABLED = false;  // 时间衰减开关
const DEFAULT_TEMPORAL_DECAY_HALF_LIFE = 30;  // 半衰期（天）
const DEFAULT_CACHE_ENABLED = true;      // 嵌入缓存开关

Provider 自动选择逻辑

typescript

function resolveProvider(): string {
  // 1. 本地模式（指定了 modelPath）
  if (local.modelPath && fileExists(local.modelPath)) {
    return "local";
  }

  // 2. 检查各个提供者的 API 密钥
  if (hasOpenAIKey()) return "openai";
  if (hasGeminiKey()) return "gemini";
  if (hasVoyageKey()) return "voyage";
  if (hasMistralKey()) return "mistral";

  // 3. 无可用配置
  return "none"; // 记忆搜索保持禁用状态
}

归一化配置

typescript

function mergeConfig(
  defaults: MemorySearchConfig | undefined,
  overrides: MemorySearchConfig | undefined,
  agentId: string,
): ResolvedMemorySearchConfig {
  // 1. 确定是否启用
  const enabled = overrides?.enabled ?? defaults?.enabled ?? true;

  // 2. 解析 provider
  const provider = overrides?.provider ?? defaults?.provider ?? "auto";

  // 3. 归一化向量权重（确保和为 1.0）
  const sum = vectorWeight + textWeight;
  const normalizedVectorWeight = sum > 0 ? vectorWeight / sum : 0.7;
  const normalizedTextWeight = sum > 0 ? textWeight / sum : 0.3;

  // 4. 安全边界检查
  const overlap = clampNumber(chunking.overlap, 0, chunking.tokens - 1);
  const minScore = clampNumber(query.minScore, 0, 1);

  // 5. 解析存储路径
  const storePath = resolveStorePath(agentId, overrides?.store?.path);
  // 支持 {agentId} token 替换
  const withToken = raw.includes("{agentId}")
    ? raw.replaceAll("{agentId}", agentId)
    : raw;

  return { enabled, provider, normalizedVectorWeight, ... };
}

2. 工具实现 (`src/agents/tools/memory-tool.ts`)

memory_search 工具

typescript

{
  name: "memory_search",
  description: "Mandatory recall step: semantically search MEMORY.md + memory/*.md",
  parameters: {
    query: string,      // 必需：搜索查询
    maxResults?: number, // 可选：最大结果数
    minScore?: number    // 可选：最小相关性
  },
  execute: async (_toolCallId, params) => {
    // 1. 获取搜索管理器
    const { manager, error } = await getMemorySearchManager({ cfg, agentId });
    if (!manager) {
      return jsonResult(buildMemorySearchUnavailableResult(error));
    }

    // 2. 执行搜索
    const rawResults = await manager.search(query, {
      maxResults,
      minScore,
      sessionKey: options.agentSessionKey,
    });

    // 3. 装饰引注（如果启用）
    const citationsMode = resolveMemoryCitationsMode(cfg);
    const includeCitations = shouldIncludeCitations({
      mode: citationsMode,
      sessionKey: options.agentSessionKey,
    });
    const decorated = decorateCitations(rawResults, includeCitations);

    // 4. QMD 后端字符限制
    const status = manager.status();
    const resolved = resolveMemoryBackendConfig({ cfg, agentId });
    const results = status.backend === "qmd"
      ? clampResultsByInjectedChars(decorated, resolved.qmd?.limits.maxInjectedChars)
      : decorated;

    // 5. 返回完整状态
    return jsonResult({
      results,
      provider: status.provider,
      model: status.model,
      fallback: status.fallback,
      citations: citationsMode,
      mode: searchMode,
    });
  }
}

引注装饰

typescript

function decorateCitations(results: MemorySearchResult[], include: boolean) {
  if (!include) {
    return results.map((entry) => ({ ...entry, citation: undefined }));
  }

  return results.map((entry) => {
    const citation = formatCitation(entry);
    const snippet = `${entry.snippet.trim()}\n\nSource: ${citation}`;
    return { ...entry, citation, snippet };
  });
}

function formatCitation(entry: MemorySearchResult): string {
  const lineRange = entry.startLine === entry.endLine
    ? `#L${entry.startLine}`
    : `#L${entry.startLine}-L${entry.endLine}`;
  return `${entry.path}${lineRange}`;
}

memory_get 工具

typescript

{
  name: "memory_get",
  description: "Safe snippet read from MEMORY.md or memory/*.md",
  parameters: {
    path: string,         // 必需：文件路径（工作区相对）
    from?: number,       // 可选：起始行
    lines?: number        // 可选：读取行数
  },
  execute: async (_toolCallId, params) => {
    const { manager, error } = await getMemorySearchManager({ cfg, agentId });
    if (!manager) {
      return jsonResult({ path: relPath, text: "", disabled: true, error });
    }

    try {
      const result = await manager.readFile({
        relPath,
        from: from ?? undefined,
        lines: lines ?? undefined,
      });
      return jsonResult(result);
    } catch (err) {
      // 优雅降级：文件不存在时返回空文本
      const message = err instanceof Error ? err.message : String(err);
      return jsonResult({ path: relPath, text: "", disabled: true, error: message });
    }
  }
}

3. 数据库 Schema (`src/memory/memory-schema.ts`)

初始化流程

typescript

function ensureMemoryIndexSchema(params: {
  db: DatabaseSync;
  embeddingCacheTable: string;
  ftsTable: string;
  ftsEnabled: boolean;
}): { ftsAvailable: boolean; ftsError?: string } {
  // 1. 创建元数据表
  db.exec(`CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL);`);

  // 2. 创建文件表
  db.exec(`CREATE TABLE IF NOT EXISTS files (
    path TEXT PRIMARY KEY,
    source TEXT NOT NULL DEFAULT 'memory',
    hash TEXT NOT NULL,
    mtime INTEGER NOT NULL,
    size INTEGER NOT NULL
  );`);

  // 3. 创建块表
  db.exec(`CREATE TABLE IF NOT EXISTS chunks (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL,
    source TEXT NOT NULL DEFAULT 'memory',
    start_line INTEGER NOT NULL,
    end_line INTEGER NOT NULL,
    hash TEXT NOT NULL,
    model TEXT NOT NULL,
    text TEXT NOT NULL,
    embedding TEXT NOT NULL,
    updated_at INTEGER NOT NULL
  );`);

  // 4. 创建嵌入缓存表
  db.exec(`CREATE TABLE IF NOT EXISTS ${embeddingCacheTable} (
    provider TEXT NOT NULL,
    model TEXT NOT NULL,
    provider_key TEXT NOT NULL,
    hash TEXT NOT NULL,
    embedding TEXT NOT NULL,
    dims INTEGER,
    updated_at INTEGER NOT NULL,
    PRIMARY KEY (provider, model, provider_key, hash)
  );`);

  // 5. 创建 FTS5 虚拟表（全文搜索）
  if (ftsEnabled) {
    try {
      db.exec(`CREATE VIRTUAL TABLE IF NOT EXISTS ${ftsTable} USING fts5(
        text,
        id UNINDEXED,
        path UNINDEXED,
        source UNINDEXED,
        model UNINDEXED,
        start_line UNINDEXED,
        end_line UNINDEXED
      );`);
      ftsAvailable = true;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      ftsAvailable = false;
      ftsError = message;  // 记录 FTS 不可用
    }
  }

  return { ftsAvailable, ...(ftsError ? { ftsError } : {}) };
}

动态添加列

typescript

function ensureColumn(
  db: DatabaseSync,
  table: "files" | "chunks",
  column: string,
  definition: string,
): void {
  // 1. 检查列是否存在
  const rows = db.prepare(`PRAGMA table_info(${table})`).all();
  if (rows.some((row) => row.name === column)) {
    return;  // 已存在，直接返回
  }

  // 2. 动态添加列（支持迁移）
  db.exec(`ALTER TABLE ${table} ADD COLUMN ${column} ${definition}`);
}

配置系统

完整配置树

json5

{
  memory: {
    backend: "sqlite",           // "sqlite" | "qmd"
    citations: "auto",            // "auto" | "on" | "off"

    // QMD 后端配置（实验性）
    qmd: {
      command: "qmd",            // 可执行路径
      searchMode: "search",        // "search" | "vsearch" | "query"
      includeDefaultMemory: true,  // 自动索引 MEMORY.md + memory/**/*.md
      update: {
        interval: "5m",           // 更新间隔
        debounceMs: 15000,        // 防抖延迟
        onBoot: true,
        waitForBootSync: false,   // 是否阻塞启动同步
        commandTimeoutMs: 10000,
        updateTimeoutMs: 120000,
        embedTimeoutMs: 600000,
      },
      paths: [
        { name: "docs", path: "~/notes", pattern: "**/*.md" }
      ],
      sessions: {
        enabled: false,
        retentionDays: 90,
        exportDir: "~/.openclaw/sessions"
      },
      limits: {
        maxResults: 6,
        maxSnippetChars: 700,
        maxInjectedChars: 10000,
        timeoutMs: 4000
      },
      scope: {
        default: "deny",
        rules: [
          { action: "allow", match: { chatType: "direct" } },
          { action: "deny", match: { keyPrefix: "discord:channel:" } },
          { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } }
        ]
      }
    }
  },

  agents: {
    defaults: {
      memorySearch: {
        enabled: true,               // 总开关
        sources: ["memory"],         // ["memory"] | ["memory", "sessions"]
        provider: "auto",           // "openai" | "gemini" | "local" | "voyage" | "mistral" | "auto"
        fallback: "none",           // 失败时的回退

        // 远程嵌入配置
        remote: {
          baseUrl: undefined,        // OpenAI 兼容端点
          apiKey: undefined,         // API 密钥
          headers: {},               // 额外请求头
          batch: {
            enabled: false,        // 批量模式
            wait: true,
            concurrency: 2,
            pollIntervalMs: 2000,
            timeoutMinutes: 60
          }
        },

        // 本地嵌入配置
        local: {
          modelPath: undefined,    // GGUF 文件或 hf: URI
          modelCacheDir: undefined  // 缓存目录
        },

        // 存储配置
        store: {
          driver: "sqlite",
          path: "{agentId}.sqlite",  // 支持 {agentId} token
          vector: {
            enabled: true,
            extensionPath: undefined    // sqlite-vec 路径
          }
        },

        // 分块配置
        chunking: {
          tokens: 400,            // 每块 token 数
          overlap: 80              // 块重叠 token 数
        },

        // 同步配置
        sync: {
          onSessionStart: true,    // 会话开始时同步
          onSearch: true,          // 搜索时同步
          watch: true,              // 监听文件变化
          watchDebounceMs: 1500,    // 防抖 1.5 秒
          intervalMinutes: 0,        // 定期间隔（0 = 仅按需）
          sessions: {
            deltaBytes: 100000,     // 触发同步的增量字节
            deltaMessages: 50        // 触发同步的增量消息数
          }
        },

        // 查询配置
        query: {
          maxResults: 6,           // 默认返回数量
          minScore: 0.35,         // 最小相关性（0-1）
          hybrid: {
            enabled: true,
            vectorWeight: 0.7,        // 语义权重
            textWeight: 0.3,          // 关键词权重
            candidateMultiplier: 4,     // 候选放大倍数

            // MMR 去重
            mmr: {
              enabled: false,
              lambda: 0.7              // 0 = 最大多样性，1 = 最大相关性
            },

            // 时间衰减
            temporalDecay: {
              enabled: false,
              halfLifeDays: 30         // 半衰期（天）
            }
          }
        },

        // 缓存配置
        cache: {
          enabled: true,
          maxEntries: 50000
        }
      },

      // 自动 flush 配置
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}

配置优先级

全局默认值（agents.defaults.memorySearch）
    ↓
Agent 级覆盖（agents.list[].memorySearch）
    ↓
运行时解析（normalizeSources, clampValues）
    ↓
最终配置（ResolvedMemorySearchConfig）

搜索流程

完整流程图

┌─────────────────────────────────────────┐
│  Agent 调用 memory_search          │
│  query: "home network setup"      │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  1. 参数验证                      │
│  • query（必需）                  │
│  • maxResults（可选）             │
│  • minScore（可选）               │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  2. 获取搜索管理器              │
│  getMemorySearchManager()          │
│  • 加载 SQLite 数据库              │
│  • 检查向量表可用性           │
│  • 检查 FTS5 可用性            │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  3. 同步索引（如果需要）         │
│  manager.ensureSynced()             │
│  • 检查文件哈希                  │
│  • 重新嵌入变更的块              │
│  • 更新 FTS5 索引               │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  4a. 向量搜索                    │
│  SELECT * FROM chunks              │
│  WHERE vector_distance(embedding, ?) │
│  ORDER BY distance LIMIT N * K      │
│  (使用 sqlite-vec 加速）         │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  4b. 全文搜索（BM25）            │
│  SELECT * FROM chunks_fts          │
│  WHERE chunks_fts MATCH ?          │
│  ORDER BY bm25(chunks_fts) LIMIT N * K │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  5. 合并结果                     │
│  • Union 按块 ID               │
│  • weightedSum(vector, text)        │
│  • finalScore = ...                │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  6. 后处理（可选）                │
│  ┌──────────────────────────┐      │
│  │ a. 时间衰减            │      │
│  │    decayed = score * e^(-λ*age)│      │
│  └──────────────────────────┘      │
│  ┌──────────────────────────┐      │
│  │ b. MMR 去重            │      │
│  │    λ*relevance - (1-λ)*maxSim │      │
│  └──────────────────────────┘      │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  7. 排序和截断                  │
│  ORDER BY finalScore DESC LIMIT N   │
└────────────┬────────────────────────┘
             │
             ↓
┌────────────▼────────────────────────┐
│  8. 装饰输出                     │
│  • 添加引注（Source: path#line）│
│  • 限制字符数（QMD）             │
│  • 返回给 Agent                  │
└────────────┬────────────────────────┘
             │
             ↓
         返回给 AI

混合搜索实现细节

为什么要混合？

向量搜索擅长：

语义匹配（"相同意思，不同表达"）
- "Mac Studio gateway host" ↔ "the machine running the gateway"
- "debounce file updates" ↔ "avoid indexing on every write"

弱点：

精确信号弱（高熵值）
- IDs (a828e60, b3b9895a…)
- 代码符号 (memorySearch.query.hybrid)
- 错误字符串（"sqlite-vec unavailable"）

全文搜索（BM25）恰恰相反：

精确匹配强，但语义匹配弱

合并算法

typescript

// 1. 从两侧检索候选池
const vectorCandidates = fetchTopK(
  `maxResults * candidateMultiplier`,  // 例如：6 * 4 = 24 个
  ORDER BY vector_distance(embedding, query)
);

const textCandidates = fetchTopK(
  `maxResults * candidateMultiplier`,
  ORDER BY bm25(chunks_fts)
);

// 2. 将 BM25 排名转换为 0-1 分数
const textScore = 1 / (1 + bm25Rank);  // 排名越低，分数越高

// 3. 按 ID 联合候选并计算加权分数
const finalScore = vectorWeight * vectorScore + textWeight * textScore;

// 4. vectorWeight + textWeight 归一化为 1.0
const normalizedVectorWeight = sum > 0 ? vectorWeight / sum : 0.7;
const normalizedTextWeight = sum > 0 ? textWeight / sum : 0.3;

候选放大

typescript

const candidateMultiplier = 4;  // 默认值
const maxResults = 6;            // 默认返回数

// 每侧检索候选数
const candidateCount = maxResults * candidateMultiplier;  // 6 * 4 = 24

// 为什么？
// 合并后可能有很多重叠或低分结果，需要更大的候选池
// 最终只返回 maxResults 个

高级特性

1. MMR 去重（Maximal Marginal Relevance）

问题场景

查询 "home network setup" 时，可能返回：

1. memory/2026-02-10.md  → "Configured Omada router, set VLAN 10"
2. memory/2026-02-08.md  → "Configured Omada router, moved IoT to VLAN 10"
3. memory/2026-02-05.md  → "Set up AdGuard DNS on 192.168.10.2"
4. memory/network.md       → "Router: Omada ER605, AdGuard: 192.168.10.2"

问题：1 和 2 几乎重复，浪费了 2 个结果位。

MMR 算法

iteratively select result that maximizes:
  λ × relevance − (1−λ) × max_similarity_to_selected

参数权衡

lambda	效果	适用场景
1.0	纯相关性（无多样性）	结果必须是最相关的，即使重复
0.0	最大多样性（忽略相关性）	想要广度覆盖，牺牲精确度
0.7（默认）	平衡，略偏相关性	日常使用，平衡多样性和精确度

配置示例

json5

{
  query: {
    hybrid: {
      mmr: {
        enabled: true,
        lambda: 0.7
      }
    }
  }
}

2. 时间衰减（Temporal Decay）

为什么要衰减？

长期运行的 Agent 累积数百个每日笔记。没有衰减，6 个月前的旧笔记可能压倒昨天的更新。

衰减公式

typescript

decayedScore = score × e^(-λ × ageInDays)

// 其中 λ = ln(2) / halfLifeDays

半衰期效果（halfLifeDays = 30）

年龄	衰减乘数	说明
今天	1.00 (100%)	无衰减
7 天前	0.84 (84%)	轻微衰减
30 天前	0.50 (50%)	50% 保留
90 天前	0.125 (12.5%)	显著衰减
180 天前	0.016 (~1.6%)	几乎消失

永不过期文件

MEMORY.md（根记忆文件）
memory/ 中的非日期文件（例如 memory/projects.md, memory/network.md）
这些包含持久的参考信息，应始终正常排名

日期提取规则

typescript

// 1. 优先从文件名提取日期
if (path matches /memory/(\d{4})-(\d{2})-(\d{2})\.md/) {
  const date = extractDate(path);  // YYYY-MM-DD
  ageInDays = (today - date) / (24 * 60 * 60 * 1000);
}

// 2. 回退到文件修改时间
else {
  ageInDays = (now - mtime) / (24 * 60 * 60 * 1000);
}

配置示例

json5

{
  query: {
    hybrid: {
      temporalDecay: {
        enabled: true,
        halfLifeDays: 30
      }
    }
  }
}

3. 嵌入缓存（Embedding Cache）

为什么需要缓存？

typescript

// 没有：
每次更新文件 → 重新嵌入所有块 → 耗时 + 耗钱

// 有缓存：
每次更新文件 → 检查 hash → 只嵌入新的/变更的块

缓存表结构

sql

PRIMARY KEY (provider, model, provider_key, hash)

provider：哪个嵌入提供者
model：使用的具体模型
provider_key：提供者配置的唯一键（例如 API 密钥哈希）
hash：文本内容的 SHA256

查询流程

typescript

// 1. 计算文本哈希
const textHash = sha256(text);

// 2. 尝试从缓存获取
const cached = db.prepare(
  `SELECT embedding FROM embedding_cache
   WHERE provider = ? AND model = ? AND provider_key = ? AND hash = ?`
).get(textHash, model, apiKey, textHash);

if (cached) {
  return cached.embedding;  // 命中缓存！
}

// 3. 缓存未命中，调用 API
const embedding = await provider.embed(text);

// 4. 写入缓存
db.prepare(
  `INSERT OR REPLACE INTO embedding_cache
   (provider, model, provider_key, hash, embedding, dims, updated_at)
   VALUES (?, ?, ?, ?, ?, ?, ?)`
).run(provider, model, apiKey, textHash, JSON.stringify(embedding), embedding.length, Date.now());

4. QMD 后端（实验性）

架构

┌────────────────────────────────────┐
│  OpenClaw Gateway               │
└──────────┬───────────────────────┘
           │ sh/qmd subprocess
           ↓
┌──────────▼──────────────────────┐
│  QMD Sidecar（独立进程）    │
│  • BM25 + Vectors + Rerank   │
│  • 完全本地运行               │
│  • Bun + node-llama-cpp       │
└──────────┬──────────────────────┘
           │ HTTP/JSON IPC
           ↓
┌──────────▼──────────────────────┐
│  QMD SQLite Index             │
│  ~/.openclaw/agents/<id>/qmd/ │
└────────────────────────────────────┘

为什么 sidecar？

进程隔离：嵌入失败不会拖垮主 Gateway
语言自由：QMD 可以用任何技术实现
独立演进：OpenClaw 和 QMD 可以独立更新
热重启：更新 QMD 不会重启 OpenClaw

配置示例

json5

{
  memory: {
    backend: "qmd",
    qmd: {
      command: "qmd",
      searchMode: "search",  // "search" | "vsearch" | "query"
      includeDefaultMemory: true,
      update: { interval: "5m" },
      paths: [
        { name: "notes", path: "~/Documents/notes", pattern: "**/*.md" }
      ]
    }
  }
}

调试与测试

1. 启用详细日志

bash

# 环境变量
export DEBUG="openclaw:memory*"
export DEBUG="openclaw:memory:search"
export DEBUG="openclaw:memory:index"

# 运行 OpenClaw
openclaw gateway

2. 检查 SQLite 数据库

bash

# 1. 定位数据库
MEMORY_DB="$HOME/.openclaw/memory/main.sqlite"

# 2. 查看块统计
sqlite3 "$MEMORY_DB" \
  "SELECT source, COUNT(*), AVG(LENGTH(embedding)) as avg_dim
   FROM chunks GROUP BY source;"

# 3. 查看嵌入提供者分布
sqlite3 "$MEMORY_DB" \
  "SELECT provider, model, COUNT(*)
   FROM chunks GROUP BY provider, model;"

# 4. 查看 FTS5 索引
sqlite3 "$MEMORY_DB" \
  ".schema chunks_fts"

# 5. 检查缓存命中率
sqlite3 "$MEMORY_DB" \
  "SELECT
    (SELECT COUNT(*) FROM chunks) as total,
    (SELECT COUNT(*) FROM embedding_cache) as cached,
    ROUND((cached * 100.0) / total, 2) as cache_hit_rate
   FROM (SELECT 1);"

3. 手动重建索引

bash

# 查看状态
openclaw memory status --deep

# 强制重建
openclaw memory index --verbose

# 清空重建（危险！）
rm "$HOME/.openclaw/memory/main.sqlite"
openclaw memory index

4. 测试搜索性能

bash

# 基准测试
time openclaw memory search "test query"

# 比较不同配置
# 关闭时间衰减
jq '.agents.defaults.memorySearch.query.hybrid.temporalDecay.enabled = false' \
  ~/.openclaw/openclaw.json > tmp.json && mv tmp.json ~/.openclaw/openclaw.json

# 启用 MMR
jq '.agents.defaults.memorySearch.query.hybrid.mmr.enabled = true' \
  ~/.openclaw/openclaw.json > tmp.json && mv tmp.json ~/.openclaw/openclaw.json

# 测试前后差异

5. 模拟嵌入失败

typescript

// 来自测试文件 memory-tool.test.ts
it("returns explicit unavailable metadata for quota failures", async () => {
  setMemorySearchImpl(async () => {
    throw new Error("openai embeddings failed: 429 insufficient_quota");
  });

  const tool = createMemorySearchTool({ config });
  const result = await tool.execute("quota", { query: "hello" });

  expect(result.details).toEqual({
    results: [],
    disabled: true,
    unavailable: true,
    error: "openai embeddings failed: 429 insufficient_quota",
    warning: "Memory search is unavailable because embedding provider quota is exhausted.",
    action: "Top up or switch embedding provider, then retry memory_search.",
  });
});

关键要点：

配额错误（429）vs 其他错误（不同警告）
返回 disabled: true 标记工具不可用
给用户明确的操作建议

6. 测试混合搜索行为

bash

# 创建测试记忆
echo "# Test: duplicate entry 1
Network setup: Omada router on VLAN 10" >> ~/.openclaw/workspace/memory/test-$(date +%Y-%m-%d).md
echo "# Test: duplicate entry 2
Network setup: Omada router on VLAN 10" >> ~/.openclaw/workspace/memory/test-$(date +%Y-%m-%d).md
echo "# Test: diverse entry
Home network: 192.168.10.1, AdGuard on 192.168.10.2" >> ~/.openclaw/workspace/memory/test-$(date +%Y-%m-%d).md

# 重建索引
openclaw memory index

# 测试搜索
openclaw memory search "network setup"

# 预期：
# - 不启用 MMR：返回前 2 个相似结果
# - 启用 MMR：返回 2 个不同结果

7. 监控文件监听

bash

# 观察防抖行为
watch -n 0.1 ~/.openclaw/workspace/memory/

# 在另一个终端写入
echo "Test entry" >> ~/.openclaw/workspace/memory/test.md

# 预期：
# - 文件写入 → 1.5 秒防抖 → 开始索引
# - 在 /tmp/openclaw/*.log 中看到 "index dirty" 标记

最佳实践

1. 记忆写作

写入 MEMORY.md 当：

持久的决策和偏好
项目参考信息
技术文档摘要
需要在所有会话中访问的内容

写入 memory/YYYY-MM-DD.md 当：

当天的笔记和日志
临时上下文
对话记录
事件时间线

避免：

"记住这个" → 不写入，就忘记了
敏感信息 → MEMORY.md 在群聊中会加载

2. 搜索优化

typescript

// ✅ 好的查询（具体）
memory_search("Rod's work schedule this week")

// ❌ 差的查询（太泛）
memory_search("work")

// ✅ 使用关键词（精确匹配）
memory_search("error: sqlite-vec unavailable")

// ❌ 自然语言（可能语义匹配但错过精确符号）
memory_search("the database vector thing isn't working")

3. 性能调优

json5

{
  memorySearch: {
    // 减少 chunk 大小（更精确，但更多向量）
    chunking: { tokens: 200, overlap: 40 },

    // 增加 minScore（减少噪声）
    query: { minScore: 0.5 },

    // 启用缓存（重要！）
    cache: { enabled: true, maxEntries: 100000 },

    // 减少 maxResults（更快）
    query: { maxResults: 4 }
  }
}

4. 安全边界

memory_get 路径限制：

只允许 MEMORY.md 或 memory/ 下的文件
拒绝路径遍历（../, /, C:\）
源码：readStringParam(params, "path", { required: true })

memory_search 会话边界：

主会话（直接）：显示引注
群聊/频道：隐藏引注（shouldIncludeCitations()）

常见问题

Q1: 搜索返回空结果？

检查清单：

配置是否启用？
bash
```
openclaw memory status --deep
```
数据库是否存在？
bash
```
ls -lh ~/.openclaw/memory/
```

记忆文件是否存在？

bash

ls ~/.openclaw/workspace/MEMORY.md
ls ~/.openclaw/workspace/memory/

嵌入提供者是否配置？

bash

jq '.agents.defaults.memorySearch' ~/.openclaw/openclaw.json

是否触发索引？
bash
```
openclaw memory index --verbose
```

Q2: 搜索很慢？

优化建议：

启用 sqlite-vec（默认已启用）
启用嵌入缓存（cache.enabled = true）
减少 maxResults
增加 minScore 过滤低相关结果

Q3: 引注不显示？

原因：

memory.citations = "off"：显式关闭
群聊/频道：auto 模式下隐藏引注

修复：

json5

{
  memory: {
    citations: "on"  // 强制显示
  }
}

Q4: 旧笔记总是排在前面？

启用时间衰减：

json5

{
  agents: {
    defaults: {
      memorySearch: {
        query: {
          hybrid: {
            temporalDecay: {
              enabled: true,
              halfLifeDays: 30
            }
          }
        }
      }
    }
  }
}

Q5: 如何重置索引？

bash

# 方法 1：删除数据库（快速）
rm ~/.openclaw/memory/main.sqlite
openclaw memory index

# 方法 2：清空表（保留配置）
sqlite3 ~/.openclaw/memory/main.sqlite \
  "DELETE FROM chunks; DELETE FROM files; DELETE FROM embedding_cache;"
openclaw memory index

# 方法 3：更改嵌入指纹（触发自动重建）
# 修改 provider 或 model 配置

总结

OpenClaw 的记忆系统是一个精心设计的双层架构：

存储层：纯 Markdown 文件（人类可读）
索引层：SQLite + 向量 + FTS5（机器可读）
工具层：memory_search + memory_get（Agent 可用）
优化层：混合搜索 + MMR + 时间衰减（智能检索）

核心哲学：

文件是真相源（source of truth）
模型只记住写入磁盘的内容
自动化工具减少人工维护

扩展点：

新嵌入提供者（只需实现接口）
新后端（QMD sidecar 模式）
自定义评分算法（混合搜索可扩展）

文档版本： 1.0 最后更新： 2026-03-05 基于源码： OpenClaw 2026.3.2 (85377a2)

OpenClaw 记忆机制深度解析 ​

📚 目录 ​

核心概念 ​

记忆的本质 ​

记忆文件结构 ​

两个核心工具 ​

触发时机 ​

架构设计 ​

组件关系 ​

数据库 Schema ​

SQLite 表结构 ​

源码分析 ​

1. 配置解析 (src/agents/memory-search.ts) ​

核心默认值 ​

Provider 自动选择逻辑 ​

归一化配置 ​

2. 工具实现 (src/agents/tools/memory-tool.ts) ​

memory_search 工具 ​

引注装饰 ​

memory_get 工具 ​

3. 数据库 Schema (src/memory/memory-schema.ts) ​

初始化流程 ​

动态添加列 ​

配置系统 ​

完整配置树 ​

配置优先级 ​

搜索流程 ​

完整流程图 ​

混合搜索实现细节 ​

为什么要混合？ ​

合并算法 ​

候选放大 ​

高级特性 ​

1. MMR 去重（Maximal Marginal Relevance） ​

问题场景 ​

MMR 算法 ​

参数权衡 ​

配置示例 ​

2. 时间衰减（Temporal Decay） ​

为什么要衰减？ ​

衰减公式 ​

半衰期效果（halfLifeDays = 30） ​

永不过期文件 ​

日期提取规则 ​

配置示例 ​

3. 嵌入缓存（Embedding Cache） ​

为什么需要缓存？ ​

缓存表结构 ​

查询流程 ​

4. QMD 后端（实验性） ​

架构 ​

为什么 sidecar？ ​

配置示例 ​

调试与测试 ​

1. 启用详细日志 ​

2. 检查 SQLite 数据库 ​

3. 手动重建索引 ​

4. 测试搜索性能 ​

5. 模拟嵌入失败 ​

6. 测试混合搜索行为 ​

7. 监控文件监听 ​

最佳实践 ​

1. 记忆写作 ​

2. 搜索优化 ​

3. 性能调优 ​

4. 安全边界 ​

常见问题 ​

Q1: 搜索返回空结果？ ​

Q2: 搜索很慢？ ​

Q3: 引注不显示？ ​

Q4: 旧笔记总是排在前面？ ​

Q5: 如何重置索引？ ​

总结 ​

OpenClaw 记忆机制深度解析

📚 目录

核心概念

记忆的本质

记忆文件结构

两个核心工具

触发时机

架构设计

组件关系

数据库 Schema

SQLite 表结构

源码分析

1. 配置解析 (`src/agents/memory-search.ts`)

核心默认值

Provider 自动选择逻辑

归一化配置

2. 工具实现 (`src/agents/tools/memory-tool.ts`)

memory_search 工具

引注装饰

memory_get 工具

3. 数据库 Schema (`src/memory/memory-schema.ts`)

初始化流程

动态添加列

配置系统

完整配置树

配置优先级

搜索流程

完整流程图

混合搜索实现细节

为什么要混合？

合并算法

候选放大

高级特性

1. MMR 去重（Maximal Marginal Relevance）

问题场景

MMR 算法

参数权衡

配置示例

2. 时间衰减（Temporal Decay）

为什么要衰减？

衰减公式

半衰期效果（halfLifeDays = 30）

永不过期文件

日期提取规则

配置示例

3. 嵌入缓存（Embedding Cache）

为什么需要缓存？

缓存表结构

查询流程

4. QMD 后端（实验性）

架构

为什么 sidecar？

配置示例

调试与测试

1. 启用详细日志

2. 检查 SQLite 数据库

3. 手动重建索引

4. 测试搜索性能

5. 模拟嵌入失败

6. 测试混合搜索行为

7. 监控文件监听

最佳实践

1. 记忆写作

2. 搜索优化

3. 性能调优

4. 安全边界

常见问题

Q1: 搜索返回空结果？

Q2: 搜索很慢？

Q3: 引注不显示？

Q4: 旧笔记总是排在前面？

Q5: 如何重置索引？

总结