Pilot Knowledge Base RAG Setup

Deploys a document ingestion and retrieval pipeline across 4 agents, supporting vector search and knowledge base construction.

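Suitable for: technical leads and AI engineering teams
Not suitable for: general users without a technical background, or users who only need to share a single file
Availability in mainland China: requires network configuration; third-party services may also need to be reachable.
Installation difficulty: beginner-friendly (★☆☆). This is a preliminary assessment based on terminal usage, dependencies, API keys, and local environment requirements.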

Installation and Download

openclaw skills install @teoslayer/pilot-knowledge-base-rag-setup

Skill Description

Commands, parameters, and file names are given as in the original.

Knowledge Base (RAG) Setup

Deploy 4 agents: ingest, embed, index, query.

Roles

| Role | Hostname | Skills | Purpose |
|------|----------|--------|---------|
| ingest | <prefix>-rag-ingest | pilot-s3-bridge, pilot-share, pilot-chunk-transfer, pilot-cron | Pulls and splits documents |
| embedder | <prefix>-rag-embedder | pilot-task-parallel, pilot-share, pilot-metrics, pilot-task-chain | Generates vector embeddings |
| indexer | <prefix>-rag-indexer | pilot-database-bridge, pilot-share, pilot-task-chain, pilot-health | Stores embeddings in the vector database |
| query | <prefix>-rag-query | pilot-api-gateway, pilot-health, pilot-load-balancer, pilot-metrics | Serves search queries |
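The role table can be captured as a small lookup, which makes the hostname pattern and the per-role install line easy to derive. The following is a minimal sketch: the helper names `hostname` and `install_command` and the example prefix `acme` are illustrative assumptions, not part of the skill, and the function only builds a command string without running it.

```python
# Role-to-skills mapping transcribed from the role table in this document.
ROLE_SKILLS = {
    "ingest": ["pilot-s3-bridge", "pilot-share", "pilot-chunk-transfer", "pilot-cron"],
    "embedder": ["pilot-task-parallel", "pilot-share", "pilot-metrics", "pilot-task-chain"],
    "indexer": ["pilot-database-bridge", "pilot-share", "pilot-task-chain", "pilot-health"],
    "query": ["pilot-api-gateway", "pilot-health", "pilot-load-balancer", "pilot-metrics"],
}


def hostname(prefix: str, role: str) -> str:
    """Return the hostname for a role, following the <prefix>-rag-<role> pattern."""
    if role not in ROLE_SKILLS:
        raise ValueError(f"unknown role: {role}")
    return f"{prefix}-rag-{role}"


def install_command(role: str) -> str:
    """Return the clawhub install line for a role as a string (nothing is executed)."""
    return "clawhub install " + " ".join(ROLE_SKILLS[role])


if __name__ == "__main__":
    # "acme" is a hypothetical prefix used only for illustration.
    for role in ROLE_SKILLS:
        print(hostname("acme", role), "->", install_command(role))
```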

Setup Steps

Step 1: Ask the user which role to set up and which prefix to use.

Step 2: Install the skills:

# ingest:
clawhub install pilot-s3-bridge pilot-share pilot-chunk-transfer pilot-cron
# embedder:
clawhub install pilot-task-parallel pilot-share pilot-metrics pilot-task-chain
# indexer:
clawhub install pilot-database-bridge pilot-share pilot-task-chain pilot-health
# query:
clawhub install pilot-api-gateway pilot-health pilot-load-balancer pilot-metrics

Step 3: Set the hostname and write the configuration to ~/.pilot/setups/knowledge-base-rag.json

Step 4: Complete the handshakes along the pipeline: ingest ↔ embedder, embedder ↔ indexer, indexer ↔ query
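Steps 3 and 4 could be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the config contains only the `setup`, `role`, and `hostname` fields (the `skills` and `data_flows` entries are left out for brevity), and the handshake pairs are simply adjacent pipeline stages. How the handshake itself is performed is not covered here.

```python
import json
from pathlib import Path

# Pipeline order as described in this document.
PIPELINE = ["ingest", "embedder", "indexer", "query"]


def build_config(prefix: str, role: str) -> dict:
    """Return a minimal config for one role (skills and data_flows omitted)."""
    if role not in PIPELINE:
        raise ValueError(f"unknown role: {role}")
    return {"setup": "knowledge-base-rag", "role": role, "hostname": f"{prefix}-rag-{role}"}


def write_config(config: dict, path: Path) -> Path:
    """Write the config as UTF-8 JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def handshake_pairs(prefix: str) -> list[tuple[str, str]]:
    """Pair each stage with the next one, matching the handshakes listed in Step 4."""
    hosts = [f"{prefix}-rag-{role}" for role in PIPELINE]
    return list(zip(hosts, hosts[1:]))


if __name__ == "__main__":
    # "acme" is a hypothetical prefix; the demo prints instead of writing to disk.
    target = Path.home() / ".pilot" / "setups" / "knowledge-base-rag.json"
    print(json.dumps(build_config("acme", "embedder"), indent=2))
    print("target path:", target)
    for left, right in handshake_pairs("acme"):
        print(left, "<->", right)
```

Passing the path explicitly keeps `write_config` easy to try against a temporary directory before touching ~/.pilot.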

Configuration Templates by Role

ingest

{
  "setup": "knowledge-base-rag", "role": "ingest", "role_name": "Document Ingestion",
  "hostname": "<prefix>-rag-ingest",
  "skills": {
    "pilot-s3-bridge": "Pulls documents from S3 buckets.",
    "pilot-share": "Sends document files to the embedder.",
    "pilot-chunk-transfer": "Splits large documents into chunks.",
    "pilot-cron": "Schedules periodic document ingestion jobs."
  },
  "data_flows": [{ "direction": "send", "peer": "<prefix>-rag-embedder", "port": 1001, "topic": "doc-ingested", "description": "Document chunks" }],
  "handshakes_needed": ["<prefix>-rag-embedder"]
}

embedder

{
  "setup": "knowledge-base-rag", "role": "embedder", "role_name": "Embedding Generator",
  "hostname": "<prefix>-rag-embedder",
  "skills": {
    "pilot-task-parallel": "Generates embeddings in parallel to raise throughput.",
    "pilot-share": "Receives documents from ingest and sends embeddings to the indexer.",
    "pilot-metrics": "Tracks embedding throughput and latency.",
    "pilot-task-chain": "Chains the chunking and embedding steps."
  },
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-rag-ingest", "port": 1001, "topic": "doc-ingested", "description": "Document chunks" },
    { "direction": "send", "peer": "<prefix>-rag-indexer", "port": 1001, "topic": "embeddings-ready", "description": "Vector embeddings" }
  ],
  "handshakes_needed": ["<prefix>-rag-ingest", "<prefix>-rag-indexer"]
}

indexer

{
  "setup": "knowledge-base-rag", "role": "indexer", "role_name": "Vector Indexer",
  "hostname": "<prefix>-rag-indexer",
  "skills": {
    "pilot-database-bridge": "Writes embeddings to the vector database.",
    "pilot-share": "Receives embeddings from the embedder.",
    "pilot-task-chain": "Chains indexing operations.",
    "pilot-health": "Monitors index health and query latency."
  },
  "data_flows": [
    { "direction": "receive", "peer": "<prefix>-rag-embedder", "port": 1001, "topic": "embeddings-ready", "description": "Vector embeddings" },
    { "direction": "receive", "peer": "<prefix>-rag-query", "port": 1001, "topic": "search-query", "description": "Search queries" },
    { "direction": "send", "peer": "<prefix>-rag-query", "port": 1001, "topic": "search-results", "description": "Ranked results" }
  ],
  "handshakes_needed": ["<prefix>-rag-embedder", "<prefix>-rag-query"]
}

query

{
  "setup": "knowledge-base-rag", "role": "query", "role_name": "Query Server",
  "hostname": "<prefix>-rag-query",
  "skills": {
    "pilot-api-gateway": "Accepts search queries from external clients.",
    "pilot-health": "Monitors the health of the query endpoint.",
    "pilot-load-balancer": "Distributes queries across indexer replicas.",
    "pilot-metrics": "Tracks queries per second (QPS), latency, and result quality."
  },
  "data_flows": [
    { "direction": "send", "peer": "<prefix>-rag-indexer", "port": 1001, "topic": "search-query", "description": "Search queries" },
    { "direction": "receive", "peer": "<prefix>-rag-indexer", "port": 1001, "topic": "search-results", "description": "Ranked results" }
  ],
  "handshakes_needed": ["<prefix>-rag-indexer"]
}
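Because each template lists its own `handshakes_needed`, it can be useful to check that the lists agree with one another: if one role names a peer, the peer should name it back. The sketch below does this on role names only (the `<prefix>-rag-` part is dropped); the function names are hypothetical and not part of the skill.

```python
# handshakes_needed per role, copied from the four templates with the prefix dropped.
HANDSHAKES_NEEDED = {
    "ingest": ["embedder"],
    "embedder": ["ingest", "indexer"],
    "indexer": ["embedder", "query"],
    "query": ["indexer"],
}


def asymmetric_handshakes(needed: dict) -> list:
    """Return (role, peer) entries where the peer does not list the role in return."""
    return [
        (role, peer)
        for role, peers in needed.items()
        for peer in peers
        if role not in needed.get(peer, [])
    ]


def unique_pairs(needed: dict) -> set:
    """Collapse the per-role lists into unique undirected pairs."""
    return {frozenset((role, peer)) for role, peers in needed.items() for peer in peers}


if __name__ == "__main__":
    print("asymmetric entries:", asymmetric_handshakes(HANDSHAKES_NEEDED))
    print("number of distinct handshakes:", len(unique_pairs(HANDSHAKES_NEEDED)))
```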

Data Flows

  • ingest → embedder: document chunks (port 1001)
  • embedder → indexer: vector embeddings (port 1001)
  • query ↔ indexer: search queries and results (port 1001)
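One way to sanity-check these flows is to confirm that every `send` declared in one template has a matching `receive` on the peer, with the same port and topic. The following is an illustrative sketch: the `Flow` type and `unmatched_flows` function are assumptions introduced here, and hostnames are shortened to role names.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    owner: str      # role whose template declares the flow
    direction: str  # "send" or "receive"
    peer: str       # role on the other end
    port: int
    topic: str


# Flows transcribed from the data_flows entries of the four templates.
FLOWS = [
    Flow("ingest", "send", "embedder", 1001, "doc-ingested"),
    Flow("embedder", "receive", "ingest", 1001, "doc-ingested"),
    Flow("embedder", "send", "indexer", 1001, "embeddings-ready"),
    Flow("indexer", "receive", "embedder", 1001, "embeddings-ready"),
    Flow("indexer", "receive", "query", 1001, "search-query"),
    Flow("indexer", "send", "query", 1001, "search-results"),
    Flow("query", "send", "indexer", 1001, "search-query"),
    Flow("query", "receive", "indexer", 1001, "search-results"),
]


def unmatched_flows(flows: list) -> list:
    """Return flows that have no mirror entry declared by the peer."""
    declared = set(flows)

    def mirror(flow: Flow) -> Flow:
        other = "receive" if flow.direction == "send" else "send"
        return Flow(flow.peer, other, flow.owner, flow.port, flow.topic)

    return [flow for flow in flows if mirror(flow) not in declared]


if __name__ == "__main__":
    print("unmatched flows:", unmatched_flows(FLOWS))
```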

Workflow Example

# On ingest:
pilotctl --json send-file <prefix>-rag-embedder ./docs/guide.pdf
pilotctl --json publish <prefix>-rag-embedder doc-ingested '{"doc_id":"doc-42","chunks":24}'
# On embedder:
pilotctl --json publish <prefix>-rag-indexer embeddings-ready '{"doc_id":"doc-42","vectors":24,"dims":1536}'
# On query:
pilotctl --json task submit <prefix>-rag-indexer --task '{"query":"How do I implement authentication?","top_k":5}'
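When these commands are issued from a script, building the JSON payload with a serializer avoids hand-quoting mistakes. The sketch below only assembles argument lists that mirror the `publish` and `task submit` invocations shown above and prints them; it does not run `pilotctl`, and the helper names and the `acme` prefix are hypothetical.

```python
import json
import shlex


def publish_argv(peer: str, topic: str, payload: dict) -> list:
    """Build the argv for `pilotctl --json publish <peer> <topic> <payload>`."""
    body = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return ["pilotctl", "--json", "publish", peer, topic, body]


def task_submit_argv(peer: str, payload: dict) -> list:
    """Build the argv for `pilotctl --json task submit <peer> --task <payload>`."""
    body = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return ["pilotctl", "--json", "task", "submit", peer, "--task", body]


if __name__ == "__main__":
    prefix = "acme"  # hypothetical prefix for illustration
    commands = [
        publish_argv(f"{prefix}-rag-embedder", "doc-ingested", {"doc_id": "doc-42", "chunks": 24}),
        publish_argv(f"{prefix}-rag-indexer", "embeddings-ready",
                     {"doc_id": "doc-42", "vectors": 24, "dims": 1536}),
        task_submit_argv(f"{prefix}-rag-indexer",
                         {"query": "How do I implement authentication?", "top_k": 5}),
    ]
    for argv in commands:
        # shlex.join produces a shell-safe line for review or logging.
        print(shlex.join(argv))
```

An argument list of this shape can be passed to `subprocess.run` without a shell, so the JSON payload needs no extra quoting.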

Dependencies

Requires the pilot-protocol skill, the pilotctl binary, the clawhub binary, and a running daemon.
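A quick preflight check for the two binaries might look like the sketch below. It only tests whether `pilotctl` and `clawhub` are on `PATH`; it does not verify the pilot-protocol skill or the daemon, since this document does not say how to query them. The function name `missing_binaries` is an assumption.

```python
import shutil

# Binaries named in the dependency list.
REQUIRED_BINARIES = ("pilotctl", "clawhub")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which) -> list:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("missing binaries:", ", ".join(missing))
    else:
        print("pilotctl and clawhub found on PATH")
```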

@teoslayer
