Compare commits


No commits in common. "6b435876273516a36d58421584f55c1ecf183cb1" and "f6342f7e62dbc45e6a1bdd6c7dfe248f8befd2c7" have entirely different histories.

1227 changed files with 314584 additions and 16328 deletions


@ -1,49 +0,0 @@
{
"default": true,
"MD001": false,
"MD003": false,
"MD004": false,
"MD005": false,
"MD007": false,
"MD009": false,
"MD010": false,
"MD012": false,
"MD013": false,
"MD014": false,
"MD018": false,
"MD019": false,
"MD022": false,
"MD023": false,
"MD024": false,
"MD025": false,
"MD026": false,
"MD027": false,
"MD028": false,
"MD029": false,
"MD030": false,
"MD031": false,
"MD032": false,
"MD033": false,
"MD034": false,
"MD036": false,
"MD037": false,
"MD038": false,
"MD039": false,
"MD040": false,
"MD041": false,
"MD042": false,
"MD045": false,
"MD046": false,
"MD047": false,
"MD049": false,
"MD050": false,
"MD051": false,
"MD052": false,
"MD053": false,
"MD055": false,
"MD056": false,
"MD058": false,
"MD059": false,
"MD060": false
}

9
.gitignore vendored

@ -46,10 +46,6 @@ ENV/
*.log
logs/
# Skill Seekers (vendored tool output)
output/
assets/skills/skills-skills/scripts/.venv-skill-seekers/
libs/external/tmux
libs/external/.tmux
@ -72,13 +68,10 @@ libs/external/.tmux
.mypy_cache/
# Backup
backups/gz/
assets/repo/backups/gz/
backups/
*.bak
*.tmp
# Wiki (separate repo)
.github/wiki/
1
codex resume *

9
.gitmodules vendored

@ -1,9 +0,0 @@
[submodule "repo/.tmux"]
path = assets/repo/.tmux
url = https://github.com/gpakosz/.tmux.git
[submodule "repo/tmux"]
path = assets/repo/tmux
url = https://github.com/tmux/tmux.git
[submodule "repo/claude-official-skills"]
path = assets/repo/claude-official-skills
url = https://github.com/anthropics/skills.git
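To see at a glance which submodules this hunk removes, the file can be read as INI-style text. The sketch below is only an illustration: `list_submodules` is a hypothetical helper, and using Python's `configparser` is an assumption that holds for simple `.gitmodules` files like this one, not a full git-config parser.

```python
import configparser

# Sample text copied from the removed .gitmodules shown in this hunk.
GITMODULES_TEXT = """\
[submodule "repo/.tmux"]
path = assets/repo/.tmux
url = https://github.com/gpakosz/.tmux.git
[submodule "repo/tmux"]
path = assets/repo/tmux
url = https://github.com/tmux/tmux.git
[submodule "repo/claude-official-skills"]
path = assets/repo/claude-official-skills
url = https://github.com/anthropics/skills.git
"""


def list_submodules(text: str) -> dict[str, str]:
    """Return a mapping of submodule path -> url (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Each section is named like: submodule "repo/.tmux"
    return {parser[name]["path"]: parser[name]["url"] for name in parser.sections()}


if __name__ == "__main__":
    for path, url in list_submodules(GITMODULES_TEXT).items():
        print(f"{path} <- {url}")
```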

158
AGENTS.md

@ -7,22 +7,21 @@
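## 1. Mission & Scope (Goals and Boundaries)
### Allowed operations
- Read and modify top-level documents: `README.md`, `AGENTS.md`, `CONTRIBUTING.md`, etc.
- Read and modify documents and code under `assets/documents/`, `assets/prompts/`, `assets/skills/`, `assets/workflow/`, `assets/config/`, `assets/tools/`, and `assets/repo/`
- Read and modify documents and code under `i18n/` and `libs/`
- Run `make lint`, the backup scripts, and the prompts-library conversion tool
- Add or modify prompts, skills, and documents
- Create commits that follow the commit conventions
### Forbidden operations
- Modifying the CI configuration in `.github/workflows/` (unless the task explicitly requires it)
- Deleting or overwriting archive files in `assets/repo/backups/gz/`
- Deleting or overwriting archive files in `backups/gz/`
- Modifying `LICENSE` or `CODE_OF_CONDUCT.md`
- Hardcoding keys, tokens, or sensitive credentials in code
- Large-scale refactoring without confirmation
### Sensitive areas (no automated modification)
- `.github/workflows/*.yml` - CI/CD configuration
- `assets/repo/backups/gz/` - historical backup archives
- `backups/gz/` - historical backup archives
- `.env*` files (if present)
---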
@ -31,7 +30,7 @@
```bash
# 1. Pull the latest code
git pull --rebase origin develop
git pull origin main
# 2. Run the lint check
make lint
@ -45,7 +44,7 @@ make lint
# 5. Commit the changes
git add -A
git commit -m "feat|fix|docs|chore: scope - summary"
git push origin develop
git push
```
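The commit format quoted in the workflow above (`feat|fix|docs|chore: scope - summary`, extended later in AGENTS.md with `refactor` and `test`) can be checked mechanically before pushing. This is a minimal sketch: the regular expression and the helper name `is_valid_commit_message` are assumptions for illustration, not part of the repository.

```python
import re

# Types come from the formats quoted in AGENTS.md; the exact regex is an assumption.
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|chore|refactor|test): (?P<scope>\S.*?) - (?P<summary>\S.*)$"
)


def is_valid_commit_message(message: str) -> bool:
    """Check only the first line of the message against 'type: scope - summary'."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return COMMIT_RE.match(first_line) is not None


if __name__ == "__main__":
    for candidate in ("docs: readme - fix links", "update stuff"):
        print(candidate, "->", is_valid_commit_message(candidate))
```

A check like this could run from a local `commit-msg` hook; wiring it into CI would touch `.github/workflows/`, which this file treats as a sensitive area.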
---
@ -63,9 +62,9 @@ git push origin develop
|:---|:---|:---|
| `make help` | List all Make targets | None |
| `make lint` | Lint all Markdown files in the repository | Requires markdownlint-cli |
| `bash assets/repo/backups/一键备份.sh` | Create a full project backup | None |
| `python3 assets/repo/backups/快速备份.py` | Python version of the backup script | Python 3.8+ |
| `cd assets/repo/prompts-library && python3 main.py` | Convert prompt formats | pandas, openpyxl, PyYAML |
| `bash backups/一键备份.sh` | Create a full project backup | None |
| `python backups/快速备份.py` | Python version of the backup script | Python 3.8+ |
| `cd libs/external/prompts-library && python main.py` | Convert prompt formats | pandas, openpyxl, PyYAML |
### Conversion modes supported by prompts-library
1. Excel → Docs: convert an Excel workbook into a directory of Markdown documents
@ -80,20 +79,18 @@ git push origin develop
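### Architecture principles
- Keep the root directory flat and avoid monolithic files
- Three-layer content architecture: `assets/documents/` (knowledge) → `assets/prompts/` (instructions) → `assets/skills/` (capabilities)
- Multilingual assets live under `i18n/<lang>/` and follow the three-layer structure: documents / prompts / skills
- New languages follow the existing directory hierarchy
### Module boundaries
- `assets/documents/` - Chinese knowledge base (methodology / getting started / hands-on practice / resources)
- `assets/prompts/` - prompt entry point and cloud index
- `assets/skills/` - reusable skill library (one Skill per subdirectory)
- `assets/workflow/` - reusable workflow templates (automated development loop, etc.)
- `assets/config/` - tool and development configuration (for example, Codex CLI)
- `assets/tools/` - reserved: custom scripts and small tools (keep them replaceable and auditable)
- `assets/repo/` - external tools and dependencies (including Git submodules)
- `i18n/zh/` - Chinese primary corpus (default)
- `i18n/en/` - English version
- `libs/common/` - common modules
- `libs/external/` - external tools and dependencies
### Rules for adding dependencies
- When adding a tool or library, record how to install it, its minimum version, and its source
- Record external dependency sources under the `assets/repo/` directory
- Record external dependency sources under the `libs/external/` directory
- Third-party scripts must state their license and source
### Forbidden behavior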
@ -130,55 +127,14 @@ git push origin develop
.
├── README.md # 项目主文档
├── AGENTS.md # AI Agent 行为准则(本文件)
├── CLAUDE.md # Claude 模型上下文(合并在本文件末尾)
├── GEMINI.md # Gemini 模型上下文
├── Makefile # 自动化脚本
├── LICENSE # MIT 许可证
├── CODE_OF_CONDUCT.md # 行为准则
├── CONTRIBUTING.md # 贡献指南
├── .gitignore # Git 忽略规则
├── assets/ # 外部资源(指向在线表格)
│ ├── README.md # 远程表格索引(唯一真相源)
│ ├── AGENTS.md # assets/ 目录规则
│ ├── config/ # 工具与开发配置
│ │ └── .codex/ # Codex CLI 配置(项目级)
│ │ ├── config.toml # Codex CLI 配置文件
│ │ └── AGENTS.md # Codex/Agent 指南(本目录)
│ ├── documents/ # 文档库
│ │ ├── 05-哲学与方法论/ # 最高思想纲领与方法论
│ │ ├── 00-基础指南/ # 核心原则与底层逻辑
│ │ ├── 01-入门指南/ # 从零开始教程
│ │ ├── 02-方法论/ # 具体工具与技巧
│ │ └── 03-实战/ # 项目实战案例
│ ├── prompts/ # 提示词库(指向云端表格)
│ │ ├── README.md # 在线表格链接
│ │ └── AGENTS.md # prompts/ 目录规则
│ ├── skills/ # 技能库(扁平化,详见 assets/skills/README.md
│ │ ├── README.md # skills 总览与索引
│ │ ├── AGENTS.md # skills/ 目录规则
│ │ ├── skills-skills/ # 元技能核心
│ │ ├── sop-generator/ # SOP 生成
│ │ ├── canvas-dev/ # Canvas白板驱动开发
│ │ └── ... # 更多技能
│ ├── tools/ # 工具目录(预留)
│ │ └── .gitkeep # 保持空目录被 Git 追踪
│ ├── workflow/ # 工作流模板
│ │ ├── auto-dev-loop/ # 自动开发循环
│ │ └── canvas-dev/ # Canvas白板驱动开发
│ └── repo/ # 外部工具与依赖镜像(含 Git submodule
│ ├── README.md # 外部工具索引
│ ├── AGENTS.md # assets/repo/ 目录规则
│ ├── prompts-library/ # Excel ↔ Markdown 互转工具
│ ├── chat-vault/ # AI 聊天记录保存工具
│ ├── Skill_Seekers-development/ # Skills 制作器
│ ├── html-tools-main/ # HTML 工具集
│ ├── my-nvim/ # Neovim 配置
│ ├── MCPlayerTransfer/ # MC 玩家迁移工具
│ ├── XHS-image-to-PDF-conversion/ # 小红书图片转 PDF
│ ├── backups/ # 历史备份脚本快照
│ ├── .tmux/ # oh-my-tmux (submodule)
│ ├── tmux/ # tmux 源码 (submodule)
│ └── claude-official-skills/ # Claude 官方 skills (submodule)
├── .github/ # GitHub 配置
│ ├── workflows/ # CI/CD 工作流
│ │ ├── ci.yml # Markdown lint + link checker
@ -189,15 +145,62 @@ git push origin develop
│ ├── SECURITY.md # 安全政策
│ ├── FUNDING.yml # 赞助配置
│ └── wiki/ # GitHub Wiki 内容
├── i18n/ # 多语言资产 (27 种语言)
│ ├── README.md # 多语言索引
│ ├── zh/ # 中文主语料
│ │ ├── documents/ # 文档库
│ │ │ ├── -01-哲学与方法论/ # 最高思想纲领与方法论
│ │ │ ├── 00-基础指南/ # 核心原则与底层逻辑
│ │ │ ├── 01-入门指南/ # 从零开始教程
│ │ │ ├── 02-方法论/ # 具体工具与技巧
│ │ │ ├── 03-实战/ # 项目实战案例
│ │ │ └── 04-资源/ # 外部资源聚合
│ │ ├── prompts/ # 提示词库
│ │ │ ├── 00-元提示词/ # 生成提示词的提示词
│ │ │ ├── 01-系统提示词/ # AI 系统级提示词
│ │ │ ├── 02-编程提示词/ # 编程相关提示词
│ │ │ └── 03-用户提示词/ # 用户自定义提示词
│ │ └── skills/ # 技能库
│ │ ├── 00-元技能/ # 生成技能的元技能
│ │ │ ├── claude-skills/ # 元技能核心
│ │ │ └── sop-generator/ # SOP 生成与规范化技能
│ │ ├── 01-AI工具/ # AI CLI 和工具
│ │ ├── 02-数据库/ # 数据库技能
│ │ ├── 03-加密货币/ # 加密货币/量化交易
│ │ └── 04-开发工具/ # 通用开发工具
│ ├── en/ # 英文版本(结构同 zh/
│ └── ... # 其他语言骨架
├── libs/ # 核心库代码
│ ├── common/ # 通用模块
│ │ ├── models/ # 模型定义
│ │ └── utils/ # 工具函数
│ ├── database/ # 数据库模块(预留)
│ └── external/ # 外部工具
│ ├── prompts-library/ # Excel ↔ Markdown 互转工具
│ ├── chat-vault/ # AI 聊天记录保存工具
│ ├── Skill_Seekers-development/ # Skills 制作器
│ ├── html-tools-main/ # HTML 工具集Markdown 编辑器、任务卡片生成等)
│ ├── l10n-tool/ # 多语言翻译脚本
│ ├── my-nvim/ # Neovim 配置
│ ├── MCPlayerTransfer/ # MC 玩家迁移工具
│ └── XHS-image-to-PDF-conversion/ # 小红书图片转 PDF
└── backups/ # 备份脚本与存档
├── 一键备份.sh # Shell 备份脚本
├── 快速备份.py # Python 备份脚本
├── README.md # 备份说明
└── gz/ # 压缩存档目录
```
### 关键入口文件
- `README.md` - 项目主文档,面向人类开发者
- `AGENTS.md` - AI Agent 操作手册(本文件)
- `assets/repo/prompts-library/main.py` - 提示词转换工具入口
- `assets/repo/backups/一键备份.sh` - 备份脚本入口
- `assets/skills/tmux-autopilot/` - tmux 自动化操控技能(基于 oh-my-tmux含 capture-pane/send-keys/蜂群巡检脚本)
- `assets/skills/sop-generator/` - SOP 生成与规范化技能(输入资料/需求 -> 标准 SOP
- `libs/external/prompts-library/main.py` - 提示词转换工具入口
- `backups/一键备份.sh` - 备份脚本入口
- `i18n/zh/skills/04-开发工具/tmux-autopilot/` - tmux 自动化操控技能(基于 oh-my-tmux含 capture-pane/send-keys/蜂群巡检脚本)
- `i18n/zh/skills/00-元技能/sop-generator/` - SOP 生成与规范化技能(输入资料/需求 -> 标准 SOP
---
@ -207,9 +210,8 @@ git push origin develop
|:---|:---|:---|
| `make lint` fails | markdownlint-cli is not installed | `npm install -g markdownlint-cli` |
| prompts-library raises an error | Missing Python dependencies | `pip install pandas openpyxl PyYAML rich InquirerPy` |
| CI markdown-lint fails | `.github/lint_config.json` is missing | TODO: add `.github/lint_config.json` or adjust the lint command in `.github/workflows/ci.yml` (requires explicit task authorization) |
| CI link-checker fails | Documents contain broken links | Check and fix the links in the Markdown files |
| Backup script lacks permission | The shell script is not executable | `chmod +x assets/repo/backups/一键备份.sh` |
| Backup script lacks permission | The shell script is not executable | `chmod +x backups/一键备份.sh` |
---
@ -251,12 +253,13 @@ feat|fix|docs|chore|refactor|test: scope - summary
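**Any change to a feature, command, configuration, directory, or workflow must also be reflected in:**
- `README.md` - for human developers
- `AGENTS.md` - for AI agents (this file)
- `GEMINI.md` - Gemini model context
**Mark anything uncertain with TODO; guessing is not allowed.**
---
# Claude context (merged into this file)
# CLAUDE.md
This section provides project context for Claude-series models.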
@ -268,25 +271,24 @@ feat|fix|docs|chore|refactor|test: scope - summary
```bash
# Convert the prompt library
cd assets/repo/prompts-library && python3 main.py
cd libs/external/prompts-library && python3 main.py
# Lint all Markdown files
make lint
# Create a full project backup
bash assets/repo/backups/一键备份.sh
bash backups/一键备份.sh
```
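For readers who want to see what a "full project backup" into a `gz/` archive directory might involve, here is a minimal sketch. It is not the content of `一键备份.sh` or `快速备份.py`, which this diff does not show; the function name `create_backup`, the archive naming scheme, and the choice to skip `.git` are all assumptions.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def create_backup(project_root: Path, archive_dir: Path) -> Path:
    """Write a timestamped .tar.gz of project_root into archive_dir (hypothetical helper)."""
    project_root = project_root.resolve()
    archive_dir = archive_dir.resolve()
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = archive_dir / f"backup_{stamp}.tar.gz"  # naming scheme is an assumption
    with tarfile.open(target, "w:gz") as tar:
        for path in sorted(project_root.rglob("*")):
            # Never archive existing archives (or the file being written right now).
            if archive_dir in path.parents:
                continue
            relative = path.relative_to(project_root)
            # Skipping Git metadata is an assumption; drop this check to include it.
            if ".git" in relative.parts:
                continue
            if path.is_file():
                tar.add(path, arcname=relative.as_posix())
    return target


if __name__ == "__main__":
    print(create_backup(Path("."), Path("backups/gz")))
```

Because existing archives are only skipped and never rewritten, this stays consistent with the rule that files in the `gz/` archive directory must not be deleted or overwritten.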
## Architecture & Structure
### Core Directories
- **`assets/prompts/`**: prompt library entry point (points to the cloud spreadsheet)
- **`assets/skills/`**: flattened skill library (see assets/skills/README.md)
- **`assets/documents/`**: knowledge base (05-哲学与方法论, 00-基础指南, 01-入门指南, 02-方法论, 03-实战)
- **`assets/`**: entry point and usage notes for external resources (online spreadsheets)
- **`assets/repo/prompts-library/`**: Excel ↔ Markdown conversion tool
- **`assets/repo/chat-vault/`**: tool for saving AI chat history
- **`assets/repo/backups/`**: backup scripts and archives
- **`i18n/zh/prompts/`**: core prompt library (00-元提示词, 01-系统提示词, 02-编程提示词, 03-用户提示词)
- **`i18n/zh/skills/`**: modular skill library (00-元技能, 01-AI工具, 02-数据库, 03-加密货币, 04-开发工具)
- **`i18n/zh/documents/`**: knowledge base (-01-哲学与方法论, 00-基础指南, 01-入门指南, 02-方法论, 03-实战, 04-资源)
- **`libs/external/prompts-library/`**: Excel ↔ Markdown conversion tool
- **`libs/external/chat-vault/`**: tool for saving AI chat history
- **`backups/`**: backup scripts and archives
### Key Technical Details
1. **Prompt Organization**: prompts are categorized with a `(row,col)_` prefix
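As a rough illustration of the `(row,col)_` prefix convention, the sketch below splits a prompt file name into its grid position and title. The assumptions are that `row` and `col` are non-negative integers and that everything after the underscore is the title; `parse_prompt_name` and the sample file name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "(row,col)_title", with integer row and col.
PREFIX_RE = re.compile(r"^\((?P<row>\d+),(?P<col>\d+)\)_(?P<title>.+)$")


class PromptKey(NamedTuple):
    row: int
    col: int
    title: str


def parse_prompt_name(name: str) -> Optional[PromptKey]:
    """Return the parsed key, or None when the name has no (row,col)_ prefix."""
    match = PREFIX_RE.match(name)
    if match is None:
        return None
    return PromptKey(int(match["row"]), int(match["col"]), match["title"])


if __name__ == "__main__":
    print(parse_prompt_name("(3,2)_example.md"))  # hypothetical file name
    print(parse_prompt_name("README.md"))
```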
@ -303,7 +305,7 @@ bash assets/repo/backups/一键备份.sh
---
# Gemini context (merged into this file)
# GEMINI.md - Project Context Document
## Project Overview


@ -16,7 +16,7 @@ help:
lint:
@echo "Linting markdown files..."
@npm install -g markdownlint-cli
@markdownlint --config .github/lint_config.json '**/*.md'
@markdownlint **/*.md
build:
@echo "Building the project..."
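The two `markdownlint` invocations in this hunk differ in who expands the glob. With quotes, the pattern `**/*.md` reaches markdownlint unexpanded. Without quotes, the shell that `make` runs expands it first, and in a POSIX `sh` without globstar `**` behaves like a single `*`. The sketch below approximates that difference with `pathlib`; the helper names are hypothetical and the shell behaviour is an assumption about the default `/bin/sh`.

```python
from pathlib import Path


def one_level_matches(root: Path) -> list[str]:
    """Approximate an unquoted **/*.md in plain sh: exactly one directory level."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/*.md"))


def recursive_matches(root: Path) -> list[str]:
    """Approximate a recursive **/*.md expansion: any depth, root included."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))


if __name__ == "__main__":
    root = Path(".")
    skipped = set(recursive_matches(root)) - set(one_level_matches(root))
    print(f"{len(skipped)} Markdown files would be skipped by the one-level pattern")
```

Under these assumptions, the unquoted form would lint neither `README.md` at the root nor files nested more than one directory deep.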

485
README.md

@ -10,6 +10,8 @@
<div align="center">
[中文](./README.md) | [English](./i18n/en/README.md)
# Vibe Coding 指南
**一个通过与 AI 结对编程,将想法变为现实的终极工作站**
@ -24,41 +26,41 @@
<a href="LICENSE"><img src="https://img.shields.io/github/license/tukuaiai/vibe-coding-cn?label=%E8%AE%B8%E5%8F%AF%E8%AF%81&style=for-the-badge" alt="许可证"></a>
<a href="https://github.com/tukuaiai/vibe-coding-cn"><img src="https://img.shields.io/github/languages/top/tukuaiai/vibe-coding-cn?label=%E4%B8%BB%E8%A6%81%E8%AF%AD%E8%A8%80&style=for-the-badge" alt="主要语言"></a>
<a href="https://github.com/tukuaiai/vibe-coding-cn"><img src="https://img.shields.io/github/languages/code-size/tukuaiai/vibe-coding-cn?label=%E4%BB%A3%E7%A0%81%E9%87%8F&style=for-the-badge" alt="代码量"></a>
<a href="https://x.com/123olp"><img src="https://img.shields.io/badge/X-@开发者的X-black?style=for-the-badge&logo=x" alt="X"></a>
<a href="https://x.com/123olp"><img src="https://img.shields.io/badge/X-@123olp-black?style=for-the-badge&logo=x" alt="X"></a>
<a href="https://t.me/glue_coding"><img src="https://img.shields.io/badge/聊天-Telegram-blue?style=for-the-badge&logo=telegram" alt="交流群"></a>
</p>
<!-- 资源直达 - 按重要性分组 -->
<!-- 🔴 核心理念 (红色系) -->
<p>
<a href="./assets/documents/05-哲学与方法论/README.md"><img src="https://img.shields.io/badge/🔮_哲学方法论-底层协议-purple?style=for-the-badge" alt="哲学与方法论"></a>
<a href="./assets/documents/01-入门指南/00-Vibe%20Coding%20哲学原理.md"><img src="https://img.shields.io/badge/🧠_核心哲学-必读-crimson?style=for-the-badge" alt="核心哲学"></a>
<a href="./assets/documents/00-基础指南/胶水编程.md"><img src="https://img.shields.io/badge/🧬_胶水编程-银弹-red?style=for-the-badge" alt="胶水编程"></a>
<a href="./assets/documents/02-方法论/图形化AI协作-Canvas白板驱动开发.md"><img src="https://img.shields.io/badge/🎨_Canvas白板-图形驱动-orange?style=for-the-badge" alt="Canvas白板驱动开发"></a>
<a href="./assets/documents/01-入门指南/README.md"><img src="https://img.shields.io/badge/🚀_从零开始-新手入门-red?style=for-the-badge" alt="从零开始"></a>
<a href="./assets/documents/00-基础指南/血的教训.md"><img src="https://img.shields.io/badge/🩸_血的教训-必看-red?style=for-the-badge" alt="血的教训"></a>
<a href="./assets/documents/00-基础指南/语言层要素.md"><img src="https://img.shields.io/badge/📊_语言层要素-12层框架-gold?style=for-the-badge" alt="语言层要素"></a>
<a href="./assets/documents/00-基础指南/常见坑汇总.md"><img src="https://img.shields.io/badge/🕳_常见坑-避坑指南-yellow?style=for-the-badge" alt="常见坑汇总"></a>
<a href="./assets/documents/00-基础指南/强前置条件约束.md"><img src="https://img.shields.io/badge/🚫_硬约束-铁律-darkred?style=for-the-badge" alt="强前置条件约束"></a>
<a href="./assets/README.md"><img src="https://img.shields.io/badge/📡_信息源-聚合-teal?style=for-the-badge" alt="信息源聚合"></a>
<a href="./assets/documents/00-基础指南/A%20Formalization%20of%20Recursive%20Self-Optimizing%20Generative%20Systems.md"><img src="https://img.shields.io/badge/📐_元方法论-递归优化-darkorange?style=for-the-badge" alt="元方法论"></a>
<a href="./assets/documents/00-基础指南/编程之道.md"><img src="https://img.shields.io/badge/🧭_编程之道-道法术-orange?style=for-the-badge" alt="编程之道"></a>
<a href="./assets/documents/03-实战/README.md"><img src="https://img.shields.io/badge/🎬_实战案例-项目实操-orange?style=for-the-badge" alt="实战案例"></a>
<a href="./assets/README.md"><img src="https://img.shields.io/badge/🛠_工具集-速查-teal?style=for-the-badge" alt="工具集"></a>
<a href="./assets/prompts/"><img src="https://img.shields.io/badge/💬_提示词-精选-purple?style=for-the-badge" alt="提示词精选"></a>
<a href="./assets/skills/"><img src="https://img.shields.io/badge/⚡_Skills-技能大全-forestgreen?style=for-the-badge" alt="skills技能大全"></a>
<a href="./i18n/zh/documents/-01-哲学与方法论/README.md"><img src="https://img.shields.io/badge/🔮_哲学方法论-底层协议-purple?style=for-the-badge" alt="哲学与方法论"></a>
<a href="./i18n/zh/documents/01-入门指南/00-Vibe%20Coding%20哲学原理.md"><img src="https://img.shields.io/badge/🧠_核心哲学-必读-crimson?style=for-the-badge" alt="核心哲学"></a>
<a href="./i18n/zh/documents/00-基础指南/胶水编程.md"><img src="https://img.shields.io/badge/🧬_胶水编程-银弹-red?style=for-the-badge" alt="胶水编程"></a>
<a href="./i18n/zh/documents/02-方法论/图形化AI协作-Canvas白板驱动开发.md"><img src="https://img.shields.io/badge/🎨_Canvas白板-图形驱动-orange?style=for-the-badge" alt="Canvas白板驱动开发"></a>
<a href="./i18n/zh/documents/01-入门指南/README.md"><img src="https://img.shields.io/badge/🚀_从零开始-新手入门-red?style=for-the-badge" alt="从零开始"></a>
<a href="./i18n/zh/documents/00-基础指南/血的教训.md"><img src="https://img.shields.io/badge/🩸_血的教训-必看-red?style=for-the-badge" alt="血的教训"></a>
<a href="./i18n/zh/documents/00-基础指南/语言层要素.md"><img src="https://img.shields.io/badge/📊_语言层要素-12层框架-gold?style=for-the-badge" alt="语言层要素"></a>
<a href="./i18n/zh/documents/00-基础指南/常见坑汇总.md"><img src="https://img.shields.io/badge/🕳_常见坑-避坑指南-yellow?style=for-the-badge" alt="常见坑汇总"></a>
<a href="./i18n/zh/documents/00-基础指南/强前置条件约束.md"><img src="https://img.shields.io/badge/🚫_硬约束-铁律-darkred?style=for-the-badge" alt="强前置条件约束"></a>
<a href="./i18n/zh/documents/04-资源/外部资源聚合.md"><img src="https://img.shields.io/badge/📡_信息源-聚合-teal?style=for-the-badge" alt="信息源聚合"></a>
<a href="./i18n/zh/documents/00-基础指南/A%20Formalization%20of%20Recursive%20Self-Optimizing%20Generative%20Systems.md"><img src="https://img.shields.io/badge/📐_元方法论-递归优化-darkorange?style=for-the-badge" alt="元方法论"></a>
<a href="./i18n/zh/documents/00-基础指南/编程之道.md"><img src="https://img.shields.io/badge/🧭_编程之道-道法术-orange?style=for-the-badge" alt="编程之道"></a>
<a href="./i18n/zh/documents/03-实战/README.md"><img src="https://img.shields.io/badge/🎬_实战案例-项目实操-orange?style=for-the-badge" alt="实战案例"></a>
<a href="./i18n/zh/documents/04-资源/工具集.md"><img src="https://img.shields.io/badge/🛠_工具集-速查-teal?style=for-the-badge" alt="工具集"></a>
<a href="./i18n/zh/prompts/"><img src="https://img.shields.io/badge/💬_提示词-精选-purple?style=for-the-badge" alt="提示词精选"></a>
<a href="./i18n/zh/skills/"><img src="https://img.shields.io/badge/⚡_Skills-技能大全-forestgreen?style=for-the-badge" alt="skills技能大全"></a>
<a href="https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203"><img src="https://img.shields.io/badge/📋_提示词-在线表格-blue?style=for-the-badge" alt="提示词在线表格"></a>
<a href="https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools"><img src="https://img.shields.io/badge/🔧_系统提示词-仓库-slateblue?style=for-the-badge" alt="系统提示词仓库"></a>
<a href="./assets/repo/chat-vault/README.md"><img src="https://img.shields.io/badge/🔐_Chat_Vault-AI会话库-gold?style=for-the-badge" alt="Chat Vault"></a>
<a href="./libs/external/chat-vault/README_CN.md"><img src="https://img.shields.io/badge/🔐_Chat_Vault-AI会话库-gold?style=for-the-badge" alt="Chat Vault"></a>
</p>
[📋 工具与资源](#tools)
[🚀 从零开始](#getting-started)
[🎯 原仓库翻译](#translation)
[⚙️ 完整设置流程](#setup)
[📞 联系方式](#contact)
[✨ 支持项目](#support)
[🤝 参与贡献](#contributing)
[📋 工具资源](#-器-工具与资源)
[🚀 从零开始](#-从零开始)
[🎯 原仓库翻译](#-原仓库翻译)
[⚙️ 完整设置流程](#-完整设置流程)
[📞 联系方式](#-联系方式)
[✨ 支持项目](#-支持项目)
[🤝 参与贡献](#-参与贡献)
本仓库的 AI 解读链接:[zread.ai/tukuaiai/vibe-coding-cn](https://zread.ai/tukuaiai/vibe-coding-cn/1-overview)
@ -66,20 +68,14 @@
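## 🎲 Preface
**This is a project that keeps growing and negating itself. All current experience and skills may lose their meaning as AI capabilities change, so always keep an AI-first mindset and take this universe-scale transformation seriously. Any experience may become invalid; read everything critically 🙏🙏🙏**
**Vibe Coding** is the ultimate workflow for pair programming with AI, designed to help developers smoothly turn ideas into reality. This guide covers the whole process from project conception, technology selection, and implementation planning to development, debugging, and scaling. It is built around **planning-driven** development, **modularization**, and **index building** (a strategy that arises from the limits of the model context window), so that the AI does not run out of control and leave the project in chaos.
Vibe Coding is a development style driven by natural language in which an LLM generates most of the code. It advocates "first build something that runs while staying in the flow", producing prototypes quickly at a very low barrier to entry, at the cost of controllability and reliability risks. The term was first proposed by computer scientist [Andrej Karpathy](https://x.com/karpathy).
> **Core idea**: *Planning is everything.* Be cautious about letting the AI plan autonomously at the global level, or your codebase will turn into an unmanageable tangle.
**Note**: The experience shared below is not universally applicable; adopt it critically according to your own scenario (click a heading to expand or collapse its content).
**This is a project that keeps growing and negating itself. All current experience and skills may lose their meaning as AI capabilities evolve, so always keep an AI-first mindset and do not get stuck in old habits. Any experience may become invalid; read everything critically 🙏🙏🙏**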
---
<a id="getting-started"></a>
<details>
<summary><strong>1 分钟快速开始</strong></summary>
<summary><strong>⚡ 5 分钟快速开始</strong></summary>
## ⚡ 1 分钟快速开始
## ⚡ 5 分钟快速开始
> 已有网络和开发环境?直接开始 Vibe Coding
@ -103,62 +99,30 @@
**第 2 步**:跟着 AI 的指导,把想法变成现实 🚀
**就这么简单!** 更多内容(从零开始)请继续阅读 👇
**就这么简单!** 更多进阶内容请继续阅读 👇
</details>
---
## 🚀 从零开始
完全新手?按顺序完成以下步骤:
0. [00-Vibe Coding 哲学原理](./assets/documents/01-入门指南/00-Vibe%20Coding%20哲学原理.md) - 理解核心理念
1. [01-网络环境配置](./assets/documents/01-入门指南/01-网络环境配置.md) - 配置网络访问
2. [02-开发环境搭建](./assets/documents/01-入门指南/02-开发环境搭建.md) - 复制提示词给 AI让 AI 指导你搭建环境
3. [03-IDE配置](./assets/documents/01-入门指南/03-IDE配置.md) - 配置 VS Code 编辑器
4. [04-OpenCode-CLI配置](./assets/documents/01-入门指南/04-OpenCode-CLI配置.md) - 免费 AI CLI 工具,支持 GLM-4.7/MiniMax M2.1 等模型
0. [00-Vibe Coding 哲学原理](./i18n/zh/documents/01-入门指南/00-Vibe%20Coding%20哲学原理.md) - 理解核心理念
1. [01-网络环境配置](./i18n/zh/documents/01-入门指南/01-网络环境配置.md) - 配置网络访问
2. [02-开发环境搭建](./i18n/zh/documents/01-入门指南/02-开发环境搭建.md) - 复制提示词给 AI让 AI 指导你搭建环境
3. [03-IDE配置](./i18n/zh/documents/01-入门指南/03-IDE配置.md) - 配置 VS Code 编辑器
4. [04-OpenCode-CLI配置](./i18n/zh/documents/01-入门指南/04-OpenCode-CLI配置.md) - 免费 AI CLI 工具,支持 GLM-4.7/MiniMax M2.1 等模型
</details>
---
<details>
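<summary><strong>🧪 Experimental Methods</strong></summary>
## 🧪 Experimental Methods
> Below are experimental methods and paradigms that "may be overturned and rewritten at any time": skim them first, and dig deeper only if they look useful to you.
**Suggested reading order (from abstract to concrete)**
1. 🔑 Meta-methodology: use a recursive "generator/optimizer" loop to let the system evolve itself
2. 🧬 Glue coding: reuse mature wheels and focus attention on "how things connect"
3. 🎨 Canvas whiteboard-driven development: make the whiteboard the single source of truth to reduce collaboration and context costs
4. 🐝 AI swarm collaboration: let multiple AIs perceive each other, cooperate, and divide work under tmux
5. 🔮 Philosophy and methodology toolbox: turn abstract methodology into verifiable, iterable engineering actions
<details>
<summary><strong>🔑 Meta-methodology</strong></summary>
> In one sentence: use a recursive "generator/optimizer" loop to build an AI system that keeps optimizing itself.
>
> Further reading: [A Formalization of Recursive Self-Optimizing Generative Systems](./assets/documents/00-基础指南/A%20Formalization%20of%20Recursive%20Self-Optimizing%20Generative%20Systems.md)
### Core roles
- **α-prompt (generator)**: a "parent" prompt whose sole responsibility is to generate other prompts or skills.
- **Ω-prompt (optimizer)**: another "parent" prompt whose sole responsibility is to optimize other prompts or skills.
### Recursive lifecycle (minimal closed loop)
1. **Bootstrap**: use AI to generate the initial versions (v1) of the `α-prompt` and the `Ω-prompt`.
2. **Self-Correction & Evolution**: use `Ω-prompt v1` to optimize `α-prompt v1`, producing a stronger `α-prompt v2`.
3. **Generation**: use the evolved `α-prompt v2` to generate the target prompts and skills.
4. **Recursive Loop**: feed the new artifacts (even a new version of the `Ω-prompt`) back into the system and use them to optimize the `α-prompt` again, starting continuous evolution.
### Ultimate goal
- Through a continuous recursive optimization loop, the system surpasses itself in each iteration and approaches the predefined target state.
</details>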
<details>
<details open>
<summary><strong>🧬 胶水编程 (Glue Coding)</strong></summary>
> 一句话:能抄不写,能连不造,能复用不原创。
> **软件工程的圣杯与银弹**
胶水编程是 Vibe Coding 的终极进化形态,目标是把注意力从“造轮子”迁移到“连接方式”,从而缓解三大致命缺陷:
胶水编程是 Vibe Coding 的终极进化形态,可能完美解决三大致命缺陷:
| 问题 | 解法 |
|:---|:---|
@ -166,14 +130,16 @@
| 🧩 复杂性爆炸 | ✅ 每个模块都是久经考验的轮子 |
| 🎓 门槛过高 | ✅ 你只需要描述"连接方式" |
👉 [深入了解胶水编程](./assets/documents/00-基础指南/胶水编程.md)
**核心理念**:能抄不写,能连不造,能复用不原创。
👉 [深入了解胶水编程](./i18n/zh/documents/00-基础指南/胶水编程.md)
</details>
<details>
<details open>
<summary><strong>🎨 Canvas白板驱动开发</strong></summary>
> 一句话:让白板成为单一真相源,用“图形”降低协作与上下文成本。
> **图形化AI协作的新范式**
传统开发:代码 → 口头沟通 → 脑补架构 → 代码失控
@ -187,14 +153,14 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
**核心理念**:图形是第一公民,代码是白板的序列化形式。
👉 [深入了解Canvas白板驱动开发](./assets/documents/02-方法论/图形化AI协作-Canvas白板驱动开发.md)
👉 [深入了解Canvas白板驱动开发](./i18n/zh/documents/02-方法论/图形化AI协作-Canvas白板驱动开发.md)
</details>
<details>
<details open>
<summary><strong>🐝 AI蜂群协作</strong></summary>
> 一句话:把多个 AI 变成“可互相感知与协作的集群”,人从瓶颈变为调度者。
> **基于 tmux 的多 AI Agent 协作系统**
传统模式:人 ←→ AI₁, 人 ←→ AI₂, 人 ←→ AI₃ (人是瓶颈)
@ -208,14 +174,14 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
**核心突破**AI 不再是孤立的,而是可以互相感知、通讯、控制的集群。
👉 [深入了解AI蜂群协作](./assets/documents/02-方法论/AI蜂群协作-tmux多Agent协作系统.md)
👉 [深入了解AI蜂群协作](./i18n/zh/documents/02-方法论/AI蜂群协作-tmux多Agent协作系统.md)
</details>
<details>
<details open>
<summary><strong>🔮 哲学方法论工具箱</strong></summary>
> 一句话:把抽象方法论落到可验证、可迭代、可收敛的工程产出。
> **把 Vibe 系统化为可验证、可迭代、可收敛的工程产出**
23 种哲学方法论 + Python 工具 + 可复制提示词,覆盖:
@ -229,46 +195,99 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
**核心理念**:哲学不是空谈,是可落地的工程方法。
👉 [深入了解哲学方法论工具箱](./assets/documents/05-哲学与方法论/README.md)
👉 [深入了解哲学方法论工具箱](./i18n/zh/documents/-01-哲学与方法论/README.md)
</details>
---
## 🖼️ 概览
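**Vibe Coding** is the ultimate workflow for pair programming with AI, designed to help developers smoothly turn ideas into reality. This guide covers the whole process from project conception, technology selection, and implementation planning to development, debugging, and scaling, and it is built around **planning-driven** development and **modularization** so that the AI does not run out of control and leave the project in chaos.
> **Core idea**: *Planning is everything.* Be cautious about letting the AI plan autonomously, or your codebase will turn into an unmanageable tangle.
**Note**: The experience shared below is not universally applicable; adopt it critically according to your own scenario.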
<details open>
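<summary><strong>🔑 Meta-Methodology</strong></summary>
The core of this idea is to build an AI system capable of **self-optimization**. Its recursive nature can be broken down into the following steps:
> Further reading: [A Formalization of Recursive Self-Optimizing Generative Systems](./i18n/zh/documents/00-基础指南/A%20Formalization%20of%20Recursive%20Self-Optimizing%20Generative%20Systems.md)
#### 1. Define the core roles:
* **α-prompt (generator)**: a "parent" prompt whose sole responsibility is to **generate** other prompts or skills.
* **Ω-prompt (optimizer)**: another "parent" prompt whose sole responsibility is to **optimize** other prompts or skills.
#### 2. Describe the recursive lifecycle:
1. **Bootstrap**:
* Use AI to generate the initial versions (v1) of the `α-prompt` and the `Ω-prompt`.
2. **Self-Correction & Evolution**:
* Use `Ω-prompt (v1)` to **optimize** `α-prompt (v1)`, yielding a more powerful `α-prompt (v2)`.
3. **Generation**:
* Use the **evolved** `α-prompt (v2)` to generate all required target prompts and skills.
4. **Recursive Loop**:
* Feed the newly generated, more powerful artifacts (even a new version of the `Ω-prompt`) back into the system and use them to optimize the `α-prompt` again, starting continuous evolution.
#### 3. Ultimate goal:
Through this continuous **recursive optimization loop**, the system achieves **self-transcendence** in each iteration and approaches the predefined **target state** ever more closely.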
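The lifecycle above can be traced with a toy loop. This is only a sketch under stated assumptions: `optimize` and `generate` are hypothetical stand-ins for LLM calls (here they just bump a version number and annotate the text), and refining the optimizer with itself in step 4 is one possible reading of "feeding artifacts back".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prompt:
    name: str
    version: int
    text: str


def optimize(optimizer: Prompt, target: Prompt) -> Prompt:
    """Stand-in for an LLM call: return the next version of `target`."""
    note = f"[refined by {optimizer.name} v{optimizer.version}]"
    return replace(target, version=target.version + 1, text=f"{target.text} {note}")


def generate(generator: Prompt, goal: str) -> Prompt:
    """Stand-in for an LLM call: produce a new prompt for `goal`."""
    text = f"prompt for '{goal}' from {generator.name} v{generator.version}"
    return Prompt(name=goal, version=1, text=text)


def evolve(alpha: Prompt, omega: Prompt, goals: list[str], rounds: int):
    artifacts: list[Prompt] = []
    for _ in range(rounds):
        alpha = optimize(omega, alpha)                   # self-correction & evolution
        artifacts = [generate(alpha, g) for g in goals]  # generation
        omega = optimize(omega, omega)                   # recursive loop (assumed form)
    return alpha, omega, artifacts


if __name__ == "__main__":
    alpha = Prompt("α", 1, "generate prompts")  # bootstrap: both start at v1
    omega = Prompt("Ω", 1, "optimize prompts")
    alpha, omega, artifacts = evolve(alpha, omega, ["code review"], rounds=2)
    print(alpha.version, omega.version, artifacts[0].text)
```

In a real setup each stand-in would call a model and the loop would need a stopping criterion tied to the target state; neither is specified in the source.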
</details>
<details>
<summary><strong>🧭 经验</strong></summary>
<details open>
<summary><strong>🧭 方法论精要 (道·法·术)</strong></summary>
## 🧭 经验
## 🧭
* **状态,变换;数据,函数;输入,处理,输出;抽象/收敛,展开;可解释性;层级;过程;全称/特称,肯定/否定**
* **明确任务中的:目的,对象,约束**
* **人下 AI 上**
* **凡是 AI 能做的,就不要人工做**
* **一切问题问 AI**
* **目的主导:开发过程中的一切动作围绕"目的"展开**
* **上下文是 vibe coding 的第一性要素,垃圾进,垃圾出**
* **系统性思考,从 实体,链接,功能/目的 开始**
* **数据与函数是编程的一切**
* **先结构,后代码**
* **使用帕累托法则关注重要的那20%**
* **逆向思考,先明确你的需求,从满足需求为起点构建代码**
* **重复,多尝试几次**
* **模仿优先,不重复造轮子,先问 AI 有没有合适的仓库下载下来改glue coding 基于 vibe coding全新的方法**
* **系统性思考,实体,链接,功能/目的,三个维度**
* **数据与函数即是编程的一切**
* **输入,处理,输出刻画整个过程**
* **多问 AI 是什么?,为什么?,怎么做?(黄金圈法则)**
* **先结构,后代码,一定要规划好框架,不然后面技术债还不完**
* **奥卡姆剃刀定理,如无必要,勿增代码**
* **帕累托法则关注重要的那20%**
* **逆向思考,先明确你的需求,从需求逆向构建代码**
* **重复,多试几次,实在不行重新开个窗口,**
* **专注,极致的专注可以击穿代码,一次只做一件事(神人除外)**
## 🧩 法
* **一句话目标 + 非目标**
* **正交性(这个分场景)**
* **能抄不写,不重复造轮子,先问 AI 有没有合适的仓库下载下来改glue coding全新范式**
* **一定要看官方文档,先把官方文档爬下来喂给 AI让 AI 找工具下载到本地)**
* **按职责拆模块**
* **接口先行,实现后补**
* **一次只改一个模块**
* **文档即上下文,不是事后补**
* **明确写清:能改什么,不能改什么**
* **Debug 只给:预期 vs 实际 + 最小复现**
* **测试可交给 AI断言人审**
## 🛠️ 术
* 明确写清:**能改什么,不能改什么**
* Debug 只给:**预期 vs 实际 + 最小复现**
* 测试可交给 AI**断言人审**
* 代码一多就**切会话**
* **AI 犯的错误使用提示词整理为经验持久化存储遇到问题始终无法解决就让AI检索这个收集的问题然后寻找解决方案**
</details>
<a id="tools"></a>
<details open>
<summary><strong>📋 器 (工具与资源)</strong></summary>
<details>
<summary><strong>📋 工具与资源</strong></summary>
## 📋 工具与资源
## 📋 器
### 集成开发环境 (IDE) & 终端
@ -311,12 +330,12 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
* [**第三方系统提示词学习库**](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools): 用于学习和参考其他 AI 工具的系统提示词。
* [**Skills 制作器**](https://github.com/yusufkaraaslan/Skill_Seekers): 可根据需求生成定制化 Skills 的工具。
* [**元提示词**](https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203): 用于生成提示词的高级提示词。
* [**通用项目架构模板**](./assets/documents/00-基础指南/通用项目架构模板.md): 可用于快速搭建标准化的项目目录结构。
* [**元技能Skills 的 Skills**](./assets/skills/skills-skills/SKILL.md): 用于生成 Skills 的元技能。
* [**SOP 生成 Skill**](./assets/skills/sop-generator/SKILL.md): 将资料/需求整理为可执行 SOP 的技能。
* [**tmux快捷键大全**](./assets/documents/02-方法论/tmux快捷键大全.md): tmux 的快捷键参考文档。
* [**LazyVim快捷键大全**](./assets/documents/02-方法论/LazyVim快捷键大全.md): LazyVim 的快捷键参考文档。
* [**手机远程 Vibe Coding**](./assets/documents/02-方法论/关于手机ssh任意位置链接本地计算机基于frp实现的方法.md): 基于 frp 实现手机 SSH 远程控制本地电脑进行 Vibe Coding。
* [**通用项目架构模板**](./i18n/zh/documents/00-基础指南/通用项目架构模板.md): 可用于快速搭建标准化的项目目录结构。
* [**元技能Skills 的 Skills**](./i18n/zh/skills/00-元技能/claude-skills/SKILL.md): 用于生成 Skills 的元技能。
* [**SOP 生成 Skill**](./i18n/zh/skills/00-元技能/sop-generator/SKILL.md): 将资料/需求整理为可执行 SOP 的技能。
* [**tmux快捷键大全**](./i18n/zh/documents/02-方法论/tmux快捷键大全.md): tmux 的快捷键参考文档。
* [**LazyVim快捷键大全**](./i18n/zh/documents/02-方法论/LazyVim快捷键大全.md): LazyVim 的快捷键参考文档。
* [**手机远程 Vibe Coding**](./i18n/zh/documents/02-方法论/关于手机ssh任意位置链接本地计算机基于frp实现的方法.md): 基于 frp 实现手机 SSH 远程控制本地电脑进行 Vibe Coding。
### 外部教程与资源
@ -330,38 +349,38 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
### 项目内部文档
* [**胶水编程 (Glue Coding)**](./assets/documents/00-基础指南/): 软件工程的圣杯与银弹Vibe Coding 的终极进化形态。
* [**Chat Vault**](./assets/repo/chat-vault/): AI 聊天记录保存工具,支持 Codex/Kiro/Gemini/Claude CLI。
* [**prompts-library 工具说明**](./assets/repo/prompts-library/): 支持 Excel 与 Markdown 格式互转,包含数百个精选提示词。
* [**编程提示词集合**](https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203): 适用于 Vibe Coding 流程的专用提示词(云端表格)。
* [**系统提示词构建原则**](./assets/documents/00-基础指南/系统提示词构建原则.md): 构建高效 AI 系统提示词的综合指南。
* [**开发经验总结**](./assets/documents/00-基础指南/开发经验.md): 变量命名、文件结构、编码规范、架构原则等。
* [**通用项目架构模板**](./assets/documents/00-基础指南/通用项目架构模板.md): 多种项目类型的标准目录结构。
* [**Augment MCP 配置文档**](./assets/documents/02-方法论/auggie-mcp配置文档.md): Augment 上下文引擎配置说明。
* [**系统提示词集合**](https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203): AI 开发的系统提示词,含多版本开发规范(云端表格)。
* [**外部资源(在线表格)**](./assets/README.md): 外部资源的唯一真相源(按类型分表),本地 Markdown 保留为历史参考。
* [**胶水编程 (Glue Coding)**](./i18n/zh/documents/00-基础指南/): 软件工程的圣杯与银弹Vibe Coding 的终极进化形态。
* [**Chat Vault**](./libs/external/chat-vault/): AI 聊天记录保存工具,支持 Codex/Kiro/Gemini/Claude CLI。
* [**prompts-library 工具说明**](./libs/external/prompts-library/): 支持 Excel 与 Markdown 格式互转,包含数百个精选提示词。
* [**编程提示词集合**](./i18n/zh/prompts/02-编程提示词/): 适用于 Vibe Coding 流程的专用提示词。
* [**系统提示词构建原则**](./i18n/zh/documents/00-基础指南/系统提示词构建原则.md): 构建高效 AI 系统提示词的综合指南。
* [**开发经验总结**](./i18n/zh/documents/00-基础指南/开发经验.md): 变量命名、文件结构、编码规范、架构原则等。
* [**通用项目架构模板**](./i18n/zh/documents/00-基础指南/通用项目架构模板.md): 多种项目类型的标准目录结构。
* [**Augment MCP 配置文档**](./i18n/zh/documents/02-方法论/auggie-mcp配置文档.md): Augment 上下文引擎配置说明。
* [**系统提示词集合**](./i18n/zh/prompts/01-系统提示词/): AI 开发的系统提示词,含多版本开发规范。
* [**外部资源聚合**](./i18n/zh/documents/04-资源/外部资源聚合.md): GitHub 精选仓库、AI 工具平台、提示词资源、优质博主汇总。
---
</details>
<details open>
<summary><strong>编码模型性能分级参考</strong></summary>
## 编码模型性能分级参考
建议只选择第一梯队模型处理复杂任务,以确保最佳效果与效率。
* **第一梯队**: `codex-5.1-max-xhigh`, `claude-opus-4.5-xhigh`, `gpt-5.2-xhigh`
---
</details>
<details>
<summary><strong>🏁 编码模型性能分级参考</strong></summary>
<summary><strong>项目目录结构概览</strong></summary>
## 🏁 编码模型性能分级参考
建议只选择苹果模型处理复杂任务,以确保最佳效果与效率。
* **苹果**: [gpt-5.2-xhigh](https://chatgpt.com/codex)
---
</details>
<details>
<summary><strong>🗂️ 项目目录结构概览</strong></summary>
## 🗂️ 项目目录结构概览
### 项目目录结构概览
本项目 `vibe-coding-cn` 的核心结构主要围绕知识管理、AI 提示词的组织与自动化展开。以下是经过整理和简化的目录树及各部分说明:
@ -369,54 +388,13 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
.
├── README.md # 项目主文档
├── AGENTS.md # AI Agent 行为准则
├── GEMINI.md # Gemini 模型上下文
├── Makefile # 自动化脚本
├── LICENSE # MIT 许可证
├── CODE_OF_CONDUCT.md # 行为准则
├── CONTRIBUTING.md # 贡献指南
├── .gitignore # Git 忽略规则
├── assets/ # 外部资源(指向在线表格)
│ ├── README.md # 远程表格索引(唯一真相源)
│ ├── AGENTS.md # assets/ 目录规则
│ ├── config/ # 工具与开发配置
│ │ └── .codex/ # Codex CLI 配置(项目级)
│ │ ├── config.toml # Codex CLI 配置文件
│ │ └── AGENTS.md # Codex/Agent 指南(本目录)
│ ├── documents/ # 文档库
│ │ ├── 05-哲学与方法论/ # 最高思想纲领与方法论
│ │ ├── 00-基础指南/ # 核心原则与底层逻辑
│ │ ├── 01-入门指南/ # 从零开始教程
│ │ ├── 02-方法论/ # 具体工具与技巧
│ │ └── 03-实战/ # 项目实战案例
│ ├── prompts/ # 提示词库(指向云端表格)
│ │ ├── README.md # 在线表格链接
│ │ └── AGENTS.md # prompts/ 目录规则
│ ├── skills/ # 技能库(扁平化)
│ │ ├── README.md # skills 总览与索引
│ │ ├── AGENTS.md # skills/ 目录规则
│ │ ├── skills-skills/ # 元技能核心
│ │ ├── sop-generator/ # SOP 生成
│ │ ├── canvas-dev/ # Canvas白板驱动开发
│ │ └── ... # 更多技能
│ ├── tools/ # 工具目录(预留)
│ │ └── .gitkeep # 保持空目录被 Git 追踪
│ ├── workflow/ # 工作流模板
│ │ ├── auto-dev-loop/ # 自动开发循环
│ │ └── canvas-dev/ # Canvas白板驱动开发
│ └── repo/ # 外部工具与依赖镜像(含 Git submodule
│ ├── README.md # 外部工具索引
│ ├── prompts-library/ # Excel ↔ Markdown 互转工具
│ ├── chat-vault/ # AI 聊天记录保存工具
│ ├── Skill_Seekers-development/ # Skills 制作器
│ ├── html-tools-main/ # HTML 工具集
│ ├── my-nvim/ # Neovim 配置
│ ├── MCPlayerTransfer/ # MC 玩家迁移工具
│ ├── XHS-image-to-PDF-conversion/ # 小红书图片转 PDF
│ ├── backups/ # 历史备份脚本快照
│ ├── .tmux/ # oh-my-tmux (submodule)
│ ├── tmux/ # tmux 源码 (submodule)
│ └── claude-official-skills/ # Claude 官方 skills (submodule)
├── .github/ # GitHub 配置
│ ├── workflows/ # CI/CD 工作流
│ │ ├── ci.yml # Markdown lint + link checker
@ -427,19 +405,62 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
│ ├── SECURITY.md # 安全政策
│ ├── FUNDING.yml # 赞助配置
│ └── wiki/ # GitHub Wiki 内容
├── i18n/ # 多语言资产 (27 种语言)
│ ├── README.md # 多语言索引
│ ├── zh/ # 中文主语料
│ │ ├── documents/ # 文档库
│ │ │ ├── -01-哲学与方法论/ # 最高思想纲领与方法论
│ │ │ ├── 00-基础指南/ # 核心原则与底层逻辑
│ │ │ ├── 01-入门指南/ # 从零开始教程
│ │ │ ├── 02-方法论/ # 具体工具与技巧
│ │ │ ├── 03-实战/ # 项目实战案例
│ │ │ └── 04-资源/ # 外部资源聚合
│ │ ├── prompts/ # 提示词库
│ │ │ ├── 00-元提示词/ # 生成提示词的提示词
│ │ │ ├── 01-系统提示词/ # AI 系统级提示词
│ │ │ ├── 02-编程提示词/ # 编程相关提示词
│ │ │ └── 03-用户提示词/ # 用户自定义提示词
│ │ └── skills/ # 技能库
│ │ ├── 00-元技能/ # 生成技能的元技能
│ │ ├── 01-AI工具/ # AI CLI 和工具
│ │ ├── 02-数据库/ # 数据库技能
│ │ ├── 03-加密货币/ # 加密货币/量化交易
│ │ └── 04-开发工具/ # 通用开发工具
│ ├── en/ # 英文版本(结构同 zh/
│ └── ... # 其他语言骨架
├── libs/ # 核心库代码
│ ├── common/ # 通用模块
│ │ ├── models/ # 模型定义
│ │ └── utils/ # 工具函数
│ ├── database/ # 数据库模块(预留)
│ └── external/ # 外部工具
│ ├── prompts-library/ # Excel ↔ Markdown 互转工具
│ ├── chat-vault/ # AI 聊天记录保存工具
│ ├── Skill_Seekers-development/ # Skills 制作器
│ ├── l10n-tool/ # 多语言翻译脚本
│ ├── my-nvim/ # Neovim 配置
│ ├── MCPlayerTransfer/ # MC 玩家迁移工具
│ └── XHS-image-to-PDF-conversion/ # 小红书图片转 PDF
└── backups/ # 备份脚本与存档
├── 一键备份.sh # Shell 备份脚本
├── 快速备份.py # Python 备份脚本
├── README.md # 备份说明
└── gz/ # 压缩存档目录
```
</details>
---
<details>
<summary><strong>📺 演示与产出</strong></summary>
</details>
## 📺 演示与产出
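In one sentence: Vibe Coding = **planning-driven + fixed context + AI pair execution**. It turns "from idea to maintainable code" into an auditable pipeline instead of a monolithic file that cannot be iterated on.
**What you get**
- A systematic prompt toolchain: the [cloud spreadsheet](https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203) provides system prompts that constrain the AI's behavioral boundaries, and coding prompts that provide end-to-end scripts for requirement clarification, planning, and execution.
- A systematic prompt toolchain: `i18n/zh/prompts/01-系统提示词/` constrains the AI's behavioral boundaries, and `i18n/zh/prompts/02-编程提示词/` provides end-to-end scripts for requirement clarification, planning, and execution.
- A closed-loop delivery path: requirements → context documents → implementation plan → step-by-step implementation → self-testing → progress log, reviewable and transferable throughout.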
<details>
@ -449,15 +470,14 @@ Canvas方式**代码 ⇄ 白板 ⇄ AI ⇄ 人类**,白板成为单一真
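Core asset mapping:
```
assets/prompts/
  README.md # Entry point to the cloud spreadsheet (meta / system / programming / user prompts)
assets/skills/
  README.md # Skills overview and index
assets/documents/
  00-基础指南/代码组织.md, 00-基础指南/通用项目架构模板.md, 00-基础指南/开发经验.md, 00-基础指南/系统提示词构建原则.md and other knowledge-base documents
assets/
  README.md # Entry point to external resources (online spreadsheet), the single source of truth
assets/repo/backups/
i18n/zh/prompts/
  00-元提示词/ # Advanced prompts used to generate prompts
  01-系统提示词/ # System-level prompts that constrain the AI's behavioral boundaries
  02-编程提示词/ # Core prompts for the clarify, plan, execute chain
  03-用户提示词/ # Reusable user-side prompts
i18n/zh/documents/
  04-资源/代码组织.md, 04-资源/通用项目架构模板.md, 00-基础指南/开发经验.md, 00-基础指南/系统提示词构建原则.md and other knowledge-base documents
backups/
  一键备份.sh, 快速备份.py # Local / remote snapshot scripts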
```
@ -478,8 +498,8 @@ graph TB
subgraph ingest_layer[数据接入与采集层]
excel_raw[prompt_excel/*.xlsx]
md_raw[prompt_docs/外部MD输入]
excel_to_docs[assets/repo/prompts-library/scripts/excel_to_docs.py]
docs_to_excel[assets/repo/prompts-library/scripts/docs_to_excel.py]
excel_to_docs[prompts-library/scripts/excel_to_docs.py]
docs_to_excel[prompts-library/scripts/docs_to_excel.py]
ingest_bus[标准化数据帧]
ext_sheet --> excel_raw
ext_md --> md_raw
@ -499,11 +519,11 @@ graph TB
end
subgraph consume_layer[执行与消费层]
artifacts_md --> catalog_coding[prompts(在线)/编程提示词]
artifacts_md --> catalog_system[prompts(在线)/系统提示词]
artifacts_md --> catalog_meta[prompts(在线)/元提示词]
artifacts_md --> catalog_user[prompts(在线)/用户提示词]
artifacts_md --> docs_repo[assets/documents/*]
artifacts_md --> catalog_coding[i18n/zh/prompts/02-编程提示词]
artifacts_md --> catalog_system[i18n/zh/prompts/01-系统提示词]
artifacts_md --> catalog_meta[i18n/zh/prompts/00-元提示词]
artifacts_md --> catalog_user[i18n/zh/prompts/03-用户提示词]
artifacts_md --> docs_repo[i18n/zh/documents/*]
artifacts_md --> new_consumer[预留:其他下游渠道]
catalog_coding --> ai_flow[AI 结对编程流程]
ai_flow --> deliverables[项目上下文 / 计划 / 代码产出]
@ -517,9 +537,9 @@ graph TB
subgraph infra_layer[基础设施与横切能力层]
git[Git 版本控制] --> orchestrator
backups[assets/repo/backups/一键备份.sh · assets/repo/backups/快速备份.py] --> artifacts_md
backups[backups/一键备份.sh · backups/快速备份.py] --> artifacts_md
deps[requirements.txt · scripts/requirements.txt] --> orchestrator
config[assets/repo/prompts-library/scripts/config.yaml] --> orchestrator
config[prompts-library/scripts/config.yaml] --> orchestrator
monitor[预留:日志与监控] --> orchestrator
end
```
@ -529,9 +549,7 @@ graph TB
</details>
<details>
<summary><strong>📈 性能基准 (可选)</strong></summary>
## 📈 性能基准 (可选)
<summary>📈 性能基准 (可选)</summary>
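This repository is about "process and prompts" rather than performance-critical code. We suggest tracking the observable metrics below (currently recorded mostly by hand; you can score and log them in `progress.md`):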
@ -566,13 +584,6 @@ gantt
---
</details>
<a id="translation"></a>
<details>
<summary><strong>🎯 原仓库翻译</strong></summary>
## 🎯 原仓库翻译
> The following content is translated from the original repository [EnzeD/vibe-coding](https://github.com/EnzeD/vibe-coding)
@ -583,15 +594,14 @@ gantt
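This guide applies to both the CLI terminal version and the VSCode extension version (Codex and Claude Code both have extensions, with a more up-to-date interface).
*(Note: earlier versions of this guide used **Grok 3**, then switched to **Gemini 3.1 Pro**; we now use **Claude 4.6** (or **gpt-5.3-codex (xhigh)**).)*
*(Note: earlier versions of this guide used **Grok 3**, then switched to **Gemini 2.5 Pro**; we now use **Claude 4.6** (or **gpt-5.3-codex (xhigh)**).)*
*(Note 2: if you want to use Cursor, see [version 1.1](https://github.com/EnzeD/vibe-coding/tree/1.1.1) of this guide, though we think it is currently less capable than Codex CLI or Claude Code.)*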
---
<a id="setup"></a>
## ⚙️ 完整设置流程
<details>
<summary><strong>⚙️ 完整设置流程</strong></summary>
<details>
<summary><strong>1. 游戏设计文档Game Design Document</strong></summary>
@ -601,7 +611,7 @@ gantt
</details>
<details>
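<summary><strong>2. Tech stack and agent rules (<code>AGENTS.md</code> / custom rules)</strong></summary>
<summary><strong>2. Tech stack and <code>CLAUDE.md</code> / <code>Agents.md</code></strong></summary>
- Ask **gpt-5.3-codex** or **Claude Opus 4.6** to recommend the most suitable tech stack for your game (for example, ThreeJS + WebSocket for a multiplayer 3D game), and save it as `tech-stack.md`.
- Ask it to propose the **simplest yet most robust** tech stack.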
@ -644,7 +654,10 @@ gantt
- `architecture.md`(新建一个空文件,用于记录每个文件的作用)
</details>
## 🎮 Vibe Coding 开发基础游戏
</details>
<details>
<summary><strong>🎮 Vibe Coding 开发基础游戏</strong></summary>
现在进入最爽的阶段!
@ -675,14 +688,20 @@ gantt
- 重复此流程,直到整个 `implementation-plan.md` 全部完成。
</details>
## ✨ 添加细节功能
</details>
<details>
<summary><strong>✨ 添加细节功能</strong></summary>
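Congratulations! You have built the base game. It may still be rough and missing features, but now you can experiment and polish freely.
- Want fog, post-processing, visual effects, sound? A better plane, car, or castle? A gorgeous sky?
- For each major feature you add, create a new `feature-implementation.md` with short steps plus tests.
- Keep implementing and testing incrementally.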
## 🐞 修复 Bug 与卡壳情况
</details>
<details>
<summary><strong>🐞 修复 Bug 与卡壳情况</strong></summary>
<details>
<summary><strong>常规修复</strong></summary>
@ -703,7 +722,10 @@ gantt
- Use [RepoPrompt](https://repoprompt.com/) or [uithub](https://uithub.com/) to merge the whole codebase into a single file, then hand it to **gpt-5.3-codex or Claude** for help.
</details>
## 💡 技巧与窍门
</details>
<details>
<summary><strong>💡 技巧与窍门</strong></summary>
<details>
<summary><strong>Claude Code & Codex 使用技巧</strong></summary>
@ -729,7 +751,10 @@ gantt
- Keywords that trigger deeper thinking in Claude Code, in increasing strength: `think` < `think hard` < `think harder` < `ultrathink`.
</details>
## ❓ 常见问题解答 (FAQ)
</details>
<details>
<summary><strong>❓ 常见问题解答 (FAQ)</strong></summary>
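- **Q: I'm building an app, not a game. Is the process the same?**
- **A:** Almost exactly the same. Replace the GDD with a PRD (product requirements document). You can also prototype quickly with v0, Lovable, or Bolt.new first, move the code to GitHub, then clone it locally and continue development with this guide.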
@ -747,8 +772,6 @@ gantt
---
<a id="contact"></a>
## 📞 联系方式
- **GitHub**: [tukuaiai](https://github.com/tukuaiai)
@ -756,12 +779,10 @@ gantt
- **Telegram**: [@desci0](https://t.me/desci0)
- **Telegram 交流群**: [glue_coding](https://t.me/glue_coding)
- **Telegram 频道**: [tradecat_ai_channel](https://t.me/tradecat_ai_channel)
- **邮箱**: tukuai.ai@gmail.com
- **邮箱**: tukuai.ai@gmail.com (回复可能不及时)
---
<a id="support"></a>
## ✨ 支持项目
救救孩子,感谢了,好人一生平安🙏🙏🙏
@ -798,8 +819,6 @@ gantt
---
<a id="contributing"></a>
## 🤝 参与贡献
We warmly welcome contributions of every kind. If you have any ideas or suggestions for this project, feel free to open an [Issue](https://github.com/tukuaiai/vibe-coding-cn/issues) or submit a [Pull Request](https://github.com/tukuaiai/vibe-coding-cn/pulls).

View File

@ -1,25 +0,0 @@
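# assets/ Directory Agent Guide
This directory collects the repository's key assets and index entry points. It contains:
- `assets/README.md`: entry point to the online spreadsheet of external resources (single source of truth)
- `assets/documents/`: knowledge base (methodology / getting started / hands-on)
- `assets/prompts/`: prompt library entry point (points to the cloud spreadsheet)
- `assets/skills/`: skill library (reusable Skills)
- `assets/workflow/`: workflow templates
- `assets/repo/`: external tools and dependency mirrors (including Git submodules)
- `assets/config/`: tool and development configuration (including Codex CLI configuration)
- `assets/tools/`: tools directory (reserved)
Typical forms of "external resource" entry documents include:
- Online spreadsheets (resource libraries, index tables, checklists)
- Remote documents (specifications, courses, external knowledge bases)
- Other external resources that need a traceable entry point kept inside the repository
## Constraints
- Do not store sensitive information in this directory (tokens, private keys, personal data).
- Third-party mirrors and submodules under `assets/repo/`: unless the task explicitly requires it, do not make incidental changes, reformat, or run bulk replacements.
- Prefer storing only entry documents and notes; do not copy and paste large chunks of third-party content.
- Every external asset must state its purpose, maintainer, update process, and relationship to in-repo documents (which one is the single source of truth).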

View File

@ -1,22 +0,0 @@
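# 📎 Assets
This directory collects the repository's key assets and index entry points:
- `assets/documents/`: knowledge base (methodology / getting started / hands-on)
- `assets/prompts/`: prompt library entry point (points to the cloud spreadsheet)
- `assets/skills/`: skill library (reusable Skills)
- `assets/workflow/`: workflow templates
- `assets/repo/`: external tools and dependency mirrors (including Git submodules)
- `assets/config/`: tool and development configuration (including Codex CLI configuration)
- `assets/tools/`: tools directory (reserved)
- `assets/README.md`: entry point to the online spreadsheet of external resources (single source of truth)
## External resources online spreadsheet (single source of truth)
- External resources (one sheet per type): `Google Sheets`
- [External resources online spreadsheet (Google Sheets)](https://docs.google.com/spreadsheets/d/1DY0JfSph_OqaSkVPlrnQrg7OKyPUuhDHsCh-431ot-I/edit?usp=sharing)
## Relationship to in-repo documents
- Additions, deletions, deduplication, and updates of external resources follow the online spreadsheet.
- The old `documents/04-资源/` has been removed; every entry point in the repository now points to `assets/README.md`.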

View File

@ -1,49 +0,0 @@
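# `assets/config/.codex/` Usage
This directory version-controls a "global configuration baseline" for Codex CLI inside the repository, so that several people can sync, review, and roll it back.
To apply it, copy the two files in this directory into your **Codex Home** (default `~/.codex/`):
- `assets/config/.codex/config.toml` → `~/.codex/config.toml`
- `assets/config/.codex/AGENTS.md` → `~/.codex/AGENTS.md`
## 1. One-step install (recommended)
Run from the repository root: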
```bash
mkdir -p ~/.codex
cp -f assets/config/.codex/config.toml ~/.codex/config.toml
cp -f assets/config/.codex/AGENTS.md ~/.codex/AGENTS.md
```
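## 2. Path examples
### Linux / WSL (where the files actually take effect)
- `\\wsl.localhost\Ubuntu\home\<your-username>\.codex\config.toml`
- `\\wsl.localhost\Ubuntu\home\<your-username>\.codex\AGENTS.md`
(Inside WSL these correspond to `~/.codex/config.toml` and `~/.codex/AGENTS.md`.)
### Windows (native)
Codex Home defaults to `~/.codex/`; on Windows, `~` usually expands to the user profile directory:
- `C:\Users\<your-username>\.codex\config.toml`
- `C:\Users\<your-username>\.codex\AGENTS.md`
If your Codex Home has been moved elsewhere (for example `C:\Users\<your-username>\.config\...`), copy both files into your actual Codex Home.
## 3. Configuration precedence (important)
- **Global configuration**: `~/.codex/config.toml`
- **Project override**: create `.codex/config.toml` in the project root (applies only to the current project)
If you want certain settings to apply only to this repository, use a project override (`.codex/config.toml`) and keep only your long-term, general habits and security policies in the global configuration.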
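## 4. References (official documentation)
- Configuration / Config file: explains the precedence of `~/.codex/config.toml` and the project-level `.codex/config.toml`
- Custom instructions / Global instructions: explains how the global instructions in `~/.codex/AGENTS.md` are loaded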

View File

@ -1,126 +0,0 @@
# ==================== 基础配置 ====================
# 模型:
# - 这里填写 Codex CLI 支持的模型名(字符串)。
# - 建议写成你常用的默认模型,临时切换用命令行 `-m` 覆盖更合适。
# - 经验上:`*-codex` 更偏“写代码/改代码”,非 `*-codex` 更偏通用对话(以你实际使用体验为准)。
model = "gpt-5.2"
# 推理强度(思考深度):
# - low → 更快,适合“明确指令 + 小改动”
# - medium → 均衡,适合多数日常任务
# - high → 更深,适合复杂重构/疑难排障
# - xhigh → 最深,适合架构级设计/大范围推理(可能更慢)
# 注意:
# - 不同模型对选项支持范围可能不同;遇到报错优先降一档再试。
model_reasoning_effort = "xhigh"
# 运行策略
# sandbox_mode
# - 用来约束/放开文件系统、命令执行等能力(不同版本 Codex/运行器实现可能略有差异)。
# - `danger-full-access` 表示最大权限:可读写任意路径、可执行任意命令。
# - 仅在你明确知道自己在做什么、并且仓库/机器可信时使用。
sandbox_mode = "danger-full-access"
# approval_policy
# - 控制是否需要用户确认(例如写文件、跑命令等高影响操作)。
# - `never` 表示不再弹确认,自动执行。
# - 风险提示:如果你经常在不熟的目录/不可信脚本环境里使用,建议改成更保守的策略。
approval_policy = "never"
# web_search
# - 控制联网搜索能力策略(以你安装的 Codex CLI 版本为准)。
# - `live` 通常表示允许实时联网搜索(适用于需要最新信息的任务)。
web_search = "live"
# 交互风格
# personality
# - 影响输出风格(例如更务实/更解释型等)。
personality = "pragmatic"
# 指令来源(可选,与 AGENTS.md 二选一)
# experimental_instructions_file
# - 指定一份“系统指令/长期提示词”文件路径。
# - 如果项目内已经用 `AGENTS.md` 管理行为准则,通常不需要再额外打开。
# experimental_instructions_file = "/home/lenovo/.codex/custom-instructions.md"
# ==================== MCP 默认配置 ====================
# startup_timeout_ms
# - MCPModel Context Protocol服务器启动/握手的超时时间(毫秒)。
# - 如果你启用了某些 MCPnpx/node 启动慢),可以把这个值适当调大。
startup_timeout_ms = 20000
# ==================== UI 与提示 ====================
[tui]
# 是否在 TUI终端 UI里启用通知提示。
notifications = true
[notice]
# 这些开关用于隐藏某些“迁移/提示”类消息,减少噪音(仅影响 UI不影响核心功能
hide_gpt5_1_migration_prompt = true
"hide_gpt-5.1-codex-max_migration_prompt" = true
hide_rate_limit_model_nudge = true
[notice.model_migrations]
# 模型迁移映射:
# - 当某些老模型名不可用/被迁移时,用这里的映射做自动替换。
# - 建议只保留你确实用得到的映射,避免未来产生“我没注意但被自动换了”的困惑。
"gpt-5.1-codex-max" = "gpt-5.2-codex"
"gpt-5.2" = "gpt-5.3-codex"
# ==================== MCP Servers示例默认关闭 ====================
# 说明:
# - 下面这些块默认都注释掉,作为“可复制的模板”。
# - 启用方式:取消注释对应的 `[mcp_servers."name"]` 段,并按需修改 `command/args/cwd`。
# - 维护原则:宁可少开,按需启用;避免“全开导致启动慢/不稳定/难排障”。
# Context7 - 最新官方文档 MCP
# [mcp_servers."context7"]
# command = "npx"
# args = ["-y", "@upstash/context7-mcp@latest"]
# startup_timeout_ms = 20000
# Completion Notifier - 完成声音提示
# [mcp_servers."completion-notifier"]
# command = "node"
# args = ["/home/lenovo/.codex/mcp-servers/completion-notifier/#index.js"]
# startup_timeout_ms = 20000
# chrome-devtools
# [mcp_servers."chrome-devtools"]
# command = "npx"
# args = ["-y", "chrome-devtools-mcp@latest"]
# startup_timeout_ms = 20000
# [mcp_servers."playwright"]
# command = "npx"
# args = ["-y", "@playwright/mcp@latest"]
# startup_timeout_ms = 20000
# [mcp_servers."puppeteer"]
# command = "npx"
# args = ["-y", "puppeteer-mcp-server"]
# startup_timeout_ms = 20000
# [mcp_servers."n8n"]
# command = "npx"
# args = ["-y", "n8n-mcp@latest"]
# startup_timeout_ms = 20000
# [mcp_servers."maverick"]
# command = "npx"
# args = ["-y", "mcp-remote", "http://localhost:8003/sse/"]
# cwd = "/home/lenovo/maverick-mcp"
# startup_timeout_ms = 20000
# [mcp_servers."happy"]
# command = "happy"
# args = ["codex"]
# startup_timeout_ms = 20000
# Augment - 代码库检索 MCP
# [mcp_servers."auggie-mcp"]
# command = "auggie"
# args = ["-w", "/mnt/c/Users/lenovo", "--mcp"]
# startup_timeout_ms = 200000

View File

@ -1,9 +0,0 @@
# Config`assets/config/`
本目录用于集中存放“工具/开发环境配置”的仓库内基线,便于多人同步、审阅与回滚。
## Codex CLI
- 配置位置:`assets/config/.codex/`
- 使用说明:`assets/config/.codex/README.md`

View File

@ -1,28 +0,0 @@
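# Using Dialectics in Vibe Coding: Thesis, Antithesis, Synthesis
Applying the dialectical triad of thesis, antithesis, and synthesis to Vibe Coding, I treat every coding session as one pass through the three stages.
## Thesis: the current state (get it running first)
- Let the model quickly produce the most natural implementation by intuition
- There is only one goal: get the main path running as soon as possible
## Antithesis: audit and tuning (then challenge it)
- Immediately take the critic's point of view and argue against it
- List failure modes, edge cases, and performance and security risks
- Ground the objections in tests, types, lint, and benchmarks
## Synthesis: revise based on the review (then converge)
- Reconcile speed with constraints
- Refactor interfaces, reduce dependencies, fill in tests and documentation
- Produce a more stable starting point for the next round
## Practice mnemonic
Write it smoothly first → then challenge it → then converge
## One-sentence summary
Vibe generates possibilities; thesis, antithesis, and synthesis turn them into engineering certainty.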

View File

@ -1,35 +0,0 @@
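# Documents Directory Agent Guide
## Purpose
`assets/documents/` stores the project's knowledge-base documents, including methodology, getting-started guides, and hands-on case studies.
## Directory structure
```
assets/documents/
├── 05-哲学与方法论/ # Overarching philosophy
├── 00-基础指南/ # Core principles and underlying logic
├── 01-入门指南/ # Getting-started tutorials
├── 02-方法论/ # Concrete tools and techniques
└── 03-实战/ # Hands-on project case studies
```
## Operating rules
### Allowed
- Add or modify document content
- Fix errors and outdated information
- Add new hands-on case studies
- Maintain a `README.md` as the index entry for each top-level directory (where one exists)
### Forbidden
- Deleting existing documents (unless explicitly requested)
- Changing the numbering-prefix rules for directories
- Large-scale renaming or moving of files that breaks links (if unavoidable, update all references in the same change)
## Naming conventions
- File names are in Chinese
- Use Markdown format
- Keep numbering prefixes consistent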

View File

@ -1,32 +0,0 @@
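# Prompts Directory Agent Guide
## Purpose
`assets/prompts/` is the entry point to the prompt library; the actual content has moved to a cloud spreadsheet.
## Online resources
**Main spreadsheet**: [Prompt cloud spreadsheet](https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203)
**Original spreadsheet**: [Original version (less readable)](https://docs.google.com/spreadsheets/d/1ngoQOhJqdguwNAilCl1joNwTje7FWWN9WiI2bo5VhpU/edit?gid=1890901677#gid=1890901677)
## Spreadsheet structure
- **Worksheets**: each sheet represents one category of prompts
- **Horizontal axis**: iteration versions of a prompt (1a → 1b → 1c)
- **Vertical axis**: different prompts (prompt 1, prompt 2, ...)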
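
To make the layout concrete, here is a minimal sketch that treats one worksheet as a grid: each row is one prompt and each column is a later iteration of it, so the current version of a prompt is the right-most non-empty cell in its row. The function name `latest_versions` and the sample data are hypothetical; the real spreadsheet may carry extra header rows or columns that this sketch ignores.

```python
def latest_versions(sheet: list[list[str]]) -> list[str]:
    """For each row (one prompt), return the right-most non-empty cell.

    Rows with no filled cells are skipped.
    """
    latest = []
    for row in sheet:
        filled = [cell for cell in row if cell.strip()]
        if filled:
            latest.append(filled[-1])
    return latest


# Hypothetical worksheet: two prompts, up to three iterations each (1a, 1b, 1c).
coding_sheet = [
    ["clarify v1", "clarify v2", "clarify v3"],
    ["plan v1", "plan v2", ""],
]

print(latest_versions(coding_sheet))
```

Reading a row left to right then shows how a single prompt evolved, while reading down a column compares different prompts at the same iteration.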
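## Operating rules
### Allowed
- Update links and notes in README.md
- Sync structural changes in the cloud spreadsheet into the documentation
### Forbidden
- Creating prompt files locally (add them to the cloud spreadsheet instead)
- Deleting README.md
- Writing sensitive information in this directory (keys, tokens, personal paths, and so on)
## Related tools
- `assets/repo/prompts-library/` - Excel ↔ Markdown converter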

View File

@ -1,33 +0,0 @@
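# 💬 Prompt Library
The prompt resources have moved to a cloud spreadsheet for real-time updates and collaborative editing.
## 📋 Online prompt library
**[👉 Open the prompt cloud spreadsheet](https://docs.google.com/spreadsheets/d/1Ifk_dLF25ULSxcfGem1hXzJsi7_RBUNAki8SBCuvkJA/edit?gid=1254297203#gid=1254297203)**
## Spreadsheet structure
- **Worksheets (sheets)**: each tab at the bottom represents one category of prompts
- **Horizontal axis (columns)**: iteration versions of a prompt (for example 1a → 1b → 1c), showing how it evolved
- **Vertical axis (rows)**: different prompts (prompt 1, prompt 2, ...), making it easy to compare how they change
## Prompt categories
| Worksheet | Description |
|:---|:---|
| Meta prompts (元提示词) | Prompts that generate prompts |
| System prompts (系统提示词) | System-level prompts for the AI |
| Programming prompts (编程提示词) | Programming-related prompts |
| User prompts (用户提示词) | User-defined prompts |
## Related resources
- [Skill library](../skills/) - a higher-level packaging of capabilities than prompts
- [Document library](../documents/) - methodology and development experience
- [prompts-library tool](../repo/prompts-library/) - Excel ↔ Markdown converter
## Original spreadsheet
To view the original, less readable version: [original spreadsheet](https://docs.google.com/spreadsheets/d/1ngoQOhJqdguwNAilCl1joNwTje7FWWN9WiI2bo5VhpU/edit?gid=1890901677#gid=1890901677)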

@ -1 +0,0 @@
Subproject commit 87dcd13a28aeb5f18baee630e24b3f5765ae3a4f

View File

@ -1,26 +0,0 @@
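# assets/repo/ Directory Agent Guide
This directory holds **external tools and third-party projects** (including Git submodules), keeping the boundary between "main repository assets" and "external dependencies" clear, auditable, and updatable.
## Directory structure (convention)
```text
assets/repo/
├── AGENTS.md # This file (directory-level rules of conduct)
├── README.md # Index of external tools
├── .tmux/ # submodule: oh-my-tmux configuration
├── tmux/ # submodule: tmux source code
└── claude-official-skills/ # submodule: official Claude skills repository (Anthropic)
```
## Operating rules
### Allowed
- Add external dependencies (prefer Git submodules to keep them reproducible)
- Update submodule pointers (clearly record the upstream source and purpose)
### Forbidden / discouraged
- Copying and pasting the contents of large third-party repositories into the main repository (prefer submodules)
- Replacing a submodule with a symlink to a local absolute path (this breaks other people's environments)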
View File

@ -1,46 +0,0 @@
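# 🔌 assets/repo/: External Integrations and Third-Party Tools
`assets/repo/` holds third-party tools, external dependencies, and integration modules (including Git submodules). The core principles are:
- **Keep things as close to upstream as possible**: avoid heavy modifications that make upgrades impossible
- **Isolate dependencies and risk**: an external tool's dependencies must not pollute the main repository
- **Stay traceable**: state the source, license, and usage clearly
## Directory structure
```
assets/repo/
├── AGENTS.md # Agent rules of conduct for this directory
├── README.md # This file (index of external tools)
├── .tmux/ # submodule: oh-my-tmux configuration
├── tmux/ # submodule: tmux source code
├── claude-official-skills/ # submodule: official Claude skills repository (Anthropic)
├── prompts-library/ # Excel ↔ Markdown converter
├── chat-vault/ # Tool for saving AI chat history
├── Skill_Seekers-development/ # Skills builder
├── html-tools-main/ # HTML tool collection
├── my-nvim/ # Neovim configuration (contains nvim-config/)
├── MCPlayerTransfer/ # MC player migration tool
├── XHS-image-to-PDF-conversion/ # Tool for merging images into a PDF
└── backups/ # Snapshot of historical backup scripts
```
## Tool list (entry points and docs)
- `chat-vault/`: tool for saving AI chat history (see `chat-vault/README.md`)
- `prompts-library/`: batch Excel ↔ Markdown conversion and index generation for prompts (see `prompts-library/README.md`)
- `Skill_Seekers-development/`: Skills scraper / builder (see `Skill_Seekers-development/README.md`)
- `html-tools-main/`: HTML tool collection (see `html-tools-main/README.md`)
- `my-nvim/`: personal Neovim configuration (see `my-nvim/README.md`)
- `MCPlayerTransfer/`: MC player migration tool (see `MCPlayerTransfer/README.md`)
- `XHS-image-to-PDF-conversion/`: merges images into a PDF (see `XHS-image-to-PDF-conversion/README.md`)
- `.tmux/`, `tmux/`, `claude-official-skills/`: upstream repositories brought in as submodules
> 📝 System prompts have moved to the cloud spreadsheet; see [`assets/prompts/README.md`](../prompts/README.md).
## Adding an external tool (minimal checklist)
1. Create the directory: `assets/repo/<tool-name>/`
2. Required files: `README.md` (purpose / entry point / dependencies / inputs and outputs), plus license and source notes (such as `LICENSE` / `SOURCE.md`)
3. Dependency constraints: prefer the tool's own virtual environment or containerization so that the rest of the repository is unaffected
4. Documentation sync: add a line describing the tool to this README so it stays discoverable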
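
The "required files" item in the checklist lends itself to a quick automated check. The following is a minimal sketch, not an existing script in this repository: `missing_entry_files` is a hypothetical helper, and it assumes that either `LICENSE` or `SOURCE.md` is enough to satisfy the license-and-source requirement.

```python
from pathlib import Path


def missing_entry_files(tool_dir: Path) -> list[str]:
    """Return the required entry files that `tool_dir` does not contain."""
    problems = []
    if not (tool_dir / "README.md").is_file():
        problems.append("README.md")
    # Assumption: one of LICENSE / SOURCE.md is enough to document provenance.
    if not any((tool_dir / name).is_file() for name in ("LICENSE", "SOURCE.md")):
        problems.append("LICENSE or SOURCE.md")
    return problems


# Example call against a hypothetical tool directory; a path that does not
# exist simply reports both items as missing.
print(missing_entry_files(Path("assets/repo/example-tool")))
```

A check like this could run before a new tool directory is committed, leaving the dependency and documentation items of the checklist to human review.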

@ -1 +0,0 @@
Subproject commit 1ed29a03dc852d30fa6ef2ca53a67dc2c2c2c563

View File

@ -1,29 +0,0 @@
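# Neovim Configuration (LazyVim) Notes
This directory is a Neovim configuration that can be copied directly to `~/.config/nvim/`, built on LazyVim + lazy.nvim.
## Directory structure
```text
nvim-config/
├── init.lua # Entry point: loads config.lazy
├── lazy-lock.json # Pinned plugin versions
├── lazyvim.json # LazyVim metadata (extras / install_version)
└── lua/
    ├── config/ # Base configuration: options / keymaps / autocmds / lazy
    └── plugins/ # Plugin and override configuration, one file per unit
        ├── ui.lua # UI overrides (neo-tree, bufferline, and so on)
        └── snacks.lua # Snacks defaults (show hidden / ignored files)
```
## Key conventions
- Human-facing text (comments / logs / documentation) is written in Chinese; code symbols (variable / function / module names) are in English.
- Put plugin overrides in `lua/plugins/*.lua` and avoid piling logic into `config/*`.
- The entry point for "show hidden files by default" is `lua/plugins/snacks.lua`:
  - `picker.sources.files/explorer/grep`: `hidden=true`, `ignored=true`
## Changelog
- 2026-02-20: added `lua/plugins/snacks.lua` so that Snacks Explorer/Picker shows hidden and ignored files by default.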

View File

@ -1,15 +0,0 @@
-- Keymaps are automatically loaded on the VeryLazy event
-- Default keymaps that are always set: https://github.com/LazyVim/LazyVim/blob/main/lua/lazyvim/config/keymaps.lua
-- Add any additional keymaps here
-- ==================== File search: include hidden / ignored files ====================
-- Notes:
-- - LazyVim's default <leader>ff usually excludes files ignored by .gitignore
-- - This keymap is a fallback for when a file really cannot be found (for example .env)
vim.keymap.set("n", "<leader>fF", function()
  require("telescope.builtin").find_files({
    hidden = true,
    no_ignore = true,
    no_ignore_parent = true,
  })
end, { desc = "Find All Files (hidden + ignored)" })

View File

@ -1,28 +0,0 @@
return {
  {
    "folke/snacks.nvim",
    opts = function(_, opts)
      -- ==================== Show hidden / ignored files by default ====================
      -- With LazyVim install_version=8, picker and explorer both default to Snacks, so hidden/ignored are enabled here in one place.
      opts.picker = opts.picker or {}
      opts.picker.sources = opts.picker.sources or {}
      local sources = opts.picker.sources
      sources.files = vim.tbl_deep_extend("force", sources.files or {}, {
        hidden = true,
        ignored = true,
      })
      sources.explorer = vim.tbl_deep_extend("force", sources.explorer or {}, {
        hidden = true,
        ignored = true,
      })
      sources.grep = vim.tbl_deep_extend("force", sources.grep or {}, {
        hidden = true,
        ignored = true,
      })
    end,
  },
}

@ -1 +0,0 @@
Subproject commit 615c27c11789948df2db09e113e882f82dfb3e1c

View File

@ -1,53 +0,0 @@
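# Skills Directory Agent Guide
This directory holds reusable **Skills (skill modules)**: each subdirectory is a capability package that is triggerable, reusable, and deliverable. It usually contains an entry document `SKILL.md`, plus optional scripts, reference material, and asset files.
## Directory structure (convention)
```text
assets/skills/
├── AGENTS.md # This file (directory-level rules of conduct)
├── README.md # Skills overview and index
├── <skill-name>/ # One skill = one directory
│   ├── SKILL.md # Entry point: triggers / boundaries / deliverables / workflow
│   ├── references/ # (optional) reference material and index
│   ├── scripts/ # (optional) executable scripts / automation
│   ├── assets/ # (optional) templates / samples / static resources
│   └── agents/ # (optional) agent metadata (such as openai.yaml)
└── skills-skills/ # Meta skill: generates / validates / scaffolds other skills
```
## External repository references (convention)
- Read-only symlinks may be placed under `assets/skills/` to bring in content from authoritative external repositories (for unified indexing and search).
- For reproducibility, symlink targets must stay inside the repository and should preferably be managed as Git submodules (avoid links to absolute paths on a personal machine).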
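
The rule that symlink targets must stay inside the repository can be checked mechanically. The sketch below is one possible way to do it and is not part of the repository's tooling: `escaping_symlinks` is a hypothetical helper, and it needs Python 3.9 or later for `Path.is_relative_to`.

```python
import os
from pathlib import Path


def escaping_symlinks(repo_root: Path, subdir: str) -> list[Path]:
    """List symlinks under `subdir` whose resolved target lies outside `repo_root`."""
    root = repo_root.resolve()
    offenders = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in `dirnames`, so every link gets inspected.
    for dirpath, dirnames, filenames in os.walk(root / subdir):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink() and not path.resolve().is_relative_to(root):
                offenders.append(path)
    return offenders


# Example call from the repository root; an empty list means every link
# under assets/skills/ resolves to a location inside the repository.
print(escaping_symlinks(Path("."), "assets/skills"))
```

A link such as `claude-official-skills -> ../repo/claude-official-skills` passes this check because its target resolves inside the repository, while a link to an absolute path on a personal machine would be reported.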
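## Module responsibilities and boundaries
- Every `<skill-name>/` must use `SKILL.md` as its entry point and state:
  - Triggers (when to use it)
  - Non-applicability / boundaries (when not to use it)
  - Deliverables (which files or conclusions to produce)
  - Minimal reproducible workflow (commands / steps / inputs and outputs)
- Skill directories should have **no implicit coupling**: do not rely on implicit file paths or script side effects in another skill's directory.
- Shared logic should move down into the repository's common library directory (if one is introduced later); a skill directory keeps only the thinnest wrapper its domain needs.
## Operating rules
### Allowed
- Add skill directories (prefer reusing existing templates and conventions)
- Iterate on the triggers, boundaries, and deliverables defined in an existing `SKILL.md`
- Fill in a `references/` index or `scripts/` automation for a skill
### Forbidden / discouraged
- Splitting `assets/skills/` into numbered category directories (keep it flat and build the index in `README.md`)
- Letting scripts write to unauditable global paths by default (prefer output inside the skill directory or an explicit `artifacts/`)
## Quick reference (common skills)
- `assets/skills/tmux-autopilot/`: tmux automation and multi-agent collaboration
- `assets/skills/canvas-dev/`: Canvas whiteboard-driven development
- `assets/skills/sop-generator/`: SOP generation and standardization
- `assets/skills/markdown-to-epub/`: stable Markdown → EPUB builds
- `assets/skills/skills-skills/`: meta skill (skill generation / validation / scaffolding)
- `assets/skills/claude-official-skills/`: symlink entry to the official Claude skills repository (Anthropic)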

View File

@ -1,90 +0,0 @@
# 🎯 AI Skills Library
The `assets/skills/` directory stores AI Skills: a higher-level packaging of capabilities than prompts, intended to give the AI expert-level performance in a specific domain. It currently contains **20** specialized skills.
## Skills at a glance
### 🔮 Meta skills (Skills that generate Skills)
| Skill | Description |
|:---|:---|
| [skills-skills](./skills-skills/SKILL.md) | ⭐ The Skill that generates Skills |
| [sop-generator](./sop-generator/SKILL.md) | SOP generation and standardization |
### 🤖 AI tools
| Skill | Description |
|:---|:---|
| [canvas-dev](./canvas-dev/SKILL.md) | ⭐ Canvas whiteboard-driven development (AI as chief architect) |
| [headless-cli](./headless-cli/SKILL.md) | Headless AI CLI invocation (Gemini/Claude/Codex) |
| [claude-code-guide](./claude-code-guide/SKILL.md) | Claude Code CLI usage guide |
| [claude-cookbooks](./claude-cookbooks/SKILL.md) | Claude API best practices |
### 🗄️ Databases
| Skill | Description |
|:---|:---|
| [postgresql](./postgresql/SKILL.md) | ⭐ Complete PostgreSQL expert skill |
| [timescaledb](./timescaledb/SKILL.md) | PostgreSQL time-series extension |
### 💰 Cryptocurrency / quantitative trading
| Skill | Description |
|:---|:---|
| [ccxt](./ccxt/SKILL.md) | Unified API for cryptocurrency exchanges |
| [coingecko](./coingecko/SKILL.md) | CoinGecko market data API |
| [cryptofeed](./cryptofeed/SKILL.md) | Real-time cryptocurrency data feeds |
| [hummingbot](./hummingbot/SKILL.md) | Quantitative trading bot framework |
| [polymarket](./polymarket/SKILL.md) | Prediction market API |
### 🛠️ Development tools
| Skill | Description |
|:---|:---|
| [ddd-doc-steward](./ddd-doc-steward/SKILL.md) | Document-driven development (DDD) documentation steward |
| [telegram-dev](./telegram-dev/SKILL.md) | Telegram Bot development |
| [twscrape](./twscrape/SKILL.md) | Twitter/X data scraping |
| [snapdom](./snapdom/SKILL.md) | DOM snapshots and testing |
| [proxychains](./proxychains/SKILL.md) | Proxy chain configuration and usage |
| [tmux-autopilot](./tmux-autopilot/SKILL.md) | tmux automation (AI swarm collaboration) |
### ⚡ Productivity
| Skill | Description |
|:---|:---|
| [markdown-to-epub](./markdown-to-epub/SKILL.md) | Markdown to EPUB e-book conversion |
## External skill repositories (symlinks)
- `assets/skills/claude-official-skills/`: from the official Claude skills repository (Anthropic). It lives in this repository as a Git submodule at `assets/repo/claude-official-skills/` and is exposed under `assets/skills/` through a symlink for easy browsing and reuse.
- Initialize / update with: `git submodule update --init --recursive`
- Skills directory website: `https://skills.sh/`
## Quick start
```bash
# View the meta skill
cat assets/skills/skills-skills/SKILL.md
# View the headless CLI skill
cat assets/skills/headless-cli/SKILL.md
# View the PostgreSQL skill
cat assets/skills/postgresql/SKILL.md
```
## Creating a custom Skill
Generate one with the meta skill:
1. Prepare domain material (documentation, code, specifications)
2. Give the material to the AI together with `skills-skills/SKILL.md`
3. The AI produces a dedicated Skill for that domain
## Related resources
- [Meta skill file](./skills-skills/SKILL.md) - the Skill that generates Skills
- [Prompt library](../prompts/) - a finer-grained collection of prompts
- [Document library](../documents/) - methodology and development experience
- [skills.sh](https://skills.sh/) - Skills directory website

View File

@ -1 +0,0 @@
../repo/claude-official-skills

View File

@ -1,57 +0,0 @@
# assets/skills/skills-skills
This directory is a **meta-skill**: it turns arbitrary domain material (docs/APIs/code/specs) into a reusable Skill (`SKILL.md` + `references/` + `scripts/` + `assets/`), and ships an executable quality gate + scaffolding.
## Layout
```
skills-skills/
|-- AGENTS.md
|-- SKILL.md
|-- assets/
| |-- template-minimal.md
| `-- template-complete.md
|-- scripts/
| |-- Skill_Seekers-development/
| |-- create-skill.sh
| |-- skill-seekers-bootstrap.sh
| |-- skill-seekers-configs -> Skill_Seekers-development/configs
| |-- skill-seekers-import.sh
| |-- skill-seekers.sh
| |-- skill-seekers-src -> Skill_Seekers-development/src
| |-- skill-seekers-update.sh
| `-- validate-skill.sh
`-- references/
|-- index.md
|-- README.md
|-- anti-patterns.md
|-- skill-seekers.md
|-- quality-checklist.md
`-- skill-spec.md
```
## File Responsibilities
- `assets/skills/skills-skills/SKILL.md`: entrypoint (triggers, deliverables, workflow, quality gate, tooling).
- `assets/skills/skills-skills/assets/template-minimal.md`: minimal template (small domains / quick bootstrap).
- `assets/skills/skills-skills/assets/template-complete.md`: full template (production-grade / complex domains).
- `assets/skills/skills-skills/scripts/create-skill.sh`: scaffold generator (minimal/full, output dir, overwrite).
- `assets/skills/skills-skills/scripts/Skill_Seekers-development/`: vendored Skill Seekers source snapshot (code + configs; excludes upstream Markdown docs).
- `assets/skills/skills-skills/scripts/skill-seekers-bootstrap.sh`: create a local venv and install deps for the vendored Skill Seekers tool.
- `assets/skills/skills-skills/scripts/skill-seekers.sh`: run Skill Seekers from vendored source (docs/github/pdf -> output/<name>/).
- `assets/skills/skills-skills/scripts/skill-seekers-import.sh`: import output/<name>/ into the canonical assets/skills/<name>/ tree.
- `assets/skills/skills-skills/scripts/skill-seekers-update.sh`: update the vendored source snapshot from upstream (network required).
- `assets/skills/skills-skills/scripts/validate-skill.sh`: spec validator (supports `--strict`).
- `assets/skills/skills-skills/references/index.md`: navigation for this meta-skill's reference docs.
- `assets/skills/skills-skills/references/README.md`: upstream official reference (lightly adjusted to keep links working in this repo).
- `assets/skills/skills-skills/references/skill-spec.md`: the local Skill spec (MUST/SHOULD/NEVER).
- `assets/skills/skills-skills/references/quality-checklist.md`: quality gate checklist + scoring.
- `assets/skills/skills-skills/references/anti-patterns.md`: common failure modes and how to fix them.
- `assets/skills/skills-skills/references/skill-seekers.md`: how to use the vendored tool as a mandatory first-draft generator.
## Dependencies & Boundaries
- `scripts/*.sh`: depend on `bash` + common POSIX tooling; some scripts require extra tooling:
- `skill-seekers-bootstrap.sh`: requires `python3` + `pip` (network required for PyPI).
- `skill-seekers-update.sh`: requires `curl` + `tar` + `rsync` (network required).
- This directory is about "how to build Skills", not about any specific domain; domain knowledge belongs in `assets/skills/<domain>/`.

View File

@ -1,39 +0,0 @@
# Skill Seekers (Bundled Tool) Usage
This directory bundles the `Skill_Seekers-development` source code as a required tool of `skills-skills`. It quickly turns documentation, a GitHub repository, or a PDF into a workable first draft of a Skill.
## Directory conventions
- Tool source: `assets/skills/skills-skills/scripts/Skill_Seekers-development/`
- Run entry point: `assets/skills/skills-skills/scripts/skill-seekers.sh`
- Dependency bootstrap: `assets/skills/skills-skills/scripts/skill-seekers-bootstrap.sh`
- Import into this repository: `assets/skills/skills-skills/scripts/skill-seekers-import.sh`
- Update the source snapshot: `assets/skills/skills-skills/scripts/skill-seekers-update.sh` (network required)
## Recommended workflow (strictly enforced)
1. Use Skill Seekers to generate a first draft in `output/<name>/`
2. Import it into `assets/skills/<name>/`
3. Run `validate-skill.sh --strict` as the quality gate
4. Go back to the `skills-skills` spec and revise `SKILL.md` for "activatability" and "boundaries"
## Minimal runnable example
```bash
# 1) Bootstrap (only needed once)
./assets/skills/skills-skills/scripts/skill-seekers-bootstrap.sh
# 2) Generate (example: scrape a docs config)
./assets/skills/skills-skills/scripts/skill-seekers.sh -- scrape --config ./assets/skills/skills-skills/scripts/Skill_Seekers-development/configs/react.json
# 3) Import into assets/skills/
./assets/skills/skills-skills/scripts/skill-seekers-import.sh react
# 4) Strict validation
./assets/skills/skills-skills/scripts/validate-skill.sh assets/skills/react --strict
```
## Design principles
- `assets/skills/skills-skills/` is responsible for the spec, templates, quality gate, and activatability; it does not carry domain knowledge directly.
- Skill Seekers is responsible for scraping and first-draft generation; final delivery is still judged by this repository's `validate-skill.sh --strict`.

View File

@ -1,21 +0,0 @@
MIT License
Copyright (c) 2025 [Your Name/Username]
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

View File

@ -1,31 +0,0 @@
{
"name": "ansible-core",
"description": "Ansible Core 2.19 skill for automation and configuration management",
"base_url": "https://docs.ansible.com/ansible-core/2.19/",
"selectors": {
"main_content": "div[role=main]",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [],
"exclude": ["/_static/", "/_images/", "/_downloads/", "/search.html", "/genindex.html", "/py-modindex.html", "/index.html", "/roadmap/"]
},
"categories": {
"getting_started": ["getting_started", "getting-started", "introduction", "overview"],
"installation": ["installation_guide", "installation", "setup"],
"inventory": ["inventory_guide", "inventory"],
"playbooks": ["playbook_guide", "playbooks", "playbook"],
"modules": ["module_plugin_guide", "modules", "plugins"],
"collections": ["collections_guide", "collections"],
"vault": ["vault_guide", "vault", "encryption"],
"commands": ["command_guide", "commands", "cli"],
"porting": ["porting_guides", "porting", "migration"],
"os_specific": ["os_guide", "platform"],
"tips": ["tips_tricks", "tips", "tricks", "best-practices"],
"community": ["community", "contributing", "contributions"],
"development": ["dev_guide", "development", "developing"]
},
"rate_limit": 0.5,
"max_pages": 800
}

View File

@ -1,30 +0,0 @@
{
"name": "astro",
"description": "Astro web framework for content-focused websites. Use for Astro components, islands architecture, content collections, SSR/SSG, and modern web development.",
"base_url": "https://docs.astro.build/en/getting-started/",
"start_urls": [
"https://docs.astro.build/en/getting-started/",
"https://docs.astro.build/en/install/auto/",
"https://docs.astro.build/en/core-concepts/project-structure/",
"https://docs.astro.build/en/core-concepts/astro-components/",
"https://docs.astro.build/en/core-concepts/astro-pages/"
],
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/en/"],
"exclude": ["/blog", "/integrations"]
},
"categories": {
"getting_started": ["getting-started", "install", "tutorial"],
"core_concepts": ["core-concepts", "project-structure", "components", "pages"],
"guides": ["guides", "deploy", "migrate"],
"configuration": ["configuration", "config", "typescript"],
"integrations": ["integrations", "framework", "adapter"]
},
"rate_limit": 0.5,
"max_pages": 100
}

View File

@ -1,37 +0,0 @@
{
"name": "claude-code",
"description": "Claude Code CLI and development environment. Use for Claude Code features, tools, workflows, MCP integration, configuration, and AI-assisted development.",
"base_url": "https://docs.claude.com/en/docs/claude-code/",
"start_urls": [
"https://docs.claude.com/en/docs/claude-code/overview",
"https://docs.claude.com/en/docs/claude-code/quickstart",
"https://docs.claude.com/en/docs/claude-code/common-workflows",
"https://docs.claude.com/en/docs/claude-code/mcp",
"https://docs.claude.com/en/docs/claude-code/settings",
"https://docs.claude.com/en/docs/claude-code/troubleshooting",
"https://docs.claude.com/en/docs/claude-code/iam"
],
"selectors": {
"main_content": "#content-container",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/claude-code/"],
"exclude": ["/api-reference/", "/claude-ai/", "/claude.ai/", "/prompt-engineering/", "/changelog/"]
},
"categories": {
"getting_started": ["overview", "quickstart", "installation", "setup", "terminal-config"],
"workflows": ["workflow", "common-workflows", "git", "testing", "debugging", "interactive"],
"mcp": ["mcp", "model-context-protocol"],
"configuration": ["config", "settings", "preferences", "customize", "hooks", "statusline", "model-config", "memory", "output-styles"],
"agents": ["agent", "task", "subagent", "sub-agent", "specialized"],
"skills": ["skill", "agent-skill"],
"integrations": ["ide-integrations", "vs-code", "jetbrains", "plugin", "marketplace"],
"deployment": ["bedrock", "vertex", "deployment", "network", "gateway", "devcontainer", "sandboxing", "third-party"],
"reference": ["reference", "api", "command", "cli-reference", "slash", "checkpointing", "headless", "sdk"],
"enterprise": ["iam", "security", "monitoring", "analytics", "costs", "legal", "data-usage"]
},
"rate_limit": 0.5,
"max_pages": 200
}

View File

@ -1,34 +0,0 @@
{
"name": "django",
"description": "Django web framework for Python. Use for Django models, views, templates, ORM, authentication, and web development.",
"base_url": "https://docs.djangoproject.com/en/stable/",
"start_urls": [
"https://docs.djangoproject.com/en/stable/intro/",
"https://docs.djangoproject.com/en/stable/topics/db/models/",
"https://docs.djangoproject.com/en/stable/topics/http/views/",
"https://docs.djangoproject.com/en/stable/topics/templates/",
"https://docs.djangoproject.com/en/stable/topics/forms/",
"https://docs.djangoproject.com/en/stable/topics/auth/",
"https://docs.djangoproject.com/en/stable/ref/models/"
],
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre"
},
"url_patterns": {
"include": ["/intro/", "/topics/", "/ref/", "/howto/"],
"exclude": ["/faq/", "/misc/", "/releases/"]
},
"categories": {
"getting_started": ["intro", "tutorial", "install"],
"models": ["models", "database", "orm", "queries"],
"views": ["views", "urlconf", "routing"],
"templates": ["templates", "template"],
"forms": ["forms", "form"],
"authentication": ["auth", "authentication", "user"],
"api": ["ref", "reference"]
},
"rate_limit": 0.3,
"max_pages": 500
}
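The `url_patterns` block in the config above pairs an `include` list with an `exclude` list. The diff does not show how the scraper applies them, so the following is only a minimal sketch under stated assumptions: patterns are matched as plain substrings of the URL path, excludes win over includes, and an empty `include` list allows everything. The function name `should_visit` is hypothetical and not taken from the project.

```python
from urllib.parse import urlparse


def should_visit(url: str, base_url: str, include: list[str], exclude: list[str]) -> bool:
    """Hypothetical URL filter (assumed semantics, not the project's code)."""
    # Stay inside the documentation site named by base_url.
    if not url.startswith(base_url):
        return False
    path = urlparse(url).path
    # Assumption: any exclude match rejects the page outright.
    if any(pattern in path for pattern in exclude):
        return False
    # Assumption: an empty include list means "no restriction".
    return not include or any(pattern in path for pattern in include)


# Values copied from the Django config above.
BASE = "https://docs.djangoproject.com/en/stable/"
INCLUDE = ["/intro/", "/topics/", "/ref/", "/howto/"]
EXCLUDE = ["/faq/", "/misc/", "/releases/"]

if __name__ == "__main__":
    for candidate in (BASE + "topics/db/models/", BASE + "releases/5.0/", BASE + "glossary/"):
        print(candidate, should_visit(candidate, BASE, INCLUDE, EXCLUDE))
```

Under these assumptions a page such as `glossary/` is skipped because it matches no `include` entry, even though it is not excluded explicitly.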


@ -1,49 +0,0 @@
{
"name": "django",
"description": "Complete Django framework knowledge combining official documentation and Django codebase. Use when building Django applications, understanding ORM internals, or debugging Django issues.",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://docs.djangoproject.com/en/stable/",
"extract_api": true,
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre"
},
"url_patterns": {
"include": [],
"exclude": ["/search/", "/genindex/"]
},
"categories": {
"getting_started": ["intro", "tutorial", "install"],
"models": ["models", "orm", "queries", "database"],
"views": ["views", "urls", "templates"],
"forms": ["forms", "modelforms"],
"admin": ["admin"],
"api": ["ref/"],
"topics": ["topics/"],
"security": ["security", "csrf", "authentication"]
},
"rate_limit": 0.5,
"max_pages": 300
},
{
"type": "github",
"repo": "django/django",
"include_issues": true,
"max_issues": 100,
"include_changelog": true,
"include_releases": true,
"include_code": true,
"code_analysis_depth": "surface",
"file_patterns": [
"django/db/**/*.py",
"django/views/**/*.py",
"django/forms/**/*.py",
"django/contrib/admin/**/*.py"
]
}
]
}


@ -1,17 +0,0 @@
{
"name": "example_manual",
"description": "Example PDF documentation skill",
"pdf_path": "docs/manual.pdf",
"extract_options": {
"chunk_size": 10,
"min_quality": 5.0,
"extract_images": true,
"min_image_size": 100
},
"categories": {
"getting_started": ["introduction", "getting started", "quick start", "setup"],
"tutorial": ["tutorial", "guide", "walkthrough", "example"],
"api": ["api", "reference", "function", "class", "method"],
"advanced": ["advanced", "optimization", "performance", "best practices"]
}
}


@ -1,33 +0,0 @@
{
"name": "fastapi",
"description": "FastAPI modern Python web framework. Use for building APIs, async endpoints, dependency injection, and Python backend development.",
"base_url": "https://fastapi.tiangolo.com/",
"start_urls": [
"https://fastapi.tiangolo.com/tutorial/",
"https://fastapi.tiangolo.com/tutorial/first-steps/",
"https://fastapi.tiangolo.com/tutorial/path-params/",
"https://fastapi.tiangolo.com/tutorial/body/",
"https://fastapi.tiangolo.com/tutorial/dependencies/",
"https://fastapi.tiangolo.com/advanced/",
"https://fastapi.tiangolo.com/reference/"
],
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/tutorial/", "/advanced/", "/reference/"],
"exclude": ["/help/", "/external-links/", "/deployment/"]
},
"categories": {
"getting_started": ["first-steps", "tutorial", "intro"],
"path_operations": ["path", "operations", "routing"],
"request_data": ["request", "body", "query", "parameters"],
"dependencies": ["dependencies", "injection"],
"security": ["security", "oauth", "authentication"],
"database": ["database", "sql", "orm"]
},
"rate_limit": 0.5,
"max_pages": 250
}
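The `categories` map in the config above assigns keyword lists to category names. How the tool scores or matches these keywords is not visible in this diff, so the sketch below makes a simple assumption: a page belongs to the first category (in file order) whose keyword appears in the lowercased URL or title, and falls back to `"other"` otherwise. The helper name `categorize` and the fallback label are hypothetical.

```python
def categorize(url: str, title: str, categories: dict[str, list[str]]) -> str:
    """Hypothetical first-match keyword classifier (assumed behaviour)."""
    haystack = f"{url} {title}".lower()
    for name, keywords in categories.items():
        # dicts preserve insertion order, so earlier categories take priority.
        if any(keyword.lower() in haystack for keyword in keywords):
            return name
    return "other"  # assumed fallback label


# Values copied from the FastAPI config above.
FASTAPI_CATEGORIES = {
    "getting_started": ["first-steps", "tutorial", "intro"],
    "path_operations": ["path", "operations", "routing"],
    "request_data": ["request", "body", "query", "parameters"],
    "dependencies": ["dependencies", "injection"],
    "security": ["security", "oauth", "authentication"],
    "database": ["database", "sql", "orm"],
}

if __name__ == "__main__":
    print(categorize("https://fastapi.tiangolo.com/advanced/security/", "Advanced Security", FASTAPI_CATEGORIES))
    print(categorize("https://fastapi.tiangolo.com/tutorial/security/", "Security", FASTAPI_CATEGORIES))
```

With first-match semantics, broad keywords such as `tutorial` capture any URL under `/tutorial/`, including security pages, so the order of the categories matters under this assumption.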


@ -1,45 +0,0 @@
{
"name": "fastapi",
"description": "Complete FastAPI knowledge combining official documentation and FastAPI codebase. Use when building FastAPI applications, understanding async patterns, or working with Pydantic models.",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://fastapi.tiangolo.com/",
"extract_api": true,
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [],
"exclude": ["/img/", "/js/"]
},
"categories": {
"getting_started": ["tutorial", "first-steps"],
"path_operations": ["path-params", "query-params", "body"],
"dependencies": ["dependencies"],
"security": ["security", "oauth2"],
"database": ["sql-databases"],
"advanced": ["advanced", "async", "middleware"],
"deployment": ["deployment"]
},
"rate_limit": 0.5,
"max_pages": 150
},
{
"type": "github",
"repo": "tiangolo/fastapi",
"include_issues": true,
"max_issues": 100,
"include_changelog": true,
"include_releases": true,
"include_code": true,
"code_analysis_depth": "surface",
"file_patterns": [
"fastapi/**/*.py"
]
}
]
}


@ -1,41 +0,0 @@
{
"name": "fastapi_test",
"description": "FastAPI test - unified scraping with limited pages",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://fastapi.tiangolo.com/",
"extract_api": true,
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [],
"exclude": ["/img/", "/js/"]
},
"categories": {
"getting_started": ["tutorial", "first-steps"],
"path_operations": ["path-params", "query-params"],
"api": ["reference"]
},
"rate_limit": 0.5,
"max_pages": 20
},
{
"type": "github",
"repo": "tiangolo/fastapi",
"include_issues": false,
"include_changelog": false,
"include_releases": true,
"include_code": true,
"code_analysis_depth": "surface",
"file_patterns": [
"fastapi/routing.py",
"fastapi/applications.py"
]
}
]
}


@ -1,63 +0,0 @@
{
"name": "godot",
"description": "Godot Engine game development. Use for Godot projects, GDScript/C# coding, scene setup, node systems, 2D/3D development, physics, animation, UI, shaders, or any Godot-specific questions.",
"base_url": "https://docs.godotengine.org/en/stable/",
"start_urls": [
"https://docs.godotengine.org/en/stable/getting_started/introduction/index.html",
"https://docs.godotengine.org/en/stable/tutorials/scripting/gdscript/index.html",
"https://docs.godotengine.org/en/stable/tutorials/2d/index.html",
"https://docs.godotengine.org/en/stable/tutorials/3d/index.html",
"https://docs.godotengine.org/en/stable/tutorials/physics/index.html",
"https://docs.godotengine.org/en/stable/tutorials/animation/index.html",
"https://docs.godotengine.org/en/stable/classes/index.html"
],
"selectors": {
"main_content": "div[role='main']",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [
"/getting_started/",
"/tutorials/",
"/classes/"
],
"exclude": [
"/genindex.html",
"/search.html",
"/_static/",
"/_sources/"
]
},
"categories": {
"getting_started": ["introduction", "getting_started", "first", "your_first"],
"scripting": ["scripting", "gdscript", "c#", "csharp"],
"2d": ["/2d/", "sprite", "canvas", "tilemap"],
"3d": ["/3d/", "spatial", "mesh", "3d_"],
"physics": ["physics", "collision", "rigidbody", "characterbody"],
"animation": ["animation", "tween", "animationplayer"],
"ui": ["ui", "control", "gui", "theme"],
"shaders": ["shader", "material", "visual_shader"],
"audio": ["audio", "sound"],
"networking": ["networking", "multiplayer", "rpc"],
"export": ["export", "platform", "deploy"]
},
"rate_limit": 0.5,
"max_pages": 40000,
"_comment": "=== NEW: Split Strategy Configuration ===",
"split_strategy": "router",
"split_config": {
"target_pages_per_skill": 5000,
"create_router": true,
"split_by_categories": ["scripting", "2d", "3d", "physics", "shaders"],
"router_name": "godot",
"parallel_scraping": true
},
"_comment2": "=== NEW: Checkpoint Configuration ===",
"checkpoint": {
"enabled": true,
"interval": 1000
}
}
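The config above combines `max_pages: 40000` with a `split_config` target of 5000 pages per skill and a checkpoint `interval` of 1000. The diff does not show how the tool uses these numbers, so the following is only back-of-the-envelope arithmetic under two assumptions: the page budget is divided evenly across sub-skills, and a checkpoint is written every `interval` pages. Both function names are hypothetical.

```python
import math


def estimate_sub_skills(max_pages: int, target_pages_per_skill: int) -> int:
    """Upper bound on sub-skills if the page budget is split evenly (assumption)."""
    return math.ceil(max_pages / target_pages_per_skill)


def estimate_checkpoints(max_pages: int, interval: int) -> int:
    """Number of checkpoints if one is written every `interval` pages (assumption)."""
    return max_pages // interval


if __name__ == "__main__":
    # Values copied from the Godot config above.
    print(estimate_sub_skills(40000, 5000))
    print(estimate_checkpoints(40000, 1000))
```

Note that `split_by_categories` lists only five categories, so an actual split by category could yield a different number of sub-skills than this even division suggests.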


@ -1,47 +0,0 @@
{
"name": "godot",
"description": "Godot Engine game development. Use for Godot projects, GDScript/C# coding, scene setup, node systems, 2D/3D development, physics, animation, UI, shaders, or any Godot-specific questions.",
"base_url": "https://docs.godotengine.org/en/stable/",
"start_urls": [
"https://docs.godotengine.org/en/stable/getting_started/introduction/index.html",
"https://docs.godotengine.org/en/stable/tutorials/scripting/gdscript/index.html",
"https://docs.godotengine.org/en/stable/tutorials/2d/index.html",
"https://docs.godotengine.org/en/stable/tutorials/3d/index.html",
"https://docs.godotengine.org/en/stable/tutorials/physics/index.html",
"https://docs.godotengine.org/en/stable/tutorials/animation/index.html",
"https://docs.godotengine.org/en/stable/classes/index.html"
],
"selectors": {
"main_content": "div[role='main']",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [
"/getting_started/",
"/tutorials/",
"/classes/"
],
"exclude": [
"/genindex.html",
"/search.html",
"/_static/",
"/_sources/"
]
},
"categories": {
"getting_started": ["introduction", "getting_started", "first", "your_first"],
"scripting": ["scripting", "gdscript", "c#", "csharp"],
"2d": ["/2d/", "sprite", "canvas", "tilemap"],
"3d": ["/3d/", "spatial", "mesh", "3d_"],
"physics": ["physics", "collision", "rigidbody", "characterbody"],
"animation": ["animation", "tween", "animationplayer"],
"ui": ["ui", "control", "gui", "theme"],
"shaders": ["shader", "material", "visual_shader"],
"audio": ["audio", "sound"],
"networking": ["networking", "multiplayer", "rpc"],
"export": ["export", "platform", "deploy"]
},
"rate_limit": 0.5,
"max_pages": 500
}


@ -1,19 +0,0 @@
{
"name": "godot",
"repo": "godotengine/godot",
"description": "Godot Engine - Multi-platform 2D and 3D game engine",
"github_token": null,
"include_issues": true,
"max_issues": 100,
"include_changelog": true,
"include_releases": true,
"include_code": false,
"file_patterns": [
"core/**/*.h",
"core/**/*.cpp",
"scene/**/*.h",
"scene/**/*.cpp",
"servers/**/*.h",
"servers/**/*.cpp"
]
}


@ -1,50 +0,0 @@
{
"name": "godot",
"description": "Complete Godot Engine knowledge base combining official documentation and source code analysis",
"merge_mode": "claude-enhanced",
"sources": [
{
"type": "documentation",
"base_url": "https://docs.godotengine.org/en/stable/",
"extract_api": true,
"selectors": {
"main_content": "div[role='main']",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [],
"exclude": ["/search.html", "/_static/", "/_images/"]
},
"categories": {
"getting_started": ["introduction", "getting_started", "step_by_step"],
"scripting": ["scripting", "gdscript", "c_sharp"],
"2d": ["2d", "canvas", "sprite", "animation"],
"3d": ["3d", "spatial", "mesh", "shader"],
"physics": ["physics", "collision", "rigidbody"],
"api": ["api", "class", "reference", "method"]
},
"rate_limit": 0.5,
"max_pages": 500
},
{
"type": "github",
"repo": "godotengine/godot",
"github_token": null,
"code_analysis_depth": "deep",
"include_code": true,
"include_issues": true,
"max_issues": 100,
"include_changelog": true,
"include_releases": true,
"file_patterns": [
"core/**/*.h",
"core/**/*.cpp",
"scene/**/*.h",
"scene/**/*.cpp",
"servers/**/*.h",
"servers/**/*.cpp"
]
}
]
}


@ -1,18 +0,0 @@
{
"name": "hono",
"description": "Hono web application framework for building fast, lightweight APIs. Use for Hono routing, middleware, context handling, and modern JavaScript/TypeScript web development.",
"llms_txt_url": "https://hono.dev/llms-full.txt",
"base_url": "https://hono.dev/docs",
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [],
"exclude": []
},
"categories": {},
"rate_limit": 0.5,
"max_pages": 50
}


@ -1,48 +0,0 @@
{
"name": "kubernetes",
"description": "Kubernetes container orchestration platform. Use for K8s clusters, deployments, pods, services, networking, storage, configuration, and DevOps tasks.",
"base_url": "https://kubernetes.io/docs/",
"start_urls": [
"https://kubernetes.io/docs/home/",
"https://kubernetes.io/docs/concepts/",
"https://kubernetes.io/docs/tasks/",
"https://kubernetes.io/docs/tutorials/",
"https://kubernetes.io/docs/reference/"
],
"selectors": {
"main_content": "main",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [
"/docs/concepts/",
"/docs/tasks/",
"/docs/tutorials/",
"/docs/reference/",
"/docs/setup/"
],
"exclude": [
"/search/",
"/blog/",
"/training/",
"/partners/",
"/community/",
"/_print/",
"/case-studies/"
]
},
"categories": {
"getting_started": ["getting-started", "setup", "learning-environment"],
"concepts": ["concepts", "overview", "architecture"],
"workloads": ["workloads", "pods", "deployments", "replicaset", "statefulset", "daemonset"],
"services": ["services", "networking", "ingress", "service"],
"storage": ["storage", "volumes", "persistent"],
"configuration": ["configuration", "configmap", "secret"],
"security": ["security", "rbac", "policies", "authentication"],
"tasks": ["tasks", "administer", "configure"],
"tutorials": ["tutorials", "stateless", "stateful"]
},
"rate_limit": 0.5,
"max_pages": 1000
}


@ -1,34 +0,0 @@
{
"name": "laravel",
"description": "Laravel PHP web framework. Use for Laravel models, routes, controllers, Blade templates, Eloquent ORM, authentication, and PHP web development.",
"base_url": "https://laravel.com/docs/9.x/",
"start_urls": [
"https://laravel.com/docs/9.x/installation",
"https://laravel.com/docs/9.x/routing",
"https://laravel.com/docs/9.x/controllers",
"https://laravel.com/docs/9.x/views",
"https://laravel.com/docs/9.x/blade",
"https://laravel.com/docs/9.x/eloquent",
"https://laravel.com/docs/9.x/migrations",
"https://laravel.com/docs/9.x/authentication"
],
"selectors": {
"main_content": "#main-content",
"title": "h1",
"code_blocks": "pre"
},
"url_patterns": {
"include": ["/docs/9.x/", "/docs/10.x/", "/docs/11.x/"],
"exclude": ["/api/", "/packages/"]
},
"categories": {
"getting_started": ["installation", "configuration", "structure", "deployment"],
"routing": ["routing", "middleware", "controllers"],
"views": ["views", "blade", "templates"],
"models": ["eloquent", "database", "migrations", "seeding", "queries"],
"authentication": ["authentication", "authorization", "passwords"],
"api": ["api", "resources", "requests", "responses"]
},
"rate_limit": 0.3,
"max_pages": 500
}


@ -1,17 +0,0 @@
{
"name": "python-tutorial-test",
"description": "Python tutorial for testing MCP tools",
"base_url": "https://docs.python.org/3/tutorial/",
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [],
"exclude": []
},
"categories": {},
"rate_limit": 0.3,
"max_pages": 10
}


@ -1,31 +0,0 @@
{
"name": "react",
"description": "React framework for building user interfaces. Use for React components, hooks, state management, JSX, and modern frontend development.",
"base_url": "https://react.dev/",
"start_urls": [
"https://react.dev/learn",
"https://react.dev/learn/quick-start",
"https://react.dev/learn/thinking-in-react",
"https://react.dev/reference/react",
"https://react.dev/reference/react-dom",
"https://react.dev/reference/react/hooks"
],
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/learn", "/reference"],
"exclude": ["/community", "/blog"]
},
"categories": {
"getting_started": ["quick-start", "installation", "tutorial"],
"hooks": ["usestate", "useeffect", "usememo", "usecallback", "usecontext", "useref", "hook"],
"components": ["component", "props", "jsx"],
"state": ["state", "context", "reducer"],
"api": ["api", "reference"]
},
"rate_limit": 0.5,
"max_pages": 300
}


@ -1,15 +0,0 @@
{
"name": "react",
"repo": "facebook/react",
"description": "React JavaScript library for building user interfaces",
"github_token": null,
"include_issues": true,
"max_issues": 100,
"include_changelog": true,
"include_releases": true,
"include_code": false,
"file_patterns": [
"packages/**/*.js",
"packages/**/*.ts"
]
}


@ -1,44 +0,0 @@
{
"name": "react",
"description": "Complete React knowledge base combining official documentation and React codebase insights. Use when working with React, understanding API changes, or debugging React internals.",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://react.dev/",
"extract_api": true,
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [],
"exclude": ["/blog/", "/community/"]
},
"categories": {
"getting_started": ["learn", "installation", "quick-start"],
"components": ["components", "props", "state"],
"hooks": ["hooks", "usestate", "useeffect", "usecontext"],
"api": ["api", "reference"],
"advanced": ["context", "refs", "portals", "suspense"]
},
"rate_limit": 0.5,
"max_pages": 200
},
{
"type": "github",
"repo": "facebook/react",
"include_issues": true,
"max_issues": 100,
"include_changelog": true,
"include_releases": true,
"include_code": true,
"code_analysis_depth": "surface",
"file_patterns": [
"packages/react/src/**/*.js",
"packages/react-dom/src/**/*.js"
]
}
]
}


@ -1,108 +0,0 @@
{
"name": "steam-economy-complete",
"description": "Complete Steam Economy system including inventory, microtransactions, trading, and monetization. Use for ISteamInventory API, ISteamEconomy API, IInventoryService Web API, Steam Wallet integration, in-app purchases, item definitions, trading, crafting, market integration, and all economy features for game developers.",
"base_url": "https://partner.steamgames.com/doc/",
"start_urls": [
"https://partner.steamgames.com/doc/features/inventory",
"https://partner.steamgames.com/doc/features/microtransactions",
"https://partner.steamgames.com/doc/features/microtransactions/implementation",
"https://partner.steamgames.com/doc/api/ISteamInventory",
"https://partner.steamgames.com/doc/webapi/ISteamEconomy",
"https://partner.steamgames.com/doc/webapi/IInventoryService",
"https://partner.steamgames.com/doc/features/inventory/economy"
],
"selectors": {
"main_content": "div.documentation_bbcode",
"title": "div.docPageTitle",
"code_blocks": "div.bb_code"
},
"url_patterns": {
"include": [
"/features/inventory",
"/features/microtransactions",
"/api/ISteamInventory",
"/webapi/ISteamEconomy",
"/webapi/IInventoryService"
],
"exclude": [
"/home",
"/sales",
"/marketing",
"/legal",
"/finance",
"/login",
"/search",
"/steamworks/apps",
"/steamworks/partner"
]
},
"categories": {
"getting_started": [
"overview",
"getting started",
"introduction",
"quickstart",
"setup"
],
"inventory_system": [
"inventory",
"item definition",
"item schema",
"item properties",
"itemdefs",
"ISteamInventory"
],
"microtransactions": [
"microtransaction",
"purchase",
"payment",
"checkout",
"wallet",
"transaction"
],
"economy_api": [
"ISteamEconomy",
"economy",
"asset",
"context"
],
"inventory_webapi": [
"IInventoryService",
"webapi",
"web api",
"http"
],
"trading": [
"trading",
"trade",
"exchange",
"market"
],
"crafting": [
"crafting",
"recipe",
"combine",
"exchange"
],
"pricing": [
"pricing",
"price",
"cost",
"currency"
],
"implementation": [
"integration",
"implementation",
"configure",
"best practices"
],
"examples": [
"example",
"sample",
"tutorial",
"walkthrough"
]
},
"rate_limit": 0.7,
"max_pages": 1000
}


@ -1,30 +0,0 @@
{
"name": "tailwind",
"description": "Tailwind CSS utility-first framework for rapid UI development. Use for Tailwind utilities, responsive design, custom configurations, and modern CSS workflows.",
"base_url": "https://tailwindcss.com/docs",
"start_urls": [
"https://tailwindcss.com/docs/installation",
"https://tailwindcss.com/docs/utility-first",
"https://tailwindcss.com/docs/responsive-design",
"https://tailwindcss.com/docs/hover-focus-and-other-states"
],
"selectors": {
"main_content": "div.prose",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/docs"],
"exclude": ["/blog", "/resources"]
},
"categories": {
"getting_started": ["installation", "editor-setup", "intellisense"],
"core_concepts": ["utility-first", "responsive", "hover-focus", "dark-mode"],
"layout": ["container", "columns", "flex", "grid"],
"typography": ["font-family", "font-size", "text-align", "text-color"],
"backgrounds": ["background-color", "background-image", "gradient"],
"customization": ["configuration", "theme", "plugins"]
},
"rate_limit": 0.5,
"max_pages": 100
}


@ -1,17 +0,0 @@
{
"name": "test-manual",
"description": "Manual test config",
"base_url": "https://test.example.com/",
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": [],
"exclude": []
},
"categories": {},
"rate_limit": 0.5,
"max_pages": 50
}


@ -1,31 +0,0 @@
{
"name": "vue",
"description": "Vue.js progressive JavaScript framework. Use for Vue components, reactivity, composition API, and frontend development.",
"base_url": "https://vuejs.org/",
"start_urls": [
"https://vuejs.org/guide/introduction.html",
"https://vuejs.org/guide/quick-start.html",
"https://vuejs.org/guide/essentials/application.html",
"https://vuejs.org/guide/components/registration.html",
"https://vuejs.org/guide/reusability/composables.html",
"https://vuejs.org/api/"
],
"selectors": {
"main_content": "main",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/guide/", "/api/", "/examples/"],
"exclude": ["/about/", "/sponsor/", "/partners/"]
},
"categories": {
"getting_started": ["quick-start", "introduction", "essentials"],
"components": ["component", "props", "events"],
"reactivity": ["reactivity", "reactive", "ref", "computed"],
"composition_api": ["composition", "setup"],
"api": ["api", "reference"]
},
"rate_limit": 0.5,
"max_pages": 200
}


@ -1,195 +0,0 @@
#!/usr/bin/env python3
"""
Demo: Conflict Detection and Reporting
This demonstrates the unified scraper's ability to detect and report
conflicts between documentation and code implementation.
"""
import sys
import json
from pathlib import Path
# Add CLI to path
sys.path.insert(0, str(Path(__file__).parent / 'cli'))
from conflict_detector import ConflictDetector
print("=" * 70)
print("UNIFIED SCRAPER - CONFLICT DETECTION DEMO")
print("=" * 70)
print()
# Load test data
print("📂 Loading test data...")
print(" - Documentation APIs from example docs")
print(" - Code APIs from example repository")
print()
with open('cli/conflicts.json', 'r') as f:
    conflicts_data = json.load(f)
conflicts = conflicts_data['conflicts']
summary = conflicts_data['summary']
print(f"✅ Loaded {summary['total']} conflicts")
print()
# Display summary
print("=" * 70)
print("CONFLICT SUMMARY")
print("=" * 70)
print()
print(f"📊 **Total Conflicts**: {summary['total']}")
print()
print("**By Type:**")
for conflict_type, count in summary['by_type'].items():
    if count > 0:
        emoji = "📖" if conflict_type == "missing_in_docs" else "💻" if conflict_type == "missing_in_code" else "⚠️"
        print(f" {emoji} {conflict_type}: {count}")
print()
print("**By Severity:**")
for severity, count in summary['by_severity'].items():
    if count > 0:
        emoji = "🔴" if severity == "high" else "🟡" if severity == "medium" else "🟢"
        print(f" {emoji} {severity.upper()}: {count}")
print()
# Display detailed conflicts
print("=" * 70)
print("DETAILED CONFLICT REPORTS")
print("=" * 70)
print()
# Group by severity
high = [c for c in conflicts if c['severity'] == 'high']
medium = [c for c in conflicts if c['severity'] == 'medium']
low = [c for c in conflicts if c['severity'] == 'low']
# Show high severity first
if high:
    print("🔴 **HIGH SEVERITY CONFLICTS** (Requires immediate attention)")
    print("-" * 70)
    for conflict in high:
        print()
        print(f"**API**: `{conflict['api_name']}`")
        print(f"**Type**: {conflict['type']}")
        print(f"**Issue**: {conflict['difference']}")
        print(f"**Suggestion**: {conflict['suggestion']}")
        if conflict['docs_info']:
            print(f"\n**Documented as**:")
            print(f" Signature: {conflict['docs_info'].get('raw_signature', 'N/A')}")
        if conflict['code_info']:
            print(f"\n**Implemented as**:")
            params = conflict['code_info'].get('parameters', [])
            param_str = ', '.join(f"{p['name']}: {p.get('type_hint', 'Any')}" for p in params if p['name'] != 'self')
            print(f" Signature: {conflict['code_info']['name']}({param_str})")
            print(f" Return type: {conflict['code_info'].get('return_type', 'None')}")
            print(f" Location: {conflict['code_info'].get('source', 'N/A')}:{conflict['code_info'].get('line', '?')}")
    print()
# Show medium severity
if medium:
    print("🟡 **MEDIUM SEVERITY CONFLICTS** (Review recommended)")
    print("-" * 70)
    for conflict in medium[:3]:  # Show first 3
        print()
        print(f"**API**: `{conflict['api_name']}`")
        print(f"**Type**: {conflict['type']}")
        print(f"**Issue**: {conflict['difference']}")
        if conflict['code_info']:
            print(f"**Location**: {conflict['code_info'].get('source', 'N/A')}")
    if len(medium) > 3:
        print(f"\n ... and {len(medium) - 3} more medium severity conflicts")
    print()
# Example: How conflicts appear in final skill
print("=" * 70)
print("HOW CONFLICTS APPEAR IN SKILL.MD")
print("=" * 70)
print()
example_conflict = high[0] if high else medium[0] if medium else conflicts[0]
print("```markdown")
print("## 🔧 API Reference")
print()
print("### ⚠️ APIs with Conflicts")
print()
print(f"#### `{example_conflict['api_name']}`")
print()
print(f"⚠️ **Conflict**: {example_conflict['difference']}")
print()
if example_conflict.get('docs_info'):
    print("**Documentation says:**")
    print("```")
    print(example_conflict['docs_info'].get('raw_signature', 'N/A'))
    print("```")
    print()
if example_conflict.get('code_info'):
    print("**Code implementation:**")
    print("```python")
    params = example_conflict['code_info'].get('parameters', [])
    param_strs = []
    for p in params:
        if p['name'] == 'self':
            continue
        param_str = p['name']
        if p.get('type_hint'):
            param_str += f": {p['type_hint']}"
        if p.get('default'):
            param_str += f" = {p['default']}"
        param_strs.append(param_str)
    sig = f"def {example_conflict['code_info']['name']}({', '.join(param_strs)})"
    if example_conflict['code_info'].get('return_type'):
        sig += f" -> {example_conflict['code_info']['return_type']}"
    print(sig)
    print("```")
    print()
print("*Source: both (conflict)*")
print("```")
print()
# Key takeaways
print("=" * 70)
print("KEY TAKEAWAYS")
print("=" * 70)
print()
print("✅ **What the Unified Scraper Does:**")
print(" 1. Extracts APIs from both documentation and code")
print(" 2. Compares them to detect discrepancies")
print(" 3. Classifies conflicts by type and severity")
print(" 4. Provides actionable suggestions")
print(" 5. Shows both versions transparently in the skill")
print()
print("⚠️ **Common Conflict Types:**")
print(" - **Missing in docs**: Undocumented features in code")
print(" - **Missing in code**: Documented but not implemented")
print(" - **Signature mismatch**: Different parameters/types")
print(" - **Description mismatch**: Different explanations")
print()
print("🎯 **Value:**")
print(" - Identifies documentation gaps")
print(" - Catches outdated documentation")
print(" - Highlights implementation differences")
print(" - Creates single source of truth showing reality")
print()
print("=" * 70)
print("END OF DEMO")
print("=" * 70)
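The demo above reads `cli/conflicts.json` and expects a `conflicts` list plus a `summary` with `total`, `by_type`, and `by_severity`. That file is not part of this diff, so the sketch below only infers its shape from the keys the script accesses. The sample records, the `signature_mismatch` type string, and the `summarize` helper are invented for illustration and are not taken from the project.

```python
from collections import Counter


def summarize(conflicts: list[dict]) -> dict:
    """Rebuild the summary block the demo reads (hypothetical helper)."""
    return {
        "total": len(conflicts),
        "by_type": dict(Counter(c["type"] for c in conflicts)),
        "by_severity": dict(Counter(c["severity"] for c in conflicts)),
    }


# Invented sample data: only the key names come from the demo script.
sample_conflicts = [
    {
        "type": "missing_in_docs",
        "severity": "medium",
        "api_name": "helper_fn",
        "difference": "Implemented but not documented",
        "suggestion": "Add documentation",
        "docs_info": None,
        "code_info": {"name": "helper_fn", "parameters": [], "return_type": "None",
                      "source": "pkg/util.py", "line": 10},
    },
    {
        "type": "signature_mismatch",  # assumed type string
        "severity": "high",
        "api_name": "connect",
        "difference": "Parameter count differs",
        "suggestion": "Update the documented signature",
        "docs_info": {"raw_signature": "connect(host)"},
        "code_info": {"name": "connect",
                      "parameters": [{"name": "host", "type_hint": "str"},
                                     {"name": "port", "type_hint": "int", "default": "5432"}],
                      "return_type": "Connection", "source": "pkg/db.py", "line": 42},
    },
]

if __name__ == "__main__":
    print(summarize(sample_conflicts))
```

A document shaped as `{"conflicts": sample_conflicts, "summary": summarize(sample_conflicts)}` contains every key the demo script looks up.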


@ -1,11 +0,0 @@
{
"mcpServers": {
"skill-seeker": {
"command": "python3",
"args": [
"/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/mcp/server.py"
],
"cwd": "/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers"
}
}
}


@ -1,13 +0,0 @@
[mypy]
python_version = 3.10
warn_return_any = False
warn_unused_configs = True
disallow_untyped_defs = False
check_untyped_defs = True
ignore_missing_imports = True
no_implicit_optional = True
show_error_codes = True
# Gradual typing - be lenient for now
disallow_incomplete_defs = False
disallow_untyped_calls = False


@ -1,149 +0,0 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "skill-seekers"
version = "2.1.1"
description = "Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills"
readme = "README.md"
requires-python = ">=3.10"
license = {text = "MIT"}
authors = [
{name = "Yusuf Karaaslan"}
]
keywords = [
"claude",
"ai",
"documentation",
"scraping",
"skills",
"llm",
"mcp",
"automation"
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Topic :: Software Development :: Documentation",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Text Processing :: Markup :: Markdown",
]
# Core dependencies
dependencies = [
"requests>=2.32.5",
"beautifulsoup4>=4.14.2",
"PyGithub>=2.5.0",
"mcp>=1.18.0",
"httpx>=0.28.1",
"httpx-sse>=0.4.3",
"PyMuPDF>=1.24.14",
"Pillow>=11.0.0",
"pytesseract>=0.3.13",
"pydantic>=2.12.3",
"pydantic-settings>=2.11.0",
"python-dotenv>=1.1.1",
"jsonschema>=4.25.1",
"click>=8.3.0",
"Pygments>=2.19.2",
]
[project.optional-dependencies]
# Development dependencies
dev = [
"pytest>=8.4.2",
"pytest-cov>=7.0.0",
"coverage>=7.11.0",
]
# MCP server extras: mcp, httpx, and httpx-sse are already core dependencies; this group adds uvicorn, starlette, and sse-starlette
mcp = [
"mcp>=1.18.0",
"httpx>=0.28.1",
"httpx-sse>=0.4.3",
"uvicorn>=0.38.0",
"starlette>=0.48.0",
"sse-starlette>=3.0.2",
]
# All optional dependencies combined
all = [
"pytest>=8.4.2",
"pytest-cov>=7.0.0",
"coverage>=7.11.0",
"mcp>=1.18.0",
"httpx>=0.28.1",
"httpx-sse>=0.4.3",
"uvicorn>=0.38.0",
"starlette>=0.48.0",
"sse-starlette>=3.0.2",
]
[project.urls]
Homepage = "https://github.com/yusufkaraaslan/Skill_Seekers"
Repository = "https://github.com/yusufkaraaslan/Skill_Seekers"
"Bug Tracker" = "https://github.com/yusufkaraaslan/Skill_Seekers/issues"
Documentation = "https://github.com/yusufkaraaslan/Skill_Seekers#readme"
[project.scripts]
# Main unified CLI
skill-seekers = "skill_seekers.cli.main:main"
# Individual tool entry points
skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
skill-seekers-github = "skill_seekers.cli.github_scraper:main"
skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
skill-seekers-enhance = "skill_seekers.cli.enhance_skill_local:main"
skill-seekers-package = "skill_seekers.cli.package_skill:main"
skill-seekers-upload = "skill_seekers.cli.upload_skill:main"
skill-seekers-estimate = "skill_seekers.cli.estimate_pages:main"
[tool.setuptools]
packages = ["skill_seekers", "skill_seekers.cli", "skill_seekers.mcp", "skill_seekers.mcp.tools"]
[tool.setuptools.package-dir]
"" = "src"
[tool.setuptools.package-data]
skill_seekers = ["py.typed"]
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = "-v --tb=short --strict-markers"
[tool.coverage.run]
source = ["src/skill_seekers"]
omit = ["*/tests/*", "*/__pycache__/*", "*/venv/*"]
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"raise AssertionError",
"raise NotImplementedError",
"if __name__ == .__main__.:",
"if TYPE_CHECKING:",
"@abstractmethod",
]
[tool.uv]
dev-dependencies = [
"pytest>=8.4.2",
"pytest-cov>=7.0.0",
"coverage>=7.11.0",
]
[tool.uv.sources]
# Use PyPI for all dependencies


@ -1,42 +0,0 @@
annotated-types==0.7.0
anyio==4.11.0
attrs==25.4.0
beautifulsoup4==4.14.2
certifi==2025.10.5
charset-normalizer==3.4.4
click==8.3.0
coverage==7.11.0
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
httpx-sse==0.4.3
idna==3.11
iniconfig==2.3.0
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
mcp==1.18.0
packaging==25.0
pluggy==1.6.0
pydantic==2.12.3
pydantic-settings==2.11.0
pydantic_core==2.41.4
PyGithub==2.5.0
Pygments==2.19.2
PyMuPDF==1.24.14
Pillow==11.0.0
pytesseract==0.3.13
pytest==8.4.2
pytest-cov==7.0.0
python-dotenv==1.1.1
python-multipart==0.0.20
referencing==0.37.0
requests==2.32.5
rpds-py==0.27.1
sniffio==1.3.1
soupsieve==2.8
sse-starlette==3.0.2
starlette==0.48.0
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.5.0
uvicorn==0.38.0


@ -1,266 +0,0 @@
#!/bin/bash
# Skill Seeker MCP Server - Quick Setup Script
# This script automates the MCP server setup for Claude Code
set -e # Exit on error
echo "=================================================="
echo "Skill Seeker MCP Server - Quick Setup"
echo "=================================================="
echo ""
# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color
# Step 1: Check Python version
echo "Step 1: Checking Python version..."
if ! command -v python3 &> /dev/null; then
echo -e "${RED}❌ Error: python3 not found${NC}"
echo "Please install Python 3.7 or higher"
exit 1
fi
PYTHON_VERSION=$(python3 --version | cut -d' ' -f2)
echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
echo ""
# Step 2: Get repository path
REPO_PATH=$(pwd)
echo "Step 2: Repository location"
echo "Path: $REPO_PATH"
echo ""
# Step 3: Install dependencies
echo "Step 3: Installing Python dependencies..."
# Check if we're in a virtual environment
if [[ -n "$VIRTUAL_ENV" ]]; then
echo -e "${GREEN}✓${NC} Virtual environment detected: $VIRTUAL_ENV"
PIP_INSTALL_CMD="pip install"
elif [[ -d "venv" ]]; then
echo -e "${YELLOW}⚠${NC} Virtual environment found but not activated"
echo "Activating venv..."
source venv/bin/activate
PIP_INSTALL_CMD="pip install"
else
echo -e "${YELLOW}⚠${NC} No virtual environment found"
echo "It's recommended to use a virtual environment to avoid conflicts."
echo ""
read -p "Would you like to create one now? (y/n) " -n 1 -r
echo ""
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Creating virtual environment..."
python3 -m venv venv || {
echo -e "${RED}❌ Failed to create virtual environment${NC}"
echo "Falling back to system install..."
PIP_INSTALL_CMD="pip3 install --user --break-system-packages"
}
if [[ -d "venv" ]]; then
source venv/bin/activate
PIP_INSTALL_CMD="pip install"
echo -e "${GREEN}✓${NC} Virtual environment created and activated"
fi
else
echo "Proceeding with system install (using --user --break-system-packages)..."
echo -e "${YELLOW}Note:${NC} This may override system-managed packages"
PIP_INSTALL_CMD="pip3 install --user --break-system-packages"
fi
fi
echo "This will install the package in editable mode along with its dependencies (mcp, requests, beautifulsoup4, and others)"
read -p "Continue? (y/n) " -n 1 -r
echo ""
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Installing package in editable mode..."
$PIP_INSTALL_CMD -e . || {
echo -e "${RED}❌ Failed to install package${NC}"
exit 1
}
echo -e "${GREEN}✓${NC} Dependencies installed successfully"
else
echo "Skipping dependency installation"
fi
echo ""
# Step 4: Test MCP server
echo "Step 4: Testing MCP server..."
timeout 3 python3 src/skill_seekers/mcp/server.py 2>/dev/null || {
if [ $? -eq 124 ]; then
echo -e "${GREEN}✓${NC} MCP server starts correctly (timeout expected)"
else
echo -e "${YELLOW}⚠${NC} MCP server test inconclusive, but may still work"
fi
}
echo ""
# Step 5: Optional - Run tests
echo "Step 5: Run test suite? (optional)"
read -p "Run MCP tests to verify everything works? (y/n) " -n 1 -r
echo ""
if [[ $REPLY =~ ^[Yy]$ ]]; then
# Check if pytest is installed
if ! command -v pytest &> /dev/null; then
echo "Installing pytest..."
$PIP_INSTALL_CMD pytest || {
echo -e "${YELLOW}⚠${NC} Could not install pytest, skipping tests"
}
fi
if command -v pytest &> /dev/null; then
echo "Running MCP server tests..."
python3 -m pytest tests/test_mcp_server.py -v --tb=short || {
echo -e "${RED}❌ Some tests failed${NC}"
echo "The server may still work, but please check the errors above"
}
fi
else
echo "Skipping tests"
fi
echo ""
# Step 6: Configure Claude Code
echo "Step 6: Configure Claude Code"
echo "=================================================="
echo ""
echo "You need to add this configuration to Claude Code:"
echo ""
echo -e "${YELLOW}Configuration file:${NC} ~/.config/claude-code/mcp.json"
echo ""
echo "Add this JSON configuration (paths are auto-detected for YOUR system):"
echo ""
echo -e "${GREEN}{"
echo " \"mcpServers\": {"
echo " \"skill-seeker\": {"
echo " \"command\": \"python3\","
echo " \"args\": ["
echo " \"$REPO_PATH/src/skill_seekers/mcp/server.py\""
echo " ],"
echo " \"cwd\": \"$REPO_PATH\""
echo " }"
echo " }"
echo -e "}${NC}"
echo ""
echo -e "${YELLOW}Note:${NC} The paths above are YOUR actual paths (not placeholders!)"
echo ""
# Ask if user wants auto-configure
echo ""
read -p "Auto-configure Claude Code now? (y/n) " -n 1 -r
echo ""
if [[ $REPLY =~ ^[Yy]$ ]]; then
# Check if config already exists
if [ -f ~/.config/claude-code/mcp.json ]; then
echo -e "${YELLOW}⚠ Warning: ~/.config/claude-code/mcp.json already exists${NC}"
echo "Current contents:"
cat ~/.config/claude-code/mcp.json
echo ""
read -p "Overwrite? (y/n) " -n 1 -r
echo ""
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Skipping auto-configuration"
echo "Please manually add the skill-seeker server to your config"
exit 0
fi
fi
# Create config directory
mkdir -p ~/.config/claude-code
# Write configuration with actual expanded path
cat > ~/.config/claude-code/mcp.json << EOF
{
"mcpServers": {
"skill-seeker": {
"command": "python3",
"args": [
"$REPO_PATH/src/skill_seekers/mcp/server.py"
],
"cwd": "$REPO_PATH"
}
}
}
EOF
echo -e "${GREEN}✓${NC} Configuration written to ~/.config/claude-code/mcp.json"
echo ""
echo "Configuration contents:"
cat ~/.config/claude-code/mcp.json
echo ""
# Verify the path exists
if [ -f "$REPO_PATH/src/skill_seekers/mcp/server.py" ]; then
echo -e "${GREEN}✓${NC} Verified: MCP server file exists at $REPO_PATH/src/skill_seekers/mcp/server.py"
else
echo -e "${RED}❌ Warning: MCP server not found at $REPO_PATH/src/skill_seekers/mcp/server.py${NC}"
echo "Please check the path!"
fi
else
echo "Skipping auto-configuration"
echo "Please manually configure Claude Code using the JSON above"
echo ""
echo "The JSON above already contains your actual repository path: $REPO_PATH"
fi
echo ""
# Step 7: Test the configuration
if [ -f ~/.config/claude-code/mcp.json ]; then
echo "Step 7: Testing MCP configuration..."
echo "Checking if paths are correct..."
# Extract the configured path
if command -v jq &> /dev/null; then
CONFIGURED_PATH=$(jq -r '.mcpServers["skill-seeker"].args[0]' ~/.config/claude-code/mcp.json 2>/dev/null || echo "")
if [ -n "$CONFIGURED_PATH" ] && [ -f "$CONFIGURED_PATH" ]; then
echo -e "${GREEN}✓${NC} MCP server path is valid: $CONFIGURED_PATH"
elif [ -n "$CONFIGURED_PATH" ]; then
echo -e "${YELLOW}⚠${NC} Warning: Configured path doesn't exist: $CONFIGURED_PATH"
fi
else
echo "Install 'jq' for config validation: brew install jq (macOS) or apt install jq (Linux)"
fi
fi
echo ""
# Step 8: Final instructions
echo "=================================================="
echo "Setup Complete!"
echo "=================================================="
echo ""
echo "Next steps:"
echo ""
echo -e " 1. ${YELLOW}Restart Claude Code${NC} (quit and reopen, don't just close window)"
echo -e " 2. In Claude Code, test with: ${GREEN}\"List all available configs\"${NC}"
echo " 3. You should see 9 Skill Seeker tools available"
echo ""
echo "Available MCP Tools:"
echo " • generate_config - Create new config files"
echo " • estimate_pages - Estimate scraping time"
echo " • scrape_docs - Scrape documentation"
echo " • package_skill - Create .zip files"
echo " • list_configs - Show available configs"
echo " • validate_config - Validate config files"
echo ""
echo "Example commands to try in Claude Code:"
echo -e "${GREEN}List all available configs${NC}"
echo -e "${GREEN}Validate configs/react.json${NC}"
echo -e "${GREEN}Generate config for Tailwind at https://tailwindcss.com/docs${NC}"
echo ""
echo "Documentation:"
echo -e " • MCP Setup Guide: ${YELLOW}docs/MCP_SETUP.md${NC}"
echo -e " • Full docs: ${YELLOW}README.md${NC}"
echo ""
echo "Troubleshooting:"
echo " • Check logs: ~/Library/Logs/Claude Code/ (macOS)"
echo " • Test server: python3 src/skill_seekers/mcp/server.py"
echo " • Run tests: python3 -m pytest tests/test_mcp_server.py -v"
echo ""
echo "Happy skill creating! 🚀"
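
The auto-configure step in this script writes `~/.config/claude-code/mcp.json` from scratch, so any other servers already registered in that file are lost when the user agrees to overwrite. As a hedged alternative, the sketch below merges the `skill-seeker` entry into an existing config instead. It is written in Python rather than bash, the helper names `merge_mcp_server` and `update_config_file` are hypothetical and not part of the repository, and it assumes the config file, when present, holds a JSON object.

```python
import json
from pathlib import Path


def merge_mcp_server(config: dict, repo_path: str) -> dict:
    """Return a copy of config with the skill-seeker entry added or replaced.

    Hypothetical helper: other servers and unrelated top-level keys are kept.
    """
    merged = dict(config)
    # Copy the server table so the caller's dict is not mutated.
    servers = dict(merged.get("mcpServers") or {})
    servers["skill-seeker"] = {
        "command": "python3",
        "args": [f"{repo_path}/src/skill_seekers/mcp/server.py"],
        "cwd": repo_path,
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, repo_path: str) -> None:
    """Read config_path if it exists, merge the entry, and write it back."""
    existing = {}
    if config_path.exists():
        # Assumption: the file contains a single JSON object.
        existing = json.loads(config_path.read_text(encoding="utf-8"))
    config_path.parent.mkdir(parents=True, exist_ok=True)
    merged = merge_mcp_server(existing, repo_path)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Calling `update_config_file(Path.home() / ".config/claude-code/mcp.json", repo_path)` would then play the role of the heredoc write, without the need for an overwrite prompt.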

@ -1,22 +0,0 @@
"""
Skill Seekers - Convert documentation, GitHub repos, and PDFs into Claude AI skills.
This package provides tools for automatically scraping, organizing, and packaging
documentation from various sources into uploadable Claude AI skills.
"""
__version__ = "2.0.0"
__author__ = "Yusuf Karaaslan"
__license__ = "MIT"
# Expose main components for easier imports
from skill_seekers.cli import __version__ as cli_version
from skill_seekers.mcp import __version__ as mcp_version
__all__ = [
"__version__",
"__author__",
"__license__",
"cli_version",
"mcp_version",
]

@ -1,39 +0,0 @@
"""Skill Seekers CLI tools package.
This package provides command-line tools for converting documentation
websites into Claude AI skills.
Main modules:
- doc_scraper: Main documentation scraping and skill building tool
- llms_txt_detector: Detect llms.txt files at documentation URLs
- llms_txt_downloader: Download llms.txt content
- llms_txt_parser: Parse llms.txt markdown content
- pdf_scraper: Extract documentation from PDF files
- enhance_skill: AI-powered skill enhancement (API-based)
- enhance_skill_local: AI-powered skill enhancement (local)
- estimate_pages: Estimate page count before scraping
- package_skill: Package skills into .zip files
- upload_skill: Upload skills to Claude
- utils: Shared utility functions
"""
from .llms_txt_detector import LlmsTxtDetector
from .llms_txt_downloader import LlmsTxtDownloader
from .llms_txt_parser import LlmsTxtParser
try:
from .utils import open_folder, read_reference_files
except ImportError:
# utils.py might not exist in all configurations
open_folder = None
read_reference_files = None
__version__ = "2.0.0"
__all__ = [
"LlmsTxtDetector",
"LlmsTxtDownloader",
"LlmsTxtParser",
"open_folder",
"read_reference_files",
]

@ -1,500 +0,0 @@
#!/usr/bin/env python3
"""
Code Analyzer for GitHub Repositories
Extracts code signatures at configurable depth levels:
- surface: File tree only (existing behavior)
- deep: Parse files for signatures, parameters, types
- full: Complete AST analysis (future enhancement)
Supports multiple languages with language-specific parsers.
"""
import ast
import re
import logging
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, asdict
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class Parameter:
"""Represents a function parameter."""
name: str
type_hint: Optional[str] = None
default: Optional[str] = None
@dataclass
class FunctionSignature:
"""Represents a function/method signature."""
name: str
parameters: List[Parameter]
return_type: Optional[str] = None
docstring: Optional[str] = None
line_number: Optional[int] = None
is_async: bool = False
is_method: bool = False
decorators: Optional[List[str]] = None
def __post_init__(self):
if self.decorators is None:
self.decorators = []
@dataclass
class ClassSignature:
"""Represents a class signature."""
name: str
base_classes: List[str]
methods: List[FunctionSignature]
docstring: Optional[str] = None
line_number: Optional[int] = None
class CodeAnalyzer:
"""
Analyzes code at different depth levels.
"""
def __init__(self, depth: str = 'surface'):
"""
Initialize code analyzer.
Args:
depth: Analysis depth ('surface', 'deep', 'full')
"""
self.depth = depth
def analyze_file(self, file_path: str, content: str, language: str) -> Dict[str, Any]:
"""
Analyze a single file based on depth level.
Args:
file_path: Path to file in repository
content: File content as string
language: Programming language (Python, JavaScript, etc.)
Returns:
Dict containing extracted signatures
"""
if self.depth == 'surface':
return {} # Surface level doesn't analyze individual files
logger.debug(f"Analyzing {file_path} (language: {language}, depth: {self.depth})")
try:
if language == 'Python':
return self._analyze_python(content, file_path)
elif language in ['JavaScript', 'TypeScript']:
return self._analyze_javascript(content, file_path)
elif language in ['C', 'C++']:
return self._analyze_cpp(content, file_path)
else:
logger.debug(f"No analyzer for language: {language}")
return {}
except Exception as e:
logger.warning(f"Error analyzing {file_path}: {e}")
return {}
def _analyze_python(self, content: str, file_path: str) -> Dict[str, Any]:
"""Analyze Python file using AST."""
try:
tree = ast.parse(content)
except SyntaxError as e:
logger.debug(f"Syntax error in {file_path}: {e}")
return {}
classes = []
functions = []
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
class_sig = self._extract_python_class(node)
classes.append(asdict(class_sig))
elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
# Skip methods: a function counts as a method if it sits directly in a ClassDef body.
# Only list-valued bodies are searched (ast.Lambda.body, for example, is a single node).
is_method = False
try:
is_method = any(isinstance(parent, ast.ClassDef)
for parent in ast.walk(tree)
if hasattr(parent, 'body') and isinstance(parent.body, list) and node in parent.body)
except (TypeError, AttributeError):
# If body is not iterable or check fails, assume it's a top-level function
is_method = False
if not is_method:
func_sig = self._extract_python_function(node)
functions.append(asdict(func_sig))
return {
'classes': classes,
'functions': functions
}
def _extract_python_class(self, node: ast.ClassDef) -> ClassSignature:
"""Extract class signature from AST node."""
# Extract base classes
bases = []
for base in node.bases:
if isinstance(base, ast.Name):
bases.append(base.id)
elif isinstance(base, ast.Attribute):
bases.append(f"{base.value.id}.{base.attr}" if hasattr(base.value, 'id') else base.attr)
# Extract methods
methods = []
for item in node.body:
if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
method_sig = self._extract_python_function(item, is_method=True)
methods.append(method_sig)
# Extract docstring
docstring = ast.get_docstring(node)
return ClassSignature(
name=node.name,
base_classes=bases,
methods=methods,
docstring=docstring,
line_number=node.lineno
)
def _extract_python_function(self, node, is_method: bool = False) -> FunctionSignature:
"""Extract function signature from AST node."""
# Extract parameters
params = []
for arg in node.args.args:
param_type = None
if arg.annotation:
param_type = ast.unparse(arg.annotation) if hasattr(ast, 'unparse') else None
params.append(Parameter(
name=arg.arg,
type_hint=param_type
))
# Extract defaults
defaults = node.args.defaults
if defaults:
# Defaults are aligned to the end of params
num_no_default = len(params) - len(defaults)
for i, default in enumerate(defaults):
param_idx = num_no_default + i
if param_idx < len(params):
try:
params[param_idx].default = ast.unparse(default) if hasattr(ast, 'unparse') else str(default)
except Exception:
params[param_idx].default = "..."
# Extract return type
return_type = None
if node.returns:
try:
return_type = ast.unparse(node.returns) if hasattr(ast, 'unparse') else None
except Exception:
pass
# Extract decorators
decorators = []
for decorator in node.decorator_list:
try:
if hasattr(ast, 'unparse'):
decorators.append(ast.unparse(decorator))
elif isinstance(decorator, ast.Name):
decorators.append(decorator.id)
except Exception:
pass
# Extract docstring
docstring = ast.get_docstring(node)
return FunctionSignature(
name=node.name,
parameters=params,
return_type=return_type,
docstring=docstring,
line_number=node.lineno,
is_async=isinstance(node, ast.AsyncFunctionDef),
is_method=is_method,
decorators=decorators
)
def _analyze_javascript(self, content: str, file_path: str) -> Dict[str, Any]:
"""
Analyze JavaScript/TypeScript file using regex patterns.
Note: This is a simplified approach. For production, consider using
a proper JS/TS parser like esprima or ts-morph.
"""
classes = []
functions = []
# Extract class definitions
class_pattern = r'class\s+(\w+)(?:\s+extends\s+(\w+))?\s*\{'
for match in re.finditer(class_pattern, content):
class_name = match.group(1)
base_class = match.group(2) if match.group(2) else None
# Try to extract methods (simplified)
class_block_start = match.end()
# This is a simplification - proper parsing would track braces
class_block_end = content.find('}', class_block_start)
if class_block_end != -1:
class_body = content[class_block_start:class_block_end]
methods = self._extract_js_methods(class_body)
else:
methods = []
classes.append({
'name': class_name,
'base_classes': [base_class] if base_class else [],
'methods': methods,
'docstring': None,
'line_number': content[:match.start()].count('\n') + 1
})
# Extract top-level functions
func_pattern = r'(?:async\s+)?function\s+(\w+)\s*\(([^)]*)\)'
for match in re.finditer(func_pattern, content):
func_name = match.group(1)
params_str = match.group(2)
is_async = match.group(0).startswith('async') # keyword prefix only, not names such as asyncHandler
params = self._parse_js_parameters(params_str)
functions.append({
'name': func_name,
'parameters': params,
'return_type': None, # JS doesn't have type annotations (unless TS)
'docstring': None,
'line_number': content[:match.start()].count('\n') + 1,
'is_async': is_async,
'is_method': False,
'decorators': []
})
# Extract arrow functions assigned to const/let
arrow_pattern = r'(?:const|let|var)\s+(\w+)\s*=\s*(?:async\s+)?\(([^)]*)\)\s*=>'
for match in re.finditer(arrow_pattern, content):
func_name = match.group(1)
params_str = match.group(2)
is_async = re.search(r'=\s*async\s', match.group(0)) is not None # async keyword after '=', not part of the name
params = self._parse_js_parameters(params_str)
functions.append({
'name': func_name,
'parameters': params,
'return_type': None,
'docstring': None,
'line_number': content[:match.start()].count('\n') + 1,
'is_async': is_async,
'is_method': False,
'decorators': []
})
return {
'classes': classes,
'functions': functions
}
def _extract_js_methods(self, class_body: str) -> List[Dict]:
"""Extract method signatures from class body."""
methods = []
# Match method definitions
method_pattern = r'(?:async\s+)?(\w+)\s*\(([^)]*)\)'
for match in re.finditer(method_pattern, class_body):
method_name = match.group(1)
params_str = match.group(2)
is_async = re.match(r'async\s', match.group(0)) is not None
# Skip control-flow keywords that the pattern would otherwise treat as method names
if method_name in ['if', 'for', 'while', 'switch']:
continue
params = self._parse_js_parameters(params_str)
methods.append({
'name': method_name,
'parameters': params,
'return_type': None,
'docstring': None,
'line_number': None,
'is_async': is_async,
'is_method': True,
'decorators': []
})
return methods
def _parse_js_parameters(self, params_str: str) -> List[Dict]:
"""Parse JavaScript parameter string."""
params = []
if not params_str.strip():
return params
# Split by comma (simplified - doesn't handle complex default values)
param_list = [p.strip() for p in params_str.split(',')]
for param in param_list:
if not param:
continue
# Check for default value
if '=' in param:
name, default = param.split('=', 1)
name = name.strip()
default = default.strip()
else:
name = param
default = None
# Check for type annotation (TypeScript)
type_hint = None
if ':' in name:
name, type_hint = name.split(':', 1)
name = name.strip()
type_hint = type_hint.strip()
params.append({
'name': name,
'type_hint': type_hint,
'default': default
})
return params
def _analyze_cpp(self, content: str, file_path: str) -> Dict[str, Any]:
"""
Analyze C/C++ header file using regex patterns.
Note: This is a simplified approach focusing on header files.
For production, consider using libclang or similar.
"""
classes = []
functions = []
# Extract class definitions (simplified - doesn't handle nested classes)
class_pattern = r'class\s+(\w+)(?:\s*:\s*public\s+(\w+))?\s*\{'
for match in re.finditer(class_pattern, content):
class_name = match.group(1)
base_class = match.group(2) if match.group(2) else None
classes.append({
'name': class_name,
'base_classes': [base_class] if base_class else [],
'methods': [], # Simplified - would need to parse class body
'docstring': None,
'line_number': content[:match.start()].count('\n') + 1
})
# Extract function declarations
func_pattern = r'(\w+(?:\s*\*|\s*&)?)\s+(\w+)\s*\(([^)]*)\)'
for match in re.finditer(func_pattern, content):
return_type = match.group(1).strip()
func_name = match.group(2)
params_str = match.group(3)
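# Skip control-flow keywords, and statements such as "return foo(x)" or "new Foo(x)"
# where a keyword lands in the return-type group
if func_name in ['if', 'for', 'while', 'switch', 'return']:
continue
if return_type in ['return', 'else', 'new', 'delete', 'throw', 'case']:
continue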
params = self._parse_cpp_parameters(params_str)
functions.append({
'name': func_name,
'parameters': params,
'return_type': return_type,
'docstring': None,
'line_number': content[:match.start()].count('\n') + 1,
'is_async': False,
'is_method': False,
'decorators': []
})
return {
'classes': classes,
'functions': functions
}
def _parse_cpp_parameters(self, params_str: str) -> List[Dict]:
"""Parse C++ parameter string."""
params = []
if not params_str.strip() or params_str.strip() == 'void':
return params
# Split by comma (simplified)
param_list = [p.strip() for p in params_str.split(',')]
for param in param_list:
if not param:
continue
# Check for default value
default = None
if '=' in param:
param, default = param.rsplit('=', 1)
param = param.strip()
default = default.strip()
# Extract type and name (simplified)
# Format: "type name" or "type* name" or "type& name"
parts = param.split()
if len(parts) >= 2:
param_type = ' '.join(parts[:-1])
param_name = parts[-1]
else:
param_type = param
param_name = "unknown"
params.append({
'name': param_name,
'type_hint': param_type,
'default': default
})
return params
if __name__ == '__main__':
# Test the analyzer
python_code = '''
class Node2D:
"""Base class for 2D nodes."""
def move_local_x(self, delta: float, snap: bool = False) -> None:
"""Move node along local X axis."""
pass
async def tween_position(self, target: tuple, duration: float = 1.0):
"""Animate position to target."""
pass
def create_sprite(texture: str) -> Node2D:
"""Create a new sprite node."""
return Node2D()
'''
analyzer = CodeAnalyzer(depth='deep')
result = analyzer.analyze_file('test.py', python_code, 'Python')
print("Analysis Result:")
print(f"Classes: {len(result.get('classes', []))}")
print(f"Functions: {len(result.get('functions', []))}")
if result.get('classes'):
cls = result['classes'][0]
print(f"\nClass: {cls['name']}")
print(f" Methods: {len(cls['methods'])}")
for method in cls['methods']:
params = ', '.join([f"{p['name']}: {p['type_hint']}" + (f" = {p['default']}" if p.get('default') else "")
for p in method['parameters']])
print(f" {method['name']}({params}) -> {method['return_type']}")
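
The comment in `_analyze_javascript` notes that proper parsing would track braces, because `content.find('}', class_block_start)` stops at the first closing brace and truncates any class whose methods have bodies. A minimal sketch of depth-counting brace matching follows. The helper name `find_matching_brace` is hypothetical and not part of the module, and the sketch assumes braces inside string literals, template literals, regexes, and comments do not need to be handled.

```python
from typing import Optional


def find_matching_brace(content: str, open_idx: int) -> Optional[int]:
    """Return the index of the '}' that closes the '{' at open_idx, or None.

    Hypothetical helper: counts nesting depth only, so braces inside
    strings or comments are (incorrectly) counted as well.
    """
    if open_idx >= len(content) or content[open_idx] != '{':
        raise ValueError("open_idx must point at '{'")
    depth = 0
    for i in range(open_idx, len(content)):
        ch = content[i]
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return i
    return None  # Unbalanced input: no matching close brace


# Usage: extract a class body that contains nested braces.
source = "class A { foo() { return {x: 1}; } bar() {} } function g() {}"
open_idx = source.index('{')
close_idx = find_matching_brace(source, open_idx)
class_body = source[open_idx + 1:close_idx]
```

In `_analyze_javascript` the class pattern ends with `\{`, so the opening brace sits at `match.end() - 1`; passing that index would yield the end of the class block in place of the `content.find` call.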

@ -1,376 +0,0 @@
#!/usr/bin/env python3
"""
Unified Config Validator
Validates unified config format that supports multiple sources:
- documentation (website scraping)
- github (repository scraping)
- pdf (PDF document scraping)
Also provides backward compatibility detection for legacy configs.
"""
import json
import logging
from typing import Dict, Any, List, Optional, Union
from pathlib import Path
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ConfigValidator:
"""
Validates unified config format and provides backward compatibility.
"""
# Valid source types
VALID_SOURCE_TYPES = {'documentation', 'github', 'pdf'}
# Valid merge modes
VALID_MERGE_MODES = {'rule-based', 'claude-enhanced'}
# Valid code analysis depth levels
VALID_DEPTH_LEVELS = {'surface', 'deep', 'full'}
def __init__(self, config_or_path: Union[Dict[str, Any], str]):
"""
Initialize validator with config dict or file path.
Args:
config_or_path: Either a config dict or path to config JSON file
"""
if isinstance(config_or_path, dict):
self.config_path = None
self.config = config_or_path
else:
self.config_path = config_or_path
self.config = self._load_config()
self.is_unified = self._detect_format()
def _load_config(self) -> Dict[str, Any]:
"""Load JSON config file."""
try:
with open(self.config_path, 'r', encoding='utf-8') as f:
return json.load(f)
except FileNotFoundError:
raise ValueError(f"Config file not found: {self.config_path}")
except json.JSONDecodeError as e:
raise ValueError(f"Invalid JSON in config file: {e}")
def _detect_format(self) -> bool:
"""
Detect if config is unified format or legacy.
Returns:
True if unified format (has 'sources' array)
False if legacy format
"""
return 'sources' in self.config and isinstance(self.config['sources'], list)
def validate(self) -> bool:
"""
Validate config based on detected format.
Returns:
True if valid
Raises:
ValueError if invalid with detailed error message
"""
if self.is_unified:
return self._validate_unified()
else:
return self._validate_legacy()
def _validate_unified(self) -> bool:
"""Validate unified config format."""
logger.info("Validating unified config format...")
# Required top-level fields
if 'name' not in self.config:
raise ValueError("Missing required field: 'name'")
if 'description' not in self.config:
raise ValueError("Missing required field: 'description'")
if 'sources' not in self.config:
raise ValueError("Missing required field: 'sources'")
# Validate sources array
sources = self.config['sources']
if not isinstance(sources, list):
raise ValueError("'sources' must be an array")
if len(sources) == 0:
raise ValueError("'sources' array cannot be empty")
# Validate merge_mode (optional)
merge_mode = self.config.get('merge_mode', 'rule-based')
if merge_mode not in self.VALID_MERGE_MODES:
raise ValueError(f"Invalid merge_mode: '{merge_mode}'. Must be one of {self.VALID_MERGE_MODES}")
# Validate each source
for i, source in enumerate(sources):
self._validate_source(source, i)
logger.info(f"✅ Unified config valid: {len(sources)} sources")
return True
def _validate_source(self, source: Dict[str, Any], index: int):
"""Validate individual source configuration."""
# Check source has 'type' field
if 'type' not in source:
raise ValueError(f"Source {index}: Missing required field 'type'")
source_type = source['type']
if source_type not in self.VALID_SOURCE_TYPES:
raise ValueError(
f"Source {index}: Invalid type '{source_type}'. "
f"Must be one of {self.VALID_SOURCE_TYPES}"
)
# Type-specific validation
if source_type == 'documentation':
self._validate_documentation_source(source, index)
elif source_type == 'github':
self._validate_github_source(source, index)
elif source_type == 'pdf':
self._validate_pdf_source(source, index)
def _validate_documentation_source(self, source: Dict[str, Any], index: int):
"""Validate documentation source configuration."""
if 'base_url' not in source:
raise ValueError(f"Source {index} (documentation): Missing required field 'base_url'")
# Optional but recommended fields
if 'selectors' not in source:
logger.warning(f"Source {index} (documentation): No 'selectors' specified, using defaults")
if 'max_pages' in source and not isinstance(source['max_pages'], int):
raise ValueError(f"Source {index} (documentation): 'max_pages' must be an integer")
def _validate_github_source(self, source: Dict[str, Any], index: int):
"""Validate GitHub source configuration."""
if 'repo' not in source:
raise ValueError(f"Source {index} (github): Missing required field 'repo'")
# Validate repo format (owner/repo)
repo = source['repo']
if not isinstance(repo, str) or '/' not in repo:
raise ValueError(
f"Source {index} (github): Invalid repo format '{repo}'. "
f"Must be 'owner/repo' (e.g., 'facebook/react')"
)
# Validate code_analysis_depth if specified
if 'code_analysis_depth' in source:
depth = source['code_analysis_depth']
if depth not in self.VALID_DEPTH_LEVELS:
raise ValueError(
f"Source {index} (github): Invalid code_analysis_depth '{depth}'. "
f"Must be one of {self.VALID_DEPTH_LEVELS}"
)
# Validate max_issues if specified
if 'max_issues' in source and not isinstance(source['max_issues'], int):
raise ValueError(f"Source {index} (github): 'max_issues' must be an integer")
def _validate_pdf_source(self, source: Dict[str, Any], index: int):
"""Validate PDF source configuration."""
if 'path' not in source:
raise ValueError(f"Source {index} (pdf): Missing required field 'path'")
# Check if file exists
pdf_path = source['path']
if not Path(pdf_path).exists():
logger.warning(f"Source {index} (pdf): File not found: {pdf_path}")
def _validate_legacy(self) -> bool:
"""
Validate legacy config format (backward compatibility).
Legacy configs are the old format used by doc_scraper, github_scraper, pdf_scraper.
"""
logger.info("Detected legacy config format (backward compatible)")
# Detect which legacy type based on fields
if 'base_url' in self.config:
logger.info("Legacy type: documentation")
elif 'repo' in self.config:
logger.info("Legacy type: github")
elif 'pdf' in self.config or 'path' in self.config:
logger.info("Legacy type: pdf")
else:
raise ValueError("Cannot detect legacy config type (missing base_url, repo, pdf, or path)")
return True
def convert_legacy_to_unified(self) -> Dict[str, Any]:
"""
Convert legacy config to unified format.
Returns:
Unified config dict
"""
if self.is_unified:
logger.info("Config already in unified format")
return self.config
logger.info("Converting legacy config to unified format...")
# Detect legacy type and convert
if 'base_url' in self.config:
return self._convert_legacy_documentation()
elif 'repo' in self.config:
return self._convert_legacy_github()
elif 'pdf' in self.config or 'path' in self.config:
return self._convert_legacy_pdf()
else:
raise ValueError("Cannot convert: unknown legacy format")
def _convert_legacy_documentation(self) -> Dict[str, Any]:
"""Convert legacy documentation config to unified."""
unified = {
'name': self.config.get('name', 'unnamed'),
'description': self.config.get('description', 'Documentation skill'),
'merge_mode': 'rule-based',
'sources': [
{
'type': 'documentation',
**{k: v for k, v in self.config.items()
if k not in ['name', 'description']}
}
]
}
return unified
def _convert_legacy_github(self) -> Dict[str, Any]:
"""Convert legacy GitHub config to unified."""
unified = {
'name': self.config.get('name', 'unnamed'),
'description': self.config.get('description', 'GitHub repository skill'),
'merge_mode': 'rule-based',
'sources': [
{
'type': 'github',
**{k: v for k, v in self.config.items()
if k not in ['name', 'description']}
}
]
}
return unified
def _convert_legacy_pdf(self) -> Dict[str, Any]:
"""Convert legacy PDF config to unified."""
unified = {
'name': self.config.get('name', 'unnamed'),
'description': self.config.get('description', 'PDF document skill'),
'merge_mode': 'rule-based',
'sources': [
{
'type': 'pdf',
**{k: v for k, v in self.config.items()
if k not in ['name', 'description']}
}
]
}
return unified
def get_sources_by_type(self, source_type: str) -> List[Dict[str, Any]]:
"""
Get all sources of a specific type.
Args:
source_type: 'documentation', 'github', or 'pdf'
Returns:
List of sources matching the type
"""
if not self.is_unified:
# For legacy, convert and get sources
unified = self.convert_legacy_to_unified()
sources = unified['sources']
else:
sources = self.config['sources']
return [s for s in sources if s.get('type') == source_type]
def has_multiple_sources(self) -> bool:
"""Check if config has multiple sources (requires merging)."""
if not self.is_unified:
return False
return len(self.config['sources']) > 1
def needs_api_merge(self) -> bool:
"""
Check if config needs API merging.
Returns True if both documentation and github sources exist
with API extraction enabled.
"""
if not self.has_multiple_sources():
return False
has_docs_api = any(
s.get('type') == 'documentation' and s.get('extract_api', True)
for s in self.config['sources']
)
has_github_code = any(
s.get('type') == 'github' and s.get('include_code', False)
for s in self.config['sources']
)
return has_docs_api and has_github_code
def validate_config(config_path: str) -> ConfigValidator:
"""
Validate config file and return validator instance.
Args:
config_path: Path to config JSON file
Returns:
ConfigValidator instance
Raises:
ValueError if config is invalid
"""
validator = ConfigValidator(config_path)
validator.validate()
return validator
if __name__ == '__main__':
import sys
if len(sys.argv) < 2:
print("Usage: python config_validator.py <config.json>")
sys.exit(1)
config_file = sys.argv[1]
try:
validator = validate_config(config_file)
print("\n✅ Config valid!")
print(f" Format: {'Unified' if validator.is_unified else 'Legacy'}")
print(f" Name: {validator.config.get('name')}")
if validator.is_unified:
sources = validator.config['sources']
print(f" Sources: {len(sources)}")
for i, source in enumerate(sources):
print(f" {i+1}. {source['type']}")
if validator.needs_api_merge():
merge_mode = validator.config.get('merge_mode', 'rule-based')
print(f" ⚠️ API merge required (mode: {merge_mode})")
except ValueError as e:
print(f"\n❌ Config invalid: {e}")
sys.exit(1)
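
The merge check in `needs_api_merge` can be sketched on a plain dict, outside the validator class. This is a minimal illustration that assumes the unified config layout used above (a top-level `sources` list). The sample config, including its `base_url` and `repo` keys and their values, is hypothetical and only there to exercise the defaults (`extract_api` defaults to `True`, `include_code` defaults to `False`).

```python
from typing import Any, Dict


def needs_api_merge(config: Dict[str, Any]) -> bool:
    """Standalone sketch of the merge check, operating on a unified config dict."""
    sources = config.get('sources', [])
    # A single source never needs merging.
    if len(sources) <= 1:
        return False
    # Documentation sources extract APIs unless explicitly disabled.
    has_docs_api = any(
        s.get('type') == 'documentation' and s.get('extract_api', True)
        for s in sources
    )
    # GitHub sources only contribute code when explicitly enabled.
    has_github_code = any(
        s.get('type') == 'github' and s.get('include_code', False)
        for s in sources
    )
    return has_docs_api and has_github_code


# Hypothetical sample config; the keys other than 'type', 'extract_api'
# and 'include_code' are assumptions made for illustration.
sample = {
    'name': 'example',
    'sources': [
        {'type': 'documentation', 'base_url': 'https://example.com/docs'},
        {'type': 'github', 'repo': 'example/example', 'include_code': True},
    ],
}

if __name__ == '__main__':
    print(needs_api_merge(sample))
```

Note that a GitHub source without `include_code` does not trigger a merge, because the default for that flag is `False`.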

View File

@ -1,513 +0,0 @@
#!/usr/bin/env python3
"""
Conflict Detector for Multi-Source Skills
Detects conflicts between documentation and code:
- missing_in_docs: API exists in code but not documented
- missing_in_code: API documented but doesn't exist in code
- signature_mismatch: Different parameters/types between docs and code
- description_mismatch: Docs say one thing, code comments say another
  (declared as a conflict type, but not checked by detect_all_conflicts)
Used by unified scraper to identify discrepancies before merging.
"""
import json
import logging
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, asdict
from difflib import SequenceMatcher
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@dataclass
class Conflict:
"""Represents a conflict between documentation and code."""
type: str # 'missing_in_docs', 'missing_in_code', 'signature_mismatch', 'description_mismatch'
severity: str # 'low', 'medium', 'high'
api_name: str
docs_info: Optional[Dict[str, Any]] = None
code_info: Optional[Dict[str, Any]] = None
difference: Optional[str] = None
suggestion: Optional[str] = None
class ConflictDetector:
"""
Detects conflicts between documentation and code sources.
"""
def __init__(self, docs_data: Dict[str, Any], github_data: Dict[str, Any]):
"""
Initialize conflict detector.
Args:
docs_data: Data from documentation scraper
github_data: Data from GitHub scraper with code analysis
"""
self.docs_data = docs_data
self.github_data = github_data
# Extract API information from both sources
self.docs_apis = self._extract_docs_apis()
self.code_apis = self._extract_code_apis()
logger.info(f"Loaded {len(self.docs_apis)} APIs from documentation")
logger.info(f"Loaded {len(self.code_apis)} APIs from code")
def _extract_docs_apis(self) -> Dict[str, Dict[str, Any]]:
"""
Extract API information from documentation data.
Returns:
Dict mapping API name to API info
"""
apis = {}
# Documentation structure varies, but typically has 'pages' or 'references'
pages = self.docs_data.get('pages', {})
# Handle both dict and list formats
if isinstance(pages, dict):
# Format: {url: page_data, ...}
for url, page_data in pages.items():
content = page_data.get('content', '')
title = page_data.get('title', '')
# Simple heuristic: if the title or URL contains "api", "reference", "class",
# "function", or "method", treat the page as a likely API page
if any(keyword in title.lower() or keyword in url.lower()
for keyword in ['api', 'reference', 'class', 'function', 'method']):
# Extract API signatures from content (simplified)
extracted_apis = self._parse_doc_content_for_apis(content, url)
apis.update(extracted_apis)
elif isinstance(pages, list):
# Format: [{url: '...', apis: [...]}, ...]
for page in pages:
url = page.get('url', '')
page_apis = page.get('apis', [])
# If APIs are already extracted in the page data
for api in page_apis:
api_name = api.get('name', '')
if api_name:
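apis[api_name] = {
'name': api_name,
'parameters': api.get('parameters', []),
'return_type': api.get('return_type', 'Any'),
# 'source' matches the key read by _find_missing_in_code;
# 'source_url' is kept for existing consumers
'source': url,
'source_url': url
}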
return apis
def _parse_doc_content_for_apis(self, content: str, source_url: str) -> Dict[str, Dict]:
"""
Parse documentation content to extract API signatures.
This is a simplified approach - real implementation would need
to understand the documentation format (Sphinx, JSDoc, etc.)
"""
apis = {}
# Look for function/method signatures in code blocks
# Common patterns:
# - function_name(param1, param2)
# - ClassName.method_name(param1, param2)
# - def function_name(param1: type, param2: type) -> return_type
import re
# Pattern for common API signatures
patterns = [
# Python style: def name(params) -> return
r'def\s+(\w+)\s*\(([^)]*)\)(?:\s*->\s*(\w+))?',
# JavaScript style: function name(params)
r'function\s+(\w+)\s*\(([^)]*)\)',
# C++ style: return_type name(params)
r'(\w+)\s+(\w+)\s*\(([^)]*)\)',
# Method style: ClassName.method_name(params)
r'(\w+)\.(\w+)\s*\(([^)]*)\)'
]
for pattern in patterns:
for match in re.finditer(pattern, content):
groups = match.groups()
# Parse based on pattern matched
if 'def' in pattern:
# Python function
name = groups[0]
params_str = groups[1]
return_type = groups[2] if len(groups) > 2 else None
elif 'function' in pattern:
# JavaScript function
name = groups[0]
params_str = groups[1]
return_type = None
elif '.' in pattern:
# Class method
class_name = groups[0]
method_name = groups[1]
name = f"{class_name}.{method_name}"
params_str = groups[2] if len(groups) > 2 else groups[1]
return_type = None
else:
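# C++ function
return_type = groups[0]
name = groups[1]
params_str = groups[2]
# This pattern also matches "def name(...)", "function name(...)" and
# "return name(...)"; skip those so they do not overwrite earlier matches
if return_type in ('def', 'function', 'return', 'new', 'await'):
continue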
# Parse parameters
params = self._parse_param_string(params_str)
apis[name] = {
'name': name,
'parameters': params,
'return_type': return_type,
'source': source_url,
'raw_signature': match.group(0)
}
return apis
def _parse_param_string(self, params_str: str) -> List[Dict]:
"""Parse parameter string into list of parameter dicts."""
if not params_str.strip():
return []
params = []
for param in params_str.split(','):
param = param.strip()
if not param:
continue
# Try to extract name and type
param_info = {'name': param, 'type': None, 'default': None}
# Check for type annotation (: type)
if ':' in param:
parts = param.split(':', 1)
param_info['name'] = parts[0].strip()
type_part = parts[1].strip()
# Check for default value (= value)
if '=' in type_part:
type_str, default_str = type_part.split('=', 1)
param_info['type'] = type_str.strip()
param_info['default'] = default_str.strip()
else:
param_info['type'] = type_part
# Check for default without type (= value)
elif '=' in param:
parts = param.split('=', 1)
param_info['name'] = parts[0].strip()
param_info['default'] = parts[1].strip()
params.append(param_info)
return params
def _extract_code_apis(self) -> Dict[str, Dict[str, Any]]:
"""
Extract API information from GitHub code analysis.
Returns:
Dict mapping API name to API info
"""
apis = {}
code_analysis = self.github_data.get('code_analysis', {})
if not code_analysis:
return apis
# Support both 'files' and 'analyzed_files' keys
files = code_analysis.get('files', code_analysis.get('analyzed_files', []))
for file_info in files:
file_path = file_info.get('file', 'unknown')
# Extract classes and their methods
for class_info in file_info.get('classes', []):
class_name = class_info['name']
# Add class itself
apis[class_name] = {
'name': class_name,
'type': 'class',
'source': file_path,
'line': class_info.get('line_number'),
'base_classes': class_info.get('base_classes', []),
'docstring': class_info.get('docstring')
}
# Add methods
for method in class_info.get('methods', []):
method_name = f"{class_name}.{method['name']}"
apis[method_name] = {
'name': method_name,
'type': 'method',
'parameters': method.get('parameters', []),
'return_type': method.get('return_type'),
'source': file_path,
'line': method.get('line_number'),
'docstring': method.get('docstring'),
'is_async': method.get('is_async', False)
}
# Extract standalone functions
for func_info in file_info.get('functions', []):
func_name = func_info['name']
apis[func_name] = {
'name': func_name,
'type': 'function',
'parameters': func_info.get('parameters', []),
'return_type': func_info.get('return_type'),
'source': file_path,
'line': func_info.get('line_number'),
'docstring': func_info.get('docstring'),
'is_async': func_info.get('is_async', False)
}
return apis
def detect_all_conflicts(self) -> List[Conflict]:
"""
Detect all types of conflicts.
Returns:
List of Conflict objects
"""
logger.info("Detecting conflicts between documentation and code...")
conflicts = []
# 1. Find APIs missing in documentation
conflicts.extend(self._find_missing_in_docs())
# 2. Find APIs missing in code
conflicts.extend(self._find_missing_in_code())
# 3. Find signature mismatches
conflicts.extend(self._find_signature_mismatches())
logger.info(f"Found {len(conflicts)} conflicts total")
return conflicts
def _find_missing_in_docs(self) -> List[Conflict]:
"""Find APIs that exist in code but not in documentation."""
conflicts = []
for api_name, code_info in self.code_apis.items():
# Simple name matching (can be enhanced with fuzzy matching)
if api_name not in self.docs_apis:
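# Check if it's a private/internal API (often not documented).
# Inspect every dotted segment so that 'Class._helper' and 'Class.__init__' are caught too.
is_private = any(part.startswith('_') for part in api_name.split('.'))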
severity = 'low' if is_private else 'medium'
conflicts.append(Conflict(
type='missing_in_docs',
severity=severity,
api_name=api_name,
code_info=code_info,
difference=f"API exists in code ({code_info['source']}) but not found in documentation",
suggestion="Add documentation for this API" if not is_private else "Consider if this internal API should be documented"
))
logger.info(f"Found {len(conflicts)} APIs missing in documentation")
return conflicts
def _find_missing_in_code(self) -> List[Conflict]:
"""Find APIs that are documented but don't exist in code."""
conflicts = []
for api_name, docs_info in self.docs_apis.items():
if api_name not in self.code_apis:
conflicts.append(Conflict(
type='missing_in_code',
severity='high', # This is serious - documented but doesn't exist
api_name=api_name,
docs_info=docs_info,
difference=f"API documented ({docs_info.get('source', 'unknown')}) but not found in code",
suggestion="Update documentation to remove this API, or add it to codebase"
))
logger.info(f"Found {len(conflicts)} APIs missing in code")
return conflicts
def _find_signature_mismatches(self) -> List[Conflict]:
"""Find APIs where signature differs between docs and code."""
conflicts = []
# Find APIs that exist in both
common_apis = set(self.docs_apis.keys()) & set(self.code_apis.keys())
for api_name in common_apis:
docs_info = self.docs_apis[api_name]
code_info = self.code_apis[api_name]
# Compare signatures
mismatch = self._compare_signatures(docs_info, code_info)
if mismatch:
conflicts.append(Conflict(
type='signature_mismatch',
severity=mismatch['severity'],
api_name=api_name,
docs_info=docs_info,
code_info=code_info,
difference=mismatch['difference'],
suggestion=mismatch['suggestion']
))
logger.info(f"Found {len(conflicts)} signature mismatches")
return conflicts
def _compare_signatures(self, docs_info: Dict, code_info: Dict) -> Optional[Dict]:
"""
Compare signatures between docs and code.
Returns:
Dict with mismatch details if conflict found, None otherwise
"""
docs_params = docs_info.get('parameters', [])
code_params = code_info.get('parameters', [])
# Compare parameter counts
if len(docs_params) != len(code_params):
return {
'severity': 'medium',
'difference': f"Parameter count mismatch: docs has {len(docs_params)}, code has {len(code_params)}",
'suggestion': f"Update documentation to list the {len(code_params)} parameters defined in code"
}
# Compare parameter names and types
for i, (doc_param, code_param) in enumerate(zip(docs_params, code_params)):
doc_name = doc_param.get('name', '')
code_name = code_param.get('name', '')
# Parameter name mismatch
if doc_name != code_name:
# Use fuzzy matching for slight variations
similarity = SequenceMatcher(None, doc_name, code_name).ratio()
if similarity < 0.8: # Not similar enough
return {
'severity': 'medium',
'difference': f"Parameter {i+1} name mismatch: '{doc_name}' in docs vs '{code_name}' in code",
'suggestion': f"Update documentation to use parameter name '{code_name}'"
}
# Type mismatch
doc_type = doc_param.get('type')
code_type = code_param.get('type_hint')
if doc_type and code_type and doc_type != code_type:
return {
'severity': 'low',
'difference': f"Parameter '{doc_name}' type mismatch: '{doc_type}' in docs vs '{code_type}' in code",
'suggestion': f"Verify correct type for parameter '{doc_name}'"
}
# Compare return types if both have them
docs_return = docs_info.get('return_type')
code_return = code_info.get('return_type')
if docs_return and code_return and docs_return != code_return:
return {
'severity': 'low',
'difference': f"Return type mismatch: '{docs_return}' in docs vs '{code_return}' in code",
'suggestion': "Verify correct return type"
}
return None
def generate_summary(self, conflicts: List[Conflict]) -> Dict[str, Any]:
"""
Generate summary statistics for conflicts.
Args:
conflicts: List of Conflict objects
Returns:
Summary dict with statistics
"""
summary = {
'total': len(conflicts),
'by_type': {},
'by_severity': {},
'apis_affected': len(set(c.api_name for c in conflicts))
}
# Count by type
for conflict_type in ['missing_in_docs', 'missing_in_code', 'signature_mismatch', 'description_mismatch']:
count = sum(1 for c in conflicts if c.type == conflict_type)
summary['by_type'][conflict_type] = count
# Count by severity
for severity in ['low', 'medium', 'high']:
count = sum(1 for c in conflicts if c.severity == severity)
summary['by_severity'][severity] = count
return summary
def save_conflicts(self, conflicts: List[Conflict], output_path: str):
"""
Save conflicts to JSON file.
Args:
conflicts: List of Conflict objects
output_path: Path to output JSON file
"""
data = {
'conflicts': [asdict(c) for c in conflicts],
'summary': self.generate_summary(conflicts)
}
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Conflicts saved to: {output_path}")
if __name__ == '__main__':
import sys
if len(sys.argv) < 3:
print("Usage: python conflict_detector.py <docs_data.json> <github_data.json>")
sys.exit(1)
docs_file = sys.argv[1]
github_file = sys.argv[2]
# Load data
with open(docs_file, 'r', encoding='utf-8') as f:
docs_data = json.load(f)
with open(github_file, 'r', encoding='utf-8') as f:
github_data = json.load(f)
# Detect conflicts
detector = ConflictDetector(docs_data, github_data)
conflicts = detector.detect_all_conflicts()
# Print summary
summary = detector.generate_summary(conflicts)
print("\n📊 Conflict Summary:")
print(f" Total conflicts: {summary['total']}")
print(f" APIs affected: {summary['apis_affected']}")
print("\n By Type:")
for conflict_type, count in summary['by_type'].items():
if count > 0:
print(f" {conflict_type}: {count}")
print("\n By Severity:")
for severity, count in summary['by_severity'].items():
if count > 0:
emoji = '🔴' if severity == 'high' else '🟡' if severity == 'medium' else '🟢'
print(f" {emoji} {severity}: {count}")
# Save to file
output_file = 'conflicts.json'
detector.save_conflicts(conflicts, output_file)
print(f"\n✅ Full report saved to: {output_file}")
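
The parameter-name comparison in `_compare_signatures` relies on `difflib.SequenceMatcher` with a 0.8 similarity cutoff. The check can be sketched in isolation as follows. The helper name `names_conflict` and the sample name pairs are assumptions made for illustration, not part of the module above.

```python
from difflib import SequenceMatcher


def names_conflict(doc_name: str, code_name: str, threshold: float = 0.8) -> bool:
    """Return True when two parameter names differ too much to be the same parameter.

    Hypothetical helper mirroring the name check in _compare_signatures:
    identical names never conflict, and near-identical names (similarity
    ratio at or above the threshold) are tolerated as spelling variations.
    """
    if doc_name == code_name:
        return False
    similarity = SequenceMatcher(None, doc_name, code_name).ratio()
    return similarity < threshold


if __name__ == '__main__':
    # Illustrative pairs only; they are not taken from any real documentation.
    for doc_name, code_name in [('timeout', 'time_out'), ('path', 'filename')]:
        print(doc_name, code_name, names_conflict(doc_name, code_name))
```

With this cutoff a pair such as `timeout` / `time_out` is treated as the same parameter, while unrelated names such as `path` / `filename` are reported. The threshold trades missed renames against false alarms on minor spelling differences.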

View File

@ -1,72 +0,0 @@
"""Configuration constants for Skill Seekers CLI.
This module centralizes all magic numbers and configuration values used
across the CLI tools to improve maintainability and clarity.
"""
# ===== SCRAPING CONFIGURATION =====
# Default scraping limits
DEFAULT_RATE_LIMIT = 0.5 # seconds between requests
DEFAULT_MAX_PAGES = 500 # maximum pages to scrape
DEFAULT_CHECKPOINT_INTERVAL = 1000 # pages between checkpoints
DEFAULT_ASYNC_MODE = False # use async mode for parallel scraping (opt-in)
# Content analysis limits
CONTENT_PREVIEW_LENGTH = 500 # characters to check for categorization
MAX_PAGES_WARNING_THRESHOLD = 10000 # warn if config exceeds this
# Quality thresholds
MIN_CATEGORIZATION_SCORE = 2 # minimum score for category assignment
URL_MATCH_POINTS = 3 # points for URL keyword match
TITLE_MATCH_POINTS = 2 # points for title keyword match
CONTENT_MATCH_POINTS = 1 # points for content keyword match
# ===== ENHANCEMENT CONFIGURATION =====
# API-based enhancement limits (uses Anthropic API)
API_CONTENT_LIMIT = 100000 # max characters for API enhancement
API_PREVIEW_LIMIT = 40000 # max characters for preview
# Local enhancement limits (uses Claude Code Max)
LOCAL_CONTENT_LIMIT = 50000 # max characters for local enhancement
LOCAL_PREVIEW_LIMIT = 20000 # max characters for preview
# ===== PAGE ESTIMATION =====
# Estimation and discovery settings
DEFAULT_MAX_DISCOVERY = 1000 # default max pages to discover
DISCOVERY_THRESHOLD = 10000 # threshold for warnings
# ===== FILE LIMITS =====
# Output and processing limits
MAX_REFERENCE_FILES = 100 # maximum reference files per skill
MAX_CODE_BLOCKS_PER_PAGE = 5 # maximum code blocks to extract per page
# ===== EXPORT CONSTANTS =====
__all__ = [
# Scraping
'DEFAULT_RATE_LIMIT',
'DEFAULT_MAX_PAGES',
'DEFAULT_CHECKPOINT_INTERVAL',
'DEFAULT_ASYNC_MODE',
'CONTENT_PREVIEW_LENGTH',
'MAX_PAGES_WARNING_THRESHOLD',
'MIN_CATEGORIZATION_SCORE',
'URL_MATCH_POINTS',
'TITLE_MATCH_POINTS',
'CONTENT_MATCH_POINTS',
# Enhancement
'API_CONTENT_LIMIT',
'API_PREVIEW_LIMIT',
'LOCAL_CONTENT_LIMIT',
'LOCAL_PREVIEW_LIMIT',
# Estimation
'DEFAULT_MAX_DISCOVERY',
'DISCOVERY_THRESHOLD',
# Limits
'MAX_REFERENCE_FILES',
'MAX_CODE_BLOCKS_PER_PAGE',
]
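
The comments on the three `*_MATCH_POINTS` constants and `MIN_CATEGORIZATION_SCORE` suggest a weighted keyword score. The following is a minimal sketch of how such a score could be computed. The functions `score_page` and `categorize`, the way matches are combined, and the sample categories are all assumptions; the actual scraper may apply these constants differently.

```python
from typing import Dict, List, Optional

# Values copied from the constants module above.
MIN_CATEGORIZATION_SCORE = 2
URL_MATCH_POINTS = 3
TITLE_MATCH_POINTS = 2
CONTENT_MATCH_POINTS = 1
CONTENT_PREVIEW_LENGTH = 500


def score_page(url: str, title: str, content: str, keywords: List[str]) -> int:
    """Hypothetical scoring: add points for each keyword found in URL, title, or preview."""
    preview = content[:CONTENT_PREVIEW_LENGTH].lower()
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        if keyword in url.lower():
            score += URL_MATCH_POINTS
        if keyword in title.lower():
            score += TITLE_MATCH_POINTS
        if keyword in preview:
            score += CONTENT_MATCH_POINTS
    return score


def categorize(url: str, title: str, content: str,
               categories: Dict[str, List[str]]) -> Optional[str]:
    """Return the best-scoring category, or None if no score reaches the minimum."""
    best_name, best_score = None, 0
    for name, keywords in categories.items():
        score = score_page(url, title, content, keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= MIN_CATEGORIZATION_SCORE else None


if __name__ == '__main__':
    # Hypothetical categories and page, for illustration only.
    categories = {'api': ['api'], 'guides': ['tutorial']}
    print(categorize('https://example.com/api/auth', 'Login', 'text', categories))
```

Under these assumed rules a single content-only match (1 point) stays below the minimum of 2, so a page is only categorized when the keyword appears in the URL or title, or in more than one place.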

View File

@ -1,273 +0,0 @@
#!/usr/bin/env python3
"""
SKILL.md Enhancement Script
Uses Claude API to improve SKILL.md by analyzing reference documentation.
Usage:
skill-seekers enhance output/steam-inventory/
skill-seekers enhance output/react/
skill-seekers enhance output/godot/ --api-key YOUR_API_KEY
"""
import os
import sys
import json
import argparse
from pathlib import Path
# Add parent directory to path for imports when run as script
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from skill_seekers.cli.constants import API_CONTENT_LIMIT, API_PREVIEW_LIMIT
from skill_seekers.cli.utils import read_reference_files
try:
import anthropic
except ImportError:
print("❌ Error: anthropic package not installed")
print("Install with: pip3 install anthropic")
sys.exit(1)
class SkillEnhancer:
def __init__(self, skill_dir, api_key=None):
self.skill_dir = Path(skill_dir)
self.references_dir = self.skill_dir / "references"
self.skill_md_path = self.skill_dir / "SKILL.md"
# Get API key
self.api_key = api_key or os.environ.get('ANTHROPIC_API_KEY')
if not self.api_key:
raise ValueError(
"No API key provided. Set ANTHROPIC_API_KEY environment variable "
"or use --api-key argument"
)
self.client = anthropic.Anthropic(api_key=self.api_key)
def read_current_skill_md(self):
"""Read existing SKILL.md"""
if not self.skill_md_path.exists():
return None
return self.skill_md_path.read_text(encoding='utf-8')
def enhance_skill_md(self, references, current_skill_md):
"""Use Claude to enhance SKILL.md"""
# Build prompt
prompt = self._build_enhancement_prompt(references, current_skill_md)
print("\n🤖 Asking Claude to enhance SKILL.md...")
print(f" Input: {len(prompt):,} characters")
try:
message = self.client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
temperature=0.3,
messages=[{
"role": "user",
"content": prompt
}]
)
enhanced_content = message.content[0].text
return enhanced_content
except Exception as e:
print(f"❌ Error calling Claude API: {e}")
return None
def _build_enhancement_prompt(self, references, current_skill_md):
"""Build the prompt for Claude"""
# Extract skill name and description
skill_name = self.skill_dir.name
prompt = f"""You are enhancing a Claude skill's SKILL.md file. This skill is about: {skill_name}
I've scraped documentation and organized it into reference files. Your job is to create an EXCELLENT SKILL.md that will help Claude use this documentation effectively.
CURRENT SKILL.MD:
{'```markdown' if current_skill_md else '(none - create from scratch)'}
{current_skill_md or 'No existing SKILL.md'}
{'```' if current_skill_md else ''}
REFERENCE DOCUMENTATION:
"""
for filename, content in references.items():
prompt += f"\n\n## {filename}\n```markdown\n{content[:30000]}\n```\n"
prompt += """
YOUR TASK:
Create an enhanced SKILL.md that includes:
1. **Clear "When to Use This Skill" section** - Be specific about trigger conditions
2. **Excellent Quick Reference section** - Extract 5-10 of the BEST, most practical code examples from the reference docs
- Choose SHORT, clear examples that demonstrate common tasks
- Include both simple and intermediate examples
- Annotate examples with clear descriptions
- Use proper language tags (cpp, python, javascript, json, etc.)
3. **Detailed Reference Files description** - Explain what's in each reference file
4. **Practical "Working with This Skill" section** - Give users clear guidance on how to navigate the skill
5. **Key Concepts section** (if applicable) - Explain core concepts
6. **Keep the frontmatter** (---\nname: ...\n---) intact
IMPORTANT:
- Extract REAL examples from the reference docs, don't make them up
- Prioritize SHORT, clear examples (5-20 lines max)
- Make it actionable and practical
- Don't be too verbose - be concise but useful
- Maintain the markdown structure for Claude skills
- Keep code examples properly formatted with language tags
OUTPUT:
Return ONLY the complete SKILL.md content, starting with the frontmatter (---).
"""
return prompt
def save_enhanced_skill_md(self, content):
"""Save the enhanced SKILL.md"""
# Backup original
if self.skill_md_path.exists():
backup_path = self.skill_md_path.with_suffix('.md.backup')
self.skill_md_path.rename(backup_path)
print(f" 💾 Backed up original to: {backup_path.name}")
# Save enhanced version
self.skill_md_path.write_text(content, encoding='utf-8')
print(f" ✅ Saved enhanced SKILL.md")
def run(self):
"""Main enhancement workflow"""
print(f"\n{'='*60}")
print(f"ENHANCING SKILL: {self.skill_dir.name}")
print(f"{'='*60}\n")
# Read reference files
print("📖 Reading reference documentation...")
references = read_reference_files(
self.skill_dir,
max_chars=API_CONTENT_LIMIT,
preview_limit=API_PREVIEW_LIMIT
)
if not references:
print("❌ No reference files found to analyze")
return False
print(f" ✓ Read {len(references)} reference files")
total_size = sum(len(c) for c in references.values())
print(f" ✓ Total size: {total_size:,} characters\n")
# Read current SKILL.md
current_skill_md = self.read_current_skill_md()
if current_skill_md:
print(f" Found existing SKILL.md ({len(current_skill_md)} chars)")
else:
print(f" No existing SKILL.md, will create new one")
# Enhance with Claude
enhanced = self.enhance_skill_md(references, current_skill_md)
if not enhanced:
print("❌ Enhancement failed")
return False
print(f" ✓ Generated enhanced SKILL.md ({len(enhanced)} chars)\n")
# Save
print("💾 Saving enhanced SKILL.md...")
self.save_enhanced_skill_md(enhanced)
print(f"\n✅ Enhancement complete!")
print(f"\nNext steps:")
print(f" 1. Review: {self.skill_md_path}")
print(f" 2. If you don't like it, restore backup: {self.skill_md_path.with_suffix('.md.backup')}")
print(f" 3. Package your skill:")
print(f" skill-seekers package {self.skill_dir}/")
return True
def main():
parser = argparse.ArgumentParser(
description='Enhance SKILL.md using Claude API',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Using ANTHROPIC_API_KEY environment variable
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers enhance output/steam-inventory/
# Providing API key directly
skill-seekers enhance output/react/ --api-key sk-ant-...
# Show what would be done (dry run)
skill-seekers enhance output/godot/ --dry-run
"""
)
parser.add_argument('skill_dir', type=str,
help='Path to skill directory (e.g., output/steam-inventory/)')
parser.add_argument('--api-key', type=str,
help='Anthropic API key (or set ANTHROPIC_API_KEY env var)')
parser.add_argument('--dry-run', action='store_true',
help='Show what would be done without calling API')
args = parser.parse_args()
# Validate skill directory
skill_dir = Path(args.skill_dir)
if not skill_dir.exists():
print(f"❌ Error: Directory not found: {skill_dir}")
sys.exit(1)
if not skill_dir.is_dir():
print(f"❌ Error: Not a directory: {skill_dir}")
sys.exit(1)
# Dry run mode
if args.dry_run:
print(f"🔍 DRY RUN MODE")
print(f" Would enhance: {skill_dir}")
print(f" References: {skill_dir / 'references'}")
print(f" SKILL.md: {skill_dir / 'SKILL.md'}")
refs_dir = skill_dir / "references"
if refs_dir.exists():
ref_files = list(refs_dir.glob("*.md"))
print(f" Found {len(ref_files)} reference files:")
for rf in ref_files:
size = rf.stat().st_size
print(f" - {rf.name} ({size:,} bytes)")
print("\nTo actually run enhancement:")
print(f" skill-seekers enhance {skill_dir}")
return
# Create enhancer and run
try:
enhancer = SkillEnhancer(skill_dir, api_key=args.api_key)
success = enhancer.run()
sys.exit(0 if success else 1)
except ValueError as e:
print(f"❌ Error: {e}")
print("\nSet your API key:")
print(" export ANTHROPIC_API_KEY=sk-ant-...")
print("Or provide it directly:")
print(f" skill-seekers enhance {skill_dir} --api-key sk-ant-...")
sys.exit(1)
except Exception as e:
print(f"❌ Unexpected error: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
if __name__ == "__main__":
main()
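
The backup-then-write step in `save_enhanced_skill_md` can be tried on a throwaway directory. This is a minimal sketch with one deliberate variation: it uses `Path.replace` instead of `Path.rename`, so that a stale backup from an earlier run is overwritten on every platform. The helper name `save_with_backup` is an assumption made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_with_backup(path: Path, new_content: str) -> Optional[Path]:
    """Move an existing file to '<name>.backup', then write the new content.

    Returns the backup path, or None when there was nothing to back up.
    Hypothetical helper; Path.replace is a variation on the rename used above.
    """
    backup_path = None
    if path.exists():
        # For 'SKILL.md' this yields 'SKILL.md.backup'.
        backup_path = path.with_suffix(path.suffix + '.backup')
        path.replace(backup_path)
    path.write_text(new_content, encoding='utf-8')
    return backup_path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        skill_md = Path(tmp) / 'SKILL.md'
        skill_md.write_text('old content', encoding='utf-8')
        backup = save_with_backup(skill_md, 'new content')
        print(backup.name, skill_md.read_text(encoding='utf-8'))
```

Restoring the original is then a single move of the backup file back over `SKILL.md`, which is what the "Next steps" output above tells the user to do.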

View File

@ -1,451 +0,0 @@
#!/usr/bin/env python3
"""
SKILL.md Enhancement Script (Local - Using Claude Code)
Opens a new terminal with Claude Code to enhance SKILL.md, then reports back.
No API key needed - uses your existing Claude Code Max plan!
Usage:
skill-seekers enhance output/steam-inventory/
skill-seekers enhance output/react/
Terminal Selection:
The script automatically detects which terminal app to use:
1. SKILL_SEEKER_TERMINAL env var (highest priority)
Example: export SKILL_SEEKER_TERMINAL="Ghostty"
2. TERM_PROGRAM env var (current terminal)
3. Terminal.app (fallback)
Supported terminals: Ghostty, iTerm, Terminal, WezTerm
"""
import os
import sys
import time
import subprocess
import tempfile
from pathlib import Path
# Add parent directory to path for imports when run as script
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from skill_seekers.cli.constants import LOCAL_CONTENT_LIMIT, LOCAL_PREVIEW_LIMIT
from skill_seekers.cli.utils import read_reference_files
def detect_terminal_app():
"""Detect which terminal app to use with cascading priority.
Priority order:
1. SKILL_SEEKER_TERMINAL environment variable (explicit user preference)
2. TERM_PROGRAM environment variable (inherit current terminal)
3. Terminal.app (fallback default)
Returns:
tuple: (terminal_app_name, detection_method)
- terminal_app_name (str): Name of terminal app to launch (e.g., "Ghostty", "Terminal")
- detection_method (str): How the terminal was detected (for logging)
Examples:
>>> os.environ['SKILL_SEEKER_TERMINAL'] = 'Ghostty'
>>> detect_terminal_app()
('Ghostty', 'SKILL_SEEKER_TERMINAL')
>>> os.environ['TERM_PROGRAM'] = 'iTerm.app'
>>> detect_terminal_app()
('iTerm', 'TERM_PROGRAM')
"""
# Map TERM_PROGRAM values to macOS app names
TERMINAL_MAP = {
'Apple_Terminal': 'Terminal',
'iTerm.app': 'iTerm',
'ghostty': 'Ghostty',
'WezTerm': 'WezTerm',
}
# Priority 1: Check SKILL_SEEKER_TERMINAL env var (explicit preference)
preferred_terminal = os.environ.get('SKILL_SEEKER_TERMINAL', '').strip()
if preferred_terminal:
return preferred_terminal, 'SKILL_SEEKER_TERMINAL'
# Priority 2: Check TERM_PROGRAM (inherit current terminal)
term_program = os.environ.get('TERM_PROGRAM', '').strip()
if term_program and term_program in TERMINAL_MAP:
return TERMINAL_MAP[term_program], 'TERM_PROGRAM'
# Priority 3: Fallback to Terminal.app
if term_program:
# TERM_PROGRAM is set but unknown
return 'Terminal', f'unknown TERM_PROGRAM ({term_program})'
else:
# No TERM_PROGRAM set
return 'Terminal', 'default'
class LocalSkillEnhancer:
def __init__(self, skill_dir):
self.skill_dir = Path(skill_dir)
self.references_dir = self.skill_dir / "references"
self.skill_md_path = self.skill_dir / "SKILL.md"
def create_enhancement_prompt(self):
"""Create the prompt file for Claude Code"""
# Read reference files
references = read_reference_files(
self.skill_dir,
max_chars=LOCAL_CONTENT_LIMIT,
preview_limit=LOCAL_PREVIEW_LIMIT
)
if not references:
print("❌ No reference files found")
return None
# Read current SKILL.md
current_skill_md = ""
if self.skill_md_path.exists():
current_skill_md = self.skill_md_path.read_text(encoding='utf-8')
# Build prompt
prompt = f"""I need you to enhance the SKILL.md file for the {self.skill_dir.name} skill.
CURRENT SKILL.MD:
{'-'*60}
{current_skill_md if current_skill_md else '(No existing SKILL.md - create from scratch)'}
{'-'*60}
REFERENCE DOCUMENTATION:
{'-'*60}
"""
for filename, content in references.items():
prompt += f"\n## {filename}\n{content[:15000]}\n"
prompt += f"""
{'-'*60}
YOUR TASK:
Create an EXCELLENT SKILL.md file that will help Claude use this documentation effectively.
Requirements:
1. **Clear "When to Use This Skill" section**
- Be SPECIFIC about trigger conditions
- List concrete use cases
2. **Excellent Quick Reference section**
- Extract 5-10 of the BEST, most practical code examples from the reference docs
- Choose SHORT, clear examples (5-20 lines max)
- Include both simple and intermediate examples
- Use proper language tags (cpp, python, javascript, json, etc.)
- Add clear descriptions for each example
3. **Detailed Reference Files description**
- Explain what's in each reference file
- Help users navigate the documentation
4. **Practical "Working with This Skill" section**
- Clear guidance for beginners, intermediate, and advanced users
- Navigation tips
5. **Key Concepts section** (if applicable)
- Explain core concepts
- Define important terminology
IMPORTANT:
- Extract REAL examples from the reference docs above
- Prioritize SHORT, clear examples
- Make it actionable and practical
- Keep the frontmatter (---\\nname: ...\\n---) intact
- Use proper markdown formatting
SAVE THE RESULT:
Save the complete enhanced SKILL.md to: {self.skill_md_path.absolute()}
First, backup the original to: {self.skill_md_path.with_suffix('.md.backup').absolute()}
"""
return prompt
def run(self, headless=True, timeout=600):
"""Main enhancement workflow
Args:
headless: If True, run claude directly without opening terminal (default: True)
timeout: Maximum time to wait for enhancement in seconds (default: 600 = 10 minutes)
"""
print(f"\n{'='*60}")
print(f"LOCAL ENHANCEMENT: {self.skill_dir.name}")
print(f"{'='*60}\n")
# Validate
if not self.skill_dir.exists():
print(f"❌ Directory not found: {self.skill_dir}")
return False
# Read reference files
print("📖 Reading reference documentation...")
references = read_reference_files(
self.skill_dir,
max_chars=LOCAL_CONTENT_LIMIT,
preview_limit=LOCAL_PREVIEW_LIMIT
)
if not references:
print("❌ No reference files found to analyze")
return False
print(f" ✓ Read {len(references)} reference files")
total_size = sum(len(c) for c in references.values())
print(f" ✓ Total size: {total_size:,} characters\n")
# Create prompt
print("📝 Creating enhancement prompt...")
prompt = self.create_enhancement_prompt()
if not prompt:
return False
# Save prompt to temp file
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False, encoding='utf-8') as f:
prompt_file = f.name
f.write(prompt)
print(f" ✓ Prompt saved ({len(prompt):,} characters)\n")
# Headless mode: Run claude directly without opening terminal
if headless:
return self._run_headless(prompt_file, timeout)
# Terminal mode: Launch Claude Code in new terminal
print("🚀 Launching Claude Code in new terminal...")
print(" This will:")
print(" 1. Open a new terminal window")
print(" 2. Run Claude Code with the enhancement task")
print(" 3. Claude will read the docs and enhance SKILL.md")
print(" 4. Terminal will auto-close when done")
print()
# Create a shell script to run in the terminal
shell_script = f'''#!/bin/bash
claude "{prompt_file}"
echo ""
echo "✅ Enhancement complete!"
echo "Press any key to close..."
read -n 1
rm "{prompt_file}"
'''
# Save shell script
with tempfile.NamedTemporaryFile(mode='w', suffix='.sh', delete=False) as f:
script_file = f.name
f.write(shell_script)
os.chmod(script_file, 0o755)
# Launch in new terminal (macOS specific)
if sys.platform == 'darwin':
# Detect which terminal app to use
terminal_app, detection_method = detect_terminal_app()
# Show detection info
if detection_method == 'SKILL_SEEKER_TERMINAL':
print(f" Using terminal: {terminal_app} (from SKILL_SEEKER_TERMINAL)")
elif detection_method == 'TERM_PROGRAM':
print(f" Using terminal: {terminal_app} (inherited from current terminal)")
elif detection_method.startswith('unknown TERM_PROGRAM'):
print(f"⚠️ {detection_method}")
print(f" → Using Terminal.app as fallback")
else:
print(f" Using terminal: {terminal_app} (default)")
try:
subprocess.Popen(['open', '-a', terminal_app, script_file])
except Exception as e:
print(f"⚠️ Error launching {terminal_app}: {e}")
print(f"\nManually run: {script_file}")
return False
else:
print("⚠️ Auto-launch only works on macOS")
print(f"\nManually run this command in a new terminal:")
print(f" claude '{prompt_file}'")
print(f"\nThen delete the prompt file:")
print(f" rm '{prompt_file}'")
return False
print("✅ New terminal launched with Claude Code!")
print()
print("📊 Status:")
print(f" - Prompt file: {prompt_file}")
print(f" - Skill directory: {self.skill_dir.absolute()}")
print(f" - SKILL.md will be saved to: {self.skill_md_path.absolute()}")
print(f" - Original backed up to: {self.skill_md_path.with_suffix('.md.backup').absolute()}")
print()
print("⏳ Wait for Claude Code to finish in the other terminal...")
print(" (Usually takes 30-60 seconds)")
print()
print("💡 When done:")
print(f" 1. Check the enhanced SKILL.md: {self.skill_md_path}")
print(f" 2. If you don't like it, restore: mv {self.skill_md_path.with_suffix('.md.backup')} {self.skill_md_path}")
print(f" 3. Package: skill-seekers package {self.skill_dir}/")
return True
def _run_headless(self, prompt_file, timeout):
"""Run Claude enhancement in headless mode (no terminal window)
Args:
prompt_file: Path to prompt file
timeout: Maximum seconds to wait
Returns:
bool: True if enhancement succeeded
"""
import time
from pathlib import Path
print("✨ Running Claude Code enhancement (headless mode)...")
print(f" Timeout: {timeout} seconds ({timeout//60} minutes)")
print()
# Record initial state
initial_mtime = self.skill_md_path.stat().st_mtime if self.skill_md_path.exists() else 0
initial_size = self.skill_md_path.stat().st_size if self.skill_md_path.exists() else 0
# Start timer
start_time = time.time()
try:
# Run claude command directly (this WAITS for completion)
print(f" Running: claude {prompt_file}")
print(" ⏳ Please wait...")
print()
result = subprocess.run(
['claude', prompt_file],
capture_output=True,
text=True,
timeout=timeout
)
elapsed = time.time() - start_time
# Check if successful
if result.returncode == 0:
# Verify SKILL.md was actually updated
if self.skill_md_path.exists():
new_mtime = self.skill_md_path.stat().st_mtime
new_size = self.skill_md_path.stat().st_size
if new_mtime > initial_mtime and new_size > initial_size:
print(f"✅ Enhancement complete! ({elapsed:.1f} seconds)")
print(f" SKILL.md updated: {new_size:,} bytes")
print()
# Clean up prompt file
try:
os.unlink(prompt_file)
except OSError:
pass
return True
else:
print(f"⚠️ Claude finished but SKILL.md was not updated")
print(f" This might indicate an error during enhancement")
print()
return False
else:
print(f"❌ SKILL.md not found after enhancement")
return False
else:
print(f"❌ Claude Code returned error (exit code: {result.returncode})")
if result.stderr:
print(f" Error: {result.stderr[:200]}")
return False
except subprocess.TimeoutExpired:
elapsed = time.time() - start_time
print(f"\n⚠️ Enhancement timed out after {elapsed:.0f} seconds")
print(f" Timeout limit: {timeout} seconds")
print()
print(" Possible reasons:")
print(" - Skill is very large (many references)")
print(" - Claude is taking longer than usual")
print(" - Network issues")
print()
print(" Try:")
print(" 1. Use terminal mode: --interactive-enhancement")
print(" 2. Reduce reference content")
print(" 3. Try again later")
# Clean up
try:
os.unlink(prompt_file)
except OSError:
pass
return False
except FileNotFoundError:
print("'claude' command not found")
print()
print(" Make sure Claude Code CLI is installed:")
print(" See: https://docs.claude.com/claude-code")
print()
print(" Try terminal mode instead: --interactive-enhancement")
return False
except Exception as e:
print(f"❌ Unexpected error: {e}")
return False
def main():
import argparse
parser = argparse.ArgumentParser(
description="Enhance a skill with Claude Code (local)",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Headless mode (default - runs in background)
skill-seekers enhance output/react/
# Interactive mode (opens terminal window)
skill-seekers enhance output/react/ --interactive-enhancement
# Custom timeout
skill-seekers enhance output/react/ --timeout 1200
"""
)
parser.add_argument(
'skill_directory',
help='Path to skill directory (e.g., output/react/)'
)
parser.add_argument(
'--interactive-enhancement',
action='store_true',
help='Open terminal window for enhancement (default: headless mode)'
)
parser.add_argument(
'--timeout',
type=int,
default=600,
help='Timeout in seconds for headless mode (default: 600 = 10 minutes)'
)
args = parser.parse_args()
# Run enhancement
enhancer = LocalSkillEnhancer(args.skill_directory)
headless = not args.interactive_enhancement # Invert: default is headless
success = enhancer.run(headless=headless, timeout=args.timeout)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()
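
The headless path above reduces to a small pattern: snapshot the target file, run the command with a timeout, and treat the run as successful only if the process exits cleanly and the file actually changed. A minimal sketch of that pattern follows. `snapshot` and `run_and_verify` are hypothetical names, not part of skill-seekers, and the sketch accepts any change in `(mtime_ns, size)`, which is a looser check than the original's requirement that both values grow.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def snapshot(path: Path):
    """Return (mtime_ns, size) for path, or None if it does not exist."""
    if not path.exists():
        return None
    st = path.stat()
    return (st.st_mtime_ns, st.st_size)


def run_and_verify(cmd, target: Path, timeout: float) -> bool:
    """Run cmd and report whether it exited cleanly AND changed target."""
    before = snapshot(target)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timed out, or the executable is not installed
        return False
    if result.returncode != 0:
        return False
    after = snapshot(target)
    return after is not None and after != before


if __name__ == "__main__":
    # Stand-in for the real CLI: a child Python process that appends to the file
    with tempfile.TemporaryDirectory() as tmp:
        skill_md = Path(tmp) / "SKILL.md"
        skill_md.write_text("draft")
        append = "import sys; open(sys.argv[1], 'a').write(' (enhanced)')"
        ok = run_and_verify([sys.executable, "-c", append, str(skill_md)], skill_md, timeout=30)
        print("updated" if ok else "not updated")
```

Comparing a snapshot rather than trusting the exit code alone is what lets the caller distinguish "the tool ran" from "the tool produced output".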

View File

@ -1,288 +0,0 @@
#!/usr/bin/env python3
"""
Page Count Estimator for Skill Seeker
Quickly estimates how many pages a config will scrape without downloading content
"""
import sys
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import time
import json
# Add parent directory to path for imports when run as script
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from skill_seekers.cli.constants import (
DEFAULT_RATE_LIMIT,
DEFAULT_MAX_DISCOVERY,
DISCOVERY_THRESHOLD
)
def estimate_pages(config, max_discovery=DEFAULT_MAX_DISCOVERY, timeout=30):
"""
Estimate total pages that will be scraped
Args:
config: Configuration dictionary
max_discovery: Maximum pages to discover (safety limit, use -1 for unlimited)
timeout: Timeout for HTTP requests in seconds
Returns:
dict with estimation results
"""
base_url = config['base_url']
start_urls = config.get('start_urls', [base_url])
url_patterns = config.get('url_patterns', {'include': [], 'exclude': []})
rate_limit = config.get('rate_limit', DEFAULT_RATE_LIMIT)
visited = set()
pending = list(start_urls)
discovered = 0
include_patterns = url_patterns.get('include', [])
exclude_patterns = url_patterns.get('exclude', [])
# Handle unlimited mode
unlimited = (max_discovery == -1 or max_discovery is None)
print(f"🔍 Estimating pages for: {config['name']}")
print(f"📍 Base URL: {base_url}")
print(f"🎯 Start URLs: {len(start_urls)}")
print(f"⏱️ Rate limit: {rate_limit}s")
if unlimited:
print(f"🔢 Max discovery: UNLIMITED (will discover all pages)")
print(f"⚠️ WARNING: This may take a long time!")
else:
print(f"🔢 Max discovery: {max_discovery}")
print()
start_time = time.time()
# Loop condition: stop if no more URLs, or if limit reached (when not unlimited)
while pending and (unlimited or discovered < max_discovery):
url = pending.pop(0)
# Skip if already visited
if url in visited:
continue
visited.add(url)
discovered += 1
# Progress indicator
if discovered % 10 == 0:
elapsed = time.time() - start_time
rate = discovered / elapsed if elapsed > 0 else 0
print(f"⏳ Discovered: {discovered} pages ({rate:.1f} pages/sec)", end='\r')
try:
# HEAD request first to check if page exists (faster)
head_response = requests.head(url, timeout=timeout, allow_redirects=True)
# Skip non-HTML content
content_type = head_response.headers.get('Content-Type', '')
if 'text/html' not in content_type:
continue
# Now GET the page to find links
response = requests.get(url, timeout=timeout)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Find all links
for link in soup.find_all('a', href=True):
href = link['href']
full_url = urljoin(url, href)
# Normalize URL
parsed = urlparse(full_url)
full_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
# Check if URL is valid
if not is_valid_url(full_url, base_url, include_patterns, exclude_patterns):
continue
# Add to pending if not visited
if full_url not in visited and full_url not in pending:
pending.append(full_url)
# Rate limiting
time.sleep(rate_limit)
except requests.RequestException:
# Silently skip network and HTTP errors during estimation
pass
except Exception:
# Silently skip any other error (e.g. parsing failures)
pass
elapsed = time.time() - start_time
# Results
results = {
'discovered': discovered,
'pending': len(pending),
'estimated_total': discovered + len(pending),
'elapsed_seconds': round(elapsed, 2),
'discovery_rate': round(discovered / elapsed if elapsed > 0 else 0, 2),
'hit_limit': (not unlimited) and (discovered >= max_discovery),
'unlimited': unlimited
}
return results
def is_valid_url(url, base_url, include_patterns, exclude_patterns):
"""Check if URL should be crawled"""
# Must be same domain
if not url.startswith(base_url.rstrip('/')):
return False
# Check exclude patterns first
if exclude_patterns:
for pattern in exclude_patterns:
if pattern in url:
return False
# Check include patterns (if specified)
if include_patterns:
for pattern in include_patterns:
if pattern in url:
return True
return False
# If no include patterns, accept by default
return True
def print_results(results, config):
"""Print estimation results"""
print()
print("=" * 70)
print("📊 ESTIMATION RESULTS")
print("=" * 70)
print()
print(f"Config: {config['name']}")
print(f"Base URL: {config['base_url']}")
print()
print(f"✅ Pages Discovered: {results['discovered']}")
print(f"⏳ Pages Pending: {results['pending']}")
print(f"📈 Estimated Total: {results['estimated_total']}")
print()
print(f"⏱️ Time Elapsed: {results['elapsed_seconds']}s")
print(f"⚡ Discovery Rate: {results['discovery_rate']} pages/sec")
if results.get('unlimited', False):
print()
print("✅ UNLIMITED MODE - Discovered all reachable pages")
print(f" Total pages: {results['estimated_total']}")
elif results['hit_limit']:
print()
print("⚠️ Hit discovery limit - actual total may be higher")
print(" Increase max_discovery parameter for more accurate estimate")
print()
print("=" * 70)
print("💡 RECOMMENDATIONS")
print("=" * 70)
print()
estimated = results['estimated_total']
current_max = config.get('max_pages', 100)
if estimated <= current_max:
print(f"✅ Current max_pages ({current_max}) is sufficient")
else:
recommended = min(estimated + 50, DISCOVERY_THRESHOLD) # Add 50 buffer, cap at threshold
print(f"⚠️ Current max_pages ({current_max}) may be too low")
print(f"📝 Recommended max_pages: {recommended}")
print(f" (Estimated {estimated} + 50 buffer)")
# Estimate time for full scrape
rate_limit = config.get('rate_limit', DEFAULT_RATE_LIMIT)
estimated_time = (estimated * rate_limit) / 60 # in minutes
print()
print(f"⏱️ Estimated full scrape time: {estimated_time:.1f} minutes")
print(f" (Based on rate_limit: {rate_limit}s)")
print()
def load_config(config_path):
"""Load configuration from JSON file"""
try:
with open(config_path, 'r') as f:
config = json.load(f)
return config
except FileNotFoundError:
print(f"❌ Error: Config file not found: {config_path}")
sys.exit(1)
except json.JSONDecodeError as e:
print(f"❌ Error: Invalid JSON in config file: {e}")
sys.exit(1)
def main():
"""Main entry point"""
import argparse
parser = argparse.ArgumentParser(
description='Estimate page count for Skill Seeker configs',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Estimate pages for a config
skill-seekers estimate configs/react.json
# Estimate with higher discovery limit
skill-seekers estimate configs/godot.json --max-discovery 2000
# Quick estimate (stop at 100 pages)
skill-seekers estimate configs/vue.json --max-discovery 100
"""
)
parser.add_argument('config', help='Path to config JSON file')
parser.add_argument('--max-discovery', '-m', type=int, default=DEFAULT_MAX_DISCOVERY,
help=f'Maximum pages to discover (default: {DEFAULT_MAX_DISCOVERY}, use -1 for unlimited)')
parser.add_argument('--unlimited', '-u', action='store_true',
help='Remove discovery limit - discover all pages (same as --max-discovery -1)')
parser.add_argument('--timeout', '-t', type=int, default=30,
help='HTTP request timeout in seconds (default: 30)')
args = parser.parse_args()
# Handle unlimited flag
max_discovery = -1 if args.unlimited else args.max_discovery
# Load config
config = load_config(args.config)
# Run estimation
try:
results = estimate_pages(config, max_discovery, args.timeout)
print_results(results, config)
# Return exit code based on results
if results['hit_limit']:
return 2 # Warning: hit limit
return 0 # Success
except KeyboardInterrupt:
print("\n\n⚠️ Estimation interrupted by user")
return 1
except Exception as e:
print(f"\n\n❌ Error during estimation: {e}")
return 1
if __name__ == '__main__':
sys.exit(main())
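
The discovery loop in the estimator is a breadth-first walk with URL normalization and prefix/substring filtering. The sketch below reproduces that logic against an in-memory link graph so it can be run without network access. `normalize`, `allowed`, `discover`, and the `get_links` callback are hypothetical names introduced for illustration; the sketch also swaps the original's list-based queue for a `deque` plus a `seen` set, which is an assumption about what an idiomatic variant might look like, not the project's implementation.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def normalize(base: str, href: str) -> str:
    """Resolve href against base and drop the query string and fragment."""
    parsed = urlparse(urljoin(base, href))
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"


def allowed(url, base_url, include=(), exclude=()):
    """Prefix check against base_url, then substring exclude/include filters."""
    if not url.startswith(base_url.rstrip('/')):
        return False
    if any(pattern in url for pattern in exclude):
        return False
    return any(pattern in url for pattern in include) if include else True


def discover(start_url, base_url, get_links, max_discovery=100, include=(), exclude=()):
    """Breadth-first discovery; get_links(url) is a hypothetical link fetcher."""
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, for O(1) de-duplication
    discovered = 0
    while queue and discovered < max_discovery:
        url = queue.popleft()
        discovered += 1
        for href in get_links(url):
            full = normalize(url, href)
            if full not in seen and allowed(full, base_url, include, exclude):
                seen.add(full)
                queue.append(full)
    return {
        "discovered": discovered,
        "pending": len(queue),
        "estimated_total": discovered + len(queue),
    }


# A tiny fake site: page URL -> hrefs found on that page
SITE = {
    "https://example.com/docs/": ["intro", "api/?v=2#top", "https://other.org/x", "/blog/post"],
    "https://example.com/docs/intro": ["api/", "./"],
    "https://example.com/docs/api/": ["../intro"],
}


def fake_links(url):
    return SITE.get(url, [])
```

Calling `discover("https://example.com/docs/", "https://example.com/docs/", fake_links)` visits the three pages under `/docs/` and ignores the off-site and `/blog/` links; lowering `max_discovery` leaves the unvisited pages counted under `pending`.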

View File

@ -1,274 +0,0 @@
#!/usr/bin/env python3
"""
Router Skill Generator
Creates a router/hub skill that intelligently directs queries to specialized sub-skills.
This is used for large documentation sites split into multiple focused skills.
"""
import json
import sys
import argparse
from pathlib import Path
from typing import Dict, List, Any, Tuple
class RouterGenerator:
"""Generates router skills that direct to specialized sub-skills"""
def __init__(self, config_paths: List[str], router_name: str = None):
self.config_paths = [Path(p) for p in config_paths]
self.configs = [self.load_config(p) for p in self.config_paths]
self.router_name = router_name or self.infer_router_name()
self.base_config = self.configs[0] # Use first as template
def load_config(self, path: Path) -> Dict[str, Any]:
"""Load a config file"""
try:
with open(path, 'r') as f:
return json.load(f)
except Exception as e:
print(f"❌ Error loading {path}: {e}")
sys.exit(1)
def infer_router_name(self) -> str:
"""Infer router name from sub-skill names"""
# Find common prefix
names = [cfg['name'] for cfg in self.configs]
if not names:
return "router"
# Get common prefix before first dash
first_name = names[0]
if '-' in first_name:
return first_name.split('-')[0]
return first_name
def extract_routing_keywords(self) -> Dict[str, List[str]]:
"""Extract keywords for routing to each skill"""
routing = {}
for config in self.configs:
name = config['name']
keywords = []
# Extract from categories
if 'categories' in config:
keywords.extend(config['categories'].keys())
# Extract from name (part after dash)
if '-' in name:
skill_topic = name.split('-', 1)[1]
keywords.append(skill_topic)
routing[name] = keywords
return routing
def generate_skill_md(self) -> str:
"""Generate router SKILL.md content"""
routing_keywords = self.extract_routing_keywords()
skill_md = f"""# {self.router_name.replace('-', ' ').title()} Documentation (Router)
## When to Use This Skill
{self.base_config.get('description', f'Use for {self.router_name} development and programming.')}
This is a router skill that directs your questions to specialized sub-skills for efficient, focused assistance.
## How It Works
This skill analyzes your question and activates the appropriate specialized skill(s):
"""
# List sub-skills
for config in self.configs:
name = config['name']
desc = config.get('description', '')
# Remove router name prefix from description if present
if desc.startswith(f"{self.router_name.title()} - "):
desc = desc.split(' - ', 1)[1]
skill_md += f"### {name}\n{desc}\n\n"
# Routing logic
skill_md += """## Routing Logic
The router analyzes your question for topic keywords and activates relevant skills:
**Keywords → Skills:**
"""
for skill_name, keywords in routing_keywords.items():
keyword_str = ", ".join(keywords)
skill_md += f"- {keyword_str} → **{skill_name}**\n"
# Quick reference
skill_md += f"""
## Quick Reference
For quick answers, this router provides basic overview information. For detailed documentation, the specialized skills contain comprehensive references.
### Getting Started
1. Ask your question naturally - mention the topic area
2. The router will activate the appropriate skill(s)
3. You'll receive focused, detailed answers from specialized documentation
### Examples
**Question:** "How do I create a 2D sprite?"
**Activates:** {self.router_name}-2d skill
**Question:** "GDScript function syntax"
**Activates:** {self.router_name}-scripting skill
**Question:** "Physics collision handling in 3D"
**Activates:** {self.router_name}-3d + {self.router_name}-physics skills
### All Available Skills
"""
# List all skills
for config in self.configs:
skill_md += f"- **{config['name']}**\n"
skill_md += f"""
## Need Help?
Simply ask your question and mention the topic. The router will find the right specialized skill for you!
---
*This is a router skill. For complete documentation, see the specialized skills listed above.*
"""
return skill_md
def create_router_config(self) -> Dict[str, Any]:
"""Create router configuration"""
routing_keywords = self.extract_routing_keywords()
router_config = {
"name": self.router_name,
"description": self.base_config.get('description', f'{self.router_name.title()} documentation router'),
"base_url": self.base_config['base_url'],
"selectors": self.base_config.get('selectors', {}),
"url_patterns": self.base_config.get('url_patterns', {}),
"rate_limit": self.base_config.get('rate_limit', 0.5),
"max_pages": 500, # Router only scrapes overview pages
"_router": True,
"_sub_skills": [cfg['name'] for cfg in self.configs],
"_routing_keywords": routing_keywords
}
return router_config
def generate(self, output_dir: Path = None) -> Tuple[Path, Path]:
"""Generate router skill and config"""
if output_dir is None:
output_dir = self.config_paths[0].parent
output_dir = Path(output_dir)
# Generate SKILL.md
skill_md = self.generate_skill_md()
skill_path = output_dir.parent / f"output/{self.router_name}/SKILL.md"
skill_path.parent.mkdir(parents=True, exist_ok=True)
with open(skill_path, 'w') as f:
f.write(skill_md)
# Generate config
router_config = self.create_router_config()
config_path = output_dir / f"{self.router_name}.json"
with open(config_path, 'w') as f:
json.dump(router_config, f, indent=2)
return config_path, skill_path
def main():
parser = argparse.ArgumentParser(
description="Generate router/hub skill for split documentation",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Generate router from multiple configs
python3 generate_router.py configs/godot-2d.json configs/godot-3d.json configs/godot-scripting.json
# Use glob pattern
python3 generate_router.py configs/godot-*.json
# Custom router name
python3 generate_router.py configs/godot-*.json --name godot-hub
# Custom output directory
python3 generate_router.py configs/godot-*.json --output-dir configs/routers/
"""
)
parser.add_argument(
'configs',
nargs='+',
help='Sub-skill config files'
)
parser.add_argument(
'--name',
help='Router skill name (default: inferred from sub-skills)'
)
parser.add_argument(
'--output-dir',
help='Output directory (default: same as input configs)'
)
args = parser.parse_args()
# Filter out router configs (avoid recursion)
config_files = []
for path_str in args.configs:
path = Path(path_str)
if path.exists() and not path.stem.endswith('-router'):
config_files.append(path_str)
if not config_files:
print("❌ Error: No valid config files provided")
sys.exit(1)
print(f"\n{'='*60}")
print("ROUTER SKILL GENERATOR")
print(f"{'='*60}")
print(f"Sub-skills: {len(config_files)}")
for cfg in config_files:
print(f" - {Path(cfg).stem}")
print("")
# Generate router
generator = RouterGenerator(config_files, args.name)
config_path, skill_path = generator.generate(args.output_dir)
print(f"✅ Router config created: {config_path}")
print(f"✅ Router SKILL.md created: {skill_path}")
print("")
print(f"{'='*60}")
print("NEXT STEPS")
print(f"{'='*60}")
print(f"1. Review router SKILL.md: {skill_path}")
print(f"2. Optionally scrape router (for overview pages):")
print(f" skill-seekers scrape --config {config_path}")
print("3. Package router skill:")
print(f" skill-seekers package output/{generator.router_name}/")
print("4. Upload router + all sub-skills to Claude")
print("")
if __name__ == "__main__":
main()
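
The generated SKILL.md describes routing by topic keywords, but the generator itself only emits the keyword table. The sketch below shows how that table could be built and then consulted. `extract_routing_keywords` mirrors the method of the same name as a standalone function; `match_skills` is a hypothetical matcher added purely for illustration (whole-word, case-insensitive), not something the project provides.

```python
import re
from typing import Dict, List


def extract_routing_keywords(configs: List[dict]) -> Dict[str, List[str]]:
    """Category names plus the part of the skill name after the first dash."""
    routing = {}
    for config in configs:
        name = config["name"]
        keywords = list(config.get("categories", {}).keys())
        if "-" in name:
            keywords.append(name.split("-", 1)[1])
        routing[name] = keywords
    return routing


def match_skills(question: str, routing: Dict[str, List[str]]) -> List[str]:
    """Hypothetical matcher: select a skill when any keyword is a word in the question."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    return [
        skill
        for skill, keywords in routing.items()
        if any(keyword.lower() in words for keyword in keywords)
    ]


# Example sub-skill configs (illustrative values only)
CONFIGS = [
    {"name": "godot-2d", "categories": {"sprites": [], "tilemaps": []}},
    {"name": "godot-physics", "categories": {"collision": []}},
    {"name": "godot-3d"},
]
```

Whole-word matching avoids accidental hits on substrings, at the cost of missing plural or multi-word keywords; a real router would likely need stemming or a looser comparison.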

View File

@ -1,900 +0,0 @@
#!/usr/bin/env python3
"""
GitHub Repository to Claude Skill Converter (Tasks C1.1-C1.12)
Converts GitHub repositories into Claude AI skills by extracting:
- README and documentation
- Code structure and signatures
- GitHub Issues, Changelog, and Releases
- Usage examples from tests
Usage:
skill-seekers github --repo facebook/react
skill-seekers github --config configs/react_github.json
skill-seekers github --repo owner/repo --token $GITHUB_TOKEN
"""
import os
import sys
import json
import re
import argparse
import logging
from pathlib import Path
from typing import Dict, List, Optional, Any
from datetime import datetime
try:
from github import Github, GithubException, Repository
from github.GithubException import RateLimitExceededException
except ImportError:
print("Error: PyGithub not installed. Run: pip install PyGithub")
sys.exit(1)
# Configure logging FIRST (before using logger)
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# Import code analyzer for deep code analysis
try:
from .code_analyzer import CodeAnalyzer
CODE_ANALYZER_AVAILABLE = True
except ImportError:
CODE_ANALYZER_AVAILABLE = False
logger.warning("Code analyzer not available - deep analysis disabled")
# Directories to exclude from local repository analysis
EXCLUDED_DIRS = {
'venv', 'env', '.venv', '.env', # Virtual environments
'node_modules', '__pycache__', '.pytest_cache', # Dependencies and caches
'.git', '.svn', '.hg', # Version control
'build', 'dist', '*.egg-info', # Build artifacts
'htmlcov', '.coverage', # Coverage reports
'.tox', '.nox', # Testing environments
'.mypy_cache', '.ruff_cache', # Linter caches
}
class GitHubScraper:
"""
GitHub Repository Scraper (C1.1-C1.9)
Extracts repository information for skill generation:
- Repository structure
- README files
- Code comments and docstrings
- Programming language detection
- Function/class signatures
- Test examples
- GitHub Issues
- CHANGELOG
- Releases
"""
def __init__(self, config: Dict[str, Any], local_repo_path: Optional[str] = None):
"""Initialize GitHub scraper with configuration."""
self.config = config
self.repo_name = config['repo']
self.name = config.get('name', self.repo_name.split('/')[-1])
self.description = config.get('description', f'Skill for {self.repo_name}')
# Local repository path (optional - enables unlimited analysis)
self.local_repo_path = local_repo_path or config.get('local_repo_path')
if self.local_repo_path:
self.local_repo_path = os.path.expanduser(self.local_repo_path)
logger.info(f"Local repository mode enabled: {self.local_repo_path}")
# Configure directory exclusions (smart defaults + optional customization)
self.excluded_dirs = set(EXCLUDED_DIRS) # Start with smart defaults
# Option 1: Replace mode - Use only specified exclusions
if 'exclude_dirs' in config:
self.excluded_dirs = set(config['exclude_dirs'])
logger.warning(
f"Using custom directory exclusions ({len(self.excluded_dirs)} dirs) - "
"defaults overridden"
)
logger.debug(f"Custom exclusions: {sorted(self.excluded_dirs)}")
# Option 2: Extend mode - Add to default exclusions
elif 'exclude_dirs_additional' in config:
additional = set(config['exclude_dirs_additional'])
self.excluded_dirs = self.excluded_dirs.union(additional)
logger.info(
f"Added {len(additional)} custom directory exclusions "
f"(total: {len(self.excluded_dirs)})"
)
logger.debug(f"Additional exclusions: {sorted(additional)}")
# GitHub client setup (C1.1)
token = self._get_token()
self.github = Github(token) if token else Github()
self.repo: Optional[Repository.Repository] = None
# Options
self.include_issues = config.get('include_issues', True)
self.max_issues = config.get('max_issues', 100)
self.include_changelog = config.get('include_changelog', True)
self.include_releases = config.get('include_releases', True)
self.include_code = config.get('include_code', False)
self.code_analysis_depth = config.get('code_analysis_depth', 'surface') # 'surface', 'deep', 'full'
self.file_patterns = config.get('file_patterns', [])
# Initialize code analyzer if deep analysis requested
self.code_analyzer = None
if self.code_analysis_depth != 'surface' and CODE_ANALYZER_AVAILABLE:
self.code_analyzer = CodeAnalyzer(depth=self.code_analysis_depth)
logger.info(f"Code analysis depth: {self.code_analysis_depth}")
# Output paths
self.skill_dir = f"output/{self.name}"
self.data_file = f"output/{self.name}_github_data.json"
# Extracted data storage
self.extracted_data = {
'repo_info': {},
'readme': '',
'file_tree': [],
'languages': {},
'signatures': [],
'test_examples': [],
'issues': [],
'changelog': '',
'releases': []
}
def _get_token(self) -> Optional[str]:
"""
Get GitHub token from env var or config (both options supported).
Priority: GITHUB_TOKEN env var > config file > None
"""
# Try environment variable first (recommended)
token = os.getenv('GITHUB_TOKEN')
if token:
logger.info("Using GitHub token from GITHUB_TOKEN environment variable")
return token
# Fall back to config file
token = self.config.get('github_token')
if token:
logger.warning("Using GitHub token from config file (less secure)")
return token
logger.warning("No GitHub token provided - using unauthenticated access (lower rate limits)")
return None
def scrape(self) -> Dict[str, Any]:
"""
Main scraping entry point.
Executes all C1 tasks in sequence.
"""
try:
logger.info(f"Starting GitHub scrape for: {self.repo_name}")
# C1.1: Fetch repository
self._fetch_repository()
# C1.2: Extract README
self._extract_readme()
# C1.3-C1.6: Extract code structure
self._extract_code_structure()
# C1.7: Extract Issues
if self.include_issues:
self._extract_issues()
# C1.8: Extract CHANGELOG
if self.include_changelog:
self._extract_changelog()
# C1.9: Extract Releases
if self.include_releases:
self._extract_releases()
# Save extracted data
self._save_data()
logger.info(f"✅ Scraping complete! Data saved to: {self.data_file}")
return self.extracted_data
except RateLimitExceededException:
logger.error("GitHub API rate limit exceeded. Please wait or use authentication token.")
raise
except GithubException as e:
logger.error(f"GitHub API error: {e}")
raise
except Exception as e:
logger.error(f"Unexpected error during scraping: {e}")
raise
def _fetch_repository(self):
"""C1.1: Fetch repository structure using GitHub API."""
logger.info(f"Fetching repository: {self.repo_name}")
try:
self.repo = self.github.get_repo(self.repo_name)
# Extract basic repo info
self.extracted_data['repo_info'] = {
'name': self.repo.name,
'full_name': self.repo.full_name,
'description': self.repo.description,
'url': self.repo.html_url,
'homepage': self.repo.homepage,
'stars': self.repo.stargazers_count,
'forks': self.repo.forks_count,
'open_issues': self.repo.open_issues_count,
'default_branch': self.repo.default_branch,
'created_at': self.repo.created_at.isoformat() if self.repo.created_at else None,
'updated_at': self.repo.updated_at.isoformat() if self.repo.updated_at else None,
'language': self.repo.language,
'license': self.repo.license.name if self.repo.license else None,
'topics': self.repo.get_topics()
}
logger.info(f"Repository fetched: {self.repo.full_name} ({self.repo.stargazers_count} stars)")
except GithubException as e:
if e.status == 404:
raise ValueError(f"Repository not found: {self.repo_name}")
raise
def _extract_readme(self):
"""C1.2: Extract README.md files."""
logger.info("Extracting README...")
# Try common README locations
readme_files = ['README.md', 'README.rst', 'README.txt', 'README',
'docs/README.md', '.github/README.md']
for readme_path in readme_files:
try:
content = self.repo.get_contents(readme_path)
if content:
self.extracted_data['readme'] = content.decoded_content.decode('utf-8')
logger.info(f"README found: {readme_path}")
return
except GithubException:
continue
logger.warning("No README found in repository")
def _extract_code_structure(self):
"""
C1.3-C1.6: Extract code structure, languages, signatures, and test examples.
Surface layer only - no full implementation code.
"""
logger.info("Extracting code structure...")
# C1.4: Get language breakdown
self._extract_languages()
# Get file tree
self._extract_file_tree()
# Extract signatures and test examples
if self.include_code:
self._extract_signatures_and_tests()
def _extract_languages(self):
"""C1.4: Detect programming languages in repository."""
logger.info("Detecting programming languages...")
try:
languages = self.repo.get_languages()
total_bytes = sum(languages.values())
self.extracted_data['languages'] = {
lang: {
'bytes': bytes_count,
'percentage': round((bytes_count / total_bytes) * 100, 2) if total_bytes > 0 else 0
}
for lang, bytes_count in languages.items()
}
logger.info(f"Languages detected: {', '.join(languages.keys())}")
except GithubException as e:
logger.warning(f"Could not fetch languages: {e}")
def should_exclude_dir(self, dir_name: str) -> bool:
"""Check if directory should be excluded from analysis."""
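import fnmatch  # local import; glob-style entries such as '*.egg-info' need pattern matching
return dir_name.startswith('.') or any(fnmatch.fnmatch(dir_name, pattern) for pattern in self.excluded_dirs)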
def _extract_file_tree(self):
"""Extract repository file tree structure (dual-mode: GitHub API or local filesystem)."""
logger.info("Building file tree...")
if self.local_repo_path:
# Local filesystem mode - unlimited files
self._extract_file_tree_local()
else:
# GitHub API mode - limited by API rate limits
self._extract_file_tree_github()
def _extract_file_tree_local(self):
"""Extract file tree from local filesystem (unlimited files)."""
if not os.path.exists(self.local_repo_path):
logger.error(f"Local repository path not found: {self.local_repo_path}")
return
file_tree = []
for root, dirs, files in os.walk(self.local_repo_path):
# Exclude directories in-place to prevent os.walk from descending into them
dirs[:] = [d for d in dirs if not self.should_exclude_dir(d)]
# Calculate relative path from repo root
rel_root = os.path.relpath(root, self.local_repo_path)
if rel_root == '.':
rel_root = ''
# Add directories
for dir_name in dirs:
dir_path = os.path.join(rel_root, dir_name) if rel_root else dir_name
file_tree.append({
'path': dir_path,
'type': 'dir',
'size': None
})
# Add files
for file_name in files:
file_path = os.path.join(rel_root, file_name) if rel_root else file_name
full_path = os.path.join(root, file_name)
try:
file_size = os.path.getsize(full_path)
except OSError:
file_size = None
file_tree.append({
'path': file_path,
'type': 'file',
'size': file_size
})
self.extracted_data['file_tree'] = file_tree
logger.info(f"File tree built (local mode): {len(file_tree)} items")
def _extract_file_tree_github(self):
"""Extract file tree from GitHub API (rate-limited)."""
try:
contents = self.repo.get_contents("")
file_tree = []
while contents:
file_content = contents.pop(0)
file_info = {
'path': file_content.path,
'type': file_content.type,
'size': file_content.size if file_content.type == 'file' else None
}
file_tree.append(file_info)
if file_content.type == "dir":
contents.extend(self.repo.get_contents(file_content.path))
self.extracted_data['file_tree'] = file_tree
logger.info(f"File tree built (GitHub API mode): {len(file_tree)} items")
except GithubException as e:
logger.warning(f"Could not build file tree: {e}")
def _extract_signatures_and_tests(self):
"""
C1.3, C1.5, C1.6: Extract signatures, docstrings, and test examples.
Extraction depth depends on code_analysis_depth setting:
- surface: File tree only (minimal)
- deep: Parse files for signatures, parameters, types
- full: Complete AST analysis (future enhancement)
"""
if self.code_analysis_depth == 'surface':
logger.info("Code extraction: Surface level (file tree only)")
return
if not self.code_analyzer:
logger.warning("Code analyzer not available - skipping deep analysis")
return
logger.info(f"Extracting code signatures ({self.code_analysis_depth} analysis)...")
# Get primary language for the repository
languages = self.extracted_data.get('languages', {})
if not languages:
logger.warning("No languages detected - skipping code analysis")
return
# Determine primary language
primary_language = max(languages.items(), key=lambda x: x[1]['bytes'])[0]
logger.info(f"Primary language: {primary_language}")
# Determine file extensions to analyze
extension_map = {
'Python': ['.py'],
'JavaScript': ['.js', '.jsx'],
'TypeScript': ['.ts', '.tsx'],
'C': ['.c', '.h'],
'C++': ['.cpp', '.hpp', '.cc', '.hh', '.cxx']
}
extensions = extension_map.get(primary_language, [])
if not extensions:
logger.warning(f"No file extensions mapped for {primary_language}")
return
# Analyze files matching patterns and extensions
analyzed_files = []
file_tree = self.extracted_data.get('file_tree', [])
for file_info in file_tree:
file_path = file_info['path']
# Check if file matches extension
if not any(file_path.endswith(ext) for ext in extensions):
continue
# Check if file matches patterns (if specified)
if self.file_patterns:
import fnmatch
if not any(fnmatch.fnmatch(file_path, pattern) for pattern in self.file_patterns):
continue
# Analyze this file
try:
# Read file content based on mode
if self.local_repo_path:
# Local mode - read from filesystem
full_path = os.path.join(self.local_repo_path, file_path)
with open(full_path, 'r', encoding='utf-8') as f:
content = f.read()
else:
# GitHub API mode - fetch from API
file_content = self.repo.get_contents(file_path)
content = file_content.decoded_content.decode('utf-8')
analysis_result = self.code_analyzer.analyze_file(
file_path,
content,
primary_language
)
if analysis_result and (analysis_result.get('classes') or analysis_result.get('functions')):
analyzed_files.append({
'file': file_path,
'language': primary_language,
**analysis_result
})
logger.debug(f"Analyzed {file_path}: "
f"{len(analysis_result.get('classes', []))} classes, "
f"{len(analysis_result.get('functions', []))} functions")
except Exception as e:
logger.debug(f"Could not analyze {file_path}: {e}")
continue
# Limit number of files analyzed to avoid rate limits (GitHub API mode only)
if not self.local_repo_path and len(analyzed_files) >= 50:
logger.info("Reached analysis limit (50 files, GitHub API mode)")
break
self.extracted_data['code_analysis'] = {
'depth': self.code_analysis_depth,
'language': primary_language,
'files_analyzed': len(analyzed_files),
'files': analyzed_files
}
# Calculate totals
total_classes = sum(len(f.get('classes', [])) for f in analyzed_files)
total_functions = sum(len(f.get('functions', [])) for f in analyzed_files)
logger.info(f"Code analysis complete: {len(analyzed_files)} files, "
f"{total_classes} classes, {total_functions} functions")
def _extract_issues(self):
"""C1.7: Extract GitHub Issues (open/closed, labels, milestones)."""
logger.info(f"Extracting GitHub Issues (max {self.max_issues})...")
try:
# Fetch recent issues (open + closed)
issues = self.repo.get_issues(state='all', sort='updated', direction='desc')
issue_list = []
for issue in issues[:self.max_issues]:
# Skip pull requests (they appear in issues)
if issue.pull_request:
continue
issue_data = {
'number': issue.number,
'title': issue.title,
'state': issue.state,
'labels': [label.name for label in issue.labels],
'milestone': issue.milestone.title if issue.milestone else None,
'created_at': issue.created_at.isoformat() if issue.created_at else None,
'updated_at': issue.updated_at.isoformat() if issue.updated_at else None,
'closed_at': issue.closed_at.isoformat() if issue.closed_at else None,
'url': issue.html_url,
'body': issue.body[:500] if issue.body else None # First 500 chars
}
issue_list.append(issue_data)
self.extracted_data['issues'] = issue_list
logger.info(f"Extracted {len(issue_list)} issues")
except GithubException as e:
logger.warning(f"Could not fetch issues: {e}")
def _extract_changelog(self):
"""C1.8: Extract CHANGELOG.md and release notes."""
logger.info("Extracting CHANGELOG...")
# Try common changelog locations
changelog_files = ['CHANGELOG.md', 'CHANGES.md', 'HISTORY.md',
'CHANGELOG.rst', 'CHANGELOG.txt', 'CHANGELOG',
'docs/CHANGELOG.md', '.github/CHANGELOG.md']
for changelog_path in changelog_files:
try:
content = self.repo.get_contents(changelog_path)
if content:
self.extracted_data['changelog'] = content.decoded_content.decode('utf-8')
logger.info(f"CHANGELOG found: {changelog_path}")
return
except GithubException:
continue
logger.warning("No CHANGELOG found in repository")
def _extract_releases(self):
"""C1.9: Extract GitHub Releases with version history."""
logger.info("Extracting GitHub Releases...")
try:
releases = self.repo.get_releases()
release_list = []
for release in releases:
release_data = {
'tag_name': release.tag_name,
'name': release.title,
'body': release.body,
'draft': release.draft,
'prerelease': release.prerelease,
'created_at': release.created_at.isoformat() if release.created_at else None,
'published_at': release.published_at.isoformat() if release.published_at else None,
'url': release.html_url,
'tarball_url': release.tarball_url,
'zipball_url': release.zipball_url
}
release_list.append(release_data)
self.extracted_data['releases'] = release_list
logger.info(f"Extracted {len(release_list)} releases")
except GithubException as e:
logger.warning(f"Could not fetch releases: {e}")
def _save_data(self):
"""Save extracted data to JSON file."""
os.makedirs('output', exist_ok=True)
with open(self.data_file, 'w', encoding='utf-8') as f:
json.dump(self.extracted_data, f, indent=2, ensure_ascii=False)
logger.info(f"Data saved to: {self.data_file}")
class GitHubToSkillConverter:
"""
Convert extracted GitHub data to Claude skill format (C1.10).
"""
def __init__(self, config: Dict[str, Any]):
"""Initialize converter with configuration."""
self.config = config
self.name = config.get('name', config['repo'].split('/')[-1])
self.description = config.get('description', f'Skill for {config["repo"]}')
# Paths
self.data_file = f"output/{self.name}_github_data.json"
self.skill_dir = f"output/{self.name}"
# Load extracted data
self.data = self._load_data()
def _load_data(self) -> Dict[str, Any]:
"""Load extracted GitHub data from JSON."""
if not os.path.exists(self.data_file):
raise FileNotFoundError(f"Data file not found: {self.data_file}")
with open(self.data_file, 'r', encoding='utf-8') as f:
return json.load(f)
def build_skill(self):
"""Build complete skill structure."""
logger.info(f"Building skill for: {self.name}")
# Create directories
os.makedirs(self.skill_dir, exist_ok=True)
os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
# Generate SKILL.md
self._generate_skill_md()
# Generate reference files
self._generate_references()
logger.info(f"✅ Skill built successfully: {self.skill_dir}/")
def _generate_skill_md(self):
"""Generate main SKILL.md file."""
repo_info = self.data.get('repo_info', {})
# Generate skill name (lowercased, underscores and spaces replaced with hyphens, max 64 chars)
skill_name = self.name.lower().replace('_', '-').replace(' ', '-')[:64]
# Truncate description to 1024 chars if needed
desc = self.description[:1024]
skill_content = f"""---
name: {skill_name}
description: {desc}
---
# {repo_info.get('name', self.name)}
{self.description}
## Description
{repo_info.get('description', 'GitHub repository skill')}
**Repository:** [{repo_info.get('full_name', 'N/A')}]({repo_info.get('url', '#')})
**Language:** {repo_info.get('language', 'N/A')}
**Stars:** {repo_info.get('stars', 0):,}
**License:** {repo_info.get('license', 'N/A')}
## When to Use This Skill
Use this skill when you need to:
- Understand how to use {self.name}
- Look up API documentation
- Find usage examples
- Check for known issues or recent changes
- Review release history
## Quick Reference
### Repository Info
- **Homepage:** {repo_info.get('homepage', 'N/A')}
- **Topics:** {', '.join(repo_info.get('topics', []))}
- **Open Issues:** {repo_info.get('open_issues', 0)}
- **Last Updated:** {(repo_info.get('updated_at') or 'N/A')[:10]}
### Languages
{self._format_languages()}
### Recent Releases
{self._format_recent_releases()}
## Available References
- `references/README.md` - Complete README documentation
- `references/CHANGELOG.md` - Version history and changes
- `references/issues.md` - Recent GitHub issues
- `references/releases.md` - Release notes
- `references/file_structure.md` - Repository structure
## Usage
See README.md for complete usage instructions and examples.
---
**Generated by Skill Seeker** | GitHub Repository Scraper
"""
skill_path = f"{self.skill_dir}/SKILL.md"
with open(skill_path, 'w', encoding='utf-8') as f:
f.write(skill_content)
logger.info(f"Generated: {skill_path}")
def _format_languages(self) -> str:
"""Format language breakdown."""
languages = self.data.get('languages', {})
if not languages:
return "No language data available"
lines = []
for lang, info in sorted(languages.items(), key=lambda x: x[1]['bytes'], reverse=True):
lines.append(f"- **{lang}:** {info['percentage']:.1f}%")
return '\n'.join(lines)
def _format_recent_releases(self) -> str:
"""Format recent releases (top 3)."""
releases = self.data.get('releases', [])
if not releases:
return "No releases available"
lines = []
for release in releases[:3]:
# published_at is None for drafts, so fall back before slicing the date
lines.append(f"- **{release['tag_name']}** ({(release['published_at'] or 'N/A')[:10]}): {release['name']}")
return '\n'.join(lines)
def _generate_references(self):
"""Generate all reference files."""
# README
if self.data.get('readme'):
readme_path = f"{self.skill_dir}/references/README.md"
with open(readme_path, 'w', encoding='utf-8') as f:
f.write(self.data['readme'])
logger.info(f"Generated: {readme_path}")
# CHANGELOG
if self.data.get('changelog'):
changelog_path = f"{self.skill_dir}/references/CHANGELOG.md"
with open(changelog_path, 'w', encoding='utf-8') as f:
f.write(self.data['changelog'])
logger.info(f"Generated: {changelog_path}")
# Issues
if self.data.get('issues'):
self._generate_issues_reference()
# Releases
if self.data.get('releases'):
self._generate_releases_reference()
# File structure
if self.data.get('file_tree'):
self._generate_file_structure_reference()
def _generate_issues_reference(self):
"""Generate issues.md reference file."""
issues = self.data['issues']
content = f"# GitHub Issues\n\nRecent issues from the repository ({len(issues)} total).\n\n"
# Group by state
open_issues = [i for i in issues if i['state'] == 'open']
closed_issues = [i for i in issues if i['state'] == 'closed']
content += f"## Open Issues ({len(open_issues)})\n\n"
for issue in open_issues[:20]:
labels = ', '.join(issue['labels']) if issue['labels'] else 'No labels'
content += f"### #{issue['number']}: {issue['title']}\n"
content += f"**Labels:** {labels} | **Created:** {(issue['created_at'] or 'N/A')[:10]}\n"
content += f"[View on GitHub]({issue['url']})\n\n"
content += f"\n## Recently Closed Issues ({len(closed_issues)})\n\n"
for issue in closed_issues[:10]:
labels = ', '.join(issue['labels']) if issue['labels'] else 'No labels'
content += f"### #{issue['number']}: {issue['title']}\n"
content += f"**Labels:** {labels} | **Closed:** {(issue['closed_at'] or 'N/A')[:10]}\n"
content += f"[View on GitHub]({issue['url']})\n\n"
issues_path = f"{self.skill_dir}/references/issues.md"
with open(issues_path, 'w', encoding='utf-8') as f:
f.write(content)
logger.info(f"Generated: {issues_path}")
def _generate_releases_reference(self):
"""Generate releases.md reference file."""
releases = self.data['releases']
content = f"# Releases\n\nVersion history for this repository ({len(releases)} releases).\n\n"
for release in releases:
content += f"## {release['tag_name']}: {release['name']}\n"
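# published_at and body may be None (e.g. draft releases), so fall back before formatting
content += f"**Published:** {(release['published_at'] or 'N/A')[:10]}\n"
if release['prerelease']:
content += "**Pre-release**\n"
content += f"\n{release['body'] or ''}\n\n"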
content += f"[View on GitHub]({release['url']})\n\n---\n\n"
releases_path = f"{self.skill_dir}/references/releases.md"
with open(releases_path, 'w', encoding='utf-8') as f:
f.write(content)
logger.info(f"Generated: {releases_path}")
def _generate_file_structure_reference(self):
"""Generate file_structure.md reference file."""
file_tree = self.data['file_tree']
content = "# Repository File Structure\n\n"
content += f"Total items: {len(file_tree)}\n\n"
content += "```\n"
# Build tree structure
for item in file_tree:
indent = " " * item['path'].count('/')
icon = "📁" if item['type'] == 'dir' else "📄"
content += f"{indent}{icon} {os.path.basename(item['path'])}\n"
content += "```\n"
structure_path = f"{self.skill_dir}/references/file_structure.md"
with open(structure_path, 'w', encoding='utf-8') as f:
f.write(content)
logger.info(f"Generated: {structure_path}")
def main():
"""C1.10: CLI tool entry point."""
parser = argparse.ArgumentParser(
description='GitHub Repository to Claude Skill Converter',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
skill-seekers github --repo facebook/react
skill-seekers github --config configs/react_github.json
skill-seekers github --repo owner/repo --token $GITHUB_TOKEN
"""
)
parser.add_argument('--repo', help='GitHub repository (owner/repo)')
parser.add_argument('--config', help='Path to config JSON file')
parser.add_argument('--token', help='GitHub personal access token')
parser.add_argument('--name', help='Skill name (default: repo name)')
parser.add_argument('--description', help='Skill description')
parser.add_argument('--no-issues', action='store_true', help='Skip GitHub issues')
parser.add_argument('--no-changelog', action='store_true', help='Skip CHANGELOG')
parser.add_argument('--no-releases', action='store_true', help='Skip releases')
parser.add_argument('--max-issues', type=int, default=100, help='Max issues to fetch')
parser.add_argument('--scrape-only', action='store_true', help='Only scrape, don\'t build skill')
args = parser.parse_args()
# Build config from args or file
if args.config:
with open(args.config, 'r', encoding='utf-8') as f:
config = json.load(f)
elif args.repo:
config = {
'repo': args.repo,
'name': args.name or args.repo.split('/')[-1],
'description': args.description or f'GitHub repository skill for {args.repo}',
'github_token': args.token,
'include_issues': not args.no_issues,
'include_changelog': not args.no_changelog,
'include_releases': not args.no_releases,
'max_issues': args.max_issues
}
else:
parser.error('Either --repo or --config is required')
try:
# Phase 1: Scrape GitHub repository
scraper = GitHubScraper(config)
scraper.scrape()
if args.scrape_only:
logger.info("Scrape complete (--scrape-only mode)")
return
# Phase 2: Build skill
converter = GitHubToSkillConverter(config)
converter.build_skill()
logger.info(f"\n✅ Success! Skill created at: output/{config.get('name', config['repo'].split('/')[-1])}/")
logger.info(f"Next step: skill-seekers-package output/{config.get('name', config['repo'].split('/')[-1])}/")
except Exception as e:
logger.error(f"Error: {e}")
sys.exit(1)
if __name__ == '__main__':
main()
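
The GitHub API tree walk in `_extract_file_tree_github` is a breadth-first traversal driven by a work queue: pop an entry, record it, and enqueue the children of any directory. The sketch below illustrates that pattern without network access. `FAKE_REPO`, `list_dir`, and `build_file_tree` are hypothetical stand-ins introduced only for illustration, not part of the scraper or of PyGithub.

```python
from collections import deque
from typing import Dict, List

# Hypothetical in-memory repository: directory path -> entries it contains.
FAKE_REPO: Dict[str, List[dict]] = {
    "": [
        {"path": "README.md", "type": "file", "size": 120},
        {"path": "src", "type": "dir", "size": 0},
    ],
    "src": [
        {"path": "src/app.py", "type": "file", "size": 300},
        {"path": "src/utils", "type": "dir", "size": 0},
    ],
    "src/utils": [
        {"path": "src/utils/io.py", "type": "file", "size": 50},
    ],
}


def list_dir(path: str) -> List[dict]:
    """Stand-in for a 'list directory contents' call (assumed, not a real API)."""
    return FAKE_REPO.get(path, [])


def build_file_tree(root: str = "") -> List[dict]:
    """Walk the repository breadth-first and return a flat list of entries."""
    queue = deque(list_dir(root))
    tree: List[dict] = []
    while queue:
        entry = queue.popleft()
        tree.append({
            "path": entry["path"],
            "type": entry["type"],
            # Directories carry no meaningful size, mirroring the scraper's output.
            "size": entry["size"] if entry["type"] == "file" else None,
        })
        if entry["type"] == "dir":
            queue.extend(list_dir(entry["path"]))
    return tree


tree = build_file_tree()
for item in tree:
    print(item["type"], item["path"])
```

A `deque` keeps each pop O(1), whereas `list.pop(0)` shifts the remaining elements; for large repositories the difference may be noticeable, although API rate limits are likely to dominate in practice.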


@ -1,66 +0,0 @@
# ABOUTME: Detects and validates llms.txt file availability at documentation URLs
# ABOUTME: Supports llms-full.txt, llms.txt, and llms-small.txt variants
import requests
from typing import Optional, Dict, List
from urllib.parse import urlparse
class LlmsTxtDetector:
"""Detect llms.txt files at documentation URLs"""
VARIANTS = [
('llms-full.txt', 'full'),
('llms.txt', 'standard'),
('llms-small.txt', 'small')
]
def __init__(self, base_url: str):
self.base_url = base_url.rstrip('/')
def detect(self) -> Optional[Dict[str, str]]:
"""
Detect available llms.txt variant.
Returns:
Dict with 'url' and 'variant' keys, or None if not found
"""
parsed = urlparse(self.base_url)
root_url = f"{parsed.scheme}://{parsed.netloc}"
for filename, variant in self.VARIANTS:
url = f"{root_url}/{filename}"
if self._check_url_exists(url):
return {'url': url, 'variant': variant}
return None
def detect_all(self) -> List[Dict[str, str]]:
"""
Detect all available llms.txt variants.
Returns:
List of dicts with 'url' and 'variant' keys for each found variant
"""
found_variants = []
# The site root does not change between variants, so compute it once
parsed = urlparse(self.base_url)
root_url = f"{parsed.scheme}://{parsed.netloc}"
for filename, variant in self.VARIANTS:
url = f"{root_url}/{filename}"
if self._check_url_exists(url):
found_variants.append({
'url': url,
'variant': variant
})
return found_variants
def _check_url_exists(self, url: str) -> bool:
"""Check if URL returns 200 status"""
try:
response = requests.head(url, timeout=5, allow_redirects=True)
return response.status_code == 200
except requests.RequestException:
return False
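
The detector probes the site root rather than the documentation path it was given, and it returns the first variant found in priority order (full, then standard, then small). The following sketch shows that behaviour with the network check replaced by an injected predicate. `detect_variant` and its `exists` parameter are hypothetical names used for illustration, and `docs.example.com` is a placeholder host.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse

# Same priority order as LlmsTxtDetector.VARIANTS.
VARIANTS = [
    ("llms-full.txt", "full"),
    ("llms.txt", "standard"),
    ("llms-small.txt", "small"),
]


def detect_variant(base_url: str, exists: Callable[[str], bool]) -> Optional[Dict[str, str]]:
    """Return the highest-priority variant for which exists(url) is true."""
    parsed = urlparse(base_url.rstrip("/"))
    # Only scheme and host are kept, so any path in base_url is ignored.
    root_url = f"{parsed.scheme}://{parsed.netloc}"
    for filename, variant in VARIANTS:
        url = f"{root_url}/{filename}"
        if exists(url):
            return {"url": url, "variant": variant}
    return None


# Pretend the site publishes the standard and small variants only.
available = {
    "https://docs.example.com/llms.txt",
    "https://docs.example.com/llms-small.txt",
}
result = detect_variant("https://docs.example.com/guide/intro/", available.__contains__)
print(result)
```

Injecting the existence check keeps the selection logic testable offline; the real class performs a `requests.head` call at that point.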


@ -1,94 +0,0 @@
"""ABOUTME: Downloads llms.txt files from documentation URLs with retry logic"""
"""ABOUTME: Validates markdown content and handles timeouts with exponential backoff"""
import requests
import time
from typing import Optional
class LlmsTxtDownloader:
"""Download llms.txt content from URLs with retry logic"""
def __init__(self, url: str, timeout: int = 30, max_retries: int = 3):
self.url = url
self.timeout = timeout
self.max_retries = max_retries
def get_proper_filename(self) -> str:
"""
Extract filename from URL and convert .txt to .md
Returns:
Proper filename with .md extension
Examples:
https://hono.dev/llms-full.txt -> llms-full.md
https://hono.dev/llms.txt -> llms.md
https://hono.dev/llms-small.txt -> llms-small.md
"""
# Extract filename from URL
from urllib.parse import urlparse
parsed = urlparse(self.url)
filename = parsed.path.split('/')[-1]
# Replace .txt with .md
if filename.endswith('.txt'):
filename = filename[:-4] + '.md'
return filename
def _is_markdown(self, content: str) -> bool:
"""
Check if content looks like markdown.
Returns:
True if content contains markdown patterns
"""
markdown_patterns = ['# ', '## ', '```', '- ', '* ', '`']
return any(pattern in content for pattern in markdown_patterns)
def download(self) -> Optional[str]:
"""
Download llms.txt content with retry logic.
Returns:
String content or None if download fails
"""
headers = {
'User-Agent': 'Skill-Seekers-llms.txt-Reader/1.0'
}
for attempt in range(self.max_retries):
try:
response = requests.get(
self.url,
headers=headers,
timeout=self.timeout
)
response.raise_for_status()
content = response.text
# Reject content that is too short to be a real llms.txt file (under 100 chars)
if len(content) < 100:
print(f"⚠️ Content too short ({len(content)} chars), rejecting")
return None
# Validate content looks like markdown
if not self._is_markdown(content):
print("⚠️ Content doesn't look like markdown")
return None
return content
except requests.RequestException as e:
if attempt < self.max_retries - 1:
# Calculate exponential backoff delay: 1s, 2s, 4s, etc.
delay = 2 ** attempt
print(f"⚠️ Attempt {attempt + 1}/{self.max_retries} failed: {e}")
print(f" Retrying in {delay}s...")
time.sleep(delay)
else:
print(f"❌ Failed to download {self.url} after {self.max_retries} attempts: {e}")
return None
return None
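
The retry loop in `download` sleeps `2 ** attempt` seconds after every failed attempt except the last, so with the default `max_retries=3` the delays are 1s and 2s. The sketch below isolates that schedule. `backoff_delays` and `retry_with_backoff` are hypothetical helpers written for illustration, and the sleep function is injected so the example runs instantly.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int) -> List[int]:
    """Delays slept between attempts; there is no sleep after the final attempt."""
    return [2 ** attempt for attempt in range(max_retries - 1)]


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call operation until it succeeds or max_retries attempts have failed."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            # The downloader catches requests.RequestException; a broad catch
            # is used here only to keep the sketch dependency-free.
            if attempt < max_retries - 1:
                sleep(2 ** attempt)
    return None


calls = {"count": 0}


def flaky() -> str:
    """Fail twice, then succeed, to exercise both backoff steps."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "# ok"


slept: List[float] = []
result = retry_with_backoff(flaky, max_retries=3, sleep=slept.append)
print(result, slept)
```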


@ -1,74 +0,0 @@
"""ABOUTME: Parses llms.txt markdown content into structured page data"""
"""ABOUTME: Extracts titles, content, code samples, and headings from markdown"""
import re
from typing import List, Dict
class LlmsTxtParser:
"""Parse llms.txt markdown content into page structures"""
def __init__(self, content: str):
self.content = content
def parse(self) -> List[Dict]:
"""
Parse markdown content into page structures.
Returns:
List of page dicts with title, content, code_samples, headings
"""
pages = []
# Split by h1 headers (# Title)
sections = re.split(r'\n# ', self.content)
for section in sections:
if not section.strip():
continue
# First line is title
lines = section.split('\n')
title = lines[0].strip('#').strip()
# Parse content
page = self._parse_section('\n'.join(lines[1:]), title)
pages.append(page)
return pages
def _parse_section(self, content: str, title: str) -> Dict:
"""Parse a single section into page structure"""
page = {
'title': title,
'content': '',
'code_samples': [],
'headings': [],
'url': f'llms-txt#{title.lower().replace(" ", "-")}',
'links': []
}
# Extract code blocks
code_blocks = re.findall(r'```(\w+)?\n(.*?)```', content, re.DOTALL)
for lang, code in code_blocks:
page['code_samples'].append({
'code': code.strip(),
'language': lang or 'unknown'
})
# Extract h2/h3 headings
headings = re.findall(r'^(#{2,3})\s+(.+)$', content, re.MULTILINE)
for level_markers, text in headings:
page['headings'].append({
'level': f'h{len(level_markers)}',
'text': text.strip(),
'id': text.lower().replace(' ', '-')
})
# Remove code blocks from content for plain text
content_no_code = re.sub(r'```.*?```', '', content, flags=re.DOTALL)
# Extract paragraphs
paragraphs = [p.strip() for p in content_no_code.split('\n\n') if len(p.strip()) > 20]
page['content'] = '\n\n'.join(paragraphs)
return page
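
A condensed sketch of the parsing approach above, assuming the same regular expressions: split on top-level headings, then pull fenced code blocks and h2/h3 headings out of each section. `split_pages` is a hypothetical helper, and the fence marker is built programmatically only so the sample nests cleanly inside this example.

```python
import re
from typing import Dict, List

FENCE = "`" * 3  # three backticks, assembled to avoid a literal fence here


def split_pages(markdown: str) -> List[Dict]:
    """Split markdown on h1 headings and extract code samples and subheadings."""
    pages = []
    for section in re.split(r"\n# ", markdown):
        if not section.strip():
            continue
        # The first line of each section is its title.
        title, _, body = section.partition("\n")
        code = re.findall(FENCE + r"(\w+)?\n(.*?)" + FENCE, body, re.DOTALL)
        headings = re.findall(r"^(#{2,3})\s+(.+)$", body, re.MULTILINE)
        pages.append({
            "title": title.strip("#").strip(),
            "code_samples": [
                {"language": lang or "unknown", "code": src.strip()}
                for lang, src in code
            ],
            "headings": [
                {"level": f"h{len(marks)}", "text": text.strip()}
                for marks, text in headings
            ],
        })
    return pages


sample = "\n".join([
    "# Getting Started",
    "## Install",
    FENCE + "bash",
    "pip install example",
    FENCE,
    "# API",
    "### Client",
    "Details about the client.",
])
pages = split_pages(sample)
for page in pages:
    print(page["title"], page["headings"], page["code_samples"])
```

One limitation worth knowing: because the split pattern is a plain `\n# `, a line inside a code block that starts with `# ` (a shell or Python comment, for instance) is also treated as a page boundary.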


@ -1,285 +0,0 @@
#!/usr/bin/env python3
"""
Skill Seekers - Unified CLI Entry Point
Provides a git-style unified command-line interface for all Skill Seekers tools.
Usage:
skill-seekers <command> [options]
Commands:
scrape Scrape documentation website
github Scrape GitHub repository
pdf Extract from PDF file
unified Multi-source scraping (docs + GitHub + PDF)
enhance AI-powered enhancement (local, no API key)
package Package skill into .zip file
upload Upload skill to Claude
estimate Estimate page count before scraping
Examples:
skill-seekers scrape --config configs/react.json
skill-seekers github --repo microsoft/TypeScript
skill-seekers unified --config configs/react_unified.json
skill-seekers package output/react/
"""
import sys
import argparse
from typing import List, Optional
def create_parser() -> argparse.ArgumentParser:
"""Create the main argument parser with subcommands."""
parser = argparse.ArgumentParser(
prog="skill-seekers",
description="Convert documentation, GitHub repos, and PDFs into Claude AI skills",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Scrape documentation
skill-seekers scrape --config configs/react.json
# Scrape GitHub repository
skill-seekers github --repo microsoft/TypeScript --name typescript
# Multi-source scraping (unified)
skill-seekers unified --config configs/react_unified.json
# AI-powered enhancement
skill-seekers enhance output/react/
# Package and upload
skill-seekers package output/react/
skill-seekers upload output/react.zip
For more information: https://github.com/yusufkaraaslan/Skill_Seekers
"""
)
parser.add_argument(
"--version",
action="version",
version="%(prog)s 2.1.1"
)
subparsers = parser.add_subparsers(
dest="command",
title="commands",
description="Available Skill Seekers commands",
help="Command to run"
)
# === scrape subcommand ===
scrape_parser = subparsers.add_parser(
"scrape",
help="Scrape documentation website",
description="Scrape documentation website and generate skill"
)
scrape_parser.add_argument("--config", help="Config JSON file")
scrape_parser.add_argument("--name", help="Skill name")
scrape_parser.add_argument("--url", help="Documentation URL")
scrape_parser.add_argument("--description", help="Skill description")
scrape_parser.add_argument("--skip-scrape", action="store_true", help="Skip scraping, use cached data")
scrape_parser.add_argument("--enhance", action="store_true", help="AI enhancement (API)")
scrape_parser.add_argument("--enhance-local", action="store_true", help="AI enhancement (local)")
scrape_parser.add_argument("--dry-run", action="store_true", help="Dry run mode")
scrape_parser.add_argument("--async", dest="async_mode", action="store_true", help="Use async scraping")
scrape_parser.add_argument("--workers", type=int, help="Number of async workers")
# === github subcommand ===
github_parser = subparsers.add_parser(
"github",
help="Scrape GitHub repository",
description="Scrape GitHub repository and generate skill"
)
github_parser.add_argument("--config", help="Config JSON file")
github_parser.add_argument("--repo", help="GitHub repo (owner/repo)")
github_parser.add_argument("--name", help="Skill name")
github_parser.add_argument("--description", help="Skill description")
# === pdf subcommand ===
pdf_parser = subparsers.add_parser(
"pdf",
help="Extract from PDF file",
description="Extract content from PDF and generate skill"
)
pdf_parser.add_argument("--config", help="Config JSON file")
pdf_parser.add_argument("--pdf", help="PDF file path")
pdf_parser.add_argument("--name", help="Skill name")
pdf_parser.add_argument("--description", help="Skill description")
pdf_parser.add_argument("--from-json", help="Build from extracted JSON")
# === unified subcommand ===
unified_parser = subparsers.add_parser(
"unified",
help="Multi-source scraping (docs + GitHub + PDF)",
description="Combine multiple sources into one skill"
)
unified_parser.add_argument("--config", required=True, help="Unified config JSON file")
unified_parser.add_argument("--merge-mode", help="Merge mode (rule-based, claude-enhanced)")
unified_parser.add_argument("--dry-run", action="store_true", help="Dry run mode")
# === enhance subcommand ===
enhance_parser = subparsers.add_parser(
"enhance",
help="AI-powered enhancement (local, no API key)",
description="Enhance SKILL.md using Claude Code (local)"
)
enhance_parser.add_argument("skill_directory", help="Skill directory path")
# === package subcommand ===
package_parser = subparsers.add_parser(
"package",
help="Package skill into .zip file",
description="Package skill directory into uploadable .zip"
)
package_parser.add_argument("skill_directory", help="Skill directory path")
package_parser.add_argument("--no-open", action="store_true", help="Don't open output folder")
package_parser.add_argument("--upload", action="store_true", help="Auto-upload after packaging")
# === upload subcommand ===
upload_parser = subparsers.add_parser(
"upload",
help="Upload skill to Claude",
description="Upload .zip file to Claude via Anthropic API"
)
upload_parser.add_argument("zip_file", help=".zip file to upload")
upload_parser.add_argument("--api-key", help="Anthropic API key")
# === estimate subcommand ===
estimate_parser = subparsers.add_parser(
"estimate",
help="Estimate page count before scraping",
description="Estimate total pages for documentation scraping"
)
estimate_parser.add_argument("config", help="Config JSON file")
estimate_parser.add_argument("--max-discovery", type=int, help="Max pages to discover")
return parser
def main(argv: Optional[List[str]] = None) -> int:
"""Main entry point for the unified CLI.
Args:
argv: Command-line arguments (defaults to sys.argv[1:])
Returns:
Exit code (0 for success, non-zero for error)
"""
parser = create_parser()
args = parser.parse_args(argv)
if not args.command:
parser.print_help()
return 1
# Delegate to the appropriate tool
try:
if args.command == "scrape":
from skill_seekers.cli.doc_scraper import main as scrape_main
# Convert args namespace to sys.argv format for doc_scraper
sys.argv = ["doc_scraper.py"]
if args.config:
sys.argv.extend(["--config", args.config])
if args.name:
sys.argv.extend(["--name", args.name])
if args.url:
sys.argv.extend(["--url", args.url])
if args.description:
sys.argv.extend(["--description", args.description])
if args.skip_scrape:
sys.argv.append("--skip-scrape")
if args.enhance:
sys.argv.append("--enhance")
if args.enhance_local:
sys.argv.append("--enhance-local")
if args.dry_run:
sys.argv.append("--dry-run")
if args.async_mode:
sys.argv.append("--async")
if args.workers:
sys.argv.extend(["--workers", str(args.workers)])
return scrape_main() or 0
elif args.command == "github":
from skill_seekers.cli.github_scraper import main as github_main
sys.argv = ["github_scraper.py"]
if args.config:
sys.argv.extend(["--config", args.config])
if args.repo:
sys.argv.extend(["--repo", args.repo])
if args.name:
sys.argv.extend(["--name", args.name])
if args.description:
sys.argv.extend(["--description", args.description])
return github_main() or 0
elif args.command == "pdf":
from skill_seekers.cli.pdf_scraper import main as pdf_main
sys.argv = ["pdf_scraper.py"]
if args.config:
sys.argv.extend(["--config", args.config])
if args.pdf:
sys.argv.extend(["--pdf", args.pdf])
if args.name:
sys.argv.extend(["--name", args.name])
if args.description:
sys.argv.extend(["--description", args.description])
if args.from_json:
sys.argv.extend(["--from-json", args.from_json])
return pdf_main() or 0
elif args.command == "unified":
from skill_seekers.cli.unified_scraper import main as unified_main
sys.argv = ["unified_scraper.py", "--config", args.config]
if args.merge_mode:
sys.argv.extend(["--merge-mode", args.merge_mode])
if args.dry_run:
sys.argv.append("--dry-run")
return unified_main() or 0
elif args.command == "enhance":
from skill_seekers.cli.enhance_skill_local import main as enhance_main
sys.argv = ["enhance_skill_local.py", args.skill_directory]
return enhance_main() or 0
elif args.command == "package":
from skill_seekers.cli.package_skill import main as package_main
sys.argv = ["package_skill.py", args.skill_directory]
if args.no_open:
sys.argv.append("--no-open")
if args.upload:
sys.argv.append("--upload")
return package_main() or 0
elif args.command == "upload":
from skill_seekers.cli.upload_skill import main as upload_main
sys.argv = ["upload_skill.py", args.zip_file]
if args.api_key:
sys.argv.extend(["--api-key", args.api_key])
return upload_main() or 0
elif args.command == "estimate":
from skill_seekers.cli.estimate_pages import main as estimate_main
sys.argv = ["estimate_pages.py", args.config]
if args.max_discovery:
sys.argv.extend(["--max-discovery", str(args.max_discovery)])
return estimate_main() or 0
else:
print(f"Error: Unknown command '{args.command}'", file=sys.stderr)
parser.print_help()
return 1
except KeyboardInterrupt:
print("\n\nInterrupted by user", file=sys.stderr)
return 130
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
return 1
if __name__ == "__main__":
sys.exit(main())
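
The dispatcher above rebuilds `sys.argv` for each sub-tool before calling its `main()`. One possible alternative, sketched here under the assumption that a sub-tool's `main` could accept an explicit `argv` list, is to build the forwarded list as a value and pass it in, which avoids mutating global state. `build_scrape_argv` and `fake_scrape_main` are hypothetical and do not exist in the package.

```python
import argparse
from typing import List, Optional


def build_scrape_argv(args: argparse.Namespace) -> List[str]:
    """Rebuild the flag list a sub-tool expects from an already-parsed namespace."""
    argv: List[str] = []
    # Value-carrying options are forwarded only when the user supplied them.
    for option in ("config", "name", "url", "description"):
        value = getattr(args, option, None)
        if value:
            argv.extend([f"--{option}", value])
    # Boolean switches are forwarded as bare flags.
    if getattr(args, "dry_run", False):
        argv.append("--dry-run")
    return argv


def fake_scrape_main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical sub-tool entry point that accepts argv explicitly."""
    parser = argparse.ArgumentParser(prog="doc_scraper.py")
    parser.add_argument("--config")
    parser.add_argument("--name")
    parser.add_argument("--url")
    parser.add_argument("--description")
    parser.add_argument("--dry-run", action="store_true")
    parsed = parser.parse_args(argv)
    # Succeed when there is something to scrape.
    return 0 if parsed.config or parsed.url else 1


top_level = argparse.Namespace(
    config="configs/react.json", name=None, url=None,
    description=None, dry_run=True,
)
forwarded = build_scrape_argv(top_level)
exit_code = fake_scrape_main(forwarded)
print(forwarded, exit_code)
```

Passing `argv` explicitly also makes each sub-tool callable from tests without patching `sys.argv`.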


@ -1,513 +0,0 @@
#!/usr/bin/env python3
"""
Source Merger for Multi-Source Skills
Merges documentation and code data intelligently:
- Rule-based merge: Fast, deterministic rules
- Claude-enhanced merge: AI-powered reconciliation
Handles conflicts and creates unified API reference.
"""
import json
import logging
import subprocess
import tempfile
import os
from pathlib import Path
from typing import Dict, List, Any, Optional
from .conflict_detector import Conflict, ConflictDetector
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class RuleBasedMerger:
"""
Rule-based API merger using deterministic rules.
Rules:
1. If API only in docs → Include with [DOCS_ONLY] tag
2. If API only in code → Include with [UNDOCUMENTED] tag
3. If both match perfectly → Include normally
4. If conflict → Include both versions with [CONFLICT] tag, prefer code signature
"""
def __init__(self, docs_data: Dict, github_data: Dict, conflicts: List[Conflict]):
"""
Initialize rule-based merger.
Args:
docs_data: Documentation scraper data
github_data: GitHub scraper data
conflicts: List of detected conflicts
"""
self.docs_data = docs_data
self.github_data = github_data
self.conflicts = conflicts
# Build conflict index for fast lookup
self.conflict_index = {c.api_name: c for c in conflicts}
# Extract APIs from both sources
detector = ConflictDetector(docs_data, github_data)
self.docs_apis = detector.docs_apis
self.code_apis = detector.code_apis
def merge_all(self) -> Dict[str, Any]:
"""
Merge all APIs using rule-based logic.
Returns:
Dict containing merged API data
"""
logger.info("Starting rule-based merge...")
merged_apis = {}
# Get all unique API names
all_api_names = set(self.docs_apis.keys()) | set(self.code_apis.keys())
for api_name in sorted(all_api_names):
merged_api = self._merge_single_api(api_name)
merged_apis[api_name] = merged_api
logger.info(f"Merged {len(merged_apis)} APIs")
return {
'merge_mode': 'rule-based',
'apis': merged_apis,
'summary': {
'total_apis': len(merged_apis),
'docs_only': sum(1 for api in merged_apis.values() if api['status'] == 'docs_only'),
'code_only': sum(1 for api in merged_apis.values() if api['status'] == 'code_only'),
'matched': sum(1 for api in merged_apis.values() if api['status'] == 'matched'),
'conflict': sum(1 for api in merged_apis.values() if api['status'] == 'conflict')
}
}
def _merge_single_api(self, api_name: str) -> Dict[str, Any]:
"""
Merge a single API using rules.
Args:
api_name: Name of the API to merge
Returns:
Merged API dict
"""
in_docs = api_name in self.docs_apis
in_code = api_name in self.code_apis
has_conflict = api_name in self.conflict_index
# Rule 1: Only in docs
if in_docs and not in_code:
conflict = self.conflict_index.get(api_name)
return {
'name': api_name,
'status': 'docs_only',
'source': 'documentation',
'data': self.docs_apis[api_name],
'warning': 'This API is documented but not found in codebase',
'conflict': conflict.__dict__ if conflict else None
}
# Rule 2: Only in code
if in_code and not in_docs:
is_private = api_name.startswith('_')
conflict = self.conflict_index.get(api_name)
return {
'name': api_name,
'status': 'code_only',
'source': 'code',
'data': self.code_apis[api_name],
'warning': 'This API exists in code but is not documented' if not is_private else 'Internal/private API',
'conflict': conflict.__dict__ if conflict else None
}
# Both exist - check for conflicts
docs_info = self.docs_apis[api_name]
code_info = self.code_apis[api_name]
# Rule 3: Both match perfectly (no conflict)
if not has_conflict:
return {
'name': api_name,
'status': 'matched',
'source': 'both',
'docs_data': docs_info,
'code_data': code_info,
'merged_signature': self._create_merged_signature(code_info, docs_info),
'merged_description': docs_info.get('docstring') or code_info.get('docstring')
}
# Rule 4: Conflict exists - prefer code signature, keep docs description
conflict = self.conflict_index[api_name]
return {
'name': api_name,
'status': 'conflict',
'source': 'both',
'docs_data': docs_info,
'code_data': code_info,
'conflict': conflict.__dict__,
'resolution': 'prefer_code_signature',
'merged_signature': self._create_merged_signature(code_info, docs_info),
'merged_description': docs_info.get('docstring') or code_info.get('docstring'),
'warning': conflict.difference
}
def _create_merged_signature(self, code_info: Dict, docs_info: Dict) -> str:
"""
Create merged signature preferring code data.
Args:
code_info: API info from code
docs_info: API info from docs
Returns:
Merged signature string
"""
name = code_info.get('name', docs_info.get('name'))
params = code_info.get('parameters', docs_info.get('parameters', []))
return_type = code_info.get('return_type', docs_info.get('return_type'))
# Build parameter string
param_strs = []
for param in params:
param_str = param['name']
if param.get('type_hint'):
param_str += f": {param['type_hint']}"
# Keep falsy defaults such as 0 or False; skip only missing/empty ones
if param.get('default') not in (None, ''):
param_str += f" = {param['default']}"
param_strs.append(param_str)
signature = f"{name}({', '.join(param_strs)})"
if return_type:
signature += f" -> {return_type}"
return signature
class ClaudeEnhancedMerger:
"""
Claude-enhanced API merger using local Claude Code.
Opens Claude Code in a new terminal to intelligently reconcile conflicts.
Uses the same approach as enhance_skill_local.py.
"""
def __init__(self, docs_data: Dict, github_data: Dict, conflicts: List[Conflict]):
"""
Initialize Claude-enhanced merger.
Args:
docs_data: Documentation scraper data
github_data: GitHub scraper data
conflicts: List of detected conflicts
"""
self.docs_data = docs_data
self.github_data = github_data
self.conflicts = conflicts
# First do rule-based merge as baseline
self.rule_merger = RuleBasedMerger(docs_data, github_data, conflicts)
def merge_all(self) -> Dict[str, Any]:
"""
Merge all APIs using Claude enhancement.
Returns:
Dict containing merged API data
"""
logger.info("Starting Claude-enhanced merge...")
# Create temporary workspace
workspace_dir = self._create_workspace()
# Launch Claude Code for enhancement
logger.info("Launching Claude Code for intelligent merging...")
logger.info("Claude will analyze conflicts and create reconciled API reference")
try:
self._launch_claude_merge(workspace_dir)
# Read enhanced results
merged_data = self._read_merged_results(workspace_dir)
logger.info("Claude-enhanced merge complete")
return merged_data
except Exception as e:
logger.error(f"Claude enhancement failed: {e}")
logger.info("Falling back to rule-based merge")
return self.rule_merger.merge_all()
def _create_workspace(self) -> str:
"""
Create temporary workspace with merge context.
Returns:
Path to workspace directory
"""
workspace = tempfile.mkdtemp(prefix='skill_merge_')
logger.info(f"Created merge workspace: {workspace}")
# Write context files for Claude
self._write_context_files(workspace)
return workspace
def _write_context_files(self, workspace: str):
"""Write context files for Claude to analyze."""
# 1. Write conflicts summary
conflicts_file = os.path.join(workspace, 'conflicts.json')
with open(conflicts_file, 'w') as f:
json.dump({
'conflicts': [c.__dict__ for c in self.conflicts],
'summary': {
'total': len(self.conflicts),
'by_type': self._count_by_field('type'),
'by_severity': self._count_by_field('severity')
}
}, f, indent=2)
# 2. Write documentation APIs
docs_apis_file = os.path.join(workspace, 'docs_apis.json')
detector = ConflictDetector(self.docs_data, self.github_data)
with open(docs_apis_file, 'w') as f:
json.dump(detector.docs_apis, f, indent=2)
# 3. Write code APIs
code_apis_file = os.path.join(workspace, 'code_apis.json')
with open(code_apis_file, 'w') as f:
json.dump(detector.code_apis, f, indent=2)
# 4. Write merge instructions for Claude
instructions = """# API Merge Task
You are merging API documentation from two sources:
1. Official documentation (user-facing)
2. Source code analysis (implementation reality)
## Context Files:
- `conflicts.json` - All detected conflicts between sources
- `docs_apis.json` - APIs from documentation
- `code_apis.json` - APIs from source code
## Your Task:
For each conflict, reconcile the differences intelligently:
1. **Prefer code signatures as source of truth**
- Use actual parameter names, types, defaults from code
- Code is what actually runs, docs might be outdated
2. **Keep documentation descriptions**
- Docs are user-friendly, code comments might be technical
- Keep the docs' explanation of what the API does
3. **Add implementation notes for discrepancies**
- If docs differ from code, explain the difference
- Example: "⚠️ The `snap` parameter exists in code but is not documented"
4. **Flag missing APIs clearly**
- Missing in docs → Add [UNDOCUMENTED] tag
- Missing in code → Add [REMOVED] or [DOCS_ERROR] tag
5. **Create unified API reference**
- One definitive signature per API
- Clear warnings about conflicts
- Implementation notes where helpful
## Output Format:
Create `merged_apis.json` with this structure:
```json
{
"apis": {
"API.name": {
"signature": "final_signature_here",
"parameters": [...],
"return_type": "type",
"description": "user-friendly description",
"implementation_notes": "Any discrepancies or warnings",
"source": "both|docs_only|code_only",
"confidence": "high|medium|low"
}
}
}
```
Take your time to analyze each conflict carefully. The goal is to create the most accurate and helpful API reference possible.
"""
instructions_file = os.path.join(workspace, 'MERGE_INSTRUCTIONS.md')
with open(instructions_file, 'w') as f:
f.write(instructions)
logger.info(f"Wrote context files to {workspace}")
def _count_by_field(self, field: str) -> Dict[str, int]:
"""Count conflicts by a specific field."""
counts = {}
for conflict in self.conflicts:
value = getattr(conflict, field)
counts[value] = counts.get(value, 0) + 1
return counts
def _launch_claude_merge(self, workspace: str):
"""
Launch Claude Code to perform merge.
Similar to enhance_skill_local.py approach.
"""
# Create a script that Claude will execute
script_path = os.path.join(workspace, 'merge_script.sh')
script_content = f"""#!/bin/bash
# Automatic merge script for Claude Code
cd "{workspace}"
echo "📊 Analyzing conflicts..."
cat conflicts.json | head -20
echo ""
echo "📖 Documentation APIs: $(cat docs_apis.json | grep -c '\"name\"')"
echo "💻 Code APIs: $(cat code_apis.json | grep -c '\"name\"')"
echo ""
echo "Please review the conflicts and create merged_apis.json"
echo "Follow the instructions in MERGE_INSTRUCTIONS.md"
echo ""
echo "When done, save merged_apis.json and close this terminal."
# Wait for user to complete merge
read -p "Press Enter when merge is complete..."
"""
with open(script_path, 'w') as f:
f.write(script_content)
os.chmod(script_path, 0o755)
# Open new terminal with Claude Code
# Try different terminal emulators
terminals = [
['x-terminal-emulator', '-e'],
['gnome-terminal', '--'],
['xterm', '-e'],
['konsole', '-e']
]
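launched = False
for terminal_cmd in terminals:
try:
cmd = terminal_cmd + ['bash', script_path]
subprocess.Popen(cmd)
logger.info(f"Opened terminal with {terminal_cmd[0]}")
launched = True
break
except FileNotFoundError:
continue
# Fail fast instead of polling for an hour when no terminal could be opened
if not launched:
raise RuntimeError("No supported terminal emulator found")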
# Wait for merge to complete
merged_file = os.path.join(workspace, 'merged_apis.json')
logger.info(f"Waiting for merged results at: {merged_file}")
logger.info("Close the terminal when done to continue...")
# Poll for file existence
import time
timeout = 3600 # 1 hour max
elapsed = 0
while not os.path.exists(merged_file) and elapsed < timeout:
time.sleep(5)
elapsed += 5
if not os.path.exists(merged_file):
raise TimeoutError("Claude merge timed out after 1 hour")
def _read_merged_results(self, workspace: str) -> Dict[str, Any]:
"""Read merged results from workspace."""
merged_file = os.path.join(workspace, 'merged_apis.json')
if not os.path.exists(merged_file):
raise FileNotFoundError(f"Merged results not found: {merged_file}")
with open(merged_file, 'r') as f:
merged_data = json.load(f)
return {
'merge_mode': 'claude-enhanced',
**merged_data
}
def merge_sources(docs_data_path: str,
github_data_path: str,
output_path: str,
mode: str = 'rule-based') -> Dict[str, Any]:
"""
Merge documentation and GitHub data.
Args:
docs_data_path: Path to documentation data JSON
github_data_path: Path to GitHub data JSON
output_path: Path to save merged output
mode: 'rule-based' or 'claude-enhanced'
Returns:
Merged data dict
"""
# Load data
with open(docs_data_path, 'r') as f:
docs_data = json.load(f)
with open(github_data_path, 'r') as f:
github_data = json.load(f)
# Detect conflicts
detector = ConflictDetector(docs_data, github_data)
conflicts = detector.detect_all_conflicts()
logger.info(f"Detected {len(conflicts)} conflicts")
# Merge based on mode
if mode == 'claude-enhanced':
merger = ClaudeEnhancedMerger(docs_data, github_data, conflicts)
else:
merger = RuleBasedMerger(docs_data, github_data, conflicts)
merged_data = merger.merge_all()
# Save merged data
# ensure_ascii=False writes raw non-ASCII characters, so pin the encoding
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(merged_data, f, indent=2, ensure_ascii=False)
logger.info(f"Merged data saved to: {output_path}")
return merged_data
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='Merge documentation and code sources')
parser.add_argument('docs_data', help='Path to documentation data JSON')
parser.add_argument('github_data', help='Path to GitHub data JSON')
parser.add_argument('--output', '-o', default='merged_data.json', help='Output file path')
parser.add_argument('--mode', '-m', choices=['rule-based', 'claude-enhanced'],
default='rule-based', help='Merge mode')
args = parser.parse_args()
merged = merge_sources(args.docs_data, args.github_data, args.output, args.mode)
# Print summary
summary = merged.get('summary', {})
print(f"\n✅ Merge complete ({merged.get('merge_mode')})")
print(f" Total APIs: {summary.get('total_apis', 0)}")
print(f" Matched: {summary.get('matched', 0)}")
print(f" Docs only: {summary.get('docs_only', 0)}")
print(f" Code only: {summary.get('code_only', 0)}")
print(f" Conflicts: {summary.get('conflict', 0)}")
print(f"\n📄 Saved to: {args.output}")
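
The four merge rules and the signature builder in the file above can be sketched in isolation. This is a minimal illustration, not the project's implementation: `classify_apis` and `build_signature` are hypothetical names, and the parameter dictionaries are assumed to carry the `name`, `type_hint` and `default` keys that the merger reads.

```python
from typing import Dict, Iterable, List, Optional, Set


def classify_apis(docs_names: Iterable[str],
                  code_names: Iterable[str],
                  conflicting: Set[str]) -> Dict[str, str]:
    """Assign each API name one of the four merge statuses (hypothetical helper)."""
    docs, code = set(docs_names), set(code_names)
    statuses = {}
    for name in sorted(docs | code):
        if name not in code:
            statuses[name] = 'docs_only'    # Rule 1: documented, not in code
        elif name not in docs:
            statuses[name] = 'code_only'    # Rule 2: in code, not documented
        elif name in conflicting:
            statuses[name] = 'conflict'     # Rule 4: both exist but disagree
        else:
            statuses[name] = 'matched'      # Rule 3: both exist and agree
    return statuses


def build_signature(name: str,
                    params: List[dict],
                    return_type: Optional[str] = None) -> str:
    """Render a signature string from parameter dicts (assumed keys, see lead-in)."""
    parts = []
    for param in params:
        text = param['name']
        if param.get('type_hint'):
            text += f": {param['type_hint']}"
        # Compare against None so that defaults such as 0 or False are kept
        if param.get('default') is not None:
            text += f" = {param['default']}"
        parts.append(text)
    signature = f"{name}({', '.join(parts)})"
    if return_type:
        signature += f" -> {return_type}"
    return signature
```

Checking membership before looking at the conflict index mirrors the order used by the merger: a name found in only one source never reaches the conflict branch.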


@ -1,81 +0,0 @@
#!/usr/bin/env python3
"""
Multi-Skill Packager
Package multiple skills at once. Useful for packaging router + sub-skills together.
"""
import sys
import argparse
from pathlib import Path
import subprocess
def package_skill(skill_dir: Path) -> bool:
"""Package a single skill"""
try:
result = subprocess.run(
[sys.executable, str(Path(__file__).parent / "package_skill.py"), str(skill_dir)],
capture_output=True,
text=True
)
return result.returncode == 0
except Exception as e:
print(f"❌ Error packaging {skill_dir}: {e}")
return False
def main():
parser = argparse.ArgumentParser(
description="Package multiple skills at once",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Package all godot skills
python3 package_multi.py output/godot*/
# Package specific skills
python3 package_multi.py output/godot-2d/ output/godot-3d/ output/godot-scripting/
"""
)
parser.add_argument(
'skill_dirs',
nargs='+',
help='Skill directories to package'
)
args = parser.parse_args()
print(f"\n{'='*60}")
print(f"MULTI-SKILL PACKAGER")
print(f"{'='*60}\n")
skill_dirs = [Path(d) for d in args.skill_dirs]
success_count = 0
total_count = len(skill_dirs)
for skill_dir in skill_dirs:
if not skill_dir.exists():
print(f"⚠️ Skipping (not found): {skill_dir}")
continue
if not (skill_dir / "SKILL.md").exists():
print(f"⚠️ Skipping (no SKILL.md): {skill_dir}")
continue
print(f"📦 Packaging: {skill_dir.name}")
if package_skill(skill_dir):
success_count += 1
print(f" ✅ Success")
else:
print(f" ❌ Failed")
print("")
print(f"{'='*60}")
print(f"SUMMARY: {success_count}/{total_count} skills packaged")
print(f"{'='*60}\n")
if __name__ == "__main__":
main()


@ -1,220 +0,0 @@
#!/usr/bin/env python3
"""
Simple Skill Packager
Packages a skill directory into a .zip file for Claude.
Usage:
skill-seekers package output/steam-inventory/
skill-seekers package output/react/
skill-seekers package output/react/ --no-open # Don't open folder
"""
import os
import sys
import zipfile
import argparse
from pathlib import Path
# Import utilities
try:
from utils import (
open_folder,
print_upload_instructions,
format_file_size,
validate_skill_directory
)
from quality_checker import SkillQualityChecker, print_report
except ImportError:
# If running from different directory, add cli to path
sys.path.insert(0, str(Path(__file__).parent))
from utils import (
open_folder,
print_upload_instructions,
format_file_size,
validate_skill_directory
)
from quality_checker import SkillQualityChecker, print_report
def package_skill(skill_dir, open_folder_after=True, skip_quality_check=False):
"""
Package a skill directory into a .zip file
Args:
skill_dir: Path to skill directory
open_folder_after: Whether to open the output folder after packaging
skip_quality_check: Skip quality checks before packaging
Returns:
tuple: (success, zip_path) where success is bool and zip_path is Path or None
"""
skill_path = Path(skill_dir)
# Validate skill directory
is_valid, error_msg = validate_skill_directory(skill_path)
if not is_valid:
print(f"❌ Error: {error_msg}")
return False, None
# Run quality checks (unless skipped)
if not skip_quality_check:
print("\n" + "=" * 60)
print("QUALITY CHECK")
print("=" * 60)
checker = SkillQualityChecker(skill_path)
report = checker.check_all()
# Print report
print_report(report, verbose=False)
# If there are errors or warnings, ask user to confirm
if report.has_errors or report.has_warnings:
print("=" * 60)
response = input("\nContinue with packaging? (y/n): ").strip().lower()
if response != 'y':
print("\n❌ Packaging cancelled by user")
return False, None
print()
else:
print("=" * 60)
print()
# Create zip filename
skill_name = skill_path.name
zip_path = skill_path.parent / f"{skill_name}.zip"
print(f"📦 Packaging skill: {skill_name}")
print(f" Source: {skill_path}")
print(f" Output: {zip_path}")
# Create zip file
with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
for root, dirs, files in os.walk(skill_path):
# Skip backup files
files = [f for f in files if not f.endswith('.backup')]
for file in files:
file_path = Path(root) / file
arcname = file_path.relative_to(skill_path)
zf.write(file_path, arcname)
print(f" + {arcname}")
# Get zip size
zip_size = zip_path.stat().st_size
print(f"\n✅ Package created: {zip_path}")
print(f" Size: {zip_size:,} bytes ({format_file_size(zip_size)})")
# Open folder in file browser
if open_folder_after:
print(f"\n📂 Opening folder: {zip_path.parent}")
open_folder(zip_path.parent)
# Print upload instructions
print_upload_instructions(zip_path)
return True, zip_path
def main():
parser = argparse.ArgumentParser(
description="Package a skill directory into a .zip file for Claude",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Package skill with quality checks (recommended)
skill-seekers package output/react/
# Package skill without opening folder
skill-seekers package output/react/ --no-open
# Skip quality checks (faster, but not recommended)
skill-seekers package output/react/ --skip-quality-check
# Package and auto-upload to Claude
skill-seekers package output/react/ --upload
# Get help
skill-seekers package --help
"""
)
parser.add_argument(
'skill_dir',
help='Path to skill directory (e.g., output/react/)'
)
parser.add_argument(
'--no-open',
action='store_true',
help='Do not open the output folder after packaging'
)
parser.add_argument(
'--skip-quality-check',
action='store_true',
help='Skip quality checks before packaging'
)
parser.add_argument(
'--upload',
action='store_true',
help='Automatically upload to Claude after packaging (requires ANTHROPIC_API_KEY)'
)
args = parser.parse_args()
success, zip_path = package_skill(
args.skill_dir,
open_folder_after=not args.no_open,
skip_quality_check=args.skip_quality_check
)
if not success:
sys.exit(1)
# Auto-upload if requested
if args.upload:
# Check if API key is set BEFORE attempting upload
api_key = os.environ.get('ANTHROPIC_API_KEY', '').strip()
if not api_key:
# No API key - show helpful message but DON'T fail
print("\n" + "="*60)
print("💡 Automatic Upload")
print("="*60)
print()
print("To enable automatic upload:")
print(" 1. Get API key from https://console.anthropic.com/")
print(" 2. Set: export ANTHROPIC_API_KEY=sk-ant-...")
print(" 3. Run package_skill.py with --upload flag")
print()
print("For now, use manual upload (instructions above) ☝️")
print("="*60)
# Exit successfully - packaging worked!
sys.exit(0)
# API key exists - try upload
try:
from upload_skill import upload_skill_api
print("\n" + "="*60)
upload_success, message = upload_skill_api(zip_path)
if not upload_success:
print(f"❌ Upload failed: {message}")
print()
print("💡 Try manual upload instead (instructions above) ☝️")
print("="*60)
# Exit successfully - packaging worked even if upload failed
sys.exit(0)
else:
print("="*60)
sys.exit(0)
except ImportError:
print("\n❌ Error: upload_skill.py not found")
sys.exit(1)
sys.exit(0)
if __name__ == "__main__":
main()


@ -1,401 +0,0 @@
#!/usr/bin/env python3
"""
PDF Documentation to Claude Skill Converter (Task B1.6)
Converts PDF documentation into Claude AI skills.
Uses pdf_extractor_poc.py for extraction, builds skill structure.
Usage:
python3 pdf_scraper.py --config configs/manual_pdf.json
python3 pdf_scraper.py --pdf manual.pdf --name myskill
python3 pdf_scraper.py --from-json manual_extracted.json
"""
import os
import sys
import json
import re
import argparse
from pathlib import Path
# Import the PDF extractor
from .pdf_extractor_poc import PDFExtractor
class PDFToSkillConverter:
"""Convert PDF documentation to Claude skill"""
def __init__(self, config):
self.config = config
self.name = config['name']
self.pdf_path = config.get('pdf_path', '')
self.description = config.get('description', f'Documentation skill for {self.name}')
# Paths
self.skill_dir = f"output/{self.name}"
self.data_file = f"output/{self.name}_extracted.json"
# Extraction options
self.extract_options = config.get('extract_options', {})
# Categories
self.categories = config.get('categories', {})
# Extracted data
self.extracted_data = None
def extract_pdf(self):
"""Extract content from PDF using pdf_extractor_poc.py"""
print(f"\n🔍 Extracting from PDF: {self.pdf_path}")
# Create extractor with options
extractor = PDFExtractor(
self.pdf_path,
verbose=True,
chunk_size=self.extract_options.get('chunk_size', 10),
min_quality=self.extract_options.get('min_quality', 5.0),
extract_images=self.extract_options.get('extract_images', True),
image_dir=f"{self.skill_dir}/assets/images",
min_image_size=self.extract_options.get('min_image_size', 100)
)
# Extract
result = extractor.extract_all()
if not result:
print("❌ Extraction failed")
raise RuntimeError(f"Failed to extract PDF: {self.pdf_path}")
# Save extracted data
with open(self.data_file, 'w', encoding='utf-8') as f:
json.dump(result, f, indent=2, ensure_ascii=False)
print(f"\n💾 Saved extracted data to: {self.data_file}")
self.extracted_data = result
return True
def load_extracted_data(self, json_path):
"""Load previously extracted data from JSON"""
print(f"\n📂 Loading extracted data from: {json_path}")
with open(json_path, 'r', encoding='utf-8') as f:
self.extracted_data = json.load(f)
print(f"✅ Loaded {self.extracted_data['total_pages']} pages")
return True
def categorize_content(self):
"""Categorize pages based on chapters or keywords"""
print(f"\n📋 Categorizing content...")
categorized = {}
# Use chapters if available
if self.extracted_data.get('chapters'):
for chapter in self.extracted_data['chapters']:
category_key = self._sanitize_filename(chapter['title'])
categorized[category_key] = {
'title': chapter['title'],
'pages': []
}
# Assign pages to chapters
for page in self.extracted_data['pages']:
page_num = page['page_number']
# Find which chapter this page belongs to
for chapter in self.extracted_data['chapters']:
if chapter['start_page'] <= page_num <= chapter['end_page']:
category_key = self._sanitize_filename(chapter['title'])
categorized[category_key]['pages'].append(page)
break
# Fall back to keyword-based categorization
elif self.categories:
# Check if categories is already in the right format (for tests)
# If first value is a list of dicts (pages), use as-is
first_value = next(iter(self.categories.values()))
if isinstance(first_value, list) and first_value and isinstance(first_value[0], dict):
# Already categorized - convert to expected format
for cat_key, pages in self.categories.items():
categorized[cat_key] = {
'title': cat_key.replace('_', ' ').title(),
'pages': pages
}
else:
# Keyword-based categorization
# Initialize categories
for cat_key, keywords in self.categories.items():
categorized[cat_key] = {
'title': cat_key.replace('_', ' ').title(),
'pages': []
}
# Categorize by keywords
for page in self.extracted_data['pages']:
text = page.get('text', '').lower()
headings_text = ' '.join([h['text'] for h in page.get('headings', [])]).lower()
# Score against each category
scores = {}
for cat_key, keywords in self.categories.items():
# Handle both string keywords and dict keywords (shouldn't happen, but be safe)
if isinstance(keywords, list):
score = sum(1 for kw in keywords
if isinstance(kw, str) and (kw.lower() in text or kw.lower() in headings_text))
else:
score = 0
if score > 0:
scores[cat_key] = score
# Assign to highest scoring category
if scores:
best_cat = max(scores, key=scores.get)
categorized[best_cat]['pages'].append(page)
else:
# Default category
if 'other' not in categorized:
categorized['other'] = {'title': 'Other', 'pages': []}
categorized['other']['pages'].append(page)
else:
# No categorization - use single category
categorized['content'] = {
'title': 'Content',
'pages': self.extracted_data['pages']
}
print(f"✅ Created {len(categorized)} categories")
for cat_key, cat_data in categorized.items():
print(f" - {cat_data['title']}: {len(cat_data['pages'])} pages")
return categorized
def build_skill(self):
"""Build complete skill structure"""
print(f"\n🏗️ Building skill: {self.name}")
# Create directories
os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
# Categorize content
categorized = self.categorize_content()
# Generate reference files
print(f"\n📝 Generating reference files...")
for cat_key, cat_data in categorized.items():
self._generate_reference_file(cat_key, cat_data)
# Generate index
self._generate_index(categorized)
# Generate SKILL.md
self._generate_skill_md(categorized)
print(f"\n✅ Skill built successfully: {self.skill_dir}/")
print(f"\n📦 Next step: Package with: skill-seekers package {self.skill_dir}/")
def _generate_reference_file(self, cat_key, cat_data):
"""Generate a reference markdown file for a category"""
filename = f"{self.skill_dir}/references/{cat_key}.md"
with open(filename, 'w', encoding='utf-8') as f:
f.write(f"# {cat_data['title']}\n\n")
for page in cat_data['pages']:
# Add headings as section markers
if page.get('headings'):
f.write(f"## {page['headings'][0]['text']}\n\n")
# Add text content
if page.get('text'):
# Limit to first 1000 chars per page to avoid huge files
text = page['text'][:1000]
f.write(f"{text}\n\n")
# Add code samples (check both 'code_samples' and 'code_blocks' for compatibility)
code_list = page.get('code_samples') or page.get('code_blocks')
if code_list:
f.write("### Code Examples\n\n")
for code in code_list[:3]: # Limit to top 3
lang = code.get('language', '')
f.write(f"```{lang}\n{code['code']}\n```\n\n")
# Add images
if page.get('images'):
# Create assets directory if needed
assets_dir = os.path.join(self.skill_dir, 'assets')
os.makedirs(assets_dir, exist_ok=True)
f.write("### Images\n\n")
for img in page['images']:
# Save image to assets
img_filename = f"page_{page['page_number']}_img_{img['index']}.png"
img_path = os.path.join(assets_dir, img_filename)
with open(img_path, 'wb') as img_file:
img_file.write(img['data'])
# Add markdown image reference
f.write(f"![Image {img['index']}](../assets/{img_filename})\n\n")
f.write("---\n\n")
print(f" Generated: {filename}")
def _generate_index(self, categorized):
"""Generate reference index"""
filename = f"{self.skill_dir}/references/index.md"
with open(filename, 'w', encoding='utf-8') as f:
f.write(f"# {self.name.title()} Documentation Reference\n\n")
f.write("## Categories\n\n")
for cat_key, cat_data in categorized.items():
page_count = len(cat_data['pages'])
f.write(f"- [{cat_data['title']}]({cat_key}.md) ({page_count} pages)\n")
f.write("\n## Statistics\n\n")
stats = self.extracted_data.get('quality_statistics', {})
f.write(f"- Total pages: {self.extracted_data.get('total_pages', 0)}\n")
f.write(f"- Code blocks: {self.extracted_data.get('total_code_blocks', 0)}\n")
f.write(f"- Images: {self.extracted_data.get('total_images', 0)}\n")
if stats:
f.write(f"- Average code quality: {stats.get('average_quality', 0):.1f}/10\n")
f.write(f"- Valid code blocks: {stats.get('valid_code_blocks', 0)}\n")
print(f" Generated: {filename}")
def _generate_skill_md(self, categorized):
"""Generate main SKILL.md file"""
filename = f"{self.skill_dir}/SKILL.md"
# Generate skill name (lowercase, hyphens only, max 64 chars)
skill_name = self.name.lower().replace('_', '-').replace(' ', '-')[:64]
# Truncate description to 1024 chars if needed
desc = self.description[:1024]
with open(filename, 'w', encoding='utf-8') as f:
# Write YAML frontmatter
f.write(f"---\n")
f.write(f"name: {skill_name}\n")
f.write(f"description: {desc}\n")
f.write(f"---\n\n")
f.write(f"# {self.name.title()} Documentation Skill\n\n")
f.write(f"{self.description}\n\n")
f.write("## When to use this skill\n\n")
f.write(f"Use this skill when the user asks about {self.name} documentation, ")
f.write("including API references, tutorials, examples, and best practices.\n\n")
f.write("## What's included\n\n")
f.write("This skill contains:\n\n")
for cat_key, cat_data in categorized.items():
f.write(f"- **{cat_data['title']}**: {len(cat_data['pages'])} pages\n")
f.write("\n## Quick Reference\n\n")
# Get high-quality code samples
all_code = []
for page in self.extracted_data['pages']:
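# Accept both 'code_samples' and 'code_blocks', as the reference files do
all_code.extend(page.get('code_samples') or page.get('code_blocks') or [])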
# Sort by quality and get top 5
all_code.sort(key=lambda x: x.get('quality_score', 0), reverse=True)
top_code = all_code[:5]
if top_code:
f.write("### Top Code Examples\n\n")
for i, code in enumerate(top_code, 1):
lang = code.get('language', '')
quality = code.get('quality_score', 0)
f.write(f"**Example {i}** (Quality: {quality:.1f}/10):\n\n")
f.write(f"```{lang}\n{code['code'][:300]}...\n```\n\n")
f.write("## Navigation\n\n")
f.write("See `references/index.md` for complete documentation structure.\n\n")
# Add language statistics
langs = self.extracted_data.get('languages_detected', {})
if langs:
f.write("## Languages Covered\n\n")
for lang, count in sorted(langs.items(), key=lambda x: x[1], reverse=True):
f.write(f"- {lang}: {count} examples\n")
print(f" Generated: {filename}")
def _sanitize_filename(self, name):
"""Convert string to safe filename"""
# Remove special chars, replace spaces with underscores
safe = re.sub(r'[^\w\s-]', '', name.lower())
safe = re.sub(r'[-\s]+', '_', safe)
return safe
def main():
parser = argparse.ArgumentParser(
description='Convert PDF documentation to Claude skill',
formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument('--config', help='PDF config JSON file')
parser.add_argument('--pdf', help='Direct PDF file path')
parser.add_argument('--name', help='Skill name (with --pdf)')
parser.add_argument('--from-json', help='Build skill from extracted JSON')
parser.add_argument('--description', help='Skill description')
args = parser.parse_args()
# Validate inputs
if not (args.config or args.pdf or args.from_json):
parser.error("Must specify --config, --pdf, or --from-json")
# Load or create config
if args.config:
with open(args.config, 'r') as f:
config = json.load(f)
elif args.from_json:
# Build from extracted JSON
name = Path(args.from_json).stem.replace('_extracted', '')
config = {
'name': name,
'description': args.description or f'Documentation skill for {name}'
}
converter = PDFToSkillConverter(config)
converter.load_extracted_data(args.from_json)
converter.build_skill()
return
else:
# Direct PDF mode
if not args.name:
parser.error("Must specify --name with --pdf")
config = {
'name': args.name,
'pdf_path': args.pdf,
'description': args.description or f'Documentation skill for {args.name}',
'extract_options': {
'chunk_size': 10,
'min_quality': 5.0,
'extract_images': True,
'min_image_size': 100
}
}
# Create converter
converter = PDFToSkillConverter(config)
# Extract if needed
if config.get('pdf_path'):
if not converter.extract_pdf():
sys.exit(1)
# Build skill
converter.build_skill()
if __name__ == '__main__':
main()
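
The keyword fallback in `categorize_content` above scores a page against each category and assigns it to the highest scorer, with an `other` bucket when nothing matches. A minimal sketch of that scoring step follows; `pick_category` is a hypothetical name and the function works on plain text rather than on the extractor's page dictionaries.

```python
from typing import Dict, List


def pick_category(text: str,
                  categories: Dict[str, List[str]],
                  default: str = "other") -> str:
    """Return the category whose keywords occur most often in text."""
    lowered = text.lower()
    # One point per keyword that appears anywhere in the text
    scores = {
        key: sum(1 for kw in keywords if kw.lower() in lowered)
        for key, keywords in categories.items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return default
    return best
```

With ties, `max` returns the first category in dictionary order, so the order of keys in the config decides which category wins.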


@ -1,480 +0,0 @@
#!/usr/bin/env python3
"""
Quality Checker for Claude Skills
Validates skill quality, checks links, and generates quality reports.
Usage:
python3 quality_checker.py output/react/
python3 quality_checker.py output/godot/ --verbose
"""
import os
import re
import sys
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass, field
@dataclass
class QualityIssue:
"""Represents a quality issue found during validation."""
level: str # 'error', 'warning', 'info'
category: str # 'enhancement', 'content', 'links', 'structure'
message: str
file: Optional[str] = None
line: Optional[int] = None
@dataclass
class QualityReport:
"""Complete quality report for a skill."""
skill_name: str
skill_path: Path
errors: List[QualityIssue] = field(default_factory=list)
warnings: List[QualityIssue] = field(default_factory=list)
info: List[QualityIssue] = field(default_factory=list)
def add_error(self, category: str, message: str, file: str = None, line: int = None):
"""Add an error to the report."""
self.errors.append(QualityIssue('error', category, message, file, line))
def add_warning(self, category: str, message: str, file: str = None, line: int = None):
"""Add a warning to the report."""
self.warnings.append(QualityIssue('warning', category, message, file, line))
def add_info(self, category: str, message: str, file: str = None, line: int = None):
"""Add info to the report."""
self.info.append(QualityIssue('info', category, message, file, line))
@property
def has_errors(self) -> bool:
"""Check if there are any errors."""
return len(self.errors) > 0
@property
def has_warnings(self) -> bool:
"""Check if there are any warnings."""
return len(self.warnings) > 0
@property
def is_excellent(self) -> bool:
"""Check if quality is excellent (no errors, no warnings)."""
return not self.has_errors and not self.has_warnings
@property
def quality_score(self) -> float:
"""Calculate quality score (0-100)."""
# Start with perfect score
score = 100.0
# Deduct points for issues
score -= len(self.errors) * 15 # -15 per error
score -= len(self.warnings) * 5 # -5 per warning
# Never go below 0
return max(0.0, score)
@property
def quality_grade(self) -> str:
"""Get quality grade (A-F)."""
score = self.quality_score
if score >= 90:
return 'A'
elif score >= 80:
return 'B'
elif score >= 70:
return 'C'
elif score >= 60:
return 'D'
else:
return 'F'
class SkillQualityChecker:
"""Validates skill quality and generates reports."""
def __init__(self, skill_dir: Path):
"""Initialize quality checker.
Args:
skill_dir: Path to skill directory
"""
self.skill_dir = Path(skill_dir)
self.skill_md_path = self.skill_dir / "SKILL.md"
self.references_dir = self.skill_dir / "references"
self.report = QualityReport(
skill_name=self.skill_dir.name,
skill_path=self.skill_dir
)
def check_all(self) -> QualityReport:
"""Run all quality checks and return report.
Returns:
QualityReport: Complete quality report
"""
# Basic structure checks
self._check_skill_structure()
# Enhancement verification
self._check_enhancement_quality()
# Content quality checks
self._check_content_quality()
# Link validation
self._check_links()
return self.report
def _check_skill_structure(self):
"""Check basic skill structure."""
# Check SKILL.md exists
if not self.skill_md_path.exists():
self.report.add_error(
'structure',
'SKILL.md file not found',
str(self.skill_md_path)
)
return
# Check references directory exists
if not self.references_dir.exists():
self.report.add_warning(
'structure',
'references/ directory not found - skill may be incomplete',
str(self.references_dir)
)
elif not list(self.references_dir.glob('*.md')):
self.report.add_warning(
'structure',
'references/ directory is empty - no reference documentation found',
str(self.references_dir)
)
def _check_enhancement_quality(self):
"""Check if SKILL.md was properly enhanced."""
if not self.skill_md_path.exists():
return
content = self.skill_md_path.read_text(encoding='utf-8')
# Check for template indicators (signs it wasn't enhanced)
template_indicators = [
"TODO:",
"[Add description]",
"[Framework specific tips]",
"coming soon",
]
for indicator in template_indicators:
if indicator.lower() in content.lower():
self.report.add_warning(
'enhancement',
f'Found template placeholder: "{indicator}" - SKILL.md may not be enhanced',
'SKILL.md'
)
# Check for good signs of enhancement
enhancement_indicators = {
'code_examples': re.compile(r'```[\w-]+\n', re.MULTILINE),
'real_examples': re.compile(r'Example:', re.IGNORECASE),
'sections': re.compile(r'^## .+', re.MULTILINE),
}
code_blocks = len(enhancement_indicators['code_examples'].findall(content))
real_examples = len(enhancement_indicators['real_examples'].findall(content))
sections = len(enhancement_indicators['sections'].findall(content))
# Quality thresholds
if code_blocks == 0:
self.report.add_warning(
'enhancement',
'No code examples found in SKILL.md - consider enhancing',
'SKILL.md'
)
elif code_blocks < 3:
self.report.add_info(
'enhancement',
f'Only {code_blocks} code examples found - more examples would improve quality',
'SKILL.md'
)
else:
self.report.add_info(
'enhancement',
f'✓ Found {code_blocks} code examples',
'SKILL.md'
)
if sections < 4:
self.report.add_warning(
'enhancement',
f'Only {sections} sections found - SKILL.md may be too basic',
'SKILL.md'
)
else:
self.report.add_info(
'enhancement',
f'✓ Found {sections} sections',
'SKILL.md'
)
def _check_content_quality(self):
"""Check content quality."""
if not self.skill_md_path.exists():
return
content = self.skill_md_path.read_text(encoding='utf-8')
# Check YAML frontmatter
if not content.startswith('---'):
self.report.add_error(
'content',
'Missing YAML frontmatter - SKILL.md must start with ---',
'SKILL.md',
1
)
else:
# Extract frontmatter
try:
frontmatter_match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
if frontmatter_match:
frontmatter = frontmatter_match.group(1)
# Check for required fields
if 'name:' not in frontmatter:
self.report.add_error(
'content',
'Missing "name:" field in YAML frontmatter',
'SKILL.md',
2
)
# Check for description
if 'description:' in frontmatter:
self.report.add_info(
'content',
'✓ YAML frontmatter includes description',
'SKILL.md'
)
else:
self.report.add_error(
'content',
'Invalid YAML frontmatter format',
'SKILL.md',
1
)
except Exception as e:
self.report.add_error(
'content',
f'Error parsing YAML frontmatter: {e}',
'SKILL.md',
1
)
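# Check code block language tags.
# Fences alternate between opening and closing, so only every other fence
# (the opening ones) is inspected; a closing fence never carries a tag.
fence_tags = re.findall(r'^```([^\n`]*)$', content, re.MULTILINE)
code_blocks_without_lang = [tag for tag in fence_tags[::2] if not tag.strip()]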
if code_blocks_without_lang:
self.report.add_warning(
'content',
f'Found {len(code_blocks_without_lang)} code blocks without language tags',
'SKILL.md'
)
# Check for "When to Use" section
if 'when to use' not in content.lower():
self.report.add_warning(
'content',
'Missing "When to Use This Skill" section',
'SKILL.md'
)
else:
self.report.add_info(
'content',
'✓ Found "When to Use" section',
'SKILL.md'
)
# Check reference files
if self.references_dir.exists():
ref_files = list(self.references_dir.glob('*.md'))
if ref_files:
self.report.add_info(
'content',
f'✓ Found {len(ref_files)} reference files',
'references/'
)
# Check if references are mentioned in SKILL.md
mentioned_refs = 0
for ref_file in ref_files:
if ref_file.name in content:
mentioned_refs += 1
if mentioned_refs == 0:
self.report.add_warning(
'content',
'Reference files exist but none are mentioned in SKILL.md',
'SKILL.md'
)
def _check_links(self):
"""Check internal markdown links."""
if not self.skill_md_path.exists():
return
content = self.skill_md_path.read_text(encoding='utf-8')
# Find all markdown links [text](path)
link_pattern = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
links = link_pattern.findall(content)
broken_links = []
for text, link in links:
# Skip external links (http/https)
if link.startswith('http://') or link.startswith('https://'):
continue
# Skip anchor links
if link.startswith('#'):
continue
# Check if file exists (relative to SKILL.md), ignoring any #fragment suffix
link_path = self.skill_dir / link.split('#', 1)[0]
if not link_path.exists():
broken_links.append((text, link))
if broken_links:
for text, link in broken_links:
self.report.add_warning(
'links',
f'Broken link: [{text}]({link})',
'SKILL.md'
)
else:
if links:
internal_links = [link for _, link in links if not link.startswith(('http://', 'https://', '#'))]
if internal_links:
self.report.add_info(
'links',
f'✓ All {len(internal_links)} internal links are valid',
'SKILL.md'
)
def print_report(report: QualityReport, verbose: bool = False):
"""Print quality report to console.
Args:
report: Quality report to print
verbose: Show all info messages
"""
print("\n" + "=" * 60)
print(f"QUALITY REPORT: {report.skill_name}")
print("=" * 60)
print()
# Quality score
print(f"Quality Score: {report.quality_score:.1f}/100 (Grade: {report.quality_grade})")
print()
# Errors
if report.errors:
print(f"❌ ERRORS ({len(report.errors)}):")
for issue in report.errors:
location = f" ({issue.file}:{issue.line})" if issue.file and issue.line else f" ({issue.file})" if issue.file else ""
print(f" [{issue.category}] {issue.message}{location}")
print()
# Warnings
if report.warnings:
print(f"⚠️ WARNINGS ({len(report.warnings)}):")
for issue in report.warnings:
location = f" ({issue.file}:{issue.line})" if issue.file and issue.line else f" ({issue.file})" if issue.file else ""
print(f" [{issue.category}] {issue.message}{location}")
print()
# Info (only in verbose mode)
if verbose and report.info:
print(f" INFO ({len(report.info)}):")
for issue in report.info:
location = f" ({issue.file})" if issue.file else ""
print(f" [{issue.category}] {issue.message}{location}")
print()
# Summary
if report.is_excellent:
print("✅ EXCELLENT! No issues found.")
elif not report.has_errors:
print("✓ GOOD! No errors, but some warnings to review.")
else:
print("❌ NEEDS IMPROVEMENT! Please fix errors before packaging.")
print()
def main():
"""Main entry point."""
import argparse
parser = argparse.ArgumentParser(
description="Check skill quality and generate report",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Basic quality check
python3 quality_checker.py output/react/
# Verbose mode (show all info)
python3 quality_checker.py output/godot/ --verbose
# Strict mode (warnings also produce a non-zero exit code)
python3 quality_checker.py output/django/ --strict
"""
)
parser.add_argument(
'skill_directory',
help='Path to skill directory (e.g., output/react/)'
)
parser.add_argument(
'--verbose', '-v',
action='store_true',
help='Show all info messages'
)
parser.add_argument(
'--strict',
action='store_true',
help='Also exit with an error code when warnings are found (errors always do)'
)
args = parser.parse_args()
# Check if directory exists
skill_dir = Path(args.skill_directory)
if not skill_dir.exists():
print(f"❌ Directory not found: {skill_dir}")
sys.exit(1)
# Run quality checks
checker = SkillQualityChecker(skill_dir)
report = checker.check_all()
# Print report
print_report(report, verbose=args.verbose)
# Exit code: errors always fail; warnings fail only in --strict mode
if report.has_errors or (args.strict and report.has_warnings):
sys.exit(1)
sys.exit(0)
if __name__ == "__main__":
main()
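
Counting code blocks without a language tag requires telling opening fences from closing ones, because a closing fence is also three backticks followed by a newline. The following is a minimal sketch of a line-by-line approach, not the checker's own implementation: `count_untagged_fences` is a hypothetical helper, and it assumes fences are not nested and always start at the beginning of a line.

```python
FENCE = "`" * 3  # built programmatically so this example can sit inside a fence


def count_untagged_fences(markdown: str) -> int:
    """Count opening fences that carry no language tag."""
    untagged = 0
    inside_block = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            # This fence closes the current block; it never carries a tag.
            inside_block = False
        else:
            inside_block = True
            if stripped == FENCE:
                untagged += 1
    return untagged


sample = "\n".join([
    FENCE + "python",
    "print('hi')",
    FENCE,
    "Some prose right after a closing fence.",
    FENCE,
    "plain block",
    FENCE,
])
print(count_untagged_fences(sample))
```

Only the second block in `sample` lacks a tag, so the closing fence of the first block followed by prose is not counted.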


@ -1,228 +0,0 @@
#!/usr/bin/env python3
"""
Test Runner for Skill Seeker
Runs all test suites and generates a comprehensive test report
"""
import sys
import unittest
import os
from io import StringIO
from pathlib import Path
class ColoredTextTestResult(unittest.TextTestResult):
"""Custom test result class with colored output"""
# ANSI color codes
GREEN = '\033[92m'
RED = '\033[91m'
YELLOW = '\033[93m'
BLUE = '\033[94m'
RESET = '\033[0m'
BOLD = '\033[1m'
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.test_results = []
def addSuccess(self, test):
super().addSuccess(test)
self.test_results.append(('PASS', test))
if self.showAll:
self.stream.write(f"{self.GREEN}✓ PASS{self.RESET}\n")
elif self.dots:
self.stream.write(f"{self.GREEN}.{self.RESET}")
self.stream.flush()
def addError(self, test, err):
super().addError(test, err)
self.test_results.append(('ERROR', test))
if self.showAll:
self.stream.write(f"{self.RED}✗ ERROR{self.RESET}\n")
elif self.dots:
self.stream.write(f"{self.RED}E{self.RESET}")
self.stream.flush()
def addFailure(self, test, err):
super().addFailure(test, err)
self.test_results.append(('FAIL', test))
if self.showAll:
self.stream.write(f"{self.RED}✗ FAIL{self.RESET}\n")
elif self.dots:
self.stream.write(f"{self.RED}F{self.RESET}")
self.stream.flush()
def addSkip(self, test, reason):
super().addSkip(test, reason)
self.test_results.append(('SKIP', test))
if self.showAll:
self.stream.write(f"{self.YELLOW}⊘ SKIP{self.RESET}\n")
elif self.dots:
self.stream.write(f"{self.YELLOW}s{self.RESET}")
self.stream.flush()
class ColoredTextTestRunner(unittest.TextTestRunner):
"""Custom test runner with colored output"""
resultclass = ColoredTextTestResult
def discover_tests(test_dir='tests'):
"""Discover all test files in the tests directory"""
loader = unittest.TestLoader()
start_dir = test_dir
pattern = 'test_*.py'
suite = loader.discover(start_dir, pattern=pattern)
return suite
def run_specific_suite(suite_name):
"""Run a specific test suite"""
loader = unittest.TestLoader()
suite_map = {
'config': 'tests.test_config_validation',
'features': 'tests.test_scraper_features',
'integration': 'tests.test_integration'
}
if suite_name not in suite_map:
print(f"Unknown test suite: {suite_name}")
print(f"Available suites: {', '.join(suite_map.keys())}")
return None
module_name = suite_map[suite_name]
try:
suite = loader.loadTestsFromName(module_name)
return suite
except Exception as e:
print(f"Error loading test suite '{suite_name}': {e}")
return None
def print_summary(result):
"""Print a detailed test summary"""
total = result.testsRun
passed = total - len(result.failures) - len(result.errors) - len(result.skipped)
failed = len(result.failures)
errors = len(result.errors)
skipped = len(result.skipped)
print("\n" + "="*70)
print("TEST SUMMARY")
print("="*70)
# Overall stats
print(f"\n{ColoredTextTestResult.BOLD}Total Tests:{ColoredTextTestResult.RESET} {total}")
print(f"{ColoredTextTestResult.GREEN}✓ Passed:{ColoredTextTestResult.RESET} {passed}")
if failed > 0:
print(f"{ColoredTextTestResult.RED}✗ Failed:{ColoredTextTestResult.RESET} {failed}")
if errors > 0:
print(f"{ColoredTextTestResult.RED}✗ Errors:{ColoredTextTestResult.RESET} {errors}")
if skipped > 0:
print(f"{ColoredTextTestResult.YELLOW}⊘ Skipped:{ColoredTextTestResult.RESET} {skipped}")
# Success rate
if total > 0:
success_rate = (passed / total) * 100
color = ColoredTextTestResult.GREEN if success_rate == 100 else \
ColoredTextTestResult.YELLOW if success_rate >= 80 else \
ColoredTextTestResult.RED
print(f"\n{color}Success Rate: {success_rate:.1f}%{ColoredTextTestResult.RESET}")
# Category breakdown
if hasattr(result, 'test_results'):
print(f"\n{ColoredTextTestResult.BOLD}Test Breakdown by Category:{ColoredTextTestResult.RESET}")
categories = {}
for status, test in result.test_results:
# Group by test class name. Using the type avoids parsing str(test),
# whose format is "method (module.Class...)" and varies across Python versions.
class_name = type(test).__name__
if class_name not in categories:
categories[class_name] = {'PASS': 0, 'FAIL': 0, 'ERROR': 0, 'SKIP': 0}
categories[class_name][status] += 1
for category, stats in sorted(categories.items()):
total_cat = sum(stats.values())
passed_cat = stats['PASS']
print(f" {category}: {passed_cat}/{total_cat} passed")
print("\n" + "="*70)
# Return status
return failed == 0 and errors == 0
def main():
"""Main test runner"""
import argparse
parser = argparse.ArgumentParser(
description='Run tests for Skill Seeker',
formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument('--suite', '-s', type=str,
help='Run specific test suite (config, features, integration)')
parser.add_argument('--verbose', '-v', action='store_true',
help='Verbose output (show each test)')
parser.add_argument('--quiet', '-q', action='store_true',
help='Quiet output (minimal output)')
parser.add_argument('--failfast', '-f', action='store_true',
help='Stop on first failure')
parser.add_argument('--list', '-l', action='store_true',
help='List all available tests')
args = parser.parse_args()
# Set verbosity
verbosity = 1
if args.verbose:
verbosity = 2
elif args.quiet:
verbosity = 0
print(f"\n{ColoredTextTestResult.BOLD}{'='*70}{ColoredTextTestResult.RESET}")
print(f"{ColoredTextTestResult.BOLD}SKILL SEEKER TEST SUITE{ColoredTextTestResult.RESET}")
print(f"{ColoredTextTestResult.BOLD}{'='*70}{ColoredTextTestResult.RESET}\n")
# Discover or load specific suite
if args.suite:
print(f"Running test suite: {ColoredTextTestResult.BLUE}{args.suite}{ColoredTextTestResult.RESET}\n")
suite = run_specific_suite(args.suite)
if suite is None:
return 1
else:
print(f"Running {ColoredTextTestResult.BLUE}all tests{ColoredTextTestResult.RESET}\n")
suite = discover_tests()
# List tests
if args.list:
print("\nAvailable tests:\n")
for test_group in suite:
for test in test_group:
print(f" - {test}")
print()
return 0
# Run tests
runner = ColoredTextTestRunner(
verbosity=verbosity,
failfast=args.failfast
)
result = runner.run(suite)
# Print summary
success = print_summary(result)
# Return appropriate exit code
return 0 if success else 1
if __name__ == '__main__':
sys.exit(main())
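
The category breakdown groups tests by class, and suites returned by `unittest.TestLoader` can be nested several levels deep. As a hedged illustration, the sketch below walks a suite recursively and counts tests per class using `type(test).__name__`. The helper `count_by_class` and the two toy test classes are hypothetical and exist only for this example.

```python
import unittest
from collections import Counter


class ConfigTests(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)

    def test_also_ok(self):
        self.assertEqual(1 + 1, 2)


class ScraperTests(unittest.TestCase):
    def test_ok(self):
        self.assertIn("a", "abc")


def count_by_class(suite: unittest.TestSuite) -> Counter:
    """Recursively count individual test cases per test class."""
    counts = Counter()
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            # Nested suite (module or class level): descend into it.
            counts.update(count_by_class(item))
        else:
            counts[type(item).__name__] += 1
    return counts


loader = unittest.TestLoader()
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(ConfigTests),
    loader.loadTestsFromTestCase(ScraperTests),
])
breakdown = count_by_class(suite)
print(dict(breakdown))
```

The same recursive walk can be used to list individual tests when the suite comes from `TestLoader.discover`, which nests suites by module and then by class.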


@ -1,320 +0,0 @@
#!/usr/bin/env python3
"""
Config Splitter for Large Documentation Sites
Splits large documentation configs into multiple smaller, focused skill configs.
Supports multiple splitting strategies: category-based, size-based, and automatic.
"""
import json
import sys
import argparse
from pathlib import Path
from typing import Dict, List, Any, Tuple
from collections import defaultdict
class ConfigSplitter:
"""Splits large documentation configs into multiple focused configs"""
def __init__(self, config_path: str, strategy: str = "auto", target_pages: int = 5000):
self.config_path = Path(config_path)
self.strategy = strategy
self.target_pages = target_pages
self.config = self.load_config()
self.base_name = self.config['name']
def load_config(self) -> Dict[str, Any]:
"""Load configuration from file"""
try:
with open(self.config_path, 'r') as f:
return json.load(f)
except FileNotFoundError:
print(f"❌ Error: Config file not found: {self.config_path}")
sys.exit(1)
except json.JSONDecodeError as e:
print(f"❌ Error: Invalid JSON in config file: {e}")
sys.exit(1)
def get_split_strategy(self) -> str:
"""Determine split strategy"""
# Check if strategy is defined in config
if 'split_strategy' in self.config:
config_strategy = self.config['split_strategy']
if config_strategy != "none":
return config_strategy
# Use provided strategy or auto-detect
if self.strategy == "auto":
max_pages = self.config.get('max_pages', 500)
if max_pages < 5000:
print(f" Small documentation ({max_pages} pages) - no splitting needed")
return "none"
elif max_pages < 10000 and 'categories' in self.config:
print(f" Medium documentation ({max_pages} pages) - category split recommended")
return "category"
elif 'categories' in self.config and len(self.config['categories']) >= 3:
print(f" Large documentation ({max_pages} pages) - router + categories recommended")
return "router"
else:
print(f" Large documentation ({max_pages} pages) - size-based split")
return "size"
return self.strategy
def split_by_category(self, create_router: bool = False) -> List[Dict[str, Any]]:
"""Split config by categories"""
if 'categories' not in self.config:
print("❌ Error: No categories defined in config")
sys.exit(1)
categories = self.config['categories']
split_categories = self.config.get('split_config', {}).get('split_by_categories')
# If specific categories specified, use only those
if split_categories:
categories = {k: v for k, v in categories.items() if k in split_categories}
configs = []
for category_name, keywords in categories.items():
# Create new config for this category
new_config = self.config.copy()
new_config['name'] = f"{self.base_name}-{category_name}"
new_config['description'] = f"{self.base_name.capitalize()} - {category_name.replace('_', ' ').title()}. {self.config.get('description', '')}"
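# Update URL patterns to focus on this category.
# self.config.copy() is shallow, so copy the nested containers before
# mutating them; otherwise includes leak into the parent and sibling configs.
url_patterns = dict(new_config.get('url_patterns', {}))
# Add category keywords to includes
includes = list(url_patterns.get('include', []))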
for keyword in keywords:
if keyword.startswith('/'):
includes.append(keyword)
if includes:
url_patterns['include'] = list(set(includes))
new_config['url_patterns'] = url_patterns
# Keep only this category
new_config['categories'] = {category_name: keywords}
# Remove split config from child
if 'split_strategy' in new_config:
del new_config['split_strategy']
if 'split_config' in new_config:
del new_config['split_config']
# Adjust max_pages estimate
if 'max_pages' in new_config:
new_config['max_pages'] = self.target_pages
configs.append(new_config)
print(f"✅ Created {len(configs)} category-based configs")
# Optionally create router config
if create_router:
router_config = self.create_router_config(configs)
configs.insert(0, router_config)
print(f"✅ Created router config: {router_config['name']}")
return configs
def split_by_size(self) -> List[Dict[str, Any]]:
"""Split config by size (page count)"""
max_pages = self.config.get('max_pages', 500)
num_splits = (max_pages + self.target_pages - 1) // self.target_pages
configs = []
for i in range(num_splits):
new_config = self.config.copy()
part_num = i + 1
new_config['name'] = f"{self.base_name}-part{part_num}"
new_config['description'] = f"{self.base_name.capitalize()} - Part {part_num}. {self.config.get('description', '')}"
new_config['max_pages'] = self.target_pages
# Remove split config from child
if 'split_strategy' in new_config:
del new_config['split_strategy']
if 'split_config' in new_config:
del new_config['split_config']
configs.append(new_config)
print(f"✅ Created {len(configs)} size-based configs ({self.target_pages} pages each)")
return configs
def create_router_config(self, sub_configs: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Create a router config that references sub-skills"""
router_name = self.config.get('split_config', {}).get('router_name', self.base_name)
router_config = {
"name": router_name,
"description": self.config.get('description', ''),
"base_url": self.config['base_url'],
"selectors": self.config['selectors'],
"url_patterns": self.config.get('url_patterns', {}),
"rate_limit": self.config.get('rate_limit', 0.5),
"max_pages": 500, # Router only needs overview pages
"_router": True,
"_sub_skills": [cfg['name'] for cfg in sub_configs],
"_routing_keywords": {
cfg['name']: list(cfg.get('categories', {}).keys())
for cfg in sub_configs
}
}
return router_config
def split(self) -> List[Dict[str, Any]]:
"""Execute split based on strategy"""
strategy = self.get_split_strategy()
print(f"\n{'='*60}")
print(f"CONFIG SPLITTER: {self.base_name}")
print(f"{'='*60}")
print(f"Strategy: {strategy}")
print(f"Target pages per skill: {self.target_pages}")
print("")
if strategy == "none":
print(" No splitting required")
return [self.config]
elif strategy == "category":
return self.split_by_category(create_router=False)
elif strategy == "router":
create_router = self.config.get('split_config', {}).get('create_router', True)
return self.split_by_category(create_router=create_router)
elif strategy == "size":
return self.split_by_size()
else:
print(f"❌ Error: Unknown strategy: {strategy}")
sys.exit(1)
def save_configs(self, configs: List[Dict[str, Any]], output_dir: Path = None) -> List[Path]:
"""Save configs to files"""
if output_dir is None:
output_dir = self.config_path.parent
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
saved_files = []
for config in configs:
filename = f"{config['name']}.json"
filepath = output_dir / filename
with open(filepath, 'w') as f:
json.dump(config, f, indent=2)
saved_files.append(filepath)
print(f" 💾 Saved: {filepath}")
return saved_files
def main():
parser = argparse.ArgumentParser(
description="Split large documentation configs into multiple focused skills",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Auto-detect strategy
python3 split_config.py configs/godot.json
# Use category-based split
python3 split_config.py configs/godot.json --strategy category
# Use router + categories
python3 split_config.py configs/godot.json --strategy router
# Custom target size
python3 split_config.py configs/godot.json --target-pages 3000
# Dry run (don't save files)
python3 split_config.py configs/godot.json --dry-run
Split Strategies:
none - No splitting (single skill)
auto - Automatically choose best strategy
category - Split by categories defined in config
router - Create router + category-based sub-skills
size - Split by page count
"""
)
parser.add_argument(
'config',
help='Path to config file (e.g., configs/godot.json)'
)
parser.add_argument(
'--strategy',
choices=['auto', 'none', 'category', 'router', 'size'],
default='auto',
help='Splitting strategy (default: auto)'
)
parser.add_argument(
'--target-pages',
type=int,
default=5000,
help='Target pages per skill (default: 5000)'
)
parser.add_argument(
'--output-dir',
help='Output directory for configs (default: same as input)'
)
parser.add_argument(
'--dry-run',
action='store_true',
help='Show what would be created without saving files'
)
args = parser.parse_args()
# Create splitter
splitter = ConfigSplitter(args.config, args.strategy, args.target_pages)
# Split config
configs = splitter.split()
if args.dry_run:
print(f"\n{'='*60}")
print("DRY RUN - No files saved")
print(f"{'='*60}")
print(f"Would create {len(configs)} config files:")
for cfg in configs:
is_router = cfg.get('_router', False)
router_marker = " (ROUTER)" if is_router else ""
print(f" 📄 {cfg['name']}.json{router_marker}")
else:
print(f"\n{'='*60}")
print("SAVING CONFIGS")
print(f"{'='*60}")
saved_files = splitter.save_configs(configs, args.output_dir)
print(f"\n{'='*60}")
print("NEXT STEPS")
print(f"{'='*60}")
print("1. Review generated configs")
print("2. Scrape each config:")
for filepath in saved_files:
print(f" skill-seekers scrape --config {filepath}")
print("3. Package skills:")
print(" skill-seekers-package-multi configs/<name>-*.json")
print("")
if __name__ == "__main__":
main()
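
Child configs are derived from a parent config dictionary, so it matters whether nested values such as `url_patterns` are shared or duplicated. The sketch below shows the difference between `dict.copy()` and `copy.deepcopy` on a toy config; the `parent` dictionary and its values are made up for illustration.

```python
import copy

parent = {
    "name": "godot",
    "url_patterns": {"include": ["/docs"]},
}

# dict.copy() is shallow: the nested dict and list are shared with the parent.
shallow = parent.copy()
shallow["url_patterns"]["include"].append("/physics")
leaked = "/physics" in parent["url_patterns"]["include"]

# Reset, then repeat with a deep copy, which duplicates nested containers.
parent["url_patterns"]["include"] = ["/docs"]
deep = copy.deepcopy(parent)
deep["url_patterns"]["include"].append("/physics")
isolated = "/physics" not in parent["url_patterns"]["include"]

print(leaked, isolated)
```

With a shallow copy the appended pattern shows up in the parent; with a deep copy the parent stays untouched.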


@ -1,192 +0,0 @@
#!/usr/bin/env python3
"""
Simple Integration Tests for Unified Multi-Source Scraper
Focuses on real-world usage patterns rather than unit tests.
"""
import os
import sys
import json
import tempfile
from pathlib import Path
# Add CLI to path
sys.path.insert(0, str(Path(__file__).parent))
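# Absolute import: the directory was added to sys.path above, and a relative
# import fails when this file is run directly as a script.
from config_validator import validate_config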
def test_validate_existing_unified_configs():
"""Test that all existing unified configs are valid"""
configs_dir = Path(__file__).parent.parent / 'configs'
unified_configs = [
'godot_unified.json',
'react_unified.json',
'django_unified.json',
'fastapi_unified.json'
]
for config_name in unified_configs:
config_path = configs_dir / config_name
if config_path.exists():
print(f"\n✓ Validating {config_name}...")
validator = validate_config(str(config_path))
assert validator.is_unified, f"{config_name} should be unified format"
assert validator.needs_api_merge(), f"{config_name} should need API merging"
print(f" Sources: {len(validator.config['sources'])}")
print(f" Merge mode: {validator.config.get('merge_mode')}")
def test_backward_compatibility():
"""Test that legacy configs still work"""
configs_dir = Path(__file__).parent.parent / 'configs'
legacy_configs = [
'react.json',
'godot.json',
'django.json'
]
for config_name in legacy_configs:
config_path = configs_dir / config_name
if config_path.exists():
print(f"\n✓ Validating legacy {config_name}...")
validator = validate_config(str(config_path))
assert not validator.is_unified, f"{config_name} should be legacy format"
print(" Format: Legacy")
def test_create_temp_unified_config():
"""Test creating a unified config from scratch"""
config = {
"name": "test_unified",
"description": "Test unified config",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://example.com/docs",
"extract_api": True,
"max_pages": 50
},
{
"type": "github",
"repo": "test/repo",
"include_code": True,
"code_analysis_depth": "surface"
}
]
}
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
json.dump(config, f)
config_path = f.name
try:
print("\n✓ Validating temp unified config...")
validator = validate_config(config_path)
assert validator.is_unified
assert validator.needs_api_merge()
assert len(validator.config['sources']) == 2
print(" ✓ Config is valid unified format")
print(f" Sources: {len(validator.config['sources'])}")
finally:
os.unlink(config_path)
def test_mixed_source_types():
"""Test config with documentation, GitHub, and PDF sources"""
config = {
"name": "test_mixed",
"description": "Test mixed sources",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://example.com"
},
{
"type": "github",
"repo": "test/repo"
},
{
"type": "pdf",
"path": "/path/to/manual.pdf"
}
]
}
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
json.dump(config, f)
config_path = f.name
try:
print("\n✓ Validating mixed source types...")
validator = validate_config(config_path)
assert validator.is_unified
assert len(validator.config['sources']) == 3
# Check each source type
source_types = [s['type'] for s in validator.config['sources']]
assert 'documentation' in source_types
assert 'github' in source_types
assert 'pdf' in source_types
print(" ✓ All 3 source types validated")
finally:
os.unlink(config_path)
def test_config_validation_errors():
"""Test that invalid configs are rejected"""
# Invalid source type
config = {
"name": "test",
"description": "Test",
"sources": [
{"type": "invalid_type", "url": "https://example.com"}
]
}
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
json.dump(config, f)
config_path = f.name
try:
print("\n✓ Testing invalid source type...")
try:
# validate_config() calls .validate() automatically
validator = validate_config(config_path)
assert False, "Should have raised error for invalid source type"
except ValueError as e:
assert "invalid" in str(e).lower()
print(" ✓ Invalid source type correctly rejected")
finally:
os.unlink(config_path)
# Run tests
if __name__ == '__main__':
print("=" * 60)
print("Running Unified Scraper Integration Tests")
print("=" * 60)
try:
test_validate_existing_unified_configs()
test_backward_compatibility()
test_create_temp_unified_config()
test_mixed_source_types()
test_config_validation_errors()
print("\n" + "=" * 60)
print("✅ All integration tests passed!")
print("=" * 60)
except AssertionError as e:
print(f"\n❌ Test failed: {e}")
sys.exit(1)
except Exception as e:
print(f"\n❌ Unexpected error: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
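
Several tests above repeat the same pattern: write a config to a temporary JSON file, validate it, then delete the file in a `finally` block. One possible way to factor that out is a small context manager. This is a sketch only; `temp_json_config` is a hypothetical helper that does not exist in the original test file.

```python
import json
import os
import tempfile
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def temp_json_config(config: Dict[str, Any]) -> Iterator[str]:
    """Write config to a temporary .json file and remove it afterwards."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as handle:
        json.dump(config, handle)
        path = handle.name
    try:
        yield path
    finally:
        # Runs even if the body of the with-block raises.
        os.unlink(path)


with temp_json_config({"name": "test_unified", "sources": []}) as config_path:
    with open(config_path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    existed_inside = os.path.exists(config_path)

print(loaded["name"], existed_inside, os.path.exists(config_path))
```

The file exists only inside the `with` block, so each test body shrinks to the validation call and its assertions.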


@ -1,450 +0,0 @@
#!/usr/bin/env python3
"""
Unified Multi-Source Scraper
Orchestrates scraping from multiple sources (documentation, GitHub, PDF),
detects conflicts, merges intelligently, and builds unified skills.
This is the main entry point for unified config workflow.
Usage:
skill-seekers unified --config configs/godot_unified.json
skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
"""
import os
import sys
import json
import logging
import argparse
import subprocess
from pathlib import Path
from typing import Dict, List, Any, Optional
# Import validators and scrapers
try:
from config_validator import ConfigValidator, validate_config
from conflict_detector import ConflictDetector
from merge_sources import RuleBasedMerger, ClaudeEnhancedMerger
from unified_skill_builder import UnifiedSkillBuilder
except ImportError as e:
print(f"Error importing modules: {e}")
print("Make sure you're running from the project root directory")
sys.exit(1)
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
class UnifiedScraper:
"""
Orchestrates multi-source scraping and merging.
Main workflow:
1. Load and validate unified config
2. Scrape all sources (docs, GitHub, PDF)
3. Detect conflicts between sources
4. Merge intelligently (rule-based or Claude-enhanced)
5. Build unified skill
"""
def __init__(self, config_path: str, merge_mode: Optional[str] = None):
"""
Initialize unified scraper.
Args:
config_path: Path to unified config JSON
merge_mode: Override config merge_mode ('rule-based' or 'claude-enhanced')
"""
self.config_path = config_path
# Validate and load config
logger.info(f"Loading config: {config_path}")
self.validator = validate_config(config_path)
self.config = self.validator.config
# Determine merge mode
self.merge_mode = merge_mode or self.config.get('merge_mode', 'rule-based')
logger.info(f"Merge mode: {self.merge_mode}")
# Storage for scraped data
self.scraped_data = {}
# Output paths
self.name = self.config['name']
self.output_dir = f"output/{self.name}"
self.data_dir = f"output/{self.name}_unified_data"
os.makedirs(self.output_dir, exist_ok=True)
os.makedirs(self.data_dir, exist_ok=True)
def scrape_all_sources(self):
"""
Scrape all configured sources.
Routes to appropriate scraper based on source type.
"""
logger.info("=" * 60)
logger.info("PHASE 1: Scraping all sources")
logger.info("=" * 60)
if not self.validator.is_unified:
logger.warning("Config is not unified format, converting...")
self.config = self.validator.convert_legacy_to_unified()
sources = self.config.get('sources', [])
for i, source in enumerate(sources):
source_type = source['type']
logger.info(f"\n[{i+1}/{len(sources)}] Scraping {source_type} source...")
try:
if source_type == 'documentation':
self._scrape_documentation(source)
elif source_type == 'github':
self._scrape_github(source)
elif source_type == 'pdf':
self._scrape_pdf(source)
else:
logger.warning(f"Unknown source type: {source_type}")
except Exception as e:
logger.error(f"Error scraping {source_type}: {e}")
logger.info("Continuing with other sources...")
logger.info(f"\n✅ Scraped {len(self.scraped_data)} sources successfully")
def _scrape_documentation(self, source: Dict[str, Any]):
"""Scrape documentation website."""
# Create temporary config for doc scraper
doc_config = {
'name': f"{self.name}_docs",
'base_url': source['base_url'],
'selectors': source.get('selectors', {}),
'url_patterns': source.get('url_patterns', {}),
'categories': source.get('categories', {}),
'rate_limit': source.get('rate_limit', 0.5),
'max_pages': source.get('max_pages', 100)
}
# Write temporary config
temp_config_path = os.path.join(self.data_dir, 'temp_docs_config.json')
with open(temp_config_path, 'w') as f:
json.dump(doc_config, f, indent=2)
# Run doc_scraper as subprocess
logger.info(f"Scraping documentation from {source['base_url']}")
doc_scraper_path = Path(__file__).parent / "doc_scraper.py"
cmd = [sys.executable, str(doc_scraper_path), '--config', temp_config_path]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
logger.error(f"Documentation scraping failed: {result.stderr}")
return
# Load scraped data
docs_data_file = f"output/{doc_config['name']}_data/summary.json"
if os.path.exists(docs_data_file):
with open(docs_data_file, 'r') as f:
summary = json.load(f)
self.scraped_data['documentation'] = {
'pages': summary.get('pages', []),
'data_file': docs_data_file
}
logger.info(f"✅ Documentation: {summary.get('total_pages', 0)} pages scraped")
else:
logger.warning("Documentation data file not found")
# Clean up temp config
if os.path.exists(temp_config_path):
os.remove(temp_config_path)
def _scrape_github(self, source: Dict[str, Any]):
"""Scrape GitHub repository."""
sys.path.insert(0, str(Path(__file__).parent))
try:
from github_scraper import GitHubScraper
except ImportError:
logger.error("github_scraper.py not found")
return
# Create config for GitHub scraper
github_config = {
'repo': source['repo'],
'name': f"{self.name}_github",
'github_token': source.get('github_token'),
'include_issues': source.get('include_issues', True),
'max_issues': source.get('max_issues', 100),
'include_changelog': source.get('include_changelog', True),
'include_releases': source.get('include_releases', True),
'include_code': source.get('include_code', True),
'code_analysis_depth': source.get('code_analysis_depth', 'surface'),
'file_patterns': source.get('file_patterns', []),
'local_repo_path': source.get('local_repo_path') # Pass local_repo_path from config
}
# Scrape
logger.info(f"Scraping GitHub repository: {source['repo']}")
scraper = GitHubScraper(github_config)
github_data = scraper.scrape()
# Save data
github_data_file = os.path.join(self.data_dir, 'github_data.json')
with open(github_data_file, 'w') as f:
json.dump(github_data, f, indent=2, ensure_ascii=False)
self.scraped_data['github'] = {
'data': github_data,
'data_file': github_data_file
}
logger.info(f"✅ GitHub: Repository scraped successfully")
def _scrape_pdf(self, source: Dict[str, Any]):
"""Scrape PDF document."""
sys.path.insert(0, str(Path(__file__).parent))
try:
from pdf_scraper import PDFToSkillConverter
except ImportError:
logger.error("pdf_scraper.py not found")
return
# Create config for PDF scraper
pdf_config = {
'name': f"{self.name}_pdf",
'pdf': source['path'],
'extract_tables': source.get('extract_tables', False),
'ocr': source.get('ocr', False),
'password': source.get('password')
}
# Scrape
logger.info(f"Scraping PDF: {source['path']}")
converter = PDFToSkillConverter(pdf_config)
pdf_data = converter.extract_all()
# Save data
pdf_data_file = os.path.join(self.data_dir, 'pdf_data.json')
with open(pdf_data_file, 'w') as f:
json.dump(pdf_data, f, indent=2, ensure_ascii=False)
self.scraped_data['pdf'] = {
'data': pdf_data,
'data_file': pdf_data_file
}
logger.info(f"✅ PDF: {len(pdf_data.get('pages', []))} pages extracted")
def detect_conflicts(self) -> List:
"""
Detect conflicts between documentation and code.
Only applicable if both documentation and GitHub sources exist.
Returns:
List of conflicts
"""
logger.info("\n" + "=" * 60)
logger.info("PHASE 2: Detecting conflicts")
logger.info("=" * 60)
if not self.validator.needs_api_merge():
logger.info("No API merge needed (only one API source)")
return []
# Get documentation and GitHub data
docs_data = self.scraped_data.get('documentation', {})
github_data = self.scraped_data.get('github', {})
if not docs_data or not github_data:
logger.warning("Missing documentation or GitHub data for conflict detection")
return []
# Load data files
with open(docs_data['data_file'], 'r') as f:
docs_json = json.load(f)
with open(github_data['data_file'], 'r') as f:
github_json = json.load(f)
# Detect conflicts
detector = ConflictDetector(docs_json, github_json)
conflicts = detector.detect_all_conflicts()
# Save conflicts
conflicts_file = os.path.join(self.data_dir, 'conflicts.json')
detector.save_conflicts(conflicts, conflicts_file)
# Print summary
summary = detector.generate_summary(conflicts)
logger.info(f"\n📊 Conflict Summary:")
logger.info(f" Total: {summary['total']}")
logger.info(f" By Type:")
for ctype, count in summary['by_type'].items():
if count > 0:
logger.info(f" - {ctype}: {count}")
logger.info(f" By Severity:")
for severity, count in summary['by_severity'].items():
if count > 0:
emoji = '🔴' if severity == 'high' else '🟡' if severity == 'medium' else '🟢'
logger.info(f" {emoji} {severity}: {count}")
return conflicts
def merge_sources(self, conflicts: List):
"""
Merge data from multiple sources.
Args:
conflicts: List of detected conflicts
"""
logger.info("\n" + "=" * 60)
logger.info(f"PHASE 3: Merging sources ({self.merge_mode})")
logger.info("=" * 60)
if not conflicts:
logger.info("No conflicts to merge")
return None
# Get data files
docs_data = self.scraped_data.get('documentation', {})
github_data = self.scraped_data.get('github', {})
# Load data
with open(docs_data['data_file'], 'r') as f:
docs_json = json.load(f)
with open(github_data['data_file'], 'r') as f:
github_json = json.load(f)
# Choose merger
if self.merge_mode == 'claude-enhanced':
merger = ClaudeEnhancedMerger(docs_json, github_json, conflicts)
else:
merger = RuleBasedMerger(docs_json, github_json, conflicts)
# Merge
merged_data = merger.merge_all()
# Save merged data
merged_file = os.path.join(self.data_dir, 'merged_data.json')
with open(merged_file, 'w') as f:
json.dump(merged_data, f, indent=2, ensure_ascii=False)
logger.info(f"✅ Merged data saved: {merged_file}")
return merged_data
def build_skill(self, merged_data: Optional[Dict] = None):
"""
Build final unified skill.
Args:
merged_data: Merged API data (if conflicts were resolved)
"""
logger.info("\n" + "=" * 60)
logger.info("PHASE 4: Building unified skill")
logger.info("=" * 60)
# Load conflicts if they exist
conflicts = []
conflicts_file = os.path.join(self.data_dir, 'conflicts.json')
if os.path.exists(conflicts_file):
with open(conflicts_file, 'r') as f:
conflicts_data = json.load(f)
conflicts = conflicts_data.get('conflicts', [])
# Build skill
builder = UnifiedSkillBuilder(
self.config,
self.scraped_data,
merged_data,
conflicts
)
builder.build()
logger.info(f"✅ Unified skill built: {self.output_dir}/")
def run(self):
"""
Execute complete unified scraping workflow.
"""
logger.info("\n" + "🚀 " * 20)
logger.info(f"Unified Scraper: {self.config['name']}")
logger.info("🚀 " * 20 + "\n")
try:
# Phase 1: Scrape all sources
self.scrape_all_sources()
# Phase 2: Detect conflicts (if applicable)
conflicts = self.detect_conflicts()
# Phase 3: Merge sources (if conflicts exist)
merged_data = None
if conflicts:
merged_data = self.merge_sources(conflicts)
# Phase 4: Build skill
self.build_skill(merged_data)
logger.info("\n" + "✅ " * 20)
logger.info("Unified scraping complete!")
logger.info("✅ " * 20 + "\n")
logger.info(f"📁 Output: {self.output_dir}/")
logger.info(f"📁 Data: {self.data_dir}/")
except KeyboardInterrupt:
logger.info("\n\n⚠️ Scraping interrupted by user")
sys.exit(1)
except Exception as e:
logger.error(f"\n\n❌ Error during scraping: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
def main():
"""Main entry point."""
parser = argparse.ArgumentParser(
description='Unified multi-source scraper',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Basic usage with unified config
skill-seekers unified --config configs/godot_unified.json
# Override merge mode
skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
# Backward compatible with legacy configs
skill-seekers unified --config configs/react.json
"""
)
parser.add_argument('--config', '-c', required=True,
help='Path to unified config JSON file')
parser.add_argument('--merge-mode', '-m',
choices=['rule-based', 'claude-enhanced'],
help='Override config merge mode')
args = parser.parse_args()
# Create and run scraper
scraper = UnifiedScraper(args.config, args.merge_mode)
scraper.run()
if __name__ == '__main__':
main()

@ -1,444 +0,0 @@
#!/usr/bin/env python3
"""
Unified Skill Builder
Generates final skill structure from merged multi-source data:
- SKILL.md with merged APIs and conflict warnings
- references/ with organized content by source
- Inline conflict markers (⚠️)
- Separate conflicts summary section
Supports mixed sources (documentation, GitHub, PDF) and highlights
discrepancies transparently.
"""
import os
import json
import logging
from pathlib import Path
from typing import Dict, List, Any, Optional
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class UnifiedSkillBuilder:
"""
Builds unified skill from multi-source data.
"""
def __init__(self, config: Dict, scraped_data: Dict,
merged_data: Optional[Dict] = None, conflicts: Optional[List] = None):
"""
Initialize skill builder.
Args:
config: Unified config dict
scraped_data: Dict of scraped data by source type
merged_data: Merged API data (if conflicts were resolved)
conflicts: List of detected conflicts
"""
self.config = config
self.scraped_data = scraped_data
self.merged_data = merged_data
self.conflicts = conflicts or []
self.name = config['name']
self.description = config['description']
self.skill_dir = f"output/{self.name}"
# Create directories
os.makedirs(self.skill_dir, exist_ok=True)
os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
def build(self):
"""Build complete skill structure."""
logger.info(f"Building unified skill: {self.name}")
# Generate main SKILL.md
self._generate_skill_md()
# Generate reference files by source
self._generate_references()
# Generate conflicts report (if any)
if self.conflicts:
self._generate_conflicts_report()
logger.info(f"✅ Unified skill built: {self.skill_dir}/")
def _generate_skill_md(self):
"""Generate main SKILL.md file."""
skill_path = os.path.join(self.skill_dir, 'SKILL.md')
# Generate skill name (lowercased, underscores and spaces become hyphens, max 64 chars)
skill_name = self.name.lower().replace('_', '-').replace(' ', '-')[:64]
# Truncate description to 1024 chars (slicing is a no-op for shorter strings)
desc = self.description[:1024]
content = f"""---
name: {skill_name}
description: {desc}
---
# {self.name.title()}
{self.description}
## 📚 Sources
This skill combines knowledge from multiple sources:
"""
# List sources
for source in self.config.get('sources', []):
source_type = source['type']
if source_type == 'documentation':
content += f"- ✅ **Documentation**: {source.get('base_url', 'N/A')}\n"
content += f" - Pages: {source.get('max_pages', 'unlimited')}\n"
elif source_type == 'github':
content += f"- ✅ **GitHub Repository**: {source.get('repo', 'N/A')}\n"
content += f" - Code Analysis: {source.get('code_analysis_depth', 'surface')}\n"
content += f" - Issues: {source.get('max_issues', 0)}\n"
elif source_type == 'pdf':
content += f"- ✅ **PDF Document**: {source.get('path', 'N/A')}\n"
# Data quality section
if self.conflicts:
content += f"\n## ⚠️ Data Quality\n\n"
content += f"**{len(self.conflicts)} conflicts detected** between sources.\n\n"
# Count by type
by_type = {}
for conflict in self.conflicts:
ctype = conflict.type if hasattr(conflict, 'type') else conflict.get('type', 'unknown')
by_type[ctype] = by_type.get(ctype, 0) + 1
content += "**Conflict Breakdown:**\n"
for ctype, count in by_type.items():
content += f"- {ctype}: {count}\n"
content += f"\nSee `references/conflicts.md` for detailed conflict information.\n"
# Merged API section (if available)
if self.merged_data:
content += self._format_merged_apis()
# Quick reference from each source
content += "\n## 📖 Reference Documentation\n\n"
content += "Organized by source:\n\n"
for source in self.config.get('sources', []):
source_type = source['type']
content += f"- [{source_type.title()}](references/{source_type}/)\n"
# When to use this skill
content += f"\n## 💡 When to Use This Skill\n\n"
content += f"Use this skill when you need to:\n"
content += f"- Understand how to use {self.name}\n"
content += f"- Look up API documentation\n"
content += f"- Find usage examples\n"
if 'github' in self.scraped_data:
content += f"- Check for known issues or recent changes\n"
content += f"- Review release history\n"
content += "\n---\n\n"
content += "*Generated by Skill Seeker's unified multi-source scraper*\n"
with open(skill_path, 'w', encoding='utf-8') as f:
f.write(content)
logger.info(f"Created SKILL.md")
def _format_merged_apis(self) -> str:
"""Format merged APIs section with inline conflict warnings."""
if not self.merged_data:
return ""
content = "\n## 🔧 API Reference\n\n"
content += "*Merged from documentation and code analysis*\n\n"
apis = self.merged_data.get('apis', {})
if not apis:
return content + "*No APIs to display*\n"
# Group APIs by status
matched = {k: v for k, v in apis.items() if v.get('status') == 'matched'}
conflicts = {k: v for k, v in apis.items() if v.get('status') == 'conflict'}
docs_only = {k: v for k, v in apis.items() if v.get('status') == 'docs_only'}
code_only = {k: v for k, v in apis.items() if v.get('status') == 'code_only'}
# Show matched APIs first
if matched:
content += "### ✅ Verified APIs\n\n"
content += "*Documentation and code agree*\n\n"
for api_name, api_data in list(matched.items())[:10]: # Limit to first 10
content += self._format_api_entry(api_data, inline_conflict=False)
# Show conflicting APIs with warnings
if conflicts:
content += "\n### ⚠️ APIs with Conflicts\n\n"
content += "*Documentation and code differ*\n\n"
for api_name, api_data in list(conflicts.items())[:10]:
content += self._format_api_entry(api_data, inline_conflict=True)
# Show undocumented APIs
if code_only:
content += f"\n### 💻 Undocumented APIs\n\n"
content += f"*Found in code but not in documentation ({len(code_only)} total)*\n\n"
for api_name, api_data in list(code_only.items())[:5]:
content += self._format_api_entry(api_data, inline_conflict=False)
# Show removed/missing APIs
if docs_only:
content += f"\n### 📖 Documentation-Only APIs\n\n"
content += f"*Documented but not found in code ({len(docs_only)} total)*\n\n"
for api_name, api_data in list(docs_only.items())[:5]:
content += self._format_api_entry(api_data, inline_conflict=False)
content += f"\n*See references/api/ for complete API documentation*\n"
return content
def _format_api_entry(self, api_data: Dict, inline_conflict: bool = False) -> str:
"""Format a single API entry."""
name = api_data.get('name', 'Unknown')
signature = api_data.get('merged_signature', name)
description = api_data.get('merged_description', '')
warning = api_data.get('warning', '')
entry = f"#### `{signature}`\n\n"
if description:
entry += f"{description}\n\n"
# Add inline conflict warning
if inline_conflict and warning:
entry += f"⚠️ **Conflict**: {warning}\n\n"
# Show both versions if available
conflict = api_data.get('conflict', {})
if conflict:
docs_info = conflict.get('docs_info')
code_info = conflict.get('code_info')
if docs_info and code_info:
entry += "**Documentation says:**\n"
entry += f"```\n{docs_info.get('raw_signature', 'N/A')}\n```\n\n"
entry += "**Code implementation:**\n"
entry += f"```\n{self._format_code_signature(code_info)}\n```\n\n"
# Add source info
source = api_data.get('source', 'unknown')
entry += f"*Source: {source}*\n\n"
entry += "---\n\n"
return entry
def _format_code_signature(self, code_info: Dict) -> str:
"""Format code signature for display."""
name = code_info.get('name', '')
params = code_info.get('parameters', [])
return_type = code_info.get('return_type')
param_strs = []
for param in params:
param_str = param.get('name', '')
if param.get('type_hint'):
param_str += f": {param['type_hint']}"
if param.get('default'):
param_str += f" = {param['default']}"
param_strs.append(param_str)
sig = f"{name}({', '.join(param_strs)})"
if return_type:
sig += f" -> {return_type}"
return sig
def _generate_references(self):
"""Generate reference files organized by source."""
logger.info("Generating reference files...")
# Generate references for each source type
if 'documentation' in self.scraped_data:
self._generate_docs_references()
if 'github' in self.scraped_data:
self._generate_github_references()
if 'pdf' in self.scraped_data:
self._generate_pdf_references()
# Generate merged API reference if available
if self.merged_data:
self._generate_merged_api_reference()
def _generate_docs_references(self):
"""Generate references from documentation source."""
docs_dir = os.path.join(self.skill_dir, 'references', 'documentation')
os.makedirs(docs_dir, exist_ok=True)
# Create index
index_path = os.path.join(docs_dir, 'index.md')
with open(index_path, 'w') as f:
f.write("# Documentation\n\n")
f.write("Reference from official documentation.\n\n")
logger.info("Created documentation references")
def _generate_github_references(self):
"""Generate references from GitHub source."""
github_dir = os.path.join(self.skill_dir, 'references', 'github')
os.makedirs(github_dir, exist_ok=True)
github_data = self.scraped_data['github']['data']
# Create README reference
if github_data.get('readme'):
readme_path = os.path.join(github_dir, 'README.md')
with open(readme_path, 'w', encoding='utf-8') as f:
f.write("# Repository README\n\n")
f.write(github_data['readme'])
# Create issues reference
if github_data.get('issues'):
issues_path = os.path.join(github_dir, 'issues.md')
with open(issues_path, 'w') as f:
f.write("# GitHub Issues\n\n")
f.write(f"{len(github_data['issues'])} recent issues.\n\n")
for issue in github_data['issues'][:20]:
f.write(f"## #{issue['number']}: {issue['title']}\n\n")
f.write(f"**State**: {issue['state']}\n")
if issue.get('labels'):
f.write(f"**Labels**: {', '.join(issue['labels'])}\n")
f.write(f"**URL**: {issue.get('url', 'N/A')}\n\n")
# Create releases reference
if github_data.get('releases'):
releases_path = os.path.join(github_dir, 'releases.md')
with open(releases_path, 'w', encoding='utf-8') as f:
f.write("# Releases\n\n")
for release in github_data['releases'][:10]:
f.write(f"## {release['tag_name']}: {release.get('name', 'N/A')}\n\n")
# published_at can be present but None (e.g. draft releases), so fall back before slicing
f.write(f"**Published**: {(release.get('published_at') or 'N/A')[:10]}\n\n")
if release.get('body'):
f.write(release['body'][:500])
f.write("\n\n")
logger.info("Created GitHub references")
def _generate_pdf_references(self):
"""Generate references from PDF source."""
pdf_dir = os.path.join(self.skill_dir, 'references', 'pdf')
os.makedirs(pdf_dir, exist_ok=True)
# Create index
index_path = os.path.join(pdf_dir, 'index.md')
with open(index_path, 'w') as f:
f.write("# PDF Documentation\n\n")
f.write("Reference from PDF document.\n\n")
logger.info("Created PDF references")
def _generate_merged_api_reference(self):
"""Generate merged API reference file."""
api_dir = os.path.join(self.skill_dir, 'references', 'api')
os.makedirs(api_dir, exist_ok=True)
api_path = os.path.join(api_dir, 'merged_api.md')
with open(api_path, 'w', encoding='utf-8') as f:
f.write("# Merged API Reference\n\n")
f.write("*Combined from documentation and code analysis*\n\n")
apis = self.merged_data.get('apis', {})
for api_name in sorted(apis.keys()):
api_data = apis[api_name]
entry = self._format_api_entry(api_data, inline_conflict=True)
f.write(entry)
logger.info(f"Created merged API reference ({len(apis)} APIs)")
def _generate_conflicts_report(self):
"""Generate detailed conflicts report."""
conflicts_path = os.path.join(self.skill_dir, 'references', 'conflicts.md')
with open(conflicts_path, 'w', encoding='utf-8') as f:
f.write("# Conflict Report\n\n")
f.write(f"Found **{len(self.conflicts)}** conflicts between sources.\n\n")
# Group by severity
# Conflicts may be objects or plain dicts; only call .get() when there is no attribute
high = [c for c in self.conflicts if (c.severity if hasattr(c, 'severity') else c.get('severity')) == 'high']
medium = [c for c in self.conflicts if (c.severity if hasattr(c, 'severity') else c.get('severity')) == 'medium']
low = [c for c in self.conflicts if (c.severity if hasattr(c, 'severity') else c.get('severity')) == 'low']
f.write("## Severity Breakdown\n\n")
f.write(f"- 🔴 **High**: {len(high)} (action required)\n")
f.write(f"- 🟡 **Medium**: {len(medium)} (review recommended)\n")
f.write(f"- 🟢 **Low**: {len(low)} (informational)\n\n")
# List high severity conflicts
if high:
f.write("## 🔴 High Severity\n\n")
f.write("*These conflicts require immediate attention*\n\n")
for conflict in high:
api_name = conflict.api_name if hasattr(conflict, 'api_name') else conflict.get('api_name', 'Unknown')
diff = conflict.difference if hasattr(conflict, 'difference') else conflict.get('difference', 'N/A')
f.write(f"### {api_name}\n\n")
f.write(f"**Issue**: {diff}\n\n")
# List medium severity
if medium:
f.write("## 🟡 Medium Severity\n\n")
for conflict in medium[:20]: # Limit to 20
api_name = conflict.api_name if hasattr(conflict, 'api_name') else conflict.get('api_name', 'Unknown')
diff = conflict.difference if hasattr(conflict, 'difference') else conflict.get('difference', 'N/A')
f.write(f"### {api_name}\n\n")
f.write(f"{diff}\n\n")
logger.info(f"Created conflicts report")
if __name__ == '__main__':
# Test with mock data
import sys
if len(sys.argv) < 2:
print("Usage: python unified_skill_builder.py <config.json>")
sys.exit(1)
config_path = sys.argv[1]
with open(config_path, 'r') as f:
config = json.load(f)
# Mock scraped data
scraped_data = {
'github': {
'data': {
'readme': '# Test Repository',
'issues': [],
'releases': []
}
}
}
builder = UnifiedSkillBuilder(config, scraped_data)
builder.build()
print(f"\n✅ Test skill built in: output/{config['name']}/")
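
The builder above accepts conflicts either as objects with attributes or as plain dicts loaded from `conflicts.json`. The sketch below shows one way to read fields uniformly from both shapes and count them by severity. `conflict_field` and `count_by_severity` are hypothetical helpers introduced here for illustration, and `SimpleNamespace` stands in for whatever conflict class the detector actually returns.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def conflict_field(conflict: Any, field: str, default: Any = None) -> Any:
    """Read a field from a conflict given as a dict or as an object."""
    if isinstance(conflict, dict):
        return conflict.get(field, default)
    return getattr(conflict, field, default)


def count_by_severity(conflicts: Iterable[Any]) -> Dict[str, int]:
    """Count conflicts per severity, using 'unknown' when it is missing."""
    return dict(Counter(conflict_field(c, "severity", "unknown") for c in conflicts))


# Mixed input: dicts (as loaded from JSON) and attribute-style objects
mixed = [
    {"api_name": "load", "severity": "high"},
    SimpleNamespace(api_name="save", severity="medium"),
    SimpleNamespace(api_name="close", severity="high"),
    {"api_name": "open"},
]
severity_counts = count_by_severity(mixed)
print(severity_counts)
```

Routing every access through one helper avoids repeating the `hasattr(...)`/`.get(...)` pair at each call site.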

@ -1,175 +0,0 @@
#!/usr/bin/env python3
"""
Automatic Skill Uploader
Uploads a skill .zip file to Claude using the Anthropic API
Usage:
# Set API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...
# Upload skill
python3 upload_skill.py output/react.zip
python3 upload_skill.py output/godot.zip
"""
import os
import sys
import json
import argparse
from pathlib import Path
# Import utilities
try:
from utils import (
get_api_key,
get_upload_url,
print_upload_instructions,
validate_zip_file
)
except ImportError:
sys.path.insert(0, str(Path(__file__).parent))
from utils import (
get_api_key,
get_upload_url,
print_upload_instructions,
validate_zip_file
)
def upload_skill_api(zip_path):
"""
Upload skill to Claude via Anthropic API
Args:
zip_path: Path to skill .zip file
Returns:
tuple: (success, message)
"""
# Check for requests library
try:
import requests
except ImportError:
return False, "requests library not installed. Run: pip install requests"
# Validate zip file
is_valid, error_msg = validate_zip_file(zip_path)
if not is_valid:
return False, error_msg
# Get API key
api_key = get_api_key()
if not api_key:
return False, "ANTHROPIC_API_KEY not set. Run: export ANTHROPIC_API_KEY=sk-ant-..."
zip_path = Path(zip_path)
skill_name = zip_path.stem
print(f"📤 Uploading skill: {skill_name}")
print(f" Source: {zip_path}")
print(f" Size: {zip_path.stat().st_size:,} bytes")
print()
# Prepare API request
api_url = "https://api.anthropic.com/v1/skills"
headers = {
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
"anthropic-beta": "skills-2025-10-02"
}
try:
# Read zip file
with open(zip_path, 'rb') as f:
zip_data = f.read()
# Upload skill
print("⏳ Uploading to Anthropic API...")
files = {
'files[]': (zip_path.name, zip_data, 'application/zip')
}
response = requests.post(
api_url,
headers=headers,
files=files,
timeout=60
)
# Check response
if response.status_code == 200:
print()
print("✅ Skill uploaded successfully!")
print()
print("Your skill is now available in Claude at:")
print(f" {get_upload_url()}")
print()
return True, "Upload successful"
elif response.status_code == 401:
return False, "Authentication failed. Check your ANTHROPIC_API_KEY"
elif response.status_code == 400:
error_msg = response.json().get('error', {}).get('message', 'Unknown error')
return False, f"Invalid skill format: {error_msg}"
else:
error_msg = response.json().get('error', {}).get('message', 'Unknown error')
return False, f"Upload failed ({response.status_code}): {error_msg}"
except requests.exceptions.Timeout:
return False, "Upload timed out. Try again or use manual upload"
except requests.exceptions.ConnectionError:
return False, "Connection error. Check your internet connection"
except Exception as e:
return False, f"Unexpected error: {str(e)}"
def main():
parser = argparse.ArgumentParser(
description="Upload a skill .zip file to Claude via Anthropic API",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Setup:
1. Get your Anthropic API key from https://console.anthropic.com/
2. Set the API key:
export ANTHROPIC_API_KEY=sk-ant-...
Examples:
# Upload skill
python3 upload_skill.py output/react.zip
# Upload with explicit path
python3 upload_skill.py /path/to/skill.zip
Requirements:
- ANTHROPIC_API_KEY environment variable must be set
- requests library (pip install requests)
"""
)
parser.add_argument(
'zip_file',
help='Path to skill .zip file (e.g., output/react.zip)'
)
args = parser.parse_args()
# Upload skill
success, message = upload_skill_api(args.zip_file)
if success:
sys.exit(0)
else:
print(f"\n❌ Upload failed: {message}")
print()
print("📝 Manual upload instructions:")
print_upload_instructions(args.zip_file)
sys.exit(1)
if __name__ == "__main__":
main()
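
The status-code handling in `upload_skill_api` above can be checked without any network access if it is pulled out into a pure function. The following is only a sketch: `interpret_upload_response` is a hypothetical name, and `payload` is assumed to be the already-parsed JSON body. The messages mirror those in the code above.

```python
from typing import Any, Dict, Tuple


def interpret_upload_response(status_code: int, payload: Dict[str, Any]) -> Tuple[bool, str]:
    """Map an HTTP status and parsed JSON body to (success, message)."""
    # Tolerate a missing or null 'error' object in the body
    error_msg = (payload.get("error") or {}).get("message", "Unknown error")
    if status_code == 200:
        return True, "Upload successful"
    if status_code == 401:
        return False, "Authentication failed. Check your ANTHROPIC_API_KEY"
    if status_code == 400:
        return False, f"Invalid skill format: {error_msg}"
    return False, f"Upload failed ({status_code}): {error_msg}"


print(interpret_upload_response(400, {"error": {"message": "SKILL.md missing"}}))
print(interpret_upload_response(500, {}))
```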

@ -1,224 +0,0 @@
#!/usr/bin/env python3
"""
Utility functions for Skill Seeker CLI tools
"""
import os
import sys
import subprocess
import platform
from pathlib import Path
from typing import Optional, Tuple, Dict, Union
def open_folder(folder_path: Union[str, Path]) -> bool:
"""
Open a folder in the system file browser
Args:
folder_path: Path to folder to open
Returns:
bool: True if successful, False otherwise
"""
folder_path = Path(folder_path).resolve()
if not folder_path.exists():
print(f"⚠️ Folder not found: {folder_path}")
return False
system = platform.system()
try:
if system == "Linux":
# Try xdg-open first (standard)
subprocess.run(["xdg-open", str(folder_path)], check=True)
elif system == "Darwin": # macOS
subprocess.run(["open", str(folder_path)], check=True)
elif system == "Windows":
subprocess.run(["explorer", str(folder_path)], check=True)
else:
print(f"⚠️ Unknown operating system: {system}")
return False
return True
except subprocess.CalledProcessError:
print(f"⚠️ Could not open folder automatically")
return False
except FileNotFoundError:
print(f"⚠️ File browser not found on system")
return False
def has_api_key() -> bool:
"""
Check if ANTHROPIC_API_KEY is set in environment
Returns:
bool: True if API key is set, False otherwise
"""
api_key = os.environ.get('ANTHROPIC_API_KEY', '').strip()
return len(api_key) > 0
def get_api_key() -> Optional[str]:
"""
Get ANTHROPIC_API_KEY from environment
Returns:
str: API key or None if not set
"""
api_key = os.environ.get('ANTHROPIC_API_KEY', '').strip()
return api_key if api_key else None
def get_upload_url() -> str:
"""
Get the Claude skills upload URL
Returns:
str: Claude skills upload URL
"""
return "https://claude.ai/skills"
def print_upload_instructions(zip_path: Union[str, Path]) -> None:
"""
Print clear upload instructions for manual upload
Args:
zip_path: Path to the .zip file to upload
"""
zip_path = Path(zip_path)
print()
print("╔══════════════════════════════════════════════════════════╗")
print("║ NEXT STEP ║")
print("╚══════════════════════════════════════════════════════════╝")
print()
print(f"📤 Upload to Claude: {get_upload_url()}")
print()
print(f"1. Go to {get_upload_url()}")
print("2. Click \"Upload Skill\"")
print(f"3. Select: {zip_path}")
print("4. Done! ✅")
print()
def format_file_size(size_bytes: int) -> str:
"""
Format file size in human-readable format
Args:
size_bytes: Size in bytes
Returns:
str: Formatted size (e.g., "45.3 KB")
"""
if size_bytes < 1024:
return f"{size_bytes} bytes"
elif size_bytes < 1024 * 1024:
return f"{size_bytes / 1024:.1f} KB"
else:
return f"{size_bytes / (1024 * 1024):.1f} MB"
def validate_skill_directory(skill_dir: Union[str, Path]) -> Tuple[bool, Optional[str]]:
"""
Validate that a directory is a valid skill directory
Args:
skill_dir: Path to skill directory
Returns:
tuple: (is_valid, error_message)
"""
skill_path = Path(skill_dir)
if not skill_path.exists():
return False, f"Directory not found: {skill_dir}"
if not skill_path.is_dir():
return False, f"Not a directory: {skill_dir}"
skill_md = skill_path / "SKILL.md"
if not skill_md.exists():
return False, f"SKILL.md not found in {skill_dir}"
return True, None
def validate_zip_file(zip_path: Union[str, Path]) -> Tuple[bool, Optional[str]]:
"""
Validate that a file is a valid skill .zip file
Args:
zip_path: Path to .zip file
Returns:
tuple: (is_valid, error_message)
"""
zip_path = Path(zip_path)
if not zip_path.exists():
return False, f"File not found: {zip_path}"
if not zip_path.is_file():
return False, f"Not a file: {zip_path}"
if zip_path.suffix != '.zip':
return False, f"Not a .zip file: {zip_path}"
return True, None
def read_reference_files(skill_dir: Union[str, Path], max_chars: int = 100000, preview_limit: int = 40000) -> Dict[str, str]:
"""Read reference files from a skill directory with size limits.
This function reads markdown files from the references/ subdirectory
of a skill, applying both per-file and total content limits.
Args:
skill_dir (str or Path): Path to skill directory
max_chars (int): Soft cap on total characters; reading stops after the file that exceeds it (default: 100000)
preview_limit (int): Maximum characters per file (default: 40000)
Returns:
dict: Dictionary mapping filename to content
Example:
>>> refs = read_reference_files('output/react/', max_chars=50000)
>>> len(refs)
5
"""
skill_path = Path(skill_dir)
references_dir = skill_path / "references"
references: Dict[str, str] = {}
if not references_dir.exists():
print(f"⚠ No references directory found at {references_dir}")
return references
total_chars = 0
for ref_file in sorted(references_dir.glob("*.md")):
if ref_file.name == "index.md":
continue
content = ref_file.read_text(encoding='utf-8')
# Limit size per file
if len(content) > preview_limit:
content = content[:preview_limit] + "\n\n[Content truncated...]"
references[ref_file.name] = content
total_chars += len(content)
# Stop if we've read enough
if total_chars > max_chars:
print(f" Limiting input to {max_chars:,} characters")
break
return references
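
The two limits in `read_reference_files` above interact: each file is first cut to `preview_limit`, then the running total is compared with `max_chars` after the file has been added. The sketch below reproduces that logic on an in-memory dict so it can be run without a skill directory. `apply_limits` and `TRUNCATION_MARKER` are hypothetical names, and the `index.md` skip from the original is left out for brevity.

```python
from typing import Dict

TRUNCATION_MARKER = "\n\n[Content truncated...]"


def apply_limits(files: Dict[str, str], max_chars: int = 100000,
                 preview_limit: int = 40000) -> Dict[str, str]:
    """Apply a per-file cap, then stop once the running total exceeds max_chars."""
    kept: Dict[str, str] = {}
    total_chars = 0
    for name in sorted(files):
        content = files[name]
        # Per-file cap: keep the head of the file and mark the cut
        if len(content) > preview_limit:
            content = content[:preview_limit] + TRUNCATION_MARKER
        kept[name] = content
        total_chars += len(content)
        # The check runs after adding, so the last kept file may overshoot the cap
        if total_chars > max_chars:
            break
    return kept


sample = {"a.md": "x" * 50, "b.md": "y" * 10, "c.md": "z" * 50}
limited = apply_limits(sample, max_chars=60, preview_limit=40)
print(list(limited))
```

Because the total is checked only after a file is stored, `max_chars` behaves as a soft cap rather than a hard one.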

@ -1,27 +0,0 @@
"""Skill Seekers MCP (Model Context Protocol) server package.
This package provides MCP server integration for Claude Code, allowing
natural language interaction with Skill Seekers tools.
Main modules:
- server: MCP server implementation with 9 tools
Available MCP Tools:
- list_configs: List all available preset configurations
- generate_config: Generate a new config file for any docs site
- validate_config: Validate a config file structure
- estimate_pages: Estimate page count before scraping
- scrape_docs: Scrape and build a skill
- package_skill: Package skill into .zip file (with auto-upload)
- upload_skill: Upload .zip to Claude
- split_config: Split large documentation configs
- generate_router: Generate router/hub skills
Usage:
The MCP server is typically run by Claude Code via configuration
in ~/.config/claude-code/mcp.json
"""
__version__ = "2.0.0"
__all__ = []

@ -1,9 +0,0 @@
# MCP Server dependencies
mcp>=1.0.0
# CLI tool dependencies (shared)
requests>=2.31.0
beautifulsoup4>=4.12.0
# Optional: for API-based enhancement
# anthropic>=0.18.0

View File

@ -1,19 +0,0 @@
"""MCP tools subpackage.
This package will contain modularized MCP tool implementations.
Planned structure (for future refactoring):
- scraping_tools.py: Tools for scraping (estimate_pages, scrape_docs)
- building_tools.py: Tools for building (package_skill, validate_config)
- deployment_tools.py: Tools for deployment (upload_skill)
- config_tools.py: Tools for configs (list_configs, generate_config)
- advanced_tools.py: Advanced tools (split_config, generate_router)
Current state:
All tools are currently implemented in mcp/server.py
This directory is a placeholder for future modularization.
"""
__version__ = "2.0.0"
__all__ = []

View File

@ -1,68 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
# ==================== Purpose ====================
# Bootstraps a local venv for the vendored Skill Seekers source code.
#
# Output:
# - Creates: assets/skills/skills-skills/scripts/.venv-skill-seekers/
usage() {
cat <<'EOF'
Usage:
skill-seekers-bootstrap.sh [--venv <dir>]
Examples:
./assets/skills/skills-skills/scripts/skill-seekers-bootstrap.sh
./assets/skills/skills-skills/scripts/skill-seekers-bootstrap.sh --venv ./assets/skills/skills-skills/scripts/.venv-skill-seekers
EOF
}
die() {
echo "Error: $*" >&2
exit 1
}
script_dir="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
tool_dir="${script_dir}/Skill_Seekers-development"
default_venv="${script_dir}/.venv-skill-seekers"
venv_dir="$default_venv"
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
usage
exit 0
;;
--venv)
[[ $# -ge 2 ]] || die "--venv requires a directory argument"
venv_dir="$2"
shift 2
;;
--)
shift
break
;;
-*)
die "Unknown argument: $1 (use --help)"
;;
*)
die "Unexpected positional argument: $1 (use --help)"
;;
esac
done
[[ -d "$tool_dir" ]] || die "Missing vendored tool dir: $tool_dir"
[[ -f "$tool_dir/requirements.txt" ]] || die "Missing requirements.txt: $tool_dir/requirements.txt"
command -v python3 >/dev/null 2>&1 || die "python3 not found"
if [[ ! -d "$venv_dir" ]]; then
python3 -m venv "$venv_dir"
fi
"$venv_dir/bin/python" -m pip install --upgrade pip >/dev/null
"$venv_dir/bin/pip" install -r "$tool_dir/requirements.txt"
echo "OK: venv ready: $venv_dir"

View File

@ -1 +0,0 @@
Skill_Seekers-development/configs

View File

@ -1,80 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
# ==================== Purpose ====================
# Import Skill Seekers output/NAME/ into this repo's assets/skills/NAME/.
usage() {
cat <<'EOF'
Usage:
skill-seekers-import.sh <skill-name> [--force]
Behavior:
- Source: ./output/<skill-name>/
- Dest: ./assets/skills/<skill-name>/
- By default, refuses to overwrite an existing assets/skills/<skill-name>/SKILL.md
Examples:
./assets/skills/skills-skills/scripts/skill-seekers-import.sh react
./assets/skills/skills-skills/scripts/skill-seekers-import.sh react --force
EOF
}
die() {
echo "Error: $*" >&2
exit 1
}
force=0
skill_name=""
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
usage
exit 0
;;
--force)
force=1
shift
;;
--)
shift
break
;;
-*)
die "Unknown argument: $1 (use --help)"
;;
*)
if [[ -z "$skill_name" ]]; then
skill_name="$1"
shift
else
die "Extra argument: $1 (only one <skill-name> is allowed)"
fi
;;
esac
done
[[ -n "$skill_name" ]] || { usage; exit 1; }
if [[ ! "$skill_name" =~ ^[a-z][a-z0-9-]*$ ]]; then
die "skill-name must match ^[a-z][a-z0-9-]*$ (e.g. my-skill)"
fi
repo_root="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/../../../.." && pwd)"
src_dir="${repo_root}/output/${skill_name}"
dest_dir="${repo_root}/assets/skills/${skill_name}"
[[ -d "$src_dir" ]] || die "Missing Skill Seekers output dir: $src_dir"
[[ -f "$src_dir/SKILL.md" ]] || die "Missing output SKILL.md: $src_dir/SKILL.md"
mkdir -p "$dest_dir"
if [[ -f "$dest_dir/SKILL.md" && "$force" -ne 1 ]]; then
die "Refusing to overwrite existing: $dest_dir/SKILL.md (use --force)"
fi
command -v rsync >/dev/null 2>&1 || die "rsync not found"
rsync -a --delete "$src_dir"/ "$dest_dir"/
echo "OK: imported to: $dest_dir"

View File

@ -1 +0,0 @@
Skill_Seekers-development/src

View File

@ -1,117 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
# ==================== Purpose ====================
# Update the vendored Skill Seekers source snapshot inside this repo.
#
# Notes:
# - This keeps ONLY "source + configs + runtime manifests" to avoid importing upstream Markdown docs
# (which would affect this repo's markdownlint).
usage() {
cat <<'EOF'
Usage:
skill-seekers-update.sh [--repo <owner/repo>] [--ref <git-ref>] [--dry-run]
Defaults:
--repo yusufkaraaslan/Skill_Seekers
--ref main
Examples:
./assets/skills/skills-skills/scripts/skill-seekers-update.sh
./assets/skills/skills-skills/scripts/skill-seekers-update.sh --ref v2.1.1
./assets/skills/skills-skills/scripts/skill-seekers-update.sh --dry-run
EOF
}
die() {
echo "Error: $*" >&2
exit 1
}
script_dir="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
target_dir="${script_dir}/Skill_Seekers-development"
repo="yusufkaraaslan/Skill_Seekers"
ref="main"
dry_run=0
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
usage
exit 0
;;
--repo)
[[ $# -ge 2 ]] || die "--repo requires an argument like owner/repo"
repo="$2"
shift 2
;;
--ref)
[[ $# -ge 2 ]] || die "--ref requires a git ref (branch/tag/commit)"
ref="$2"
shift 2
;;
--dry-run)
dry_run=1
shift
;;
--)
shift
break
;;
*)
die "Unknown argument: $1 (use --help)"
;;
esac
done
command -v curl >/dev/null 2>&1 || die "curl not found"
command -v tar >/dev/null 2>&1 || die "tar not found"
command -v rsync >/dev/null 2>&1 || die "rsync not found"
tmp_dir="$(mktemp -d)"
cleanup() { rm -rf "$tmp_dir"; }
trap cleanup EXIT
archive_url="https://codeload.github.com/${repo}/tar.gz/${ref}"
archive_path="${tmp_dir}/skill-seekers.tgz"
curl -fsSL "$archive_url" -o "$archive_path"
tar -xzf "$archive_path" -C "$tmp_dir"
extracted_root="$(find "$tmp_dir" -mindepth 1 -maxdepth 1 -type d | head -n 1)"
[[ -n "$extracted_root" ]] || die "Failed to locate extracted archive root"
if [[ "$dry_run" -eq 1 ]]; then
echo "DRY RUN:"
echo " repo: $repo"
echo " ref: $ref"
echo " from: $extracted_root"
echo " to: $target_dir"
exit 0
fi
mkdir -p "$target_dir"
rsync -a --delete \
--exclude '.git' \
--exclude '*.md' \
--exclude 'docs/' \
--exclude 'tests/' \
--exclude '.claude/' \
--exclude '.gitignore' \
--exclude 'CHANGELOG.md' \
--exclude 'ROADMAP.md' \
--exclude 'FUTURE_RELEASES.md' \
--exclude 'ASYNC_SUPPORT.md' \
--exclude 'STRUCTURE.md' \
--exclude 'CONTRIBUTING.md' \
--exclude 'QUICKSTART.md' \
--exclude 'BULLETPROOF_QUICKSTART.md' \
--exclude 'FLEXIBLE_ROADMAP.md' \
"$extracted_root"/ \
"$target_dir"/
echo "OK: updated vendored source in: $target_dir"

View File

@ -1,65 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
# ==================== Purpose ====================
# Run Skill Seekers from vendored source with a local venv.
#
# This script does NOT auto-install dependencies.
# Run skill-seekers-bootstrap.sh once if you see ImportError.
usage() {
cat <<'EOF'
Usage:
skill-seekers.sh [--venv <dir>] -- <skill-seekers args...>
Examples:
./assets/skills/skills-skills/scripts/skill-seekers.sh -- --version
./assets/skills/skills-skills/scripts/skill-seekers.sh -- scrape --config ./assets/skills/skills-skills/scripts/Skill_Seekers-development/configs/react.json
./assets/skills/skills-skills/scripts/skill-seekers.sh -- github --repo facebook/react --name react
EOF
}
die() {
echo "Error: $*" >&2
exit 1
}
script_dir="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
tool_dir="${script_dir}/Skill_Seekers-development"
tool_src="${tool_dir}/src"
default_venv="${script_dir}/.venv-skill-seekers"
venv_dir="$default_venv"
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
usage
exit 0
;;
--venv)
[[ $# -ge 2 ]] || die "--venv requires a directory argument"
venv_dir="$2"
shift 2
;;
--)
shift
break
;;
*)
die "Expected '--' before skill-seekers arguments (use --help)"
;;
esac
done
[[ -d "$tool_src" ]] || die "Missing vendored source dir: $tool_src"
python_bin="python3"
if [[ -x "$venv_dir/bin/python" ]]; then
python_bin="$venv_dir/bin/python"
fi
export PYTHONPATH="$tool_src${PYTHONPATH:+:$PYTHONPATH}"
exec "$python_bin" -m skill_seekers.cli.main "$@"

View File

@ -1,50 +0,0 @@
# ACCEPTANCE: Precise Acceptance Criteria
## Atomic Assertions
### A1. CI markdownlint no longer hard-fails
- Verify
- `test -f .github/lint_config.json`
- `markdownlint --config .github/lint_config.json '**/*.md'`
- Expected
- The error `Cannot read or parse config file '.github/lint_config.json': ENOENT` no longer appears
- The command exits with code 0
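
The first half of the A1 check can also be scripted. The sketch below uses a hypothetical helper, `check_lint_config`, which is not part of the repository. It assumes the config is plain JSON, as the `.json` extension suggests, and it only confirms that the file exists and parses; it does not run markdownlint.

```python
import json
from pathlib import Path


def check_lint_config(path: str = ".github/lint_config.json") -> dict:
    """Return the parsed config, or raise if the file is missing or invalid.

    Hypothetical helper for illustration; assumes a plain-JSON config file.
    """
    config_path = Path(path)
    if not config_path.is_file():
        # Corresponds to the ENOENT failure that markdownlint reports in CI.
        raise FileNotFoundError(f"lint config not found: {config_path}")
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError(f"lint config must be a JSON object: {config_path}")
    return config
```

A missing file raises `FileNotFoundError`, and malformed JSON raises `json.JSONDecodeError`, so either failure surfaces before the linter is invoked.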
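### A2. Local `make lint` behaves the same as CI
- Verify
- `make lint`
- `markdownlint --config .github/lint_config.json '**/*.md'`
- Expected
- Both commands lint the same set of files (at minimum, deeply nested Markdown such as `assets/documents/**` and `assets/skills/**`)
- Exit codes match (both 0)
### A3. Key entry-point guides no longer reference old paths (minimal set)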
- Verify
- `rg -n "cp -f config/\\.codex" assets/config/.codex/README.md`
- `rg -n "\\./skills/skills-skills" assets/skills/skills-skills/references -S`
- Expected
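- None of the rg searches above return a match (or matches appear only in passages explicitly labeled as historical examples, with an explanation)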
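
For environments without `rg`, the same search can be approximated in Python. This is a minimal sketch: `find_stale_paths` and `STALE_PATTERNS` are hypothetical names, and the function applies both patterns to every file it is given, which is slightly broader than the two targeted rg commands.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Patterns copied from the two rg commands in the Verify list.
STALE_PATTERNS = [
    re.compile(r"cp -f config/\.codex"),
    re.compile(r"\./skills/skills-skills"),
]


def find_stale_paths(
    files: Iterable[Path], patterns=STALE_PATTERNS
) -> List[Tuple[Path, int, str]]:
    """Return (file, 1-based line number, line) for each line matching a pattern."""
    hits = []
    for file in files:
        text = file.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in patterns):
                hits.append((file, number, line))
    return hits
```

An empty result corresponds to the "no match" expectation; any hit still needs a manual look to decide whether it sits in a passage labeled as a historical example.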
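### A4. Ignore rules match the new layout
- Verify
- `rg -n "^assets/repo/backups/gz/" .gitignore` (or an equivalent ignore rule)
- `git status --porcelain=v1`
- Expected
- `.gitignore` covers `assets/repo/backups/gz/`
- `git status` no longer shows untracked noise from that directory (unless the user explicitly wants it under version control)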
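
The first A4 check amounts to a literal prefix search, sketched below. `gitignore_mentions` is a hypothetical helper; it deliberately ignores gitignore semantics such as wildcards, negation, and unanchored directory names, so it cannot recognize an "equivalent ignore rule".

```python
from typing import List


def gitignore_mentions(
    gitignore_text: str, prefix: str = "assets/repo/backups/gz/"
) -> List[int]:
    """Return the 1-based line numbers of rules that start with `prefix`.

    Literal prefix match only, mirroring the anchored rg pattern in A4.
    """
    return [
        number
        for number, line in enumerate(gitignore_text.splitlines(), start=1)
        if line.startswith(prefix)
    ]
```

When the rule may be expressed differently, `git check-ignore -v <path>` is the more reliable check, since it evaluates the ignore rules the way git does.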
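## Edge Cases (at least 3)
1. Running `make lint` under a `/bin/sh` without `globstar` still lints recursively (solved by quoting the glob so that it reaches markdownlint unexpanded).
2. When Markdown in the third-party mirrors under `assets/repo/` still has violations, the lint policy does not force edits to large numbers of third-party files (solved via `ignorePatterns` or by restricting the lint scope; the choice must be stated explicitly in PLAN).
3. After any `assets/documents/**.md` file is added, `make lint` is guaranteed to pick it up (self-test by adding a temporary md file, or verify the match with `markdownlint --debug`).
## Anti-Goals
- Do not delete large blocks of content or disable linting wholesale just to turn CI green.
- Do not modify `.github/workflows/*.yml` (unless it is shown that configuration files alone cannot fix the problem and explicit authorization has been given).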

View File

@ -1,54 +0,0 @@
# CONTEXT: Map of Post-Migration Lint and Path Problems
## Current State (Live Evidence)
### 1) CI markdownlint config is missing (hard failure)
- CI command (from `.github/workflows/ci.yml`):
`markdownlint --config .github/lint_config.json '**/*.md'`
- Observed output (reproduced locally):
```text
Cannot read or parse config file '.github/lint_config.json': ENOENT: no such file or directory, open '.github/lint_config.json'
```
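### 2) Local `make lint` diverges from CI (false pass)
- Current lint command in the `Makefile`: `markdownlint **/*.md`
- When `/bin/sh` points to dash, the unquoted `**/*.md` matches only md files one directory level deep; it does not recurse into `assets/**`.
- Result: `make lint` may return 0, while CI actually lints the whole repository and fails.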
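
The coverage gap described above can be illustrated with a small sketch. It assumes, as the text states, that a shell without `globstar` treats `**/*.md` like `*/*.md`; the function names are hypothetical. Note that `pathlib` also descends into dot-directories, which a shell glob would skip, so this is an approximation rather than an exact model of either tool.

```python
from pathlib import Path
from typing import Set


def one_level_md(root: Path) -> Set[str]:
    """Files matched by `*/*.md`: exactly one directory below `root`."""
    return {p.relative_to(root).as_posix() for p in root.glob("*/*.md")}


def recursive_md(root: Path) -> Set[str]:
    """Every .md path at any depth, as a recursive `**/*.md` would match."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*.md")}


def missed_by_shell_glob(root: Path) -> Set[str]:
    """Markdown files a recursive lint would see but the one-level glob skips."""
    return recursive_md(root) - one_level_md(root)
```

A non-empty result from `missed_by_shell_glob(Path("."))` would indicate files that the local run never lints, which is consistent with the false pass described in this section.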
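### 3) Key how-to guides still reference old paths
- `assets/config/.codex/README.md` still instructs copying from `config/.codex/...` (the actual path has moved to `assets/config/.codex/...`).
- Examples in `assets/skills/skills-skills/references/*.md` still use `./skills/...` (should be `./assets/skills/...`).
### 4) Stale ignore rule pollutes the working tree
- `.gitignore` still ignores `backups/gz/` (the old location), but backups now land in `assets/repo/backups/gz/`.
- Observed signal: `git status` shows `?? assets/repo/backups/gz/`.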
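## Constraint Matrix (extracted from the repository's AGENTS.md and asset conventions)
| Constraint | Source | Implication |
|---|---|---|
| Do not automatically modify `.github/workflows/*.yml` | Root `AGENTS.md` | Prefer adding config or changing commands over changing the CI workflow |
| Do not delete or overwrite archives in `assets/repo/backups/gz/` | Root `AGENTS.md` | Do not clean up existing `.tar.gz` files; avoid pollution only through ignore rules or process |
| Keep changes to third-party mirrors under `assets/repo/` minimal | `assets/AGENTS.md` | Make minimal edits, and only where entry points or guides are affected |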
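## Risk Assessment Table
| Risk | Severity | Trigger signal | Mitigation |
| :--- | :--- | :--- | :--- |
| Relaxing the lint config masks real problems | Medium | CI is green but documentation quality declines and later cleanup is hard to converge | Relax the config as little as possible, and record in PLAN which rules are disabled and why |
| Large-scale document reshuffling for lint breaks links or references | High | Broken links reported by lychee/link-checker or found when opening pages manually | Change config and key entry-point docs first; if documents must change, limit the scope and run link/rg checks at each step |
| Backup artifacts keep polluting the working tree | Medium | `git status` keeps showing `assets/repo/backups/gz/` | Add `assets/repo/backups/gz/` to `.gitignore`, and state the output location in the script documentation |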
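## Hypotheses and Falsification (the executing agent must run these)
| Hypothesis | Default assumption | Falsification command |
|---|---|---|
| The main cause of the CI failure is the missing `.github/lint_config.json` | Yes | `ls -la .github/lint_config.json` |
| Lint violations will remain after `.github/lint_config.json` is restored | Yes (several already observed) | `markdownlint --config .github/lint_config.json '**/*.md'` |
| `make lint` does not cover `assets/**` | Yes | `make -n lint`, then compare against the files covered by `markdownlint '**/*.md'` |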

Some files were not shown because too many files have changed in this diff.