Initial POC: Developer Intelligence knowledge graph
- SQLite backend with file/repo/relationship entities
- tree-sitter Go AST parser for deterministic import detection
- Ollama doc generation with concurrent batch processing
- MCP server (FastMCP) for Claude Code integration
- Merge simulation with staleness cascade
- Lazy refresh for stale relationship and repo docs
- CLAUDE.md for agent context
9
.mcp.json
Normal file
@@ -0,0 +1,9 @@
{
  "mcpServers": {
    "dev-intel": {
      "command": "python",
      "args": ["mcp_server.py"],
      "cwd": "."
    }
  }
}
38
CLAUDE.md
Normal file
@@ -0,0 +1,38 @@
# Developer Intelligence Knowledge Graph

This project contains a knowledge graph of LLM-generated documentation for a Go codebase. Instead of reading raw source files, query the knowledge graph via MCP tools.

## How to use

**Always prefer the knowledge graph over reading raw files.** The graph contains pre-generated English documentation for every file and relationship in the codebase.

### Query patterns

- "What does X do?" → `get_file_doc("path/to/file.go")`
- "How do X and Y interact?" → `get_relationship("file_a.go", "file_b.go")`
- "What's this project about?" → `get_repo_overview()`
- "What depends on X?" → `get_dependents("path/to/file.go")`
- "What does X depend on?" → `get_dependencies("path/to/file.go")`
- "Find anything about routing" → `search_docs("routing")`
- "What's outdated?" → `get_stale_docs()`
- "How big is the graph?" → `get_graph_stats()`

### Schema

The knowledge graph has three entity types:

- **File** — one per source file. Has `documentation` (English description of what the file does), `staleness` (fresh or stale), `prev_documentation` (previous version after a merge).
- **Repo** — one per repository. Has `documentation` (project-level summary composed from file docs).
- **Relationship** — import edges between files. Has `documentation` (how the two files interact), `staleness`.

### Staleness

When a file changes, its documentation is regenerated immediately. All downstream relationships and the repo summary are marked **stale** — meaning the code has changed but the doc hasn't been regenerated yet. Stale docs are still returned but flagged with `[STALE]`.
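The cascade can be sketched as a self-contained toy model (illustrative only; the real project implements this over SQLite in `db.py` and `simulate_merge.py`):

```python
# Toy in-memory model of the staleness cascade.
files = {"echo.go": "fresh", "router.go": "fresh"}
relationships = {("router.go", "echo.go"): "fresh"}  # router.go imports echo.go
repo_staleness = ["fresh"]

def on_merge(changed: str) -> None:
    # The changed file's doc is regenerated immediately, so it stays fresh...
    files[changed] = "fresh"
    # ...but every edge touching it, plus the repo summary, goes stale.
    for edge in relationships:
        if changed in edge:
            relationships[edge] = "stale"
    repo_staleness[0] = "stale"

on_merge("echo.go")
print(files["echo.go"], relationships[("router.go", "echo.go")], repo_staleness[0])
# fresh stale stale
```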

### Tips

- Start broad (`get_repo_overview`) then drill into specifics (`get_file_doc`, `get_dependents`).
- Use `search_docs` for concept-level queries ("middleware", "authentication", "error handling").
- If a doc says `[STALE]`, the underlying code changed since it was generated. Mention this to the user.
- `get_dependents` is the impact analysis tool — "what breaks if I change this file?"
- File paths are relative to the repo root (e.g., `echo.go`, `middleware/compress.go`).
132
README.md
Normal file
@@ -0,0 +1,132 @@
# Developer Intelligence POC

A local proof-of-concept that builds a living knowledge graph from a Go codebase. Every file gets LLM-generated documentation. Every relationship gets documented. Everything stays current on every merge. Query it from Claude Code via MCP.

## Quick Start

```bash
cd dev-intel-poc
pip install -r requirements.txt
python ingest.py              # clone echo, parse, generate docs (~15-20 min)
claude --mcp-config .mcp.json # start Claude Code with the knowledge graph
```

Then ask Claude Code:
- "How does routing work in echo?"
- "What files depend on context.go?"
- "Give me an overview of this project"

## Prerequisites

- Python 3.11+
- Ollama running at `192.168.86.172:11434` with `qwen2.5:7b`
- Claude Code CLI (`claude`)
- git

## Demo Walkthrough

### Act 1: "Here's what your codebase knows about itself"

After `python ingest.py` completes, start Claude Code:

```bash
claude --mcp-config .mcp.json
```

Ask it:
```
> What does echo.go do?
> How does echo.go interact with router.go?
> Give me an overview of the whole project
> What files depend on context.go?
> Search for anything related to "middleware"
```

Every answer comes from LLM-generated documentation stored in the knowledge graph — not from reading raw source code.

### Act 2: "Now someone pushes a change"

In another terminal:
```bash
python simulate_merge.py echo.go
```

This:
1. Regenerates echo.go's documentation (reflects the new code)
2. Marks all relationships involving echo.go as STALE
3. Marks the repo summary as STALE

Back in Claude Code:
```
> What does echo.go do?      # fresh doc — mentions the new tracing feature
> What's the repo overview?  # shows [STALE] — knows it's outdated
> Show me all stale docs     # lists everything that needs refresh
```

### Act 3: "The system heals itself"

```bash
python refresh_stale.py
```

Back in Claude Code:
```
> What's the repo overview?  # fresh again — rewritten to include new capabilities
> Show me all stale docs     # "Everything is fresh!"
```

## Architecture

```
ingest.py ──→ repos/target/ (git clone)
    │              │
    │         parser.py (tree-sitter AST)
    │              │
    │         docgen.py (Ollama qwen2.5:7b)
    │              │
    └──────→ devintel.db (SQLite)
                   │
            mcp_server.py ──→ Claude Code
```

No Docker. No external databases. One SQLite file. One MCP server.

## MCP Tools

| Tool | What it does |
|---|---|
| `get_file_doc(path)` | Read a file's generated documentation |
| `get_relationship(a, b)` | How two files interact |
| `get_repo_overview()` | Project-level summary |
| `get_dependents(path)` | What breaks if you change this file |
| `get_dependencies(path)` | What this file depends on |
| `search_docs(query)` | Keyword search across all docs |
| `get_stale_docs()` | List outdated documentation |
| `get_graph_stats()` | File count, relationship count, staleness |

## Project Structure

```
dev-intel-poc/
├── requirements.txt      # Python deps
├── .mcp.json             # Claude Code MCP config
├── ingest.py             # Main ingestion pipeline
├── simulate_merge.py     # Simulate a code change
├── refresh_stale.py      # Refresh stale docs
├── db.py                 # SQLite backend
├── parser.py             # tree-sitter Go AST parser
├── docgen.py             # Ollama doc generation
├── mcp_server.py         # MCP server for Claude Code
└── devintel.db           # Generated — the knowledge graph
```

## Configuration

| Env Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://192.168.86.172:11434` | Ollama endpoint |
| `OLLAMA_MODEL` | `qwen2.5:7b` | Model for doc generation |
| `TARGET_REPO` | `https://github.com/labstack/echo.git` | Repo to ingest |
| `MAX_CONCURRENT` | `4` | Parallel Ollama requests |
| `DEVINTEL_DB` | `./devintel.db` | SQLite database path |
| `REPO_DIR` | `./repos/target` | Cloned repo location |
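The scripts resolve these settings via `os.environ.get` with the defaults shown above, so overrides can be supplied per-invocation or set programmatically. A sketch of the lookup pattern (values illustrative):

```python
import os

# Each setting falls back to the POC default when the variable is unset —
# the same pattern used in db.py, ingest.py, and refresh_stale.py.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://192.168.86.172:11434")
MAX_CONCURRENT = int(os.environ.get("MAX_CONCURRENT", "4"))

print(OLLAMA_URL, MAX_CONCURRENT)
```

Because the values are read at import time, exported environment variables must be set before the scripts start.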
229
db.py
Normal file
@@ -0,0 +1,229 @@
"""SQLite backend for Developer Intelligence POC."""

import json
import os
import sqlite3
from datetime import datetime

DB_PATH = os.environ.get("DEVINTEL_DB", os.path.join(os.path.dirname(__file__), "devintel.db"))


def get_db() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


def init_db():
    """Create tables if they don't exist."""
    conn = get_db()
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS repos (
            name TEXT PRIMARY KEY,
            url TEXT,
            language TEXT,
            documentation TEXT,
            staleness TEXT DEFAULT 'fresh',
            updated_at TEXT
        );

        CREATE TABLE IF NOT EXISTS files (
            path TEXT PRIMARY KEY,
            repo TEXT REFERENCES repos(name),
            language TEXT,
            documentation TEXT,
            prev_documentation TEXT,
            functions TEXT,  -- JSON array
            last_commit TEXT,
            staleness TEXT DEFAULT 'fresh',
            updated_at TEXT
        );

        CREATE TABLE IF NOT EXISTS relationships (
            from_file TEXT REFERENCES files(path),
            to_file TEXT REFERENCES files(path),
            rel_type TEXT DEFAULT 'IMPORTS',
            documentation TEXT,
            staleness TEXT DEFAULT 'fresh',
            updated_at TEXT,
            PRIMARY KEY (from_file, to_file, rel_type)
        );

        CREATE INDEX IF NOT EXISTS idx_files_repo ON files(repo);
        CREATE INDEX IF NOT EXISTS idx_files_staleness ON files(staleness);
        CREATE INDEX IF NOT EXISTS idx_rels_staleness ON relationships(staleness);
        CREATE INDEX IF NOT EXISTS idx_rels_to ON relationships(to_file);
    """)
    conn.commit()
    conn.close()


class GraphDB:
    def __init__(self):
        init_db()
        self.conn = get_db()

    def create_repo(self, name: str, url: str, language: str, documentation: str):
        self.conn.execute(
            """INSERT INTO repos (name, url, language, documentation, staleness, updated_at)
               VALUES (?, ?, ?, ?, 'fresh', ?)
               ON CONFLICT(name) DO UPDATE SET
                   url=excluded.url, language=excluded.language,
                   documentation=excluded.documentation, staleness='fresh',
                   updated_at=excluded.updated_at""",
            (name, url, language, documentation, datetime.utcnow().isoformat()),
        )
        self.conn.commit()

    def create_file(self, path: str, repo: str, language: str, documentation: str,
                    functions: list[str], commit: str = "initial"):
        self.conn.execute(
            """INSERT INTO files (path, repo, language, documentation, functions, last_commit, staleness, updated_at)
               VALUES (?, ?, ?, ?, ?, ?, 'fresh', ?)
               ON CONFLICT(path) DO UPDATE SET
                   repo=excluded.repo, language=excluded.language,
                   documentation=excluded.documentation, functions=excluded.functions,
                   last_commit=excluded.last_commit, staleness='fresh',
                   updated_at=excluded.updated_at""",
            (path, repo, language, documentation, json.dumps(functions), commit,
             datetime.utcnow().isoformat()),
        )
        self.conn.commit()

    def create_relationship(self, from_file: str, to_file: str, rel_type: str = "IMPORTS",
                            documentation: str = ""):
        self.conn.execute(
            """INSERT INTO relationships (from_file, to_file, rel_type, documentation, staleness, updated_at)
               VALUES (?, ?, ?, ?, 'fresh', ?)
               ON CONFLICT(from_file, to_file, rel_type) DO UPDATE SET
                   documentation=excluded.documentation, staleness=excluded.staleness,
                   updated_at=excluded.updated_at""",
            (from_file, to_file, rel_type, documentation, datetime.utcnow().isoformat()),
        )
        self.conn.commit()

    def mark_relationships_stale(self, file_path: str):
        now = datetime.utcnow().isoformat()
        self.conn.execute(
            "UPDATE relationships SET staleness='stale', updated_at=? WHERE from_file=? OR to_file=?",
            (now, file_path, file_path),
        )
        self.conn.commit()

    def mark_repo_stale(self, repo: str):
        self.conn.execute(
            "UPDATE repos SET staleness='stale', updated_at=? WHERE name=?",
            (datetime.utcnow().isoformat(), repo),
        )
        self.conn.commit()

    def update_file_doc(self, path: str, documentation: str, commit: str):
        self.conn.execute(
            """UPDATE files SET prev_documentation=documentation,
                   documentation=?, staleness='fresh', last_commit=?, updated_at=?
               WHERE path=?""",
            (documentation, commit, datetime.utcnow().isoformat(), path),
        )
        self.conn.commit()

    def get_file(self, path: str) -> dict | None:
        row = self.conn.execute("SELECT * FROM files WHERE path=?", (path,)).fetchone()
        return dict(row) if row else None

    def get_repo(self, name: str | None = None) -> dict | None:
        if name:
            row = self.conn.execute("SELECT * FROM repos WHERE name=?", (name,)).fetchone()
        else:
            row = self.conn.execute("SELECT * FROM repos LIMIT 1").fetchone()
        return dict(row) if row else None

    def get_dependents(self, path: str) -> list[dict]:
        rows = self.conn.execute(
            """SELECT r.from_file, r.documentation AS rel_doc, r.staleness AS rel_staleness,
                      f.documentation AS file_doc
               FROM relationships r
               JOIN files f ON f.path = r.from_file
               WHERE r.to_file = ?
               ORDER BY r.from_file""",
            (path,),
        ).fetchall()
        return [dict(r) for r in rows]

    def get_dependencies(self, path: str) -> list[dict]:
        rows = self.conn.execute(
            """SELECT r.to_file, r.documentation AS rel_doc, r.staleness AS rel_staleness,
                      f.documentation AS file_doc
               FROM relationships r
               JOIN files f ON f.path = r.to_file
               WHERE r.from_file = ?
               ORDER BY r.to_file""",
            (path,),
        ).fetchall()
        return [dict(r) for r in rows]

    def get_relationship(self, from_file: str, to_file: str) -> dict | None:
        row = self.conn.execute(
            "SELECT * FROM relationships WHERE from_file=? AND to_file=?",
            (from_file, to_file),
        ).fetchone()
        return dict(row) if row else None

    def search_docs(self, query: str, limit: int = 10) -> list[dict]:
        rows = self.conn.execute(
            """SELECT path, documentation, staleness FROM files
               WHERE LOWER(documentation) LIKE LOWER(?)
               ORDER BY path LIMIT ?""",
            (f"%{query}%", limit),
        ).fetchall()
        return [dict(r) for r in rows]

    def get_stale_relationships(self) -> list[dict]:
        rows = self.conn.execute(
            """SELECT r.from_file, r.to_file,
                      f1.documentation AS from_doc, f2.documentation AS to_doc
               FROM relationships r
               JOIN files f1 ON f1.path = r.from_file
               JOIN files f2 ON f2.path = r.to_file
               WHERE r.staleness = 'stale'"""
        ).fetchall()
        return [dict(r) for r in rows]

    def get_stale_repos(self) -> list[dict]:
        rows = self.conn.execute(
            "SELECT name, url FROM repos WHERE staleness='stale'"
        ).fetchall()
        return [dict(r) for r in rows]

    def get_repo_files_docs(self, repo: str) -> list[tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT path, documentation FROM files WHERE repo=? ORDER BY path",
            (repo,),
        ).fetchall()
        return [(r["path"], r["documentation"]) for r in rows]

    def update_relationship_doc(self, from_file: str, to_file: str, documentation: str):
        self.conn.execute(
            """UPDATE relationships SET documentation=?, staleness='fresh', updated_at=?
               WHERE from_file=? AND to_file=?""",
            (documentation, datetime.utcnow().isoformat(), from_file, to_file),
        )
        self.conn.commit()

    def update_repo_doc(self, name: str, documentation: str):
        self.conn.execute(
            "UPDATE repos SET documentation=?, staleness='fresh', updated_at=? WHERE name=?",
            (documentation, datetime.utcnow().isoformat(), name),
        )
        self.conn.commit()

    def get_stats(self) -> dict:
        files = self.conn.execute("SELECT count(*) AS c FROM files").fetchone()["c"]
        rels = self.conn.execute("SELECT count(*) AS c FROM relationships").fetchone()["c"]
        stale_f = self.conn.execute("SELECT count(*) AS c FROM files WHERE staleness='stale'").fetchone()["c"]
        stale_r = self.conn.execute("SELECT count(*) AS c FROM relationships WHERE staleness='stale'").fetchone()["c"]
        return {"files": files, "relationships": rels, "stale_files": stale_f, "stale_relationships": stale_r}

    def close(self):
        self.conn.close()
149
ingest.py
Normal file
@@ -0,0 +1,149 @@
"""Main ingestion script. Clone repo, parse, generate docs, load SQLite."""

import os
import subprocess
import time
from pathlib import Path

from parser import parse_go_file, filter_imports, get_repo_module, resolve_import_to_file
from docgen import generate_file_doc, generate_repo_doc, generate_docs_batch
from db import GraphDB

TARGET_REPO = os.environ.get("TARGET_REPO", "https://github.com/labstack/echo.git")
REPO_DIR = os.environ.get("REPO_DIR", os.path.join(os.path.dirname(__file__), "repos", "target"))


def clone_repo():
    if Path(REPO_DIR).exists() and (Path(REPO_DIR) / ".git").exists():
        print(f"Repo already cloned at {REPO_DIR}, pulling latest...")
        subprocess.run(["git", "-C", REPO_DIR, "pull"], check=True)
    else:
        print(f"Cloning {TARGET_REPO}...")
        Path(REPO_DIR).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", "--depth=1", TARGET_REPO, REPO_DIR], check=True)

    result = subprocess.run(
        ["git", "-C", REPO_DIR, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def discover_go_files() -> dict[str, str]:
    files = {}
    repo = Path(REPO_DIR)
    for gofile in repo.rglob("*.go"):
        rel = str(gofile.relative_to(repo))
        if "vendor/" in rel or "_test.go" in rel:
            continue
        try:
            files[rel] = gofile.read_text(errors="replace")
        except Exception as e:
            print(f"  Skipping {rel}: {e}")
    return files


def run():
    start = time.time()
    print("=" * 60)
    print("Developer Intelligence POC — Ingestion")
    print("=" * 60)

    # Step 1: Clone
    print("\n[1/6] Cloning repository...")
    commit = clone_repo()
    repo_name = TARGET_REPO.rstrip("/").split("/")[-1].replace(".git", "")
    print(f"  Repo: {repo_name}, commit: {commit[:8]}")

    # Step 2: Discover files
    print("\n[2/6] Discovering Go files...")
    go_files = discover_go_files()
    print(f"  Found {len(go_files)} Go files (excluding tests and vendor)")

    # Step 3: Parse AST
    print("\n[3/6] Parsing AST (tree-sitter)...")
    repo_module = get_repo_module(REPO_DIR)
    print(f"  Module: {repo_module}")

    parsed = {}
    all_imports = {}
    for rel_path, content in go_files.items():
        info = parse_go_file(rel_path, content, repo_module)
        parsed[rel_path] = info
        filtered = filter_imports(info.imports, repo_module)
        all_imports[rel_path] = filtered

    total_imports = sum(len(v) for v in all_imports.values())
    first_party = sum(1 for v in all_imports.values() for i in v if i.startswith(repo_module))
    print(f"  Parsed {len(parsed)} files, {total_imports} filtered imports ({first_party} first-party)")

    # Step 4: Generate file docs
    print("\n[4/6] Generating file documentation (Ollama)...")
    file_items = [(path, content) for path, content in go_files.items()]
    file_docs = generate_docs_batch(file_items, generate_file_doc)
    file_doc_map = {file_items[i][0]: file_docs[i] for i in range(len(file_items))}

    # Step 5: Load into SQLite
    print("\n[5/6] Loading into database...")
    db = GraphDB()

    for rel_path, content in go_files.items():
        info = parsed[rel_path]
        doc = file_doc_map.get(rel_path, "")
        db.create_file(
            path=rel_path,
            repo=repo_name,
            language="go",
            documentation=doc,
            functions=info.functions,
            commit=commit[:8],
        )

    edges_created = 0
    for from_file, imports in all_imports.items():
        for imp in imports:
            if not imp.startswith(repo_module):
                continue
            target_dir = resolve_import_to_file(imp, repo_module, go_files)
            if target_dir:
                # Find actual files in that directory
                for fpath in go_files:
                    fdir = str(Path(fpath).parent)
                    if fdir == target_dir or fdir.endswith("/" + target_dir):
                        db.create_relationship(from_file, fpath)
                        edges_created += 1

    print(f"  Created {len(go_files)} file nodes, {edges_created} import edges")

    # Step 6: Generate repo-level doc
    print("\n[6/6] Generating repo-level documentation...")
    readme_path = Path(REPO_DIR) / "README.md"
    readme = readme_path.read_text(errors="replace") if readme_path.exists() else ""

    entry_candidates = ["echo.go", "router.go", "context.go", "group.go", "middleware.go"]
    entry_files = []
    for candidate in entry_candidates:
        for path, content in go_files.items():
            if path.endswith(candidate):
                entry_files.append((path, content))
                break

    repo_doc = generate_repo_doc(readme, entry_files)
    db.create_repo(name=repo_name, url=TARGET_REPO, language="go", documentation=repo_doc)

    stats = db.get_stats()
    elapsed = time.time() - start
    print("\n" + "=" * 60)
    print("Ingestion complete!")
    print(f"  Files: {stats['files']}")
    print(f"  Relationships: {stats['relationships']}")
    print(f"  Time: {elapsed:.1f}s ({elapsed/60:.1f}m)")
    print(f"  Database: {os.path.abspath(db.conn.execute('PRAGMA database_list').fetchone()[2])}")
    print("=" * 60)

    db.close()


if __name__ == "__main__":
    run()
154
mcp_server.py
Normal file
@@ -0,0 +1,154 @@
"""MCP server for Developer Intelligence POC. Queries SQLite, serves to Claude Code."""

import os
import sys
import json

sys.path.insert(0, os.path.dirname(__file__))

from db import GraphDB
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Developer Intelligence")


def _get_db():
    return GraphDB()


@mcp.tool()
def get_file_doc(path: str) -> str:
    """Get the generated documentation for a source file. Pass the relative file path (e.g. 'echo.go' or 'middleware/compress.go')."""
    db = _get_db()
    f = db.get_file(path)
    db.close()
    if not f:
        return f"File not found: {path}"
    staleness = " [STALE]" if f["staleness"] == "stale" else ""
    prev = ""
    if f.get("prev_documentation"):
        prev = f"\n\n--- Previous version ---\n{f['prev_documentation']}"
    return f"{f['documentation']}{staleness}\n\n(commit: {f['last_commit']}, updated: {f['updated_at']}){prev}"


@mcp.tool()
def get_relationship(file_a: str, file_b: str) -> str:
    """Get the documentation for the import relationship between two files."""
    db = _get_db()
    rel = db.get_relationship(file_a, file_b)
    db.close()
    if not rel:
        return f"No relationship found between {file_a} and {file_b}"
    doc = rel["documentation"] or "(no relationship documentation generated yet)"
    staleness = " [STALE]" if rel["staleness"] == "stale" else ""
    return f"{doc}{staleness}"


@mcp.tool()
def get_repo_overview() -> str:
    """Get the repo-level documentation summary — a high-level overview of the entire project."""
    db = _get_db()
    repo = db.get_repo()
    db.close()
    if not repo:
        return "No repo found"
    staleness = " [STALE]" if repo["staleness"] == "stale" else ""
    return f"# {repo['name']}{staleness}\n\n{repo['documentation']}"


@mcp.tool()
def get_dependents(path: str) -> str:
    """Get all files that import/depend on the given file. Shows what breaks if you change this file."""
    db = _get_db()
    deps = db.get_dependents(path)
    db.close()
    if not deps:
        return f"No files depend on {path}"
    lines = [f"Files that depend on {path} ({len(deps)} total):\n"]
    for d in deps:
        staleness = " [STALE]" if d["rel_staleness"] == "stale" else ""
        doc = d["rel_doc"] or "(no relationship doc)"
        lines.append(f"  {d['from_file']}{staleness}")
        lines.append(f"    Relationship: {doc}")
        lines.append(f"    File: {d['file_doc'][:150]}...")
    return "\n".join(lines)


@mcp.tool()
def get_dependencies(path: str) -> str:
    """Get all files that the given file imports/depends on."""
    db = _get_db()
    deps = db.get_dependencies(path)
    db.close()
    if not deps:
        return f"{path} has no tracked dependencies"
    lines = [f"Dependencies of {path} ({len(deps)} total):\n"]
    for d in deps:
        staleness = " [STALE]" if d["rel_staleness"] == "stale" else ""
        doc = d["rel_doc"] or "(no relationship doc)"
        lines.append(f"  {d['to_file']}{staleness}")
        lines.append(f"    Relationship: {doc}")
    return "\n".join(lines)


@mcp.tool()
def search_docs(query: str) -> str:
    """Search across all file documentation by keyword. Use to find files related to a concept (e.g. 'routing', 'middleware', 'authentication')."""
    db = _get_db()
    results = db.search_docs(query)
    db.close()
    if not results:
        return f"No files found matching '{query}'"
    lines = [f"Files matching '{query}' ({len(results)} results):\n"]
    for r in results:
        staleness = " [STALE]" if r["staleness"] == "stale" else ""
        doc = r["documentation"][:200] + "..." if len(r["documentation"]) > 200 else r["documentation"]
        lines.append(f"  {r['path']}{staleness}: {doc}")
    return "\n".join(lines)


@mcp.tool()
def get_stale_docs() -> str:
    """List all entities and relationships with stale (outdated) documentation."""
    db = _get_db()
    stale_rels = db.get_stale_relationships()
    stale_repos = db.get_stale_repos()
    stats = db.get_stats()
    db.close()

    lines = ["Stale documentation:\n"]
    if stale_repos:
        lines.append(f"  Repos ({len(stale_repos)}):")
        for r in stale_repos:
            lines.append(f"    {r['name']}")
    lines.append(f"  Files: {stats['stale_files']} stale")
    if stale_rels:
        lines.append(f"  Relationships ({len(stale_rels)}):")
        for r in stale_rels[:20]:  # Cap output
            lines.append(f"    {r['from_file']} -> {r['to_file']}")
        if len(stale_rels) > 20:
            lines.append(f"    ... and {len(stale_rels) - 20} more")
    if stats["stale_files"] == 0 and stats["stale_relationships"] == 0:
        lines.append("  Everything is fresh!")
    return "\n".join(lines)


@mcp.tool()
def get_graph_stats() -> str:
    """Get overall knowledge graph statistics — file count, relationship count, staleness."""
    db = _get_db()
    stats = db.get_stats()
    repo = db.get_repo()
    db.close()
    return json.dumps({
        "repo": repo["name"] if repo else None,
        "files": stats["files"],
        "relationships": stats["relationships"],
        "stale_files": stats["stale_files"],
        "stale_relationships": stats["stale_relationships"],
    }, indent=2)


if __name__ == "__main__":
    print("Starting Developer Intelligence MCP Server (stdio)...")
    mcp.run(transport="stdio")
75
refresh_stale.py
Normal file
@@ -0,0 +1,75 @@
"""Refresh all stale documentation — relationships and repo summaries."""

import sys
import os

sys.path.insert(0, os.path.dirname(__file__))

from db import GraphDB
from docgen import generate_relationship_doc, generate_repo_doc

REPO_DIR = os.environ.get("REPO_DIR", os.path.join(os.path.dirname(__file__), "repos", "target"))


def refresh_stale():
    db = GraphDB()

    print(f"\n{'='*60}")
    print("Refreshing stale documentation")
    print(f"{'='*60}")

    stats_before = db.get_stats()
    print(f"\nBefore: {stats_before['stale_relationships']} stale relationships")

    # Refresh stale relationship docs
    stale_rels = db.get_stale_relationships()
    if stale_rels:
        print(f"\n[1/2] Regenerating {len(stale_rels)} stale relationship docs...")
        for i, rel in enumerate(stale_rels):
            doc = generate_relationship_doc(
                rel["from_file"], rel["from_doc"] or "",
                rel["to_file"], rel["to_doc"] or "",
            )
            db.update_relationship_doc(rel["from_file"], rel["to_file"], doc)
            if (i + 1) % 5 == 0 or (i + 1) == len(stale_rels):
                print(f"  Refreshed {i+1}/{len(stale_rels)}")
    else:
        print("\n[1/2] No stale relationships.")

    # Refresh stale repo docs
    stale_repos = db.get_stale_repos()
    if stale_repos:
        print(f"\n[2/2] Regenerating {len(stale_repos)} stale repo docs...")
        for repo in stale_repos:
            file_docs = db.get_repo_files_docs(repo["name"])
            priority_names = ["echo.go", "router.go", "context.go", "group.go", "main.go", "server.go"]
            entry_files = []
            for name in priority_names:
                for path, doc in file_docs:
                    if path.endswith(name) and doc:
                        entry_files.append((path, doc))
                        break

            readme_path = os.path.join(REPO_DIR, "README.md")
            readme = ""
            if os.path.exists(readme_path):
                with open(readme_path) as f:
                    readme = f.read()

            repo_doc = generate_repo_doc(readme, entry_files)
            db.update_repo_doc(repo["name"], repo_doc)
            print(f"  Refreshed repo: {repo['name']}")
    else:
        print("\n[2/2] No stale repos.")

    stats_after = db.get_stats()
    print(f"\n{'='*60}")
    print("Refresh complete!")
    print(f"  Stale rels: {stats_before['stale_relationships']} → {stats_after['stale_relationships']}")
    print(f"{'='*60}")

    db.close()


if __name__ == "__main__":
    refresh_stale()
4
requirements.txt
Normal file
@@ -0,0 +1,4 @@
requests==2.32.3
tree-sitter==0.24.0
tree-sitter-go==0.23.4
mcp==1.9.0
99
simulate_merge.py
Normal file
@@ -0,0 +1,99 @@
"""Simulate a merge: modify a file, regenerate its doc, mark downstream stale."""

import sys
import os
import time

sys.path.insert(0, os.path.dirname(__file__))

from db import GraphDB
from docgen import generate_file_doc

REPO_DIR = os.environ.get("REPO_DIR", os.path.join(os.path.dirname(__file__), "repos", "target"))


def simulate_merge(file_path: str, new_content: str | None = None):
    db = GraphDB()

    print(f"\n{'='*60}")
    print(f"Simulating merge: {file_path}")
    print(f"{'='*60}")

    # Get current doc (before)
    f = db.get_file(file_path)
    if not f:
        print(f"ERROR: File {file_path} not found in graph")
        db.close()
        return

    print(f"\n[BEFORE] Documentation for {file_path}:")
    print(f"  {f['documentation'][:300]}")

    # Read file content (or use provided)
    if new_content is None:
        full_path = os.path.join(REPO_DIR, file_path)
        if os.path.exists(full_path):
            with open(full_path) as fh:
                original = fh.read()
            new_content = original + """

// EnableTracing adds distributed request tracing with correlation IDs
// across the middleware chain. Each request gets a unique trace ID propagated
// through all handlers for end-to-end debugging in microservice architectures.
func (e *Echo) EnableTracing(correlationHeader string) {
	e.tracingEnabled = true
	e.correlationHeader = correlationHeader
}
"""
        else:
            print(f"ERROR: File not found on disk: {full_path}")
            db.close()
            return

    # Regenerate doc for changed file
    print(f"\n[MERGE] Regenerating documentation for {file_path}...")
    new_doc = generate_file_doc(file_path, new_content)

    commit_hash = f"sim-{int(time.time())}"
    db.update_file_doc(file_path, new_doc, commit_hash)

    print(f"\n[AFTER] Documentation for {file_path}:")
    print(f"  {new_doc[:300]}")

    # Mark downstream stale
    print("\n[CASCADE] Marking downstream as stale...")
    db.mark_relationships_stale(file_path)

    repo = db.get_repo()
    if repo:
        db.mark_repo_stale(repo["name"])
        print(f"  Repo '{repo['name']}' marked stale")

    stats = db.get_stats()
    print(f"  Stale relationships: {stats['stale_relationships']}")

    print(f"\n{'='*60}")
    print("Merge simulation complete.")
    print("  File doc: REGENERATED (fresh)")
    print("  Relationships: STALE (awaiting refresh)")
    print("  Repo doc: STALE (awaiting refresh)")
    print("\nRun: python refresh_stale.py")
    print("Or ask Claude Code — stale docs show [STALE] indicator.")
    print(f"{'='*60}")

    db.close()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python simulate_merge.py <file_path>")
        print("Example: python simulate_merge.py echo.go")
        sys.exit(1)

    target = sys.argv[1]
    content = None
    if len(sys.argv) > 2:
        with open(sys.argv[2]) as f:
            content = f.read()

    simulate_merge(target, content)