Developer Intelligence Pipeline v2
Multi-language semantic graph extractor that builds a knowledge graph from source code. Produces function-level call graphs, cross-file dependency maps, and semantic diffs — all without LLM calls.
Quick Start
npm install
node pipeline.js batch /path/to/repo --output /tmp/output
What It Does
Parses source code into a directed graph of entities (modules, functions, classes, configs) and relationships (CALLS, IMPORTS, CONTAINS, IMPLEMENTS). Then diffs snapshots to detect breaking changes, compute impact scores, and identify affected callers.
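As a rough illustration of the graph described above, the sketch below models a tiny snapshot and looks up the callers of one function. The field names (`nodes`, `edges`, `from`, `to`, `type`) and the `cli/main.ts:run` entity are assumptions made for this example; the real snapshot format may differ.

```javascript
// Hypothetical snapshot shape: the actual fields in snapshot.json may differ.
const graph = {
  nodes: [
    { id: 'cli/route.ts', type: 'module' },
    { id: 'cli/route.ts:tryRouteCli', type: 'function' },
    { id: 'cli/main.ts', type: 'module' },
    { id: 'cli/main.ts:run', type: 'function' }, // invented for the example
  ],
  edges: [
    { from: 'cli/route.ts', to: 'cli/route.ts:tryRouteCli', type: 'CONTAINS' },
    { from: 'cli/main.ts', to: 'cli/main.ts:run', type: 'CONTAINS' },
    { from: 'cli/main.ts', to: 'cli/route.ts', type: 'IMPORTS' },
    { from: 'cli/main.ts:run', to: 'cli/route.ts:tryRouteCli', type: 'CALLS' },
  ],
};

// Return the IDs of every entity that has a CALLS edge into `targetId`.
function callersOf(g, targetId) {
  return g.edges
    .filter((e) => e.type === 'CALLS' && e.to === targetId)
    .map((e) => e.from);
}

console.log(callersOf(graph, 'cli/route.ts:tryRouteCli'));
// prints [ 'cli/main.ts:run' ]
```

If `tryRouteCli` changed its signature, the entities returned here are the ones a diff would flag as affected callers.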
Supported Languages
| Language | Parser | Entities |
|---|---|---|
| TypeScript/JavaScript | tree-sitter | Modules, Functions, Classes, Imports |
| Python | tree-sitter | Modules, Functions, Classes (with _/__ visibility) |
| Go | tree-sitter | Modules, Functions, Structs, Receiver Methods |
| Java | tree-sitter | Modules, Functions, Classes, Interfaces |
| Bash | tree-sitter | Modules, Functions, source imports, Commands |
| YAML | js-yaml | Config keys (K8s manifests, Helm, KCL) |
| Terraform/HCL | regex | Resources, Data, Modules, Providers |
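To give a feel for regex-based extraction as listed for Terraform/HCL, here is a minimal sketch that pulls block headers out of a `.tf` source string. The pattern, the `extractHclBlocks` name, and the output shape are assumptions for illustration, not the extractor's actual code, and the sketch assumes each block header sits on a single line.

```javascript
// Matches block headers such as: resource "aws_s3_bucket" "logs" {
// Assumption: one header per line; a regex cannot cover every HCL form.
const BLOCK_HEADER =
  /^[ \t]*(resource|data|module|provider)\s+"([^"]+)"(?:\s+"([^"]+)")?\s*\{/gm;

function extractHclBlocks(source) {
  const blocks = [];
  for (const m of source.matchAll(BLOCK_HEADER)) {
    const [, kind, first, second] = m;
    // resource/data carry a type and a name; module/provider carry only a name.
    blocks.push(
      second === undefined
        ? { kind, name: first }
        : { kind, type: first, name: second }
    );
  }
  return blocks;
}

const sample = `
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}

module "vpc" {
  source = "./modules/vpc"
}
`;

console.log(extractHclBlocks(sample));
```

A regex keeps this path free of a grammar dependency, at the cost of missing unusual layouts such as headers split across lines.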
Pipeline Phases
Phase 1: Entity Extraction (extract.js)
node extract.js /path/to/file.ts /repo/root
Outputs JSON with entities and relationships.
Phase 2: Graph Store (graph.js)
node graph.js build /dir/of/jsons snapshot.json
node graph.js query snapshot.json "cli/route.ts:tryRouteCli"
node graph.js diff old.json new.json
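Conceptually, diffing two snapshots comes down to comparing node sets by ID. The sketch below shows one way that could work; the `id` and `signature` fields and the `diffSnapshots` function are assumptions for illustration and are not taken from graph.js.

```javascript
// Compare two snapshots by node ID.
// Assumption: each node has a stable `id` and a comparable `signature` string.
function diffSnapshots(oldSnap, newSnap) {
  const oldById = new Map(oldSnap.nodes.map((n) => [n.id, n]));
  const newById = new Map(newSnap.nodes.map((n) => [n.id, n]));

  const added = [...newById.keys()].filter((id) => !oldById.has(id));
  const removed = [...oldById.keys()].filter((id) => !newById.has(id));
  const changed = [...newById.keys()].filter(
    (id) =>
      oldById.has(id) &&
      oldById.get(id).signature !== newById.get(id).signature
  );

  return { added, removed, changed };
}

// Example snapshots with invented entities.
const oldSnap = {
  nodes: [
    { id: 'a.ts:load', signature: 'load(path)' },
    { id: 'a.ts:save', signature: 'save(path, data)' },
  ],
};
const newSnap = {
  nodes: [
    { id: 'a.ts:load', signature: 'load(path, options)' },
    { id: 'a.ts:sync', signature: 'sync()' },
  ],
};

console.log(diffSnapshots(oldSnap, newSnap));
// prints { added: [ 'a.ts:sync' ], removed: [ 'a.ts:save' ], changed: [ 'a.ts:load' ] }
```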
Phase 3: Namespace Registry (namespace.js)
node namespace.js build snap-a.json snap-b.json --output registry.json
node namespace.js resolve graph.json registry.json
node namespace.js lookup registry.json functionName
Cross-repo references resolve in three tiers, tried in order: exact ID → normalized path → name-only.
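The tiered fallback can be sketched roughly as follows. The registry shape (`id`, `path`, `name`), the normalization rules, and the choice to accept a name-only match only when it is unambiguous are all assumptions for this example rather than documented behavior of namespace.js.

```javascript
// Assumption: normalization strips "./", unifies slashes, and drops JS/TS extensions.
function normalizePath(p) {
  return p
    .replace(/\\/g, '/')
    .replace(/^\.\//, '')
    .replace(/\.(ts|tsx|js|jsx)$/, '')
    .toLowerCase();
}

// Assumption: registry is an array of { id, path, name } entries.
function resolveRef(ref, registry) {
  // Tier 1: exact ID match.
  const exact = registry.find((e) => e.id === ref.id);
  if (exact) return { tier: 'exact', entry: exact };

  // Tier 2: same entity name under an equivalent normalized path.
  const byPath = registry.find(
    (e) =>
      e.name === ref.name &&
      normalizePath(e.path) === normalizePath(ref.path)
  );
  if (byPath) return { tier: 'path', entry: byPath };

  // Tier 3: name-only, accepted here only when exactly one candidate exists.
  const byName = registry.filter((e) => e.name === ref.name);
  if (byName.length === 1) return { tier: 'name', entry: byName[0] };

  return null;
}

const registry = [
  { id: 'cli/route.ts:tryRouteCli', path: 'cli/route.ts', name: 'tryRouteCli' },
  { id: 'lib/util.ts:parseArgs', path: 'lib/util.ts', name: 'parseArgs' },
];

const ref = { id: './cli/route.js:tryRouteCli', path: './cli/route.js', name: 'tryRouteCli' };
console.log(resolveRef(ref, registry).tier); // prints "path"
```

Each later tier is looser than the one before, so recording which tier matched lets a consumer decide how much to trust a link.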
Phase 4: Semantic Diff (semantic-diff.js)
node semantic-diff.js diff old.json new.json
node semantic-diff.js score old.json new.json
Categorizes each change as breaking, significant, internal, or cosmetic, and assigns an impact score from 0 to 100.
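One way to picture categorization and scoring is the sketch below. The category rules, the weights, and the per-caller bonus are invented for illustration; semantic-diff.js may classify and score changes quite differently.

```javascript
// Hypothetical weights per category; not taken from semantic-diff.js.
const WEIGHTS = { breaking: 40, significant: 15, internal: 5, cosmetic: 1 };

// Assumption: a change records its kind, whether the entity is exported,
// and how many callers reference it.
function categorize(change) {
  const structural =
    change.kind === 'removed' || change.kind === 'signature-changed';
  if (change.exported && structural) return 'breaking';
  if (change.exported) return 'significant';
  if (structural || change.kind === 'body-changed') return 'internal';
  return 'cosmetic';
}

// Sum the category weight plus 2 points per affected caller, capped at 100.
function impactScore(changes) {
  const raw = changes.reduce(
    (sum, c) => sum + WEIGHTS[categorize(c)] + 2 * c.callers,
    0
  );
  return Math.min(100, raw);
}

const changes = [
  { id: 'api.ts:getUser', kind: 'signature-changed', exported: true, callers: 3 },
  { id: 'api.ts:helper', kind: 'body-changed', exported: false, callers: 1 },
  { id: 'api.ts:fmt', kind: 'comment-changed', exported: false, callers: 0 },
];

console.log(changes.map(categorize)); // prints [ 'breaking', 'internal', 'cosmetic' ]
console.log(impactScore(changes)); // prints 54
```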
Phase 5: Pipeline (pipeline.js)
node pipeline.js batch /repo --output /tmp/out # Full extraction
node pipeline.js benchmark /repo --samples 20 # Performance test
node pipeline.js run /repo --snapshot prev.json # Incremental diff
Benchmark (OpenClaw repo)
| Metric | Value |
|---|---|
| Files | 4,325 |
| Extracted | 4,259 (98.5%) |
| Nodes | 21,646 |
| Edges | 133,979 |
| Time | 67 seconds |
| Avg/file | 15ms |
V1 vs V2
| Metric | V1 POC | V2 Pipeline |
|---|---|---|
| Parse time | ~2s | 552ms |
| Total time | 15-20 min (LLM) | 552ms |
| Entities | files + imports | 457 (4 types) |
| CALLS edges | 0 | 1,290 |
| Cross-file calls | No | 51 resolved |
| Languages | Go only | 8 |
| Semantic diff | No | Yes |
| Impact scoring | No | Yes |
| Cost | ~$0 (Ollama) | $0 |
The V1 vs V2 figures above were measured on labstack/echo (44 Go files).
Testing
bash test/run-all.sh # 9/9 ground truth benchmark
node test/test-graph.js # 25/25 graph store tests
Architecture
source files → extract.js → JSON → graph.js → snapshot.json
                                                  ↓
                                              semantic-diff.js → impact report
                                                  ↓
                                              namespace.js → cross-repo links
No external runtime dependencies beyond the parsers listed under Supported Languages (the tree-sitter grammars and js-yaml).
License
MIT