Jarvis Prime 15fb1a753b Add deep extractors, reference pages, keyword index; eval 53.3%
- extract-deep.js: mines addon versions, TF configs, script params, helm values, state services
- generate-reference-pages.js: creates operations.md, configuration.md, network-architecture.md
- reference/index.md: keyword-rich topic-to-file routing table
- Enriched CIDR extractor with inline comment capture
- Eval progression: 28.7% -> 33.4% -> 46.7% -> 52.5% -> 53.3%
- NOT_FOUND: 25 -> 20 -> 16 -> 10 -> 11
- Top scores: config-region-code 95%, argo-gen-params 95%, multiple 100%s
- Remaining gap: agent planner (haiku) doesn't consistently follow index routing
2026-03-10 19:01:21 +00:00

Developer Intelligence Pipeline v2

Multi-language semantic graph extractor that builds a knowledge graph from source code. Produces function-level call graphs, cross-file dependency maps, and semantic diffs — all without LLM calls.

Quick Start

npm install
node pipeline.js batch /path/to/repo --output /tmp/output

What It Does

Parses source code into a directed graph of entities (modules, functions, classes, configs) and relationships (CALLS, IMPORTS, CONTAINS, IMPLEMENTS). Then diffs snapshots to detect breaking changes, compute impact scores, and identify affected callers.
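To make "identify affected callers" concrete, here is a minimal sketch: starting from a changed function, walk CALLS edges in reverse, breadth-first. The edge record shape ({ from, to, type }) is an assumption, as are the path:name IDs (modeled on the cli/route.ts:tryRouteCli query example in Phase 2). The files main.ts and boot.ts and the function affectedCallers are hypothetical.

```javascript
// Sketch only: edge records shaped { from, to, type } with "path:name" IDs are
// an assumption modeled on the query example in Phase 2, not the real schema.
const edges = [
  { from: "cli/route.ts", to: "cli/route.ts:tryRouteCli", type: "CONTAINS" },
  { from: "cli/main.ts:main", to: "cli/route.ts:tryRouteCli", type: "CALLS" },
  { from: "server/boot.ts:start", to: "cli/main.ts:main", type: "CALLS" },
];

// Collect every direct or transitive caller of targetId by walking
// CALLS edges backwards, breadth-first.
function affectedCallers(edges, targetId) {
  // Reverse index: callee ID -> caller IDs. Non-CALLS edges are ignored.
  const callersOf = new Map();
  for (const e of edges) {
    if (e.type !== "CALLS") continue;
    if (!callersOf.has(e.to)) callersOf.set(e.to, []);
    callersOf.get(e.to).push(e.from);
  }
  const seen = new Set([targetId]);
  const queue = [targetId];
  const result = [];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const caller of callersOf.get(current) ?? []) {
      if (seen.has(caller)) continue; // skip revisits, e.g. recursive cycles
      seen.add(caller);
      result.push(caller);
      queue.push(caller);
    }
  }
  return result;
}

console.log(affectedCallers(edges, "cli/route.ts:tryRouteCli"));
// → [ 'cli/main.ts:main', 'server/boot.ts:start' ]
```

The seen set makes the walk terminate even when functions call each other recursively.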

Supported Languages

Language               Parser       Entities
TypeScript/JavaScript  tree-sitter  Modules, Functions, Classes, Imports
Python                 tree-sitter  Modules, Functions, Classes (with _/__ visibility)
Go                     tree-sitter  Modules, Functions, Structs, Receiver Methods
Java                   tree-sitter  Modules, Functions, Classes, Interfaces
Bash                   tree-sitter  Modules, Functions, source imports, Commands
YAML                   js-yaml      Config keys (K8s manifests, Helm, KCL)
Terraform/HCL          regex        Resources, Data, Modules, Providers
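Unlike the other languages, Terraform/HCL is handled with regular expressions rather than tree-sitter. As a rough illustration of that approach, the sketch below pulls top-level resource, data, module, and provider block headers out of HCL text. The regex, the extractTerraformBlocks name, and the output shape are assumptions, not the pipeline's actual patterns.

```javascript
// Sketch only: illustrative patterns, not the pipeline's actual rules.
// Branch 1: resource|data "<type>" "<name>" {
// Branch 2: module|provider "<name>" {
const BLOCK_RE =
  /^\s*(resource|data)\s+"([^"]+)"\s+"([^"]+)"\s*\{|^\s*(module|provider)\s+"([^"]+)"\s*\{/gm;

function extractTerraformBlocks(source) {
  const blocks = [];
  for (const m of source.matchAll(BLOCK_RE)) {
    if (m[1]) {
      // Two labels: block kind, resource/data type, local name.
      blocks.push({ kind: m[1], type: m[2], name: m[3] });
    } else {
      // One label: module or provider name.
      blocks.push({ kind: m[4], name: m[5] });
    }
  }
  return blocks;
}

const tf = `
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}

data "aws_ami" "ubuntu" {
  most_recent = true
}

module "vpc" {
  source = "./modules/vpc"
}
`;

console.log(extractTerraformBlocks(tf));
```

Because each pattern is anchored at the start of a line, commented-out headers such as # resource "a" "b" { are skipped, but block-like text inside a heredoc would still match. That is the usual trade-off for avoiding a full HCL parser.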

Pipeline Phases

Phase 1: Entity Extraction (extract.js)

node extract.js /path/to/file.ts /repo/root

Outputs JSON with entities and relationships.

Phase 2: Graph Store (graph.js)

node graph.js build /dir/of/jsons snapshot.json
node graph.js query snapshot.json "cli/route.ts:tryRouteCli"
node graph.js diff old.json new.json
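At the structural level, comparing two snapshots reduces to set differences over node IDs and edge keys. The sketch below assumes snapshots hold nodes with an id field and edges with from, to, and type fields. The real snapshot.json layout and the output of graph.js diff may differ, and diffSnapshots and edgeKey are hypothetical names.

```javascript
// Sketch only: assumes snapshots shaped like { nodes: [{ id }], edges: [{ from, to, type }] }.
// The real snapshot.json format and `graph.js diff` output may differ.

// Identify an edge by its endpoints and relationship type.
function edgeKey(e) {
  return `${e.from}|${e.type}|${e.to}`;
}

// Return node IDs and edge keys present in only one of the two snapshots.
function diffSnapshots(oldSnap, newSnap) {
  const oldNodes = new Set(oldSnap.nodes.map((n) => n.id));
  const newNodes = new Set(newSnap.nodes.map((n) => n.id));
  const oldEdges = new Set(oldSnap.edges.map(edgeKey));
  const newEdges = new Set(newSnap.edges.map(edgeKey));
  return {
    addedNodes: [...newNodes].filter((id) => !oldNodes.has(id)),
    removedNodes: [...oldNodes].filter((id) => !newNodes.has(id)),
    addedEdges: [...newEdges].filter((k) => !oldEdges.has(k)),
    removedEdges: [...oldEdges].filter((k) => !newEdges.has(k)),
  };
}

// Hypothetical example: legacyRoute was replaced by routeV2.
const oldSnap = {
  nodes: [{ id: "cli/route.ts:tryRouteCli" }, { id: "cli/route.ts:legacyRoute" }],
  edges: [{ from: "cli/route.ts:tryRouteCli", to: "cli/route.ts:legacyRoute", type: "CALLS" }],
};
const newSnap = {
  nodes: [{ id: "cli/route.ts:tryRouteCli" }, { id: "cli/route.ts:routeV2" }],
  edges: [{ from: "cli/route.ts:tryRouteCli", to: "cli/route.ts:routeV2", type: "CALLS" }],
};

console.log(diffSnapshots(oldSnap, newSnap));
```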

Phase 3: Namespace Registry (namespace.js)

node namespace.js build snap-a.json snap-b.json --output registry.json
node namespace.js resolve graph.json registry.json
node namespace.js lookup registry.json functionName

3-tier cross-repo resolution: exact ID → normalized path → name-only.
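The fallback order can be illustrated with a small resolver. This sketch rests on several assumptions. IDs are taken to look like path:name, as in the cli/route.ts:tryRouteCli query in Phase 2. "Normalized path" is assumed to mean "/" separators, no leading "./", no file extension, and a lower-cased path. The name-only tier is assumed to reject ambiguous matches. The functions buildRegistry and resolve are hypothetical names, not the namespace.js API.

```javascript
// Sketch only: the registry layout, normalization rules, and ambiguity policy
// are assumptions; IDs are assumed to look like "path:name".

// Split "cli/route.ts:tryRouteCli" into its path and name parts.
function splitId(id) {
  const sep = id.lastIndexOf(":");
  return sep === -1
    ? { path: "", name: id }
    : { path: id.slice(0, sep), name: id.slice(sep + 1) };
}

// Hypothetical normalization: "/" separators, no leading "./", no file
// extension, lower-cased path.
function normalizeId(id) {
  const { path, name } = splitId(id);
  const normPath = path
    .replace(/\\/g, "/")
    .replace(/^\.\//, "")
    .replace(/\.[^./]+$/, "")
    .toLowerCase();
  return `${normPath}:${name}`;
}

// Index known IDs three ways, one per resolution tier.
function buildRegistry(ids) {
  const exact = new Set(ids);
  const byNormalized = new Map();
  const byName = new Map();
  for (const id of ids) {
    byNormalized.set(normalizeId(id), id);
    const { name } = splitId(id);
    if (!byName.has(name)) byName.set(name, []);
    byName.get(name).push(id);
  }
  return { exact, byNormalized, byName };
}

// Try each tier in order; return the first hit, or null if unresolved.
function resolve(registry, ref) {
  if (registry.exact.has(ref)) return { tier: "exact", id: ref };
  const normalized = registry.byNormalized.get(normalizeId(ref));
  if (normalized) return { tier: "normalized", id: normalized };
  const candidates = registry.byName.get(splitId(ref).name) ?? [];
  // Name-only is the weakest tier, so only an unambiguous match is accepted.
  if (candidates.length === 1) return { tier: "name", id: candidates[0] };
  return null;
}

const registry = buildRegistry([
  "cli/route.ts:tryRouteCli",
  "server/boot.ts:start",
  "lib/util.ts:start",
]);

console.log(resolve(registry, "./cli/route.js:tryRouteCli")); // normalized tier
console.log(resolve(registry, "tryRouteCli"));                // name tier
console.log(resolve(registry, "start"));                      // ambiguous: null
```

Refusing ambiguous name-only matches trades recall for precision. A real registry might instead return all candidates and let the caller decide.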

Phase 4: Semantic Diff (semantic-diff.js)

node semantic-diff.js diff old.json new.json
node semantic-diff.js score old.json new.json

Categorizes each change as breaking, significant, internal, or cosmetic, and assigns the diff an impact score from 0 to 100.
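The following minimal sketch shows one way categories could feed a bounded impact score. Only the four category names and the 0-100 range come from this README. The rules in categorize, the per-category weights, and the change record shape are all hypothetical.

```javascript
// Sketch only: the rules in categorize() and the WEIGHTS values are
// hypothetical; only the category names and the 0-100 range come from the README.
const WEIGHTS = { breaking: 25, significant: 10, internal: 3, cosmetic: 0 };

// change: { id, kind: "removed" | "added" | "signature" | "body" | "format",
//           visibility: "public" | "private" }
function categorize(change) {
  if (change.kind === "format") return "cosmetic";
  if (change.visibility === "private") return "internal";
  if (change.kind === "removed" || change.kind === "signature") return "breaking";
  return "significant"; // public additions and public body changes
}

// Sum category weights, then cap at 100 (weights are non-negative, so the
// score never drops below 0).
function impactScore(changes) {
  const raw = changes.reduce((sum, c) => sum + WEIGHTS[categorize(c)], 0);
  return Math.min(100, raw);
}

const changes = [
  { id: "cli/route.ts:tryRouteCli", kind: "signature", visibility: "public" },
  { id: "cli/route.ts:_parseArgs", kind: "body", visibility: "private" },
  { id: "cli/route.ts:routeV2", kind: "added", visibility: "public" },
  { id: "cli/route.ts:helpText", kind: "format", visibility: "public" },
];

console.log(changes.map(categorize)); // [ 'breaking', 'internal', 'significant', 'cosmetic' ]
console.log(impactScore(changes));    // 38
```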

Phase 5: Pipeline (pipeline.js)

node pipeline.js batch /repo --output /tmp/out     # Full extraction
node pipeline.js benchmark /repo --samples 20       # Performance test
node pipeline.js run /repo --snapshot prev.json     # Incremental diff
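The run command diffs against a previous snapshot. One common way to keep such a run incremental is to re-extract only files whose contents changed, detected by comparing content hashes. The sketch below assumes the previous snapshot records a SHA-256 hash per file. That field and the planIncremental function are hypothetical and not documented in this README.

```javascript
// Sketch only: assumes the previous snapshot stores a per-file SHA-256 hash
// (a hypothetical field); this README does not document that.
const crypto = require("crypto");

function sha256(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// files: Map<relativePath, currentContents>
// prevHashes: { [relativePath]: hashFromPreviousSnapshot }
function planIncremental(files, prevHashes) {
  const changed = [];   // new or modified: needs re-extraction
  const unchanged = []; // same hash: old entities could be reused
  for (const [path, content] of files) {
    if (prevHashes[path] === sha256(content)) unchanged.push(path);
    else changed.push(path);
  }
  // Files recorded previously but missing now: their nodes should be dropped.
  const deleted = Object.keys(prevHashes).filter((p) => !files.has(p));
  return { changed, unchanged, deleted };
}

const prevHashes = {
  "a.ts": sha256("export const a = 1;\n"),
  "b.ts": sha256("export const b = 1;\n"),
  "gone.ts": sha256("export {};\n"),
};
const files = new Map([
  ["a.ts", "export const a = 1;\n"],
  ["b.ts", "export const b = 2;\n"],
  ["c.ts", "export const c = 3;\n"],
]);

console.log(planIncremental(files, prevHashes));
```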

Benchmark (OpenClaw repo)

Metric      Value
Files       4,325
Extracted   4,259 (98.5%)
Nodes       21,646
Edges       133,979
Time        67 seconds
Avg/file    15 ms
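The benchmark figures can be cross-checked with a few lines of arithmetic. The table does not say whether Avg/file divides by all 4,325 files or only the 4,259 extracted ones, so both are computed. Either result lands near the reported 15 ms.

```javascript
// Figures copied from the benchmark table above.
const bench = { files: 4325, extracted: 4259, nodes: 21646, edges: 133979, seconds: 67 };

const extractionRate = (100 * bench.extracted) / bench.files;          // ≈ 98.47 %
const msPerFileAll = (bench.seconds * 1000) / bench.files;              // ≈ 15.49 ms
const msPerFileExtracted = (bench.seconds * 1000) / bench.extracted;    // ≈ 15.73 ms
const edgesPerNode = bench.edges / bench.nodes;                         // ≈ 6.19

console.log(`extraction rate:       ${extractionRate.toFixed(1)}%`);
console.log(`avg per file (all):    ${msPerFileAll.toFixed(1)} ms`);
console.log(`avg per file (parsed): ${msPerFileExtracted.toFixed(1)} ms`);
console.log(`edges per node:        ${edgesPerNode.toFixed(2)}`);
```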

V1 vs V2

Metric            V1 POC            V2 Pipeline
Parse time        ~2s               552ms
Total time        15-20 min (LLM)   552ms
Entities          files + imports   457 (4 types)
CALLS edges       0                 1,290
Cross-file calls  No                51 resolved
Languages         Go only           8
Semantic diff     No                Yes
Impact scoring    No                Yes
Cost              ~$0 (Ollama)      $0

Tested on labstack/echo (44 Go files)

Testing

bash test/run-all.sh          # 9/9 ground truth benchmark
node test/test-graph.js       # 25/25 graph store tests

Architecture

source files → extract.js → JSON → graph.js → snapshot.json
                                                    ↓
                                          semantic-diff.js → impact report
                                                    ↓
                                          namespace.js → cross-repo links

No external runtime dependencies beyond tree-sitter (with its language grammars) and js-yaml.

License

MIT
