Add README with benchmarks and V1 vs V2 comparison
113
README.md
Normal file
@@ -0,0 +1,113 @@
# Developer Intelligence Pipeline v2

Multi-language semantic graph extractor that builds a knowledge graph from source code. Produces function-level call graphs, cross-file dependency maps, and semantic diffs, all without LLM calls.
## Quick Start

```bash
npm install
node pipeline.js batch /path/to/repo --output /tmp/output
```
## What It Does

Parses source code into a directed graph of entities (modules, functions, classes, configs) and relationships (CALLS, IMPORTS, CONTAINS, IMPLEMENTS). It then diffs graph snapshots to detect breaking changes, compute impact scores, and identify affected callers.
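To make the graph model concrete, here is a minimal sketch of how affected callers could be found by walking CALLS edges in reverse. The snapshot shape (`nodes`, `edges`, `id`, `type`, `from`, `to`) and the function name `affectedCallers` are assumptions made for illustration, not the pipeline's actual schema or API.

```javascript
// Hypothetical snapshot shape: these field names are assumptions, not the real schema.
const snapshot = {
  nodes: [
    { id: "a.ts:main", type: "function" },
    { id: "a.ts:helper", type: "function" },
    { id: "b.ts:parse", type: "function" },
    { id: "b.ts:unused", type: "function" },
  ],
  edges: [
    { type: "CALLS", from: "a.ts:main", to: "a.ts:helper" },
    { type: "CALLS", from: "a.ts:helper", to: "b.ts:parse" },
    { type: "IMPORTS", from: "a.ts:main", to: "b.ts:unused" },
  ],
};

// Collect every entity that directly or transitively calls `targetId`.
function affectedCallers(graph, targetId) {
  // Reverse index: callee id -> list of caller ids (CALLS edges only).
  const callersOf = new Map();
  for (const edge of graph.edges) {
    if (edge.type !== "CALLS") continue;
    if (!callersOf.has(edge.to)) callersOf.set(edge.to, []);
    callersOf.get(edge.to).push(edge.from);
  }

  // Breadth-first walk up the call chain; `seen` guards against cycles.
  const seen = new Set();
  const queue = [targetId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const caller of callersOf.get(current) || []) {
      if (!seen.has(caller)) {
        seen.add(caller);
        queue.push(caller);
      }
    }
  }
  return [...seen].sort();
}

console.log(affectedCallers(snapshot, "b.ts:parse"));
// [ 'a.ts:helper', 'a.ts:main' ]
```

Only CALLS edges are followed here, so the IMPORTS edge to `b.ts:unused` does not make `a.ts:main` an affected caller of it.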
## Supported Languages

| Language | Parser | Entities |
|----------|--------|----------|
| TypeScript/JavaScript | tree-sitter | Modules, Functions, Classes, Imports |
| Python | tree-sitter | Modules, Functions, Classes (with `_`/`__` visibility) |
| Go | tree-sitter | Modules, Functions, Structs, Receiver Methods |
| Java | tree-sitter | Modules, Functions, Classes, Interfaces |
| Bash | tree-sitter | Modules, Functions, `source` imports, Commands |
| YAML | js-yaml | Config keys (K8s manifests, Helm, KCL) |
| Terraform/HCL | regex | Resources, Data, Modules, Providers |
## Pipeline Phases

### Phase 1: Entity Extraction (`extract.js`)
```bash
node extract.js /path/to/file.ts /repo/root
```
Outputs JSON with entities and relationships.
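As a rough illustration of working with that output, the sketch below tallies entities and relationships by type. The JSON shape (`entities`, `relationships`, `id`, `type`, `from`, `to`), the `parseArgs` entity, and the helper `countByType` are all assumptions; check the real output of `extract.js` before relying on any field name.

```javascript
// Hypothetical extractor output for one file; the real field names may differ.
const extracted = {
  entities: [
    { id: "cli/route.ts", type: "module" },
    { id: "cli/route.ts:tryRouteCli", type: "function" },
    { id: "cli/route.ts:parseArgs", type: "function" },
  ],
  relationships: [
    { type: "CONTAINS", from: "cli/route.ts", to: "cli/route.ts:tryRouteCli" },
    { type: "CONTAINS", from: "cli/route.ts", to: "cli/route.ts:parseArgs" },
    { type: "CALLS", from: "cli/route.ts:tryRouteCli", to: "cli/route.ts:parseArgs" },
  ],
};

// Count items per `type` field.
function countByType(items) {
  const counts = {};
  for (const item of items) {
    counts[item.type] = (counts[item.type] || 0) + 1;
  }
  return counts;
}

console.log(countByType(extracted.entities));      // { module: 1, function: 2 }
console.log(countByType(extracted.relationships)); // { CONTAINS: 2, CALLS: 1 }
```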
### Phase 2: Graph Store (`graph.js`)
```bash
node graph.js build /dir/of/jsons snapshot.json
node graph.js query snapshot.json "cli/route.ts:tryRouteCli"
node graph.js diff old.json new.json
```
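Conceptually, a snapshot diff can be thought of as a set comparison over entity IDs. The sketch below shows that idea only; the snapshot shape (`nodes` with an `id` field) and the function `diffSnapshots` are assumptions, and the real `graph.js diff` may compare more than IDs.

```javascript
// Compare two snapshots by entity ID (assumed shape: { nodes: [{ id }] }).
function diffSnapshots(oldSnap, newSnap) {
  const oldIds = new Set(oldSnap.nodes.map((n) => n.id));
  const newIds = new Set(newSnap.nodes.map((n) => n.id));
  return {
    added: [...newIds].filter((id) => !oldIds.has(id)).sort(),
    removed: [...oldIds].filter((id) => !newIds.has(id)).sort(),
  };
}

// Hypothetical snapshots for illustration.
const oldSnap = { nodes: [{ id: "a.ts:main" }, { id: "a.ts:legacy" }] };
const newSnap = { nodes: [{ id: "a.ts:main" }, { id: "a.ts:helper" }] };

console.log(diffSnapshots(oldSnap, newSnap));
// { added: [ 'a.ts:helper' ], removed: [ 'a.ts:legacy' ] }
```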
### Phase 3: Namespace Registry (`namespace.js`)
```bash
node namespace.js build snap-a.json snap-b.json --output registry.json
node namespace.js resolve graph.json registry.json
node namespace.js lookup registry.json functionName
```
3-tier cross-repo resolution: exact ID → normalized path → name-only.
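The three tiers can be sketched as a fallback chain. Everything below is illustrative: the registry entry shape (`id`, `path`, `name`), the normalization rules in `normalizePath`, and the choice to accept a name-only match only when it is unambiguous are assumptions, not the behavior of `namespace.js`.

```javascript
// Hypothetical registry: a flat list of known entities across repos.
const registry = [
  { id: "repo-a/src/cli/route.ts:tryRouteCli", path: "src/cli/route.ts", name: "tryRouteCli" },
  { id: "repo-b/lib/util.ts:formatDate", path: "lib/util.ts", name: "formatDate" },
];

// Assumed normalization: forward slashes, no leading "./", lowercase.
function normalizePath(p) {
  return p.replace(/\\/g, "/").replace(/^\.\//, "").toLowerCase();
}

// Try each tier in order and report which one matched.
function resolve(ref, entries) {
  // Tier 1: exact ID match.
  const exact = entries.find((e) => e.id === ref.id);
  if (exact) return { tier: "exact", entry: exact };

  // Tier 2: same normalized path and same name.
  if (ref.path) {
    const wanted = normalizePath(ref.path);
    const byPath = entries.find(
      (e) => normalizePath(e.path) === wanted && e.name === ref.name
    );
    if (byPath) return { tier: "path", entry: byPath };
  }

  // Tier 3: name only, accepted here only when exactly one entry matches.
  const byName = entries.filter((e) => e.name === ref.name);
  if (byName.length === 1) return { tier: "name", entry: byName[0] };

  return null; // unresolved
}

console.log(resolve({ id: "repo-a/src/cli/route.ts:tryRouteCli" }, registry).tier); // exact
console.log(resolve({ path: "./Lib/util.ts", name: "formatDate" }, registry).tier); // path
console.log(resolve({ name: "tryRouteCli" }, registry).tier);                       // name
```

Ordering the tiers from strictest to loosest means a loose name-only match is used only when nothing more specific is available.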
### Phase 4: Semantic Diff (`semantic-diff.js`)
```bash
node semantic-diff.js diff old.json new.json
node semantic-diff.js score old.json new.json
```
Categorizes each change as breaking, significant, internal, or cosmetic, and computes an impact score from 0 to 100.
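One way to picture categorization and scoring is a rule table plus a capped weighted sum. The rules, the `WEIGHTS` values, and the change shape (`kind`, `exported`) below are invented for illustration; the actual heuristics in `semantic-diff.js` are not described in this README.

```javascript
// Assumed weights per category; the real scoring formula is not documented here.
const WEIGHTS = { breaking: 40, significant: 15, internal: 5, cosmetic: 1 };

// Hypothetical rules: removing or re-signing an exported entity is breaking.
function categorize(change) {
  if (change.kind === "removed" || change.kind === "signature-changed") {
    return change.exported ? "breaking" : "significant";
  }
  if (change.kind === "body-changed") return "internal";
  return "cosmetic";
}

// Sum the category weights and clamp the result to the 0-100 range.
function impactScore(changes) {
  let total = 0;
  for (const change of changes) {
    total += WEIGHTS[categorize(change)];
  }
  return Math.min(100, total);
}

const changes = [
  { kind: "removed", exported: true }, // breaking  -> 40
  { kind: "body-changed" },            // internal  -> 5
  { kind: "comment-changed" },         // cosmetic  -> 1
];

console.log(changes.map(categorize)); // [ 'breaking', 'internal', 'cosmetic' ]
console.log(impactScore(changes));    // 46
```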
### Phase 5: Pipeline (`pipeline.js`)
```bash
node pipeline.js batch /repo --output /tmp/out       # Full extraction
node pipeline.js benchmark /repo --samples 20        # Performance test
node pipeline.js run /repo --snapshot prev.json      # Incremental diff
```
## Benchmark (OpenClaw repo)

| Metric | Value |
|--------|-------|
| Files | 4,325 |
| Extracted | 4,259 (98.5%) |
| Nodes | 21,646 |
| Edges | 133,979 |
| Time | 67 seconds |
| Avg/file | 15 ms |
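The derived figures in the table follow from the raw counts. The short check below recomputes them; the edges-per-node ratio is an extra figure not listed in the table, and the variable names are illustrative.

```javascript
// Raw numbers taken from the benchmark table above.
const files = 4325;
const extractedFiles = 4259;
const nodes = 21646;
const edges = 133979;
const seconds = 67;

const successRate = (extractedFiles / files) * 100; // share of files extracted
const msPerFile = (seconds * 1000) / files;         // average wall time per file
const edgesPerNode = edges / nodes;                 // graph density indicator

console.log(successRate.toFixed(1) + "%");  // 98.5%
console.log(Math.round(msPerFile) + " ms"); // 15 ms
console.log(edgesPerNode.toFixed(2));       // 6.19
```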
## V1 vs V2

| Metric | V1 POC | V2 Pipeline |
|--------|--------|-------------|
| Parse time | ~2s | 552ms |
| Total time | 15-20 min (LLM) | 552ms |
| Entities | files + imports | 457 (4 types) |
| CALLS edges | 0 | 1,290 |
| Cross-file calls | No | 51 resolved |
| Languages | Go only | 8 |
| Semantic diff | No | Yes |
| Impact scoring | No | Yes |
| Cost | ~$0 (Ollama) | $0 |

*Tested on labstack/echo (44 Go files)*
## Testing

```bash
bash test/run-all.sh        # 9/9 ground truth benchmark
node test/test-graph.js     # 25/25 graph store tests
```
## Architecture

```
source files → extract.js → JSON → graph.js → snapshot.json
                                                   ↓
                                    semantic-diff.js → impact report
                                                   ↓
                                    namespace.js → cross-repo links
```

No external runtime dependencies beyond the tree-sitter grammars and the `js-yaml` parser used for YAML.
## License

MIT