# Dev Intel V2 Architecture Document

## 1. Introduction & Goals

The Dev Intel V2 pipeline improvements aim to move the generated documentation from purely descriptive to explanatory. The current pipeline struggles to answer the "why" behind the infrastructure architecture and falls short in mapping flow paths and Terraform structures. This document details the design for closing these gaps and satisfying the PRD requirements for Terraform extraction, Flow Tracing, and Change Impact analysis.

## 2. Component Architecture

```mermaid
graph TD
    subgraph Extraction Phase
        TS[extract.js<br/>Tree-sitter Code]
        HELM[extract-helm.js<br/>Helm + Templates]
        TF[extract-terraform.js<br/>HCL / Regex Hybrid]
    end

    subgraph Knowledge Graph
        GRAPH[graph.js<br/>In-Memory Store]
        TS -->|Node/Edges| GRAPH
        HELM -->|Node/Edges| GRAPH
        TF -->|Node/Edges| GRAPH
    end

    subgraph Enrichment & Analysis Phase
        FLOW[flow.js<br/>Entry Point Auto-Detector]
        IMPACT[impact.js<br/>Change Impact Query]
        PROSE[prose.js<br/>Explanatory LLM]

        GRAPH <--> FLOW
        GRAPH <--> IMPACT
        GRAPH --> PROSE
    end

    subgraph Outputs
        DOCS[sysdoc.js<br/>Diataxis Docs]
        PROSE --> DOCS
        FLOW --> DOCS
        IMPACT --> DOCS
    end
```

## 3. Data Flow & Component Design

### 3.1 Terraform Extraction (`extract-terraform.js`)

**Problem:** The current naive regex misses modules, locals, and complex blocks, resulting in ~0% coverage of `control-core`.

**Design:** A hybrid extraction module that first attempts to load `tree-sitter-hcl`. If the grammar is unavailable or unpinned, the module falls back to a multi-pass regex parser.

- **Pass 1:** Extract block declarations (`module`, `resource`, `data`, `variable`, `output`, `provider`, `locals`).
- **Pass 2:** Extract dependencies. Within each block's body, run regexes to find references to variables (`var\.([a-zA-Z0-9_-]+)`), locals (`local\.([a-zA-Z0-9_-]+)`), modules (`module\.([a-zA-Z0-9_-]+)\.`), and resources (for example, `aws_s3_bucket\.([a-zA-Z0-9_-]+)\.`).
- **Entity Mapping:** Nodes are created with `kind: 'terraform'` and `type: blockType`. Edges of type `DEPENDS_ON` are generated from the references found in Pass 2.
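
The two regex passes can be sketched as follows. This is a minimal illustration, not the module's actual code: the function names, the node ID scheme (`blockType.label...`), and the naive brace counting are all assumptions, and the sketch takes the file contents as a string rather than a path so it stays self-contained. Resource-type references are left out for brevity.

```javascript
// Pass 1 pattern: top-level block headers such as `resource "type" "name" {`.
const BLOCK_HEADER =
  /^(module|resource|data|variable|output|provider|locals)\b((?:\s+"[^"]+")*)\s*\{/gm;

// Pass 2 patterns: reference prefixes mapped to a hypothetical target ID prefix.
const REFERENCE_PATTERNS = [
  { targetPrefix: 'variable', re: /\bvar\.([a-zA-Z0-9_-]+)/g },
  { targetPrefix: 'local', re: /\blocal\.([a-zA-Z0-9_-]+)/g },
  { targetPrefix: 'module', re: /\bmodule\.([a-zA-Z0-9_-]+)\./g },
];

// Naive brace matching: counts `{` and `}` without parsing strings or comments.
function findBlockEnd(source, openIndex) {
  let depth = 0;
  for (let i = openIndex; i < source.length; i++) {
    if (source[i] === '{') depth++;
    else if (source[i] === '}' && --depth === 0) return i;
  }
  return source.length; // unbalanced input: treat the rest of the file as the body
}

// Hypothetical helper: same return shape as the extractTerraform contract.
function extractTerraformFromSource(source, file) {
  const entities = [];
  const relationships = [];
  BLOCK_HEADER.lastIndex = 0;
  let match;
  while ((match = BLOCK_HEADER.exec(source)) !== null) {
    // Pass 1: record the block as a node.
    const blockType = match[1];
    const labels = [...match[2].matchAll(/"([^"]+)"/g)].map((m) => m[1]);
    const id = [blockType, ...labels].join('.');
    entities.push({ id, kind: 'terraform', type: blockType, file });

    // Pass 2: scan only this block's body for references.
    const open = match.index + match[0].length - 1;
    const close = findBlockEnd(source, open);
    const body = source.slice(open + 1, close);
    for (const { targetPrefix, re } of REFERENCE_PATTERNS) {
      for (const ref of body.matchAll(re)) {
        relationships.push({ from: id, to: `${targetPrefix}.${ref[1]}`, type: 'DEPENDS_ON' });
      }
    }
    BLOCK_HEADER.lastIndex = close + 1; // skip nested blocks inside the body
  }
  return { file, language: 'hcl', entities, relationships };
}
```

Scanning references per block body, rather than per file, is what lets each `DEPENDS_ON` edge start from the block that actually contains the reference.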

### 3.2 Explanatory Prose Generator (`prose.js`)

**Problem:** The LLM describes the "what" instead of the "why" because prompts pass only structural components.

**Design:** The `prose.js` prompt generator will be restructured to feed enriched context:

1. **Dependency Matrix:** Explicitly list upstream and downstream components.
2. **Anomaly Flags:** If a subsystem has zero functions but many variables, or high fan-in/fan-out, pass this to the LLM.
3. **Prompt Update:** Change the prompt instruction from *"Describe this system"* to *"Explain the architectural rationale behind this subsystem. Why does it depend on X and Y? Why does it exhibit anomaly Z?"*
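
One way the enriched prompt could be assembled is sketched below. The subsystem fields (`name`, `upstream`, `downstream`, `functionCount`, `variableCount`), both function names, and the two thresholds are hypothetical; the source does not define what counts as "many" variables or "high" fan-in/fan-out. The sketch treats `upstream` as the components this subsystem depends on and `downstream` as its dependents.

```javascript
// Hypothetical thresholds; tune against real repositories.
const MANY_VARIABLES = 10;
const HIGH_FAN = 8;

// Returns human-readable anomaly flags for one subsystem.
function detectAnomalies(s) {
  const flags = [];
  if (s.functionCount === 0 && s.variableCount >= MANY_VARIABLES) {
    flags.push(`zero functions but ${s.variableCount} variables`);
  }
  if (s.upstream.length >= HIGH_FAN) {
    flags.push(`high fan-out (${s.upstream.length} dependencies)`);
  }
  if (s.downstream.length >= HIGH_FAN) {
    flags.push(`high fan-in (${s.downstream.length} dependents)`);
  }
  return flags;
}

// Builds the prompt text: dependency matrix, anomaly flags, then the "why" instruction.
function buildProsePrompt(s) {
  const anomalies = detectAnomalies(s);
  return [
    `Subsystem: ${s.name}`,
    `Depends on (upstream): ${s.upstream.join(', ') || 'none'}`,
    `Depended on by (downstream): ${s.downstream.join(', ') || 'none'}`,
    `Anomalies: ${anomalies.join('; ') || 'none'}`,
    '',
    'Explain the architectural rationale behind this subsystem. ' +
      'Why does it depend on the upstream components listed above? ' +
      'Why does it exhibit the anomalies listed above?',
  ].join('\n');
}
```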

### 3.3 Entry Point Auto-Detection (`flow.js`)

**Problem:** No start-to-finish execution paths exist in the generated docs.

**Design:** `flow.js` will scan the `GraphStore` for nodes matching specific heuristics:

- **K8s:** Nodes of kind `Deployment` or `StatefulSet` that have an incoming edge from a `Service` or `Ingress`.
- **Scripts:** Bash files containing `main()` or ending with an execution call.
- **Python:** Files containing `if __name__ == '__main__':`.
- **CI/CD:** Files in `.github/workflows/`, or a `.gitlab-ci.yml` file.

Once entry points are identified, a breadth-first search (BFS) follows outbound `CALLS` and `DEPENDS_ON` edges from each one to map the execution flow.
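
The K8s heuristic and the BFS trace might look like the sketch below. The graph shape is an assumption: the real `GraphStore` API is not shown in this document, so the sketch uses a plain object with `nodes` (`{ id, kind }`) and `edges` (`{ from, to, type }`) arrays. Only the K8s heuristic is implemented, and the tree node shape (`{ id, children }`) is illustrative.

```javascript
const FLOW_EDGE_TYPES = new Set(['CALLS', 'DEPENDS_ON']);

// K8s heuristic only: workloads with an incoming edge from a Service or Ingress.
function detectEntryPoints(graph) {
  const kindById = new Map(graph.nodes.map((n) => [n.id, n.kind]));
  const exposed = new Set();
  for (const edge of graph.edges) {
    const fromKind = kindById.get(edge.from);
    if (fromKind === 'Service' || fromKind === 'Ingress') exposed.add(edge.to);
  }
  return graph.nodes.filter(
    (n) => (n.kind === 'Deployment' || n.kind === 'StatefulSet') && exposed.has(n.id)
  );
}

// BFS over outbound CALLS / DEPENDS_ON edges, returned as a tree.
function traceExecution(graph, startNodeId) {
  // Index outbound flow edges by source node.
  const outbound = new Map();
  for (const edge of graph.edges) {
    if (!FLOW_EDGE_TYPES.has(edge.type)) continue;
    if (!outbound.has(edge.from)) outbound.set(edge.from, []);
    outbound.get(edge.from).push(edge.to);
  }

  const root = { id: startNodeId, children: [] };
  const visited = new Set([startNodeId]); // guards against cycles
  const queue = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const nextId of outbound.get(current.id) ?? []) {
      if (visited.has(nextId)) continue;
      visited.add(nextId);
      const child = { id: nextId, children: [] };
      current.children.push(child);
      queue.push(child);
    }
  }
  return root;
}
```

Because each node is attached to the tree the first time it is reached, a node shared by several paths appears only once, under the shallowest parent that reaches it.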

### 3.4 Change Impact Query Interface (`graph.js` / `impact.js`)

**Problem:** Engineers cannot determine the blast radius of a change.

**Design:** A new query interface that traverses the graph backwards.

- Given a `nodeId` (e.g., a Secret or Terraform Module), traverse all *inbound* `DEPENDS_ON` or `CALLS` edges recursively.
- Return a hierarchical JSON payload or Markdown tree representing every downstream (dependent) system that must be redeployed or re-evaluated if the target node is modified.
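
A depth-limited reverse traversal could be sketched like this. As before, the graph shape (an `edges` array of `{ from, to, type }`) is an assumption standing in for the real `GraphStore`, and the result fields (`id`, `depth`, `via`) are illustrative. The sketch returns a flat list; the `via` field records the node each dependent was reached through, so a tree can be rebuilt from it.

```javascript
const IMPACT_EDGE_TYPES = new Set(['DEPENDS_ON', 'CALLS']);

// Walks inbound edges level by level, up to maxDepth hops from the target.
function queryImpact(graph, targetNodeId, maxDepth = 10) {
  // Index inbound edges by destination: who depends on (or calls) each node.
  const inbound = new Map();
  for (const edge of graph.edges) {
    if (!IMPACT_EDGE_TYPES.has(edge.type)) continue;
    if (!inbound.has(edge.to)) inbound.set(edge.to, []);
    inbound.get(edge.to).push(edge.from);
  }

  const impacted = [];
  const visited = new Set([targetNodeId]); // guards against cycles
  let frontier = [targetNodeId];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const id of frontier) {
      for (const dependent of inbound.get(id) ?? []) {
        if (visited.has(dependent)) continue;
        visited.add(dependent);
        impacted.push({ id: dependent, depth, via: id });
        next.push(dependent);
      }
    }
    frontier = next;
  }
  return impacted;
}
```

The iterative, level-by-level form makes the `maxDepth` cutoff explicit and avoids deep recursion on long dependency chains.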

### 3.5 Index Enrichment (`extract-helm.js`)

**Problem:** AI agents get lost when wrapper charts provide only links to sub-charts.

**Design:** During the indexing phase, when a wrapper chart lists dependencies in `Chart.yaml`, `sysdoc.js` will query the graph for those sub-charts and inline their key entities (Deployments, Services, ConfigMaps) directly into the wrapper chart's markdown section, so that the LLM has the relevant context in a single context window.
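
The inlining step might be sketched as below. Everything about the data shape is hypothetical: the sketch assumes each chart is available as `{ name, dependencies, entities }` in a `Map` keyed by chart name, which stands in for the graph query. Sub-chart dependencies are deliberately not followed, so inlining stops at one level.

```javascript
const KEY_KINDS = new Set(['Deployment', 'Service', 'ConfigMap']);

// Renders a wrapper chart's section with its direct sub-charts' key entities inlined.
function renderWrapperChartSection(wrapper, chartsByName) {
  const lines = [`## ${wrapper.name}`, ''];
  for (const depName of wrapper.dependencies) {
    lines.push(`### Sub-chart: ${depName}`);
    const sub = chartsByName.get(depName);
    if (!sub) {
      lines.push('- (not found in graph)');
      continue;
    }
    // Only key entity kinds are inlined; the sub-chart's own dependencies are ignored.
    for (const entity of sub.entities.filter((e) => KEY_KINDS.has(e.kind))) {
      lines.push(`- ${entity.kind}: \`${entity.name}\``);
    }
  }
  return lines.join('\n');
}
```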

## 4. Interface Contracts

```javascript
// Contract signatures only; function bodies are intentionally elided.

// extract-terraform.js
/**
 * @param {string} filePath - Absolute path to the .tf file.
 * @param {string} repoRoot - Base path of the repository.
 * @returns {Object} { file, language: 'hcl', entities: [...], relationships: [...] }
 */
function extractTerraform(filePath, repoRoot) { /* ... */ }

// flow.js
/**
 * @param {GraphStore} graph - The populated knowledge graph.
 * @returns {Array<Object>} List of entry point nodes.
 */
function detectEntryPoints(graph) { /* ... */ }

/**
 * @param {GraphStore} graph - The populated knowledge graph.
 * @param {string} startNodeId - The entry point ID.
 * @returns {Object} A tree representing the execution flow.
 */
function traceExecution(graph, startNodeId) { /* ... */ }

// impact.js
/**
 * @param {GraphStore} graph - The populated knowledge graph.
 * @param {string} targetNodeId - The node being modified.
 * @param {number} [maxDepth=10] - Max traversal depth.
 * @returns {Array<Object>} List of impacted downstream nodes.
 */
function queryImpact(graph, targetNodeId, maxDepth = 10) { /* ... */ }
```

## 5. Key Technical Decisions with Rationale

1. **Hybrid Parsing for Terraform (HCL):**
   * *Decision:* Try `tree-sitter-hcl`, fall back to regex.
   * *Rationale:* The PRD noted `tree-sitter-hcl` might not be pinned or available in Node 22. A pure tree-sitter approach risks failing completely. The regex fallback ensures that the most critical blocks (modules, resources) are still extracted even if the grammar fails to load.
2. **In-Memory Graph Traversal for Impact:**
   * *Decision:* Use BFS on the existing `Map`-based graph rather than introducing a graph database (e.g., Neo4j).
   * *Rationale:* The scope is limited to monorepos parsed at runtime. Adding an external database violates the "no external dependencies" philosophy of `graph.js` and slows down the pipeline.
3. **Inlining Dependencies in Helm:**
   * *Decision:* Inline sub-chart structures into the wrapper chart's markdown.
   * *Rationale:* It solves the Tier 1 issue of context-switching for AI coding agents and human readers, despite the risk of increasing document length.

## 6. Risk Mitigations

| Risk | Mitigation |
| :--- | :--- |
| **LLM Context Bloat (T1.1, T1.2)** | Limit the inlining depth of sub-charts to 1 level. Truncate anomaly explanations if the context window exceeds the configured threshold (e.g., 100k tokens). |
| **Noisy Graph Edges Breaking Impact Analysis (T2.3)** | Introduce edge "confidence scores" or strict edge typing (`EXPLICIT_DEPENDS_ON` vs `INFERRED_CALLS`). The impact query will only traverse high-confidence, explicitly defined edges. |
| **Regex Fallback Missing Complex HCL (T2.1)** | Focus the regex fallback purely on extracting block headers and specific reference patterns (`var.`, `module.`). This covers 80% of structural relationships, which satisfies the success metric. |
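
The edge-filtering mitigation for T2.3 could take a form like the sketch below. The `confidence` field, the `0.8` default, and the function name are hypothetical; the table above only names the two edge types and the idea of a confidence score.

```javascript
// Keeps explicitly typed edges, plus any edge whose confidence meets the threshold.
function selectTraversableEdges(edges, minConfidence = 0.8) {
  return edges.filter(
    (edge) => edge.type === 'EXPLICIT_DEPENDS_ON' || (edge.confidence ?? 0) >= minConfidence
  );
}
```

Running the filter before the impact traversal keeps the traversal itself unchanged: it simply sees a smaller edge list.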