Remove product references from blog post

Jarvis Prime
2026-03-23 00:34:04 +00:00
parent edadadd431
commit 64b2e3a533
2 changed files with 12 additions and 12 deletions


@@ -5,28 +5,28 @@
<h2 id="the-stack">The Stack</h2>
<p>The core pieces are:</p>
<ul>
<li><strong>LLM</strong>: Claude via <a href="https://github.com/openclaw/kiro">kiro-anthropic</a>, a local proxy that routes requests through my own Anthropic API key. No vendor lock-in, no shared rate limits, full visibility into what's being sent.</li>
<li><strong>LLM</strong>: Claude via a local reverse proxy that routes requests through my own Anthropic API key. No vendor lock-in, no shared rate limits, full visibility into what's being sent.</li>
<li><strong>TTS</strong>: <a href="https://github.com/rhasspy/piper">Piper TTS</a> running on TrueNAS via the <a href="https://github.com/rhasspy/wyoming">Wyoming protocol</a>. Also <a href="https://huggingface.co/Qwen/Qwen3-TTS">Qwen3-TTS</a> on the Mac for voice cloning.</li>
<li><strong>STT</strong>: <a href="https://github.com/openai/whisper">Whisper</a> server on TrueNAS, exposed as an OpenAI-compatible <code>/v1/audio/transcriptions</code> endpoint.</li>
<li><strong>Agent framework</strong>: <a href="https://github.com/openclaw/openclaw">OpenClaw</a> — open source, self-hosted, connects to Telegram and WhatsApp.</li>
<li><strong>Agent framework</strong>: An open-source agent framework — self-hosted, connects to Telegram and WhatsApp.</li>
</ul>
<p>The whole thing runs on hardware I already owned. The only recurring cost is the LLM API.</p>
<h2 id="getting-it-running">Getting It Running</h2>
<p>Piper and Whisper both run in Docker on TrueNAS. Piper is straightforward — pick a voice model, point it at the Wyoming port, done. Whisper took a bit more tuning. I'm running <code>small.en</code>, which is a good balance of speed and accuracy for English transcription.</p>
<p>The OpenClaw agent framework is what ties it together. It handles the conversation loop, routes messages to the right tools, and manages the Telegram integration. Skills are just directories with a <code>SKILL.md</code> and some scripts — easy to add, easy to audit.</p>
<p>For the LLM proxy, kiro-anthropic sits between OpenClaw and the Anthropic API. It adds request logging, lets me swap models without touching agent config, and gives me a single place to manage the API key.</p>
<p>The agent framework is what ties it together. It handles the conversation loop, routes messages to the right tools, and manages the Telegram integration. Skills are just directories with a <code>SKILL.md</code> and some scripts — easy to add, easy to audit.</p>
<p>The LLM proxy sits between the agent and the Anthropic API. It adds request logging, lets me swap models without touching agent config, and gives me a single place to manage the API key.</p>
<p>Qwen3-TTS on the Mac runs via <a href="https://github.com/Blaizzy/mlx-audio">mlx-audio</a>, which runs on the Apple Silicon GPU via Apple's MLX framework. The voice cloning feature is genuinely impressive — give it a 3-second audio sample and it'll match the voice reasonably well.</p>
<h2 id="what-surprised-me">What Surprised Me</h2>
<p><strong>Piper is fast.</strong> I expected local TTS to be slow and robotic. Piper is neither. On TrueNAS (not a powerful machine), it generates speech at realtime speed or better. The voice quality isn't Eleven Labs, but it's completely usable for a personal assistant.</p>
<p><strong>Whisper <code>small.en</code> is accurate enough.</strong> I was skeptical about running a smaller model, but it handles my voice well — probably 95% accuracy on normal speech. The only failures are proper nouns and technical jargon, which is expected. For a personal assistant that mostly hears the same vocabulary over and over, it's fine.</p>
<p><strong>The real cost is tokens, not compute.</strong> I assumed the bottleneck would be CPU/GPU. It's not. The hardware handles everything comfortably. What actually costs money is the LLM API. I pulled traces from <a href="https://github.com/comet-ml/opik">Opik</a> and found 75 million input tokens across 100 traces. Context window size is the killer — long conversations with tool call history get expensive fast.</p>
<p><strong>V8 memory limits in Docker will silently kill your agent.</strong> OpenClaw runs on Node.js. Docker containers have a default memory limit, and Node's V8 heap will hit it and crash without a clear error. The fix is <code>NODE_OPTIONS=--max-old-space-size=4096</code> in your container environment. I lost a few hours to this before finding it.</p>
<p><strong>V8 memory limits in Docker will silently kill your agent.</strong> The agent framework runs on Node.js. Docker containers have a default memory limit, and Node's V8 heap will hit it and crash without a clear error. The fix is <code>NODE_OPTIONS=--max-old-space-size=4096</code> in your container environment. I lost a few hours to this before finding it.</p>
<h2 id="the-numbers">The Numbers</h2>
<p>Rough monthly costs running this setup:</p>
<ul>
<li>LLM API (Claude): ~$15-25/month depending on conversation volume</li>
<li>Electricity for TrueNAS: already running, marginal cost near zero</li>
<li>Everything else (Piper, Whisper, OpenClaw): $0</li>
<li>Everything else (Piper, Whisper, agent framework): $0</li>
</ul>
<p>The 75M input tokens I mentioned came from a period of heavy testing with long context windows. Normal usage is significantly lower. The lesson: be deliberate about what you include in the system prompt and conversation history. Every token in context costs money on both ends of the conversation.</p>
<h2 id="whats-next">What's Next</h2>


@@ -12,10 +12,10 @@ I didn't buy new hardware for this. I used what I had: a TrueNAS server in my ho
The core pieces are:
- **LLM**: Claude via [kiro-anthropic](https://github.com/openclaw/kiro), a local proxy that routes requests through my own Anthropic API key. No vendor lock-in, no shared rate limits, full visibility into what's being sent.
- **LLM**: Claude via a local reverse proxy that routes requests through my own Anthropic API key. No vendor lock-in, no shared rate limits, full visibility into what's being sent.
- **TTS**: [Piper TTS](https://github.com/rhasspy/piper) running on TrueNAS via the [Wyoming protocol](https://github.com/rhasspy/wyoming). Also [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS) on the Mac for voice cloning.
- **STT**: [Whisper](https://github.com/openai/whisper) server on TrueNAS, exposed as an OpenAI-compatible `/v1/audio/transcriptions` endpoint.
- **Agent framework**: [OpenClaw](https://github.com/openclaw/openclaw) — open source, self-hosted, connects to Telegram and WhatsApp.
- **Agent framework**: An open-source agent framework — self-hosted, connects to Telegram and WhatsApp.
The whole thing runs on hardware I already owned. The only recurring cost is the LLM API.
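The STT entry in the list above describes an OpenAI-compatible `/v1/audio/transcriptions` endpoint. As a rough sketch of what a client call could look like, the snippet below builds the multipart request by hand with the standard library. The host `truenas.local:9000`, the file name, and passing `small.en` as the `model` field are assumptions for illustration; the actual server may expect different values.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes,
                                filename="sample.wav", model="small.en"):
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # Text field "model", then the header of the "file" part.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, audio_bytes):
    """Send the audio and return the "text" field of the JSON response."""
    req = build_transcription_request(base_url, audio_bytes)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]


# Hypothetical usage (host and file are placeholders):
# with open("sample.wav", "rb") as f:
#     print(transcribe("http://truenas.local:9000", f.read()))
```

Keeping the request builder separate from the network call makes it easy to inspect exactly what is sent before pointing it at a real server.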
@@ -23,9 +23,9 @@ The whole thing runs on hardware I already owned. The only recurring cost is the
Piper and Whisper both run in Docker on TrueNAS. Piper is straightforward — pick a voice model, point it at the Wyoming port, done. Whisper took a bit more tuning. I'm running `small.en` which is a good balance of speed and accuracy for English transcription.
The OpenClaw agent framework is what ties it together. It handles the conversation loop, routes messages to the right tools, and manages the Telegram integration. Skills are just directories with a `SKILL.md` and some scripts — easy to add, easy to audit.
The agent framework is what ties it together. It handles the conversation loop, routes messages to the right tools, and manages the Telegram integration. Skills are just directories with a `SKILL.md` and some scripts — easy to add, easy to audit.
For the LLM proxy, kiro-anthropic sits between OpenClaw and the Anthropic API. It adds request logging, lets me swap models without touching agent config, and gives me a single place to manage the API key.
The LLM proxy sits between the agent and the Anthropic API. It adds request logging, lets me swap models without touching agent config, and gives me a single place to manage the API key.
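To illustrate the "swap models without touching agent config" idea, here is a minimal sketch of the rewrite step such a proxy might perform on each request body before forwarding it. This is not the proxy's actual code: the alias table, the `rewrite_request` name, and the placeholder model ID are all hypothetical. The only thing assumed about the upstream API is that the JSON body carries `model` and `messages` fields.

```python
import json
import logging

logger = logging.getLogger("llm_proxy")

# Hypothetical alias table: the agent always asks for "default",
# and the proxy decides which real model ID that maps to.
MODEL_ALIASES = {"default": "real-model-id-goes-here"}


def rewrite_request(raw_body: bytes, aliases: dict) -> bytes:
    """Swap an aliased model name for its target and log the request."""
    payload = json.loads(raw_body)
    requested = payload.get("model")
    # Unknown names pass through unchanged.
    payload["model"] = aliases.get(requested, requested)
    logger.info(
        "model=%s -> %s messages=%d",
        requested, payload["model"], len(payload.get("messages", [])),
    )
    return json.dumps(payload).encode("utf-8")
```

With this shape, changing the model is a one-line edit to the alias table, and the log line gives a single place to see what each request carried.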
Qwen3-TTS on the Mac runs via [mlx-audio](https://github.com/Blaizzy/mlx-audio), which runs on the Apple Silicon GPU via Apple's MLX framework. The voice cloning feature is genuinely impressive — give it a 3-second audio sample and it'll match the voice reasonably well.
@@ -37,7 +37,7 @@ Qwen3-TTS on the Mac runs via [mlx-audio](https://github.com/Blaizzy/mlx-audio),
**The real cost is tokens, not compute.** I assumed the bottleneck would be CPU/GPU. It's not. The hardware handles everything comfortably. What actually costs money is the LLM API. I pulled traces from [Opik](https://github.com/comet-ml/opik) and found 75 million input tokens across 100 traces. Context window size is the killer — long conversations with tool call history get expensive fast.
**V8 memory limits in Docker will silently kill your agent.** OpenClaw runs on Node.js. Docker containers have a default memory limit, and Node's V8 heap will hit it and crash without a clear error. The fix is `NODE_OPTIONS=--max-old-space-size=4096` in your container environment. I lost a few hours to this before finding it.
**V8 memory limits in Docker will silently kill your agent.** The agent framework runs on Node.js. Docker containers have a default memory limit, and Node's V8 heap will hit it and crash without a clear error. The fix is `NODE_OPTIONS=--max-old-space-size=4096` in your container environment. I lost a few hours to this before finding it.
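Before raising `--max-old-space-size`, it can help to check that the new heap ceiling actually fits inside the container's memory limit. The sketch below is one way to do that check, assuming a cgroup v2 host where the limit is exposed at `/sys/fs/cgroup/memory.max`. The 25% headroom for non-heap memory is an arbitrary illustrative value, not a recommendation from the post.

```python
from pathlib import Path

# cgroup v2 location of the container memory limit (assumes a cgroup v2 host).
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")


def parse_cgroup_limit(raw: str):
    """Return the limit in bytes, or None when the file says 'max' (unlimited)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def heap_fits(limit_bytes, max_old_space_mb: int, headroom: float = 0.25) -> bool:
    """True if the V8 old-space ceiling leaves `headroom` of the limit free."""
    if limit_bytes is None:
        return True  # no container limit to collide with
    return max_old_space_mb * 1024 * 1024 <= limit_bytes * (1 - headroom)


if __name__ == "__main__" and CGROUP_V2_LIMIT.exists():
    limit = parse_cgroup_limit(CGROUP_V2_LIMIT.read_text())
    print("4096 MB heap fits:", heap_fits(limit, 4096))
```

Run inside the container, this prints whether a 4096 MB old space leaves room for the rest of the process under the current limit.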
## The Numbers
@@ -45,7 +45,7 @@ Rough monthly costs running this setup:
- LLM API (Claude): ~$15-25/month depending on conversation volume
- Electricity for TrueNAS: already running, marginal cost near zero
- Everything else (Piper, Whisper, OpenClaw): $0
- Everything else (Piper, Whisper, agent framework): $0
The 75M input tokens I mentioned came from a period of heavy testing with long context windows. Normal usage is significantly lower. The lesson: be deliberate about what you include in the system prompt and conversation history. Every token in context costs money on both ends of the conversation.
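To put the 75M figure in perspective, the arithmetic can be sketched as follows. The token and trace counts come from the post. The price per million input tokens is a placeholder assumption, not a quoted rate, so substitute the current price for whichever model is in use.

```python
def average_tokens_per_trace(total_tokens: int, traces: int) -> float:
    """Mean input tokens per trace."""
    return total_tokens / traces


def input_cost_usd(total_tokens: int, usd_per_million: float) -> float:
    """Input-side cost at a flat per-million-token price."""
    return total_tokens / 1_000_000 * usd_per_million


TOTAL_INPUT_TOKENS = 75_000_000   # from the Opik traces mentioned above
TRACES = 100                      # from the same traces
ASSUMED_USD_PER_MILLION = 3.00    # placeholder price, an assumption

avg = average_tokens_per_trace(TOTAL_INPUT_TOKENS, TRACES)
cost = input_cost_usd(TOTAL_INPUT_TOKENS, ASSUMED_USD_PER_MILLION)
print(f"{avg:,.0f} input tokens per trace")      # 750,000 input tokens per trace
print(f"${cost:,.2f} at the assumed price")      # $225.00 at the assumed price
```

An average of 750,000 input tokens per trace suggests each trace spans many LLM calls that resend a growing context, which is why trimming the system prompt and history has such a direct effect on the bill.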