Training the AI, banning the AI, and the KV-cache cost line nobody talks about

SitePoint Source

Welcome, Developers! 👋

If your team felt 20% faster with AI this quarter, you may actually be shipping 20% slower. That's the MIT finding inside Chris Parsons' deep look at what senior engineers should be doing now (hint: training the AI, not approving its diffs). We've also got the Thoughtworks counter-argument, the cleanest case yet for banning LLM contributions in open source, the KV cache numbers that quietly break long-context budgets, and the 17-year-old who used ChatGPT to breach 7 million records. Five reads worth your attention.

From our sponsor: ManageEngine

Agentic AI and the Future of AIOps

What happens when AIOps moves beyond automation to autonomy? Join this webinar to explore how Agentic AI enables smarter decision-making, reduces alert noise, and accelerates incident resolution. Learn how modern IT teams are shifting from reactive firefighting to proactive, self-driven operations.

🔖 The Reading Room

Articles we have hand-picked for you:

How I Use AI to Code

A senior engineer's lived account of a year inside agentic coding. The senior's job is to train the AI, not approve diffs. Spec the problem, not the solution. The harness around the model now matters more than the model itself, and feedback is the new bottleneck. Footnotes pull in the a16z enterprise-AI report, Karpathy's tight-leash talk, and the MIT study showing developers felt 20% faster while delivering 20% slower.

By Chris Parsons →

Structured-Prompt-Driven Development (SPDD)

The structured counter-argument. Thoughtworks engineers treat prompts as first-class delivery artifacts: version-controlled, reviewable, kept in sync with code. Their REASONS Canvas decomposes intent across seven dimensions before generation, with an open-source CLI (openspdd) and a worked billing-engine example end to end. Read this alongside the Chris Parsons piece above and pick your side.

By Wei Zhang & Jessie Xia →

Contributor Poker and Zig's AI Ban

Zig's VP of Community articulates the cleanest case yet for blanket-banning LLM contributions. Open source is an iterated game where you bet on the contributor, not the contents of their first PR. LLM-assisted PRs, even good ones, give the project nothing to invest in. Includes the Bun footnote: Anthropic's own JavaScript runtime hit a 4× compile speedup with LLM-authored code and won't upstream it because Zig won't take it.

By Loris Cro →

KV Cache Optimization for LLMs 2026: Engineering Guide

Above 32K tokens, KV cache memory starts outpacing parameter memory. At 1M context on a Llama 70B baseline, naive MHA at FP16 needs 135 GB of KV cache, nearly as much as the 140 GB of weights themselves. Stacking MLA with FP8 collapses that to 8 GB, a 17× reduction. The guide covers five technique families with measured numbers across vLLM, SGLang, and TensorRT-LLM, plus a workload-by-workload decision table.
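If you want to sanity-check a long-context budget yourself, the usual back-of-envelope formula is 2 × layers × KV heads × head dimension × tokens × bytes per element. Here is a minimal sketch of it. The model shape (80 layers, 64 heads of dimension 128) and the function names are our assumptions for illustration, not values taken from the guide, so the output will not necessarily match the guide's figures.

```python
# Back-of-envelope KV cache sizing. All model shapes below are assumptions
# for illustration, not figures taken from the linked guide.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem, batch_size=1):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor of 2: one key vector plus one value vector per KV head.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


def weight_bytes(num_params, bytes_per_param):
    """Bytes needed to hold the model weights."""
    return num_params * bytes_per_param


def break_even_tokens(num_params, bytes_per_param, num_layers,
                      num_kv_heads, head_dim, bytes_per_elem):
    """Context length at which the KV cache matches the weights in size."""
    per_token = kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                               1, bytes_per_elem)
    return weight_bytes(num_params, bytes_per_param) // per_token


# Hypothetical 70B-class shape: 80 layers, 64 heads of dimension 128.
LAYERS, HEADS, HEAD_DIM = 80, 64, 128
FP16, FP8 = 2, 1  # bytes per element

if __name__ == "__main__":
    # Compare full multi-head caching with 8 shared KV heads, at two precisions.
    for label, kv_heads, width in [("MHA FP16", HEADS, FP16),
                                   ("GQA-8 FP16", 8, FP16),
                                   ("GQA-8 FP8", 8, FP8)]:
        per_token = kv_cache_bytes(LAYERS, kv_heads, HEAD_DIM, 1, width)
        print(f"{label}: {per_token / 1024:.0f} KiB per token")
```

Changing num_kv_heads or bytes_per_elem shows how head sharing and lower precision each shrink the cache. MLA compresses keys and values differently and is not modeled here.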

By Digital Applied →

2026: The Year of AI-Assisted Attacks

A 17-year-old with no coding background used ChatGPT to extract data on 7 million Kaikatsu Club users. Time-to-exploit fell from over 700 days in 2020 to 44 days in 2025; per Mandiant's M-Trends 2026, it has effectively gone negative, with 28.3% of CVEs exploited within 24 hours of disclosure. SWE-bench scores climbed from 33% (Aug 2024) to 81% (Dec 2025). The Shai-Hulud npm attack hit 500+ packages and stole $8.5M from Trust Wallet. Heads up: the closing pitch is for Chainguard Libraries, but the body is the data the rest of the year will reference.

By Patrick Smyth →

🧰 The Toolbox

Tools and products we're excited about today:

ScrapeGraphAI

A Python web-scraping library that uses LLMs and graph-based pipelines instead of brittle CSS selectors. You describe what you want extracted in plain English; it figures out the rest. Works with OpenAI, Gemini, Groq, or local Ollama models, and integrates with LangChain, LlamaIndex, and CrewAI. MIT licensed, 23.9k stars, v2.0.0 shipped April 19.

Learn more →

openspdd

The reference implementation of the SPDD workflow from the Thoughtworks article above. CLI commands wrap the analysis, REASONS Canvas, code generation, and prompt-sync stages so structured prompts stay version-controlled instead of trapped in chat. Useful even if you only steal the templates.

Learn more →

gh-aw (GitHub Agentic Workflows)

GitHub's framework for running agentic workflows as GitHub Actions. The April 13–17 release wave added OpenCode as a first-class engine alongside Claude, Codex, and Copilot, plus a new engine.bare mode that skips loading AGENTS.md for triage and ops workflows. Cache-memory working-tree sanitization closes a real supply-chain attack vector.

Learn more →

LiteParse for the web

Simon Willison ported LlamaIndex's PDF text-extraction CLI to run entirely in the browser, using mostly the same libraries the Node.js version does. Drop a PDF in, get text out, no server, no upload. Useful as a privacy-first alternative when you don't want to ship documents off-machine for parsing.

Learn more →

🎤 Your Voice

Your feedback shapes what comes next! We read every email, so simply hit reply and tell us what's on your mind.

71 Balmain Street, Cremorne, Vic 3121, Australia